PythonKrit
This article collects my key takeaways from the PythonKrit workshop, held at Samskrit Bharti, Bangalore, in March 2025.
XML to Mindmap generation : https://sambhasha.ksu.ac.in/CompLing/tarkasangraha/live/
We can define special tags such as
<PAA-LAXANAM>
<PAA-UDAA>
<PAA-VAKYAM>
Other Tools
https://sambhasha.ksu.ac.in/projects/
Aksharamukha
https://github.com/chaitanya-lakkundi/aksharamukha
Vaijayantīkośa Knowledge-Net https://sambhasha.ksu.ac.in/CompLing/VK_ACL.pdf
A directory of Indic (Indian) language computing projects and resources https://indic.page/
https://sambhasha.ksu.ac.in/CompLing/chandas/chandas.html
https://www.gitasupersite.iitk.ac.in/conceptmaps Good resource for Neo4J graph DB
https://sanskritlibrary.org/downloads.html
https://sanskritlibrary.org/projects.html
https://sanskritlibrary.org/tools.html
Kridanta Rupa: https://github.com/chaitanya-lakkundi/kridanta-rupa-android/blob/master/kridanta_rupa_samgraha.pdf
Aadi Shankaracharya : https://www.sankara.iitk.ac.in/ and https://www.advaita-vedanta.org/texts/index.html
https://www.gitasupersite.iitk.ac.in/
GitHub
https://github.com/chaitanya-lakkundi/
https://github.com/drdhaval2785
Useful Sanskrit Alphabet https://github.com/chaitanya-lakkundi/varnamala/blob/main/varnamala.py
https://github.com/drdhaval2785/SanskritVerb/
https://github.com/drdhaval2785/SanskritSubanta
For Kids
https://bala.sambhasha.ksu.ac.in/
https://www.samskritpromotion.in/samskrit-toys
Scholars
https://sanskrit.uohyd.ac.in/faculty/amba/ and https://www.sanskritstudiespodcast.com/1759898/episodes/12324157-16-amba-kulkarni-sanskrit-and-computers
https://web.stanford.edu/~kiparsky/ and https://en.wikipedia.org/wiki/Paul_Kiparsky
Python
| | List | Tuple | Dictionary |
|---|---|---|---|
| Ordered? | Yes | Yes | Insertion order preserved (Python 3.7+) |
| Mutable? | Yes | No | Yes |
| Mixed data types? | Yes | Yes | Yes |
| Can be indexed? | Yes | Yes | Yes, by keys |
| Syntax | [] | () | {} |
| Duplicate elements? | Yes | Yes | Values yes, but keys must be unique |
- Both list and tuple support slicing, including a step to skip indices
- A tuple is immutable, so it can be faster
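A minimal sketch of the differences above, assuming nothing beyond the standard interpreter:

```python
# Ordering, mutability, indexing, and slicing for list, tuple, and dict.
nums_list = [10, 20, 30, 40]
nums_tuple = (10, 20, 30, 40)
nums_dict = {"a": 10, "b": 20, "c": 30}

# Slicing with a step ("skipping index") works on both list and tuple.
print(nums_list[::2])    # [10, 30]
print(nums_tuple[1:])    # (20, 30, 40)

# Lists are mutable; tuples are not.
nums_list[0] = 99
try:
    nums_tuple[0] = 99
except TypeError as e:
    print("tuple is immutable:", e)

# Dictionaries are indexed by key, and keys must be unique.
print(nums_dict["b"])    # 20
```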
Python Notebook
https://realpython.com/jupyter-notebook-introduction/
https://openclassrooms.com/en/courses/2304731-learn-python-basics-for-data-analysis/7978803-take-your-first-steps-with-jupyter-notebook
https://www.shiksha.com/online-courses/how-to-use-google-colab-for-python-course-grlel861
https://www.geeksforgeeks.org/how-to-use-google-colab/
Turn Off
Let me share my favorite list:
K8s : http://turnoff.us/geek/the-depressed-developer-44/
Container : http://turnoff.us/geek/kernel-economics/
Python :
http://turnoff.us/geek/the-depressed-developer-35/
http://turnoff.us/geek/python-private-methods/
http://turnoff.us/geek/math-class-2018/
Manager : http://turnoff.us/geek/the-realist-manager/
Social Media http://turnoff.us/geek/the-depressed-developer-23/
AI :
http://turnoff.us/geek/python-robots/
http://turnoff.us/geek/chatbot/
http://turnoff.us/geek/sad-robot/
http://turnoff.us/geek/when-ai-meets-git/
Debug: http://turnoff.us/geek/the-last-resort/
USB : http://turnoff.us/geek/tobbys-world/
CI/CD : http://turnoff.us/geek/deployment-pipeline/
GW API : http://turnoff.us/geek/distributed-architecture-drama/
Computer Science concepts
Process v/s thread : http://turnoff.us/geek/dont-share-mutable-state/
Btree: http://turnoff.us/geek/binary-tree/
Zombie Process http://turnoff.us/geek/zombie-processes/
Idle CPU : http://turnoff.us/geek/idle/
Python : collections module
1. namedtuple
Useful to construct lightweight objects with named, readable fields.
2. Counter
It works with a string, a list, or a sentence; a sentence should be split on ' ' to convert it into a list of words.
suppose
c = Counter(list)
then
c.values() gives only the counts, so sum(c.values()) gives the total of all counts
c.most_common() sorts by frequency and returns a list of (element, count) tuples
c.most_common()[0][0] gives the element with the maximum occurrences
c.most_common()[:-2:-1] gives the element with the fewest occurrences
c.most_common()[:-n-1:-1] gives the n least common elements
c.subtract(d), where d is another Counter: each element's count in c is decreased by its count in d.
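A short example of the Counter behaviours listed above; the sentence is illustrative:

```python
from collections import Counter

# Count word frequencies by splitting a sentence into a list of words.
sentence = "the quick brown fox jumps over the lazy dog the fox"
c = Counter(sentence.split(" "))

print(sum(c.values()))          # 11: total number of words
print(c.most_common()[0][0])    # 'the': the most frequent word
print(c.most_common()[:-3:-1])  # the 2 least common words, least common first

# subtract() decreases counts in place by the counts in another Counter.
c.subtract(Counter(["the", "fox"]))
print(c["the"], c["fox"])       # 2 1
```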
3. defaultdict(object)
It takes a factory callable (e.g. list, int, dict) and supplies a default value for missing keys instead of raising KeyError.
4. OrderedDict
OrderedDict(sorted(d.items(), key=lambda t: t[0])) to sort by key
OrderedDict(sorted(d.items(), key=lambda t: t[1])) to sort by value
5. deque
To add : append(), appendleft()
To remove : pop() , popleft()
to count: count()
To insert as specific index i : insert(i, x)
6. ChainMap
ChainMap groups multiple dicts into a single, updatable view; lookups search each dict in order.
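A small sketch of defaultdict and ChainMap as described above (names are illustrative):

```python
from collections import ChainMap, defaultdict

# defaultdict supplies a default for missing keys via a factory callable.
groups = defaultdict(list)
for word in ["apple", "ant", "bear", "bee"]:
    groups[word[0]].append(word)   # no KeyError on first access
print(dict(groups))                # {'a': ['apple', 'ant'], 'b': ['bear', 'bee']}

# ChainMap searches several dicts in order; the first match wins.
defaults = {"color": "red", "user": "guest"}
overrides = {"user": "admin"}
cm = ChainMap(overrides, defaults)
print(cm["user"], cm["color"])     # admin red
```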
Python Virtual Environment
The venv module provides support for creating lightweight “virtual environments”, which are optionally isolated from system site directories.
A virtual environment is a Python environment such that
- the Python interpreter,
- libraries and
- scripts
installed into it are isolated from (1) those installed in other virtual environments, and (2) (by default) any libraries installed in a “system” Python, i.e., one which is installed as part of your operating system.
A virtual environment is a directory tree which contains Python executable files and other files which indicate that it is a virtual environment. The path to this folder can be printed with the variables
sys.prefix
sys.exec_prefix
These variables are used to locate site packages directory.
Each virtual environment has its own
- Python binary (which matches the version of the binary that was used to create this environment)
- independent set of installed Python packages in its site directories.
Packages (Modules)
Check installed packages using
pip list
Check specific package is installed or not using
pip show "package name"
Two types of packages
1. System packages (installed as part of Python installation)
2. Site packages (3rd party libraries)
Reinstall using
pip install --upgrade --no-deps --force-reinstall "package name"
Pip can export a list of all installed packages and their versions using the freeze command:
pip freeze
This can be used to create a requirements.txt file, which is later used as:
pip install -r requirements.txt
Commands
To create venv
python3 -m venv /path/to/new/virtual/environment
c:\>c:\Python35\python -m venv c:\path\to\myenv
To activate venv
Linux/macOS: source /path/to/venv/bin/activate
Windows: C:\> \path\to\myenv\Scripts\activate.bat
Activation modifies the PATH variable: the venv path is added at the beginning.
To deactivate venv
deactivate (on both platforms)
It resets the PATH variable back.
To change Python version
pyenv local 2.x
pyenv local 3.x
pyenv global 2.x
pyenv global 3.x
Python Lambda
- Lambda is a special type of Python function that has limited capabilities.
- It suits a function that will be used only once in your program. Such functions are called anonymous functions.
Syntax
lambda arguments: expression
- Lambda functions can accept zero or more arguments but only one expression
- Here return statement is implicit.
- A lambda can be declared and invoked in the same expression
What kind of things can I, and can I not, put into a lambda?
Don'ts or Limitations
- If it doesn't return a value, it isn't an expression and can't be put into a lambda.
- You can only use expressions, not statements.
- A lambda with multiple lines/expressions/statements is not possible; you can use only one expression.
- You cannot declare and use local variables (beyond the parameters).
Rules of thumb
- If you can imagine it in an assignment statement, on the right-hand side of the equals sign, it is an expression and can be put into a lambda.
- A lambda can be a member of a list or a dictionary.
Use cases
1. map and reduce Python built-in function
2. filter Python built-in function
3. sort and sorted Python built-in function
4. Print formating
5. Use in a while statement.
6. If a for loop only invokes a function, it can be replaced with map.
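A minimal sketch of use cases 1–3 above; the data is illustrative:

```python
nums = [3, 1, 4, 1, 5, 9, 2, 6]

squares = list(map(lambda x: x * x, nums))        # map
evens = list(filter(lambda x: x % 2 == 0, nums))  # filter
pairs = sorted([("b", 2), ("a", 3)], key=lambda t: t[0])  # sorted with a key

print(squares)  # [9, 1, 16, 1, 25, 81, 4, 36]
print(evens)    # [4, 2, 6]
print(pairs)    # [('a', 3), ('b', 2)]

# A lambda can be declared and invoked together:
print((lambda x, y: x + y)(2, 3))  # 5
```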
Multiple expressions
Use tuples, and implement your own evaluation order
If you fear that evaluation order will change in a future release of Python,
you can use eval (a Python 2 era idiom):
map(eval, ("f1()", "f2()", ..., "fn()"))
In Python 2 you could also use apply (removed in Python 3):
map(lambda x: apply(x, ()), (f1, f2, f3))
In Python 3, simply call each function: [f() for f in (f1, f2, f3)]
General Notes
- One can pass functions as arguments to other functions
- A function can be the return value of another function.
- The expression returns (or evaluates to) a value, whereas a statement does not.
https://pythonconquerstheuniverse.wordpress.com/2011/08/29/lambda_tutorial/
http://p-nand-q.com/python/lambda.html
https://www.bogotobogo.com/python/python_functions_lambda.php
Python itertools module
itertools is a collection of “fast, memory efficient tools”. If you have not used the itertools module, you have most likely written code that was unnecessarily long. It is named itertools because it consists of functions built from iterator building blocks. The functions in the itertools module behave like generator functions built around the yield keyword. Remember: every generator is an iterator, but not every iterator is a generator.
Split
islice
Slices an iterable like [start:stop]; we can optionally pass a step too. If we pass only one argument, it is the value for stop, and start = 0 by default. If you want to specify start, then you must also specify stop; to indicate stop = 'till the end', use None as the stop value.
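For example, islice behaves as described (a small sketch):

```python
import itertools

letters = "abcdefg"

# One argument: it is the stop value (start defaults to 0).
print(list(itertools.islice(letters, 3)))        # ['a', 'b', 'c']

# start, stop, and the optional step.
print(list(itertools.islice(letters, 1, 5, 2)))  # ['b', 'd']

# stop=None means "till the end".
print(list(itertools.islice(letters, 4, None)))  # ['e', 'f', 'g']
```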
combinatoric
accumulate
The default is a running sum (for int) and concatenation (for string); the default function is operator.add. One can pass a custom function too, e.g. operator.mul.
Example: at each step, keep the last two characters of the accumulated value and append the new item; as a result each character appears in up to three of the accumulated results.
def f(a, b):
    print(a, b)
    return a[-2:] + b

def fn_accumulate():
    print(list(itertools.accumulate('abcdefg', f)))
This can be written in better way, using second order recurrence relation, as mentioned in https://realpython.com/python-itertools/
permutations = nPr
combinations = nCr
combinations_with_replacement = nCr plus pairings of each element with itself
For permutations, r is optional (it defaults to the full length); for combinations and combinations_with_replacement, r is required.
product generates the Cartesian product. It can be used to generate binary numbers:
list(map(list, itertools.product("01", repeat=5)))
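A small sketch of the combinatoric functions above:

```python
import itertools

# permutations: order matters (nPr); combinations: order ignored (nCr).
print(list(itertools.permutations("AB")))      # [('A', 'B'), ('B', 'A')]
print(list(itertools.combinations("ABC", 2)))  # [('A', 'B'), ('A', 'C'), ('B', 'C')]

# combinations_with_replacement also pairs elements with themselves.
print(list(itertools.combinations_with_replacement("AB", 2)))
# [('A', 'A'), ('A', 'B'), ('B', 'B')]

# product gives the Cartesian product; repeat=3 yields all 3-bit binary strings.
bits = ["".join(p) for p in itertools.product("01", repeat=3)]
print(bits)  # ['000', '001', '010', '011', '100', '101', '110', '111']
```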
Merge
chain
Merges two (or more) lists, quickly. It can be used to add anything as a prefix or a suffix to a list.
chain.from_iterable
Here we first need to put all the iterables in a list, and then pass that list (of iterables) as a single argument.
zip_longest
Similar to zip, but adds None for unpaired items. We can also pass fillvalue to replace None with something else.
We can use zip with count to add an index number to each item:
from itertools import count
for i in zip(count(1), ['a', 'b', 'c']):
    print(i)
Infinite operator
count is an infinite iterator
To stop it, (1) use an if condition with break, or (2) use islice.
Optionally, a step can be passed to count; the step can even be a fractions.Fraction.
cycle is also an infinite iterator. It cycles through a series of values indefinitely.
To stop it, use an if condition with break.
repeat takes an object as input, instead of an iterable.
To stop it, pass the optional argument 'times'.
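The three infinite iterators above, bounded with islice as suggested (a minimal sketch):

```python
import itertools

# count: an infinite arithmetic sequence; here stopped with islice.
print(list(itertools.islice(itertools.count(10, 5), 4)))  # [10, 15, 20, 25]

# cycle: repeats a sequence forever; again bounded with islice.
print(list(itertools.islice(itertools.cycle("AB"), 5)))   # ['A', 'B', 'A', 'B', 'A']

# repeat: repeats one object; the optional 'times' argument makes it finite.
print(list(itertools.repeat("x", 3)))                     # ['x', 'x', 'x']
```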
Filter
Most functions in the filter category take a predicate function as an argument (compress takes a list of selectors instead).
compress
Here the second argument is a list of Booleans (selectors); the first argument is compressed accordingly, i.e. each element is kept or dropped. It is like masking with the boolean & (AND) operator. The Boolean list can be the output of cycle, to create a pattern. Note: these values in the list are considered False: 0, False, None, ''.
For example
def fn_compress():
    mask = [1, 0, 0]
    res = itertools.compress(range(20), itertools.cycle(mask))
    for i in res:
        print(i)
dropwhile
Instead of a list of Booleans, we pass a predicate function as the first argument. While the predicate returns true, items are dropped; as soon as it returns false, that item and all the remaining items are kept, and the predicate is not evaluated again.
takewhile
The opposite of dropwhile: while the predicate returns true, items are taken; as soon as it returns false, iteration stops, and the predicate is not evaluated again.
filterfalse
Similar to dropwhile, except that the predicate is never disabled.
It returns all values for which the predicate returns false, taking a predicate function and an iterable as input. It is the opposite of the built-in function filter.
We can use the built-in function filter to list all the factors of a given integer, and thereby find out whether it is prime.
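A short sketch of dropwhile, takewhile, filterfalse, and the prime-factor idea above:

```python
import itertools

nums = [1, 4, 6, 3, 8, 2]

# dropwhile: drop items while the predicate holds, then keep everything.
print(list(itertools.dropwhile(lambda x: x < 5, nums)))    # [6, 3, 8, 2]

# takewhile: take items while the predicate holds, then stop.
print(list(itertools.takewhile(lambda x: x < 5, nums)))    # [1, 4]

# filterfalse: keep every item where the predicate is false.
print(list(itertools.filterfalse(lambda x: x < 5, nums)))  # [6, 8]

# Built-in filter finds the factors of n; a prime has exactly two factors.
n = 13
factors = list(filter(lambda d: n % d == 0, range(1, n + 1)))
print(factors, len(factors) == 2)  # [1, 13] True
```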
Grouping
groupby
If we sort a list of tuples, it is sorted by the first element of each tuple. If we then pass the sorted tuples to groupby, it returns an iterator of (key, group) pairs. Here key is the first member of the tuple, shared by several tuples, and group is an iterator over all the tuples that use that particular key.
This can be used to remove duplicate character in string
>>> foo = "SSYYNNOOPPSSIISS"
>>> import itertools
>>> ''.join(ch for ch, _ in itertools.groupby(foo))
'SYNOPSIS'
Note:
groupby does not sort. If we call it without calling sorted first, it groups only consecutive equal keys (locally); as such, groupby is of little use without calling sorted. In sorted we can pass a key.
The sorted function by default sorts on the first value of each tuple. This can be changed by passing:
1. key = lambda x: x[1]
2. key = operator.itemgetter(1)
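A small groupby sketch following the sort-then-group advice above (the data is illustrative):

```python
import itertools
import operator

pairs = [("fruit", "apple"), ("veg", "carrot"), ("fruit", "banana"), ("veg", "pea")]

# groupby only groups consecutive equal keys, so sort on the key first.
pairs.sort(key=operator.itemgetter(0))
grouped = {k: [v for _, v in g]
           for k, g in itertools.groupby(pairs, key=operator.itemgetter(0))}
print(grouped)  # {'fruit': ['apple', 'banana'], 'veg': ['carrot', 'pea']}
```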
Miscellaneous
starmap
It is similar to map, but we pass a list of tuples and each tuple is unpacked as the function's arguments. We can use it to convert coordinate systems, Cartesian to polar and vice versa. With the map function we can pass two lists; for starmap, those two lists should be zipped and passed as one list of tuples. With map, if we want to pass a constant instead of one of the iterables, we can use repeat from itertools. map can also be used to create a list of objects, where a constructor is passed as the function.
tee
It generates multiple independent iterators over the same input. It is like a copy function; it can also create multiple identical iterators from a string. Default = 2 copies. When you call tee() to create n independent iterators, each iterator essentially works with its own FIFO queue.
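A hedged sketch of starmap (Cartesian to polar), map with repeat, and tee, as described above:

```python
import itertools
import math

# starmap unpacks each zipped (x, y) tuple as the function's arguments.
xs, ys = [3.0, 0.0], [4.0, 1.0]
polar = list(itertools.starmap(lambda x, y: (math.hypot(x, y), math.atan2(y, x)),
                               zip(xs, ys)))
print(polar[0][0])  # 5.0: the radius of point (3, 4)

# map with repeat passes a constant alongside an iterable.
print(list(map(pow, [2, 3, 4], itertools.repeat(2))))  # [4, 9, 16]

# tee makes independent iterators; each keeps its own position.
a, b = itertools.tee(iter([1, 2, 3]))
print(next(a), next(a), next(b))  # 1 2 1
```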
Possible software using itertools
1. Veda Chanting pattern. GHAN - PAATH etc.
2. Convolution coding.
3. Turbo coding
4. Gray code generator.
5. Fibonacci series.
Reference
Blog
https://medium.com/discovering-data-science-a-chronicle/itertools-to-the-rescue-427abdecc412
https://www.blog.pythonlibrary.org/2016/04/20/python-201-an-intro-to-itertools/
https://pymotw.com/3/itertools/index.html
https://realpython.com/python-itertools/
Github : similar tools
https://github.com/erikrose/more-itertools
https://github.com/topics/itertools
Jupyter Notebook
https://github.com/ericchan24/itertools/blob/master/itertools.ipynb
https://github.com/dineshsonachalam/Python_Practice/blob/master/6.Itertools.ipynb
Official Documentation
https://docs.python.org/3/library/itertools.html
Python Documentation
1.
https://en.wikipedia.org/wiki/Sphinx_(documentation_generator)
http://www.sphinx-doc.org/en/master/
2.
https://en.wikipedia.org/wiki/Epydoc
http://epydoc.sourceforge.net/
3.
https://en.wikipedia.org/wiki/HeaderDoc
4.
pydoc
http://epydoc.sourceforge.net/stdlib/pydoc-module.html
Interesting Python Modules
Orange and Plot.ly are for data visualization
VPython is the Python programming language plus a 3D graphics module called Visual
GUI
Appjar for cross platform GUI
Kivy for Mobile app and desktop app
wxPython for GUI widgets
simplegui to develop GUI application
HTML/XML
Beautiful Soup for parsing
Network
Construct to generate packets. construction and deconstruction of data structure
Twisted is an event-driven network programming framework
lcapy to teach electronics
PySerial: Gives the ability to use serial communication
ipaddress: https://docs.python.org/3/library/ipaddress.html
Multimedia
PIL Python Image Library for image manipulation
scikit-image : Image processing library.
SndObj (Sound Object) for music and audio.
Panda3D is a game engine that includes graphics, audio, I/O, collision detection, and other abilities relevant to the creation of 3D games
Python-Ogre, Soya3D are also for 3D engine
Python plugin can be written for Rhythmbox. https://wiki.gnome.org/Apps/Rhythmbox/Plugins/Python%20Plugin%20Examples
PyGame and Pyglet for gaming
Template
Cheetah is template engine
Genshi is a template engine for XML-based vocabularies written in Python.
Kid is simple template engine.
Topsite Templating System allows simple melding of Python code with static content
Jinja is web template engine
Machine Learning and Data Analysis
pandas for data manipulation and analysis
PyTorch ML Library based on Torch
SageMath Computer algebra system
Scikit-learn ML library
statsmodel: Explore data and statistics
Theano for matrix
SymPy for symbolic computation
The Computational Geometry Algorithms Library (CGAL) for computational geometry algorithms.
Misc
pyflakes check the Python source code files
faulthandler for debugging
Deep Learning
Chainer is an open source deep learning framework written purely in Python on top of Numpy and CuPy
Keras neural-network library
Gensim is for unsupervised topic modeling and natural language processing,
NLTK for NLP
SpaCy for advanced NLP
PyBrain: Helps to build artificial intelligence
HTML
requests and lxml
import requests
from lxml import html
r = requests.get(url, headers={
'User-Agent':
'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 '
'(KHTML, like Gecko) Chrome/52.0.2743.82 Safari/537.36'
})
tree = html.fromstring(r.text)
One can use Google MAP API by passing JSON object
params = {'sensor': 'false', 'address': 'Bangalore'}
url = 'https://maps.googleapis.com/maps/api/geocode/json'
r = requests.get(url, params=params)
Business
Cubes is a light-weight multidimensional modelling and OLAP toolkit for developing reporting applications and browsing aggregated data
QAL for transforming data interfacing MongoDB
RPyC for RPC
SQLAlchemy
SQLObject and Storm: Python Object to relational mapper
Reference
https://en.wikipedia.org/wiki/Category:Python_scientific_libraries
https://en.wikipedia.org/wiki/Category:Python_libraries
Python Cross Compilers
- Jython : Python to Java ByteCode, that runs on JVM
- IronPython : Python to .NET, that runs on CLR
- RPython: used to build the PyPy interpreter; its toolchain has backends for:
- - Python to Java bytecode
- - Python to .NET CLR
- - Python to C
- Pyjs: Python to JavaScript
- Cython, Pyrex: Python to C
- Cython, Pythran, Shed Skin: Python to C++
- Google's Grumpy: Python to Go
- MyHDL: Python to VHDL
- Stackless Python: CPython variant supporting coroutines (microthreads)
- MicroPython: Python for Microcontroller
- Unladen Swallow: Performance oriented CPython
Kafka : Communication among Microservices
Two common approaches:
1. RPC based, and
2. Enterprise Service Bus (ESB), which has its roots in SOA.
Implementation
RPC based is implemented using
1. REST API
It needs:
-load balancer
-service discovery
2. gRPC
Enterprise Service Bus is implemented using
1. Messaging Queue
1.1 RabbitMQ
1.2 ActiveMQ
1.3 ZeroMQ
2. Kafka
1. Synchronous protocol: RESTful API over HTTP
2. Asynchronous protocol: AMQP, following the smart endpoints and dumb pipes pattern.
Kafka is a distributed stream processing platform with high resilience and fault tolerance. Kafka can replace brokers based on the Java Message Service (JMS), the Advanced Message Queuing Protocol (AMQP), etc., since it supports both pub-sub and queuing features.
Streaming platform capabilities
1. Publish and subscribe
2. Store stream of record in fault-tolerant way
3. Process stream of records
Kafka Features
- low latency,
- auto-scaling,
- centralized management,
- proven high availability
- unified platform,
- high-throughput,
for handling real-time data feeds
Apache Kafka Architecture
Kafka is combination of messaging, storage, and stream processing
Messages are written to a log-style stream called a topic. It is like a list of chronological events broken up into multiple streams, known as “topics”.
Two types of topics
1. Regular topic
- time or space bound
- default 7 days.
2. compacted
- never expire
- can be updated
- to delete write : tombstone message with null-value for a specific key.
Topic has records
Record has key, value and timestamp
Kafka topics are divided into partitions. Partitions enable parallelization of topics: a consumer group can use at most as many consumers as there are partitions. Partitions are distributed and replicated across multiple brokers (servers). The "replication-factor" flag determines how many copies of each topic partition are made; this is how fault tolerance is achieved.
Broker has partitions, that can be leader or replica for given topic.
Messages commonly use JSON, Avro, or Protobuf as their serialization format. For effective use of network bandwidth, Kafka supports the GZIP, Snappy, LZ4 and Zstandard compression protocols.
ACL for read and write topics
Kafka APIs types
1. consumer
2. producer
3. connector : for import and export
4. Streams : To develop, stateful, scalable stream processing app. Kafka offers the Streams API that allows writing Java applications that consume data from Kafka and write results back to Kafka. Apache Kafka also works with external stream processing systems such as
- Apache Apex,
- Apache Flink,
- Apache Spark, and
- Apache Storm,
- Samza,
- Spout,
- Spark Streaming,
- IBM Streams,
- Spring Cloud Stream.
5. Admin Client API
Two types of Stream APIs
4.1 DSL API. Stream processing DSL (Domain specific language) offers filter, map, grouping, windowing, aggregation, joins, and the notion of tables
4.2 Processor API.
Controller
Controller is also broker with more responsibilities of partition management, that includes
* Leader selection
* Leader switch
* New topic and partition
* New broker
Kafka Monitoring tools
1. Burrow
2. Datadog
Advantage with micro-services
* Messages are ordered chronologically and delivery is guaranteed
* Strong durability, resilience and performance
Steps
1. Add Kafka producer code to existing service in monolith
2. Develop new service with Kafka consume code and store everything in DB
3. Test. How far the new consumer service is time lagging.
4. Enhance newly added service with some relevant code from existing monolith. It should fetch data from DB. Temporarily disable some external calls.
5. Now filter events in producer at monolith. Send only actionable events to consumer new service, via Kafka.
6. Enhance the new service further. Send events back to monolith via Kafka.
7. Test
8. Remove the code from monolith that was added to new service.
9. Repeat.
Dependency
1. Gradle build tool
2. Java
Python and Kafka
Three alternatives
1. kafka-python : https://github.com/dpkp/kafka-python
from time import sleep
from json import dumps, loads
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(bootstrap_servers=['localhost:9092'],
                         value_serializer=lambda x: dumps(x).encode('utf-8'))
for e in range(1000):
    data = {'number': e}
    producer.send('numtest', value=data)
    sleep(5)

consumer = KafkaConsumer('numtest',
                         bootstrap_servers=['localhost:9092'],
                         auto_offset_reset='earliest',
                         enable_auto_commit=True,
                         group_id='my-group',
                         value_deserializer=lambda x: loads(x.decode('utf-8')))
for message in consumer:
    message = message.value
    collection.insert_one(message)  # 'collection' is a MongoDB collection from the referenced tutorial
    print('{} added to {}'.format(message, collection))
2. pykafka https://github.com/Parsely/pykafka
3. confluent-kafka-python https://github.com/confluentinc/confluent-kafka-python
Reference
Kafka eco system : https://cwiki.apache.org/confluence/display/KAFKA/Ecosystem
Kafka Documentation
http://kafka.apache.org/documentation/
Python and Kafka
https://towardsdatascience.com/kafka-python-explained-in-10-lines-of-code-800e3e07dad1
https://blog.heroku.com/monolithic-applications-into-services
strimzi Kafka on Minikube https://strimzi.io/quickstarts/
PyCharm : My Notes
Distilled Python
The speaker has been programming in Python since 1998. His speaking record includes PyCon US, OSCON, OSCON-EU, PythonBrasil, RuPy and an ACM webinar. All the slide decks are available at:
https://speakerdeck.com/ramalho
Generators and Iteration
Generators allow lazy data processing: data is loaded into main memory as and when needed. In the Haskell programming language, almost everything is lazily evaluated. At the opposite end, the numpy Python module is about fast, eager data processing: all the data is loaded in memory for vector arithmetic. (Intel introduced the MMX (MultiMedia eXtension) instructions for such vector operations in the mid 1990s.)
import sys
for arg in sys.argv:
    print(arg)
Iterating over common types yields:
- str : Unicode characters
- bytes : ints 0 to 255
- tuple : individual fields
- dict : keys
- set : elements
- io.TextIOWrapper : Unicode lines
- models.query.QuerySet : DB rows
- numpy.ndarray (multidimensional array) : elements, rows
This can be achieved with dictionary also
- all : boolean
- any : boolean
- max
- min
- sum
- enumerate : yields pairs of (an incrementing index, the corresponding item from the input)
- filter : Python 2 returns a list; in Python 3 filter is lazy, so one can go over data that does not fit in memory
- map
- reversed
- zip : consumes iterables and generates tuples. If one iterable is shorter, zip stops at the shortest without any exception. In Python 3 zip is lazy; its tuples can be passed to the list() constructor, or to dict().
vals = [expression
        for value in collection
        if condition]
without list comprehension:
vals = []
for value in collection:
    if condition:
        vals.append(expression)
Generator expressions (genexps) are perfect for when you know you want to retrieve data from a sequence, but you don't need to access all of it at the same time.
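A minimal sketch contrasting a list comprehension with a generator expression:

```python
# A generator expression computes items lazily, unlike a list comprehension.
squares_list = [x * x for x in range(10)]   # built eagerly, all in memory
squares_gen = (x * x for x in range(10))    # built lazily, one item at a time

print(next(squares_gen))   # 0
print(next(squares_gen))   # 1
print(sum(squares_gen))    # 284: sum of the remaining squares 4..81

# Built-ins like any/all/max/min/sum consume iterables directly.
print(any(x > 8 for x in range(10)))  # True
```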
obj.__class__.__name__
Type code | C Type | Python Type | Minimum size in bytes |
---|---|---|---|
'c' | char | character | 1 |
'b' | signed char | int | 1 |
'B' | unsigned char | int | 1 |
'u' | Py_UNICODE | Unicode character | 2 (see note) |
'h' | signed short | int | 2 |
'H' | unsigned short | int | 2 |
'i' | signed int | int | 2 |
'I' | unsigned int | long | 2 |
'l' | signed long | int | 4 |
'L' | unsigned long | long | 4 |
'f' | float | float | 4 |
'd' | double | float | 8 |
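A small sketch of the array module's type codes from the table above (itemsize is platform dependent, so the sizes in the table are minimums):

```python
from array import array

# array stores items of one C type compactly; the type code fixes the item type.
ints = array('i', [1, 2, 3, 4])
floats = array('d', [1.5, 2.5])

print(ints.itemsize)      # platform dependent, commonly 4 bytes for 'i'
print(ints[2])            # 3
print(floats.typecode)    # 'd'

# Unlike a list, an array rejects items of the wrong type.
try:
    ints.append(1.5)
except TypeError as e:
    print("wrong type:", e)
```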
- infinite generators
- generators that consume multiple iterables
- generators that filter or bundle items
- generators that rearrange items