Showing posts with label Python. Show all posts

PythonKrit


This article lists my key takeaways from the PythonKrit workshop at Samskrit Bharti, Bangalore, in March 2025.

XML to Mindmap generation : https://sambhasha.ksu.ac.in/CompLing/tarkasangraha/live/

We can have special tags such as:

<PAA-LAXANAM>

<PAA-UDAA>

<PAA-VAKYAM>
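A minimal sketch of the XML-to-mind-map idea, assuming a hypothetical document in which a `TOPIC` element (with a `name` attribute) holds the special tags above. It uses the standard `xml.etree.ElementTree` parser and turns the tree into an indented outline, one node per element, which could then be fed to a mind-map renderer. It is only an illustration, not how the linked Tarkasangraha site is built.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: TOPIC and its name attribute are assumptions;
# only the PAA-* tag names come from the notes above.
SAMPLE = """
<TOPIC name="pratyaksha">
  <PAA-LAXANAM>definition text</PAA-LAXANAM>
  <PAA-UDAA>example text</PAA-UDAA>
  <PAA-VAKYAM>sentence text</PAA-VAKYAM>
</TOPIC>
"""


def to_outline(element, depth=0):
    """Return indented lines: each element is a node, its children are indented below it."""
    label = element.get("name") or element.tag
    text = (element.text or "").strip()
    lines = ["  " * depth + (f"{label}: {text}" if text else label)]
    for child in element:
        lines.extend(to_outline(child, depth + 1))
    return lines


root = ET.fromstring(SAMPLE.strip())
print("\n".join(to_outline(root)))
```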

Other Tools

https://sambhasha.ksu.ac.in/projects/

Aksharamukha

https://www.aksharamukha.com/

https://github.com/chaitanya-lakkundi/aksharamukha

Vaijayantīkośa Knowledge-Net https://sambhasha.ksu.ac.in/CompLing/VK_ACL.pdf

A directory of Indic (Indian) language computing projects and resources https://indic.page/

https://sambhasha.ksu.ac.in/CompLing/chandas/chandas.html 

https://www.gitasupersite.iitk.ac.in/conceptmaps : concept maps, a good resource for a Neo4j graph DB

https://sanskritlibrary.org/downloads.html 

https://sanskritlibrary.org/projects.html 

https://sanskritlibrary.org/tools.html 

Text

Krudanta Rupa: https://github.com/chaitanya-lakkundi/kridanta-rupa-android/blob/master/kridanta_rupa_samgraha.pdf

Aadi Shankaracharya : https://www.sankara.iitk.ac.in/ and https://www.advaita-vedanta.org/texts/index.html

https://www.gitasupersite.iitk.ac.in/

GitHub

https://github.com/chaitanya-lakkundi/

https://github.com/drdhaval2785

Useful Sanskrit Alphabet https://github.com/chaitanya-lakkundi/varnamala/blob/main/varnamala.py

https://github.com/drdhaval2785/SanskritVerb/

https://github.com/drdhaval2785/SanskritSubanta

For Kids

https://bala.sambhasha.ksu.ac.in/

https://www.samskritpromotion.in/samskrit-toys

Scholars

https://sanskrit.uohyd.ac.in/faculty/amba/ and https://www.sanskritstudiespodcast.com/1759898/episodes/12324157-16-amba-kulkarni-sanskrit-and-computers

https://web.stanford.edu/~kiparsky/ and https://en.wikipedia.org/wiki/Paul_Kiparsky

https://en.wikipedia.org/wiki/George_Cardona


Python


                      List   Tuple   Dictionary
Ordered?              Yes    Yes     Yes, keeps insertion order (Python 3.7+)
Mutable?              Yes    No      Yes
Different data types? Yes    Yes     Yes
Can be indexed?       Yes    Yes     Yes, by keys
Syntax                []     ()      {}
Duplicate elements?   Yes    Yes     Values yes; keys must be unique


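  • Both list and tuple support slicing, including a step to skip indexes (e.g. seq[::2]).
  • A tuple is immutable, so it is usually slightly faster and uses less memory than an equivalent list.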
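A small sketch of the rows in the table above: slicing with a step on both sequences, mutability, and unique dictionary keys kept in insertion order. The variable names and values are illustrative only.

```python
numbers_list = [10, 20, 30, 40, 50]
numbers_tuple = (10, 20, 30, 40, 50)

# Both support slicing with an optional step ("skipping index")
every_other = numbers_list[::2]      # [10, 30, 50]
middle = numbers_tuple[1:4]          # (20, 30, 40)

# A list is mutable ...
numbers_list[0] = 99
# ... a tuple is not
try:
    numbers_tuple[0] = 99
    tuple_error = None
except TypeError as err:
    tuple_error = str(err)

# Dict keys are unique: assigning to an existing key replaces its value
ages = {"ravi": 30, "asha": 25}
ages["ravi"] = 31
ages["meera"] = 28
print(list(ages))   # keys in insertion order: ['ravi', 'asha', 'meera']
```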



Python Notebook


https://realpython.com/jupyter-notebook-introduction/

https://openclassrooms.com/en/courses/2304731-learn-python-basics-for-data-analysis/7978803-take-your-first-steps-with-jupyter-notebook



https://www.shiksha.com/online-courses/how-to-use-google-colab-for-python-course-grlel861

https://www.geeksforgeeks.org/how-to-use-google-colab/


Turn Off


Today I came across an interesting website with comics related to IT, computers, software, etc.

Let me share my favorite list:

K8s : http://turnoff.us/geek/the-depressed-developer-44/
Container : http://turnoff.us/geek/kernel-economics/

Python : 
http://turnoff.us/geek/the-depressed-developer-35/
http://turnoff.us/geek/python-private-methods/
http://turnoff.us/geek/math-class-2018/

Manager : http://turnoff.us/geek/the-realist-manager/
Social Media http://turnoff.us/geek/the-depressed-developer-23/
AI : 
http://turnoff.us/geek/python-robots/
http://turnoff.us/geek/chatbot/
http://turnoff.us/geek/sad-robot/
http://turnoff.us/geek/when-ai-meets-git/

Debug: http://turnoff.us/geek/the-last-resort/
USB : http://turnoff.us/geek/tobbys-world/
CI/CD : http://turnoff.us/geek/deployment-pipeline/
GW API : http://turnoff.us/geek/distributed-architecture-drama/

Computer Science concepts
Process v/s thread : http://turnoff.us/geek/dont-share-mutable-state/
Binary tree: http://turnoff.us/geek/binary-tree/
Zombie process: http://turnoff.us/geek/zombie-processes/
Idle CPU : http://turnoff.us/geek/idle/

Python : collections module


Python has general-purpose data types like dict, list, set and tuple. The collections module has the following additional, useful data types:

1. namedtuple

Useful to construct simple, immutable record objects whose fields can be accessed by name as well as by index.
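A minimal sketch of namedtuple; the `Word` record type and its fields are hypothetical, chosen only for illustration.

```python
from collections import namedtuple

# Hypothetical record type, for illustration only
Word = namedtuple("Word", ["text", "script", "length"])

w = Word("rama", "IAST", 4)
print(w.text, w[1])             # access by name or by position: rama IAST
w2 = w._replace(length=5)       # namedtuples are immutable: _replace returns a new one
print(w2._asdict())             # the fields as a dict
text, script, length = w        # still a tuple, so it can be unpacked
```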

2. Counter

It works with any iterable: a string (counts characters), a list, etc. To count the words of a sentence, first split it into a list of words, e.g. sentence.split().

Suppose

c = Counter(items)

then
c.values() gives only the counts, so sum(c.values()) gives the total of all counts.
c.most_common() sorts by frequency and returns a list of (item, count) tuples.
c.most_common()[0][0] gives the item with the maximum occurrence.
c.most_common()[:-2:-1] gives a list with the least common item.
c.most_common()[:-n-1:-1] gives the n least common elements (least common first).
c.subtract(d): here d is another Counter. The count of each element in c is reduced, in place, by its count in d; counts may become zero or negative.
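A small worked example of the Counter operations above, on a made-up sentence. Note that items with equal counts keep the order in which they were first seen, which decides the result of the least-common slices.

```python
from collections import Counter

sentence = "the cat and the hat and the bat"
c = Counter(sentence.split())        # count words, not characters

total = sum(c.values())              # 8 words in all
top_item = c.most_common()[0][0]     # 'the' (3 times)
least_one = c.most_common()[:-2:-1]  # [('bat', 1)]
least_two = c.most_common()[:-3:-1]  # n = 2: [('bat', 1), ('hat', 1)]

d = Counter(["the", "the", "cat"])
c.subtract(d)                        # in place: 'the' -> 1, 'cat' -> 0
```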

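3. defaultdict(default_factory)
A dict subclass that calls default_factory (e.g. list, int, dict) to create a default value for a missing key, instead of raising KeyError. For example, defaultdict(dict) gives a default empty dictionary.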
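A short sketch of two common default factories, using an illustrative word list: `list` to group items and `int` to count them. Reading a missing key also inserts it with the default value.

```python
from collections import defaultdict

words = ["apple", "avocado", "banana", "blueberry", "cherry"]

# default_factory=list: a missing key starts as []
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)

# default_factory=int: a missing key starts as 0
letter_count = defaultdict(int)
for word in words:
    letter_count[word[0]] += 1

print(dict(by_letter))
print(dict(letter_count))   # {'a': 2, 'b': 2, 'c': 1}
```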

4. OrderedDict

OrderedDict(sorted(d.items(), key=lambda t: t[0])) gives a dict sorted by key
OrderedDict(sorted(d.items(), key=lambda t: t[1])) gives a dict sorted by value
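The two recipes above, run on a sample dict. Since Python 3.7 a plain dict also keeps insertion order; OrderedDict still adds extras such as move_to_end() and order-sensitive equality, which the sketch shows.

```python
from collections import OrderedDict

d = {"banana": 3, "apple": 4, "pear": 1}

by_key = OrderedDict(sorted(d.items(), key=lambda t: t[0]))
by_value = OrderedDict(sorted(d.items(), key=lambda t: t[1]))

print(list(by_key))     # ['apple', 'banana', 'pear']
print(list(by_value))   # ['pear', 'banana', 'apple']

by_value.move_to_end("pear")   # OrderedDict extra: move an existing key to the end
print(list(by_value))   # ['banana', 'apple', 'pear']
```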

5. deque

To add: append(), appendleft()
To remove: pop(), popleft()
To count occurrences of x: count(x)
To insert x at a specific index i: insert(i, x)
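The deque operations listed above in one run, plus the maxlen option, which keeps only the most recent items. Values are illustrative.

```python
from collections import deque

q = deque([2, 3])
q.append(4)          # right end  -> deque([2, 3, 4])
q.appendleft(1)      # left end   -> deque([1, 2, 3, 4])
q.insert(2, 99)      # at index 2 -> deque([1, 2, 99, 3, 4])
right = q.pop()      # 4
left = q.popleft()   # 1
twos = q.count(2)    # 1

recent = deque(maxlen=3)   # older items fall off the other end
for n in range(5):
    recent.append(n)
print(recent)              # deque([2, 3, 4], maxlen=3)
```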


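6. ChainMap
It groups several dicts into a single view without copying them. Lookups search the dicts in order and return the first match; the underlying dicts are kept in a list, available as the maps attribute.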
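A minimal sketch of ChainMap as a layered configuration; the `defaults` and `overrides` dicts are made up for the example.

```python
from collections import ChainMap

defaults = {"color": "red", "user": "guest"}
overrides = {"user": "admin"}

settings = ChainMap(overrides, defaults)   # lookups search overrides first, then defaults
print(settings["user"])    # admin  (found in overrides)
print(settings["color"])   # red    (falls through to defaults)

settings["color"] = "blue"  # writes always go to the first mapping
print(overrides)            # {'user': 'admin', 'color': 'blue'}
print(defaults)             # unchanged: {'color': 'red', 'user': 'guest'}
```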

Python Virtual Environment


Introduction

The venv module provides support for creating lightweight “virtual environments”. By default, a virtual environment is isolated from the system site directories (the --system-site-packages option gives it access to them).

A virtual environment is a Python environment such that 
- the Python interpreter, 
- libraries and 
- scripts 
installed into it are isolated from (1) those installed in other virtual environments, and (2) (by default) any libraries installed in a “system” Python, i.e., one which is installed as part of your operating system.

A virtual environment is a directory tree which contains Python executable files and other files which indicate that it is a virtual environment. The path to this directory is available in the variables
sys.prefix
sys.exec_prefix
These variables are used to locate the site-packages directory. (sys.base_prefix still points to the base Python installation.)

Each virtual environment has its own
- Python binary (which matches the version of the binary that was used to create this environment) 
- independent set of installed Python packages in its site directories.
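A small sketch of how a script can tell whether it is running inside a virtual environment, based on the prefix variables above. The helper name `in_virtualenv` is hypothetical.

```python
import sys


def in_virtualenv():
    # Inside a venv, sys.prefix points at the venv directory, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("prefix:     ", sys.prefix)
print("base_prefix:", sys.base_prefix)
print("in venv:    ", in_virtualenv())
```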

Packages (Modules) 

Check installed packages using
pip list

Check whether a specific package is installed using
pip show "package name"

Two types of packages
1. System packages (installed as part of Python installation)
2. Site packages (3rd party libraries)

Reinstall using
pip install --upgrade --no-deps --force-reinstall "package name"

Pip can export a list of all installed packages and their versions using the freeze command:
pip freeze
Redirect its output to create a requirements.txt file (pip freeze > requirements.txt), which is used later as:
pip install -r requirements.txt
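The same information can also be read from Python code with the standard importlib.metadata module (Python 3.8+). A rough sketch: the helper `installed_version` is hypothetical, and the generated list only approximates pip freeze output (for example, editable installs are shown differently by pip).

```python
from importlib import metadata   # Python 3.8+


def installed_version(package_name):
    """Return the installed version string, or None if the package is not installed."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Roughly like `pip freeze`: one name==version line per installed distribution
freeze_lines = sorted(
    f"{dist.metadata['Name']}=={dist.version}" for dist in metadata.distributions()
)
print(installed_version("pip"))
print(len(freeze_lines), "distributions found")
```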

Commands

To create venv
python3 -m venv /path/to/new/virtual/environment
c:\>c:\Python35\python -m venv c:\path\to\myenv

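To activate venv
source <venv>/bin/activate
C:\> <venv>\Scripts\activate.bat
It modifies the PATH variable by adding the venv's bin (Scripts on Windows) directory at the beginning.

To deactivate venv
deactivate
C:\> <venv>\Scripts\deactivate.bat
It resets the PATH variable back to its original value.

To change Python version (pyenv is a separate tool, not part of venv)
pyenv local 2.x
pyenv local 3.x
pyenv global 2.x
pyenv global 3.x
pyenv local sets the version for the current directory (in a .python-version file); pyenv global sets the default version. Replace 2.x/3.x with an installed version, as listed by pyenv versions.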

Python Lambda


Introduction


  • Lambda is a special type of Python function that has limited capabilities.
  • It is typically used for a small function that is needed only once in your program. Such functions have no name, so they are called anonymous functions.

Syntax
lambda arguments: expression


  • Lambda functions can accept zero or more arguments but only one expression
  • The return statement is implicit: the value of the expression is returned.
  • A lambda function can be defined and invoked in the same expression:
(lambda x, y: x + y)(5, 3)

What kind of things can I, and can I not, put into a lambda? 

Don't or Limitations

  • If it doesn’t return a value, it isn’t an expression and can’t be put into a lambda.
  • You can only use expressions, not statements.
  • A lambda function with multiple lines/statements is not possible.
  • You cannot use assignment statements, so no ordinary local variables.
Do
  • If you can imagine it in an assignment statement, on the right-hand side of the equals sign, it is an expression and can be put into a lambda.
  • A lambda can be a member of a list or a dictionary.
  • You can only use one expression.

Use cases

1. map() and functools.reduce() (reduce is no longer a built-in in Python 3)
2. filter() built-in function
3. list.sort() and sorted() (the key argument)
4. Print formatting
5. Use in a while statement
6. If a for loop only invokes a function on each item, it can be replaced with map().
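A few of the use cases above in one sketch, on illustrative data: map, filter, functools.reduce, a sort key, a small formatting helper, and lambdas stored in a dict as a dispatch table.

```python
from functools import reduce

nums = [5, 3, 8, 1]

squares = list(map(lambda x: x * x, nums))          # [25, 9, 64, 1]
evens = list(filter(lambda x: x % 2 == 0, nums))    # [8]
product = reduce(lambda acc, x: acc * x, nums, 1)   # 120
by_last_letter = sorted(["kiwi", "fig", "banana"], key=lambda s: s[-1])

# A small formatting helper: right-align a label in 8 characters
fmt = lambda name, value: f"{name:>8}: {value}"
print(fmt("total", sum(nums)))   # '   total: 17'

# Lambdas stored in a dictionary, used as a small dispatch table
ops = {"add": lambda a, b: a + b, "sub": lambda a, b: a - b}
print(ops["sub"](7, 2))   # 5
```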


Multiple expressions

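Use tuples, and implement your own evaluation order: the elements of a tuple are evaluated from left to right, so lambda: (f1(), f2(), f3()) calls the functions in that order.
If you are a coward and fear that evaluation order will change in a future release of Python,

you can use eval
list(map(eval, ("f1()", "f2()", ..., "fn()")))
(map() is lazy in Python 3, hence the list() call.)

In Python 2 you could also use apply
map(lambda x: apply(x, ()), (f1, f2, f3))
apply() was removed in Python 3; call each function directly instead: list(map(lambda f: f(), (f1, f2, f3)))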
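A minimal sketch of the tuple trick: both expressions are evaluated left to right, and indexing with [-1] keeps only the last value as the result. The `log` list and `double_and_log` name are hypothetical; for anything longer, a normal def function is clearer.

```python
log = []

# (side effect, result)[-1]: both parts run in order, the last one is returned
double_and_log = lambda x: (log.append(x), x * 2)[-1]

result = double_and_log(21)
print(result, log)   # 42 [21]
```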

General Notes
  • One can pass functions as arguments to other functions
  • A function can be the return value of another function.
  • The expression returns (or evaluates to) a value, whereas a statement does not. 
Reference 

https://pythonconquerstheuniverse.wordpress.com/2011/08/29/lambda_tutorial/
http://p-nand-q.com/python/lambda.html
https://www.bogotobogo.com/python/python_functions_lambda.php

Python itertools module


Introduction

itertools is a collection of “fast, memory efficient tools”. If you have not used the itertools module, you have most likely written code that it would have made unnecessary. It is named itertools because its functions are built from iterator building blocks. The official documentation shows a roughly equivalent pure-Python generator (using yield) for most of its functions, although CPython implements them in C. Remember? Every generator is an iterator, but not every iterator is a generator.


Split

islice
islice(iterable, stop) or islice(iterable, start, stop[, step]) slices any iterable like [start:stop:step]; the step is optional. If we pass only one number, it is the stop value and start defaults to 0. To specify start, you must also specify stop; use None as stop to mean 'till the end'.
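The islice argument forms described above, on a sample string and on an infinite count() iterator, which cannot be sliced with [ ].

```python
from itertools import count, islice

letters = "abcdefghij"

first_three = list(islice(letters, 3))        # stop only: ['a', 'b', 'c']
middle = list(islice(letters, 2, 6))          # start, stop: ['c', 'd', 'e', 'f']
stepped = list(islice(letters, 1, None, 3))   # start, till end, step: ['b', 'e', 'h']

# Unlike [start:stop], islice also works on iterators with no length, even infinite ones
first_five = list(islice(count(10), 5))       # [10, 11, 12, 13, 14]
```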

Combinatoric

accumulate
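Here the default function is operator.add, giving running sums for numbers and running concatenation for strings. One can pass a custom function too, e.g. operator.mul.
Example: each step keeps the last two characters of the previous result and appends the new item, so every output is a sliding window of (at most) three characters and each character appears in up to three outputs.

import itertools

def f(a, b):
    print(a, b)          # show what accumulate passes in
    return a[-2:] + b    # keep the last two characters and append the new one

def fn_accumulate():
    print(list(itertools.accumulate('abcdefg', f)))
    # ['a', 'ab', 'abc', 'bcd', 'cde', 'def', 'efg']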

This can be written in better way, using second order recurrence relation, as mentioned in https://realpython.com/python-itertools/
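A few more accumulate calls, on illustrative inputs, showing the default addition and some custom two-argument functions.

```python
import operator
from itertools import accumulate

running_sum = list(accumulate([1, 2, 3, 4]))                    # default operator.add: [1, 3, 6, 10]
running_product = list(accumulate([1, 2, 3, 4], operator.mul))  # [1, 2, 6, 24]
running_text = list(accumulate("abc"))                           # string concatenation: ['a', 'ab', 'abc']
running_max = list(accumulate([3, 1, 4, 1, 5], max))             # largest so far: [3, 3, 4, 4, 5]
```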

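permutations = nPr (ordered arrangements)
combinations = nCr (order does not matter)
combinations_with_replacement = nCr plus pairing each element with itself (elements may repeat)

r is optional only for permutations (it defaults to the length of the input); combinations and combinations_with_replacement require r.
product generates the Cartesian product. It can be used to generate binary numbers: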
map(list, itertools.product("01", repeat = 5))
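The combinatoric functions above on a three-letter input, so the nPr and nCr counts can be checked by hand (3P2 = 6, 3C2 = 3), plus 3-bit binary numbers built with product.

```python
from itertools import combinations, combinations_with_replacement, permutations, product

items = "abc"

perms = list(permutations(items, 2))                         # 3P2 = 6 ordered pairs
combos = list(combinations(items, 2))                        # 3C2 = 3 unordered pairs
combos_rep = list(combinations_with_replacement(items, 2))   # also ('a', 'a'), ...: 6 pairs
all_perms = list(permutations(items))                        # r omitted: full length, 3! = 6

binary_3bit = ["".join(bits) for bits in product("01", repeat=3)]
print(binary_3bit)   # ['000', '001', '010', '011', '100', '101', '110', '111']
```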

Merge

chain
Iterates over several iterables one after another, so two lists can be merged without building a new list first. It can be used to add items as a prefix or suffix to a sequence.

chain.from_iterable
Takes a single iterable of iterables (e.g. a list of lists) instead of separate arguments, and evaluates it lazily.

zip_longest
Similar to zip, but continues until the longest input is exhausted, adding None for unpaired items. Pass fillvalue to use something other than None.

We can use zip with count to add an index number to every item:

from itertools import count

for i in zip(count(1), ['a', 'b', 'c']):
    print(i)
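The merge functions above on small illustrative lists. The last line shows that the built-in enumerate(..., start=1) gives the same numbering as zip with count(1).

```python
from itertools import chain, count, zip_longest

a, b = [1, 2], [3, 4]
merged = list(chain(a, b))                          # [1, 2, 3, 4]
with_prefix = list(chain(["header"], a))            # ['header', 1, 2]
flat = list(chain.from_iterable([[1, 2], [3], []])) # [1, 2, 3]

pairs = list(zip_longest("ab", [1, 2, 3]))                  # [('a', 1), ('b', 2), (None, 3)]
filled = list(zip_longest("ab", [1, 2, 3], fillvalue="-")) # [('a', 1), ('b', 2), ('-', 3)]

numbered = list(zip(count(1), ["a", "b", "c"]))     # [(1, 'a'), (2, 'b'), (3, 'c')]
same = list(enumerate(["a", "b", "c"], start=1))    # the built-in way to number items
```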

Infinite iterators

count is an infinite iterator.
To stop it, (1) use an if condition with break, OR (2) use islice.
Optionally a step can be passed to count. This step can be a fractions.Fraction too.

cycle is also an infinite iterator. It cycles through a series of values infinitely.
To stop it, use an if condition with break (or islice).

repeat takes a single object as input, instead of an iterable, and yields it again and again.
To stop it, pass the optional argument times.
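One way to stop each of the infinite iterators above, as described: islice for count (here with a Fraction step), an if condition with break for cycle, and the times argument for repeat.

```python
from fractions import Fraction
from itertools import count, cycle, islice, repeat

# count with a Fraction step, stopped with islice
halves = list(islice(count(0, Fraction(1, 2)), 4))   # 0, 1/2, 1, 3/2

# cycle, stopped with an if condition and break
colors = []
for color in cycle(["red", "green"]):
    if len(colors) == 5:
        break
    colors.append(color)          # red, green, red, green, red

# repeat: the second argument (times) makes it finite
zeros = list(repeat(0, 3))        # [0, 0, 0]
```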

Filter

Most functions in the filter category take a predicate function as an argument (compress takes a list of selectors instead).

compress
Here the second argument is a list of Booleans (selectors). Each item of the first argument is kept or dropped accordingly. It is like masking with the Boolean & (AND) operator. The selector list can be the output of cycle, to create a repeating pattern. Note: the values 0, False, None and '' are considered False.
For example

import itertools

def fn_compress():
    mask = [1, 0, 0]
    # keep every third number: 0, 3, 6, ..., 18
    res = itertools.compress(range(20), itertools.cycle(mask))
    for i in res:
        print(i)


dropwhile
Instead of a list of Booleans, we pass a predicate as the first argument. While the predicate returns true, items are dropped. As soon as it returns false, that item and all remaining items are returned, and the predicate is no longer called.

takewhile
It is the opposite of dropwhile. While the predicate returns true, items are taken. As soon as it returns false, iteration stops: that item and everything after it are not returned.

filterfalse
Like dropwhile, it drops items for which the predicate is true, but the predicate is applied to every item (it is never switched off).
It takes a predicate function and an iterable as input, and returns all values for which the predicate returns false. This is the opposite of the built-in function filter.

We can use the built-in function filter to list all the factors of a given integer, and also to find out whether it is prime.
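The four filters above on the same sample list, which makes the difference between "stop switching the predicate off" (dropwhile, takewhile) and "test every item" (filterfalse, filter) visible. The `factors` and `is_prime` helpers are a simple, unoptimized sketch of the last idea.

```python
from itertools import dropwhile, filterfalse, takewhile

data = [1, 3, 5, 6, 7, 9]


def is_odd(n):
    return n % 2 == 1


dropped = list(dropwhile(is_odd, data))    # [6, 7, 9]: drop while true, then keep everything
taken = list(takewhile(is_odd, data))      # [1, 3, 5]: take while true, then stop
falses = list(filterfalse(is_odd, data))   # [6]: every item where the predicate is false
trues = list(filter(is_odd, data))         # [1, 3, 5, 7, 9]: the built-in opposite


def factors(n):
    """All divisors of n, found with the built-in filter."""
    return list(filter(lambda d: n % d == 0, range(1, n + 1)))


def is_prime(n):
    return n > 1 and factors(n) == [1, n]


print(factors(12))                            # [1, 2, 3, 4, 6, 12]
print([n for n in range(20) if is_prime(n)])  # [2, 3, 5, 7, 11, 13, 17, 19]
```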

Grouping

groupby
If we sort a list of tuples, it is ordered by the first element (ties are broken by the following elements). If we then call groupby on this sorted list with key=operator.itemgetter(0), it returns an iterator of (key, group) pairs. Here key is the first member of the tuple, which repeats across several tuples, and group is an iterator over all the tuples where that particular key is used.

This can be used to remove duplicate character in string 
>>> foo = "SSYYNNOOPPSSIISS"
>>> import itertools
>>> ''.join(ch for ch, _ in itertools.groupby(foo))
'SYNOPSIS'

Note:
groupby does not sort. It only groups consecutive items with the same key, so without sorting it groups only locally (which is exactly what the SYNOPSIS example above relies on). To group all equal keys together, first call sorted with the same key function.

sorted compares tuples element by element, starting with the first value. To sort on another element, pass
1. key = lambda x: x[1]
2. key = operator.itemgetter(1)
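A sketch of the sort-then-group pattern, on made-up (subject, score) tuples, contrasted with calling groupby directly on unsorted data.

```python
from itertools import groupby
from operator import itemgetter

marks = [("math", 90), ("art", 75), ("math", 60), ("art", 85), ("music", 70)]

# Sort by the same key first, so that equal keys become adjacent
ordered = sorted(marks, key=itemgetter(0))
grouped = {subject: [score for _, score in group]
           for subject, group in groupby(ordered, key=itemgetter(0))}
print(grouped)   # {'art': [75, 85], 'math': [90, 60], 'music': [70]}

# Without sorting, groupby only merges consecutive equal keys
unsorted_keys = [k for k, _ in groupby(marks, key=itemgetter(0))]
print(unsorted_keys)   # ['math', 'art', 'math', 'art', 'music']
```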

Miscellaneous 

starmap 
It is similar to map, but each item of the iterable is a tuple that is unpacked into the function's arguments, i.e. f(*item). We can use it to convert coordinate systems, Cartesian to polar and vice versa. With map, we pass two (or more) separate iterables; for starmap, zip them first and pass a single iterable of tuples. With map, if we want to pass a constant instead of one of the iterables, we can use repeat from itertools. map (or starmap) can also be used to create a list of objects, by passing a class (constructor) as the function.
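The starmap points above in one sketch: a Cartesian-to-polar conversion (the `to_polar` helper is hypothetical), map versus starmap on the same data, repeat as a constant argument, and a constructor (`complex`) used as the function.

```python
import math
from itertools import repeat, starmap

points = [(3, 4), (0, 2)]


# Cartesian -> polar: each (x, y) tuple is unpacked into to_polar(x, y)
def to_polar(x, y):
    return math.hypot(x, y), math.atan2(y, x)


polar = list(starmap(to_polar, points))   # radius 5.0 and 2.0, angles in radians

# map takes separate iterables; starmap takes one iterable of (zipped) tuples
powers_map = list(map(pow, [2, 3], [3, 2]))                 # [8, 9]
powers_starmap = list(starmap(pow, zip([2, 3], [3, 2])))    # [8, 9]

# A constant argument for map via repeat
cubes = list(map(pow, [1, 2, 3], repeat(3)))      # [1, 8, 27]

# A constructor as the function: complex numbers from (re, im) pairs
complex_points = list(starmap(complex, points))   # [(3+4j), 2j]
```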

tee
tee(iterable, n=2) returns n independent iterators (default: 2 copies) from a single iterable. It is like a copy function; for example, it can create multiple identical iterators over a string. When you call tee() to create n independent iterators, each iterator is essentially working with its own FIFO queue. After calling tee(), the original iterator should not be used any more.
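A small sketch of tee: two copies of a one-shot iterator, one advanced by a single step, give consecutive pairs. The helper `pairwise_sketch` is hypothetical; Python 3.10+ ships the same idea as itertools.pairwise.

```python
from itertools import tee


def pairwise_sketch(iterable):
    """Consecutive pairs (s0, s1), (s1, s2), ... built from two tee'd iterators."""
    first, second = tee(iterable)   # n defaults to 2
    next(second, None)              # advance the second copy by one item
    return zip(first, second)


numbers = iter([1, 2, 3, 4])        # a one-shot iterator
print(list(pairwise_sketch(numbers)))   # [(1, 2), (2, 3), (3, 4)]

# Each tee'd iterator keeps its own position
a, b = tee("xyz", 2)
print(next(a), next(a), next(b))        # x y x
```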

Possible software using itertools

1. Veda Chanting pattern. GHAN - PAATH etc.
2. Convolution coding.
3. Turbo coding
4. Gray code generator. 
5. Fibonacci series. 
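Sketches for two items from the list above, under my own assumptions about what such tools would do. The Gray code uses the standard reflected-binary formula i ^ (i >> 1) and no itertools at all; the Fibonacci series uses accumulate as a second-order recurrence over (F(k), F(k+1)) pairs, in the spirit of the Real Python article referenced earlier (the initial= argument needs Python 3.8+). Both helper names are hypothetical.

```python
from itertools import accumulate, islice, repeat


def gray_codes(bits):
    """Reflected binary Gray code: consecutive codes differ in exactly one bit."""
    return [format(i ^ (i >> 1), f"0{bits}b") for i in range(2 ** bits)]


def fibonacci(n):
    """First n Fibonacci numbers, as a second-order recurrence driven by accumulate."""
    # Each state is the pair (F(k), F(k+1)); repeat(None) only drives the iteration.
    pairs = accumulate(repeat(None), lambda ab, _: (ab[1], ab[0] + ab[1]), initial=(0, 1))
    return [a for a, _ in islice(pairs, n)]


print(gray_codes(3))   # ['000', '001', '011', '010', '110', '111', '101', '100']
print(fibonacci(10))   # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```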


Reference

Blog

https://medium.com/discovering-data-science-a-chronicle/itertools-to-the-rescue-427abdecc412
https://www.blog.pythonlibrary.org/2016/04/20/python-201-an-intro-to-itertools/
https://pymotw.com/3/itertools/index.html
https://realpython.com/python-itertools/

Github : similar tools

https://github.com/erikrose/more-itertools
https://github.com/topics/itertools

Jupyter Notebook

https://github.com/ericchan24/itertools/blob/master/itertools.ipynb
https://github.com/dineshsonachalam/Python_Practice/blob/master/6.Itertools.ipynb

Official Documentation

https://docs.python.org/3/library/itertools.html

Python Documentation


Here is a list of documentation tools for Python, similar to Doxygen:

1.
https://en.wikipedia.org/wiki/Sphinx_(documentation_generator)
http://www.sphinx-doc.org/en/master/

2.
https://en.wikipedia.org/wiki/Epydoc
http://epydoc.sourceforge.net/

3. 
https://en.wikipedia.org/wiki/HeaderDoc

4.
pydoc (part of the Python standard library)
http://epydoc.sourceforge.net/stdlib/pydoc-module.html

Interesting Python Modules


Visualization
Orange and Plot.ly are for data visualization

VPython is the Python programming language plus a 3D graphics module called Visual


GUI
Appjar for cross platform GUI

Kivy for Mobile app and desktop app

wxPython for GUI widgets

simplegui to develop GUI application

HTML/XML
Beautiful Soup for parsing

Network

Construct to generate packets: declarative construction and deconstruction (parsing) of binary data structures

Twisted is an event-driven network programming framework


lcapy to teach electronics

PySerial: Gives the ability to use serial communication

ipaddress: https://docs.python.org/3/library/ipaddress.html (standard library module to create and manipulate IPv4 and IPv6 addresses and networks)
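A short sketch of the standard ipaddress module; the network and addresses are private-range examples chosen for illustration.

```python
import ipaddress
from itertools import islice

net = ipaddress.ip_network("192.168.1.0/24")
host = ipaddress.ip_address("192.168.1.10")

print(net.num_addresses)                          # 256
print(net.netmask)                                # 255.255.255.0
print(host in net)                                # True
print([str(h) for h in islice(net.hosts(), 3)])   # ['192.168.1.1', '192.168.1.2', '192.168.1.3']
print(ipaddress.ip_address("10.0.0.1").is_private)   # True
```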

Multimedia
PIL (Python Imaging Library) for image manipulation; Pillow is its maintained fork

scikit-image : Image processing library. 

SndObj (Sound Object) for music and audio.  

Panda3D is a game engine that includes graphics, audio, I/O, collision detection, and other abilities relevant to the creation of 3D games


Python-Ogre, Soya3D are also for 3D engine

Python plugin can be written for Rhythmbox. https://wiki.gnome.org/Apps/Rhythmbox/Plugins/Python%20Plugin%20Examples


PyGame and Pyglet for gaming

Template

Cheetah is template engine

Genshi is a template engine for XML-based vocabularies written in Python.


Kid is simple template engine. 


Topsite Templating System allows simple melding of Python code with static content


Jinja is web template engine

Machine Learning and Data Analysis
pandas for data manipulation and analysis

PyTorch ML Library based on Torch

SageMath Computer algebra system

Scikit-learn ML library

statsmodels: explore data and statistics

Theano for matrix (multi-dimensional array) computation

SymPy for symbolic computation

The Computational Geometry Algorithms Library (CGAL) for computational geometry algorithms. 

Misc
pyflakes checks Python source files for errors
faulthandler for debugging

Deep Learning
Chainer is an open source deep learning framework written purely in Python on top of Numpy and CuPy

Keras neural-network library

Gensim is for unsupervised topic modeling and natural language processing

NLTK for NLP

SpaCy for advanced NLP


PyBrain: Helps to build artificial intelligence

HTML

requests and lxml

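import requests
from lxml import html

url = 'https://example.com'  # placeholder: the page to fetch

# Send a browser-like User-Agent header
r = requests.get(url, headers={
    'User-Agent':
        'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 '
        '(KHTML, like Gecko) Chrome/52.0.2743.82 Safari/537.36'
})

# Parse the HTML into an element tree (supports XPath queries)
tree = html.fromstring(r.text)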

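One can use the Google Maps Geocoding API by passing the query parameters as a dict; requests encodes them into the URL, and the response is JSON (r.json()). Note that the API now requires an API key, and the sensor parameter is no longer needed.

params = {'address': 'Bangalore', 'key': 'YOUR_API_KEY'}  # placeholder key
url = 'https://maps.googleapis.com/maps/api/geocode/json'

r = requests.get(url, params=params)
data = r.json()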

Business
Cubes is a light-weight multidimensional modelling and OLAP toolkit for developing reporting applications and browsing aggregated data

QAL for transforming data interfacing MongoDB

RPyC for RPC

SQLAlchemy

SQLObject and Storm: Python Object to relational mapper

Reference
https://en.wikipedia.org/wiki/Category:Python_scientific_libraries
https://en.wikipedia.org/wiki/Category:Python_libraries 

Python Cross Compilers


  • Jython: Python to Java bytecode, which runs on the JVM
  • IronPython: Python to .NET, which runs on the CLR
  • RPython (restricted Python): used to build the PyPy interpreter. Its translation toolchain has targeted:
  •   - Java bytecode
  •   - .NET CLR
  •   - C
  • Pyjs: Python to JavaScript
  • Cython, Pyrex: Python to C
  • Cython, Pythran, Shed Skin: Python to C++
  • Google's Grumpy: Python to Go
  • MyHDL: Python to VHDL/Verilog
  • Stackless Python: CPython variant with microthreads (tasklets), similar to coroutines
  • MicroPython: Python for microcontrollers
  • Unladen Swallow: performance-oriented CPython

Kafka : Communication among Micro Services


Microservice communication has two approaches:

1. RPC based, and
2. Enterprise Service Bus, which has its roots in SOA.

Implementation

RPC based is implemented using 

1. REST API
It needs: 
-load balancer
-service discovery
2. gRPC

Enterprise Service Bus is implemented using

1. Messaging Queue
1.1 RabbitMQ
1.2 ActiveMQ
1.3 ZeroMQ
2. Kafka

1. Synchronous protocol: RESTful API over HTTP
2. Asynchronous protocol: AMQP, following the smart endpoints and dumb pipes pattern.

Kafka is a distributed stream processing platform with high resilience and fault tolerance. Kafka can replace traditional messaging systems built on the Java Message Service (JMS) API or the Advanced Message Queuing Protocol (AMQP). Kafka supports both publish-subscribe and queuing (through consumer groups).

Streaming platform capabilities

1. Publish and subscribe
2. Store stream of record in fault-tolerant way
3. Process stream of records

Kafka Features

-  low latency, 
-  auto-scaling, 
-  centralized management, 
-  proven high availability 
-  unified platform, 
-  high-throughput, 
for handling real-time data feeds

Apache Kafka Architecture

Kafka is a combination of messaging, storage, and stream processing.

Messages are written to append-only, log-style streams called “topics”. A topic is like a chronological list of events.

Two types of topics
1. Regular topic
- retention is bounded by time or size
- default retention is 7 days
2. Compacted topic
- records do not expire by time; Kafka keeps at least the latest value for each key
- a key's value can be updated by writing a new record with the same key
- to delete a key, write a tombstone message: a record with a null value for that key
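A simplified model of log compaction, to make the rules above concrete. It is only an illustration, not Kafka's implementation: real Kafka compacts segments in the background and keeps tombstones for a while (delete.retention.ms) so consumers can see them, whereas this sketch drops them at once. The `compact` helper and the sample records are hypothetical.

```python
def compact(records):
    """Keep only the latest record per key, in offset order; drop keys whose latest value is a tombstone (None)."""
    # Later offsets overwrite earlier ones, so this maps each key to its last offset
    last_offset = {key: offset for offset, (key, _) in enumerate(records)}
    return [
        (key, value)
        for offset, (key, value) in enumerate(records)
        if last_offset[key] == offset and value is not None
    ]


log = [
    ("user1", "Pune"),
    ("user2", "Delhi"),
    ("user1", "Bangalore"),   # update: replaces the older user1 record
    ("user3", "Mysore"),
    ("user2", None),          # tombstone: deletes user2
]
print(compact(log))   # [('user1', 'Bangalore'), ('user3', 'Mysore')]
```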

Topic has records
Record has key, value and timestamp

Kafka topics are divided into partitions. Partitions enable parallel processing of a topic: within a consumer group, each partition is read by only one consumer, so the number of partitions limits how many consumers of a group can work in parallel. Partitions are distributed and replicated across multiple brokers (servers). The replication-factor setting determines how many copies of each partition are kept. This is how fault tolerance is achieved.
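A toy model of the two ideas above, under simplifying assumptions: records with the same key go to the same partition (hash of key modulo partition count), and each partition is owned by exactly one consumer of a group. Kafka's default partitioner hashes keys with murmur2; crc32 is used here only because it is deterministic and in the standard library. The round-robin assignment is a simplification, since Kafka ships several assignment strategies. Both helper names are hypothetical.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    # Stand-in hash (not Kafka's murmur2): same key -> same partition
    return zlib.crc32(key) % num_partitions


def assign_partitions(num_partitions, consumers):
    """Simplified round-robin: each partition goes to exactly one consumer of the group."""
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        assignment[consumers[partition % len(consumers)]].append(partition)
    return assignment


keys = [b"order-1", b"order-2", b"order-1"]
print([choose_partition(k, 3) for k in keys])    # first and last are always equal

print(assign_partitions(3, ["c1", "c2", "c3", "c4"]))
# {'c1': [0], 'c2': [1], 'c3': [2], 'c4': []}: the 4th consumer stays idle
```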

Each broker hosts partitions; for each partition, one broker is the leader and the other brokers holding it keep follower replicas.

Producers and consumers commonly use JSON, Avro, or Protobuf as the serialization format. For effective use of network bandwidth, Kafka supports the GZIP, Snappy, LZ4 and Zstandard compression codecs.

ACLs control who can read from and write to topics.

Kafka API types

1. Consumer
2. Producer
3. Connect: for importing and exporting data via connectors
4. Streams: to develop stateful, scalable stream processing apps. Kafka offers the Streams API, which allows writing Java applications that consume data from Kafka and write results back to Kafka. Apache Kafka also works with external stream processing systems such as
- Apache Apex,
- Apache Flink,
- Apache Spark,
- Apache Storm,
- Samza,
- Spout,
- Spark Streaming,
- IBM Streams, and
- Spring Cloud Stream.
5. Admin Client API

Two types of Streams APIs
4.1 Streams DSL. The stream processing DSL (domain-specific language) offers filter, map, grouping, windowing, aggregation, joins, and the notion of tables.
4.2 Processor API: a lower-level API for defining custom processing logic.

Controller

The controller is also a broker, with additional responsibilities for partition management, including

* Leader election
* Leader switch
* New topic and partition
* New broker

Kafka Monitoring tools

1. Burrow
2. Datadog

Advantages with microservices

* Messages are ordered chronologically within a partition, and (with suitable producer and consumer settings) delivery is guaranteed
* Strong durability, resilience and performance

Steps

1. Add Kafka producer code to existing service in monolith
2. Develop a new service with Kafka consumer code and store everything in a DB
3. Test how far the new consumer service lags behind.
4. Enhance newly added service with some relevant code from existing monolith. It should fetch data from DB. Temporarily disable some external calls. 
5. Now filter events in producer at monolith. Send only actionable events to consumer new service, via Kafka.
6. Enhance the new service further. Send events back to monolith via Kafka. 
7. Test
8. Remove the code from monolith that was added to new service. 
9. Repeat. 

Dependency

1. Gradle build tool
2. Java

Python and Kafka

Three alternatives

1. kafka-python : https://github.com/dpkp/kafka-python
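from time import sleep
from json import dumps, loads
from kafka import KafkaProducer, KafkaConsumer
from pymongo import MongoClient  # assumption: consumed messages are stored in MongoDB

# --- Producer (run as one script) ---
producer = KafkaProducer(bootstrap_servers=['localhost:9092'],
                         value_serializer=lambda x:
                         dumps(x).encode('utf-8'))
for e in range(1000):
    data = {'number': e}
    producer.send('numtest', value=data)  # asynchronous send to topic 'numtest'
    sleep(5)
producer.flush()  # wait until buffered messages are delivered

# --- Consumer (run as a separate script) ---
# Assumption: a local MongoDB; 'collection' was undefined in the original snippet
collection = MongoClient('localhost:27017').numtest.numtest
consumer = KafkaConsumer(
    'numtest',
    bootstrap_servers=['localhost:9092'],
    auto_offset_reset='earliest',   # start from the oldest message if no offset is committed
    enable_auto_commit=True,
    group_id='my-group',
    value_deserializer=lambda x: loads(x.decode('utf-8')))
for message in consumer:
    message = message.value
    collection.insert_one(message)
    print('{} added to {}'.format(message, collection))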

2. pykafka https://github.com/Parsely/pykafka
3. confluent-kafka-python https://github.com/confluentinc/confluent-kafka-python

Reference 

Kafka eco system : https://cwiki.apache.org/confluence/display/KAFKA/Ecosystem

Kafka Documentation
http://kafka.apache.org/documentation/

Python and Kafka
https://towardsdatascience.com/kafka-python-explained-in-10-lines-of-code-800e3e07dad1

https://blog.heroku.com/monolithic-applications-into-services

strimzi Kafka on Minikube https://strimzi.io/quickstarts/

PyCharm : My Notes


PyCharm is an IDE for Python development. Here are a few key takeaways:

$ pip freeze > requirements.txt
This creates a requirements.txt file, which contains a simple list of all the packages in the current environment and their respective versions. Later, a different developer (or you, if you need to re-create the environment) can install the same packages, with the same versions, by running

$ pip install -r requirements.txt

================================================================================

Code Completion

1. Basic : Ctrl + Space
2. Smart : Ctrl + Shift + Space
3. Within context: Alt + /        Alt + Shift + /
4. Postfix completion: Space OR Enter OR Tab

Ctrl + Shift + Enter to complete the current statement (adds missing brackets, colons, etc.)

================================================================================

Pipenv is a tool that provides all necessary means to create a virtual environment for your Python project. It automatically manages project packages through the Pipfile as you install or uninstall packages.

================================================================================

Two settings

1. Project level settings are applied to current project and stored at : 
C:\Users\\PycharmProjects\schoop_app\.idea

2. Global settings work like a template and apply to new projects: File -> Settings for New Projects

================================================================================

Keyboard short cut

Double Shift: Search Everywhere

Ctrl+N: find and jump to a class
Ctrl+Shift+N: find and jump to a file
Ctrl+Shift+Alt+N: find and jump to a symbol

Ctrl+E: View Recent Files

================================================================================

Distilled Python


Some time back, I came across an excellent book on Python: "Fluent Python" (O'Reilly, 2015) by Luciano Ramalho. On Saturday, 28th April, I got the opportunity to listen to him live at the Geeknight workshop "Distilled Python: Features you must know to use it well" at Thoughtworks, Koramangala, Bangalore. Here are my notes about the event, exclusive for readers of this blog: Express YourSelf !

He has been programming in Python since 1998. His speaking record includes PyCon US, OSCON, OSCON-EU, PythonBrasil, RuPy and an ACM Webinar. All his slide decks are available at:
https://speakerdeck.com/ramalho



Generators and Iteration 

A generator allows lazy data processing: data is loaded into main memory only as and when needed. In the Haskell programming language, almost everything is evaluated lazily. At the opposite end, the NumPy module is about fast, eager data processing: all the data is loaded into memory for vectorized arithmetic. Such vector arithmetic benefits from SIMD hardware instructions; Intel introduced MMX (Multimedia eXtension) in the mid-1990s.
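A tiny sketch of laziness: the hypothetical `numbers` generator records each value as it is produced, which shows that only the values actually requested are ever computed.

```python
produced = []


def numbers(limit):
    for n in range(limit):
        produced.append(n)   # record when each value is actually produced
        yield n


gen = numbers(1_000_000)     # nothing is produced yet
first = next(gen)
second = next(gen)
print(produced)              # [0, 1]: the remaining values are never computed

eager = [n for n in range(5)]   # a list, by contrast, is built completely up front
```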

import sys
for arg in sys.argv:
    print(arg)

Here is a list of iterable objects (built-in, plus a few from popular libraries) and the items they yield.

  • str: Unicode characters
  • bytes: ints from 0 to 255
  • tuple: individual fields
  • dict: keys
  • set: elements
  • io.TextIOWrapper (a text file): Unicode lines
  • models.query.QuerySet (Django): DB rows
  • numpy.ndarray: elements of a 1-D array; rows (sub-arrays) of a multidimensional array
Here are a few use cases of iterables in Python.

Parallel assignment is possible with any iterable object.

It is also called tuple unpacking. However, it is not specific to tuples: the right side of the = sign can be any iterable.

pairs = [('a', 10), ('B', 20)]

for label, size in pairs:
    print(label, '->', size)
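A few more forms of parallel assignment, with illustrative values: unpacking a string, swapping without a temporary variable, star unpacking of the remaining items, and nested unpacking.

```python
first, second = "hi"                  # any iterable works, not just tuples
a, b = 1, 2
a, b = b, a                           # swap without a temporary variable
head, *rest = range(5)                # star collects the remaining items into a list
label, (x, y) = ("point", [3, 4])     # nested unpacking
```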

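Multiple values can be passed to a function

This is also called star arguments. Here t is a tuple, and *t unpacks it into positional arguments.

def fun(a, b, c):
    return a + b + c   # example body

t = (3, 4, 5)
fun(*t)      # same as fun(3, 4, 5)

This can be achieved with a dictionary also; ** unpacks it into keyword arguments, so the keys must match the parameter names.

d = {'a': 3, 'b': 4, 'c': 5}
fun(**d)     # same as fun(a=3, b=4, c=5)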

Reduction functions:

We use "map-reduce" in Big Data. Python's built-in reduction functions serve the "reduce" part of map-reduce:
  • all: boolean
  • any: boolean
  • max
  • min
  • sum
Reduction functions consume an iterable and return a single result. Together with map() (and functools.reduce()), Python supports the map-reduce style.

One can write more readable code by replacing multiple and conditions with all() and multiple or conditions with any().
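A sketch of replacing chained and/or conditions with all()/any(), with made-up values. One difference worth knowing: a list of conditions is built first, so every condition is evaluated, while and/or stop early (short-circuit).

```python
age, has_id, is_member = 25, True, False

# Chained `and` / `or` conditions ...
allowed_chained = age >= 18 and has_id and not is_member
discount_chained = age < 12 or age > 60 or is_member

# ... read more clearly with all() / any() over a list of conditions
allowed = all([age >= 18, has_id, not is_member])
discount = any([age < 12, age > 60, is_member])

# Other reductions: one result from a whole iterable
total = sum(n * n for n in range(5))                # 0 + 1 + 4 + 9 + 16 = 30
longest = max(["fig", "banana", "kiwi"], key=len)   # 'banana'
```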

Sorting

sort() is a method of lists only: list.sort() sorts the list in place and returns None.

sorted() is a built-in function. It consumes any iterable and returns a new list. It has a keyword argument, key, for the sorting key.

Here one can pass even a function as the argument. Unlike sorting libraries in C/C++ (e.g. qsort), this function is not a comparison function: it generates a key for each item, and the items are sorted by their keys.

To write poems, one needs words that end with the same characters. Here is the source code:

sorted(L, key=lambda s: list(reversed(s)))
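A small run of the rhyme trick on an illustrative word list; `s[::-1]` (the reversed string) orders words the same way as `list(reversed(s))`. The last line shows that list.sort works in place and returns None.

```python
words = ["rain", "cat", "main", "bat", "gain"]

alphabetical = sorted(words)                       # new list; words is untouched
rhyming = sorted(words, key=lambda s: s[::-1])     # compare words from their last letter
print(rhyming)   # ['gain', 'main', 'rain', 'bat', 'cat']: words ending alike are adjacent

returned = words.sort(key=len)                     # in place, stable: equal lengths keep their order
print(returned, words)   # None ['cat', 'bat', 'rain', 'main', 'gain']
```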

Now let's look at iterators in detail. Python has built-in support for the iterator design pattern. There are two types of objects: (1) iterable (2) iterator.

Just as food is eatable, a collection of objects is iterable. An iterable has the __iter__ method, which returns a new iterator.

An iterator has state. It has the __next__ method (and an __iter__ method that returns itself).

Please note, __next__ is not part of the iterable object: the same iterable can be iterated several times, even at once (e.g. by multiple threads), and each iteration needs its own iterator with its own state.

__next__() raises the StopIteration exception when there are no more items.

In Python, a for loop obtains an iterator from the iterable by calling iter(), then repeatedly invokes next() on that iterator until StopIteration is raised.
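A minimal sketch of the iterable/iterator split described above, with hypothetical `Countdown` classes, plus `manual_for`, a hypothetical helper that spells out roughly what a for loop does behind the scenes.

```python
class Countdown:
    """Iterable: each call to iter() returns a fresh iterator."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return CountdownIterator(self.start)


class CountdownIterator:
    """Iterator: holds the iteration state and implements __next__."""

    def __init__(self, current):
        self.current = current

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


def manual_for(iterable):
    """Roughly what `for item in iterable:` does."""
    items = []
    iterator = iter(iterable)        # calls iterable.__iter__()
    while True:
        try:
            item = next(iterator)    # calls iterator.__next__()
        except StopIteration:
            break
        items.append(item)
    return items


c = Countdown(3)
print(list(c), list(c))   # [3, 2, 1] [3, 2, 1]: the iterable can be reused
```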

Now something about generators. In Python, generator and iterator are often used as synonyms, but they are different: every generator is an iterator, but not every iterator is a generator. A generator function has the same syntax as a normal function and is also defined with the def keyword. What makes it a generator function is the yield keyword somewhere in its body.

One should not invoke the __iter__ and __next__ methods directly. Here, Python acts like a framework: the developer does not invoke those methods, but lets Python invoke them as and when needed, through iter() and next(). The developer can implement these dunder methods (__next__, __iter__, etc.) in their own classes. In CPython, next(g) is implemented in an optimized way in the C language.

In a generator, the execution flow is frozen at the yield keyword and resumes later. So it allows synchronous-looking programming without callbacks, which is why generators were introduced in JavaScript also. Please note, yield is not the same as return in a function. A generator cannot be reset; to iterate again, create a new generator.
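A small trace of a generator being frozen and resumed; the `steps` generator and `trace` list are hypothetical. It also shows that an exhausted generator stays exhausted, and that a new one must be created to start over.

```python
trace = []


def steps():
    trace.append("start")
    yield 1
    trace.append("resumed")   # runs only when the next value is requested
    yield 2


g = steps()
before = list(trace)     # []: creating the generator runs none of its body
x = next(g)              # runs until the first yield -> 1
y = next(g)              # resumes right after the first yield -> 2
rest = list(g)           # exhausted -> []
again = list(g)          # a generator cannot be reset -> still []
fresh = list(steps())    # a new generator starts over -> [1, 2]
```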

Built-in lazy iterators of Python (they behave like generators)
  • enumerate: yields (index, item) pairs; the index is a number that increments, the item comes from the input
  • filter: in Python 2 it returns a list. In Python 3 it returns a lazy iterator, so one can go over data that does not fit in memory.
  • map
  • reversed
  • zip: consumes iterables and generates tuples. If one iterable is shorter, zip stops at the shortest one without any exception. In Python 3, zip returns an iterator of tuples (in Python 2, a list), which can be passed to the list() constructor, or to dict() when the tuples are pairs.
Now let's look at comprehensions and generator expressions (genex) in Python.

1. List comprehension

vals = [expression
        for value in collection
        if condition]

Without list comprehension:

vals = []
for value in collection:
    if condition:
        vals.append(expression)

It is inspired by "set-builder notation" in Maths and by the Haskell programming language.

l = [ord(c) for c in s]

Here the ord function gives the Unicode code point of a given character (the ASCII value, for ASCII characters). The output of a list comprehension is always a list.

2. Generator expression
g = (ord(c) for c in s)
It returns a generator, which produces values lazily.

Genexes are perfect for when you know you want to retrieve data from a sequence, but you don’t need to access all of it at the same time.
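The two forms side by side on a sample string: the list comprehension builds the whole list at once, the generator expression yields values on demand and is exhausted after one pass.

```python
s = "abc"

codes_list = [ord(c) for c in s]     # list comprehension: [97, 98, 99], built at once
codes_gen = (ord(c) for c in s)      # generator expression: nothing computed yet
evens = [n for n in range(10) if n % 2 == 0]   # with a condition: [0, 2, 4, 6, 8]

# A genex as the only argument of a call needs no extra parentheses
total = sum(ord(c) for c in s)       # 97 + 98 + 99 = 294

print(next(codes_gen))               # 97: values are produced on demand
print(list(codes_gen))               # [98, 99]: the rest; the generator is now exhausted
```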

To understand more, please refer 

This project has no link with the ISIS terrorist group :-) . In this project, instead of writing complex for-loop bodies, generator expressions and generators are used effectively. The code is about database migration, with many command-line options in the main function. It reads input from one DB and keeps it in a generator for lazy evaluation. The output generator is populated by processing data from that input generator.

pytest module

The post-lunch session focused on TDD (Test Driven Development). Using the pytest module, we can write test cases (TCs) without a class. In Java, the JUnit framework requires a class to hold TCs. So class-based unit test frameworks such as JUnit and CppUnit are not the Pythonic way.
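A sketch of class-free tests in the pytest style. slugify is a hypothetical function under test; saved in a file named like test_slugify.py, these plain functions would be collected by running pytest, and they use nothing but the assert statement.

```python
def slugify(title):
    """Hypothetical function under test."""
    return "-".join(title.lower().split())


# pytest collects plain functions named test_*: no test class is required
def test_slugify_joins_words_with_hyphens():
    assert slugify("Fluent Python") == "fluent-python"


def test_slugify_collapses_extra_spaces():
    assert slugify("  Fluent   Python ") == "fluent-python"
```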

pytest.raises provides a context manager. A context manager has its own entry and exit methods (__enter__ and __exit__); the same mechanism is used to lock/unlock shared resources and to open/close files.
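To show the entry and exit methods at work, here is a hypothetical, much simplified stand-in for what pytest.raises does; it is not pytest's implementation. __exit__ receives any exception raised inside the with block and returns True to suppress the expected one.

```python
class expect_raises:
    """Hypothetical, simplified stand-in for pytest.raises."""

    def __init__(self, exc_type):
        self.exc_type = exc_type

    def __enter__(self):
        # Runs when the with block starts
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs when the with block ends, with or without an exception
        if exc_type is None:
            raise AssertionError(f"{self.exc_type.__name__} was not raised")
        # True suppresses the expected exception; False lets any other one propagate
        return issubclass(exc_type, self.exc_type)


with expect_raises(ZeroDivisionError):
    1 / 0  # swallowed by __exit__
print("test passed")
```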

@pytest.fixture feels more like metaprogramming: the test function names the fixture as a parameter, and pytest calls the fixture function and passes its return value to the test.

There are many plug-ins to generate fancy reports on top of pytest. 

Python Data Model


The Python Data Model is not about data science. A better name would be the Python Object Model. It is all about the various dunder methods that support many built-in features of Python, with Python acting as a framework. These dunder methods should be implemented in user-defined classes, somewhat like new and delete in C++. Dunder methods are not protected methods, even though the PyCharm IDE mistakenly indicates them as private/protected. A method with __ only as a prefix is private/protected; if __ is both prefix and suffix, the method is a dunder method.



1. In Python, every object should have a string representation. Python has two dunder methods for this: __repr__ and __str__. The __str__ method is invoked by print() (and str()) for the user-facing string representation of the object. The __repr__ method gives the representation used for debugging.



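Bobby Woolf inspired the addition of __repr__ to the Python data model. The reprlib module is very useful for implementing the __repr__ dunder method of a user-defined class. For example, if we use reprlib.repr for our own Vector class, then (1) it limits the nesting depth, so self-referencing collections cannot cause infinite recursion, and (2) it abbreviates long collections, printing only the first few members (by default 6 items of a list, or 5 items of an array) followed by '...'.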

2. A collection should implement __len__, so that len() works.

3. An iterable object should implement __iter__.

4. An iterator should implement __next__.

5. The __eq__ method is called for the == operator.

6. The __init__ method in Python is not a constructor. It is an initializer: it does not allocate memory (the instance is created by __new__ before __init__ runs).

7. The __getitem__ method is very useful for indexing and slicing.
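A minimal sketch of a Vector class tying the points above together; the class layout (an array of floats) is an illustrative assumption. __repr__ uses reprlib to abbreviate long vectors, __str__ serves print(), and __len__, __iter__ and __getitem__ make len(), for loops, `in`, indexing and slicing work.

```python
import reprlib
from array import array


class Vector:
    """Minimal sketch of a float vector; names and layout are illustrative."""

    def __init__(self, components):
        # __init__ only initializes: the instance was already created by __new__
        self._components = array("d", components)

    def __repr__(self):
        # reprlib.repr abbreviates long arrays with '...' (5 items by default)
        components = reprlib.repr(self._components)
        components = components[components.find("["):-1]
        return f"Vector({components})"

    def __str__(self):
        # Friendlier form, used by print() and str()
        return str(tuple(self))

    def __len__(self):
        return len(self._components)

    def __iter__(self):
        return iter(self._components)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Slicing a Vector returns a Vector, not an array
            return Vector(self._components[index])
        return self._components[index]


v = Vector(range(10))
print(repr(v))        # Vector([0.0, 1.0, 2.0, 3.0, 4.0, ...])
print(v[1:3])         # (1.0, 2.0)
print(len(v), v[-1])  # 10 9.0
```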

Let's look at genexes in a few dunder methods of a Vector class.

def __eq__(self, other):
    return all(a == b for a, b in zip(self, other))

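This method will incorrectly return True if the two vectors have different lengths and their initial members are identical, because zip stops at the shorter one. We could use itertools.zip_longest (izip_longest in Python 2) in place of zip. However, the better solution is to compare the lengths first: return len(self) == len(other) and all(a == b for a, b in zip(self, other)).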

import math

def __abs__(self):  # Euclidean norm of the vector
    return math.sqrt(sum(x * x for x in self))
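The length pitfall in __eq__ can be seen with plain lists standing in for two Vector instances (an assumption for brevity). zip silently drops the extra item, zip_longest pads with None so the mismatch shows up, and checking lengths first short-circuits before any item is compared.

```python
from itertools import zip_longest

a = [1.0, 2.0]
b = [1.0, 2.0, 3.0]

# zip() stops at the shorter input, so the extra 3.0 is never compared
print(all(x == y for x, y in zip(a, b)))  # True, which is wrong

# zip_longest() pads the shorter input with None, so 3.0 is compared to None
print(all(x == y for x, y in zip_longest(a, b)))  # False

# Comparing lengths first gives the right answer
print(len(a) == len(b) and all(x == y for x, y in zip(a, b)))  # False
```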

Here are the situations in which Python invokes these dunder methods:


1. arithmetic and Boolean expressions: operator overloading

2. implicit conversion to str, e.g. print(x)

3. conversion to bool in if, while, and, or, not

4. attribute access, including dynamic or virtual attributes

5. emulating collections: o[k], k in o, len(o)

6. iteration: for, tuple unpacking, star arguments, etc.

7. context managers: with blocks

8. metaprogramming: attribute descriptors, metaclasses
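A sketch of items 3 and 4 above, using a hypothetical Vector2 class. __bool__ is called implicitly by if, while, and, or and not; __getattr__ is called only when normal lookup fails, which lets .x and .y act as virtual attributes. The error message deliberately copies the wording Python uses for missing attributes.

```python
class Vector2:
    """Hypothetical 2-D vector used to show a few implicit dunder calls."""

    def __init__(self, x, y):
        self._coords = (x, y)

    def __bool__(self):
        # Called by if/while/and/or/not: a zero vector is falsy
        return any(self._coords)

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails
        positions = {"x": 0, "y": 1}
        if name in positions:
            return self._coords[positions[name]]
        raise AttributeError(
            f"{type(self).__name__!r} object has no attribute {name!r}")


v = Vector2(3, 4)
print(v.x, v.y)               # 3 4
print(bool(Vector2(0, 0)))    # False
if v:
    print("non-zero vector")  # printed, because __bool__ returned True
```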

Then we had a nice discussion about implementing the __rmul__ method to compute the product of a scalar and a vector, where the two operands can come in either order. When a __mul__ method cannot handle its operand, it returns NotImplemented, and Python then tries the other operand's __rmul__: for 3 * v, int.__mul__ returns NotImplemented, so v.__rmul__(3) is called. Even in the Python 3.8 standard library, we may not find __rmul__ implemented for every class.
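A minimal sketch of scalar multiplication in both orders, with a hypothetical Vector2D class. __mul__ returns NotImplemented for operands it does not support, and __rmul__ simply delegates to __mul__ because multiplying by a scalar is commutative.

```python
import numbers


class Vector2D:
    """Hypothetical 2-D vector supporting scalar * vector and vector * scalar."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __repr__(self):
        return f"Vector2D({self.x!r}, {self.y!r})"

    def __mul__(self, scalar):
        # Only real numbers are supported; otherwise let Python try other options
        if isinstance(scalar, numbers.Real):
            return Vector2D(self.x * scalar, self.y * scalar)
        return NotImplemented

    def __rmul__(self, scalar):
        # Reached for 3 * v, after int.__mul__ returned NotImplemented
        return self * scalar


v = Vector2D(1, 2)
print(v * 3)  # Vector2D(3, 6)
print(3 * v)  # Vector2D(3, 6), via __rmul__
```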

Python objects also have dunder attributes, for example:

obj = MyClass()
obj.__class__.__name__

Typecode

Type code   C Type           Python Type          Minimum size in bytes
'c'         char             character            1
'b'         signed char      int                  1
'B'         unsigned char    int                  1
'u'         Py_UNICODE       Unicode character    2 (see note)
'h'         signed short     int                  2
'H'         unsigned short   int                  2
'i'         signed int       int                  2
'I'         unsigned int     long                 2
'l'         signed long      int                  4
'L'         unsigned long    long                 4
'f'         float            float                4
'd'         double           float                8
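These type codes are used by the array module. The table above appears to come from the Python 2 documentation: Python 3's array module has no 'c' code and reports 'I' and 'L' as plain int. A short sketch of typecodes in use (the values are arbitrary):

```python
from array import array

# 'd' = C double: each item is stored as a machine double
floats = array("d", [1.0, 2.5, 3.0])
print(floats.typecode, floats.itemsize)  # d 8 on common platforms

# 'b' = signed char: values must fit in -128..127
small = array("b", [-128, 0, 127])
try:
    small.append(200)
except OverflowError as exc:
    # The value is rejected and the array is left unchanged
    print("OverflowError:", exc)
```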

Miscellaneous 

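Coroutines are another nice Python feature. Native coroutines are defined with the async keyword (async def) and use await. As per David Beazley's advice, don't mix the two concepts: generators produce data for iteration, while coroutines are not related to iteration.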
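A minimal sketch of native coroutines with asyncio; fetch and main are hypothetical names, and asyncio.sleep stands in for real I/O. Each await suspends a coroutine and lets the event loop run the other one, and asyncio.gather returns the results in the order the coroutines were passed.

```python
import asyncio


async def fetch(name, delay):
    """Hypothetical task: await suspends this coroutine while it waits."""
    await asyncio.sleep(delay)
    return f"{name} done"


async def main():
    # Both coroutines run concurrently; results keep the argument order
    return await asyncio.gather(fetch("a", 0.02), fetch("b", 0.01))


print(asyncio.run(main()))  # ['a done', 'b done']
```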

We should use exactly the same error message as Python reports in our custom classes, so that one can search for the error message on Stack Overflow :-)
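A sketch of that advice with a hypothetical Row sequence class. A list indexed with a string reports "list indices must be integers or slices, not str", so the sketch reuses that wording with its own class name.

```python
import operator


class Row:
    """Hypothetical sequence whose errors are worded like the built-in list's."""

    def __init__(self, values):
        self._values = list(values)

    def __getitem__(self, index):
        if isinstance(index, slice):
            return Row(self._values[index])
        try:
            # operator.index accepts ints and int-like objects, rejects str/float
            position = operator.index(index)
        except TypeError:
            # Same wording as list: "... indices must be integers or slices, not str"
            msg = (f"{type(self).__name__} indices must be integers or slices, "
                   f"not {type(index).__name__}")
            raise TypeError(msg) from None
        return self._values[position]


row = Row([10, 20, 30])
print(row[1])  # 20
```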

fractions.Fraction is a very useful class, which stores the numerator and denominator separately.
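A short sketch of exact rational arithmetic with fractions.Fraction, compared with binary floats (the values are arbitrary):

```python
from fractions import Fraction

total = Fraction(1, 3) + Fraction(1, 6)   # exact arithmetic, no rounding
print(total)                              # 1/2
print(total.numerator, total.denominator) # 1 2

# Floats accumulate binary rounding error; Fractions do not
print(0.1 + 0.2 == 0.3)                                      # False
print(Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))  # True
```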

Head First Design Patterns is another book on design patterns, similar to the GoF Design Patterns book.

Python is easy to use and very popular, so investing your time and effort in learning Python gives fast returns.

Jaydeep, the event organizer, stressed the various plugins for the pytest module that generate fancy test automation reports for people at different levels of the hierarchy. Here is one such module in his GitHub repository: https://github.com/jaydeepc/report-mine

About various programming languages

Go and Python: both programming languages allow writing code without using classes. On the other hand, in Java the Math class has only static methods, yet a class is needed.

Python understands iteration better than C. In C programming, an index variable i is needed. It has not been needed in Python since 1991. Since 2004, Java also does not need i (the enhanced for loop). This idea was borrowed from the CLU language by Barbara Liskov. CLU was not commercially successful, but it influenced many programming languages. C does not have iterable objects. Go had only a limited set of iterable types, and one could not create custom iterable objects in Go :-( (Go 1.23 later added range-over-function iterators).

"0" is true in Python. It is true in C also. As it contains a string with '0' = 0x30 character. However "0" is false in JavaScript

In other languages, an exception indicates an abnormal error condition, while in Python exceptions are also used to signal ordinary events: for example, an iterator raises StopIteration to signal that it has no more items.
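A small sketch of StopIteration as a normal signal rather than an error. A for loop catches it internally; when calling next() by hand, either catch it or pass a default value.

```python
it = iter([10, 20])
print(next(it), next(it))  # 10 20

try:
    next(it)
except StopIteration:
    # Not a failure: this is how an iterator says "no more items"
    print("exhausted")

# With a default, next() returns it instead of raising StopIteration
print(next(it, "default"))  # default
```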

What is a design pattern in one language can be a built-in feature in another: object-oriented programming itself is a set of design patterns in non-OOP languages. Likewise, Iterator is a design pattern in OOP languages, except Python, which has built-in support for the iterator design pattern.

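In Python, integer overflow never happens, unlike in many other programming languages: Python 3's int has arbitrary precision, so a value simply gets more memory as it grows (Python 2 automatically promoted int to long). Floats, however, are fixed-size and can still overflow.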
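A quick demonstration that Python ints grow as needed while floats stay fixed-size (the numbers are arbitrary examples):

```python
import math
import sys

big = 2 ** 100
print(big)                      # 1267650600228229401496703205376, exact
print(big.bit_length())         # 101
print(math.factorial(25))       # 15511210043330985984000000

# Floats are fixed-size (usually 64-bit), so multiplication can overflow to inf
print(sys.float_info.max * 10)  # inf
```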

The Python module "itertools" is inspired by the Haskell programming language. If you have not used the "itertools" module, then most likely you have written code that was unnecessary. A few examples from itertools:
  • infinite generators: count(), cycle(), repeat()
  • generators that consume multiple iterables: chain(), izip(), imap(), product() (izip and imap are Python 2 names; in Python 3 the built-in zip() and map() are already lazy)
  • generators that filter or bundle items: compress(), dropwhile(), groupby(), ifilter(), islice() (ifilter is the Python 2 name; Python 3 uses the built-in filter())
  • generators that rearrange or duplicate items: tee(), permutations(), combinations()
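A few of these itertools functions in action, using Python 3 names (the inputs are arbitrary):

```python
import itertools

# An infinite generator made finite with islice
print(list(itertools.islice(itertools.count(10, 5), 4)))  # [10, 15, 20, 25]

# chain consumes several iterables one after another
print(list(itertools.chain("ab", [1, 2])))               # ['a', 'b', 1, 2]

# compress keeps the items whose selector is truthy
print(list(itertools.compress("ABCD", [1, 0, 1, 0])))    # ['A', 'C']

# groupby bundles consecutive items that share a key
print([(k, len(list(g))) for k, g in itertools.groupby("aaabcc")])
# [('a', 3), ('b', 1), ('c', 2)]

# Combinatoric generators
print(list(itertools.product("ab", repeat=2)))
# [('a', 'a'), ('a', 'b'), ('b', 'a'), ('b', 'b')]
print(list(itertools.combinations([1, 2, 3], 2)))        # [(1, 2), (1, 3), (2, 3)]
```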

"I have a problem. So let me use 'regular expression'."
"Now you will have two problems" :-)
Python's built-in string methods handle many common cases without regular expressions, e.g. endswith().
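A small example of avoiding a regular expression: str.endswith accepts a tuple of suffixes (the filenames are made up).

```python
filenames = ["photo.JPG", "notes.txt", "logo.png"]

# One endswith() call checks several suffixes: no regular expression needed
images = [f for f in filenames if f.lower().endswith((".jpg", ".png"))]
print(images)  # ['photo.JPG', 'logo.png']
```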

Generators can also be implemented in the C language: with the "static" keyword, local variables inside a function retain their previous values, which serve as the iterator's state.

OOP languages like Java suggest making attributes private and then adding getter and setter methods for them; IDEs support writing such methods automatically. In Python, attributes are public by default. If needed later, an attribute can be turned into a property (with @property), and this does not impact existing code.
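A minimal sketch, using a hypothetical Product class: price could start as a plain public attribute, and later become a property with validation while callers keep writing p.price exactly as before.

```python
class Product:
    """Hypothetical class whose 'price' attribute later gained validation."""

    def __init__(self, price):
        self.price = price  # goes through the property setter below

    @property
    def price(self):
        return self._price

    @price.setter
    def price(self, value):
        # Validation added without changing any caller's code
        if value < 0:
            raise ValueError("price must not be negative")
        self._price = value


p = Product(10)
p.price = 12    # same syntax as a plain attribute
print(p.price)  # 12
```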

"Pythonic" is a new idiom. Let's see example of Pythonic API. Python has built-in urlib2 library. However, developing HTTP based client using urlib2 is less readable comparing developing the same using "requests" module. "requests" module is like "HTTP for humans". People talks a lot bout UI and UX. Python also focus on DX. DX means Developers' eXperience.  Have a look to these workshops about Pythonic APIs x.co/pythonic

The creator of the Java programming language reportedly said he would have left "inheritance" out of Java. Julia is a programming language for data science. Neither Julia nor Go supports inheritance.

Python passes the object explicitly as the first argument, conventionally named 'self', to every instance method; Java passes the same object implicitly, as 'this'.
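A small sketch with a hypothetical Counter class showing that self is an ordinary parameter: c.increment() is shorthand for Counter.increment(c).

```python
class Counter:
    """Hypothetical class used to show the explicit self parameter."""

    def __init__(self):
        self.count = 0

    def increment(self):
        # self is declared explicitly, like any other parameter
        self.count += 1
        return self.count


c = Counter()
print(c.increment())         # 1
# The same call, passing the instance explicitly
print(Counter.increment(c))  # 2
```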

The "language reference" document can be first place to understand any programming language. However, one may find "Python language reference" document as dry one. 

Key takeaway: if you have not used the "itertools" module, then most likely you have written code that was unnecessary. So study the features of the itertools module.

Reference

Twitter : @ramalhoorg
E-mail : luciano.ramalho@thoughtworks.com