Distilled Python
Some time back, I came across an excellent book on Python: "Fluent Python" (O'Reilly, 2015) by Luciano Ramalho. On Saturday, 28th April, I got the opportunity to listen to him live at the Geeknight workshop "Distilled Python: Features you must know to use it well" at ThoughtWorks, Koramangala, Bangalore. Here are my notes about the event, exclusively for readers of this blog: Express YourSelf !
He has been programming in Python since 1998. His speaking record includes PyCon US, OSCON, OSCON-EU, PythonBrasil, RuPy and an ACM Webinar. All his slide decks are available at:
https://speakerdeck.com/ramalho
Generators and Iteration
Generators allow lazy data processing: data is loaded into main memory only as and when needed. In the Haskell programming language, almost everything is evaluated lazily. At the opposite end, the numpy Python module is about fast data processing: all the data is loaded in memory for vector arithmetic. Intel introduced new instructions for such vector arithmetic, MMX (MultiMedia eXtension), in the mid-1990s.
import sys
for arg in sys.argv:
    print(arg)
Here is a list of some of Python's iterable objects and the items they yield. Most are built in; QuerySet comes from Django and ndarray from numpy.
- str: Unicode characters
- bytes: ints from 0 to 255
- tuple: individual fields
- dict: keys
- set: elements
- io.TextIOWrapper: Unicode lines
- models.query.QuerySet: DB rows
- numpy.ndarray: elements or rows of a multidimensional array
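The list above can be checked quickly in the interpreter. This is a minimal sketch using only built-in types; the sample values are made up for illustration.

```python
# Each built-in iterable yields a different kind of item when iterated.
text_items = list("héllo")            # str yields Unicode characters
byte_items = list(b"AB")              # bytes yields ints from 0 to 255
tuple_items = list((3, 4, 5))         # tuple yields its individual fields
dict_items = list({"x": 1, "y": 2})   # dict yields its keys, not its values

print(text_items)   # ['h', 'é', 'l', 'l', 'o']
print(byte_items)   # [65, 66]
print(tuple_items)  # [3, 4, 5]
print(dict_items)   # ['x', 'y']
```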
Here are a few use cases of iterables in Python.
Parallel assignment is possible with iterable objects.
It is also called tuple unpacking. However, it is not specific to tuples: the right side of the = sign can be any iterable. In the loop below, each pair is unpacked into label and size.
pairs = [('a', 10), ('B', 20)]
for label, size in pairs:
    print(label, '->', size)
Multiple values can be passed to a function
This is also called star arguments. Here t is a tuple.
def fun(a, b, c):
    print(a, b, c)

t = (3, 4, 5)
fun(*t)  # same as fun(3, 4, 5)
This can be achieved with a dictionary also
d = {'a': 3, 'b': 4, 'c': 5}
fun(**d)  # same as fun(a=3, b=4, c=5)
Reduction functions:
We use "map-reduce" in Big Data. In Python, the following reduction functions serve the purpose of the "reduce" step:
- all: returns a boolean
- any: returns a boolean
- max
- min
- sum
Reduction functions consume an iterable and produce a single result.
Python also has map() built in and reduce() in the functools module.
One can write more readable code when multiple "and" conditions are replaced by all() and multiple "or" conditions are replaced by any().
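As a rough illustration of that readability point, here is a small sketch. The functions is_valid_password and has_blank and their rules are hypothetical, made up for this example.

```python
def is_valid_password(password):
    """Return True only when every rule holds (replaces a chain of 'and')."""
    rules = [
        len(password) >= 8,
        any(ch.isdigit() for ch in password),   # at least one digit
        any(ch.isupper() for ch in password),   # at least one uppercase letter
    ]
    return all(rules)


def has_blank(fields):
    """Return True when at least one field is empty (replaces a chain of 'or')."""
    return any(field == "" for field in fields)


scores = [72, 85, 91]
print(max(scores), min(scores), sum(scores))  # 91 72 248
print(is_valid_password("Python2018"))        # True
print(has_blank(["a", "", "c"]))              # True
```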
Sorting
list.sort() is a method for sorting lists only. It sorts the list in place.
sorted() is a built-in function. It consumes any iterable and returns a new list. It has a keyword argument, key, for the sorting key.
Here one can even pass a function as the argument. Unlike sorting libraries in C/C++, the function is not for comparing two items. It generates a key for each item.
To write poems, one needs words that end with the same characters. Sorting the words by their reversed spelling groups them by ending. Here is the source code.
sorted(L, key=lambda s:list(reversed(s)))
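To see the effect of that key function, here is a small sketch with a made-up word list. It also contrasts sorted() with the in-place list.sort().

```python
words = ["station", "sing", "nation", "ring", "motion"]

# The key function builds a sort key (the word spelled backwards);
# it does not compare two items the way a C qsort() comparator does.
by_ending = sorted(words, key=lambda s: list(reversed(s)))
print(by_ending)  # ['ring', 'sing', 'nation', 'station', 'motion']

# list.sort() exists only on lists; it sorts in place and returns None.
words.sort(key=len)
print(words)      # ['sing', 'ring', 'nation', 'motion', 'station']
```

Words with the same ending land next to each other, which is what a rhyme hunter wants.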
Now let's look at iterators in detail. Python has built-in support for the Iterator design pattern. There are two types of objects: (1) iterable and (2) iterator.
Just as food is eatable, a collection of objects is iterable. An iterable has the __iter__ method, which returns an iterator.
The iterator has state. It has the __next__ method.
Please note that the __next__ method is not part of the iterable object: the iterable can be shared by multiple threads or loops, so the iteration state must live in a separate iterator.
The StopIteration exception is raised by the __next__ method when no items are left.
In Python, a "for loop" obtains an iterator from the iterable. Then it repeatedly invokes next() on the iterator.
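The split between iterable and iterator can be sketched with two small classes. Countdown and CountdownIterator are hypothetical names used only for this illustration.

```python
class Countdown:
    """Iterable: holds the data and hands out a fresh iterator each time."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return CountdownIterator(self.start)


class CountdownIterator:
    """Iterator: holds the iteration state and implements __next__."""

    def __init__(self, current):
        self.current = current

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # signals that no items are left
        value = self.current
        self.current -= 1
        return value


countdown = Countdown(3)
for_loop_result = [n for n in countdown]

# Roughly what the for loop does behind the scenes:
manual_result = []
iterator = iter(countdown)
while True:
    try:
        manual_result.append(next(iterator))
    except StopIteration:
        break

print(for_loop_result, manual_result)  # [3, 2, 1] [3, 2, 1]
```

Because the state lives in the iterator, the same Countdown object can be looped over many times.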
Now something about generators. In Python, "generator" is often used as a synonym of "iterator", but the two are different: every generator is an iterator, but not every iterator is a generator. The syntax of a generator is the same as that of a normal function: it is also defined with the "def" keyword. However, only a generator contains the "yield" keyword somewhere in its code.
One should not invoke the __iter__ and __next__ methods directly. Here, Python acts like a framework: the developer does not invoke those methods, but lets Python invoke them as and when needed, for example through iter(g) and next(g). The developer can define dunder methods such as __next__ and __iter__ in their own classes. next(g) is implemented in an optimized way in the C language.
In a generator, the execution flow is frozen at the "yield" keyword and resumes later. So it allows synchronous-style programming without callbacks, which is why generators were introduced in JavaScript also. Please note that "yield" is not the same as "return" in a function. A generator cannot be reset; to iterate again, create a new one.
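A minimal sketch of a generator function shows the freeze-and-resume behaviour. The function name countdown is made up for this example.

```python
def countdown(start):
    """Generator function: the body is frozen at each yield."""
    print("starting")       # runs only when the first value is requested
    while start > 0:
        yield start         # execution pauses here until the next next()
        start -= 1
    print("done")


g = countdown(3)            # nothing is printed yet: the body has not started
first = next(g)             # prints "starting", runs up to the first yield
second = next(g)            # resumes after the yield
rest = list(g)              # consumes what is left and prints "done"
again = list(g)             # an exhausted generator cannot be reset

print(first, second, rest, again)  # 3 2 [1] []
```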
Built-in generators of Python
- enumerate: yields pairs, where the first item is an incrementing number and the second is the item from the input
- filter: in Python 2 it returns a list. In Python 3 it is lazy, so one can go over data that does not fit in memory.
- map
- reversed
- zip: consumes iterables and generates tuples. If one iterable is shorter than the others, zip stops at the shortest without any exception. In Python 3, zip lazily yields tuples; the result can be passed to the list() constructor. It can be passed to dict() also.
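The behaviour described in the list above can be tried out as follows. This is a small Python 3 sketch with made-up sample data.

```python
names = ["a", "b", "c"]
sizes = [10, 20]                      # deliberately shorter than names

numbered = list(enumerate(names))     # (index, item) pairs
pairs = list(zip(names, sizes))       # stops at the shortest input, no exception
lookup = dict(zip(names, sizes))      # tuples of two items can feed dict()

# filter is lazy in Python 3: nothing is computed until a value is requested,
# so even a huge range is not a problem.
evens = filter(lambda n: n % 2 == 0, range(10**12))
first_even = next(evens)
second_even = next(evens)

print(numbered)                 # [(0, 'a'), (1, 'b'), (2, 'c')]
print(pairs)                    # [('a', 10), ('b', 20)]
print(lookup)                   # {'a': 10, 'b': 20}
print(first_even, second_even)  # 0 2
```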
Now let's look at generator expressions (genex) in Python
1. list comprehension
vals = [expression
        for value in collection
        if condition]
without list comprehension
vals = []
for value in collection:
    if condition:
        vals.append(expression)
It is inspired by "set-builder notation" in mathematics and by the Haskell programming language.
l = [ord(c) for c in s]
Here the "ord" function gives the Unicode code point (the ASCII value, for ASCII characters) of the given character. The output is always a list.
2. generator expression
g = (ord(c) for c in s)
It returns a generator, which produces the values lazily.
Generator expressions are perfect when you know you want to retrieve data from a sequence, but you don't need to access all of it at the same time.
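A short sketch of the difference between the two forms, using a made-up string s:

```python
s = "Python"

as_list = [ord(c) for c in s]    # the whole list is built immediately
as_gen = (ord(c) for c in s)     # nothing is computed yet

first_two = [next(as_gen), next(as_gen)]   # values are produced on demand
remaining = list(as_gen)                   # the generator continues where it stopped

# A generator expression can feed a reduction function directly,
# without building an intermediate list.
total = sum(n * n for n in range(10))

print(as_list)     # [80, 121, 116, 104, 111, 110]
print(first_two)   # [80, 121]
print(remaining)   # [116, 104, 111, 110]
print(total)       # 285
```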
To understand more, please refer
This project is in no way linked with the ISIS terrorist group :-). In this project, instead of complex for loops, generator expressions and generators are used effectively. The code is about database migration, with many command-line options in the main function. It reads input from one DB and keeps it in a generator for lazy evaluation. The output generator is populated by processing data from that input generator.
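The idea of chaining an input generator into an output generator can be sketched as below. This is not the project's code: read_records, convert and the sample rows are hypothetical, and an in-memory list stands in for the source database.

```python
def read_records(source):
    """Input stage: yield rows one at a time instead of loading them all."""
    for row in source:
        yield row


def convert(records):
    """Processing stage: consume the input generator lazily, yield documents."""
    for record_id, name in records:
        yield {"id": record_id, "name": name.title()}


source_rows = [(1, "ada lovelace"), (2, "alan turing")]   # stand-in for a DB

records = read_records(source_rows)                 # nothing read yet
documents = convert(records)                        # nothing converted yet
selected = (doc for doc in documents if doc["id"] > 1)   # genex as a filter stage

output = list(selected)    # only now do rows flow through the whole pipeline
print(output)              # [{'id': 2, 'name': 'Alan Turing'}]
```

Each stage holds only one row at a time, so the pipeline works the same way for inputs that do not fit in memory.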
pytest module
The post-lunch session focused on TDD (Test Driven Development). Using the pytest module, we can write test cases (TCs) as plain functions, without a class. In Java, the JUnit framework requires a class to write TCs. So unit test frameworks in the style of JUnit, CppUnit, etc. are not the Pythonic way.
pytest.raises provides a context manager. A context manager has its own entry and exit methods, __enter__ and __exit__. In general, context managers can be used to lock/unlock shared resources and to open/close files.
@pytest.fixture is more like metadata programming. Here the fixture function is named as a parameter of the test function, and pytest passes its return value in.
There are many plug-ins to generate fancy reports on top of pytest.
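To make the entry and exit methods of a context manager concrete, here is a standard-library-only sketch. ExpectRaises is a hypothetical class written for this note; it only loosely imitates what "with pytest.raises(ZeroDivisionError):" does and is not pytest's implementation.

```python
class ExpectRaises:
    """Toy context manager: the with block must raise the expected exception."""

    def __init__(self, expected):
        self.expected = expected
        self.caught = None

    def __enter__(self):
        # Entry method: runs when the with block starts.
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Exit method: runs when the with block ends, with or without an error.
        if exc_type is None:
            raise AssertionError(f"{self.expected.__name__} was not raised")
        if issubclass(exc_type, self.expected):
            self.caught = exc_value
            return True      # swallow the expected exception
        return False         # let any other exception propagate


with ExpectRaises(ZeroDivisionError) as ctx:
    1 / 0

print(type(ctx.caught).__name__)   # ZeroDivisionError
```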
Python Data Model
The Python data model is not about data science. A better name would be the Python object model. It is all about the various dunder methods that support many built-in features of Python as a framework. These dunder methods should be implemented in user-defined classes. Such methods are like the new and delete methods in C++. Dunder methods are not protected methods, even though the PyCharm IDE marks them as private/protected by mistake. A method with __ as prefix only is private/protected. If __ is both prefix and suffix, then the method is a dunder method.
1. In Python, every object should have a method for string representation. Python has two dunder methods: __repr__ and __str__. The __str__ method is invoked by print() for the string representation of the object. The __repr__ method is invoked for debugging the object.
Bobby Woolf inspired the addition of the __repr__ method to the Python data model. reprlib is a very useful module for implementing the __repr__ dunder method of a user-defined class. For example, if we use reprlib.repr for our own vector class, then it will (1) keep the output bounded for a huge or deeply nested collection member and (2) print only the first few members (by default, six for a list and five for an array).
2. A collection should have a length: the __len__ method.
3. An iterable object should have the __iter__ method.
4. An iterator object should have the __next__ method.
5. The __eq__ method is called for the == operator.
6. The __init__ method in Python is not a constructor. It is an initializer. It does not allocate memory.
7. The __getitem__ method is very useful for indexing and slicing.
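Several of the dunder methods listed above fit into one small class. Playlist is a hypothetical class invented for this sketch, with plain integers standing in for tracks.

```python
import reprlib


class Playlist:
    def __init__(self, title, tracks):
        # Initializer: the instance already exists, we only fill it in.
        self.title = title
        self._tracks = list(tracks)

    def __repr__(self):
        # For debugging; reprlib.repr abbreviates long collections with '...'.
        return f"Playlist({self.title!r}, {reprlib.repr(self._tracks)})"

    def __str__(self):
        # For end users; this is what print() shows.
        return f"{self.title} ({len(self)} tracks)"

    def __len__(self):
        return len(self._tracks)

    def __getitem__(self, index):
        # Delegating to the list gives both indexing and slicing.
        return self._tracks[index]


playlist = Playlist("road trip", range(10))
print(playlist)         # road trip (10 tracks)
print(repr(playlist))   # debugging view, with the track list abbreviated
print(playlist[0], playlist[2:4], len(playlist))   # 0 [2, 3] 10
```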
Let's look at genex in a few dunder methods of a Vector class.
def __eq__(self, other):
    return all(a == b for a, b in zip(self, other))
This method will incorrectly return True if the two vectors have different lengths and the initial members are identical, because zip stops at the shorter one. We can use itertools.zip_longest in place of zip. However, the better solution is to compare the lengths first.
def __abs__(self):  # needs "import math" at the top of the module
    return math.sqrt(sum(x * x for x in self))
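Putting the two methods into a complete class, with the length check added to __eq__, might look like this. It is a minimal sketch; the __init__, __iter__ and __len__ shown here are assumptions needed to make the class runnable, not the code from the talk.

```python
import math


class Vector:
    def __init__(self, components):
        self._components = list(components)

    def __iter__(self):
        return iter(self._components)

    def __len__(self):
        return len(self._components)

    def __eq__(self, other):
        # Compare the lengths first: zip alone stops at the shorter vector.
        return (len(self) == len(other) and
                all(a == b for a, b in zip(self, other)))

    def __abs__(self):
        # Euclidean norm, computed with a generator expression.
        return math.sqrt(sum(x * x for x in self))


print(abs(Vector([3, 4])))                     # 5.0
print(Vector([1, 2]) == Vector([1, 2]))        # True
print(Vector([1, 2]) == Vector([1, 2, 3]))     # False
```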
Here are the use cases in which Python invokes these dunder methods
1. arithmetic and Boolean expressions: operator overloading
2. implicit conversion to str, e.g. print(x)
3. conversion to bool when used in if, while, and, or, not
4. attribute access, including dynamic or virtual attributes
5. emulating collections: o[k], k in o, len(o)
6. iteration: for, tuple unpacking, star arguments etc.
7. context managers: with blocks
8. metaprogramming: attribute descriptors, metaclasses
Then we had a nice discussion about implementing the __rmul__ method for the product of a scalar and a vector, where the two operands can be in either order. __mul__ returns "NotImplemented" for an operand it cannot handle, which makes Python try the other operand's __rmul__. Even in the standard Python 3.8 libraries, we may find hardly any class that implements the __rmul__ method.
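The scalar product in both operand orders can be sketched as follows. This Vector is a cut-down, hypothetical class written for this note; treating every numbers.Real as a valid scalar is an assumption of the sketch.

```python
import numbers


class Vector:
    def __init__(self, components):
        self._components = list(components)

    def __repr__(self):
        return f"Vector({self._components})"

    def __mul__(self, scalar):
        # Handles vector * scalar.
        if isinstance(scalar, numbers.Real):
            return Vector(x * scalar for x in self._components)
        return NotImplemented    # tells Python to try the other operand

    def __rmul__(self, scalar):
        # Handles scalar * vector: int.__mul__ gives up, so Python calls this.
        return self * scalar


v = Vector([1, 2])
right = v * 3
left = 3 * v
print(right, left)   # Vector([3, 6]) Vector([3, 6])
```

Returning NotImplemented (rather than raising) also means that an unsupported operand, such as a string, ends in the usual TypeError raised by Python itself.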
Python objects have dunder attributes also
obj.__class__.__name__
Typecode
Type code | C Type | Python Type | Minimum size in bytes |
---|---|---|---|
'c' | char | character | 1 |
'b' | signed char | int | 1 |
'B' | unsigned char | int | 1 |
'u' | Py_UNICODE | Unicode character | 2 (see note) |
'h' | signed short | int | 2 |
'H' | unsigned short | int | 2 |
'i' | signed int | int | 2 |
'I' | unsigned int | long | 2 |
'l' | signed long | int | 4 |
'L' | unsigned long | long | 4 |
'f' | float | float | 4 |
'd' | double | float | 8 |
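These type codes are the ones used by the standard array module. A brief Python 3 sketch follows; note that the 'c' code in the table is from Python 2 and no longer exists in Python 3, so it is not used here.

```python
from array import array

readings = array("d", [1.5, 2.5, 3.5])   # 'd': C double, Python float
small = array("B", [0, 127, 255])        # 'B': unsigned char, Python int

print(readings.typecode, readings.itemsize)   # d 8
print(small.typecode, small.itemsize)         # B 1

# The type code fixes the range of every item in the array.
try:
    small.append(256)                    # does not fit in an unsigned char
    overflowed = False
except OverflowError:
    overflowed = True
print(overflowed)                        # True
```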
Miscellaneous
Coroutines are another nice Python feature. We can use the async keyword to define coroutines. As per David Beazley's advice: coroutines are not for iteration.
We should use the exact same error message that Python reports in our custom classes, so that one can use the error message for searching on Stack Overflow :-)
fractions.Fraction is a very useful class, which stores the numerator and denominator separately.
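A quick sketch of why keeping the numerator and denominator separately is useful: the arithmetic stays exact, unlike with floats.

```python
from fractions import Fraction

third = Fraction(1, 3)
half = third + Fraction(1, 6)            # exact arithmetic, reduced automatically
print(half)                              # 1/2
print(half.numerator, half.denominator)  # 1 2

# Floats accumulate rounding errors; fractions do not.
float_ok = (0.1 + 0.2 == 0.3)
fraction_ok = (Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))
print(float_ok, fraction_ok)             # False True
```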
Python is easy to use and very popular, so investing your time and effort in learning Python gives fast returns.
Jaydeep, the event organizer, stressed the various plug-ins for the pytest module that generate fancy test automation reports for people at different levels of the hierarchy. Here is one such module in his GitHub repository: https://github.com/jaydeepc/report-mine
About various programming languages
Go and Python: both programming languages allow writing code without using a class. On the other hand, in Java the Math class has only static methods, yet a class is needed.
Python understands iteration better than C. In C programming, an index variable i is needed. It has not been needed in Python since 1991. Since 2004, Java also does not need i. This is borrowed from the CLU language by Barbara Liskov. CLU was not commercially successful, but it influenced many programming languages. C does not have iterable objects. Go has a limited set of iterable objects; one cannot create one's own iterable objects in the Go language. :-(
"0" is true in Python. It is true in C also, as it is a non-empty string containing the character '0' = 0x30. However, in JavaScript the comparison "0" == false evaluates to true.
In other languages, an exception indicates an abnormal error condition. In Python, an exception is also used to raise a signal: for example, StopIteration signals the end of an iteration. Generators have been introduced in JavaScript also.
Object-oriented programming is a design pattern in non-OOP languages. Similarly, Iterator is a design pattern in OOP languages, except Python: Python has built-in support for the Iterator design pattern.
In Python, integer overflow never happens, unlike in other programming languages. An int automatically grows to use as much memory as its value needs.
The Python module "itertools" is inspired by the Haskell programming language. If you have not used the "itertools" module, then most likely you have written code that was unnecessary. A few examples from itertools:
- infinite generators
count(), cycle(), repeat()
- generators that consume multiple iterables
chain(), zip_longest()
- generators that filter or bundle items
compress(), dropwhile(), groupby(), ifilter() (the built-in filter() in Python 3), islice()
- generators that rearrange items
product(), permutations(), combinations()
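A few of these functions in action, one or two per category. This is a Python 3 sketch with made-up inputs.

```python
import itertools

# Infinite generator, tamed with islice: the first four even numbers.
first_evens = list(itertools.islice(itertools.count(0, 2), 4))

# Consuming multiple iterables, one after the other.
joined = list(itertools.chain("ab", [1, 2]))

# Filtering and bundling.
picked = list(itertools.compress("abcd", [1, 0, 1, 0]))
tail = list(itertools.dropwhile(lambda n: n < 3, [1, 2, 3, 1]))
groups = {key: list(group)
          for key, group in itertools.groupby(["ant", "ape", "bee"],
                                              key=lambda word: word[0])}

# Rearranging: every pair of letters, order not significant.
pairs = list(itertools.combinations("abc", 2))

print(first_evens)  # [0, 2, 4, 6]
print(joined)       # ['a', 'b', 1, 2]
print(picked)       # ['a', 'c']
print(tail)         # [3, 1]
print(groups)       # {'a': ['ant', 'ape'], 'b': ['bee']}
print(pairs)        # [('a', 'b'), ('a', 'c'), ('b', 'c')]
```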
"I have a problem. So let me use 'regular expression'."
"Now you will have two problems" :-)
Python's built-in types have many useful methods that remove the need for regular expressions, e.g. str.endswith().
Generators can be implemented in the C language also. We can use the "static" keyword, so that local variables inside a function retain their previous values as the state of the iterator.
OOP languages like Java suggest making attributes private and then adding getter and setter methods for them. The IDEs have support for writing such methods automatically. In Python, attributes are public by default. If needed, an attribute can later be converted into a property backed by a private attribute, and this does not impact the existing code.
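Converting a public attribute into a property can be sketched like this. Account and its validation rule are hypothetical; the point is that callers keep writing account.balance exactly as before.

```python
class Account:
    def __init__(self, balance):
        self.balance = balance       # goes through the setter below

    @property
    def balance(self):
        # Getter: callers still read account.balance, with no parentheses.
        return self._balance

    @balance.setter
    def balance(self, value):
        # Setter: validation added later without changing any calling code.
        if value < 0:
            raise ValueError("balance cannot be negative")
        self._balance = value


account = Account(100)
account.balance += 50                # same syntax as a plain public attribute
print(account.balance)               # 150

try:
    account.balance = -1
    rejected = False
except ValueError:
    rejected = True
print(rejected)                      # True
```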
"Pythonic" is a new idiom. Let's see an example of a Pythonic API. Python 2 has the built-in urllib2 library. However, an HTTP client developed with urllib2 is less readable than the same client developed with the "requests" module. The "requests" module is like "HTTP for humans". People talk a lot about UI and UX. Python also focuses on DX, which means Developers' eXperience. Have a look at these workshops about Pythonic APIs: x.co/pythonic
The creator of the Java programming language wished that "inheritance" had been left out of the Java language. Julia is a programming language for data science. Neither Julia nor Go supports inheritance.
Java and Python both pass the object itself to every member function: implicitly as 'this' in Java and explicitly as 'self' in Python.
The "language reference" document can be the first place to understand any programming language. However, one may find the "Python language reference" document a dry one.
Key takeaway: If you have not used the "itertools" module, then most likely you have written code that was unnecessary. So study the features of the itertools Python module.
Reference
Slidedeck : https://speakerdeck.com/ramalho/pythonic-apis-1
Twitter : @ramalhoorg
E-mail : luciano.ramalho@thoughtworoks.com
Event Details : https://www.meetup.com/ThoughtWorks-Bangalore/events/249979933/