Service Catalog


Introduction

The Service Catalog offers powerful abstractions to make services available in a Kubernetes cluster. These services are typically:
- third-party managed cloud offerings, OR
- self-hosted services.

Developers can focus on their applications without having to manage complex service deployments.

The Service Catalog is a list of (1) service classes (or service offerings), e.g. a database, a messaging queue, an API gateway, a log drain, or something else. Each service is associated with (2) service plans. A service plan is a variant of the service in terms of cost, size, etc.
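To make the class/plan relationship concrete, here is a minimal Python sketch of a catalog held as plain dictionaries, loosely modelled on the shape of an Open Service Broker catalog (a list of `services`, each with its `plans`). The service names, plan names, and the `list_offerings` helper are hypothetical.

```python
# Hypothetical catalog: each service class (offering) has one or more plans.
catalog = {
    "services": [
        {"name": "mysql-db", "description": "Managed MySQL database",
         "plans": [{"name": "small", "description": "1 vCPU, 10 GB"},
                   {"name": "large", "description": "4 vCPU, 100 GB"}]},
        {"name": "message-queue", "description": "Managed message queue",
         "plans": [{"name": "standard", "description": "Shared broker"}]},
    ]
}


def list_offerings(catalog):
    """Return (service class, plan) pairs: what a user can provision."""
    return [(service["name"], plan["name"])
            for service in catalog["services"]
            for plan in service["plans"]]


print(list_offerings(catalog))
# [('mysql-db', 'small'), ('mysql-db', 'large'), ('message-queue', 'standard')]
```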

Service Broker
Typically, a third-party service provider exposes a Service Broker on its own infrastructure.

A broker can describe the configuration options for a service and plan using JSON Schema. This also makes automatic form building possible.
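A UI can build a form by walking the `properties` of a plan's JSON Schema. The sketch below is only an illustration under simple assumptions: it handles a flat object schema using `type`, `default`, `enum`, and `required`, and the schema content, the widget names, and the `build_form` helper are all hypothetical. Real form builders cover far more of JSON Schema.

```python
# Hypothetical JSON Schema describing the parameters of one plan.
plan_schema = {
    "$schema": "http://json-schema.org/draft-04/schema#",
    "type": "object",
    "properties": {
        "storage_gb": {"type": "integer", "default": 10},
        "region": {"type": "string", "enum": ["us-east", "eu-west"]},
        "backups": {"type": "boolean", "default": True},
    },
    "required": ["region"],
}

# Map JSON Schema types to simple form widgets (an illustrative choice).
WIDGETS = {"integer": "number", "number": "number",
           "string": "text", "boolean": "checkbox"}


def build_form(schema):
    """Turn a flat object schema into a list of form-field descriptions."""
    required = set(schema.get("required", []))
    fields = []
    for name, prop in schema.get("properties", {}).items():
        widget = "select" if "enum" in prop else WIDGETS.get(prop.get("type"), "text")
        fields.append({"name": name, "widget": widget,
                       "required": name in required,
                       "default": prop.get("default"),
                       "choices": prop.get("enum")})
    return fields


for field in build_form(plan_schema):
    print(field)
```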

Service Broker use cases

1. A service broker can handle interactions between modern cloud-native apps and legacy systems, giving access to valuable data stored in those legacy systems.

2. The Open Service Broker API (OSBAPI) allows interactions with multiple cloud providers.

A service broker can also implement a web-based service dashboard.

A Service Broker is a piece of software (a server) that implements the Open Service Broker API. This API is used for:
- listing the catalog
- provisioning
- deprovisioning
- binding a service instance to an application
- unbinding a service instance from an application
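In the Open Service Broker API specification, these five operations map onto HTTP endpoints under `/v2`. The routing table below sketches only that mapping (no HTTP server, no authentication, no request bodies); the `route` helper and the operation names are hypothetical, and the specification remains the reference for the exact requests and responses.

```python
import re

# (HTTP method, path pattern, operation), following the OSB API /v2 routes.
ROUTES = [
    ("GET", r"/v2/catalog", "list_catalog"),
    ("PUT", r"/v2/service_instances/(?P<instance_id>[^/]+)", "provision"),
    ("DELETE", r"/v2/service_instances/(?P<instance_id>[^/]+)", "deprovision"),
    ("PUT", r"/v2/service_instances/(?P<instance_id>[^/]+)"
            r"/service_bindings/(?P<binding_id>[^/]+)", "bind"),
    ("DELETE", r"/v2/service_instances/(?P<instance_id>[^/]+)"
               r"/service_bindings/(?P<binding_id>[^/]+)", "unbind"),
]


def route(method, path):
    """Return (operation, path parameters) for a broker request."""
    for route_method, pattern, operation in ROUTES:
        match = re.fullmatch(pattern, path)
        if method == route_method and match:
            return operation, match.groupdict()
    raise LookupError(f"no route for {method} {path}")


print(route("PUT", "/v2/service_instances/db-1/service_bindings/app-1"))
# ('bind', {'instance_id': 'db-1', 'binding_id': 'app-1'})
```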

These APIs are secured. A service broker implements authentication (e.g. OAuth) on its interface with the application/container/platform. A service broker proxy can be used to support a custom authentication flow.

For time-consuming provisioning and deprovisioning, OSBAPI supports asynchronous operations.
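With asynchronous provisioning in the OSB API, the platform sends `accepts_incomplete=true`, the broker answers `202 Accepted`, and the platform then polls `GET /v2/service_instances/:instance_id/last_operation` until `state` becomes `succeeded` or `failed`. The sketch below simulates that loop in memory. `FakeBroker`, `provision_and_wait`, and the polling parameters are hypothetical stand-ins, not a real client library.

```python
import time


class FakeBroker:
    """In-memory stand-in for a broker whose provisioning takes a while."""

    def __init__(self, polls_until_done=3):
        self.polls_until_done = polls_until_done
        self.remaining = {}

    def provision(self, instance_id, accepts_incomplete=True):
        # A real broker would start the work in the background here.
        self.remaining[instance_id] = self.polls_until_done
        return 202, {}  # 202 Accepted: the operation is still running

    def last_operation(self, instance_id):
        self.remaining[instance_id] -= 1
        state = "succeeded" if self.remaining[instance_id] <= 0 else "in progress"
        return 200, {"state": state}


def provision_and_wait(broker, instance_id, interval=0.0, max_polls=10):
    """Platform side: request provisioning, then poll last_operation."""
    status, _ = broker.provision(instance_id, accepts_incomplete=True)
    if status != 202:              # 201 Created means it finished synchronously
        return "succeeded" if status == 201 else "failed"
    for _ in range(max_polls):
        _, body = broker.last_operation(instance_id)
        if body["state"] in ("succeeded", "failed"):
            return body["state"]
        time.sleep(interval)       # wait between polls
    return "timed out"


print(provision_and_wait(FakeBroker(), "db-1"))   # succeeded
```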

The service broker first registers with K8s.
The K8s platform acts as a client of the service broker. Its first request is typically to get the service catalog. Then the K8s platform asks the broker to create a new service instance.

On-demand: a service instance is provisioned when requested.

Multi-tenant: service instances are pre-provisioned.

The service instance is bound to a K8s application/pod using a Service Binding. Typically, a Service Binding also provides the credentials and connectivity information (IP and port) that the application needs to access the service. K8s manages these credentials as Secrets.
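Values under `data` in a Kubernetes Secret are base64-encoded. The sketch below decodes a Secret such as a Service Binding might produce; the Secret name and its keys (`host`, `port`, `username`, `password`) are hypothetical, since each broker decides what credentials it returns.

```python
import base64

# Hypothetical Secret created for a Service Binding; values under "data"
# are base64-encoded, as in any Kubernetes Secret.
secret = {
    "apiVersion": "v1",
    "kind": "Secret",
    "metadata": {"name": "mysql-binding"},
    "data": {
        "host": base64.b64encode(b"10.0.0.12").decode(),
        "port": base64.b64encode(b"3306").decode(),
        "username": base64.b64encode(b"app").decode(),
        "password": base64.b64encode(b"s3cret").decode(),
    },
}


def decode_secret(secret):
    """Return the Secret's data as plain strings."""
    return {key: base64.b64decode(value).decode("utf-8")
            for key, value in secret["data"].items()}


creds = decode_secret(secret)
print(f"connect to {creds['host']}:{creds['port']} as {creds['username']}")
```

Inside a pod, the same values are usually exposed already decoded, as environment variables or as files in a mounted volume.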

Here, the service catalog is a K8s resource with a corresponding (1) Service Catalog Controller and (2) Service Catalog API Gateway. The end user interacts with the Service Catalog API gateway. The gateway asks the service broker to list all services (service offering/class + service plan). K8s then updates the master service catalog (removing duplicates and adding, deleting, or modifying entries) and responds to the application/platform client.
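The update step described above (remove duplicates, add, delete, modify) can be sketched as a diff between the stored catalog and the broker's latest listing. This is only an illustration of the idea, not the Service Catalog controller's actual logic; the `reconcile` helper and the data are hypothetical.

```python
def reconcile(master, broker_services):
    """Update the stored catalog from the broker's latest listing.

    master: dict mapping service-class name -> list of plan names.
    broker_services: list of {"name": ..., "plans": [...]} entries,
    possibly containing duplicates.
    """
    latest = {}
    for service in broker_services:
        latest[service["name"]] = service["plans"]   # duplicates collapse here
    changes = {
        "added": sorted(latest.keys() - master.keys()),
        "removed": sorted(master.keys() - latest.keys()),
        "modified": sorted(name for name in latest.keys() & master.keys()
                           if latest[name] != master[name]),
    }
    return latest, changes


master = {"mysql-db": ["small"], "redis": ["cache"]}
from_broker = [
    {"name": "mysql-db", "plans": ["small", "large"]},
    {"name": "kafka", "plans": ["standard"]},
    {"name": "kafka", "plans": ["standard"]},          # duplicate entry
]
catalog, changes = reconcile(master, from_broker)
print(changes)   # {'added': ['kafka'], 'removed': ['redis'], 'modified': ['mysql-db']}
```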

Comparison


Open Service Broker API      Kubernetes servicecatalog.k8s.io Kinds
Platform                     Kubernetes
Service Broker               ClusterServiceBroker
Service                      ClusterServiceClass
Plan                         ClusterServicePlan
Service Instance             ServiceInstance
Service Binding              ServiceBinding


Docker Volume


Create Volume

There are multiple options:

1. With the -v flag.

docker run -v /data <image>

Here the /data folder can be accessed from inside the running container. On the host, it is mapped to an anonymous volume directory such as

/var/lib/docker/volumes/8e0b3a9d5c544b63a0bbaa788a250e6f4592d81c089f9221108379fd7e5ed017/_data

2. We can also specify the host path (a bind mount):
docker run -v /home/usr/docker_data:/data <image>

3. Using a Dockerfile
One can add:

VOLUME /data

Notes:
3.1. Here we cannot specify a path on the host.
3.2. After declaring a VOLUME, later build steps cannot add files to it:

RUN useradd foo
VOLUME /data
RUN touch /data/x
RUN chown -R foo:foo /data

This will not work: changes made to /data after the VOLUME instruction are discarded.

3.3. Instead, make these changes in the Dockerfile before declaring the VOLUME:

RUN useradd foo
RUN mkdir /data
RUN touch /data/x
RUN chown -R foo:foo /data
VOLUME /data

This will work.

4. Create a named volume and attach it

docker volume create my-vol
docker run -v my-vol:/data <image>

It is similar to the earlier approach, except that the volume gets a name.

Earlier, instead of creating a named volume, a data-only container was used.

5. We can mount the volumes of another container, even if that container is not running.

docker run --volumes-from <container-name> <image>

The -v option defines a volume for the new container, while the --volumes-from option reuses the volumes already defined by another container.

Delete Volume

* The docker rm command does not delete the container's volumes. Orphaned volumes remain.
* docker rm -v removes the container together with its anonymous volumes, provided no other container refers to them.
* Volumes linked to user-specified host directories are never deleted by Docker.
* To have a look at the volumes on your system, use docker volume ls:
docker volume ls
* To delete all volumes:
docker volume rm $(docker volume ls -q)

Cloud Native


Cloud Native
- Promotes open source
- MicroService Architecture
- Containers and container orchestration tools (Read: Docker and K8s)
- Agility
- Automation

Cloud Computing
- On-demand computing over the Internet
- Minimal management effort
- Cost effective due to economies of scale.

Serverless
- Cloud-computing execution model
- The cloud provider dynamically manages resources
- Pricing is based on resources consumed
- Applications run in managed, ephemeral containers on a “Functions as a Service” (FaaS) platform
- Reduced operational cost, complexity, and engineering lead time

MicroService
- Software development technique
- A variant of SOA
- Loosely coupled services
- Fine-grained services
- lightweight protocols
- modular design

Service Oriented Architecture
- Service reusability
- Easy maintenance
- Greater reliability
- Platform independence

MicroService Benefits
- Agility
- Fine-grained Scaling
- Technology Independence
- Parallel Development
- Better Fault Isolation
- Easier to Refactor
- Easy to Understand
- Faster Developer Onboarding

MicroService Architecture Challenges
- Operational Complexity
- Performance Hit Due to Network Latency
- Increased Configuration Management
- Unsafe Communication Medium
- Harder to Troubleshoot
- Architectural Complexity
- Higher Costs
- Duplication of Developer Effort

Cross-cutting concerns
- Externalized configuration
- Logging
- Health checks
- Distributed tracing
- Boilerplate code for integrations with message broker, etc.

API Design
REST over HTTP using JSON is a common choice.

* One can have a shared database among all services OR a database per service.

Fault Tolerance 
The Circuit Breaker pattern can prevent an application from repeatedly trying to execute an operation that is likely to fail. It gives failing services time to recover and limits cascading failures across multiple systems.
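A minimal circuit breaker can be sketched with the three usual states: CLOSED (calls pass through), OPEN (calls fail fast), and HALF_OPEN (one trial call after a cool-down). The thresholds, the injectable `clock`, and the class names are illustrative assumptions; production code would normally use a maintained library or the service mesh.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """CLOSED -> OPEN after repeated failures, OPEN -> HALF_OPEN after a
    cool-down, HALF_OPEN -> CLOSED on success (or back to OPEN on failure)."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, func, *args, **kwargs):
        if self.state == "OPEN":
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("failing fast; dependency is unhealthy")
            self.state = "HALF_OPEN"          # let one trial request through
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
                self.state = "OPEN"
                self.opened_at = self.clock()
            raise
        self.state = "CLOSED"                 # success resets the breaker
        self.failures = 0
        return result


now = [0.0]                                   # fake clock for the demo
breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10, clock=lambda: now[0])


def flaky_payment_call():
    raise ConnectionError("payment service down")


for _ in range(2):
    try:
        breaker.call(flaky_payment_call)
    except ConnectionError:
        pass
print(breaker.state)                          # OPEN: further calls fail fast
```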

Log Aggregation
ELK
1. Logstash
2. Elasticsearch
3. Kibana

Distributed Tracing
A correlation ID helps trace the flow of events across services.
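The idea is that the first service generates an ID, every service includes it in its log lines, and each one forwards it on outgoing calls. A minimal sketch with two hypothetical in-process "services"; the header name `X-Correlation-ID` is a common convention assumed here, not a standard.

```python
import uuid

# Header name is a common convention (an assumption here), not a standard.
CORRELATION_HEADER = "X-Correlation-ID"


def correlation_id(headers):
    """Reuse the caller's ID, or start a new one at the edge of the system."""
    return headers.get(CORRELATION_HEADER) or str(uuid.uuid4())


def log(service, cid, message):
    # Every log line carries the ID, so a log aggregator can join them.
    line = f"[{cid}] {service}: {message}"
    print(line)
    return line


def payment_service(headers):
    cid = correlation_id(headers)
    return log("payment", cid, "card charged")


def order_service(headers):
    cid = correlation_id(headers)
    log("order", cid, "order received")
    # Propagate the same ID on the downstream call.
    return payment_service({CORRELATION_HEADER: cid})


order_service({CORRELATION_HEADER: "req-42"})
# [req-42] order: order received
# [req-42] payment: card charged
```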

Securing Micro Services
1. JSON Web Token (JWT) 
2. OAuth2. About resource access and sharing. 
3. OpenID Connect. About user authentication. It is built on top of OAuth2.
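A JWT is three base64url-encoded parts, `header.payload.signature`. The sketch below signs and verifies an HS256 token with only the standard library, to show the structure. It is deliberately minimal: it does not check the `alg` header, expiry (`exp`), or audience, so in production a vetted library (e.g. PyJWT) is the safer choice. The key and claims are hypothetical.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    # JWT uses base64url encoding without "=" padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign(payload: dict, key: bytes) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = (b64url(json.dumps(header).encode()) + "."
                     + b64url(json.dumps(payload).encode()))
    signature = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url(signature)


def verify(token: str, key: bytes) -> dict:
    signing_input, _, signature = token.rpartition(".")
    expected = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(signature)):
        raise ValueError("invalid signature")
    return json.loads(b64url_decode(signing_input.split(".")[1]))


token = sign({"sub": "order-service"}, b"shared-secret")
print(verify(token, b"shared-secret"))   # {'sub': 'order-service'}
```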

K8S Deployment
https://container-solutions.com/kubernetes-deployment-strategies/
https://github.com/ContainerSolutions/k8s-deployment-strategies/

Bengaluru Tech Summit : Day 3 - Part 2


The next panel discussion, by Japan, was on "Society 5.0: Moving towards smart society". Investment in infrastructure projects needs patience, while US investors are impatient. So Japan, with its excess capital, is a better choice. In Bangalore, all the infrastructure projects, such as the metro and sewage projects, are funded by Japan. The human species evolved from hunting to agriculture to the industrial revolution. Japan made a huge contribution during the industrial revolution with its unique quality processes and tools. Even today they are applicable in the software/IT industry as DevOps practices. India's strength is neither manufacturing nor hardware. India's strength is knowledge: its skilled, English-speaking manpower in today's knowledge-driven industry. India also has massive data for machine learning, from around 1.2 billion UIDs. China has even more data, but it is under full government control, so it is of no use. The "Bengaluru Tokyo Technology Initiative" is worth exploring. Bangalore is a hub for technology, efficiency, and entrepreneurship. Japan welcomes Indians.

Dolphin Tank is an interesting initiative in Bangalore. It proposes to mentor and guide start-ups through the next stage of their journey for 6 months. Just as the dolphin is a friend in the ocean, Dolphin Tank guides the founder through the rough sea until the founder becomes independent. India had a cost arbitrage earlier; now India has a skill arbitrage. Large organisations depend on India for innovation. Now such organisations do not increase head count at their Indian centres. Instead, they effectively utilize India's startup ecosystem.

Next was the panel discussion, by Germany, on "Decoding industry 4.0 and digitization: from vision to reality". Today, if one cannot learn, unlearn, and relearn, then he/she is illiterate. There will be 200 billion devices by the year 2025. There are four pillars of IoT:
1. Hardware
2. Extremely complex algorithms
3. Software = algorithms + data. Software takes action based on them.
4. Cyber security
IoT projects require a risk-taking attitude toward testing and trials. Germany has more of a corporate culture. An average German is skilled in analog technology, and slow but perfect in engineering, while India has a startup culture, a risk-taking attitude, software skills, etc. So the two complement each other.

The Industry 4.0 revolution is happening right now; there is no time for 8 years of fundamental research. Capable heavy industry needs support from SMEs to build smart factories. Today, with AR and VR, one can get the feel of being in the middle of dangerous machinery in the aviation and heavy-industry sectors. Different regions of the world have different lead times to manufacture ICs in small quantities: China/Taiwan never accept orders in small quantities, and Germany quotes high prices with long delivery times. The IoT and Industry 4.0 revolution is happening now, right here in the college curriculum. During the Q&A, three challenges for startups were discussed:
1. Skillset
2. Finance 
3. Scalability
"Who is better? A German leader or Modi?" "Nelson Mandela." 

It is possible to write a book or biography with a "ghost writer" built on NLP-based software.

Next was the panel discussion, by KSRSAC, on "Geospatial innovations in the times of disruptive changes". The Karnataka Geographic Information System is worth exploring. Geospatial analysis and deep-learning-based GeoAI are also interesting fields. Bharat Lohani explained how his company Geokno uses LiDAR technology to capture data. They cover 300 square km per day. They have developed algorithms that can detect terrain even in dense forest. They can detect wild creatures in the forest. They can detect water channels, which is very useful for dividing water between two states and for better irrigation. They can create 3D maps. Shadow analysis throughout a day is very useful for setting up solar energy plants. There was one more interesting talk, by Laxmi Prasad Putta from Vassar Labs.

Next was the panel discussion, by FICCI, on "Emerging Technologies areas in the Indic language-technology Industry". Vivekanand Pani talked about his company Reverie Language Technologies' product "Gopal", a virtual assistant that can speak many Indian languages. Thirumalai Anandanpillai mentioned that Microsoft's Azure cloud exposes a web service API for speech-to-text in Indian languages. Vinay Chhajlani's WebDunia.com, a company that survived the dot-com boom, turns 19 years old in September 2018. In 2014, DNS was supporting 15 different languages. In 2019, Kannada will also be added to DNS; such domain names in local scripts are called IDNs (Internationalized Domain Names). Worldwide, 20% of people know English, and in India 12% of people know English. So IDN is needed for non-English domain names. On 1st May 2018, a legal framework for Indic languages was established. Since 2014, Gmail has supported IDN-based e-mail addresses. 4 million users use Hindi email addresses provided by the Rajasthan Government. Today, most Internet data traffic is entertainment content. However, this will change as the government offers more online services in regional languages. The emerging opportunity is driving the emergence of technology. In 2010, VCs thought that Indian-language-speaking customers had no money, so they were reluctant to invest in language-based startups. 22 years ago, PC penetration was only 1%, so there was no need for Hindi. In 2010, mobile penetration was 50%, so Hindi was needed, and in 2011 the first mobile phone with Hindi support was launched. Today many people are first-time Internet users on mobile. These first-time users may behave the way people behaved in Yahoo chat rooms in 2002.

During the Q&A, I asked about Sanskrit, Sage Panini's Sanskrit grammar, and its relevance to NLP. Everyone nominated Thirumalai Anandanpillai from Microsoft to answer the question, maybe because he had a nice TILAK on his forehead. He replied that today translation is done by neural networks, not by rule-based systems. Panini's grammar is relevant and useful for rule-based services.

So overall, "Bengaluru Tech Summit" was an event worth attending, with many thought-provoking ideas, updates on recent trends and startups, and insight into upcoming futures. There was also an exhibition with stalls from both established companies and startups.

Service Meshes + Kubernetes: Solving Service-Service Issues


On Saturday, 23rd March, the Kubernetes Day India 2019 event is happening in Bangalore. Mr. Ben Hall has arrived here in Bangalore as a speaker for this event. He is the founder of https://katacoda.com/. The "DigitalOcean Bangalore" Meetup group grabbed this opportunity and organized the meetup "Service Meshes + Kubernetes: Solving Service-Service Issues" this evening, where Mr. Ben Hall shared his knowledge. Here are my notes, exclusively for readers of this blog "Express YourSelf !"

DigitalOcean announced another event, do.co/tide2019, on April 4th and encouraged everyone to participate. They also promoted their new managed Kubernetes offering on DigitalOcean.

Kubernetes is a great tool. However, we need a service mesh for security, scaling, and communication among pods. It is about TLS, certificate rotation, auth/security, rate limiting, retry logic, failover, etc. It provides more control for A/B testing and canary releases, for collecting system metrics, and for verifying whether a caller pod is trusted. A service mesh can be implemented using:

  • ASPEN MESH
  • linkerd
  • HashiCorp Consul
  • istio
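Rate limiting is one of the concerns a mesh's sidecar proxy takes off the application. To give a rough idea of what such a limiter does, here is a token-bucket sketch. The token-bucket algorithm, the parameters, and the injectable `clock` are illustrative choices; this is not how any of the meshes listed above is implemented.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to the elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False                  # caller should reject, e.g. HTTP 429


now = [0.0]                           # fake clock for the demo
bucket = TokenBucket(rate=1, capacity=2, clock=lambda: now[0])
print([bucket.allow() for _ in range(3)])   # [True, True, False]
now[0] = 1.0                                # one second later: one new token
print(bucket.allow())                       # True
```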

Istio provides four key capabilities:
1. Connect: service discovery, load balancing
2. Secure: e.g. protect the system against a fake payment service; encryption, authentication, authorization
3. Control
4. Observe
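Traffic control for A/B testing and canary releases boils down to splitting requests between service versions by weight. Istio expresses such splits declaratively in its routing configuration; the Python sketch below only illustrates the idea, with a hypothetical 90/10 split between two made-up versions and a seeded random generator so runs are repeatable. It is not Istio's API.

```python
import random
from collections import Counter

# Hypothetical split: 90% of requests to v1, 10% to the canary v2.
ROUTES = [("reviews-v1", 90), ("reviews-v2", 10)]


def pick_destination(rng):
    names = [name for name, _ in ROUTES]
    weights = [weight for _, weight in ROUTES]
    return rng.choices(names, weights=weights, k=1)[0]


rng = random.Random(42)
counts = Counter(pick_destination(rng) for _ in range(10_000))
print(counts)   # roughly 9000 for reviews-v1 and 1000 for reviews-v2
```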

Istio extends the Kubernetes API with additional capabilities, which are configured through YAML files. Grafana and Prometheus are installed as part of the Istio installation.


Istio is all about configuring the Envoy proxy. Istio uses three major components: 1. Pilot, 2. Mixer, and 3. Citadel.

He demonstrated Istio on his own website/learning platform, Katacoda, using curl to generate ingress traffic. He also mentioned and demonstrated the "Scope" tool, which provides monitoring, visualization, and management for Docker and K8s. It is not part of Istio.

We had an interesting Q&A session.

* Yes, Istio adds a little latency when we use HTTP/1.0-based RESTful APIs. However, we get a performance gain when we use gRPC or HTTP/2-based RESTful APIs. It is a tradeoff between performance in the production environment for given hardware and gains in developer productivity.

* Prometheus is used to store all the metrics of the cluster.

* How do we configure different timeout values for 2 different consumers of the same service? Well, we can duplicate the service under different names, or modify some application-level logic.

* For pod communication, one can use either an RPC-based approach (gRPC, RESTful API) or an enterprise-service-bus-based approach (message queue, Kafka). If one takes the second approach, Istio may or may not provide additional value. It depends.

* There was a quick comparison between SDN-based routing and the Istio-based approach.

* During informal Q&A, we discussed Gloo. Gloo is a feature-rich, Kubernetes-native ingress controller and next-generation API gateway, powered by Envoy.

At the end, we had tea and light snacks: cookies and samosas.

Reference 

https://www.meetup.com/DigitalOceanBangalore/events/259864782/
https://www.slideshare.net/BenHalluk/presentations
https://medium.com/google-cloud/understanding-kubernetes-networking-ingress-1bc341c84078
https://blog.aquasec.com/istio-service-mesh-traffic-control
https://gloo.solo.io/
https://github.com/solo-io/gloo

https://layers7.blogspot.com/2019/03/kafka-communication-among-micro-services.html
https://layers7.blogspot.com/2017/12/istio.html

Disclaimer

This blog post is NOT a verbatim record of the speech. I captured these notes as per my understanding. They may not necessarily reflect the speaker's intention. So corrections/suggestions/comments are welcome.

MicroService Structure


1. API

* Operation
- Command to modify data
- Query to retrieve data

* Types
- Synchronous
- Asynchronous 

* Protocol
- gRPC
- RESTful

Here the microservice acts as a server. A client invokes the service using the API.

2. API client

Here the microservice acts as a client. It invokes another microservice using that microservice's API.

3. Event Publisher

4. Event Consumer

An event is typically a DDD (Domain-Driven Design) domain event.

5. Business logic

6. Private Database
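To see how the six parts fit together, here is a hypothetical order service in plain Python. The class, method, and event names are made up; a dict stands in for the private database, a callable stands in for the API client of a payment service, and a list of callbacks stands in for the message broker.

```python
from dataclasses import dataclass


@dataclass
class OrderPlaced:
    """A domain event (DDD style) published by this service."""
    order_id: int
    amount: float


class OrderService:
    """Hypothetical service showing the six parts listed above."""

    def __init__(self, payment_client, subscribers):
        self._db = {}                           # 6. private database
        self._next_id = 1
        self._payment_client = payment_client   # 2. API client (calls another service)
        self._subscribers = subscribers         # 3. event publisher's targets

    def place_order(self, amount):              # 1. API: command, modifies data
        if amount <= 0:                         # 5. business logic
            raise ValueError("amount must be positive")
        order_id = self._next_id
        self._next_id += 1
        self._payment_client(order_id, amount)
        self._db[order_id] = {"amount": amount, "status": "PLACED"}
        self._publish(OrderPlaced(order_id, amount))
        return order_id

    def get_order(self, order_id):              # 1. API: query, retrieves data
        return dict(self._db[order_id])

    def on_payment_refunded(self, order_id):    # 4. event consumer
        self._db[order_id]["status"] = "REFUNDED"

    def _publish(self, event):
        for handler in self._subscribers:
            handler(event)


payments, events = [], []
service = OrderService(lambda oid, amount: payments.append((oid, amount)),
                       [events.append])
order_id = service.place_order(25.0)
service.on_payment_refunded(order_id)
print(service.get_order(order_id))   # {'amount': 25.0, 'status': 'REFUNDED'}
```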

MicroServices : common characteristics



- Automated Deployment

- Componentization via Services : 
-- A component is a unit of software that is independently replaceable and upgradeable.
-- A service may consist of multiple processes that will always be developed and deployed together.
-- Services are independently deployable.
-- Microservices have their own domain logic

- Organized around Business Capabilities
-- Not split into the classic 3 technology tiers (UI, server-side logic, database)

- Products not Projects

- Smart endpoints and dumb pipes
-- Microservices aim to be as decoupled and as cohesive as possible.
-- Microservices receive a request, apply logic as appropriate, and produce a response.
-- No Enterprise Service Bus (ESB) with sophisticated facilities for message routing, choreography, transformation, and applying business rules.

- Decentralized Governance
-- Different components can use different languages
-- Patterns: Tolerant Reader and Consumer-Driven Contracts

- Decentralized Data Management
-- Domain-Driven Design DDD divides a complex domain up into multiple bounded contexts and maps out the relationships between them.
-- Polyglot Persistence : Each service owns its database
-- Results in simpler upgrade of application. 
1. User sessions : Redis
2. Financial Data and Reporting : RDBMS
3. Shopping Cart : Riak
4. Recommendation : Neo4j
5. Product Catalog : MongoDB
6. Analytics and User activity logs : Cassandra

- Infrastructure Automation
-- CI/CD Pipeline

- Design for failure
-- The application should be able to tolerate the failure of services
-- Real time monitoring (of circuit breaker status, current throughput and latency) and auto restore of services.


- Evolutionary Design
-- How to divide a monolith:
1. The key property of a component: independent replaceability and upgradeability
2. Drive modularity through the pattern of change: keep frequently changed code in one service and stable, rarely changed code in another. Modules that are often changed together should be merged into a single service.
-- Use versioning of services only as a last resort
-- Example: The Guardian website
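One way to find "modules that are often changed together" is to mine version-control history: count how often each pair of modules appears in the same commit. The commit data below is hypothetical; in practice it could be extracted from `git log --name-only`. High pair counts suggest candidates for living in the same service.

```python
from collections import Counter
from itertools import combinations

# Hypothetical history: the set of modules touched by each commit.
commits = [
    {"billing", "invoice"},
    {"billing", "invoice", "ui"},
    {"catalog"},
    {"billing", "invoice"},
    {"ui", "catalog"},
]


def co_change_counts(commits):
    """Count how often each pair of modules changed in the same commit."""
    pairs = Counter()
    for modules in commits:
        for pair in combinations(sorted(modules), 2):
            pairs[pair] += 1
    return pairs


print(co_change_counts(commits).most_common(1))   # [(('billing', 'invoice'), 3)]
```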

Reference
https://martinfowler.com/articles/microservices.html
https://martinfowler.com/bliki/PolyglotPersistence.html

Python Lambda


Introduction


  • Lambda is a special type of Python function that has limited capabilities.
  • It suits a function that will be used only once in your program. Such functions are called anonymous functions, because they have no name.


Syntax
lambda arguments: expression


  • Lambda functions can accept zero or more arguments but only one expression.
  • The return statement is implicit: the value of the expression is returned.
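A few small examples of the `lambda arguments: expression` syntax, including zero arguments, a default argument, and a lambda defined and called inline. The names are illustrative only.

```python
square = lambda x: x * x          # the expression's value is returned implicitly
add = lambda a, b=1: a + b        # default arguments work too
answer = lambda: 42               # zero arguments

print(square(5), add(2), add(2, 3), answer())   # 25 3 5 42
print((lambda s: s.upper())("hi"))              # HI: defined and called inline
```

Binding a lambda to a name, as done here for demonstration, is discouraged by PEP 8; a def is clearer when the function needs a name.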

What kind of things can I, and can I not, put into a lambda? 

Don't or Limitations

  • If it doesn’t return a value, it isn’t an expression and can’t be put into a lambda.
  • You can only use expressions, not statements.
  • A lambda function with multiple lines/expressions/statements is not possible.
  • You cannot declare and use local variables.
Do
  • If you can imagine it in an assignment statement, on the right-hand side of the equals sign, it is an expression and can be put into a lambda.
  • A lambda can be a member of a list or dictionary.
  • You can only use one expression.
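The sketch below shows lambdas stored in a dictionary (a small dispatch table) and in a list, a conditional expression inside a lambda, and a statement being rejected at compile time. The operation names are illustrative.

```python
# A lambda can be stored in a dictionary, e.g. as a small dispatch table.
ops = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "max": lambda a, b: a if a > b else b,   # a conditional expression is fine
}
print(ops["+"](2, 3), ops["-"](2, 3), ops["max"](2, 3))   # 5 -1 3

# ... and in a list.
steps = [lambda x: x + 1, lambda x: x * 2]
value = 3
for step in steps:
    value = step(value)
print(value)   # 8

# A statement such as return is rejected at compile time.
try:
    compile("lambda x: return x", "<demo>", "eval")
except SyntaxError:
    print("SyntaxError: a lambda body must be an expression")
```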

Use cases

1. map (built-in) and reduce (functools.reduce in Python 3)
2. filter Python built-in function
3. sort (list method) and sorted (built-in function)
4. Print formatting
5. Use in a while statement.
6. If a for loop only invokes a function on each item, it can be replaced with map.
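The most common of these use cases in a few lines. The data is arbitrary; note that in Python 3, `map` and `filter` return lazy iterators (hence `list(...)`) and `reduce` has to be imported from `functools`.

```python
from functools import reduce   # reduce moved to functools in Python 3

nums = [5, 2, 8, 1]
print(list(map(lambda x: x * 10, nums)))          # [50, 20, 80, 10]
print(list(filter(lambda x: x % 2 == 0, nums)))   # [2, 8]
print(reduce(lambda acc, x: acc + x, nums))       # 16
print(sorted(["bb", "a", "ccc"], key=lambda s: len(s)))   # ['a', 'bb', 'ccc']

pairs = [("x", 3), ("y", 1)]
pairs.sort(key=lambda p: p[1])                    # in-place sort by 2nd item
print(pairs)                                      # [('y', 1), ('x', 3)]
```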


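Multiple expressions

Use tuples, and implement your own evaluation order.
Python's language reference guarantees left-to-right evaluation, but if you still worry that evaluation order might change in a future release of Python,

you can use eval:
list(map(eval, ("f1()", "f2()", ..., "fn()")))

In Python 2 you could also use apply. apply() was removed in Python 3, so call each function directly:
list(map(lambda f: f(), (f1, f2, f3)))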
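Because tuple elements are evaluated left to right, a tuple can sequence several calls inside one lambda, and an index picks the value to return. A sketch with a hypothetical `step` function that records its calls:

```python
log = []


def step(name):
    log.append(name)
    return name.upper()


# All three calls run in order; [-1] returns only the last value.
run_all = lambda: (step("load"), step("check"), step("save"))[-1]

print(run_all())   # SAVE
print(log)         # ['load', 'check', 'save']
```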

General Notes
  • One can pass functions as arguments to other functions
  • A function can be the return value of another function.
  • The expression returns (or evaluates to) a value, whereas a statement does not. 
Reference 

https://pythonconquerstheuniverse.wordpress.com/2011/08/29/lambda_tutorial/
http://p-nand-q.com/python/lambda.html
https://www.bogotobogo.com/python/python_functions_lambda.php

Python itertools module


Introduction

itertools is a collection of “fast, memory efficient tools”. If you have not used the "itertools" module, then most likely you have written code that was unnecessary. It is named itertools because it consists of functions built from iterator building blocks. The functions produce iterators, and the official documentation shows roughly equivalent Python code for most of them, usually a generator using the yield keyword. Remember? Every generator is an iterator, but not every iterator is a generator.


Split

islice
islice slices an iterable like [start:stop]; optionally, we can pass a step too. If we pass only one argument, it is the stop value, and start = 0 by default. If you want to specify start, then you must also specify stop. To indicate stop = 'till the end', use None as the stop value.
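The four call forms of islice described above, on an arbitrary string:

```python
from itertools import islice

letters = "abcdefgh"
print(list(islice(letters, 3)))           # stop only:   ['a', 'b', 'c']
print(list(islice(letters, 2, 5)))        # start, stop: ['c', 'd', 'e']
print(list(islice(letters, 2, None)))     # till the end: ['c', 'd', 'e', 'f', 'g', 'h']
print(list(islice(letters, 0, None, 3)))  # with a step: ['a', 'd', 'g']
```

Unlike [start:stop], islice works on any iterator (including infinite ones), but it does not accept negative values.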

Combinatoric

accumulate
Here the default function is addition (operator.add): a running sum for numbers and running concatenation for strings. One can pass a custom function also, e.g. operator.mul.
Example: keep the last two items and add the new item, giving a sliding window of three characters. So each item appears in up to 3 results.

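import itertools

def f(a, b):
    print(a, b)          # accumulated value so far, next item
    return a[-2:] + b    # keep the last two items, append the new one

def fn_accumulate():
    print(list(itertools.accumulate('abcdefg', f)))

fn_accumulate()   # last line printed: ['a', 'ab', 'abc', 'bcd', 'cde', 'def', 'efg']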

This can be written in a better way, using a second-order recurrence relation, as mentioned in https://realpython.com/python-itertools/

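permutations = nPr (ordered arrangements)
combinations = nCr
combinations_with_replacement = nCr, but an item may be picked more than once (e.g. paired with itself)

For permutations, r is optional (it defaults to the length of the input); combinations and combinations_with_replacement require r.
product generates the Cartesian product. It can be used to generate binary numbers:
list(map(list, itertools.product("01", repeat=5)))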
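A small check that the combinatoric iterators produce nPr and nCr items, plus product generating 3-bit binary strings. `math.perm` and `math.comb` need Python 3.8+; the input "ABC" is arbitrary.

```python
import math
from itertools import combinations, combinations_with_replacement, permutations, product

items = "ABC"
perms = list(permutations(items, 2))                       # ordered pairs
combs = list(combinations(items, 2))                       # unordered pairs
combs_rep = list(combinations_with_replacement(items, 2))  # also ('A', 'A'), ...

# The counts match the formulas.
print(len(perms), math.perm(3, 2))              # 6 6
print(len(combs), math.comb(3, 2))              # 3 3
print(len(combs_rep), math.comb(3 + 2 - 1, 2))  # 6 6
print(len(list(permutations(items))))           # r defaults to len(items): 6

print(["".join(bits) for bits in product("01", repeat=3)])
# ['000', '001', '010', '011', '100', '101', '110', '111']
```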

Merge

chain
chain merges two (or more) iterables lazily, without building a new combined list. It can be used to add anything as a prefix or as a suffix to a list.

chain.from_iterable
Here we first put all the iterables into one iterable (e.g. a list) and pass that single argument.

zip_longest
Similar to zip, but it adds None for unpaired items. We can also pass fillvalue to replace None with something else.

We can use zip with count to add an index number to every item:

from itertools import count

for i in zip(count(1), ['a', 'b', 'c']):
    print(i)   # (1, 'a'), then (2, 'b'), then (3, 'c')
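The merge functions side by side, on two arbitrary lists of unequal length:

```python
from itertools import chain, zip_longest

a, b = [1, 2], [3, 4, 5]
print(list(chain(a, b)))                     # [1, 2, 3, 4, 5]
print(list(chain(["header"], a)))            # prefix: ['header', 1, 2]
print(list(chain(a, ["footer"])))            # suffix: [1, 2, 'footer']
print(list(chain.from_iterable([a, b])))     # one argument holding the iterables
print(list(zip_longest(a, b)))               # [(1, 3), (2, 4), (None, 5)]
print(list(zip_longest(a, b, fillvalue=0)))  # [(1, 3), (2, 4), (0, 5)]
```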

Infinite iterators

count is an infinite iterator.
To stop it, (1) use an if condition OR (2) use islice.
Optionally, a step can be passed to count. This step can be a fractions.Fraction too.

cycle is also an infinite iterator. It cycles through a series of values infinitely.
To stop it, use an if condition (or islice).

repeat takes an object as input, instead of an iterable.
To stop it, pass the optional argument 'times'.
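The three infinite iterators, each stopped in one of the ways described above. The values are arbitrary.

```python
from fractions import Fraction
from itertools import count, cycle, islice, repeat

# count, stopped with islice; the step here is a Fraction.
halves = list(islice(count(10, Fraction(1, 2)), 4))
print(halves)   # the values 10, 10 1/2, 11, 11 1/2

# count, stopped with an if condition.
for n in count(1):
    if n > 3:
        break
    print(n)    # 1, then 2, then 3

print(list(islice(cycle("RGB"), 7)))   # ['R', 'G', 'B', 'R', 'G', 'B', 'R']
print(list(repeat("x", 3)))            # times=3: ['x', 'x', 'x']
```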

Filter

Most functions in the filter category take a predicate function as an argument; compress takes a list of selectors instead.

compress 
Here the second argument is a list of Booleans (selectors). Accordingly, each item of the first argument is kept (present) or dropped (absent). It is like masking with the boolean & (AND) operator. The Boolean list can be the output of cycle, to create a pattern. Note: the values 0, False, None, and '' in the list are considered False.
For example:

import itertools

def fn_compress():
    mask = [1, 0, 0]
    res = itertools.compress(range(20), itertools.cycle(mask))
    for i in res:
        print(i)   # 0, 3, 6, 9, 12, 15, 18

fn_compress()


dropwhile
Instead of a list of Booleans, we pass a predicate as the first argument. While the predicate returns true, items are dropped. Once it returns false, that item and every later item are kept, and the predicate is no longer checked.

takewhile
It is the opposite of dropwhile. While the predicate returns true, items are taken. Once it returns false, that item is not taken and iteration stops.

filterfalse 
Similar to dropwhile, but the predicate is never disabled.
It returns all values for which the predicate function returns false. It takes a predicate function and an iterable as input. It is the opposite of the built-in function filter.
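The difference between the three predicate-based functions (and the built-in filter) shows up on a list where the predicate flips from true to false and back. The data and predicate are arbitrary.

```python
from itertools import dropwhile, filterfalse, takewhile

data = [1, 3, 5, 6, 7, 8]
is_odd = lambda x: x % 2 == 1

print(list(dropwhile(is_odd, data)))    # [6, 7, 8]: 1, 3, 5 dropped; 7 is kept
print(list(takewhile(is_odd, data)))    # [1, 3, 5]: stops at 6
print(list(filterfalse(is_odd, data)))  # [6, 8]: every item where is_odd is False
print(list(filter(is_odd, data)))       # [1, 3, 5, 7]: built-in filter, the opposite
```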

We can use the built-in function filter to list all the factors of a given integer, and also to find out whether it is prime.
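A straightforward (O(n) per number) sketch of that idea; the helper names are illustrative.

```python
def factors(n):
    """All positive divisors of n, found with the built-in filter."""
    return list(filter(lambda d: n % d == 0, range(1, n + 1)))


def is_prime(n):
    # A prime has exactly two divisors: 1 and itself.
    return len(factors(n)) == 2


print(factors(12))                                # [1, 2, 3, 4, 6, 12]
print([n for n in range(2, 20) if is_prime(n)])   # [2, 3, 5, 7, 11, 13, 17, 19]
```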

Grouping

groupby
If we sort tuples, they are sorted by their first element. If we then call groupby on the sorted tuples with key=operator.itemgetter(0), it returns an iterator of (key, group) pairs. Here key is the first member of the tuple, which repeats across several tuples, and group is an iterator over all the tuples that have that particular key. (Without a key function, groupby compares whole items.)

This can be used to remove consecutive duplicate characters in a string:
>>> foo = "SSYYNNOOPPSSIISS"
>>> import itertools
>>> ''.join(ch for ch, _ in itertools.groupby(foo))
'SYNOPSIS'

Note: 
groupby does not sort. If we call it without calling sorted first, it groups only consecutive items (locally), as in the example above. To group all items with the same key, call sorted first, passing the same key.

By default, sorted compares tuples by their first value (then the next value, to break ties). This can be changed by passing
1. key = lambda x: x[1]
2. key = operator.itemgetter(1)
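Grouping with and without sorting first, and sorting on the second element instead. The data is arbitrary.

```python
from itertools import groupby
from operator import itemgetter

items = [("fruit", "apple"), ("veg", "leek"), ("fruit", "pear"), ("veg", "kale")]

# Without sorting, only *consecutive* equal keys are grouped.
print([key for key, _ in groupby(items, key=itemgetter(0))])
# ['fruit', 'veg', 'fruit', 'veg']

# Sort by the same key first, then group.
grouped = {key: [name for _, name in group]
           for key, group in groupby(sorted(items, key=itemgetter(0)),
                                     key=itemgetter(0))}
print(grouped)   # {'fruit': ['apple', 'pear'], 'veg': ['leek', 'kale']}

# Sort on the second element instead of the first.
print(sorted(items, key=lambda x: x[1]))
# [('fruit', 'apple'), ('veg', 'kale'), ('veg', 'leek'), ('fruit', 'pear')]
```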

Miscellaneous 

starmap 
It is similar to map, but it takes an iterable of tuples and unpacks each tuple into the function's arguments. We can use it to convert coordinate systems, Cartesian to polar and vice versa. With map, we can pass two lists; for starmap, those two lists should be zipped and passed as a list of tuples. For map, if we want to pass a constant instead of one of the iterables, we can use repeat from itertools. map can also be used to create a list of objects, where the constructor is passed as the function.
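A sketch of each point in the paragraph above: coordinate conversion with starmap, map versus starmap on two lists, repeat as a constant argument, and a constructor passed to map. The points and numbers are arbitrary.

```python
import math
from itertools import repeat, starmap

points = [(1.0, 0.0), (0.0, 2.0), (3.0, 4.0)]

# Cartesian -> polar: each (x, y) tuple is unpacked into the lambda's arguments.
polar = list(starmap(lambda x, y: (math.hypot(x, y), math.atan2(y, x)), points))
# Polar -> Cartesian, the same way.
back = list(starmap(lambda r, t: (r * math.cos(t), r * math.sin(t)), polar))
print(polar[2])   # radius 5.0 and angle atan2(4, 3)

# map takes the iterables separately; starmap takes them zipped.
xs, ys = [2, 3, 4], [3, 2, 1]
print(list(map(pow, xs, ys)))            # [8, 9, 4]
print(list(starmap(pow, zip(xs, ys))))   # [8, 9, 4]
print(list(map(pow, xs, repeat(2))))     # constant exponent: [4, 9, 16]

# map with a constructor builds a list of objects.
print(list(map(complex, xs, ys)))        # [(2+3j), (3+2j), (4+1j)]
```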

tee
It generates multiple independent iterators from one iterable; it is like a copy function. It can also create multiple identical iterators over a string. The default is 2 copies. When you call tee() to create n independent iterators, each iterator essentially works with its own FIFO queue. After calling tee(), the original iterator should not be used any more.
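A short tee demo, followed by the classic "consecutive pairs" helper built on it. The `pairwise` function here is written by hand for illustration; Python 3.10+ also ships `itertools.pairwise`.

```python
from itertools import tee

source = iter([1, 2, 3, 4])
a, b = tee(source)       # n defaults to 2 independent iterators
print(list(a))           # [1, 2, 3, 4]
print(list(b))           # [1, 2, 3, 4]: b still sees every item


def pairwise(iterable):
    """Consecutive pairs, built from two tee'd copies."""
    first, second = tee(iterable)
    next(second, None)   # advance the second copy by one item
    return zip(first, second)


print(list(pairwise("abcd")))   # [('a', 'b'), ('b', 'c'), ('c', 'd')]
```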

Possible software using itertools

1. Veda Chanting pattern. GHAN - PAATH etc.
2. Convolution coding.
3. Turbo coding
4. Gray code generator. 
5. Fibonacci series. 
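As one example of item 5, the Fibonacci series can be produced with accumulate treated as a second-order recurrence: the accumulated value is the pair (a, b), and each step ignores the input item. This is one possible approach, not the one from the realpython article; the `initial` argument of accumulate needs Python 3.8+.

```python
from itertools import accumulate, islice, repeat


def fibonacci():
    # State is the pair (a, b); each step maps it to (b, a + b).
    pairs = accumulate(repeat(None), lambda ab, _: (ab[1], ab[0] + ab[1]),
                       initial=(0, 1))
    return (a for a, _ in pairs)


print(list(islice(fibonacci(), 10)))   # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```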


Reference

Blog

https://medium.com/discovering-data-science-a-chronicle/itertools-to-the-rescue-427abdecc412
https://www.blog.pythonlibrary.org/2016/04/20/python-201-an-intro-to-itertools/
https://pymotw.com/3/itertools/index.html
https://realpython.com/python-itertools/

Github : similar tools

https://github.com/erikrose/more-itertools
https://github.com/topics/itertools

Jupyter Notebook

https://github.com/ericchan24/itertools/blob/master/itertools.ipynb
https://github.com/dineshsonachalam/Python_Practice/blob/master/6.Itertools.ipynb

Official Documentation

https://docs.python.org/3/library/itertools.html

Python Documentation


Here is a list of documentation tools for Python, similar to Doxygen:

1.
https://en.wikipedia.org/wiki/Sphinx_(documentation_generator)
http://www.sphinx-doc.org/en/master/

2.
https://en.wikipedia.org/wiki/Epydoc
http://epydoc.sourceforge.net/

3. 
https://en.wikipedia.org/wiki/HeaderDoc

4.
http://epydoc.sourceforge.net/stdlib/pydoc-module.html
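These tools work from docstrings (Sphinx through its autodoc extension, Epydoc, and pydoc). A small sketch of a function documented with Sphinx-style reStructuredText fields (`:param:`, `:returns:`) and rendered as plain text with the standard-library `pydoc` module; the `area` function itself is hypothetical.

```python
import pydoc


def area(width, height):
    """Return the area of a rectangle.

    :param width: width of the rectangle
    :param height: height of the rectangle
    :returns: width * height
    """
    return width * height


print(area.__doc__.splitlines()[0])   # Return the area of a rectangle.
print(pydoc.render_doc(area, renderer=pydoc.plaintext))
```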

Interesting Python Modules


Visualization
Orange and Plot.ly are for data visualization

VPython is the Python programming language plus a 3D graphics module called Visual


GUI
appJar for cross-platform GUIs

Kivy for mobile apps and desktop apps

wxPython for GUI widgets

simplegui to develop GUI application

HTML/XML
Beautiful Soup for parsing

Network

Construct to generate packets: construction and deconstruction of data structures

Twisted is an event-driven network programming framework


lcapy to teach electronics

PySerial: Gives the ability to use serial communication

Multimedia
PIL (Python Imaging Library) for image manipulation

scikit-image : Image processing library. 

SndObj (Sound Object) for music and audio.  

Panda3D is a game engine that includes graphics, audio, I/O, collision detection, and other abilities relevant to the creation of 3D games


Python-Ogre and Soya3D are also 3D engines

Python plugin can be written for Rhythmbox. https://wiki.gnome.org/Apps/Rhythmbox/Plugins/Python%20Plugin%20Examples


PyGame and Pyglet for gaming

Template

Cheetah is a template engine

Genshi is a template engine for XML-based vocabularies written in Python.


Kid is a simple template engine.


Topsite Templating System allows simple melding of Python code with static content


Jinja is a web template engine

Machine Learning and Data Analysis
pandas for data manipulation and analysis

PyTorch ML Library based on Torch

SageMath Computer algebra system

Scikit-learn ML library

statsmodels: explore data and statistics

Theano for matrix (multi-dimensional array) expressions

SymPy for symbolic computation

The Computational Geometry Algorithms Library (CGAL) for computational geometry algorithms. 

Misc
pyflakes checks Python source files for errors

Deep Learning
Chainer is an open source deep learning framework written purely in Python on top of Numpy and CuPy

Keras neural-network library

Gensim is for unsupervised topic modeling and natural language processing.

NLTK for NLP

SpaCy for advanced NLP


PyBrain: Helps to build artificial intelligence


Business
Cubes is a light-weight multidimensional modelling and OLAP toolkit for development reporting applications and browsing of aggregated data 

QAL for transforming data interfacing MongoDB

RPyC for RPC

SQLAlchemy: SQL toolkit and object-relational mapper

SQLObject and Storm: Python object-relational mappers

Reference
https://en.wikipedia.org/wiki/Category:Python_scientific_libraries
https://en.wikipedia.org/wiki/Category:Python_libraries 

Python Cross Compilers


  • Jython: Python to Java bytecode, which runs on the JVM
  • IronPython: Python to .NET, which runs on the CLR
  • RPython: used to build the PyPy interpreter. It translates to:
      - Java bytecode
      - .NET CLR
      - C
  • Pyjs: Python to JavaScript
  • Cython, Pyrex: Python to C
  • Cython, Pythran, Shed Skin: Python to C++
  • Google's Grumpy: Python to Go
  • MyHDL: Python to VHDL
  • Stackless Python: CPython variant for coroutines
  • MicroPython: Python for microcontrollers
  • Unladen Swallow: performance-oriented branch of CPython

Hadoop


A software framework for distributed storage and processing of big data using the MapReduce programming model.
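The MapReduce model itself can be illustrated without Hadoop: a map step emits key/value pairs, the framework groups them by key (shuffle), and a reduce step combines each group. The pure-Python word count below only mimics those three phases in one process. It is not Hadoop's API (real jobs use the Java API or Hadoop Streaming), and the function names are illustrative.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Mapper: emit (word, 1) for every word in one input record.
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    # Shuffle: group all values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reducer: combine all values for one key.
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


print(word_count(["big data is big", "data is everywhere"]))
# {'big': 2, 'data': 2, 'everywhere': 1, 'is': 2}
```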

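Hadoop consists of:
1. Hadoop Common: JARs and scripts needed to start Hadoop
2. HDFS (Hadoop Distributed File System)
3. Hadoop YARN (Yet Another Resource Negotiator): splits resource management and job scheduling/monitoring into separate daemons, a global ResourceManager and a per-application ApplicationMaster
4. Hadoop MapReduce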
4. Hadoop MapReduce

and other tools in the Hadoop ecosystem:
* Apache Pig, 
* Apache Hive:
- Data warehouse for data query and analysis
- Input: HiveQL
- Output: HiveQL queries are converted into MapReduce, Apache Tez, or Spark jobs
1. Metastore: Apache Derby RDBMS (by default)
2. Driver: controller
3. Compiler: HiveQL query -> Abstract Syntax Tree (AST) -> Directed Acyclic Graph (DAG)
4. Optimizer: optimized DAG
5. Executor: interacts with the Hadoop Job Tracker
6. CLI / UI / Thrift server, for access over the network (e.g. ODBC/JDBC)
- Supports ACID (Atomicity, Consistency, Isolation, and Durability) transactions
* Apache HBase : 
- Non-relational, distributed database
- Features: compression, in-memory operation, and Bloom filters on a per-column basis
- Can serve as input and output for MapReduce jobs
- Accessed through the Java API, REST, or Avro
* Apache Phoenix : SQL Layer for HBase. 
* Apache Spark : Analytics Engine
* Apache ZooKeeper, 
* Cloudera Impala, 
* Apache Flume, 
* Apache Sqoop, 
* Apache Oozie,
* Apache Storm.
* Apache Mahout : ML for
- collaborative filtering, 
- clustering and 
- classification

Architecture 

* Master node:
Job Tracker,
Task Tracker,
NameNode (primary and secondary),
DataNode.

* A slave or worker node can be:
- DataNode and Task Tracker (the Task Tracker runs tasks in separate JVMs)
- DataNode only
- Compute only

Job Tracker and Task Tracker expose status and information through the Jetty web server.

HDFS

With the default replication value of 3, data is stored on three nodes: two on the same rack and one on a different rack.
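The placement above can be sketched as follows, assuming the commonly described policy: the first replica on the writer's node, and the second and third on two different nodes of one remote rack (so one rack holds two replicas and another holds one). The cluster layout and helper names are hypothetical, and the real NameNode also weighs load, free space, and node health.

```python
import random

# Hypothetical cluster: rack name -> data nodes.
CLUSTER = {
    "rack1": ["dn1", "dn2", "dn3"],
    "rack2": ["dn4", "dn5", "dn6"],
    "rack3": ["dn7", "dn8", "dn9"],
}


def rack_of(node):
    return next(rack for rack, nodes in CLUSTER.items() if node in nodes)


def place_replicas(writer_node, rng=random):
    """Pick 3 nodes: the writer's node, then two nodes on one remote rack."""
    first = writer_node
    remote_rack = rng.choice([r for r in CLUSTER if r != rack_of(first)])
    second, third = rng.sample(CLUSTER[remote_rack], 2)
    return [first, second, third]


replicas = place_replicas("dn2", random.Random(0))
print(replicas, [rack_of(node) for node in replicas])
```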

HDFS was designed for mostly immutable files and is not suitable for concurrent write operations.

Files can be accessed through:
- the native Java API
- the Thrift API (binary RPC protocol)
- the CLI
- the HDFS-UI web application over HTTP
- 3rd-party network client libraries

Monitoring
- Hortonworks
- Cloudera
- Datadog

File Systems
- HDFS
- FTP file system
- Amazon S3 (Simple Storage Service) object storage
- Windows Azure Storage Blobs (WASB) file system
- IBM General Parallel File System
- Parascale file system
- CloudIQ Storage product by Appistry
- location-aware IBRIX Fusion file system driver by HP
- MapR FS by MapR Technologies Inc

Hadoop Major Versions
Hadoop 1
Hadoop 2
- YARN
Hadoop 3
- multiple name nodes
- container
- decreases storage overhead with erasure coding.
- GPU hardware for deep learning. 

Hadoop on AWS
Amazon Elastic MapReduce EMR
Amazon Elastic Compute Cloud EC2
Amazon Simple Storage Service S3

Apache Spark


Spark is a lightning-fast unified analytics engine for large-scale data processing.

APIs:
- Dataset API (2.x)
- DataFrame API
- Resilient Distributed Dataset (RDD) API (1.x). An RDD can hold any type of Python, Java, or Scala objects.

Spark Cluster Manager
- native Spark cluster
- Hadoop YARN
- Apache Mesos
- K8s

Spark Distributed storage
- Alluxio, 
- Hadoop Distributed File System (HDFS),
- MapR File System (MapR-FS),
- Cassandra,
- OpenStack Swift, 
- Amazon S3, 
- Kudu, 
- a custom solution

Languages
- Java, 
- Scala, 
- Python, 
- R, and 
- SQL.

Spark Components
1. Spark Core
- RDD-centric functional programming
2. Spark SQL
- DSL (Domain-Specific Language) to manipulate datasets
- supports CLI and JDBC/ODBC server
3. Spark Streaming
- Consume from Kafka, Flume, Twitter, ZeroMQ, Kinesis, and TCP/IP sockets
4. MLlib
5. GraphX

Spark can perform

- batch processing (similar to MapReduce) 
- streaming, 
- interactive queries, and 
- machine learning

Reference

https://spark.apache.org/third-party-projects.html
https://spark.apache.org/docs/latest/quick-start.html
https://github.com/apache/spark/tree/master/examples/src/main/python