Managing Storage with Containers

Flex Volume Drivers in Kubernetes and CSI, Peeyush Gupta, IBM

Storage in containers

- stateless / stateful
- volumes
- dynamic provisioning
- PVC, PV and Storage Class

* A Storage Class describes how volumes are dynamically provisioned.
* A PVC refers to a Storage Class.
* A Pod refers to a PVC to get its volume.
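The reference chain above (Pod, then PVC, then Storage Class) can be sketched as plain Python dictionaries that mirror Kubernetes manifests. This is only an illustration: the names `fast-storage`, `data-claim`, `demo-pod`, the image, the provisioner string, and the helper `resolve_storage_class` are made-up placeholders, not something shown in the talk.

```python
# Sketch of how a Pod, a PVC and a Storage Class refer to each other.
# All names and the provisioner string are hypothetical placeholders.

storage_class = {
    "apiVersion": "storage.k8s.io/v1",
    "kind": "StorageClass",
    "metadata": {"name": "fast-storage"},
    "provisioner": "example.com/hypothetical-provisioner",
}

pvc = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "data-claim"},
    "spec": {
        # The PVC names the Storage Class it wants a volume from.
        "storageClassName": storage_class["metadata"]["name"],
        "accessModes": ["ReadWriteOnce"],
        "resources": {"requests": {"storage": "1Gi"}},
    },
}

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "demo-pod"},
    "spec": {
        "containers": [{
            "name": "app",
            "image": "example/app:latest",
            "volumeMounts": [{"name": "data", "mountPath": "/data"}],
        }],
        # The Pod names the PVC, never the underlying volume directly.
        "volumes": [{
            "name": "data",
            "persistentVolumeClaim": {"claimName": pvc["metadata"]["name"]},
        }],
    },
}


def resolve_storage_class(pod, claims):
    """Follow Pod -> PVC -> Storage Class name for each volume of the Pod."""
    result = {}
    for volume in pod["spec"]["volumes"]:
        claim_name = volume["persistentVolumeClaim"]["claimName"]
        result[volume["name"]] = claims[claim_name]["spec"]["storageClassName"]
    return result


print(resolve_storage_class(pod, {"data-claim": pvc}))
```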

The kubelet running on each host calls the Flex Driver. The Flex Driver implements the vendor-specific APIs for storage volumes: 1. mount 2. unmount 3. attach 4. detach.
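To make the idea of a driver binary that is invoked with an operation name more concrete, here is a rough Python sketch of a dispatcher. It assumes the driver is called as `<driver> <operation> [args...]` and answers with a small JSON object containing a `status` field; that reply shape is my understanding of the convention, so treat it as an assumption. The function `handle` is a hypothetical name, and nothing is really mounted or attached.

```python
import json
import sys

# Operations named in the talk; real drivers may support more call-outs.
SUPPORTED = {"mount", "unmount", "attach", "detach"}


def handle(argv):
    """Return the JSON-serializable reply for one invocation: <operation> [args...]."""
    if not argv:
        return {"status": "Failure", "message": "no operation given"}
    operation, args = argv[0], argv[1:]
    if operation not in SUPPORTED:
        return {"status": "Not supported", "message": operation}
    # A real driver would call the storage vendor's API here.
    return {
        "status": "Success",
        "message": f"{operation} called with {len(args)} argument(s)",
    }


if __name__ == "__main__":
    print(json.dumps(handle(sys.argv[1:])))
```

Keeping the vendor-specific work behind one small dispatcher is what lets the same binary serve all four operations.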

This binary needs to be placed (copied) at a specific path on each node.
For CNI too, the Calico driver needs to be placed (copied) at a specific path on each node. The better alternative is CSI (Container Storage Interface).

CO = Container Orchestrator. Examples: Kubernetes (K8s), Mesos, Cloud Foundry, OpenShift (by Red Hat).
A CSI driver exposes three services to the CO: 1. Node 2. Controller 3. Identity

There is a single binary for both node and controller. Based on its identity, it plays either the node role or the controller role.

1. Idempotent APIs
2. Sidecar containers
2.1 Driver registrar (Identity Service)
2.2 External provisioner (watches for create-volume requests)
2.3 External attacher (watches for attach requests)
2.4 Liveness probe
2.5 External snapshotter (started in mid-July 2018)
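The point about idempotent APIs is that a retried request must not create a second volume or fail just because the first attempt already succeeded. A minimal sketch of that behavior follows; `InMemoryVolumeBackend` and its methods are hypothetical stand-ins for a storage backend, not part of any real CSI library.

```python
class InMemoryVolumeBackend:
    """Hypothetical stand-in for a storage backend; not a real CSI API."""

    def __init__(self):
        self.volumes = {}

    def create_volume(self, name, size_bytes):
        existing = self.volumes.get(name)
        if existing is not None:
            if existing["size_bytes"] != size_bytes:
                # Same name with different parameters is a genuine conflict.
                raise ValueError(f"volume {name!r} exists with a different size")
            # Repeated call with the same parameters: same result, no new volume.
            return existing
        volume = {
            "id": f"vol-{len(self.volumes) + 1}",
            "name": name,
            "size_bytes": size_bytes,
        }
        self.volumes[name] = volume
        return volume

    def delete_volume(self, name):
        # Deleting a volume that is already gone is treated as success.
        self.volumes.pop(name, None)


backend = InMemoryVolumeBackend()
first = backend.create_volume("data", 1024)
retry = backend.create_volume("data", 1024)
print(first == retry, len(backend.volumes))
```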

Containerized Gluster Storage in Kubernetes - Saravanakumar, Red Hat

GlusterFS was born in the oil industry, which needed to process data from different hosts to detect the presence of oil. It is now more than 10 years old.

Steps (run all steps with sudo)

1. Install and start the glusterd service on all hosts.
2. gluster peer status
3. gluster volume create
4. gluster volume start
This will start the Gluster volume on all hosts.
5. gluster volume status
6. mount -t glusterfs
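The steps above leave out the arguments. As a rough sketch, the following Python helper builds full command lines for steps 2 to 6 without running them. The volume name `demo-vol`, the hosts `host1` and `host2`, the brick paths, and the mount point are hypothetical placeholders, and `gluster_setup_commands` is just an illustrative helper; step 1 is omitted because it depends on the distribution.

```python
import shlex


def gluster_setup_commands(volume, bricks, mount_host, mount_point):
    """Build (but do not run) the command lines for steps 2-6."""
    return [
        ["gluster", "peer", "status"],
        ["gluster", "volume", "create", volume, *bricks],
        ["gluster", "volume", "start", volume],
        ["gluster", "volume", "status", volume],
        ["mount", "-t", "glusterfs", f"{mount_host}:/{volume}", mount_point],
    ]


commands = gluster_setup_commands(
    volume="demo-vol",
    bricks=["host1:/bricks/demo", "host2:/bricks/demo"],
    mount_host="host1",
    mount_point="/mnt/demo",
)
for argv in commands:
    # Printed for review; pass argv to subprocess.run only on a real cluster.
    print("sudo " + shlex.join(argv))
```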

PVC access modes
1. ROX (ReadOnlyMany): read-only by many nodes
2. RWO (ReadWriteOnce): read/write by a single node
3. RWX (ReadWriteMany): read/write by many nodes
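In a PVC manifest the access modes are written with their long names. A small lookup table, sketched below, maps the short forms from the list above to those names; the helper `access_modes_for` is a hypothetical convenience function, not part of any Kubernetes client.

```python
# Short form -> name used in the accessModes field of a PVC spec.
ACCESS_MODES = {
    "ROX": "ReadOnlyMany",
    "RWO": "ReadWriteOnce",
    "RWX": "ReadWriteMany",
}


def access_modes_for(*short_names):
    """Translate short forms such as 'RWX' into the accessModes list of a PVC spec."""
    return [ACCESS_MODES[name.upper()] for name in short_names]


print(access_modes_for("rwo"))
print(access_modes_for("ROX", "RWX"))
```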

Heketi provides a RESTful management interface that can be used to manage the life cycle of GlusterFS volumes.

Storage requirements for running Spark workloads on Kubernetes, Rachit Arora

The Spark core engine runs over 1. YARN 2. Mesos 3. Standalone Scheduler 4. K8s.
1. Spark SQL 2. Spark Streaming 3. MLlib (Spark's machine learning library) 4. GraphX run on top of the Spark core engine.

* Data engineer: 1. Ingest and store data from multiple sources 2. Prepare data
* Data scientist: 2. Prepare data 3. Analyze data and build models
* Application developer: 4. Visualize data

The new trend now is serverless analytics.

'Spark over K8S' provides a Jupyter Kernel Gateway for data scientists to analyze data.

Distributed FS
1. NFS and BigNFS
2. DBFS (Databricks File System)
3. S3 / Object Storage
4. Portworx
5. GlusterFS


Twitter handle: @k8sBLR


Let me share the key takeaways from the meetup event "Google Machine Learning Study Jam".

In Feb/March 2018, Google announced MLCC (Machine Learning Crash Course). In July 2018, the MLCC Study Jam series came to India. I attended one such event with my friend, organized by the Industry 5.0 meetup.

Here are a few useful links:

TensorFlow Content Bundle, Spring 2018

Gradients and Partial Derivatives: YouTube video
Later on, I found that the whole Maths playlist is good. All videos by Eugene Khutoryansky are excellent.

Another YouTube video is by Christopher Gondek. Here too, the playlist on 'Machine Learning Visualization' is good.

Microsoft announced FPGA-based edge computing: Project Brainwave.

AutoML and transfer learning: at present, they are at a nascent stage. Once they have fully evolved, we may not need people who know AI/ML; the machines will learn by themselves. I did a little Googling and found a few links.

As far as I know, between completing a Machine Learning course and completely switching one's career path, Kaggle is the only platform for getting hands-on experience. I came to know of one more such platform, Seedbank. I found one seed about 'Piano Transcription' quite interesting. We discussed with Sanjay Chitnis the idea of creating a similar seed to recognize Indian ragas.

There is an interesting book, 'Pattern Recognition and Machine Learning (Information Science and Statistics)' by Christopher M. Bishop. We also looked at an excellent browser-based neural network tool, which is also used as part of MLCC. We discussed L1 regularization, L2 regularization, the confusion matrix, precision, accuracy, recall, F1 score, the receiver operating characteristic (ROC), etc. Precision is about how many of the cases the algorithm flags as positive are actually positive. Recall is about how many of all the actual positive cases the algorithm is able to detect, that is, how good the diagnostic test is at catching positives.
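The metrics discussed above can all be computed from the four counts of a confusion matrix. Here is a minimal sketch using the standard definitions; the function name `classification_metrics` and the example counts are made up for illustration.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute common metrics from confusion-matrix counts.

    tp: true positives, fp: false positives,
    fn: false negatives, tn: true negatives.
    """
    # Of everything flagged positive, how much really is positive?
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Of everything that really is positive, how much was detected?
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    # F1 is the harmonic mean of precision and recall.
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1": f1}


# Hypothetical test results: 12 truly positive cases, 8 of them detected,
# plus 2 false alarms among 88 negative cases.
metrics = classification_metrics(tp=8, fp=2, fn=4, tn=86)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts accuracy is high even though a third of the positive cases are missed, which is why precision and recall are reported alongside it.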

A CNN is a combination of filters and dimension reduction. An LSTM is a special kind of RNN. GANs are widely used for generating content. The GAN Zoo has a list of all variations of GAN.
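The "filter plus dimension reduction" view of a CNN can be shown in a few lines of plain Python: a small filter slides over the image, and max pooling then shrinks the resulting feature map. This is only a toy sketch with a hand-picked filter; in a real CNN the filter values are learned, and the function names here are my own.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image without padding (cross-correlation, as CNN layers do)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; any incomplete border window is dropped."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j] for i in range(size) for j in range(size))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# A 5x5 image that is dark on the left and bright on the right.
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]
vertical_edge = [[-1, 1], [-1, 1]]  # responds where values rise from left to right

features = convolve2d(image, vertical_edge)  # filter step: 4x4 feature map
pooled = max_pool(features)                  # dimension reduction: 2x2
print(features)
print(pooled)
```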

We also discussed semi-supervised learning, topic learning or keyword learning (which go beyond supervised learning), gold standards, etc.

At the end, Sanjay drew our attention to an interesting trend: product costs keep reducing, features in products keep increasing, and service costs keep increasing! India has a lot of data available. There is good scope for data analytics and machine learning for the 2019 General Election in India.