Managing Storage with Containers
https://www.meetup.com/Docker-Bangalore/events/253542738/
Flex Volume Drivers in Kubernetes and CSI, Peeyush Gupta, IBM
=============================================================
Storage in containers
- stateless / stateful
- volumes
- dynamic provisioning
- PVC, PV and Storage Class
* Storage Class refers to dynamic provisioning
* PVC refers to Storage Class
* Pod refers to PVC for its volume.
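The chain above (Pod -> PVC -> Storage Class) can be sketched as three Kubernetes objects built as plain Python dictionaries and printed as JSON, which kubectl accepts. This is only an illustrative sketch: the object names, the image and the provisioner string "example.com/hypothetical-provisioner" are assumptions, not something shown in the talk.

```python
import json


def storage_class(name, provisioner):
    """StorageClass: names the provisioner used for dynamic provisioning."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": provisioner,
    }


def pvc(name, storage_class_name, size, access_mode="ReadWriteOnce"):
    """PersistentVolumeClaim: refers to the StorageClass by name."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "storageClassName": storage_class_name,
            "accessModes": [access_mode],
            "resources": {"requests": {"storage": size}},
        },
    }


def pod(name, image, claim_name, mount_path):
    """Pod: refers to the PVC by claim name and mounts it in the container."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{
                "name": name,
                "image": image,
                "volumeMounts": [{"name": "data", "mountPath": mount_path}],
            }],
            "volumes": [{
                "name": "data",
                "persistentVolumeClaim": {"claimName": claim_name},
            }],
        },
    }


def build_manifests():
    # All names below are hypothetical placeholders.
    sc = storage_class("fast", "example.com/hypothetical-provisioner")
    claim = pvc("data-claim", sc["metadata"]["name"], "1Gi")
    app = pod("web", "nginx", claim["metadata"]["name"], "/data")
    return [sc, claim, app]


if __name__ == "__main__":
    manifest = {"apiVersion": "v1", "kind": "List", "items": build_manifests()}
    print(json.dumps(manifest, indent=2))
```

The output could be saved to a file and passed to `kubectl apply -f`, assuming a cluster where a provisioner with that name actually exists.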
The kubelet running on each host calls the Flex driver. The Flex driver implements the vendor-specific APIs for storage volumes: 1. mount 2. unmount 3. attach 4. detach.
The driver binary needs to be placed (copied) at a specific path on each node.
For CNI also, the Calico driver needs to be placed (copied) at a specific path on each node. The better alternative is CSI (Container Storage Interface).
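The Flex driver call-out described above can be sketched as a small executable that receives the operation name as its first argument and prints a JSON status. This is a hedged sketch, not a vendor driver: the argument order follows my understanding of the FlexVolume convention and should be checked against the Kubernetes documentation, the device path is hypothetical, and the places where a real driver would call the vendor API are only comments.

```python
import json
import sys


def success(**extra):
    """Build a success reply; extra keys (e.g. device) are merged in."""
    reply = {"status": "Success"}
    reply.update(extra)
    return reply


def handle(argv):
    """Dispatch one Flex call-out; argv is [operation, arg1, ...]."""
    if not argv:
        return {"status": "Failure", "message": "no operation given"}
    op, args = argv[0], argv[1:]
    try:
        if op == "init":
            return success(capabilities={"attach": True})
        if op == "attach":
            options = json.loads(args[0])  # volume options from the Pod spec
            # A real driver would call the vendor API here using `options`.
            return success(device="/dev/hypothetical0")
        if op == "detach":
            device = args[0]
            # A real driver would release `device` through the vendor API.
            return success()
        if op == "mount":
            mount_dir, options = args[0], json.loads(args[1])
            # A real driver would mount the volume at `mount_dir`.
            return success()
        if op == "unmount":
            mount_dir = args[0]
            # A real driver would unmount `mount_dir`.
            return success()
    except (IndexError, ValueError) as err:
        return {"status": "Failure", "message": "bad arguments: %s" % err}
    return {"status": "Not supported", "message": "unknown operation: " + op}


if __name__ == "__main__":
    print(json.dumps(handle(sys.argv[1:])))
```

Because every operation is a separate process invocation of a binary installed on the node, upgrading such a driver means touching every node, which is the pain point CSI addresses.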
CO = Container Orchestrator. Examples: Kubernetes (K8s), Mesos, Cloud Foundry, OpenShift (by Red Hat).
A CSI driver exposes three services: 1. Node 2. Controller 3. Identity
There is a single binary for Node and Controller. Based on the identity, it plays either the node role or the controller role.
1. Idempotent APIs
2. Sidecar containers
2.1 Driver registrar (Identity Service)
2.2 External provisioner (watches for create-volume requests)
2.3 External attacher (watches for attach requests)
2.4 Liveness probe
2.5 External snapshotter (started in mid-July 2018)
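To illustrate what "idempotent APIs" means for a storage driver, here is a minimal in-memory sketch: repeating a create request with the same name returns the same volume instead of creating a second one, and deleting a volume that is already gone is not an error. The class and method names are hypothetical and only mirror the spirit of CSI's CreateVolume and DeleteVolume calls; this is not the CSI gRPC API.

```python
class VolumeStore:
    """In-memory stand-in for a storage backend (illustration only)."""

    def __init__(self):
        self._by_name = {}
        self._next_id = 1

    def create_volume(self, name, size_bytes):
        """Return the existing volume if `name` was already created."""
        existing = self._by_name.get(name)
        if existing is not None:
            if existing["size_bytes"] != size_bytes:
                # Same name but incompatible request: report a conflict.
                raise ValueError("volume %r already exists with another size" % name)
            return existing
        volume = {
            "id": "vol-%d" % self._next_id,
            "name": name,
            "size_bytes": size_bytes,
        }
        self._next_id += 1
        self._by_name[name] = volume
        return volume

    def delete_volume(self, volume_id):
        """Deleting an unknown or already deleted volume succeeds silently."""
        for name, volume in list(self._by_name.items()):
            if volume["id"] == volume_id:
                del self._by_name[name]


if __name__ == "__main__":
    store = VolumeStore()
    first = store.create_volume("pvc-demo", 1024)
    retry = store.create_volume("pvc-demo", 1024)  # e.g. a retried request
    print(first["id"] == retry["id"])
    store.delete_volume(first["id"])
    store.delete_volume(first["id"])  # second delete is a no-op
```

Idempotency matters because the sidecar containers may retry a call after a timeout without knowing whether the first attempt succeeded.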
Containerized Gluster Storage in Kubernetes - Saravanakumar, Red Hat
====================================================================
GlusterFS was born in the oil industry, which needed to process data from different hosts to detect the presence of oil. It is now more than 10 years old.
Steps (run all steps with sudo)
1. Install and start the glusterd service on all hosts.
2. gluster peer status
3. gluster volume create
4. gluster volume start
This starts Gluster on all hosts.
5. gluster volume status
6. mount -t glusterfs
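The steps above omit the command arguments, so here is a small sketch that only builds and prints the full command lines for a given set of hosts; it does not run anything. The host names, volume name, brick path and mount point are hypothetical, and the `gluster peer probe` step is my addition (peers have to be probed before `gluster peer status` lists them), not something from the talk notes.

```python
import shlex


def gluster_setup_commands(hosts, volume, brick_path, mount_point):
    """Return the command lines (as argument lists) to run on the first host."""
    first, others = hosts[0], hosts[1:]
    # Assumption: join the other hosts to the trusted pool first.
    commands = [["gluster", "peer", "probe", host] for host in others]
    commands.append(["gluster", "peer", "status"])
    bricks = ["%s:%s" % (host, brick_path) for host in hosts]
    commands.append(["gluster", "volume", "create", volume] + bricks)
    commands.append(["gluster", "volume", "start", volume])
    commands.append(["gluster", "volume", "status", volume])
    commands.append(
        ["mount", "-t", "glusterfs", "%s:/%s" % (first, volume), mount_point]
    )
    return commands


if __name__ == "__main__":
    # Hypothetical two-node setup.
    for cmd in gluster_setup_commands(
        ["host1", "host2"], "gv0", "/data/brick1/gv0", "/mnt/gv0"
    ):
        print("sudo " + " ".join(shlex.quote(part) for part in cmd))
```

Printing rather than executing keeps the sketch safe to run on a machine without Gluster installed; the output can be reviewed and pasted into a shell.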
PVC access modes
1. ROX: Read only by many nodes
2. RWO: Read/Write by single node
3. RWX: Read/Write by many nodes
Heketi provides a RESTful management interface for managing the life cycle of GlusterFS volumes.
Storage requirements for running Spark workloads on Kubernetes, Rachit Arora
============================================================================
The Spark core engine runs over 1. YARN 2. Mesos 3. Standalone Scheduler 4. K8s
1. Spark SQL 2. Spark Streaming 3. Spark Machine Learning Library (MLlib) 4. GraphX run over the Spark core engine
* Data Engineer: 1. Ingest and store data from multiple sources 2. Prepare data
* Data Scientist: 2. Prepare data 3. Analyze data and build models
* Application Developer: 4. Visualize data
The new trend is serverless analytics.
'Spark over K8s' provides a Jupyter Kernel Gateway for data scientists to analyze data.
Distributed FS
1. NFS and BigNFS
2. HDFS
3. DBFS (Databricks File System)
4. S3 / Object Storage
5. Portworx
6. GlusterFS
URLs:
datascience.ibm.com
www.ibm.com/analytics/us/en/watson-data-platform/tutorial
Twitter handle: @k8sBLR
MLCC
Posted by
Manish Panchmatia
on Saturday, August 18, 2018
Let me share the key takeaways from the meetup event "Google Machine Learning Study Jam".
In Feb/March 2018, Google announced MLCC (Machine Learning Crash Course). In July 2018, the MLCC Study Jam series came to India. I attended one such event with my friend, organized by the Industry 5.0 meetup.
Here are a few useful links
TensorFlow Content Bundle, Spring 2018
Gradients and Partial Derivatives: YouTube video
Later on, I found that the Maths playlist is good. All videos by Eugene Khutoryansky are excellent.
Another YouTube video by Christopher Gondek. Here also, the playlist about 'Machine Learning Visualization' is good.
Microsoft announced Project Brainwave for FPGA-based edge computing.
AutoML and transfer learning: at present, they are at a nascent stage. Once they have fully evolved, we may not need people who know AI/ML; the machines will learn by themselves. I did a little Googling and found a few links: http://www.ml4aad.org/automl/ and https://automl.info/
To my knowledge, between completing a Machine Learning course and completely switching one's career path, Kaggle is the only platform for getting hands-on experience. I came to know of one more such platform, Seedbank. I found one seed about 'Piano Transcription' quite interesting. We discussed with Sanjay Chitnis creating a similar seed to recognize Indian ragas.
There is an interesting book, 'Pattern Recognition and Machine Learning (Information Science and Statistics)' by Christopher M. Bishop.
http://playground.tensorflow.org is an excellent, browser-based neural network tool. It is also used as part of MLCC. We discussed L1 regularization, L2 regularization, the confusion matrix, precision, accuracy, recall, F1 score, receiver operating characteristic (ROC) etc. Precision is about how many of the cases the algorithm flags as positive are actually positive. Recall is about how many of all the actual positive cases the algorithm is able to detect; for a diagnostic test, it tells how good the test is at catching real cases.
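The metrics discussed above can be computed directly from the four cells of the confusion matrix. Here is a minimal sketch in plain Python; the labels in the example are made up for illustration (1 = positive, 0 = negative) and do not come from the meetup.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary labels."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p and a:
            tp += 1      # predicted positive, really positive
        elif p and not a:
            fp += 1      # predicted positive, really negative
        elif a:
            fn += 1      # predicted negative, really positive
        else:
            tn += 1      # predicted negative, really negative
    return tp, fp, fn, tn


def scores(actual, predicted):
    """Return precision, recall, accuracy and F1 as a dictionary."""
    tp, fp, fn, tn = confusion_counts(actual, predicted)
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}


if __name__ == "__main__":
    # Made-up example: 4 real positives, the model finds 2 and raises 1 false alarm.
    actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    predicted = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
    for name, value in scores(actual, predicted).items():
        print("%s: %.3f" % (name, value))
```

In this example accuracy looks decent (0.7) even though half of the real positives are missed (recall 0.5), which is why precision and recall are reported alongside accuracy.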
A CNN is a combination of filters (convolution) and dimension reduction (pooling). LSTM is a special case of RNN. GANs are widely used for generating content. The GAN Zoo has a list of all the variations of GAN.
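The "filter plus dimension reduction" idea can be shown with a tiny, dependency-free sketch: a 2x2 filter slides over a 5x5 image (as in CNN layers, this is technically cross-correlation, with no padding and stride 1), and 2x2 max pooling then halves each dimension. The image and the edge-detecting filter values are made up for illustration.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool2d(feature_map, size=2):
    """Keep the maximum of each non-overlapping size x size window."""
    return [
        [
            max(feature_map[r + i][c + j]
                for i in range(size) for j in range(size))
            for c in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for r in range(0, len(feature_map) - size + 1, size)
    ]


if __name__ == "__main__":
    # 5x5 image with a vertical edge between columns 1 and 2.
    image = [[0, 0, 1, 1, 1] for _ in range(5)]
    # Filter that responds to a dark-to-bright step from left to right.
    kernel = [[-1, 1],
              [-1, 1]]
    feature_map = convolve2d(image, kernel)   # 4x4
    pooled = max_pool2d(feature_map)          # 2x2
    print(feature_map)
    print(pooled)
```

The filter turns the 5x5 image into a 4x4 feature map that is non-zero only at the edge, and pooling shrinks it to 2x2 while keeping the strongest response in each window.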
We also discussed semi-supervised learning, topic learning or keyword learning (that is, going beyond supervised learning), gold standard etc.
At the end, Sanjay drew our attention to an interesting trend: product costs keep reducing, features in products keep increasing, and service costs keep increasing! India has lots of data available. There is good scope for data analytics and machine learning for the General Election 2019 in India.