Keynote 2: KubeCon India 2024
Shopify has a very large-scale deployment with AI use-case algorithms:
- Vector relations of products
- Credit card fraud detection
- Many GPUs
* GPU utilization vs. developer productivity is a trade-off.
Challenges
1. Build vs. buy
2. Dev experience: SkyPilot and Rocket ML
==========================
A shadow role on the K8s release team is the best place to start contributing to K8s.
Cato is for AI; this is another good place to start.
He showed many pieces of Indian architecture, like the Taj Mahal (Agra) and Jantar Mantar (Jaipur), and encouraged India to have the largest contributor base in the world.
==========================
The Taj Mahal was also built with diversity.
Conscious and continuous effort toward diversity is invisible but important.
Many meetings have started, and more will start, in APAC-friendly timezones.
It is very hard to justify open source contribution to an employer.
Contributors should move up to become maintainers.
==========================
2014: Stateless
2017: Stateful
2019: Serverless
2023: AI
Cloud Native AI (CNAI) working group: streamlines the integration of AI with the cloud native ecosystem.
CNAI whitepaper.
Cloud native is an ideal platform for AI:
- Cost efficiency
- Scalability
- Containerization
- Harmony among dev, test, staging and production
- High Availability
- Microservice Architecture
CNAI from 3 perspectives
1. K8s:
- DRA (Dynamic Resource Allocation), inspired by PV/PVC (alpha in 1.26, beta in 1.32); see the sketch after the OPEA notes below
2. ML engineers
- Kubeflow has many projects for different use cases
- Kueue for ML batch processing
3. App Developer
- OPEA - Open Platform for Enterprise AI
website: opea.dev
1. Data Prep
2. Embedding *
3. LLM/SLM *
4. Vector DB *
5. Retriever
6. Reranking
* OPEA provides recipes for all the starred options; 20+ GenAI recipes.
They are validated on Intel, ARM, and AMD architectures.
With MongoDB or the Neo4j graph database, there is no need for a separate vector DB.
MinIO is the common data layer.
OPEA is available on Azure and AWS.
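Back to perspective 1 (K8s): a minimal DRA sketch, assuming the resource.k8s.io/v1beta1 API from K8s 1.32; the DeviceClass and image names are illustrative:

apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: single-gpu
spec:
  spec:
    devices:
      requests:
      - name: gpu
        deviceClassName: gpu.example.com   # illustrative DeviceClass published by a DRA driver
---
apiVersion: v1
kind: Pod
metadata:
  name: training-pod
spec:
  resourceClaims:                          # the pod claims a device much like a PVC claims storage
  - name: gpu
    resourceClaimTemplateName: single-gpu
  containers:
  - name: trainer
    image: trainer:latest                  # illustrative image
    resources:
      claims:
      - name: gpu                          # the container consumes the claimed device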
CNAI has its own landscape on the CNCF website.
WG
- Scheduling
- Security
- Sustainability
AI Playground validates OPEA samples on ARM with free Oracle credits. CNAI needs people.
==========================
1980: Data (spreadsheets)
1990: Information (databases)
2000: Knowledge (data warehouses)
2010: Insight (analytics: Hadoop, Spark)
2020: Intelligence (AI/ML)
2025+: Action (agents)
Analogy:
- Agents : Apps
- GenAI : OS
- LLM : Kernel
Characteristics
1. Decision Making
2. Memory
3. Reasoning
4. Action
Analogy: Container : Agent
- OCI runtime : LLM
- Deterministic logic : adaptive logic
- Stateless by default : stateful by nature
- Static resource limits : dynamic resources
- Identical replicas : unique instances
Docker run -> compose -> K8s
Agent -> multiple agents that need orchestration; this is where K8s fits.
K8s is the universal control plane for VMs, DBs, IoT/edge, Docker, Wasm; agents will be yet another workload type.
Arch : Agent Operator
1. Agent controller
2. Scheduler
3. CR
The LLM tells the Agent Controller which agent to create.
The Agent CR YAML has: task, model, memory, tools, persona (hypothetical sketch below).
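A purely hypothetical sketch of such an Agent CR; the API group, kind, and every field name are assumptions based on the list above:

apiVersion: agents.example.com/v1alpha1    # hypothetical API group
kind: Agent
metadata:
  name: support-summarizer
spec:
  task: "Summarize and triage incoming support tickets"
  model: llama-3-8b-instruct               # which LLM backs the agent
  memory:
    type: vector-store                     # agents are stateful by nature
  tools:
  - web-search
  - ticket-api
  persona: "Helpful, concise support analyst"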
AI: CrewAI, Metaflow, Airflow
CN: Argo, Dapr, Numaflow, KServe (came out of Kubeflow)
K8s KEPs
KEPs are not just feature proposals. A KEP is about:
- Feature Design
- Different Alternatives
- Testing plan
Feature LCM = feature introduction, updates, and deprecation, all captured in the KEP.
What is a KEP? A Kubernetes Enhancement Proposal (KEP-0000 describes the process itself).
A unit of work: a single feature, combined features, or a major change.
Decisions are captured in well-crafted artifacts.
A KEP has YAML-based metadata.
/enhancements/keps
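Roughly, the kep.yaml metadata looks like this (field names from the KEP template; values illustrative):

title: My Feature
kep-number: 9999                 # illustrative number
authors:
  - "@alice"
owning-sig: sig-node
status: implementable
creation-date: 2024-01-15
reviewers:
  - "@bob"
approvers:
  - "@carol"
stage: alpha
latest-milestone: "v1.32"
milestone:
  alpha: "v1.32"
feature-gates:
  - name: MyFeature
    components:
      - kube-apiserver
      - kubelet
disable-supported: true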
Stages:
1. Alpha: disabled by default; enable with a feature gate.
2. Beta: new API groups are disabled by default, but the feature gate is enabled by default.
3. GA: everything is enabled by default.
Demo
GitHub issue #114465: add sleep for the pod lifecycle handler.
Making sleep a first-class citizen means there is no need to bundle a sleep binary in the image.
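The resulting API (PodLifecycleSleepAction, alpha in 1.29) looks like this:

apiVersion: v1
kind: Pod
metadata:
  name: sleep-demo
spec:
  containers:
  - name: app
    image: nginx
    lifecycle:
      preStop:
        sleep:
          seconds: 5       # native sleep; no sleep binary needed in the image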
Writing a KEP is not hard, but it is time-consuming; we need to collect sign-off from different people in the community.
Feature = Enhancement = KEP
Non-Goals are very important: they keep distractions out of scope.
The Proposal section of the KEP has the technical part; read it, then read the code.
The "Design Details" section may contain pseudo code.
"Graduation Criteria" for all 3 stages.
"Upgrade / Downgrade strategy"
The K8s repo has a staging repository.
SCHEO
/hack/update-codegen.sh generates the auto-generated code.
We can build a kind node image with our own K8s code, and use a kind config.yaml to enable the alpha feature gate.
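A sketch, assuming the sleep-action gate from the demo above; the node image is whatever `kind build node-image` produced:

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
featureGates:
  PodLifecycleSleepAction: true   # alpha gate enabled cluster-wide
nodes:
- role: control-plane

# usage: kind build node-image (from the k8s source tree),
# then: kind create cluster --config config.yaml --image <your-node-image>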
kube-apiserver YAML (static pod manifest)
Look for post-release blogs for user-friendly docs; mid-release blogs too.
When going from beta to GA, remove the feature-enablement check.
The KEP also captures all related PRs.
ML Inferencing Pipeline
NIM is an off-the-shelf inferencing framework.
NIM decides which model to use based on: how many GPUs? Which GPU? Performance criteria (throughput vs. latency)? Floating-point library. NIM can autodetect hardware.
GenAI with RAG has many services.
The NIM Operator deploys the RAG application via CRs: 1. NIMCache (PVC) 2. NIMService 3. NIMPipeline, so all services can scale together (approximate shape below).
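An approximate NIMService shape; the field names here are assumptions recalled from the NIM Operator docs, so verify against them before use:

apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama3-8b-instruct
spec:
  image:
    repository: nvcr.io/nim/meta/llama3-8b-instruct
    tag: "1.0.0"
  storage:
    nimCache:
      name: llama3-8b-instruct   # NIMCache CR that pre-pulled the model into a PVC
  replicas: 1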
NIM monitoring and autoscaling: Prometheus
1. Hardware utilization
2. Inter-token latency
3. Time to first token
4. Requests per second
Monitoring of NIM
Figures like 2 seconds and 15 chat users are the inputs for the SLA.
The NIM monitoring operator chooses relevant metrics from the many metrics NIM exposes.
Autoscaling
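A minimal sketch of autoscaling on one of the metrics above, assuming it is surfaced to the custom metrics API via Prometheus Adapter; the metric and workload names are assumptions:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nim-llm
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nim-llm                    # assumed NIM deployment name
  minReplicas: 1
  maxReplicas: 4
  metrics:
  - type: Pods
    pods:
      metric:
        name: num_requests_running   # assumed NIM/vLLM metric name
      target:
        type: AverageValue
        averageValue: "10"           # scale out beyond ~10 in-flight requests per pod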
In the sample chat application, a Milvus DB is needed; RAG is the frontend service.
ML Deployment
Canary takes more deployment time because of more iterations: a 10% increment per step means 10 iterations.
Blue-green takes less.
Fine-tune maxSurge, maxUnavailable, etc.
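A sketch of those knobs on a Deployment; names and values are illustrative, tuned for big model images where extra pods are expensive:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-server
spec:
  replicas: 10
  selector:
    matchLabels:
      app: model-server
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1           # at most one extra pod during rollout
      maxUnavailable: 0     # never dip below desired serving capacity
  template:
    metadata:
      labels:
        app: model-server
    spec:
      containers:
      - name: server
        image: registry.local/model-server:v2   # local registry: model images are big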
Traffic mirroring applies to ML loads as well.
Image compression, a local registry, etc. are needed because model sizes are big.
Service Continuity in 5G
K8s provides no inter-cluster redundancy; we need a proprietary solution or the cloud.
In telecom, a component is connected to multiple others, e.g. a vCU with vDU, EMS, and 5GC.
Sync Driver
GRAPH framework with AI, Management Data Analytics Function (MDAF), policy-driven.
The A1 interface is better than MDAF: since MDAF sits in the core network, it adds latency.
Nephio: open source project (Linux Foundation + Google).
Automates LCM of cloud infra and NFs; intent-based, declarative approach.
LinkedIn: saurabhswaraj
GitOps-based approach
1. The network driver is GR (geo-redundancy) aware.
2. The DB driver syncs the two MySQL instances.
PV YAML and DB tables are kept in sync.
Each cluster has a GRAPH controller.
There can be several other use cases as well.
GRAPH is a framework; we can develop our own drivers.
GRAPH framework = a Redundancy Manager at the orchestrator + a GRAPH controller in each cluster.
Based on the CRD, a different driver is deployed in each K8s cluster (hypothetical sketch below).
GRAPH is not open source yet; open-sourcing is in process. At present it is at the R&D stage.
GRAPH can work with many orchestrators, including Nephio.
Every DB has a replication manager, so why do we need a DB driver? The DB driver is not the novelty; the framework is.
Policy-based: define what happens when a failed site comes up again.
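Since GRAPH is not public, here is a purely hypothetical CR shape for "different driver per CRD"; every name and field is an assumption from the talk:

apiVersion: graph.example.com/v1alpha1     # hypothetical API group
kind: SyncPolicy
metadata:
  name: vcu-db-redundancy
spec:
  driver: db-sync                          # GRAPH controller deploys the matching driver
  source:
    cluster: site-a
    mysql: vcu-db
  target:
    cluster: site-b
    mysql: vcu-db
  recoveryPolicy: resync-then-promote      # what happens when the failed site comes back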
Keptn
Keptn has two operators: one for app LCM and one for metrics.
Day 1 and Day 2 operations.
Integrated with Argo.
Metrics-based scaling: HPA and KEDA (KeptnMetric sketch below).
It works for applications outside K8s too.
It works with DORA metrics.
Multi-stage application delivery.
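Keptn's metrics operator exposes metrics through a KeptnMetric CR, roughly like this; the provider name and query are illustrative, and the API version varies by Keptn release:

apiVersion: metrics.keptn.sh/v1beta1
kind: KeptnMetric
metadata:
  name: cpu-throttling
spec:
  provider:
    name: my-prometheus        # a KeptnMetricsProvider defined separately
  query: "avg(rate(container_cpu_cfs_throttled_seconds_total[5m]))"
  fetchIntervalSeconds: 10     # HPA/KEDA can then scale on this metric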
1.5 million+ TPS
Each pod has a local bucket and implements the leaky bucket algorithm.
Start with a big bucket size.
Learnings from the rate limiting journey:
Services with smaller thresholds have higher precision; higher thresholds see up to a 10% error rate.
Each pod has its own swimlane.
==========================
For inter- plus intra-cluster rate limiting, the API gateway and the service mesh need to be kept in sync.
Unified config management
Naavik knows where each service resides (from KubeCon Paris).
Canary release pattern: on the first canary call, the service registry reads from the DB and puts the value in cache; remaining calls are served from the cache with no DB access. For further optimization, the canary pod can put the value in an S3 bucket and notify all pods to read from S3.
Linkerd
* Rust-based data plane is pretty light.
* We can use any ingress controller (IC) with Linkerd. ICs are based on (1) NGINX (2) Envoy (3) HAProxy. The NGINX IC has a master/minion model.
* Security: authz policy, mTLS (sketch after this list).
* Intra-cluster URLs, so intra-cluster traffic does not need to go through the LB.
* Gateway API (GW-API) integration.
* Linkerd 2.17 has many new features.
* Canary deployment.
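A rough sketch of the authz policy mentioned above, using Linkerd's policy CRDs; the names, namespace, and client identity are illustrative:

apiVersion: policy.linkerd.io/v1beta1
kind: Server
metadata:
  name: api
  namespace: demo
spec:
  podSelector:
    matchLabels:
      app: api
  port: 8080
---
apiVersion: policy.linkerd.io/v1alpha1
kind: AuthorizationPolicy
metadata:
  name: api-authz
  namespace: demo
spec:
  targetRef:
    group: policy.linkerd.io
    kind: Server
    name: api
  requiredAuthenticationRefs:
  - group: policy.linkerd.io
    kind: MeshTLSAuthentication
    name: api-clients
---
apiVersion: policy.linkerd.io/v1alpha1
kind: MeshTLSAuthentication
metadata:
  name: api-clients
  namespace: demo
spec:
  identities:
  - "web.demo.serviceaccount.identity.linkerd.cluster.local"   # mTLS identity of the allowed client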
Object Store for Vector DB app
github.com/thotz/python-vectordbapp-ceph
It can search both images and text.
It uses an RGW bucket and an in-memory channel.
The CR has an embedding function.
There is one CR per bucket.
The CR is matched with a Milvus collection: vector dimensions, index type, vector metric, static schema (hypothetical shape below).
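A hypothetical shape for that per-bucket CR, based on the description above; the group, kind, and fields are assumptions, while the index and metric types are standard Milvus options:

apiVersion: vectordb.example.com/v1alpha1   # hypothetical API group
kind: BucketVectorizer
metadata:
  name: images
spec:
  bucket: images                # RGW bucket to watch
  embeddingFunction: clip-vit   # embedding model applied to new objects
  collection:                   # matched Milvus collection settings
    name: images
    vectorDimensions: 512
    indexType: IVF_FLAT
    metricType: COSINE
    schema: static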
Data Plane Technologies for Load Balancers
An LB has many features, and these features are supported by data plane technologies.
The data plane has many technologies, including open source:
1. eBPF
2. VPP
- L2-L4 networking
- graph-based packet processing
- fast lookup
3. DPDK
- reduces latency by bypassing kernel space
- huge pages and memory pools
Combined models: (1) eBPF + XDP (2) VPP + DPDK
KubeCon India 2024: Keynotes
- Upcoming KCDs:
1. April 2025 in Chennai
2. June 2025 in Bangalore
- Kubestronaut: takeoffs begin in 2025. In this announcement, India's map was shown incorrectly.
- Upcoming KubeCon + CloudNativeCon:
Hyderabad, August 6-7, 2025
- LF Networking will be launched in phase 1 under LF India
=============================================================
Takeaway points: Flipkart
Ambient proxy is better; a sidecar cannot scale with the pod's TPS, fan-out, etc.
For a PaaS, a controller is better than Helm for deployments.
=============================================================
Takeaway points:
- K8s is everywhere
- Nothing is complete without AI
- Stretching the limits of K8s