SPIFFE


A Kubernetes ServiceAccount (SA) is scoped to a single cluster. 

So Nephio, which spans many clusters, could not use SAs alone

Every CSP has its own workload identity mechanism

SPIFFE is the standard: 

- SPIFFE ID: a URI of the form spiffe://&lt;trust-domain&gt;/&lt;path&gt;. 

- SPIFFE Verifiable Identity Documents (SVIDs): an X.509 cert or a JWT token

- The SPIFFE Workload API. 
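As a small sketch (the trust domain and path below are hypothetical), a SPIFFE ID and its SVID relate like this:

```yaml
# A SPIFFE ID is a URI: spiffe://<trust-domain>/<workload-path>
# Hypothetical example for a 5G network function:
spiffe_id: spiffe://example.org/ns/5g-core/sa/amf
# Its SVID is one of:
#   - an X.509 cert carrying this ID in the SAN URI field, or
#   - a JWT with this ID in the `sub` claim
```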

SPIRE: the SPIFFE Runtime Environment. 

- A toolchain of APIs for establishing trust based on SPIFFE

- Provides out-of-the-box attestation plugins

SVID expiry is short; it can be 4 hours. So there is no need for revocation

* The SPIRE Agent can be co-located with the workload; it runs as a DaemonSet in K8s. 

=========

Nephio

SS7, SIGTRAN, NGIN, CN model (e.g. O-RAN)

DISH is on AWS

CP based requirement for identity

The Nephio SIG Security wiki page has all the details

Porch: Package Orchestration for kpt packages

kpt does in-place substitution (it edits values directly in the package, rather than templating)
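A minimal sketch of kpt's in-place substitution using the upstream `apply-setters` function (the package name and setter values here are hypothetical):

```yaml
# Kptfile pipeline: the mutator rewrites setter values directly in the
# package's manifests (no templating step).
apiVersion: kpt.dev/v1
kind: Kptfile
metadata:
  name: free5gc-upf            # hypothetical package
pipeline:
  mutators:
  - image: gcr.io/kpt-fn/apply-setters:v0.2
    configMap:
      cluster-name: edge-01    # written in place into the manifests
```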

5G requirements / use cases

IMS, SMO

An LF article about the Nephio SPIFFE implementation is on the LF wiki

Catalog packages are delivered via GitOps

Each cluster shall have its own repo

Identity federation is based on cert chain. 

Nephio R3, Oct '23. 

It is a proposed solution; it will be upstreamed. 

The workload identity solution shall not be native to a specific cloud provider. 

Identity federation across CSPs. 

Google, Ericsson (E//), and Red Hat are in Nephio

An alternative to SPIRE may be chosen because of a specific attestation plugin

What protocol is used between the SPIRE Agent and SPIRE Server, and how is trust bootstrapped? It is a pre-provisioning aspect: an X.509 cert is pulled over a REST API with TLS. The protocol is SPIRE-specific

Today's attestation is based on the ServiceAccount, pod labels, and the namespace. 
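A sketch of how those attestation dimensions map to a SPIRE registration, using the SPIRE Controller Manager's ClusterSPIFFEID CRD (the trust domain and pod label are hypothetical):

```yaml
apiVersion: spire.spiffe.io/v1alpha1
kind: ClusterSPIFFEID
metadata:
  name: nephio-workloads
spec:
  # SPIFFE ID built from namespace and ServiceAccount at attestation time
  spiffeIDTemplate: "spiffe://{{ .TrustDomain }}/ns/{{ .PodMeta.Namespace }}/sa/{{ .PodSpec.ServiceAccountName }}"
  podSelector:
    matchLabels:
      app: nf-simulator        # hypothetical pod label
```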

A CA / cert-manager can be used. 

Network Automation


Telecom networks are complex due to being multi-layer and multi-vendor

N/w management -> SDN -> intent-based networking (programmable and declarative) -> cloud-native networking

Earlier: monolithic NMS with FCAPS

Now: CI/CD, microservices, K8s. 

NSP (Network Services Platform) is for the IP and optical domains

It has API (OpenAPI Spec). 

Model-driven mediation

Framework has orchestration 

Contributed by Nokia: Kubenet, gNMIc, SDCIO

1. Unified Artifactory Manager (UAM) Component

It uses Kubespray

UAM creates CRs; the CRs are consumed by the deployer, which is a short-lived job. 

2. Telemetry: 

A: internal NSP components

B: External system

Four Core Principles

1. Model-driven

2. Vendor & mediation agnostic

3. Horizontal scale

4. Resilient

Six Layers

6. Analytics and optimization layer

5. Output / storage layer: Kafka

4. Normalization layer

3. Mapping layer

2. Collector layer (SNMP, gNMI) 

1. N/w layer

Layers 3 and 4 make it model-driven

Architecture

UAM, Restconf GW

Source: from the network, using SNMP or gNMI

Sink: InfluxDB, Prometheus, Vertica, Kafka, PostgreSQL, file

Source and sink are connected using NATS. NATS is also connected to multiple transform workers, using a transformer CR from UAM

gNMIc

1. Single mode

2. CLI mode (with an auto-complete option)

3. Cluster mode (multiple replicas; one is the leader). 
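A minimal gNMIc configuration sketch (target address, credentials, and paths are hypothetical); in cluster mode several replicas load the same config and elect a leader:

```yaml
targets:
  router1.example.net:57400:
    username: admin
    password: admin
    insecure: true
subscriptions:
  port-stats:
    paths:
    - /interfaces/interface/state/counters
    mode: stream
    stream-mode: sample
    sample-interval: 10s
```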

Kubenet and SDCIO

Declarative model and event-driven reconciliation; it is network automation using K8s, on GitOps principles. 

Arch:

SDCIO: Schema-Driven Configuration. 

IPAM etc. are CRDs used to build abstract network configuration. 

Config CR and ConfigSet CR, RunningConfig, UnmanagedConfig. It has a different backend: its own etcd. 

YANG is served by a schema server. 
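A sketch of what an SDCIO Config CR can look like (the group/kind follow the SDCIO project; target name, path, and values are hypothetical):

```yaml
apiVersion: config.sdcio.dev/v1alpha1
kind: Config
metadata:
  name: intf-ethernet-1-1
  labels:
    config.sdcio.dev/targetName: dev1   # target picked via label
spec:
  priority: 10
  config:
  - path: /
    value:
      interface:
      - name: ethernet-1/1
        admin-state: enable             # validated against the YANG schema
```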

==========================

BNG, CUPS-specific implementation. 

Are Kubenet and Nephio solving the same problem? There may be overlap. 

APIs for the sink? The customer provides the sink. 

Kubenet is automation; it is more than an NMS

Slide 21: Cisco Prime

Keynote 2: KubeCon India 2024


Shopify has a very large-scale deployment with AI use cases / algorithms: 

- Vector relations of products

- Credit card fraud

- Many GPUs

* GPU utilization vs. developer productivity is a trade-off. 

Challenges

1. Build v/s buy 

2. Dev experience: SkyPilot and Rocket ML

==========================

A shadow role in the K8s release team is the best place to start contributing to K8s

Cato is for AI. This is another good place to start with. 

He showed many pieces of Indian architecture, like the Taj Mahal (Agra) and Jantar Mantar (Jaipur), and inspired Indians to become the largest contributor community in the world

==========================

The Taj Mahal was also built with diversity. 

Conscious and continuous effort for diversity is invisible but important. 

Many meetings have started, and more will start, in APAC-friendly timezones

It is very hard to justify open source contribution to an employer.

Contributors shall move up to become maintainers.

==========================

2014 Stateless

2017 Stateful

2019 Serverless

2023 AI

Cloud Native AI (CNAI) working group: streamline the integration of AI with the cloud native ecosystem. 

Whitepaper CNAI

CN is an ideal platform for AI:

- Cost efficiency

- Scalability

- Containerization

- Harmony among dev, test, staging and production

- High Availability

- Microservice Architecture

CNAI from 3 perspectives

1. K8s: 

- DRA (Dynamic Resource Allocation), inspired by PV/PVC (alpha in 1.26, beta in 1.32)
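A hedged sketch of the DRA flow under the `resource.k8s.io/v1beta1` API (beta in 1.32); the DeviceClass name and images are hypothetical. A ResourceClaim is bound to a pod by name, much like a PVC:

```yaml
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  name: single-gpu
spec:
  devices:
    requests:
    - name: gpu
      deviceClassName: gpu.example.com   # hypothetical DeviceClass
---
apiVersion: v1
kind: Pod
metadata:
  name: trainer
spec:
  resourceClaims:
  - name: gpu
    resourceClaimName: single-gpu        # bind the claim, PVC-style
  containers:
  - name: train
    image: example.io/train:latest       # hypothetical image
    resources:
      claims:
      - name: gpu
```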

2. ML engineers

- Kubeflow has many projects for different use cases

- Kueue for ML batch processing

3. App Developer

- OPEA - Open Platform for Enterprise AI

website: opea.dev

1. Data prep

2. Embedding *

3. LLM/SLM *

4. Vector DB *

5. Retriever

6. Reranking

* OPEA provides recipes for all options; 20+ GenAI recipes 

They are validated on Intel, Arm, and AMD architectures

With MongoDB / Neo4j as a graph database, there is no need for a vector DB.

MinIO is the common data layer

OPEA is available on Azure, AWS

CNAI has its own landscape on CNCF website

WG

- Scheduling

- Security

- Sustainability

AI Playground: validate OPEA samples on Arm with free Oracle credits. CNAI needs people. 

==========================

1980: Data (spreadsheets)

1990: Information (databases)

2000: Knowledge (data warehouses)

2010: Insight (analytics: Hadoop, Spark)

2020: Intelligence (AI/ML)

2025+: Action (agents)

Analogy

- Agents ~ Apps

- GenAI ~ OS

- LLM ~ Kernel

Characteristics

1. Decision Making

2. Memory

3. Reasoning

4. Action

Analogy

Container - Agent

OCI runtime - LLM

Deterministic Logic - Adaptive Logic

stateless by default - stateful by nature

static resource limit - dynamic resource

Identical replicas - Unique instance

docker run -> Compose -> K8s

One agent -> multiple agents that need orchestration; here K8s fits

K8s is a universal control plane for VMs, DBs, IoT edge, Docker, WA. Agents will be yet another workload type. 

Arch: Agent Operator

1. Agent controller

2. Scheduler

3. CR

The LLM will tell the Agent Controller what agent to create. 

The Agent CR YAML will have: task, model, memory, tools, persona 
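A purely hypothetical Agent CR illustrating those fields (no such CRD is standardized; the group, kind, and every value below are invented for this sketch):

```yaml
apiVersion: agents.example.com/v1alpha1
kind: Agent
metadata:
  name: ticket-triage
spec:
  task: "Triage incoming support tickets"
  model: llama-3-70b          # which LLM backs the agent
  memory:
    type: vector-store        # where state lives (stateful by nature)
  tools:
  - web-search
  - ticket-api
  persona: "Concise, helpful support engineer"
```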

AI: CrewAI, Metaflow, Airflow

CN: Argo, Dapr, Numaflow, KServe (came out of Kubeflow)

K8s KEPs


They are not just feature proposals. A KEP is about:

- Feature Design

- Different Alternatives

- Testing plan

Feature LCM = feature introduction, updates, and deprecation; captured in the KEP

What is a KEP? KEP-0000 defines K8s Enhancement Proposals

A unit of work: a single feature, combined features, or major changes 

Capture decisions in well crafted artifacts 

KEP has YAML based metadata. 

/enhancements/keps
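The YAML metadata of a KEP typically carries fields like these (all values below are illustrative, not taken from a specific KEP):

```yaml
# keps/sig-node/0000-example-feature/kep.yaml
title: Example Feature
kep-number: 0000
authors:
- "@example-handle"
owning-sig: sig-node
status: implementable
stage: alpha
latest-milestone: "v1.29"
milestone:
  alpha: "v1.29"
  beta: "v1.30"
  stable: "v1.32"
```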

Stages: 

1. Alpha: disabled by default; enable with a feature gate

2. Beta: API groups are disabled by default; the feature is enabled by default

3. GA: everything is enabled

Demo

Git issue #114465: Add Sleep for the pod lifecycle handler 

Making sleep a first-class citizen means there is no need to add a sleep binary to the image.
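With the feature from that issue, a pod can sleep before termination natively (the image name is arbitrary; the container needs no shell or sleep binary):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: graceful-shutdown-demo
spec:
  containers:
  - name: app
    image: nginx:1.25
    lifecycle:
      preStop:
        sleep:
          seconds: 5     # kubelet-native sleep, runs before SIGTERM
```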

A KEP is not hard, but it takes time: we need to get sign-off from different people in the community.

Feature = Enhancement = KEP

Non-Goals are very important, so distraction is avoided

The Proposal section of the KEP has the technical part; then read the code

Then "Design Details" may have pseudo code

"Graduation Criteria" for all 3 stages. 

"Upgrade / Downgrade strategy" 

K8s repo has staging repository. 

Scheme (new types are registered in the Scheme)

/hack/update-codegen.sh will generate the auto-generated code

We can build a KIND node image with our own K8s code, using a config.yaml for KIND; also enable the alpha feature gate there

The gate ends up in the kube-apiserver YAML
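A KIND config sketch that loads a custom node image and flips an alpha gate (the image name is hypothetical; the gate name is the sleep-handler one from the demo):

```yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
featureGates:
  PodLifecycleSleepAction: true           # propagated to kube-apiserver/kubelet
nodes:
- role: control-plane
  image: example.io/kindest/node:custom   # built from our own k8s code
```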

Look at post-release blogs for user-friendly docs; mid-release blogs too. 

From beta to GA, remove the feature-enable check

The KEP also captures all the PRs

ML inference pipeline


NIM is an off-the-shelf inference framework

NIM is about deciding which model to use: based on how many GPUs? Which GPU? Performance criteria (throughput vs. latency)? Which floating-point library? NIM can auto-detect the hardware

GenAI with RAG has many services.

The NIM Operator deploys a RAG application via CRs: 1. NIMCache (PVC), 2. NIMService, 3. NIMPipeline. All services can scale up together. 
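A hedged sketch of the NIMCache + NIMService pairing (field names follow the NIM Operator CRDs as I understand them; the repository, size, and names are hypothetical):

```yaml
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMCache
metadata:
  name: llama3-cache
spec:
  source:
    ngc:
      modelPuller: nvcr.io/nim/meta/llama3-8b-instruct:latest
  storage:
    pvc:
      create: true
      size: 50Gi            # PVC that holds the downloaded model
---
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama3
spec:
  image:
    repository: nvcr.io/nim/meta/llama3-8b-instruct
    tag: latest
  storage:
    nimCache:
      name: llama3-cache    # serve from the shared cache
  replicas: 1
```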

NIM monitoring and autoscaling: Prometheus 

1. Hardware utilization

2. Inter-token latency

3. Time to first token

4. Requests per second

Monitoring of NIM

SLA inputs: e.g. 2 seconds latency, 15 chat users. 

The NIM monitoring operator chooses metrics from the many metrics exposed by NIM

Autoscaling

In the sample chat application, a Milvus DB is needed; RAG is the frontend service
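A hypothetical HPA wiring one of the NIM metrics above through the Prometheus adapter (the metric name and target value are assumptions, not NIM's actual exported names):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: llama3-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: llama3            # hypothetical NIM deployment
  minReplicas: 1
  maxReplicas: 4
  metrics:
  - type: Pods
    pods:
      metric:
        name: inter_token_latency_seconds   # assumed metric name
      target:
        type: AverageValue
        averageValue: "2"                   # SLA input, e.g. 2 seconds
```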