LLMOps


For AI applications, we need automation of:

1. Data preparation
2. Model tuning
3. Deployment
4. Maintenance
5. Monitoring

  • Managing dependencies adds complexity.

E2E workflow for an LLM-based application.

MLOps framework

1. Data ingestion

2. Data validation

3. Data transformation

4. Model training

5. Model analysis

6. Model serving

7. Logging

LLM System Design

Broader design of the E2E app, including front end, back end, data engineering, etc.

Chain multiple LLMs together

* Grounding: provide additional information/facts along with the prompt to the LLM.

* Track history: how the model behaved in past interactions.

LLM App

User input -> Preprocessing -> Grounding -> Prompt goes to the LLM -> LLM response -> Grounding -> Post-processing + Responsible AI -> Final output to the user.
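The flow above can be sketched as plain Python functions; every name below (preprocess, ground, call_llm, ...) is a hypothetical placeholder, not a real SDK call:

```python
# Sketch of the request flow: preprocess -> ground -> LLM -> ground -> postprocess.
def handle_request(user_input: str) -> str:
    cleaned = preprocess(user_input)    # e.g. trim/normalize the input
    prompt = ground(cleaned)            # add retrieved facts to the prompt
    raw = call_llm(prompt)              # model inference
    grounded = ground_response(raw)     # attach citations / check facts
    return postprocess(grounded)        # safety filters + formatting

# Minimal stand-ins so the sketch runs end to end:
def preprocess(text): return text.strip()
def ground(text): return f"Context: <retrieved facts>\nQuestion: {text}"
def call_llm(prompt): return f"Answer based on: {prompt!r}"
def ground_response(text): return text
def postprocess(text): return text

print(handle_request("  What is LLMOps?  "))
```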

Model Customization

1. Data Prep

2. Model Tuning

3. Evaluate

It is an iterative process.

LLMOps Pipeline (Simplified)

1. Data Preparation and versioning (for training data)

2. Supervised tuning (pipeline) 

3. Artifacts are generated: config and workflow.

- Config = configuration for the workflow, e.g. which dataset to use

- Workflow = the steps to execute

4. Pipeline execution

5. Deploy LLM

6. Prompting and predictions

7. Responsible AI

Orchestration = steps 1 + 2. Orchestration: what runs first, then the next step, and the step after that; it assures the sequence of steps.

Automation = 4 + 5

Fine-Tuning Data: Instructing the Model (Hints)

1. Rules

2. Step by step

3. Procedure

4. Example

File formats

1. JSONL: JSON Lines. Human-readable. For small and medium-sized datasets.

2. TFRecord: TensorFlow's binary record format.

3. Parquet: for large and complex datasets.
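As a sketch, instruction-tuning records can be written and read in JSONL like this (the field names input_text/output_text are illustrative, not a required schema):

```python
import json

records = [
    {"input_text": "Summarize: LLMOps automates tuning and deployment.",
     "output_text": "LLMOps automates the LLM lifecycle."},
    {"input_text": "Define grounding.",
     "output_text": "Supplying extra facts along with the prompt."},
]

# Write: one JSON object per line.
with open("tune_data.jsonl", "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")

# Read it back line by line.
with open("tune_data.jsonl") as f:
    loaded = [json.loads(line) for line in f]

print(loaded[0]["output_text"])
```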

MLOps Workflow for LLM

1. Apache Airflow

2. KubeFlow

DSL = Domain Specific Language

Decorator 

@dsl.component

@dsl.pipeline

Next, the compiler will generate a YAML file for the pipeline.

YAML file has

- components

- deploymentSpec
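In Kubeflow Pipelines the real decorators are @dsl.component and @dsl.pipeline from the kfp package, and kfp's Compiler writes the pipeline YAML. Below is a self-contained toy version of the same decorator pattern, not the actual kfp implementation:

```python
# Toy sketch of a decorator-based pipeline DSL, mimicking kfp's
# @dsl.component / @dsl.pipeline pattern (NOT the real kfp library).
registry = {"components": {}, "pipelines": {}}

def component(fn):
    """Register a function as a reusable pipeline step."""
    registry["components"][fn.__name__] = fn
    return fn

def pipeline(fn):
    """Register a function that wires components together."""
    registry["pipelines"][fn.__name__] = fn
    return fn

@component
def prepare_data(source: str) -> str:
    return f"dataset from {source}"

@component
def tune_model(dataset: str) -> str:
    return f"model tuned on [{dataset}]"

@pipeline
def tuning_pipeline(source: str) -> str:
    return tune_model(prepare_data(source))

print(sorted(registry["components"]))   # registered step names
print(tuning_pipeline("gs://bucket/data.jsonl"))
```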

Pipeline can be run on

- K8s

- Vertex AI Pipelines, which execute the pipeline in a serverless environment

PipelineJob takes inputs

1. Template path: pipeline.yaml

2. Display name

3. Parameters

4. Location: data center / region

5. Pipeline root: location for temporary/staging files
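The inputs above map onto constructor arguments roughly as follows; the argument names follow google.cloud.aiplatform.PipelineJob, and all values are placeholders:

```python
# Sketch of the inputs a PipelineJob takes; all values are placeholders.
pipeline_job_args = {
    "template_path": "pipeline.yaml",        # compiled pipeline definition
    "display_name": "llm-tuning-job",        # human-readable job name
    "parameter_values": {                    # pipeline parameters
        "training_data_uri": "gs://my-bucket/tune_data.jsonl",
    },
    "location": "us-central1",               # which data center / region
    "pipeline_root": "gs://my-bucket/tmp",   # staging area for temp files
}

# With the Vertex AI SDK this would be submitted roughly as:
# from google.cloud import aiplatform
# aiplatform.PipelineJob(**pipeline_job_args).submit()
print(sorted(pipeline_job_args))
```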

Open Source Pipeline

https://us-kfp.pkg.dev/ml-pipeline/large-language-model-pipelines/tune-large-model/v2.0.0

Deployment

Batch and REST

1. Batch, e.g. processing customer reviews. Not real time.

2. REST API, e.g. chat. More like real time.

* pprint is a library to pretty-print (format) output.

The LLM provides output plus 'safetyAttributes'

- blocked

* We can also find citations in the LLM's output.
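A sketch of inspecting such a response with pprint; the response shape here is illustrative, loosely following Vertex AI's safetyAttributes and citation fields, not an exact schema:

```python
from pprint import pprint

# Illustrative response shape; real field names vary by provider and version.
response = {
    "predictions": [{
        "content": "LLMOps automates the LLM lifecycle.",
        "safetyAttributes": {"blocked": False, "categories": []},
        "citationMetadata": {"citations": []},
    }]
}

pprint(response, width=60)   # pretty-print the nested structure

blocked = response["predictions"][0]["safetyAttributes"]["blocked"]
print("blocked:", blocked)
```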

===========

Vertex AI SDK

https://cloud.google.com/vertex-ai

BigQuery 

https://cloud.google.com/bigquery

sklearn

To split data 80/20 for training and evaluation.
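A pure-Python sketch of the 80/20 split; sklearn's train_test_split (from sklearn.model_selection, with test_size=0.2) does the same job:

```python
import random

def split_80_20(data, test_size=0.2, seed=42):
    """Pure-Python sketch of sklearn's train_test_split."""
    items = list(data)
    random.Random(seed).shuffle(items)       # deterministic shuffle
    cut = int(len(items) * (1 - test_size))  # 80% boundary
    return items[:cut], items[cut:]

train, evaluation = split_80_20(range(100))
print(len(train), len(evaluation))   # 80 20
```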

Building AI/ML apps in Python with BigQuery DataFrames | Google Cloud Blog

===========

K8s GW API


Examples: 

Istio, Kong Gateway, Envoy Gateway, Gloo, Traefik, and many more, as per https://gateway-api.sigs.k8s.io/implementations/#gateway-controller-implementation-status


Protocols: gRPC, HTTP/2, and WebSockets


The structure of a Kubernetes Custom Resource Definition (CRD) or manifest file is referred to as an API, because it mirrors the structure of the API in the Kubernetes control plane.


Migration from ingress https://gateway-api.sigs.k8s.io/guides/migrating-from-ingress/#migrating-from-ingress


primary extension points:


1. External references


2. Custom implementations


3. Policies


The Gateway API (GW API) is not an API gateway.


1. GatewayClass 

- It is cluster-scoped, so it has no namespace

- Annotations on the GatewayClass carry vendor-specific configuration

- It defines controller capabilities
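A minimal GatewayClass manifest as a sketch; the controller name is vendor-specific, and example.com/gateway-controller is a placeholder:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: example-class                             # cluster-scoped: no namespace field
spec:
  controllerName: example.com/gateway-controller  # vendor-specific controller
```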

2. Gateway

- Each Gateway defines one or more listeners, which are the ingress points to the cluster

- You can control which services can be connected to this listener (allowedRoutes) by way of their namespace — this defaults to the same namespace as the Gateway 

- Advanced features like

-- request mirroring, 

-- direct response injection, 

-- and fine-grained traffic metrics

-- Traffic split

- In Istio APIs, a Gateway configures an existing gateway Deployment/Service that has been deployed. In the Gateway APIs, the Gateway resource both configures and deploys a gateway

- One can attach an HPA and a PodDisruptionBudget to the gateway deployment.
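A sketch of a Gateway with one listener and an allowedRoutes restriction; all names and the label selector are illustrative:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: example-gateway
  namespace: infra
spec:
  gatewayClassName: example-class
  listeners:
  - name: http                    # ingress point to the cluster
    protocol: HTTP
    port: 80
    hostname: "*.example.com"
    allowedRoutes:                # which namespaces may attach routes;
      namespaces:                 # default is Same (the Gateway's namespace)
        from: Selector
        selector:
          matchLabels:
            shared-gateway-access: "true"
```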

3. HTTPRoute:

- Match on any combination of hostname, path, header values, and query parameters.

- The hostname (optional) on the HTTPRoute must match the hostname on the Gateway listener (Gateway -> Listener -> hostname)

- A reference to the Gateway to use (in parentRefs), by name and namespace

- The backendRefs define the service(s) to route matched requests to

- Advanced pattern matching and filtering on arbitrary headers as well as paths.

- In the Istio VirtualService, all protocols are configured within a single resource. In the Gateway APIs, each protocol type has its own resource, such as HTTPRoute and TCPRoute.

- Route and Gateway can be in different namespace
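A sketch of an HTTPRoute attaching to a Gateway in another namespace; all names, hostnames, and the header match are illustrative:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: example-route
  namespace: app              # may differ from the Gateway's namespace
spec:
  parentRefs:                 # which Gateway to attach to, by name/namespace
  - name: example-gateway
    namespace: infra
  hostnames:
  - "app.example.com"         # must match the listener's hostname pattern
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /api
      headers:                # header-based matching
      - name: x-canary
        value: "true"
    backendRefs:              # service(s) to route matched requests to
    - name: app-svc
      port: 8080
```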

4. TLSRoute *

5. GRPCRoute

6. TCPRoute *

* Not yet v1 (GA)

Details: https://gateway-api.sigs.k8s.io/reference/spec/


If you are using a service mesh, it would be highly desirable to use the same API resources to configure both ingress traffic routing and internal traffic, similar to the way Istio uses VirtualService to configure route rules for both. Fortunately, the Kubernetes Gateway API is working to add this support. Although not as mature as the Gateway API for ingress traffic, an effort known as the Gateway API for Mesh Management and Administration (GAMMA) initiative is underway to make this a reality and Istio intends to make Gateway API the default API for all of its traffic management in the future.

https://gateway-api.sigs.k8s.io/mesh/


The Gateway controller handles North-South traffic; the mesh controller handles East-West traffic.


7. ReferenceGrant: for cross-namespace references.
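A sketch of a ReferenceGrant allowing HTTPRoutes in one namespace to reference Services in another; namespaces and names are illustrative (ReferenceGrant is still v1beta1):

```yaml
apiVersion: gateway.networking.k8s.io/v1beta1
kind: ReferenceGrant
metadata:
  name: allow-routes-to-backends
  namespace: backend-ns           # namespace that OWNS the referenced Services
spec:
  from:
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    namespace: app                # routes here may reference backends below
  to:
  - group: ""                     # core API group
    kind: Service
```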


Low Cost Cloud


 


NVIDIA GTC25: Telecom Special Address


LTM (Large Telecom Model): SoftBank is a pioneer. Here is a white paper by GSMA: https://www.gsma.com/get-involved/gsma-foundry/gsma_resources/white-paper-large-telecom-models/


Llama Nemotron reasoning model. Open-sourced by NVIDIA on Hugging Face (HF).

https://www.nvidia.com/en-in/ai-data-science/foundation-models/nemotron/

https://arxiv.org/pdf/2505.00949


An AI Factory is a specialized, integrated infrastructure designed to manage the entire AI lifecycle, from data ingestion to model training and deployment for real-time inference.

AI Grid is a network of small, highly specialized AI communities. The members of AI Grid share their research work within these communities, initiate collaborations and establish fruitful connections for the future. https://lightning.ai/

https://ai-ran.org/


Building Blocks of the NVIDIA AI Aerial Platform: 

1. NVIDIA Aerial CUDA-Accelerated RAN

2. NVIDIA Aerial AI Radio Frameworks

3. NVIDIA Aerial Omniverse Digital Twin

Reference

NVIDIA GTC25: Telecom Special Address

AI Bootcamp for students


8-Day Live Online Workshop: Make Your Child Future-Ready with AI, by Times of India


https://www.notion.com/product Documentation

https://www.todoist.com/ To Do List

https://gamma.app/ For presentations

https://openai.com/index/sora/ Cinematic Video 

https://www.midjourney.com/home Art-grade visuals for storytelling

https://ideogram.ai/t/explore Typography to image. Communicate in style

https://lovable.dev/ No code web apps

https://n8n.io/ Workflow automation tool

Few more tools

TachyonGPT accelerates the project planning process, potentially saving weeks of effort. This powerful AI assistant allows you to create a complex backlog structure for your project in very little time. TachyonGPT gives you the power to improve existing work items or generate new work items based on brief titles or descriptions. https://marketplace.visualstudio.com/items?itemName=Neudesic.TachyonGPT

Windsurf editor and Cascade: agentic code IDE.


Reference: https://economictimes.indiatimes.com/masterclass/ai-for-students

https://www.msn.com/en-in/money/news/chatgpt-to-google-gemini-top-5-ai-tools-to-enhance-productivity-mostly-free/ar-AA1GRlt1