LLMOps
For AI applications, we need automation of:
1. Data preparation
2. Model tuning
3. Deployment
4. Maintenance
5. Monitoring
- Managing dependencies adds complexity.
E2E workflow for LLM-based applications.
MLOps framework
1. Data ingestion
2. Data validation
3. Data transformation
4. Model training
5. Model analysis
6. Model serving
7. Logging
LLM System Design
Broader design of the E2E app, including front end, back end, data engineering, etc.
Chain multiple LLMs together
* Grounding: provides additional information/facts along with the prompt to the LLM.
* Track history: keep track of how the model has responded in the past.
LLM App
User input -> Preprocessing -> Grounding -> Prompt sent to the LLM -> LLM response -> Grounding -> Post-processing + Responsible AI checks -> Final output to the user.
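The flow above can be sketched as a chain of functions. This is a minimal illustration only; every helper below is a hypothetical placeholder, not a real library API.

```python
# Minimal sketch of the LLM app flow; all helpers are hypothetical stand-ins.

def preprocess(user_input: str) -> str:
    # e.g. strip whitespace, normalize the input
    return user_input.strip()

def ground(prompt: str, facts: list[str]) -> str:
    # Grounding: prepend additional facts/context to the prompt
    context = "\n".join(facts)
    return f"Context:\n{context}\n\nQuestion: {prompt}"

def call_llm(prompt: str) -> str:
    # Stand-in for a real model call (e.g. a deployed endpoint)
    return f"[model answer based on: {prompt[:30]}...]"

def postprocess(response: str) -> str:
    # Post-processing + Responsible AI checks would go here
    blocked_terms = ["unsafe"]
    if any(t in response for t in blocked_terms):
        return "[response blocked]"
    return response

def run_app(user_input: str, facts: list[str]) -> str:
    prompt = ground(preprocess(user_input), facts)
    return postprocess(call_llm(prompt))

answer = run_app("  What is LLMOps?  ", ["LLMOps automates the LLM lifecycle."])
```

In a real app, `call_llm` would hit a model endpoint and `postprocess` would apply actual safety filters.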
Model Customization
1. Data Prep
2. Model Tuning
3. Evaluate
It is an iterative process.
LLMOps Pipeline (Simplified)
1. Data Preparation and versioning (for training data)
2. Supervised tuning (pipeline)
3. Artifacts generated = config and workflow
- Config = configuration for the workflow, e.g. which dataset to use
- Workflow = the steps to run
4. Pipeline execution
5. deploy LLM
6. Prompting and predictions
7. Responsible AI
Orchestration = steps 1 + 2. Orchestration decides what runs first, what runs next, and so on: it ensures the correct sequence of steps.
Automation = 4 + 5
Fine-tuning data: instructions (hints) for the model
1. Rules
2. Step-by-step guidance
3. Procedures
4. Examples
File formats
1. JSONL (JSON Lines): human-readable; for small and medium-size datasets.
2. TFRecord: TensorFlow's binary record format.
3. Parquet: for large and complex datasets.
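A quick JSONL sketch: one JSON object per line, which is what makes it human-readable and easy to stream. The field names below are illustrative, not a fixed schema.

```python
import json

# Illustrative instruction-tuning records (field names are assumptions)
records = [
    {"input_text": "Summarize: LLMOps automates the LLM lifecycle.",
     "output_text": "LLMOps = automation of the LLM lifecycle."},
    {"input_text": "Translate to French: hello",
     "output_text": "bonjour"},
]

# Write JSONL: one JSON document per line
with open("train.jsonl", "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")

# Read it back: each line parses independently
with open("train.jsonl") as f:
    loaded = [json.loads(line) for line in f]
```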
MLOps Workflow for LLM
1. Apache Airflow
2. KubeFlow
DSL = Domain-Specific Language
Decorators:
@dsl.component
@dsl.pipeline
Next, the compiler generates a YAML file for the pipeline.
YAML file has
- components
- deploymentSpec
Pipeline can be run on
- K8s
- Vertex AI Pipelines, which executes the pipeline in a serverless environment
PipelineJob takes inputs
1. Template path: pipeline.yaml
2. Display name
3. Parameters
4. Location: region (data center)
5. Pipeline root: staging location for temporary pipeline files
Open Source Pipeline
https://us-kfp.pkg.dev/ml-pipeline/large-language-model-pipelines/tune-large-model/v2.0.0
Deployment
Batch and REST
1. Batch, e.g. processing customer reviews; not real time.
2. REST API, e.g. chat; closer to real time.
* pprint is a Python module for pretty-printing/formatting output
The LLM returns its output along with 'safetyAttributes'
- blocked
* We can also find citations in the LLM output
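A sketch of inspecting such a response with pprint. The field names (`safetyAttributes.blocked`, `citationMetadata`) follow the Vertex AI text response discussed above, but the values below are made up for illustration.

```python
from pprint import pprint

# Hypothetical LLM prediction response (values are illustrative)
response = {
    "content": "LLMOps automates the LLM lifecycle. [1]",
    "safetyAttributes": {
        "blocked": False,          # whether the output was blocked
        "categories": ["Finance"],
        "scores": [0.1],
    },
    "citationMetadata": {
        "citations": [
            {"url": "https://example.com/llmops", "startIndex": 0, "endIndex": 38}
        ]
    },
}

# pprint formats nested structures readably
pprint(response["safetyAttributes"])

blocked = response["safetyAttributes"]["blocked"]
citations = response["citationMetadata"]["citations"]
```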
===========
Vertex AI SDK
https://cloud.google.com/vertex-ai
BigQuery
https://cloud.google.com/bigquery
sklearn
To split the data 80-20% for training and evaluation.
Building AI/ML apps in Python with BigQuery DataFrames | Google Cloud Blog
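The 80-20 split mentioned above, sketched with scikit-learn; `data` is a stand-in for whatever rows you actually load (e.g. from BigQuery).

```python
from sklearn.model_selection import train_test_split

# Placeholder dataset: 100 rows standing in for real records
data = list(range(100))

# 80% training, 20% evaluation; random_state fixed for reproducibility
train, evaluation = train_test_split(data, test_size=0.2, random_state=42)
```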
===========
K8s GW API
Examples:
Istio, Kong Gateway, Envoy Gateway, Gloo, Traefik, and many more; see https://gateway-api.sigs.k8s.io/implementations/#gateway-controller-implementation-status
Protocols: gRPC, HTTP/2, and WebSockets
The structure of a Kubernetes Custom Resource Definition (CRD) or manifest file is referred to as an API, because it defines the structure of the corresponding API in the Kubernetes control plane.
Migration from ingress https://gateway-api.sigs.k8s.io/guides/migrating-from-ingress/#migrating-from-ingress
primary extension points:
1. External references
2. Custom implementations
3. Policies
The Gateway API is not an API gateway
1. GatewayClass
- It is cluster-scoped, so it has no namespace
- Vendor-specific settings go in annotations on the GatewayClass
- It defines the controller's capabilities
2. Gateway
- Each Gateway defines one or more listeners, which are the ingress points to the cluster
- You can control which routes can attach to this listener (allowedRoutes) by way of their namespace; this defaults to the same namespace as the Gateway
- Advanced features like
-- request mirroring
-- direct response injection
-- fine-grained traffic metrics
-- traffic split
- In Istio APIs, a Gateway configures an existing gateway Deployment/Service that has been deployed. In the Gateway APIs, the Gateway resource both configures and deploys a gateway
- One can attach an HPA and a PodDisruptionBudget to the gateway Deployment.
3. HTTPRoute:
- Matches on any combination of hostname, path, header values, and query parameters.
- The hostname (optional) on the HTTPRoute must match the hostname on the Gateway listener (Gateway -> Listener -> hostname)
- The Gateway to use is referenced by name and namespace in parentRefs
- The backendRefs define the Service(s) to route matching requests to
- advanced pattern matching and filtering on arbitrary headers as well as paths.
- In the Istio VirtualService, all protocols are configured within a single resource. In the Gateway APIs, each protocol type has its own resource, such as HTTPRoute and TCPRoute.
- Route and Gateway can be in different namespace
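A minimal HTTPRoute manifest tying the points above together; all names, namespaces, and hostnames here are illustrative.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: demo-route          # illustrative name
  namespace: app-ns         # can differ from the Gateway's namespace
spec:
  parentRefs:               # the Gateway to attach to, by name and namespace
  - name: demo-gateway
    namespace: infra-ns
  hostnames:
  - "shop.example.com"      # must match a hostname on the Gateway listener
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /api
      headers:               # matching on an arbitrary header
      - name: x-canary
        value: "true"
    backendRefs:             # traffic split across two backends
    - name: api-v1
      port: 8080
      weight: 90
    - name: api-v2
      port: 8080
      weight: 10
```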
* 4. TLSRoute
5. GRPCRoute
* 6. TCPRoute
* = not yet v1 (GA)
Details: https://gateway-api.sigs.k8s.io/reference/spec/
If you are using a service mesh, it would be highly desirable to use the same API resources to configure both ingress traffic routing and internal traffic, similar to the way Istio uses VirtualService to configure route rules for both. Fortunately, the Kubernetes Gateway API is working to add this support. Although not as mature as the Gateway API for ingress traffic, an effort known as the Gateway API for Mesh Management and Administration (GAMMA) initiative is underway to make this a reality and Istio intends to make Gateway API the default API for all of its traffic management in the future.
https://gateway-api.sigs.k8s.io/mesh/
The Gateway controller handles north-south traffic; the mesh controller handles east-west traffic.
7. ReferenceGrant: for cross-namespace references.
Low Cost Cloud
NVIDIA GTC25: Telecom Special Address
LTM (Large Telecom Model): SoftBank is a pioneer. White paper by GSMA: https://www.gsma.com/get-involved/gsma-foundry/gsma_resources/white-paper-large-telecom-models/
Llama Nemotron reasoning model. Open-sourced by NVIDIA on Hugging Face.
https://www.nvidia.com/en-in/ai-data-science/foundation-models/nemotron/
https://arxiv.org/pdf/2505.00949
AI Factory is a specialized, integrated infrastructure designed to manage the entire AI lifecycle, from data ingestion to model training and deployment for real-time inference
AI Grid is a network of small, highly specialized AI communities. The members of AI Grid share their research work within these communities, initiate collaborations and establish fruitful connections for the future. https://lightning.ai/
Building Blocks of the NVIDIA AI Aerial Platform:
1. NVIDIA Aerial CUDA-Accelerated RAN
2. NVIDIA Aerial AI Radio Frameworks
3. NVIDIA Aerial Omniverse Digital Twin
Reference
AI Bootcamp for Students
8-day live online workshop ("Make Your Child Future-Ready with AI") by Times of India
https://www.notion.com/product - documentation
https://www.todoist.com/ - to-do list
https://gamma.app/ - presentations
https://openai.com/index/sora/ - cinematic video
https://www.midjourney.com/home - art-grade visuals for storytelling
https://ideogram.ai/t/explore - typography to image; communicate in style
https://lovable.dev/ - no-code web apps
https://n8n.io/ - workflow automation tools
Few more tools
TachyonGPT accelerates the project-planning process, potentially saving weeks of effort. This AI assistant lets you create a complex backlog structure for your project in very little time, and can improve existing work items or generate new ones from brief titles or descriptions. https://marketplace.visualstudio.com/items?itemName=Neudesic.TachyonGPT
Windsurf editor and Cascade: agentic code IDE
Reference: https://economictimes.indiatimes.com/masterclass/ai-for-students
https://www.msn.com/en-in/money/news/chatgpt-to-google-gemini-top-5-ai-tools-to-enhance-productivity-mostly-free/ar-AA1GRlt1