LLMOps
For an AI application, we need to automate:
1. Data preparation
2. Model tuning
3. Deployment
4. Maintenance
5. Monitoring
- Managing dependencies adds complexity.
LLMOps covers the E2E workflow for an LLM-based application.
MLOps framework
1. data ingestion
2. data validation
3. data transformation
4. model
5. model analysis
6. serving model
7. logging.
LLM System Design
Broader design of the E2E app, including front end, back end, data engineering, etc.
Chain multiple LLMs together
* Grounding: provides additional information/facts along with the prompt to the LLM.
* Track history: how the application behaved in the past.
LLM App
User input->Preprocessing->grounding->prompt goes to LLM model->LLM Response->Grounding->Post processing + Responsible AI->Final output to user.
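The flow above can be sketched as a chain of small functions. This is a minimal illustration, not a real implementation: `FACTS`, `BLOCKED_WORDS`, `fake_llm`, and every function name are hypothetical stand-ins, and the second grounding pass is reduced to a simple "is the answer backed by the retrieved context" check.

```python
from typing import Callable

# Hypothetical fact store used for grounding; a real app would query a database or index.
FACTS = {"refund": "Refunds are processed within 5 business days."}
# Hypothetical stand-in for a Responsible AI filter.
BLOCKED_WORDS = {"password"}


def preprocess(user_input: str) -> str:
    """Normalize whitespace and case before building the prompt."""
    return " ".join(user_input.split()).lower()


def ground(text: str) -> str:
    """Return stored facts whose keyword appears in the text."""
    return "\n".join(fact for key, fact in FACTS.items() if key in text)


def fake_llm(prompt: str) -> str:
    """Placeholder model: answers with the first context line, if there is one."""
    context_line = prompt.splitlines()[0].removeprefix("Context: ")
    return context_line if context_line else "I do not know."


def postprocess(response: str) -> str:
    """Trim the response and withhold it if it contains a blocked word."""
    text = response.strip()
    if any(word in text.lower() for word in BLOCKED_WORDS):
        return "[response withheld]"
    return text


def run_app(user_input: str, llm: Callable[[str], str] = fake_llm) -> str:
    cleaned = preprocess(user_input)
    context = ground(cleaned)
    prompt = f"Context: {context}\nQuestion: {cleaned}"
    response = llm(prompt)
    # Second grounding pass: fall back to the retrieved fact if the answer ignores it.
    if context and context not in response:
        response = context
    return postprocess(response)


print(run_app("  How do I get a REFUND? "))
```

Swapping `fake_llm` for a real model call only requires passing a different `llm` callable to `run_app`.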
Model Customization
1. Data Prep
2. Model Tuning
3. Evaluate
It is an iterative process.
LLMOps Pipeline (Simplified)
1. Data Preparation and versioning (for training data)
2. Supervised tuning (pipeline)
3. Artifacts are generated: config and workflow.
- Config = configuration for the workflow, e.g. which dataset to use
- Workflow = steps
4. Pipeline execution
5. deploy LLM
6. Prompting and predictions
7. Responsible AI
Orchestration = 1 + 2. Orchestration defines what runs first, what runs next, and so on; it assures the sequence of steps.
Automation = 4 + 5
Fine Tuning Data Model using Instructions (Hint)
1. rules
2. step by step
3. procedure
4. example
File formats
1. JSONL: JSON Lines. Human readable. For small and medium-sized datasets.
2. TFRecord
3. Parquet: for large and complex datasets.
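As a small illustration of the JSONL format, the sketch below writes one JSON object per line and reads the file back using only the standard library. The field names `input_text` and `output_text` and the file name are assumptions made for the example, not a required schema.

```python
import json
import os
import tempfile

# Hypothetical tuning examples; the field names are assumptions.
examples = [
    {"input_text": "Summarize: The cat sat on the mat.", "output_text": "A cat sat on a mat."},
    {"input_text": "Translate to French: hello", "output_text": "bonjour"},
]


def write_jsonl(path, records):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read the file back, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


path = os.path.join(tempfile.mkdtemp(), "tune_data.jsonl")
write_jsonl(path, examples)
loaded = read_jsonl(path)
print(len(loaded), "records loaded")
```

Because each line is an independent JSON document, the file can be streamed or appended to without parsing the whole dataset.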
MLOps Workflow for LLM
1. Apache Airflow
2. KubeFlow
DSL = Domain Specific Language
Decorator
@dsl.component
@dsl.pipeline
Next, the compiler generates a YAML file for the pipeline.
YAML file has
- components
- deploymentSpec
The pipeline can be run on
- K8s
- Vertex AI Pipelines, which executes the pipeline in a serverless environment
PipelineJob takes these inputs
1. Template path: pipeline.yaml
2. Display name
3. Parameters
4. Location: data center
5. Pipeline root: temporary file location
Open Source Pipeline
https://us-kfp.pkg.dev/ml-pipeline/large-language-model-pipelines/tune-large-model/v2.0.0
Deployment
Batch and REST
1. Batch, e.g. customer reviews. Not real time.
2. REST API, e.g. chat. Closer to real time.
* pprint is a Python library for pretty-printing output.
LLM provides output and 'safetyAttributes'
- blocked
* The output of the LLM can also include citations.
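A response like the one described above could be handled as follows. This is only a sketch: the dictionary is a hand-written stand-in, and the `content` and `citationMetadata` field names are assumptions; only `safetyAttributes` and `blocked` come from the notes.

```python
from pprint import pprint

# Hypothetical response; real field names and nesting may differ.
response = {
    "content": "Kubernetes schedules containers across a cluster.",
    "safetyAttributes": {"blocked": False, "categories": [], "scores": []},
    "citationMetadata": {"citations": [{"url": "https://example.com/doc"}]},
}


def extract_answer(resp):
    """Return (text, citations), or (None, []) when the response was blocked."""
    safety = resp.get("safetyAttributes", {})
    if safety.get("blocked", False):
        return None, []
    citations = resp.get("citationMetadata", {}).get("citations", [])
    return resp.get("content", ""), citations


text, citations = extract_answer(response)
pprint({"text": text, "citations": citations})
```

Checking `blocked` before reading the text keeps a withheld response from being shown to the user as an empty answer.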
===========
vertexAI SDK
https://cloud.google.com/vertex-ai
BigQuery
https://cloud.google.com/bigquery
sklearn
Used to split the data 80/20 for training and evaluation.
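The 80/20 split can be sketched with the standard library alone. In scikit-learn the equivalent call is `train_test_split(rows, test_size=0.2)`; the helper name `train_eval_split` and the sample rows below are made up for illustration.

```python
import random


def train_eval_split(rows, eval_fraction=0.2, seed=42):
    """Shuffle a copy of rows and split it into (train, eval)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_eval = int(round(len(shuffled) * eval_fraction))
    return shuffled[n_eval:], shuffled[:n_eval]


rows = [{"id": i} for i in range(10)]
train, evaluation = train_eval_split(rows)
print(len(train), "training rows,", len(evaluation), "evaluation rows")
```

A fixed seed makes the split reproducible, which matters when tuned models are compared across runs.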
Building AI/ML apps in Python with BigQuery DataFrames | Google Cloud Blog
===========
NVIDIA GTC25: Telecom Special Address
LTM (Large Telecom Model): SoftBank is a pioneer. Here is the white paper by GSMA: https://www.gsma.com/get-involved/gsma-foundry/gsma_resources/white-paper-large-telecom-models/
Llama Nemotron reasoning model: open-sourced by NVIDIA on Hugging Face (HF)
https://www.nvidia.com/en-in/ai-data-science/foundation-models/nemotron/
https://arxiv.org/pdf/2505.00949
AI Factory is a specialized, integrated infrastructure designed to manage the entire AI lifecycle, from data ingestion to model training and deployment for real-time inference
AI Grid is a network of small, highly specialized AI communities. The members of AI Grid share their research work within these communities, initiate collaborations and establish fruitful connections for the future. https://lightning.ai/
Building Blocks of the NVIDIA AI Aerial Platform:
1. NVIDIA Aerial CUDA-Accelerated RAN
2. NVIDIA Aerial AI Radio Frameworks
3. NVIDIA Aerial Omniverse Digital Twin
Reference
AI Bootcamp for Students
8-day live online workshop: "Make Your Child Future-Ready with AI"
by Times of India
https://www.notion.com/product Documentation
https://www.todoist.com/ To Do List
https://gamma.app/ For presentation
https://openai.com/index/sora/ Cinematic Video
https://www.midjourney.com/home Art Grade Visuals for story telling
https://ideogram.ai/t/explore Typography to image. Communicate in style
https://lovable.dev/ No code web apps
https://n8n.io/ Workflow automation tools
Few more tools
TachyonGPT accelerates the project planning process, potentially saving weeks of effort. This AI assistant lets you create a complex backlog structure for your project in very little time. TachyonGPT can improve existing work items or generate new work items based on brief titles or descriptions. https://marketplace.visualstudio.com/items?itemName=Neudesic.TachyonGPT
Windsurf editor and Cascade: agentic code IDE
Reference: https://economictimes.indiatimes.com/masterclass/ai-for-students
https://www.msn.com/en-in/money/news/chatgpt-to-google-gemini-top-5-ai-tools-to-enhance-productivity-mostly-free/ar-AA1GRlt1
Regional LLM, SLM, TinyML Language Learning
New Language Learning
Want to learn a new language this summer? Explore these expert-led platforms
Mobile app https://youtu.be/jyffkeM9GB0
Regional LLM
Sarvam AI launches Bulbul-v2, its voice model with support for 11 Indian languages
https://asr.iitm.ac.in/
BharatGen https://bharatgen.tech/ "BharatGen: First Indigenous Language AI Model Launched in India; Will Translate in 22 Languages and Ease Communication Challenges" (Amar Ujala, Hindi news) and "Google to collaborate with IIT Bombay’s BharatGen to build indigenous Indic language model"
AIKosha https://aikosha.indiaai.gov.in/home It looks like a Hugging Face-style website for India. https://aikosha.indiaai.gov.in/home/resources?from= resource-detail has some PDF books. https://aikosha.indiaai.gov.in/home/toolkit has a list of popular AI-related tools.
Google for Education
URL for AI/GenAI
LLM
Inside The Brain Of An LLM: What Makes AI So Powerful?
Landscape
https://landscape.lfai.foundation/
https://landscape.pytorch.org/
Models for coding
1. Qwen2.5 Coder
2. Granite Code
3. CodeGemma
4. DeepSeek Coder V2
5. StarCoder 2
6. Code Llama
7. Codestral
Coding tool Google Opal is a new vibe coding app and here's how you can try it for free
Interviews
Pioneering Innovation in Cloud and AI Transformation Done By Chandrakanth Devarakadra Anantha
Innovation in Machine Learning & Engineering Leadership by Pratik Parekh
Amazing Innovation in Telecom Cloud: The Journey of Jayavelan Jayabalan
Hands-on
https://www.youtube.com/watch?v=RQFfK7xIL28
AI for Observability
The speaker explains his solution for adding AI to observability, where observability includes logs, traces, and metrics.
Features
It does not embed log messages. Even the most sophisticated GenAI model accepts at most about 2 million tokens, and logs can generate that many in 2 seconds. So the solution needs to feed the right data to the AI. It understands from the logs which fields should be fed as initial values, and the AI then instructs it to feed more data.
It creates a visualization dashboard based on the question.
It has levels from 0 (manual observability) to 4 (full observability).
It uses AWS Bedrock to address privacy and compliance concerns.
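The idea of feeding only the right log fields under a token limit can be sketched as below. This is not the speaker's implementation: the function names, the sample logs, and the word-count token estimate are all assumptions made for illustration.

```python
def approx_tokens(text):
    """Very rough token estimate: whitespace-separated words (an assumption)."""
    return len(text.split())


def select_log_context(records, fields, token_budget):
    """Keep only the chosen fields, adding records until the budget is used up."""
    selected, used = [], 0
    for record in records:
        line = " ".join(f"{name}={record[name]}" for name in fields if name in record)
        cost = approx_tokens(line)
        if used + cost > token_budget:
            break
        selected.append(line)
        used += cost
    return selected, used


# Hypothetical log records.
logs = [
    {"ts": "10:00:01", "level": "ERROR", "service": "cart", "msg": "timeout calling payment", "trace": "abc123"},
    {"ts": "10:00:02", "level": "INFO", "service": "cart", "msg": "retrying", "trace": "abc123"},
    {"ts": "10:00:03", "level": "ERROR", "service": "payment", "msg": "db connection refused", "trace": "abc123"},
]

context, used = select_log_context(logs, fields=["level", "service", "msg"], token_budget=10)
print(context, used)
```

If the model asks for more detail, the same helper can be called again with extra fields such as `trace` or with a larger budget.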
In the future, GenAI in the solution:
- will understand deployments
- will understand changes between deployments and their impact: cost, error increase or decrease
- can go to the GitHub repo to learn what changed
- can fix the code
- can then write a unit test (UT) so the issue cannot happen again
This makes for a much more stable environment. It can also enable autonomous cluster configuration.
At present, the solution has
- the ability to analyze exceptions, i.e. root cause analysis (RCA) of an exception. It is not 100% accurate all the time. It gives a list of the actions taken to understand and troubleshoot the problem. The solution can automatically run RCA for each alert.
As we know, GenAI has 3 usage models:
1. generic questions
2. RAG
3. Agent
Yes, the solution makes OpenAI calls. Every OpenAI call costs money, although the cost is coming down.
In the future, we may see a trend of BYO (bring your own) RAG.
AI Language Model
Final thoughts
The choice between DeepSeek R1, Llama 3.2, and OpenAI o1 depends on specific project requirements:
- Choose DeepSeek R1 for budget-friendly deployments with strong reasoning capabilities.
- Opt for Llama 3.2 if multimodal functionality or edge optimisation is critical.
- Select OpenAI o1 for unparalleled reasoning performance in STEM fields despite its higher cost.
Reference:
Deepseek R1 vs Llama 3.2 vs ChatGPT o1: Which AI model wins?
DeepSeek-R1, BLOOM and Falcon AI: Exploring lesser-known open source LLMs
GitHub - deepseek-ai/awesome-deepseek-integration
Use DeepSeek-R1 in Microsoft Word Locally. No Monthly Fees. - YouTube
Keynote 2 : Kubecon India 2024
Shopify has a very large-scale deployment with these AI use cases:
- Vector relations of products.
- Credit Card frauds
- Many GPUs
* GPU utilization vs. developer productivity is a trade-off.
Challenges
1. Build v/s buy
2. Dev experience: SkyPilot and Rocket ML
==========================
A shadow role in the K8s release team is the best place to start contributing to K8s.
Cato is for AI. This is another good place to start.
He showed many Indian architectural landmarks, such as the Taj Mahal (Agra) and Jantar Mantar (Jaipur), and inspired Indians to become the largest group of contributors in the world.
==========================
The Taj Mahal was also built with diversity.
Conscious and continuous effort for diversity is invisible but important.
Many meetings have started, and more will start, in APAC-friendly time zones.
It is very hard to justify open source contribution to an employer.
Contributors should move up to become maintainers.
==========================
2014 Stateless
2017 Stateful
2019 Serverless
2023 AI
Cloud Native AI (CNAI) working group : Streamline the integration of AI with cloud native ecosystem.
Whitepaper CNAI
Cloud native (CN) is an ideal platform for AI
- Cost efficiency
- Scalability
- Containerization
- Harmony among dev, test, staging and production
- High Availability
- Microservice Architecture
CNAI from 3 perspectives
1. K8s:
- DRA (Dynamic Resource Allocation), inspired by PV/PVC (alpha in 1.26, beta in 1.32)
2. ML engineers
- Kubeflow has many projects for different use cases
- Kueue for ML batch processing
3. App Developer
- OPEA - Open Platform for Enterprise AI
website: opea.dev
1. Data Prep
2. Embedding *
3. LLM/SLM *
4. Vector DB *
6. Retriever
7. Reranking
* OPEA provides recipes for all options. 20+ GenAI recipes
They are validated on Intel, ARM, and AMD architectures.
With MongoDB or the Neo4j graph database, there is no need for a separate vector DB.
MinIO is the common data layer.
OPEA is available on Azure, AWS
CNAI has its own landscape on CNCF website
WG
- Scheduling
- Security
- Sustainability
AI Playground validates OPEA samples on ARM with free Oracle credits. CNAI needs people.
==========================
1980: Data (Spreadsheet)
1990: Information (Database)
2000: Knowledge (Data Warehouse)
2010: Insight (Analytics: Hadoop, Spark)
2020: Intelligence (AI/ML)
2025+: Action (Agents)
Analogy
- Agents = Apps
- GenAI = OS
- LLM = Kernel
Characteristics
1. Decision Making
2. Memory
3. Reasoning
4. Action
Analogy
Container - Agent
OCI runtime - LLM
Deterministic Logic - Adaptive Logic
stateless by default - stateful by nature
static resource limit - dynamic resource
Identical replicas - Unique instance
Docker run -> compose -> K8s
Agent -> multiple agents that need orchestration. This is where K8s fits.
K8s is a universal control plane for VMs, DBs, IoT edge, Docker, and WA. Agents will be yet another workload type.
Arch : Agent Operator
1. Agent controller
2. Scheduler
3. CR (Custom Resource)
LLM will tell Agent Controller what agent to create.
The Agent CR YAML will have task, model, memory, tools, and persona.
AI: CrewAI, Metaflow, Airflow
CN: Argo, Dapr, Numaflow, KServe (came out of Kubeflow)
ML inferencing pipeline
NIM is an off-the-shelf inferencing framework.
NIM decides which model to use based on: Number of GPUs? Which GPU? Performance criteria (throughput vs. latency)? Floating-point library? NIM can autodetect the hardware.
GenAI with RAG has many services.
The NIM operator deploys a RAG application through CRs: 1. NIM Cache (PVC) 2. NIM Service 3. NIM Pipeline (all services can scale together).
NIM monitoring and autoscaling: Prometheus
1. Utilization of hardware
2. inter token latency
3. time to first token
4. requests per second
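The latency and throughput metrics in this list can be computed from raw timestamps, as sketched below. The function names and sample timings are hypothetical; in the talk these values are read from the metrics NIM exposes to Prometheus rather than computed by hand.

```python
def latency_metrics(request_start, token_times):
    """Return (time to first token, mean inter-token latency) in seconds."""
    if not token_times:
        raise ValueError("token_times must not be empty")
    ttft = token_times[0] - request_start
    gaps = [later - earlier for earlier, later in zip(token_times, token_times[1:])]
    mean_itl = sum(gaps) / len(gaps) if gaps else 0.0
    return ttft, mean_itl


def requests_per_second(request_starts):
    """Number of requests divided by the time window they arrived in."""
    window = max(request_starts) - min(request_starts)
    return len(request_starts) / window if window > 0 else float(len(request_starts))


# Hypothetical timings in seconds: request sent at t=0, tokens arrive afterwards.
ttft, mean_itl = latency_metrics(0.0, [0.5, 0.75, 1.0, 1.25])
rps = requests_per_second([0.0, 1.0, 2.0, 4.0])
meets_sla = ttft <= 2.0  # example target: first token within 2 seconds
print(ttft, mean_itl, rps, meets_sla)
```

Time to first token reflects prompt processing and queueing, while inter-token latency reflects decoding speed, so the two are usually tracked and scaled on separately.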
Monitoring of NIM
Targets such as 2 seconds and 15 chat users are inputs for the SLA.
The NIM monitoring operator chooses metrics from the many metrics exposed by NIM.
Autoscaling
In the sample chat application, a Milvus DB is needed. RAG is the front-end service.
GenAI Courses
LLM courses on Neo4j Graph Academy
https://graphacademy.neo4j.com/courses/llm-fundamentals/
https://graphacademy.neo4j.com/courses/llm-chatbot-python/
https://www.youtube.com/@neo4j/playlists
Semantic Kernel
https://www.linkedin.com/learning/introducing-semantic-kernel-building-ai-based-apps/what-is-semantic-kernel
https://github.com/microsoft/semantic-kernel/tree/main/python/notebooks
https://github.com/microsoft/globalopenaihack/tree/main/SemanticKernel
Intel
https://learn.activeloop.ai/courses/llms
https://www.intel.com/content/www/us/en/developer/topic-technology/artificial-intelligence/training/overview.html
Databricks
https://www.databricks.com/resources/learn/training/generative-ai-fundamentals
Nvidia
https://www.nvidia.com/en-in/training/online/ Select tab "Generative AI / LLM"
https://www.nvidia.com/en-us/events/llm-developer-day/
Other relevant websites
https://www.classcentral.com/
https://www.careers360.com/
https://www.mooc-list.com/
OpenAI
https://academy.openai.com/public/content
https://vision.hack2skill.com/event/genaiexchange?tab=academy
Semantic Kernel
Semantic Kernel is an application framework (SDK) by Microsoft. It is used to develop copilot-style software, e.g. GitHub Copilot. Other such frameworks are:
- Semantic Kernel (Microsoft)
- LangChain (Open Source)
- FIXIE (Enterprise Grade)
- Vertex AI (Google cloud)
- griptape
- HumanLoop
- Beam
=========================================================
Semantic Kernel has (1) Skills (2) Prompts (3) AI Services (4) Connector (5) Kernel (6) Planner (7) Plugins
A skill is a set of functions.
The functions are divided into two types:
1. Semantic Functions: user-defined functions, written as natural-language prompts
2. Native Functions: functions written in code. It has core_skills like TextMemory, ConversationSummary, FileIO, HTTP, Math, Text, Time, Wait (calendar etc.)
The Kernel class has a RunAsync method. It takes 1+ SKFunction objects.
Prompt has (1) skprompt.txt (2) config.json
AI Services
- ChatCompletion
- Embeddings
- Embeddings.VectorOperations
- ImageGenerations
- TextGenerations
- HuggingFace Inference API
- HuggingFace Local
- Oobabooga
- OpenAI
- OpenAI.Azure
Connector
- AzureCognitiveSearch
- Chroma
- DuckDB
- Kusto
- Pinecone
- Postgres
- Qdrant
- Redis
- Sqlite
- Weaviate
Kernel
- AI Service
- Template Engine
- Logger
- Plugins
- Kernel Config Class
Planner
- BasicPlanner: A simplified version of SequentialPlanner that strings together a set of functions.
- ActionPlanner: Creates a plan with a single step.
- SequentialPlanner: Creates a plan with a series of steps that are interconnected with custom generated input and output variables.
- StepwisePlanner: Incrementally performs steps and observes any results before performing the next step.
Plugins
- MSGraph (C#)
- Document and Data Loading (only Word in C#)
- OpenAPI (C#)
- Web Search Engine
- Text Chunkers
- The memory is constructed every time during setup.
https://medium.com/@kcwayne1219/exploring-microsoft-semantic-kernel-a-newbie-developers-journey-902f58091504
- Loading a Hugging Face model
https://stackoverflow.com/questions/77110608/loading-a-huggingface-model-with-microsofts-semantic-kernel-in-c-sharp-vb-net
https://github.com/microsoft/semantic-kernel/blob/3451a4ebbc9db0d049f48804c12791c681a326cb/samples/apps/hugging-face-http-server/inference_app.py
https://github.com/microsoft/semantic-kernel/blob/3451a4ebbc9db0d049f48804c12791c681a326cb/samples/apps/hugging-face-http-server/utils/create_responses.py
- Add Support for running local models using Ollama
GitHub
https://github.com/microsoft/semantic-kernel/tree/c4ef6ab227fc967ab12291cc862852e66d6d75ae
Documentation
https://github.com/MicrosoftDocs/semantic-kernel-docs/tree/main
Reference
https://devblogs.microsoft.com/semantic-kernel/page/5/
https://build.microsoft.com/en-US/sessions/31e11443-70d3-4020-8c8c-0a654bccd233
OpenAI Text to Speech
https://github.com/simonw/ospeak
https://simonwillison.net/2023/Nov/7/ospeak/
https://platform.openai.com/docs/guides/text-to-speech
CLI tool for running text through the OpenAI Text to speech API and speaking or saving the result
AI ML Useful YouTube channels
K8sGPT
The default AI backend is OpenAI. LocalAI can also be used.
Built in analyzers
Enabled by default
- podAnalyzer
- pvcAnalyzer
- rsAnalyzer
- serviceAnalyzer
- eventAnalyzer
- ingressAnalyzer
- statefulSetAnalyzer
- deploymentAnalyzer
- cronJobAnalyzer
- nodeAnalyzer
- mutatingWebhookAnalyzer
- validatingWebhookAnalyzer
Optional
- hpaAnalyzer
- pdbAnalyzer
- networkPolicyAnalyzer
https://itnext.io/k8sgpt-localai-unlock-kubernetes-superpowers-for-free-584790de9b65
https://github.com/k8sgpt-ai/k8sgpt
https://github.com/k8sgpt-ai/k8sgpt-operator
https://www.youtube.com/watch?v=PKrDNuJ_dfE
GenAI Part 2
To understand any Github Repo. Learn any GitHub repo in 59 seconds. Onboard AI learns any GitHub repo in minutes and lets you chat with it to locate functionality, understand different parts, and generate new code
https://app.getonboardai.com/
=========================================================
Langchain
https://github.com/kyrolabs/awesome-langchain
=========================================================
Model file
extension: HF, GPTQ, GGML, BIN and GGUF.
Each model needs 4 files (with example content)
File 1 - The model's GGUF file
File 2 - The model's .yaml file
backend: llama
context_size: 2000
name: lunademo
parameters:
  model: luna-ai-llama2-uncensored.Q4_K_M.gguf
  temperature: 0.2
  top_k: 40
  top_p: 0.65
roles:
  assistant: 'ASSISTANT:'
  system: 'SYSTEM:'
  user: 'USER:'
File 3 - The Chat API .tmpl file
{{.Input}}
ASSISTANT:
File 4 - The Completion API .tmpl file
Complete the following sentence: {{.Input}}
=========================================================
Local Deployment of GenAI
1. https://github.com/cocktailpeanut/dalai
2. GPT4All https://gpt4all.io/index.html https://github.com/nomic-ai/gpt4all
3. vLLM library https://vllm.readthedocs.io/en/latest/#
4. API store https://gorilla.cs.berkeley.edu/ and https://github.com/ShishirPatil/gorilla
5. https://github.com/imartinez/privateGPT and https://docs.privategpt.dev/ Based on PrivateGPT https://github.com/marella/chatdocs
6. https://github.com/Lightning-AI/lit-gpt based on nanoGPT
7. https://github.com/Vision-CAIR/MiniGPT-4/tree/main and https://minigpt-4.github.io/ https://huggingface.co/spaces/Vision-CAIR/minigpt4 https://www.youtube.com/watch?v=__tftoxpBAw
8. https://collabnix.com/running-ollama-2-on-nvidia-jetson-nano-with-gpu-using-docker/
9. Langchain
https://python.langchain.com/docs/guides/local_llms
10. Cassandra DB
https://cassio.org/frameworks/langchain/qa-basic/
https://colab.research.google.com/github/CassioML/cassio-website/blob/main/docs/frameworks/langchain/.colab/colab_qa-basic.ipynb
11. gtr-t5-large model is around 640 MB
https://til.simonwillison.net/python/gtr-t5-large
=========================================================
Local Document search
1. https://github.com/h2oai/h2ogpt
2. https://github.com/imartinez/privateGPT and https://docs.privategpt.dev/ OR https://github.com/SamurAIGPT/EmbedAI
3. https://github.com/PromtEngineer/localGPT
4. LocalAI
https://localai.io/
https://github.com/mudler/LocalAI
https://github.com/go-skynet/helm-charts/tree/main/charts/local-ai and https://localai.io/basics/getting_started/index.html#run-localai-in-kubernetes
https://localai.io/basics/build/index.html
integration with Logseq https://github.com/briansunter/logseq-plugin-gpt3-openai
How Tos https://localai.io/howtos/
https://localai.io/howtos/easy-request-openai/
Access
https://localai.io/howtos/easy-model-import-downloaded/
https://localai.io/howtos/easy-request-curl/
All Git Repo https://github.com/lunamidori5
5. https://mudler.pm/posts/localai-question-answering/
6. GPT3 and datasette
https://simonwillison.net/2023/Jan/13/semantic-search-answers/
What is GenAI? Part 1
What is GenAI?
GenAI is about generating text, images, or other media in response to prompts. GenAI uses generative models, which are based on Transformer-based deep learning models.
Modality
- Unimodal (only one type of input)
- Multimodal, e.g. GPT-4 accepts text and images. Wu Dao is another example.
================================================
What is a Transformer?
* A transformer is a deep learning architecture.
* It is designed to understand the context and semantics of language.
* It takes in a sequence of tokens (words, or parts of words) and outputs a corresponding sequence.
* It pays attention to each input token and the relationships between them, using a mechanism known as self-attention, which is computed as scaled dot-product attention.
* This enables the transformer to understand complex linguistic constructs and generate coherent and contextually accurate responses.
* It relies on the parallel multi-head attention mechanism.
* It requires less training time than previous recurrent neural architectures (e.g. long short-term memory (LSTM)).
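Scaled dot-product attention computes softmax(Q K^T / sqrt(d_k)) V. The sketch below implements that formula on plain Python lists for a single head. It is a simplified illustration: the learned projection matrices that produce Q, K, and V in a real transformer are omitted, and the token vectors are made up.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """softmax(Q K^T / sqrt(d_k)) V for lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


# Self-attention: queries, keys, and values all come from the same token vectors.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = scaled_dot_product_attention(tokens, tokens, tokens)
print(attended)
```

Each output row is a weighted average of all token vectors, which is how every position gets to "see" every other position in parallel.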
Implementation
- TensorFlow
- PyTorch
- JAX Deep Learning
- Transformers library by Hugging Face
Architecture
1. Tokenizer
2. Embedding layer. Token to vector
3. Transformer layers: alternating attention and feedforward layers.
4. Optional un-embedding layer
* It uses activation functions such as ReLU or SwiGLU.
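The two activation functions named above can be written in a few lines. This is a sketch on plain lists: `w_gate` and `w_up` are hypothetical weight matrices (identity matrices here), whereas a real feedforward layer learns them and adds an output projection.

```python
import math


def relu(x):
    """ReLU: pass positive values, zero out negative ones."""
    return max(0.0, x)


def swish(x):
    """Swish / SiLU: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def swiglu(x, w_gate, w_up):
    """SwiGLU(x) = Swish(W_gate x) * (W_up x), multiplied element-wise."""
    gate = [swish(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    return [g * u for g, u in zip(gate, up)]


identity = [[1.0, 0.0], [0.0, 1.0]]
print([relu(v) for v in [-1.5, 2.0]])
print(swiglu([1.0, -1.0], identity, identity))
```

ReLU acts on each value independently, while SwiGLU uses one projection of the input as a gate for another projection of the same input.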
================================================
What is ChatGPT?
ChatGPT can:
- write and debug computer programs
- compose teleplays, fairy tales, and student essays
- answer test questions
- generate business ideas
- write poetry and song lyrics
- translate and summarize text
- emulate a Linux system
- simulate entire chat rooms
- play games like tic-tac-toe
- simulate an ATM
What is a DGM (Deep Generative Model)?
A generative model is a statistical model of the joint probability distribution P(X,Y) on a given observable variable X and target variable Y.
Simple example
Suppose the input data is x, the set of labels for x is y, and there are 4 data points (x, y).
For such data, estimating the joint probability distribution p(x, y) from the empirical measure means counting how often each (x, y) pair occurs and dividing by the number of data points.
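Estimating a joint distribution from the empirical measure amounts to counting pairs. The sketch below does this for four data points; the specific values in `data` are assumed for illustration only, and the function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction

# Assumed (x, y) data points, chosen only to illustrate the computation.
data = [(1, 0), (1, 0), (2, 0), (2, 1)]


def empirical_joint(points):
    """P(x, y) = count of (x, y) / total number of points."""
    counts = Counter(points)
    total = len(points)
    return {pair: Fraction(count, total) for pair, count in counts.items()}


def conditional(joint, x, y):
    """P(y | x) = P(x, y) / P(x), using the marginal of x from the joint."""
    p_x = sum(p for (xi, _), p in joint.items() if xi == x)
    return joint.get((x, y), Fraction(0)) / p_x


joint = empirical_joint(data)
print(joint)
print(conditional(joint, 2, 1))
```

A generative model learns the full joint table, so conditionals such as P(y | x) can be derived from it, whereas a discriminative model learns the conditional directly.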
- machine translation
- document summarization
- document generation
- named entity recognition (NER)
- biological sequence analysis
- writing computer code based on requirements expressed in natural language.
- video understanding.
- syntactic parsing
- sentiment analysis