Showing posts with label GenAI. Show all posts

LLMOps


For an AI application, we need to automate:

1. Data preparation
2. Model tuning
3. Deployment
4. Maintenance
5. Monitoring

  • Managing dependencies adds complexity. 

E2E workflow for an LLM-based application. 

MLOps framework

1. data ingestion

2. data validation

3. data transformation

4. model

5. model analysis

6. serving model

7. logging. 

LLM System Design

Broader design of the E2E app, including front end, back end, data engineering, etc. 

Chain multiple LLMs together

* Grounding: provides additional information/facts along with the prompt to the LLM. 

* Track history: keep a record of how past interactions went. 

LLM App

User input->Preprocessing->grounding->prompt goes to LLM model->LLM Response->Grounding->Post processing + Responsible AI->Final output to user.
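
The flow above can be sketched as a chain of small functions. This is a minimal sketch, not a real application: `fake_llm`, the blocked-term filter, and all function names are hypothetical stand-ins, and the second grounding step (after the LLM response) is left out for brevity.

```python
from typing import Callable

def preprocess(user_input: str) -> str:
    # Normalize whitespace before building the prompt.
    return " ".join(user_input.split())

def ground(text: str, facts: list[str]) -> str:
    # Attach supporting facts so the model can rely on them.
    context = "\n".join(f"- {fact}" for fact in facts)
    return f"Context:\n{context}\n\nQuestion: {text}"

def postprocess(response: str, blocked_terms: set[str]) -> str:
    # Very rough stand-in for a Responsible AI filter (assumption).
    if any(term in response.lower() for term in blocked_terms):
        return "[response withheld]"
    return response.strip()

def run_app(user_input: str, facts: list[str],
            llm: Callable[[str], str], blocked_terms: set[str]) -> str:
    prompt = ground(preprocess(user_input), facts)
    return postprocess(llm(prompt), blocked_terms)

def fake_llm(prompt: str) -> str:
    # Hypothetical model stub: echoes the last line of the prompt.
    return "Echo: " + prompt.splitlines()[-1]

answer = run_app("  What   is LLMOps? ",
                 ["LLMOps automates the LLM lifecycle."],
                 fake_llm, {"forbidden"})
print(answer)
```

Swapping `fake_llm` for a real model call would leave the rest of the chain unchanged, which is the point of keeping each stage a separate function.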

Model Customization

1. Data Prep

2. Model Tuning

3. Evaluate

It is an iterative process.

LLMOps Pipeline (Simplified)

1. Data Preparation and versioning (for training data)

2. Supervised tuning (pipeline) 

3. Artifacts are generated: config and workflow. 

- config = config for workflow

E.g. 

Which data set to use

- Workflow = steps 

4. Pipeline execution

5. deploy LLM 

6. Prompting and predictions

7. Responsible AI

Orchestration = 1 + 2. Orchestration defines what runs first, what runs next, and what follows after that; it assures the sequence of steps. 

Automation = 4 + 5

Fine Tuning Data Model using Instructions (Hint)

1. rules

2. step by step

3. procedure

4. example

File formats

1. JSONL: JSON Lines. Human readable. For small and medium-sized datasets. 

2. TFRecord 

3. Parquet: for large and complex datasets. 
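
As an illustration of the JSONL format, here is a minimal sketch that writes and reads one JSON object per line using only the standard library. The field names `input_text` and `output_text` and the file name are assumptions for illustration; the exact schema depends on the tuning service.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical tuning examples; field names are an assumption.
examples = [
    {"input_text": "Summarize: The cat sat on the mat.", "output_text": "A cat sat on a mat."},
    {"input_text": "Translate to French: Hello", "output_text": "Bonjour"},
]

def write_jsonl(path: Path, records: list[dict]) -> None:
    # One JSON object per line, no enclosing array.
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")

def read_jsonl(path: Path) -> list[dict]:
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "tune_data.jsonl"
    write_jsonl(path, examples)
    loaded = read_jsonl(path)

print(len(loaded), "records loaded")
```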

MLOps Workflow for LLM

1. Apache Airflow

2. KubeFlow

DSL = Domain Specific Language

Decorator 

@dsl.component

@dsl.pipeline

Next, the compiler generates a YAML file for the pipeline

YAML file has

- components

- deploymentSpec

Pipeline can be run on

- K8s

- Vertex AI Pipelines, which executes the pipeline in a serverless environment

PipelineJob takes inputs

1. Template Path: pipeline.yaml

2. Display name

3. Parameters

4. Location: Data center

5. pipeline root: temp file location

Open Source Pipeline

https://us-kfp.pkg.dev/ml-pipeline/large-language-model-pipelines/tune-large-model/v2.0.0

Deployment

Batch and REST

1. Batch. E.g. customer review. Not real time. 

2. REST API, e.g. chat. Closer to real time. 

* pprint is a library for pretty-printing output 

The LLM response includes the output and 'safetyAttributes'

- blocked

* Citations can also be found in the LLM output
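
A minimal sketch of checking the `blocked` flag before using a response, with `pprint` for readable output. The `response` dictionary below is a hypothetical payload; real field layouts vary by model and API version.

```python
from pprint import pformat

# Hypothetical response payload (assumption, not an actual API response).
response = {
    "content": "Here is a summary of the review.",
    "safetyAttributes": {"blocked": False, "categories": ["Health"], "scores": [0.1]},
}

def usable_text(resp: dict):
    # Return the content only when the safety check did not block it.
    safety = resp.get("safetyAttributes", {})
    if safety.get("blocked", False):
        return None
    return resp.get("content")

formatted = pformat(response, width=60)
text = usable_text(response)
print(formatted)
print(text)
```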

===========

vertexAI SDK

https://cloud.google.com/vertex-ai

BigQuery 

https://cloud.google.com/bigquery

sklearn

Used to split the data 80/20 between training and evaluation. 
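
The 80/20 split itself is simple. The sketch below uses only the standard library rather than sklearn, so it is a swapped-in illustration of the idea; with sklearn, `train_test_split(..., test_size=0.2)` plays the same role. The function name and seed are assumptions.

```python
import random

def split_80_20(rows, seed=42):
    # Shuffle a copy deterministically, then cut at 80%.
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]

train, evaluation = split_80_20(range(10))
print(len(train), len(evaluation))
```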

Building AI/ML apps in Python with BigQuery DataFrames | Google Cloud Blog

===========

NVIDIA GTC25: Telecom Special Address


LTM (Large Telecom Model): SoftBank is a pioneer. Here is the white paper by GSMA: https://www.gsma.com/get-involved/gsma-foundry/gsma_resources/white-paper-large-telecom-models/


Llama Nemotron Reasoning Model. Open source by NVIDIA on HF

https://www.nvidia.com/en-in/ai-data-science/foundation-models/nemotron/

https://arxiv.org/pdf/2505.00949


AI Factory is a specialized, integrated infrastructure designed to manage the entire AI lifecycle, from data ingestion to model training and deployment for real-time inference

AI Grid is a network of small, highly specialized AI communities. The members of AI Grid share their research work within these communities, initiate collaborations and establish fruitful connections for the future. https://lightning.ai/

https://ai-ran.org/


Building Blocks of the NVIDIA AI Aerial Platform: 

1. NVIDIA Aerial CUDA-Accelerated RAN

2. NVIDIA Aerial AI Radio Frameworks

3. NVIDIA Aerial Omniverse Digital Twin

Reference

NVIDIA GTC25: Telecom Special Address

AI Bootcamp for students


 8 Day Live Online Workshop

AI Bootcamp for Students

Make Your Child Future-Ready with AI

by Times of India


https://www.notion.com/product Documentation

https://www.todoist.com/ To Do List

https://gamma.app/ For presentation 

https://openai.com/index/sora/ Cinematic Video 

https://www.midjourney.com/home Art Grade Visuals for story telling

https://ideogram.ai/t/explore Typography to image. Communicate in style

https://lovable.dev/ No code web apps

https://n8n.io/ Workflow automation tools

Few more tools

TachyonGPT accelerates the project planning process, potentially saving weeks of effort. This AI assistant lets you create a complex backlog structure for your project in very little time. It can improve existing work items or generate new work items based on brief titles or descriptions. https://marketplace.visualstudio.com/items?itemName=Neudesic.TachyonGPT

Windsurf editor and Cascade: agentic coding IDE


Reference: https://economictimes.indiatimes.com/masterclass/ai-for-students

https://www.msn.com/en-in/money/news/chatgpt-to-google-gemini-top-5-ai-tools-to-enhance-productivity-mostly-free/ar-AA1GRlt1


Regional LLM, SLM, TinyML Language Learning


New Language Learning

Want to learn a new language this summer? Explore these expert-led platforms

Mobile app https://youtu.be/jyffkeM9GB0

Regional LLM

Sarvam AI launches Bulbul-v2, its voice model with support for 11 Indian languages

https://asr.iitm.ac.in/models/ 

BharatGen  https://bharatgen.tech/  News: "BharatGen: First indigenous language AI model launched in India, will translate across 22 languages; communication challenges will ease" (Amar Ujala, Hindi news) and "Google to collaborate with IIT Bombay’s BharatGen to build indigenous Indic language model"

AIKosha

https://aikosha.indiaai.gov.in/home It looks like huggingface website for India


https://aikosha.indiaai.gov.in/home/toolkit having a list of popular AI related tools.

Kannada Models

nomic-embed-text-v2-moe 
snowflake-arctic-embed2
OpenAI's text-embedding-3-large
Voyage
Cohere
intfloat/multilingual-e5-large-instruct
paraphrase-multilingual
BGE-M3 is based on the XLM-RoBERTa

URL for AI/GenAI


LLM

Inside The Brain Of An LLM: What Makes AI So Powerful?

Landscape

https://landscape.lfai.foundation/

https://landscape.pytorch.org/

Models for coding

1. Qwen2.5 Coder

2. Granite Code

3. CodeGemma

4. Deep Seek Coder V2

5. StarCoder 2

6. Code llama

7. Codestral 

Coding tool Google Opal is a new vibe coding app and here's how you can try it for free

Interviews

Pioneering Innovation in Cloud and AI Transformation Done By Chandrakanth Devarakadra Anantha

Innovation in Machine Learning & Engineering Leadership by Pratik Parekh

Amazing Innovation in Telecom Cloud: The Journey of Jayavelan Jayabalan

Handson

https://www.youtube.com/watch?v=RQFfK7xIL28

AI for Observability


The speaker explains his solution for adding AI to observability, where observability includes logs, traces and metrics. 

Features

It does not embed log messages. Even the most sophisticated GenAI models accept at most about 2 million tokens, and logs can generate that much in 2 seconds. So the solution needs to feed the right data to the AI. It works out from the logs which fields should be fed as the initial input, and then indicates which further data to feed. 

It creates a visualization dashboard based on the question

It has levels from 0 (manual observability) to 4 (full observability)

It uses AWS Bedrock to address privacy and compliance concerns. 

In the future solution, GenAI 

- will understand deployments

- will understand changes between deployments and their impact: cost, and increase or decrease in errors. 

- can go to the GitHub repo to see the changes that happened

- can fix the code

- can then write tests (UT) so the issue cannot happen again

So it makes for a much more stable environment. It can enable autonomous cluster configuration

At present, the solution has

- the ability to analyze exceptions: root cause analysis (RCA) of an exception. It is not 100% accurate all the time. It gives the list of actions it took to understand and troubleshoot the problem. The solution can automatically run RCA for each alert. 

As we know GenAI has 3 models

1. generic questions

2. RAG

3. Agent

Yes, the solution makes OpenAI calls. Every OpenAI call costs money, although the cost is now coming down. 

Future we may have trend of : BoY RAG

Ref: https://www.youtube.com/watch?v=IIz8Xpyebug

AI Language Model


 



Final thoughts

The choice between DeepSeek R1, Llama 3.2, and OpenAI o1 depends on specific project requirements:


  • Choose DeepSeek R1 for budget-friendly deployments with strong reasoning capabilities.
  • Opt for Llama 3.2 if multimodal functionality or edge optimisation is critical.
  • Select OpenAI o1 for unparalleled reasoning performance in STEM fields despite its higher cost.

Reference: 

https://www.msn.com/en-in/money/technology/here-are-the-best-ai-language-models-you-can-use-right-now/ar-AA1xVZ7k

Deepseek R1 vs Llama 3.2 vs ChatGPT o1: Which AI model wins?

DeepSeek-R1, BLOOM and Falcon AI: Exploring lesser-known open source LLMs

GitHub - deepseek-ai/awesome-deepseek-integration

(1) Use DeepSeek-R1 in Microsoft Word Locally. No Monthly Fees. - YouTube

Keynote 2 : Kubecon India 2024


Shopify has a very large-scale deployment with AI use cases: 

- Vector relations of products. 

- Credit Card frauds 

- Many GPUs

* GPU utilization v/s developer productivity is trade off. 

Challenges

1. Build v/s buy 

2. Dev experience : skypilot and rocket ML

==========================

A shadow role in the K8s release team is the best place to start contributing to K8s

Cato is for AI. This is another good place to start with. 

He showed many Indian architectural landmarks, like the Taj Mahal (Agra) and Jantar Mantar (Jaipur), and encouraged Indians to become the largest group of contributors in the world

==========================

The Taj Mahal was also built with diversity. 

Conscious and continuous effort for diversity is invisible but important. 

Many meetings have started, and more will start, in APAC-friendly time zones

It is very hard to justify open source contribution to an employer.

Contributors should move on to become maintainers.

==========================

2014 Stateless

2017 Stateful

2019 Serverless

2023 AI

Cloud Native AI (CNAI) working group : Streamline the integration of AI with cloud native ecosystem. 

Whitepaper CNAI

CN is ideal platform for AI

- Cost efficiency

- Scalability

- Containerization

- Harmony among dev, test, staging and production

- High Availability

- Microservice Architecture

CNAI from 3 perspective

1. K8s: 

- DRA Dynamic Resource Allocation. inspired by PV/PVC (1.26, 1.32 beta)

2. ML engineers

- Kubeflow has many projects for different use cases

- Queue for ML batch processing

3. App Developer

- OPEA - Open Platform for Enterprise AI

website: opea.dev

1. Data Prep

2. Embedding *

3. LLM/SLM *

4. Vector DB *

6. Retriever

7. Reranking

* OPEA provides recipes for all options. 20+ GenAI recipes 

They are validated on Intel, ARM and AMD architectures

MongoDB / Neo4J Graph Database. no need of Vector DB.

Minio is common data layer

OPEA is available on Azure, AWS

CNAI has its own landscape on CNCF website

WG

- Scheduling

- Security

- Sustainability

AI Playground validates OPEA samples on ARM with free Oracle credits. CNAI needs people. 

==========================

1980 data Spreadsheet

1990 Information DataBase

2000 Knowledge Data Warehouse

2010 Insight Analytics (Hadoop, Spark)

2020 Intelligence AIML

2025+ Action Agents

Analogy

- Agents Apps

- GenAI OS

- LLM Kernel

Characteristics

1. Decision Making

2. Memory

3. Reasoning

4. Action

Analogy

Container - Agent

OCI runtime - LLM

Deterministic Logic - Adaptive Logic

stateless by default - stateful by nature

static resource limit - dynamic resource

Identical replicas - Unique instance

Docker run -> compose -> K8s

Agent -> Multiple agents that need orchestration. Here K8s fits

K8s is a universal control plane for VM, DB, IoT edge, Docker, WA. Agents will be yet another workload type. 

Arch : Agent Operator

1. Agent controller

2. Scheduler

3. CR

LLM will tell Agent Controller what agent to create. 

Agent CR YAML will have Task, model, memory, tools, person 

AI: CrewAI, Metaflow, Airflow

CN: Argo, Dapr, Numaflow, KServe (came out of Kubeflow)

ML Inferencing pipeline


NIM is an off-the-shelf inferencing framework

NIM decides which model to use based on: the number of GPUs, which GPU, performance criteria (throughput vs. latency), and the floating point library. NIM can autodetect hardware

GenAI with RAG has many services.

The NIM operator deploys a RAG application using CRs: 1. NIM Cache (PVC) 2. NIM service 3. NIM pipeline. All services can scale up together. 

NIM monitoring and autoscaling: Prometheus 

1. Utilization of hardware

2. Inter-token latency

3. Time to first token 

4. request per second
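
The latency metrics above can be computed from token arrival timestamps. This is a minimal sketch with hypothetical timestamps and function names; it does not use any NIM or Prometheus API.

```python
def latency_metrics(request_start: float, token_times: list[float]):
    """token_times: timestamps (seconds) at which each output token arrived."""
    # Time to first token: delay between sending the request and the first token.
    ttft = token_times[0] - request_start
    # Inter-token latency: average gap between consecutive tokens.
    gaps = [b - a for a, b in zip(token_times, token_times[1:])]
    inter_token = sum(gaps) / len(gaps) if gaps else 0.0
    return ttft, inter_token

def requests_per_second(num_requests: int, window_seconds: float) -> float:
    return num_requests / window_seconds

# Hypothetical measurements for one request and one 2-second window.
ttft, itl = latency_metrics(0.0, [0.5, 0.75, 1.0, 1.25])
rps = requests_per_second(30, 2.0)
print(ttft, itl, rps)
```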

Monitoring of NIM

Targets such as a 2-second response and 15 chat users are inputs for the SLA. 

The NIM monitoring operator chooses metrics from the many metrics exposed by NIM

Autoscaling

In the sample chat application, Milvus DB is needed. RAG is the frontend service

GenAI Courses


LLM courses on Neo4j Graph Academy

https://graphacademy.neo4j.com/courses/llm-fundamentals/

https://graphacademy.neo4j.com/courses/llm-chatbot-python/

https://www.youtube.com/@neo4j/playlists

Semantic Kernel

https://www.linkedin.com/learning/introducing-semantic-kernel-building-ai-based-apps/what-is-semantic-kernel

https://github.com/microsoft/semantic-kernel/tree/main/python/notebooks

https://github.com/microsoft/globalopenaihack/tree/main/SemanticKernel

Intel

https://learn.activeloop.ai/courses/llms

https://www.intel.com/content/www/us/en/developer/topic-technology/artificial-intelligence/training/overview.html

Databricks

https://www.databricks.com/resources/learn/training/generative-ai-fundamentals

Nvidia

https://www.nvidia.com/en-in/training/online/ Select tab "Generative AI / LLM"

https://www.nvidia.com/en-us/events/llm-developer-day/

Other relevant websites

https://www.classcentral.com/

https://www.careers360.com/

https://www.mooc-list.com/

OpenAI

https://academy.openai.com/public/content

Google

https://vision.hack2skill.com/event/genaiexchange?tab=academy

Semantic Kernel


Semantic Kernel is an application framework (SDK) by Microsoft. It is used to develop copilot-style software, e.g. GitHub Copilot. Other such frameworks are:

- Semantic Kernel (Microsoft)

- LangChain (Open Source) 

- FIXIE (Enterprise Grade)

- Vertex AI (Google cloud)

- griptape

- HumanLoop

- Beam

=========================================================

Semantic Kernel has (1) Skills (2) Prompts (3) AI Services (4) Connectors (5) Kernel (6) Planner (7) Plugins

Skill is a set of functions. 

The functions are divided in two types

1. Semantic Functions : User defined functions

2. Native Functions: It has core_skills like TextMemory, ConversationSummary, 

FileIO, HTTP, Math, Text, Time, Wait (calendar etc.)

The Kernel class has a RunAsync method. It takes one or more SKFunction objects

Prompt has (1) skprompt.txt (2) config.json

AI Services 

  • ChatCompletion
  • Embeddings
  • Embeddings.VectorOperations
  • ImageGenerations
  • TextGenerations

Connector has (1) AI Service Endpoint and (2) Memory

AI Service Endpoint
  • HuggingFace Inference API
  • HuggingFace Local
  • Oobabooga
  • OpenAI
  • OpenAI.Azure
Memory
  • AzureCognitiveSearch
  • Chroma
  • DuckDB
  • Kusto
  • Pinecone
  • Postgres
  • Qdrant
  • Redis
  • Sqlite
  • Weaviate
Kernel, Planner and Plugins are part of Kernel Orchestration 

Kernel

Kernel decides model, memory (database) and planner. 

We can configure the kernel using "Configuring Kernel". It has the following runtime properties
  • AI Service
  • Template Engine
  • Logger
  • Plugins
  • Kernel Config Class
The Planner is about the 'Plan Object Model'. It has the following types
  • BasicPlanner: A simplified version of SequentialPlanner that strings together a set of functions.
  • ActionPlanner: Creates a plan with a single step.
  • SequentialPlanner: Creates a plan with a series of steps that are interconnected with custom generated input and output variables.
  • StepwisePlanner: Incrementally performs steps and observes any results before performing the next step.
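
The idea of a planner that strings functions together can be sketched in plain Python. This is not the Semantic Kernel API: `run_plan`, `summarize` and `shout` are hypothetical names that only illustrate feeding each step's output into the next step.

```python
def run_plan(steps, initial_input):
    """Run functions in order, feeding each output into the next step."""
    value = initial_input
    trace = []
    for step in steps:
        value = step(value)
        trace.append((step.__name__, value))
    return value, trace

def summarize(text: str) -> str:
    # Stand-in for one function in a skill: keep only the first sentence.
    return text.split(".")[0] + "."

def shout(text: str) -> str:
    # Stand-in for another function in a skill: upper-case the text.
    return text.upper()

result, trace = run_plan([summarize, shout],
                         "Kernel runs functions. It also holds memory.")
print(result)
```

A stepwise variant would inspect each intermediate value in `trace` before choosing the next step, instead of fixing the whole list up front.
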
Plugins are for the following use cases
  • MSGraph (C#)
  • Document and Data Loading (only word in C#)
  • OpenAPI (C#)
  • Web Search Engine
  • Text Chunkers
=========================================================

Issues: 

  • The memory is constructed every time during setup. 

https://medium.com/@kcwayne1219/exploring-microsoft-semantic-kernel-a-newbie-developers-journey-902f58091504

  • Loading hugging face model

https://stackoverflow.com/questions/77110608/loading-a-huggingface-model-with-microsofts-semantic-kernel-in-c-sharp-vb-net

https://github.com/microsoft/semantic-kernel/blob/3451a4ebbc9db0d049f48804c12791c681a326cb/samples/apps/hugging-face-http-server/inference_app.py

https://github.com/microsoft/semantic-kernel/blob/3451a4ebbc9db0d049f48804c12791c681a326cb/samples/apps/hugging-face-http-server/utils/create_responses.py

  • Add Support for running local models using Ollama
https://github.com/microsoft/semantic-kernel/actions/runs/6527865286/workflow

Github

https://github.com/microsoft/semantic-kernel/tree/c4ef6ab227fc967ab12291cc862852e66d6d75ae

Documentation

https://github.com/MicrosoftDocs/semantic-kernel-docs/tree/main

Reference 

https://devblogs.microsoft.com/semantic-kernel/page/5/

https://build.microsoft.com/en-US/sessions/31e11443-70d3-4020-8c8c-0a654bccd233

OpenAI Text to Speech


 https://github.com/simonw/ospeak

https://simonwillison.net/2023/Nov/7/ospeak/

https://platform.openai.com/docs/guides/text-to-speech

CLI tool for running text through the OpenAI Text to speech API and speaking or saving the result

AI ML Useful YouTube channels



https://www.youtube.com/@codebasics

https://www.youtube.com/@statquest


K8sGPT


The default AI backend is OpenAI. LocalAI can also be used. 

Built in analyzers

Enabled by default

  •  podAnalyzer
  •  pvcAnalyzer
  •  rsAnalyzer
  •  serviceAnalyzer
  •  eventAnalyzer
  •  ingressAnalyzer
  •  statefulSetAnalyzer
  •  deploymentAnalyzer
  •  cronJobAnalyzer
  •  nodeAnalyzer
  •  mutatingWebhookAnalyzer
  •  validatingWebhookAnalyzer

Optional

  •  hpaAnalyzer
  •  pdbAnalyzer
  •  networkPolicyAnalyzer


https://itnext.io/k8sgpt-localai-unlock-kubernetes-superpowers-for-free-584790de9b65

https://github.com/k8sgpt-ai/k8sgpt

https://github.com/k8sgpt-ai/k8sgpt-operator

https://www.youtube.com/watch?v=PKrDNuJ_dfE

GenAI Part 2


To understand any GitHub repo: learn any GitHub repo in 59 seconds. Onboard AI learns any GitHub repo in minutes and lets you chat with it to locate functionality, understand different parts, and generate new code

https://app.getonboardai.com/

=========================================================

Langchain

https://github.com/kyrolabs/awesome-langchain

=========================================================

Model file 

extension: HF, GPTQ, GGML, BIN and GGUF.

Each model needs 4 files (with example content)

File 1 - The model's GGUF file

File 2 - The model's .yaml file

backend: llama
context_size: 2000
name: lunademo
parameters:
  model: luna-ai-llama2-uncensored.Q4_K_M.gguf
  temperature: 0.2
  top_k: 40
  top_p: 0.65
roles:
  assistant: 'ASSISTANT:'
  system: 'SYSTEM:'
  user: 'USER:'

File 3 - The Chat API .tmpl file

{{.Input}}

ASSISTANT:

File 4 - The Completion API .tmpl file

Complete the following sentence: {{.Input}}

=========================================================

Local Deployment of GenAI

1. https://github.com/cocktailpeanut/dalai

2. GPT 4 all https://gpt4all.io/index.html https://github.com/nomic-ai/gpt4all

3. vLLM library https://vllm.readthedocs.io/en/latest/#

4. API store https://gorilla.cs.berkeley.edu/ and https://github.com/ShishirPatil/gorilla

5. https://github.com/imartinez/privateGPT and https://docs.privategpt.dev/ Based on PrivateGPT https://github.com/marella/chatdocs

6. https://github.com/Lightning-AI/lit-gpt based on nanoGPT

7. https://github.com/Vision-CAIR/MiniGPT-4/tree/main and https://minigpt-4.github.io/ https://huggingface.co/spaces/Vision-CAIR/minigpt4  https://www.youtube.com/watch?v=__tftoxpBAw

8. https://collabnix.com/running-ollama-2-on-nvidia-jetson-nano-with-gpu-using-docker/

9. Langchain

https://python.langchain.com/docs/guides/local_llms

10. Cassandra DB

https://cassio.org/frameworks/langchain/qa-basic/

https://colab.research.google.com/github/CassioML/cassio-website/blob/main/docs/frameworks/langchain/.colab/colab_qa-basic.ipynb

11. gtr-t5-large model is around 640 MB

https://til.simonwillison.net/python/gtr-t5-large

=========================================================

Local Document search

1. https://github.com/h2oai/h2ogpt

2. https://github.com/imartinez/privateGPT and https://docs.privategpt.dev/  OR https://github.com/SamurAIGPT/EmbedAI

3. https://github.com/PromtEngineer/localGPT

4. LocalAI

https://localai.io/

https://github.com/mudler/LocalAI

https://github.com/go-skynet/helm-charts/tree/main/charts/local-ai and https://localai.io/basics/getting_started/index.html#run-localai-in-kubernetes

https://localai.io/basics/build/index.html

integration with Logseq https://github.com/briansunter/logseq-plugin-gpt3-openai

How Tos https://localai.io/howtos/

https://localai.io/howtos/easy-request-openai/

Access

https://localai.io/howtos/easy-model-import-downloaded/

https://localai.io/howtos/easy-request-curl/

All Git Repo https://github.com/lunamidori5

5. https://mudler.pm/posts/localai-question-answering/

6. GPT3 and datasette

https://simonwillison.net/2023/Jan/13/semantic-search-answers/

What GenAI? Part 1


What GenAI?

GenAI is about generating text, images, or other media in response to prompts. GenAI uses generative models, typically Transformer-based deep learning models.

Modality

- Unimodal (only 1 type of input)

- Multimodal, e.g. GPT-4 accepts text and images. Wu Dao is another example.

================================================

What Transformer?

* A transformer is a deep learning architecture. 

* It is designed to understand the context and semantics of language.

* It takes in a sequence of tokens (words, or parts of words) and outputs a corresponding sequence. 

* It pays attention to each input token and the relationships between them, using a mechanism known as 

1. self-attention or 

2. scaled dot-product attention. 

* This enables transformer to understand complex linguistic constructs and generate coherent and contextually accurate responses.

* It relies on the parallel multi-head attention mechanism.

* It requires less training time than previous recurrent neural architectures (e.g. long short-term memory (LSTM)).
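
To make the attention bullets concrete, here is a minimal pure-Python sketch of scaled dot-product attention: each query is compared with every key, the scores are divided by the square root of the key dimension, passed through softmax, and used to weight the values. In self-attention the queries, keys and values all come from the same tokens; the learned projection matrices and the multiple heads of a real transformer are omitted here, and the toy vectors are assumptions.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V):
    d_k = len(K[0])
    outputs = []
    for q in Q:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, V))
                        for j in range(len(V[0]))])
    return outputs

# Toy self-attention: the same three 2-d token vectors serve as Q, K and V.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = scaled_dot_product_attention(tokens, tokens, tokens)
print(attended)
```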

Implementation

- TensorFlow

- PyTorch

- JAX Deep Learning

- Transformers library by Hugging Face

Architecture

1. Tokenizer

2. Embedding layer. Token to vector

3. Transformer Layers : alternate attention and feedforward. 

4. Optional un-embedding layer

* It uses activation functions such as ReLU and SwiGLU

================================================

What ChatGPT?


It is an LLM chatbot. It enables users to refine and steer a conversation towards a desired length, format, style, level of detail, and language. 

ChatGPT is based on GPT-3.5 and ChatGPT Plus is based on GPT-4

Other examples: Bing Chat (based on GPT-4), Bard, LLaMA, Ernie Bot

Usage: 
  • write and debug computer programs
  • teleplays
  • fairy tales
  • student essays
  • answer test questions 
  • generate business ideas
  • write poetry 
  • song lyrics
  • translate and summarize text
  • emulate a Linux system
  • simulate entire chat rooms
  • play games like tic-tac-toe
  • or simulate an ATM.
================================================
What GPT?

generative pre-trained transformers
Based on uni-directional ("autoregressive") transformers
It is a GenAI model. It combines two forms of training

1. Pre-Training: General purpose, using vast quantities of data
2. Fine-Tuning: Supervised ML tasks on small specific data. 

================================================
What OpenAI?

an organization behind GPT and other products
================================================
What LLM?

Language Model (LM): an ML approach that models the probability distribution over a sequence of words. It predicts the probability of the next word in a sequence. 

An LLM is a large ANN with billions of parameters. It is trained on large quantities of data using self-supervised / semi-supervised approaches. LLM is GenAI for language / text. 

Examples: GPT-2, GPT-3, GPT-4, GPT-J, Claude, BERT, XLNet, RoBERTa, BLOOM (BigScience Large Open-science Open-access Multilingual Language Model), LaMDA, LLaMA, Stable Diffusion, PaLM, FLAN-T5, Llama, gpt4all, Llama 2, Code Llama, Mistral
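
Next-word prediction can be illustrated with a tiny bigram model that counts which word follows which. This is a minimal sketch over a hypothetical three-sentence corpus; an LLM learns the same kind of conditional distribution with a neural network rather than a count table.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    # counts[prev][nxt] = how often `nxt` follows `prev`.
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def next_word_probs(counts, prev):
    # Normalize the counts into a probability distribution.
    total = sum(counts[prev].values())
    return {w: c / total for w, c in counts[prev].items()} if total else {}

corpus = ["the cat sat", "the cat ran", "the dog sat"]  # hypothetical data
model = train_bigram(corpus)
probs = next_word_probs(model, "the")
print(probs)
```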

Smaller models
- LLaMA-7B (Raspberry Pi 4)
- one version of Stable Diffusion on iPhone 11
- Llama-2

Deploy

OctoML allows hosting models on servers and edge devices (even in the browser)
================================================
What GenAI Stack?

1. Data Extraction and loading (airbyte and llamahub) 
2. Embeddings (Word2Vec, GloVe, and FastText) 
3. Vector DB
4. Prompt Engine
5. Retrieval
6. Memory
7. Model

================================================

What DGM Deep Generative Model ?

A generative model is a statistical model of the joint probability distribution P(X,Y) on given observable variable X and target variable Y.

Simple example

Suppose the input data is x, the set of labels for x is y, and there are 4 data points (x, y).

For such data, estimating the joint probability distribution p(x, y) from the empirical measure means counting how often each (x, y) pair occurs and dividing by the number of data points.
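
A minimal sketch of estimating the joint distribution from the empirical measure. The four data points below are hypothetical, chosen only for illustration; each probability is the count of a pair divided by the number of points.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical (x, y) data points (assumption for illustration).
data = [(1, 0), (1, 0), (2, 0), (2, 1)]

def empirical_joint(points):
    # p(x, y) = (number of times the pair occurs) / (number of points).
    counts = Counter(points)
    n = len(points)
    return {pair: Fraction(c, n) for pair, c in counts.items()}

joint = empirical_joint(data)
for pair, p in sorted(joint.items()):
    print(pair, p)
```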



Types

1. variational autoencoders (VAEs)
2. generative adversarial networks (GANs)
3. auto-regressive (uni-directional) models

Examples
For text
1. GPT2
2. GPT3
3. Bidirectional Encoder Representations from Transformers (BERT)
For image
1. BigGAN
2. VQ-VAE
3. DALL-E
For music
1. Jukebox
2. MuseNet
3. MusicLM
4. MusicGen
For text to video
1. RunwayML
2. Make-A-Video by Meta Platforms
For programming
1. GitHub Copilot
For text to image
1. Midjourney
    

The transformer architecture has also led to the development of pre-trained systems, such as generative pre-trained transformers (GPTs) and BERT (Bidirectional Encoder Representations from Transformers).

================================================

What Vector DB ?

Word embedding : encode each word from training set as vector. It is a representation of a word. It encodes the meaning of the word in such a way that words that are closer in the vector space are expected to be similar in meaning. It is useful for syntactic parsing and sentiment analysis.

The OpenAI embedding model lets you take any string of text (up to a ~8,000 word length limit) and turn it into a vector with 1,536 dimensions. So each embedding has 1,536 floating point numbers as attributes. These floating point numbers are derived from a sophisticated language model. They take a vast amount of knowledge of human language and flatten it down to a list of floating point numbers. At 4 bytes per floating point number, that is 4*1,536 = 6,144 bytes per embedding, or 6 KiB.

A vector DB stores the embeddings for the whole vocabulary. It is useful for sequence prediction.
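
The size arithmetic can be checked directly by packing 1,536 single-precision (4-byte) floats. This sketch uses a zero vector as a placeholder, since only the size matters here.

```python
import struct

DIMENSIONS = 1536
vector = [0.0] * DIMENSIONS  # placeholder values; only the size matters

# "<1536f" packs 1,536 little-endian 4-byte floats.
packed = struct.pack(f"<{DIMENSIONS}f", *vector)
size_bytes = len(packed)
size_kib = size_bytes / 1024
print(size_bytes, size_kib)
```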

Example: Pinecone, Weaviate, chromadb, Qdrant, activeloop, pgvector, zilliz, redis, Momento, Neo4j, Cassandra (CassIO library)
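
What a vector DB does at its core can be sketched as a nearest-neighbour search by cosine similarity. `TinyVectorStore` and the 3-dimensional vectors are hypothetical; real products add indexing, persistence and approximate search on top of this idea.

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of the norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class TinyVectorStore:
    """Hypothetical in-memory store, for illustration only."""

    def __init__(self):
        self.items = []

    def add(self, key, vector):
        self.items.append((key, vector))

    def search(self, query, k=1):
        # Rank every stored vector by similarity to the query.
        ranked = sorted(self.items, key=lambda item: cosine(query, item[1]), reverse=True)
        return [key for key, _ in ranked[:k]]

store = TinyVectorStore()
store.add("cat", [0.9, 0.1, 0.0])
store.add("dog", [0.8, 0.2, 0.1])
store.add("car", [0.0, 0.1, 0.9])
nearest = store.search([0.85, 0.15, 0.0], k=2)
print(nearest)
```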

================================================

What NLP Tasks?
  • machine translation
  • document summarization
  • document generation
  • named entity recognition (NER)
  • biological sequence analysis
  • writing computer code based on requirements expressed in natural language.
  • video understanding.
  • syntactic parsing
  • sentiment analysis
================================================

What Prompt Engineering?

The process of structuring text so that it can be interpreted and understood by a generative AI model. It is enabled by in-context learning, defined as a model's ability to temporarily learn from prompts.
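
In-context learning is often exercised with a few-shot prompt: a handful of worked examples placed before the actual query. A minimal sketch follows; the instruction text, the example reviews and the `Review:` / `Sentiment:` layout are assumptions.

```python
def build_few_shot_prompt(instruction, examples, query):
    lines = [instruction, ""]
    # Each worked example shows the model the expected input/output pattern.
    for text, label in examples:
        lines += [f"Review: {text}", f"Sentiment: {label}", ""]
    # The final item is left open for the model to complete.
    lines += [f"Review: {query}", "Sentiment:"]
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    "Classify the sentiment of each review as positive or negative.",
    [("Great battery life.", "positive"), ("Screen cracked in a week.", "negative")],
    "Fast delivery and works well.",
)
print(prompt)
```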

Example: PromptLayer, Aim, scale, Humanloop, HoneyHive

LangChain and LlamaIndex are useful. 

================================================

What RAG?

Retrieval augmented generation

It tailors answers without retraining the model. 
We can feed an initial prompt with additional data from a live database. 
It makes it possible to personalize or refine an answer on the fly.

Retrieval Augmented Generation (RAG) is a method for improving the performance of large language models (LLMs) by providing them with access to external knowledge sources. This is done by first retrieving a set of relevant documents from the knowledge source, and then using those documents to generate a response.
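
The retrieve-then-generate flow can be sketched without any external service. Here retrieval is a crude word-overlap ranking standing in for semantic search, and the documents, function names and prompt wording are all hypothetical; the resulting prompt would then be sent to an LLM.

```python
def retrieve(question, documents, k=2):
    # Rank documents by how many words they share with the question.
    q_words = set(question.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(question, passages):
    # Put the retrieved passages in front of the question.
    context = "\n".join(f"- {p}" for p in passages)
    return (f"Answer using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}")

docs = [
    "Kubeflow pipelines are compiled to a YAML file.",
    "JSONL stores one JSON object per line.",
    "Parquet is a columnar file format.",
]
question = "What does JSONL store per line?"
passages = retrieve(question, docs, k=1)
prompt = build_prompt(question, passages)
print(prompt)
```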


Two sides of RAG

1. Semantic Search
2. Cypher Generation 

Examples

1. ChatGPT plugin
2. Google Search
3. Vector DB
4. Knowledge Graph (graph DB Neo4J)
5. LlamaIndex (formerly GPT Index), a library that can be used together with LangChain
https://gpt-index.readthedocs.io/en/latest/index.html

It has
5.A. Data connectors API, PDF, SQL etc.
5.B. Data Index : Structure data as intermediate representation
5.C. Engine : Natural language access to data e.g. Chat Engine , Query Engine
5.D. Data Agents : LLM powered knowledge worker
5.E. Application Integrations : tie LlamaIndex back into the rest of your ecosystem. This could be LangChain, Flask, Docker, ChatGPT, etc.

RAG and Prompt Engineering are two of the techniques to reduce hallucination 

================================================

What Hallucination ?

GenAI generates output that looks very authentic but is actually false or incorrect, much like a fake video. However, a fake video is not the result of hallucination. 

Why Hallucination?

1. The temperature input parameter is high, so the chance that the output goes off track is higher. 

2. Missing information. E.g. every LLM has a training cutoff date and has no information about events after that date. 

3. Biased training data and complex models. 
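
The effect of temperature (cause 1 above) can be seen in a minimal sketch: the logits are divided by the temperature before softmax, so a low temperature concentrates probability on the top token and a high temperature spreads it out. The logits are hypothetical.

```python
import math

def softmax_with_temperature(logits, temperature):
    # Divide by temperature, then apply a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens
low = softmax_with_temperature(logits, 0.2)
high = softmax_with_temperature(logits, 2.0)
print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

With the flatter high-temperature distribution, sampling picks less likely tokens more often, which is one way output drifts off track.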

How to avoid hallucination ?

1. Prompt Engineering
2. In-context learning
3. Fine tuning of actual model
4. Grounding using RAG. RAG / grounding has been available since May 2020