What GenAI? Part 1


What GenAI?

GenAI (generative AI) is about generating text, images, or other media in response to prompts. GenAI uses generative models, and many current GenAI systems are based on Transformer-based deep learning models.

Modality

- Unimodal: accepts a single input modality (e.g. text only)

- Multimodal: accepts more than one modality. E.g. GPT-4 accepts text and images; Wu Dao is another multimodal model.

================================================

What Transformer?

* A transformer is a deep learning architecture.

* It is designed to understand the context and semantics of language.

* It takes in a sequence of tokens (words or parts of words) and outputs a corresponding sequence.

* It pays attention to each input token and the relationships between them, using a mechanism known as self-attention, which is usually computed as scaled dot-product attention.

* This enables the transformer to understand complex linguistic constructs and generate coherent, contextually accurate responses.

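* It relies on multi-head attention: several attention heads run in parallel, each learning different relationships between tokens.

* Because it processes all tokens of a sequence in parallel instead of one step at a time, it requires less training time than earlier recurrent neural architectures such as long short-term memory (LSTM).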
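Self-attention computed as scaled dot-product attention is softmax(Q K^T / sqrt(d_k)) V, where Q, K and V are the query, key and value matrices and d_k is the key dimension. Below is a minimal single-head sketch in plain Python. It assumes no learned projection matrices (a real transformer first multiplies the token embeddings by learned weights W_Q, W_K, W_V), so the hypothetical token vectors in `X` serve directly as queries, keys and values.

```python
import math

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V):
    """Q: n x d_k, K: m x d_k, V: m x d_v (lists of lists).
    Returns (output n x d_v, attention weights n x m)."""
    d_k = len(K[0])
    output, weights = [], []
    for q in Q:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        w = softmax(scores)
        weights.append(w)
        # The output is the attention-weighted sum of the value vectors.
        output.append([sum(w[j] * V[j][c] for j in range(len(V))) for c in range(len(V[0]))])
    return output, weights

# Self-attention: queries, keys and values all come from the same 3 hypothetical tokens
# (no learned W_Q, W_K, W_V projections in this sketch).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, attn = scaled_dot_product_attention(X, X, X)
for row in attn:
    print([round(a, 3) for a in row])
```

Each row of `attn` sums to 1 and says how strongly that token attends to every token. Multi-head attention runs several such computations in parallel, each with its own projections, and concatenates the results.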

Implementation

- TensorFlow

- PyTorch

- JAX

- Transformers library by Hugging Face

Architecture

1. Tokenizer

2. Embedding layer: maps each token to a vector

3. Transformer layers: alternating attention and feedforward sublayers

4. Optional un-embedding layer: maps the final vectors back to a probability distribution over tokens
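The four stages can be illustrated with a toy pipeline. This is a sketch under heavy assumptions: the six-word vocabulary, the whitespace tokenizer and the fixed embedding values are hypothetical (real models use sub-word tokenizers and learned embeddings), the transformer layers themselves are omitted, and the un-embedding step reuses the embedding matrix (weight tying), which is one common choice rather than the only one.

```python
import math

# Hypothetical toy vocabulary; id 0 is reserved for unknown words.
vocab = ["<unk>", "the", "cat", "sat", "on", "mat"]
token_to_id = {tok: i for i, tok in enumerate(vocab)}

def tokenize(text):
    # 1. Tokenizer: split on whitespace and map each word to its integer id.
    return [token_to_id.get(word, 0) for word in text.lower().split()]

# 2. Embedding layer: one vector per vocabulary entry.
#    Fixed placeholder values here; in a real model these are learned.
d_model = 3
embedding = [[math.sin(i * d_model + j) for j in range(d_model)] for i in range(len(vocab))]

def embed(ids):
    return [embedding[i] for i in ids]

# 3. Transformer layers (attention + feedforward) would transform the vectors here; omitted.

def unembed(vector):
    # 4. Un-embedding: score each vocabulary token by a dot product with the vector
    #    (reusing the embedding matrix), then apply softmax to get probabilities.
    logits = [sum(v * e for v, e in zip(vector, row)) for row in embedding]
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [x / total for x in exps]

ids = tokenize("The cat sat on the mat")
vectors = embed(ids)
probs = unembed(vectors[-1])
print(ids)  # [1, 2, 3, 4, 1, 5]
# Most probable next token; meaningless here because nothing was trained.
print(vocab[max(range(len(probs)), key=probs.__getitem__)])
```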

* The feedforward sublayers use activation functions such as ReLU or SwiGLU.
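ReLU is simply max(0, x). SwiGLU is a gated unit: one linear projection of the input goes through the Swish (SiLU) function and is multiplied element-wise by a second projection. The sketch below uses small hypothetical weight matrices `W` and `V` and omits biases; in an actual feedforward sublayer the result would then pass through another projection back to the model dimension.

```python
import math

def relu(x):
    return max(0.0, x)

def swish(x, beta=1.0):
    # Swish(x) = x * sigmoid(beta * x); with beta = 1 it is also called SiLU.
    return x / (1.0 + math.exp(-beta * x))

def matvec(x, W):
    # x: length-n vector, W: n x m matrix (list of rows) -> length-m vector x W.
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]

def swiglu(x, W, V):
    # SwiGLU(x) = Swish(x W) * (x V), element-wise; biases omitted in this sketch.
    a = matvec(x, W)
    b = matvec(x, V)
    return [swish(ai) * bi for ai, bi in zip(a, b)]

x = [1.0, -2.0]
W = [[0.5, -1.0], [0.25, 0.5]]  # hypothetical weights
V = [[1.0, 0.0], [0.0, 1.0]]    # hypothetical weights (identity)
print([relu(v) for v in matvec(x, W)])
print(swiglu(x, W, V))
```

Note the design difference: ReLU zeroes every negative input, while the SwiGLU gate lets a small, smoothly scaled signal through even when the gate's pre-activation is negative.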

================================================

What ChatGPT?


ChatGPT is an LLM-based chatbot. It enables users to refine and steer a conversation towards a desired length, format, style, level of detail, and language.

ChatGPT is based on GPT-3.5 and ChatGPT Plus is based on GPT-4.

Other examples: Bing Chat (based on GPT-4), Bard, LLaMA, Ernie Bot

Usage: 
  • write and debug computer programs
  • teleplays
  • fairy tales
  • student essays
  • answer test questions 
  • generate business ideas
  • write poetry 
  • song lyrics
  • translate and summarize text
  • emulate a Linux system
  • simulate entire chat rooms
  • play games like tic-tac-toe
  • simulate an ATM
================================================
What GPT?

GPT stands for generative pre-trained transformer.
GPT models are based on uni-directional ("autoregressive") transformers: each token is predicted from the tokens that come before it.
GPT is a GenAI model that combines two forms of training:

1. Pre-training: general-purpose training on vast quantities of data.
2. Fine-tuning: supervised training for specific tasks on a small, task-specific dataset.
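Autoregressive generation means the model predicts one token, appends it to the input, and repeats. The sketch below shows that loop with greedy decoding (always take the most probable token). The `NEXT` probability table is hypothetical and stands in for a trained GPT; unlike a real GPT, it looks only at the last token rather than the whole prefix.

```python
# Hypothetical next-token probabilities standing in for a trained model.
NEXT = {
    "<s>": {"the": 0.9, "a": 0.1},
    "the": {"cat": 0.6, "dog": 0.4},
    "a": {"dog": 1.0},
    "cat": {"sat": 0.7, "</s>": 0.3},
    "dog": {"ran": 1.0},
    "sat": {"</s>": 1.0},
    "ran": {"</s>": 1.0},
}

def next_token_distribution(prefix):
    # A real GPT conditions on the whole prefix; this toy only uses the last token.
    return NEXT.get(prefix[-1], {"</s>": 1.0})

def generate_greedy(max_tokens=10):
    tokens = ["<s>"]
    for _ in range(max_tokens):
        probs = next_token_distribution(tokens)
        token = max(probs, key=probs.get)  # greedy: pick the most probable token
        if token == "</s>":               # end-of-sequence marker stops generation
            break
        tokens.append(token)              # feed the output back in as input
    return tokens[1:]

print(generate_greedy())  # ['the', 'cat', 'sat']
```

Real systems often sample from the distribution instead of always taking the maximum, which is where the temperature parameter comes in.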

================================================
What OpenAI?

OpenAI is the organization behind GPT and other products such as ChatGPT.
================================================
What LLM?

A language model (LM) is a machine learning approach to modeling the probability distribution over sequences of words. It predicts the probability of the next word in a sequence.
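By the chain rule, P(w_1, ..., w_n) = P(w_1) · P(w_2 | w_1) · ... · P(w_n | w_1, ..., w_{n-1}). A minimal way to estimate these next-word probabilities is a counting-based bigram model, which approximates each factor by P(w_i | w_{i-1}). The sketch below uses a hypothetical three-sentence corpus with `<s>`/`</s>` start and end markers; it is far simpler than how LLMs are trained, but it shows what "predicting the probability of the next word" means.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    # counts[prev][next] = how often `next` follows `prev` in the corpus.
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def next_word_prob(counts, prev, word):
    # P(word | prev) estimated as count(prev, word) / count(prev, anything).
    total = sum(counts[prev].values())
    return counts[prev][word] / total if total else 0.0

def sequence_prob(counts, sentence):
    # Chain rule with the bigram approximation: multiply P(w_i | w_{i-1}).
    words = ["<s>"] + sentence.lower().split() + ["</s>"]
    p = 1.0
    for prev, nxt in zip(words, words[1:]):
        p *= next_word_prob(counts, prev, nxt)
    return p

corpus = ["the cat sat", "the cat ran", "the dog sat"]  # hypothetical training data
model = train_bigram(corpus)
print(next_word_prob(model, "the", "cat"))   # 2 of the 3 words after "the" are "cat"
print(sequence_prob(model, "the cat sat"))   # ≈ 0.333 (= 1 * 2/3 * 1/2 * 1)
```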

A large language model (LLM) is a large artificial neural network (ANN), typically with billions of parameters. It is trained on large quantities of data using self-supervised or semi-supervised approaches. An LLM is GenAI for language and text.

Examples: GPT-2, GPT-3, GPT-4, GPT-J, Claude, BERT, XLNet, RoBERTa, BLOOM (BigScience Large Open-science Open-access Multilingual Language Model), LaMDA, LLaMA, PaLM, FLAN-T5, GPT4All, Llama 2, Code Llama, Mistral. (Stable Diffusion, sometimes listed alongside these, is a text-to-image model rather than an LLM.)

Smaller models (can run on consumer or edge devices)
- LLaMA-7B (has been run on a Raspberry Pi 4)
- a version of Stable Diffusion that runs on an iPhone 11
- Llama 2

Deploy

OctoML lets you host models on servers and edge devices (even in the browser).
================================================
What GenAI Stack?

1. Data extraction and loading (Airbyte, LlamaHub)
2. Embeddings (Word2Vec, GloVe, and FastText) 
3. Vector DB
4. Prompt Engine
5. Retrieval
6. Memory
7. Model

================================================

What DGM (Deep Generative Model)?

A generative model is a statistical model of the joint probability distribution P(X, Y) over an observable variable X and a target variable Y.

Simple example

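Suppose the input data is an observable variable X, each input has a label from a target variable Y, and we are given a small set of (x, y) data points.

Estimating the joint probability distribution P(X, Y) from the empirical measure then means counting how often each (x, y) pair occurs and dividing by the total number of data points.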
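As an illustration, take a hypothetical dataset of four points with x ∈ {1, 2} and y ∈ {0, 1}: (1, 0), (1, 1), (2, 0), (2, 1). The sketch below estimates P(X, Y) from those counts and then derives P(Y | X) from the joint, which shows that a model of the joint distribution is enough to make label predictions.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical toy dataset of (x, y) pairs: x is the observable input, y its label.
data = [(1, 0), (1, 1), (2, 0), (2, 1)]

def empirical_joint(pairs):
    # P(X=x, Y=y) = (number of times (x, y) occurs) / (total number of points)
    counts = Counter(pairs)
    n = len(pairs)
    return {xy: Fraction(c, n) for xy, c in counts.items()}

def conditional(joint, x):
    # P(Y=y | X=x) = P(x, y) / P(x), where P(x) sums the joint over all y.
    p_x = sum(p for (xi, _), p in joint.items() if xi == x)
    return {y: p / p_x for (xi, y), p in joint.items() if xi == x}

joint = empirical_joint(data)
print(joint)              # each of the 4 pairs gets probability 1/4
print(conditional(joint, 1))
```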



Types

1. variational autoencoders (VAEs)
2. generative adversarial networks (GANs)
3. auto-regressive (uni-directional) models

Examples
For text
1. GPT-2
2. GPT-3
3. Bidirectional Encoder Representations from Transformers (BERT)
For image
1. BigGAN
2. VQ-VAE
3. DALL-E
For music
1. Jukebox
2. MuseNet
3. MusicLM
4. MusicGen
For text to video
1. RunwayML
2. Make-A-Video by Meta Platforms
For programming
1. GitHub Copilot
For text to image
1. Midjourney

The transformer architecture has also led to the development of pre-trained systems, such as generative pre-trained transformers (GPTs) and BERT (Bidirectional Encoder Representations from Transformers).

================================================

What Vector DB?

Word embedding: each word from the training set is encoded as a vector, i.e. a numeric representation of the word. The encoding captures meaning in such a way that words closer together in the vector space are expected to be similar in meaning. Word embeddings are useful for syntactic parsing and sentiment analysis.

The OpenAI embedding model lets you take any string of text (up to a limit of roughly 8,000 tokens) and turn it into a vector with 1,536 dimensions, so each embedding has 1,536 floating-point numbers as attributes. These numbers are derived from a sophisticated language model: it takes a vast amount of knowledge of human language and flattens it down to a list of floating-point numbers. At 4 bytes per floating-point number, that is 4 × 1,536 = 6,144 bytes, or 6 KiB, per embedding.
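The storage arithmetic above can be checked with Python's standard `array` module, assuming 32-bit floats (the `'f'` typecode, which is 4 bytes on common platforms). The one-million-embedding figure at the end is a hypothetical scale, not from the source.

```python
from array import array

DIMENSIONS = 1536  # dimensions per embedding vector, as described above

# Store one embedding as 32-bit floats ('f' = C float, 4 bytes on common platforms).
embedding = array("f", [0.0] * DIMENSIONS)
bytes_per_embedding = len(embedding) * embedding.itemsize
print(bytes_per_embedding)          # 6144
print(bytes_per_embedding / 1024)   # 6.0 (KiB)

# Hypothetical scale: raw storage for one million embeddings, in MiB.
print(1_000_000 * bytes_per_embedding / 1024**2)  # 5859.375 (about 5.7 GiB)
```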

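A vector DB stores such embedding vectors (for example, for a whole vocabulary or a collection of documents) and retrieves the stored vectors most similar to a query vector.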
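The core vector DB operation, finding the stored vectors most similar to a query, can be sketched as a brute-force cosine-similarity search. `InMemoryVectorStore` and the 3-dimensional vectors are hypothetical; production vector DBs use approximate nearest-neighbour indexes and much larger vectors.

```python
import math

def cosine_similarity(a, b):
    # cos(a, b) = (a . b) / (|a| * |b|); 1.0 means same direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

class InMemoryVectorStore:
    """Toy brute-force vector store (hypothetical, for illustration only)."""

    def __init__(self):
        self.items = []  # list of (key, vector)

    def add(self, key, vector):
        self.items.append((key, vector))

    def search(self, query, k=2):
        # Rank every stored vector by cosine similarity to the query; return the top k keys.
        scored = [(cosine_similarity(query, v), key) for key, v in self.items]
        scored.sort(reverse=True)
        return [key for _, key in scored[:k]]

# Hypothetical 3-dimensional "embeddings" (real ones have e.g. 1,536 dimensions).
store = InMemoryVectorStore()
store.add("cat", [0.9, 0.1, 0.0])
store.add("kitten", [0.85, 0.15, 0.05])
store.add("car", [0.0, 0.2, 0.95])
print(store.search([0.88, 0.12, 0.02], k=2))  # ['cat', 'kitten']
```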

Examples: Pinecone, Weaviate, Chroma (chromadb), Qdrant, Activeloop, pgvector, Zilliz, Redis, Momento, Neo4j, Cassandra (CassIO library)

================================================

What NLP Tasks?
  • machine translation
  • document summarization
  • document generation
  • named entity recognition (NER)
  • biological sequence analysis
  • writing computer code based on requirements expressed in natural language.
  • video understanding.
  • syntactic parsing
  • sentiment analysis
================================================

What Prompt Engineering?

Prompt engineering is the process of structuring text so that it can be interpreted and understood by a generative AI model. It is enabled by in-context learning, defined as a model's ability to temporarily learn from prompts.
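In-context learning is easiest to see in a few-shot prompt: a handful of worked examples is placed in the prompt, and the model is expected to continue the pattern for a new input. The helper `build_few_shot_prompt` and the review/sentiment format below are hypothetical; the resulting string would be sent to whichever LLM you use (that call is not shown).

```python
def build_few_shot_prompt(instruction, examples, query):
    """Hypothetical helper: instruction + worked examples + the new input, as one prompt."""
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")  # left open for the model to complete
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    "Classify the sentiment of each review as Positive or Negative.",
    [("I loved this phone.", "Positive"), ("The battery died in a day.", "Negative")],
    "Great screen, fast delivery.",
)
print(prompt)
```

The model's weights never change here; everything it "learns" about the task comes from the examples inside the prompt.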

Tools: PromptLayer, Aim, Scale, Humanloop, HoneyHive

LangChain and LlamaIndex are also useful frameworks for working with prompts.

================================================

What RAG?

Retrieval augmented generation.

RAG does not fine-tune the model's weights. Instead, we feed the initial prompt with additional data retrieved from a live database.
This makes it possible to personalize or refine an answer on the fly.

Retrieval Augmented Generation (RAG) is a method for improving the performance of large language models (LLMs) by providing them with access to external knowledge sources. This is done by first retrieving a set of relevant documents from the knowledge source, and then using those documents to generate a response.
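The retrieve-then-generate flow can be sketched in a few lines. This is a minimal illustration under stated assumptions: `DOCUMENTS` is a hypothetical knowledge base, retrieval ranks documents by simple word overlap as a stand-in for embedding-based semantic search, and the final call that sends the prompt to an LLM is left out.

```python
import re

# Hypothetical knowledge base; in practice these would be chunks stored in a vector DB.
DOCUMENTS = [
    "Our store opens at 9am and closes at 6pm on weekdays.",
    "Returns are accepted within 30 days with a receipt.",
    "Shipping is free for orders over 50 dollars.",
]

def words(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, docs, k=1):
    # Stand-in for semantic search: rank documents by word overlap with the question.
    q = words(question)
    ranked = sorted(docs, key=lambda d: len(q & words(d)), reverse=True)
    return ranked[:k]

def build_rag_prompt(question, docs, k=1):
    # Augment the prompt with the retrieved context before it goes to the LLM.
    context = "\n".join(f"- {d}" for d in retrieve(question, docs, k))
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\nAnswer:"
    )

print(build_rag_prompt("When are returns accepted?", DOCUMENTS))
```

Because the context is fetched at query time, updating `DOCUMENTS` changes the answers immediately, with no retraining.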


Two sides of RAG

1. Semantic Search
2. Cypher generation (the LLM writes Cypher queries against a graph DB such as Neo4j)

Examples

1. ChatGPT plugin
2. Google Search
3. Vector DB
4. Knowledge Graph (graph DB Neo4J)
5. LlamaIndex (formerly GPT Index), a data framework for LLM applications that can be used together with LangChain
https://gpt-index.readthedocs.io/en/latest/index.html

LlamaIndex provides:
5.A. Data connectors: ingest data from APIs, PDFs, SQL databases, etc.
5.B. Data indexes: structure data into intermediate representations that are easy for LLMs to consume
5.C. Engines: natural-language access to data, e.g. a chat engine or a query engine
5.D. Data agents: LLM-powered knowledge workers
5.E. Application integrations: tie LlamaIndex back into the rest of your ecosystem, e.g. LangChain, Flask, Docker, ChatGPT

RAG and prompt engineering are two of the techniques used to reduce hallucination.

================================================

What Hallucination?

Hallucination is when GenAI generates output that looks authentic but is actually false or incorrect. The result can look as convincing as a fake video, but a fake video is not the result of hallucination.

Why Hallucination?

1. The temperature input parameter is high. A higher temperature makes the output more random, so the chance that it goes off track is higher.

2. Missing information. E.g. every LLM has a training-data cutoff date, so it has no information about events after that date.

3. Biased training data and complex models.
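Temperature divides the model's raw scores (logits) before the softmax that turns them into next-token probabilities: a temperature below 1 sharpens the distribution, above 1 flattens it. The logits below are hypothetical; the point is that as the temperature rises, less likely tokens receive more probability, so sampling is more likely to go off track.

```python
import math

def softmax_with_temperature(logits, temperature):
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Divide the logits by T: T < 1 sharpens the distribution, T > 1 flattens it.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three candidate next tokens.
logits = [4.0, 2.0, 1.0]
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```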

How to avoid hallucination?

1. Prompt Engineering
2. In-context learning
3. Fine-tuning the model itself
4. Grounding using RAG (the RAG technique was introduced in May 2020)
