Epic history of LLMs
RNNs. Sequence-to-sequence (seq2seq) NLP tasks:
1. Many to one: sentiment analysis
2. One to many: image captioning
3. Many to many:
- Synchronous many to many: number of inputs = number of outputs. E.g. part-of-speech tagging, named entity recognition
- Asynchronous many to many: number of inputs and outputs can differ. E.g. translation, text summarization, question answering, chatbots, speech to text
Seq2seq models are used for many-to-many tasks.
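Stage 1: 2014 Encoder-decoder network
Encoder and decoder are LSTMs. A vanilla RNN or a GRU are other options.
It works well for short sentences, but not for sentences of 30+ words, because the whole input is compressed into one fixed-size context vector.
BLEU score: the metric used to evaluate translation quality.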
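The notes only name the BLEU score, so here is a minimal sketch of the idea behind it: clipped n-gram precision combined with a brevity penalty. It assumes a single reference sentence, no smoothing, and n-grams up to length 2; the function names (`ngrams`, `clipped_precision`, `simple_bleu`) are made up for illustration and this is not a replacement for a proper BLEU implementation.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate, reference, n):
    """Modified n-gram precision: candidate counts are clipped by reference counts."""
    cand = ngrams(candidate, n)
    ref = ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


def simple_bleu(candidate, reference, max_n=2):
    """Geometric mean of clipped precisions times a brevity penalty.

    Assumptions of this sketch: one reference, no smoothing.
    """
    precisions = [clipped_precision(candidate, reference, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    # Penalise candidates that are shorter than the reference.
    if len(candidate) >= len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(log_mean)


reference = "the cat is on the mat".split()
candidate = "the cat sat on the mat".split()
print(round(simple_bleu(candidate, reference), 4))
```

Here 5 of 6 unigrams and 3 of 5 bigrams match, so the score is sqrt(5/6 * 3/5), roughly 0.71. A perfect copy of the reference scores 1.0.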
Stage 2: 2015 Attention Mechanism
Encoder is the same.
Attention mechanism: an attention layer at the decoder works out which encoder hidden states are useful at each decoder step and generates a context vector for that step. So the decoder gets multiple context vectors, one per step, built from the encoder's hidden states (for an LSTM, the state at step t is the c_t and h_t vectors).
Training takes longer, since attention scores are computed for every pair of encoder and decoder steps.
2015 to 2017: many types of attention mechanisms were introduced.
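A rough sketch of how one context vector can be built for a single decoder step. It assumes plain dot-product scoring, which is only one of the attention variants from this period (the original 2015 mechanism used a small learned scoring network instead). The names `softmax` and `context_vector` and the toy vectors are illustrative.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def context_vector(decoder_state, encoder_states):
    """Weight each encoder hidden state by its dot-product score against the decoder state."""
    scores = [sum(d * e for d, e in zip(decoder_state, enc)) for enc in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # The context vector is the weighted sum of all encoder hidden states.
    context = [sum(w * enc[i] for w, enc in zip(weights, encoder_states)) for i in range(dim)]
    return weights, context


# Toy example: three encoder hidden states, one decoder state.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]
weights, context = context_vector(decoder_state, encoder_states)
print([round(w, 3) for w in weights])
print([round(c, 3) for c in context])
```

The decoder would repeat this at every output step with its current state, which is why it ends up with a different context vector per step.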
Stage 3: 2017 Transformer
No LSTM
No RNN Cell
Relies entirely on self-attention
Both encoder and decoder use attention
The Transformer can process all words in parallel
Main building blocks:
1. Attention layer = multi-head attention
2. Normalization layer
3. Dense (feed-forward) layer
4. Input embeddings
Training it needs a lot of hardware, time, and data.
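A minimal sketch of self-attention for a single head, to make the "all words in parallel" point concrete. It assumes scaled dot-product scoring and, to stay short, uses the input vectors directly as queries, keys, and values (a real Transformer layer applies learned projections first and runs several heads). `self_attention` and the toy embeddings are illustrative names and values.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x (no learned projections)."""
    d = len(x[0])
    outputs = []
    # Each position depends only on the inputs, not on other outputs,
    # so these iterations could all run in parallel.
    for query in x:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in x]
        weights = softmax(scores)
        outputs.append([sum(w * value[i] for w, value in zip(weights, x)) for i in range(d)])
    return outputs


embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy word embeddings
for row in self_attention(embeddings):
    print([round(v, 3) for v in row])
```

Unlike an RNN, no step waits for a previous hidden state, which is what lets the whole sentence be processed at once on a GPU.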
Stage 4: 2018 Jan Transfer Learning
Challenges
1. A single model cannot perform all tasks, such as sentiment analysis, translation, and summarization
2. Training from scratch needs lots of labeled data
Universal Language Model Fine-tuning (ULMFiT) proposed using language modelling as the pre-training task. Language modelling is the NLP task of predicting the next word. Advantages:
1. Rich feature learning
2. Unsupervised task, so no labeled data is needed
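To show why next-word prediction needs no labels, here is a tiny sketch that uses a bigram count model. This is a deliberately simple stand-in, not the neural model ULMFiT uses; `train_bigram_model`, `predict_next`, and the toy corpus are illustrative.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count which word follows which; the raw text itself supplies the training targets."""
    tokens = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if the word was never seen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


corpus = "the cat sat on the mat . the cat ate the fish ."
model = train_bigram_model(corpus)
print(predict_next(model, "the"))  # "cat" follows "the" most often in this corpus
```

Every (word, next word) pair in the text is a free training example, which is why huge unlabeled corpora can be used for pre-training.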
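Model: AWD-LSTM
Dataset: Wikipedia
Fine-tuning: the output layer was replaced with a classifier and fine-tuned on many datasets
Training from scratch needed about 10,000 labeled examples. With fine-tuning, about 100 examples gave comparable or better results.
- No Transformer yet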
Now in 2018, we have two technologies
1. Architecture: Transformer
2. Training: pre-training and transfer learning
Stage 5: 2018 Oct LLM
Transfer learning on the Transformer
1. Google: BERT (encoder-only model)
2. OpenAI: GPT (decoder-only model)
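One practical difference between the two families is which positions each token may attend to. The sketch below builds the two attention masks as plain lists, assuming 1 means "may attend" and 0 means "blocked"; `bidirectional_mask` and `causal_mask` are illustrative names, not functions from any library.

```python
def bidirectional_mask(n):
    """Encoder-style mask: every position may attend to every position."""
    return [[1] * n for _ in range(n)]


def causal_mask(n):
    """Decoder-style mask: position i may attend only to positions 0..i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


print("encoder-only (BERT-style):")
for row in bidirectional_mask(4):
    print(row)

print("decoder-only (GPT-style):")
for row in causal_mask(4):
    print(row)
```

The causal mask is what makes a decoder-only model a next-word predictor: each position sees only the words before it, while an encoder-only model sees the whole sentence in both directions.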
From LM to LLM: what scaling up requires
1. Data
2. Hardware: GPU clusters
3. Time: days to weeks
4. Cost = hardware + electricity + people + infrastructure
5. Energy consumption
---------------
GPT-3 -> ChatGPT
1. RLHF: Reinforcement Learning from Human Feedback
2. Incorporates safety and ethical guidelines
3. Better handling of conversational context
4. Dialogue-specific training
5. Continuous improvement based on user feedback
Reference https://www.youtube.com/watch?v=8fX3rOjTloc&list=PPSV





