HashiCorp User Group Bangalore Meetup #1: Powering the Multi-Cloud Era
Alternatives for IDP
(1) https://github.com/JanssenProject/jans https://github.com/JanssenProject/jans/tree/main/jans-keycloak-link https://imshakil.medium.com/janssen-mod-auth-openidc-module-to-test-openid-connect-single-sign-on-s… It is by Gluu
(2) Vault itself supports OIDC https://developer.hashicorp.com/vault/docs/secrets/identity/oidc-provider https://brian-candler.medium.com/using-vault-as-an-openid-connect-identity-provider-ee0aaef2bba2
SQL++ is a query language for JSON data. https://www.couchbase.com/sqlplusplus/
https://techmilap.com/ is a free website for hosting events
Vault can provide dynamic, temporary secrets for each identity that a consumer uses to access data, so later on we can audit who accessed the data. In our case, pods use a ServiceAccount (SA), so we get a dynamic secret per ServiceAccount. This means we cannot audit which pod accessed the data; we can only audit which ServiceAccount accessed it. The dynamic secret has a short life, so it cannot be reused after it expires. The SA itself can be used as many times as we want.
Vault secures data in transit with TLS. It can also encrypt application data on request, which is called "encryption as a service".
In Terraform, the state file is the most confidential artifact, because it can contain secrets in plain text.
Nomad is an alternative to K8s. It can also manage VMs using the QEMU driver. Consul is used for networking and service discovery. Fabio is for ingress and load balancing in Nomad.
Identity Provider
https://github.com/pando85/kaniop Kaniop is a Kubernetes operator for managing Kanidm.
https://kanidm.com/ Kanidm is a modern, secure identity management system that provides authentication and authorization services with support for POSIX accounts, OAuth2, and more. It is simple and written in Rust.
-------------
Why Choose Keycloak?. Understanding the Need for an Identity… | by J3 | Jungletronics | Medium
Ory
GitHub - ory/k8s: Kubernetes Helm Charts for the ORY ecosystem. · GitHub
The Paper That Changed Everything: Attention is All You Need
Here are a few links
The Paper
https://arxiv.org/pdf/1706.03762.pdf
------------------------
Medium
https://medium.com/@SimplifyingFutureTech/understanding-attention-is-all-you-need-750713a1631b
https://medium.com/codex/attention-is-all-you-need-explained-ebdb02c7f4d4
-------------
PoloClub
https://poloclub.github.io/transformer-explainer/
https://arxiv.org/abs/2408.04619
https://www.youtube.com/watch?v=ECR4oAwocjs
-----------
The last few videos of https://www.youtube.com/watch?v=2dH_qjc9mFg&list=PLKnIA16_RmvYuZauWaPlRTC54KxSNLtNn
https://hasgeek.com/fifthelephant/paper-reading-meet-up-december-2023/
https://www.linkedin.com/pulse/decoding-attention-all-you-need-how-transformers-ai-yuri-sylse/
--------------
An embedding is a representation of text as a vector in a multi-dimensional space.
A diffusion model adds noise to data and then learns to remove it. It is used for multimodal generation.
Multi-head attention: different heads can capture syntax, semantics and position. It improves expressiveness and captures richer patterns.
Attention decides which embeddings to look at. It does not change the input embeddings themselves.
A few other miscellaneous links from the event https://luma.com/d0yhf0ib
1. IronClaw
https://github.com/nearai/ironclaw
https://www.ironclaw.com/
IronClaw is the secure, open-source alternative to OpenClaw that runs in encrypted enclaves on NEAR AI Cloud. The enclaves are a TEE (Trusted Execution Environment).
VoIP in Agentic AI era
Once upon a time, the signaling stack was separated from voice as the packet-switched SS7 network, with its own protocol stack. SS7 carried over IP (using SCTP rather than TCP) is SIGTRAN. The VoIP signaling plane has protocols like H.323 (by ITU), SIP (by IETF) and MEGACO. SIP became the most popular. The VoIP data plane is RTP. Now, in the era of Agentic AI, we have business solutions for different verticals that integrate voice with STT, LLM, TTS etc. Here are a few resource URLs
All Relevant technologies
https://www.voip-info.org/
https://telecom.altanai.com/
SignalWire
https://www.linkedin.com/posts/briankwest_github-signalwire-demosveronica-this-activity-7430982255675678720-jsTH/
https://developer.signalwire.com/sdks/agents-sdk/
https://github.com/signalwire-demos
https://signalwire.com/
https://postpromptviewer.signalwire.io/
FreeSWITCH
https://en.wikipedia.org/wiki/FreeSWITCH
https://signalwire.com/freeswitch
https://github.com/signalwire/freeswitch
https://developer.signalwire.com/freeswitch/FreeSWITCH-Explained/
https://github.com/amigniter/mod_audio_stream
https://github.com/sptmru/freeswitch_mod_audio_stream
https://medium.com/@srivastava.vikash/day-9-real-time-voice-ai-starts-here-streaming-audio-from-freeswitch-a45d69547164
https://www.cyberpunk.tools/jekyll/update/2025/11/18/add-ai-voice-agent-to-freeswitch.html
Asterisk
https://www.asterisk.org/
https://en.wikipedia.org/wiki/Asterisk_(PBX)
https://github.com/asterisk/asterisk
Plivo
https://www.plivo.com/
https://github.com/plivo
JsSIP
https://jssip.net/
https://github.com/versatica/JsSIP
https://en.wikipedia.org/wiki/JsSIP
Security
https://www.frafos.com/
OverSIP
https://oversip.versatica.com/
https://github.com/versatica/OverSIP
https://rubygems.org/gems/oversip/versions/2.0.1?locale=en
https://www.voip-info.org/oversip/
OfficeSIP
https://officesip-server.software.informer.com/
https://telecom.altanai.com/2014/10/13/sip-server-officesip/
https://sourceforge.net/projects/officesip/
https://github.com/vf1/sipserver
FlexiSIP
https://github.com/BelledonneCommunications/flexisip
https://www.linphone.org/en/flexisip-sip-server/
https://www.linhome.org/software-products/flexisip/
https://wiki.linphone.org/xwiki/wiki/public/view/Flexisip/
Tools
https://postpromptviewer.signalwire.io/
https://github.com/briankwest/libnemo_normalize
https://github.com/signalwire-demos/utils
https://github.com/xiph/rnnoise
FreePBX
https://www.hostinger.com/in/tutorials/freepbx-tutorial
https://www.freepbx.org/
https://en.wikipedia.org/wiki/FreePBX
https://github.com/freepbx
Others
https://medium.com/@dwilkie_34546/implementing-ai-powered-voice-at-somleng-a-technical-deep-dive-93edbb920e02
https://stringee.com/en/
https://www.kamailio.org/w/
https://github.com/resiprocate/resiprocate/wiki
https://www.kaplansoft.com/teksip/
AI
https://deepgram.com/
Transformers & Large Language Models - 1 of 9
• Background on NLP and tasks
NLP Tasks
1. Classification
- Sentiment analysis: Amazon reviews, IMDB critiques, Twitter
- Intent detection
- Language detection
- Topic modeling
2. "Multi"-Classification
- Part of speech tagging
- Named entity recognition (NER): Dataset = annotated Reuters newspaper (CONLL-2003, CONLL+)
- Dependency parsing
- Constituency parsing
3. Generation
- Machine translation: Dataset = WMT'14
- Question answering
- Summarization
- Text generation
History of LLM
1980s RNN
1997 LSTM (Theoretical Foundation)
2013 Word2Vec
2020s LLM
• Tokenization
1. Arbitrary (n/a)
2. Word (variations of a word become separate tokens even though their meanings are similar and they need similar embeddings, so word variations are not handled)
3. Sub-word: focuses on common roots. Increases sequence length. Tokenization is more complex
4. Character level: can handle mis-spelled words & CasINg. Sequence length is much longer. No OOV
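The trade-offs in the list above can be seen on a toy example. This is only a minimal sketch, not a real tokenizer: the two vocabularies, the `<unk>` marker and the greedy longest-match rule for sub-words are assumptions made up for illustration.

```python
# Toy comparison of word, sub-word and character tokenization.
# Both vocabularies and the greedy longest-match rule are illustrative assumptions.

TOY_WORD_VOCAB = {"the", "cat", "play", "plays"}
TOY_SUBWORD_VOCAB = {"the", "cat", "play", "s", "ing", "ed"}


def word_tokenize(text):
    """Split on whitespace; unknown words become <unk> (the OOV problem)."""
    return [w if w in TOY_WORD_VOCAB else "<unk>" for w in text.lower().split()]


def subword_tokenize(text):
    """Greedy longest-match split of each word into known sub-word pieces."""
    tokens = []
    for word in text.lower().split():
        start = 0
        while start < len(word):
            # Try the longest candidate piece first, then shrink it.
            for end in range(len(word), start, -1):
                if word[start:end] in TOY_SUBWORD_VOCAB:
                    tokens.append(word[start:end])
                    start = end
                    break
            else:
                # No known piece starts here: emit one marker and give up on this word.
                tokens.append("<unk>")
                break
    return tokens


def char_tokenize(text):
    """Every character is a token, so nothing is ever out of vocabulary."""
    return list(text)
```

With the sentence "the cat playing", the word tokenizer loses "playing" to `<unk>`, the sub-word tokenizer recovers it as "play" + "ing" at the cost of one extra token, and the character tokenizer never fails but produces the longest sequence.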
• Embeddings
Word (Token) Representation by vector
OHE = One Hot Encoding
cosine similarity
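A small sketch of why one-hot encoding is not enough and what cosine similarity measures. The three dense vectors are hypothetical values picked by hand, not taken from any trained model.

```python
import math


def one_hot(index, vocab_size):
    """Vector of zeros with a single 1 at the word's vocabulary index."""
    vec = [0.0] * vocab_size
    vec[index] = 1.0
    return vec


def cosine_similarity(a, b):
    """Dot product divided by the product of the two vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# One-hot vectors of two different words are always orthogonal.
cat_ohe, dog_ohe = one_hot(0, 3), one_hot(1, 3)

# Hypothetical dense embeddings: "cat" and "dog" point in a similar direction.
cat_emb = [0.9, 0.8, 0.1]
dog_emb = [0.8, 0.9, 0.2]
car_emb = [0.1, 0.2, 0.9]
```

Here `cosine_similarity(cat_ohe, dog_ohe)` is 0, so OHE says nothing about how related two words are, while the dense vectors give a higher score for cat/dog than for cat/car.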
• Word2vec, RNN, LSTM
1. Word2Vec
It is an ANN trained with a proxy task
1. CBOW: Continuous Bag of Words. You predict the target word from the words around it
2. Skip-gram: You take the target word and predict the words around it
Word order does not matter
Embeddings are not context aware
Example dimension size: 768
A special token indicates "end of sequence"
2. RNN Recurrent Neural Network
Connections form a temporal sequence
H = Hidden state = A = Activation Vector = Context Vector.
RNN is used for all 3 NLP tasks
1. Classification
2. "Multi"-Classification
3. Generation
An RNN keeps forgetting the past. This phenomenon is called "vanishing gradient"
Word order matters in an RNN
3. LSTM = Long short-term memory
1. hidden state
2. cell state
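Going back to the two Word2Vec proxy tasks noted above (CBOW and skip-gram), the sketch below only builds the training pairs each task would see. It does not train any network, and the `window` parameter and function names are assumptions for illustration.

```python
def context_of(tokens, i, window):
    """Words within `window` positions to the left and right of position i."""
    return tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]


def cbow_pairs(tokens, window=1):
    """CBOW: (context words, target word). The target is predicted from its neighbours."""
    return [(context_of(tokens, i, window), target) for i, target in enumerate(tokens)]


def skipgram_pairs(tokens, window=1):
    """Skip-gram: (target word, one context word). Each neighbour is predicted from the target."""
    pairs = []
    for i, target in enumerate(tokens):
        pairs.extend((target, c) for c in context_of(tokens, i, window))
    return pairs
```

For `["the", "cat", "sat"]` with `window=1`, CBOW gives one pair per word (for example `(["the", "sat"], "cat")`), while skip-gram gives one pair per (word, neighbour) combination.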
• Attention mechanism
Attention creates a direct link between the next word that we are predicting and something from the past.
"Self-attention" is the main principle of the 2017 paper "Attention Is All You Need".
"Self-attention" = instead of processing sequentially, allow direct connections with all parts of the text at once.
Concept of Query, Key and Value
We compare Q to K to see how similar they are, and then take the corresponding value.
Softmax converts unnormalized network outputs into probabilities for the different classes, such that each value is in [0,1] and the sum is 1.
Formula – Given a query Q, we want to know which key K the query should pay "attention" to with respect to the associated value V.
attention = softmax ( Q * K ^ T / Sqrt (dimension of K) ) * V
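The formula above can be sketched in plain Python with lists of row vectors. This is a minimal illustration with no batching and no learned weights; the function names are my own.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities in [0, 1] that sum to 1."""
    m = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, where Q, K and V are lists of row vectors."""
    d_k = len(K[0])
    output = []
    for q in Q:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        output.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return output
```

A query that matches the first key more closely than the second gets an output that is closer to the first value vector.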
There are three attention layers
1. Self-attention layer in the encoder, to compute embeddings from the input
2. Decoder-decoder attention, i.e. the self-attention layer in the decoder. It is masked, because it only looks at the tokens that have already been translated. It determines which other tokens of the output sentence are useful to predict the next token.
3. Cross-attention layer: the output is expressed as a function of what is seen in the input. It uses the output of the last part of the encoder, which is fed to the decoder.
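The masking in the decoder self-attention layer can be sketched as follows. Implementations usually set the scores of future positions to minus infinity before the softmax; this sketch simply drops them and pads with zeros, which gives the same weights. The function name is an assumption.

```python
import math


def masked_softmax_rows(scores):
    """Row i of a square score matrix may only attend to positions 0..i."""
    n = len(scores)
    weights = []
    for i, row in enumerate(scores):
        visible = row[: i + 1]  # tokens generated so far, including the current one
        m = max(visible)
        exps = [math.exp(s - m) for s in visible]
        total = sum(exps)
        # Future positions get weight 0, so they cannot influence the prediction.
        weights.append([e / total for e in exps] + [0.0] * (n - i - 1))
    return weights
```

With all scores equal, the first token attends only to itself, the second splits its attention over two tokens, and so on.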
We have a direct link to all tokens, so word order does not matter by default (unlike in an RNN). So we have Position Encoding: to inform the model of the position of each word in the sequence.
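One common choice for position encoding is the sinusoidal scheme from the 2017 paper. These notes do not record which scheme the lecture used, so the sketch below is an assumption: even dimensions use a sine, odd dimensions use a cosine, and the wavelength grows with the dimension index.

```python
import math


def positional_encoding(seq_len, d_model):
    """Sinusoidal position encoding: one row of length d_model per position."""
    pe = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            # Dimensions 2k and 2k+1 share the same frequency.
            angle = pos / (10000 ** ((2 * (i // 2)) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe
```

Each row is added to the token embedding at that position, so two identical words at different positions end up with different input vectors.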
BOS Token: Beginning of Sequence.
EOS Token: End of Sequence
• Transformer architecture
The transformer, made of an encoder and a decoder, is built around self-attention.
1. The encoder computes meaningful embeddings from the input text. We have N such encoders stacked. The input layer generates a position-aware embedding matrix of size n x d, where n = length of the input sequence and d = model size.
The encoder projects the input sequence into 3 spaces using the matrices Wq, Wk and Wv. These matrices are what the model learns.
attention = softmax ( Q * K ^ T / Sqrt (dimension of K) ) * V
Projecting with Wq gives a matrix Q where each row represents the query of one token. At the end, a matrix Wo projects the result back to the original dimension of the embedding.
In K^T, each column represents the key of one token.
When we multiply Q by K^T, each row holds the scores of one query against every key. Softmax then turns each row into a probability distribution.
Now multiply by the matrix V.
This is the self-attention mechanism: compute the representation of each token as a function of the other tokens. It is done by the attention layer.
Multi-Head Attention (MHA) means this computation is done in several different ways in parallel. So the model can learn
- different representations
- different projections
so that all tokens of the input text attend to each other.
In the decoder, the corresponding layer is a masked self-attention layer.
A Multi-Head Attention (MHA) layer performs attention computations across multiple heads, then projects the result in the output space.
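A rough sketch of the MHA computation described above, in plain Python. It assumes the projection matrices Wq, Wk, Wv and Wo are given (in a real model they are learned), splits the projected vectors into equal chunks per head, and leaves out masking and batching.

```python
import math


def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for one head."""
    d_k = len(K[0])
    K_T = [list(col) for col in zip(*K)]
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in matmul(Q, K_T)]
    return matmul(weights, V)


def split_heads(X, num_heads):
    """Cut every row into num_heads equal chunks: one matrix per head."""
    d_head = len(X[0]) // num_heads
    return [[row[h * d_head:(h + 1) * d_head] for row in X] for h in range(num_heads)]


def multi_head_attention(X, Wq, Wk, Wv, Wo, num_heads):
    Q, K, V = matmul(X, Wq), matmul(X, Wk), matmul(X, Wv)
    heads = [
        attention(q, k, v)
        for q, k, v in zip(split_heads(Q, num_heads), split_heads(K, num_heads), split_heads(V, num_heads))
    ]
    # Concatenate the heads token by token, then project with Wo.
    concat = [sum((head[t] for head in heads), []) for t in range(len(X))]
    return matmul(concat, Wo)
```

The output has the same shape as the input (one vector of model size per token), which is what lets N such layers be stacked.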
2. FFNN (Feed Forward Neural Network): lets the model learn another kind of projection,
so we get a rich representation of each input token.
In an LLM, the hidden layer has a higher dimension than the model size, so the model has enough degrees of freedom to learn useful representations.
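A tiny sketch of such a feed-forward block: project up to a wider hidden layer, apply ReLU, project back down. The weights below are hand-picked for illustration (model size 2, hidden size 4), not learned; they are chosen so that the block reproduces its input, which shows that the wider hidden layer can carry both the positive and the negative part of each coordinate through the ReLU.

```python
def relu(x):
    return max(0.0, x)


def feed_forward(x, W1, b1, W2, b2):
    """relu(x W1 + b1) W2 + b2, applied to the vector of a single token."""
    hidden = [relu(sum(xi * w for xi, w in zip(x, col)) + b) for col, b in zip(zip(*W1), b1)]
    return [sum(h * w for h, w in zip(hidden, col)) + b for col, b in zip(zip(*W2), b2)]


# Hypothetical weights: 2 -> 4 -> 2.
W1 = [[1.0, -1.0, 0.0, 0.0],
      [0.0, 0.0, 1.0, -1.0]]
b1 = [0.0, 0.0, 0.0, 0.0]
W2 = [[1.0, 0.0],
      [-1.0, 0.0],
      [0.0, 1.0],
      [0.0, -1.0]]
b2 = [0.0, 0.0]
```

The same block is applied to every token position independently.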
3. The encoder output is for the decoder.
The decoder's cross-attention takes Q from the output sequence,
and K, V from the encoder.
We have N decoders.
New Terms
- Perplexity is an evaluation metric for machine translation. It quantifies how 'surprised' the model is to see some words together. Lower is better.
- OOV = out of vocabulary
- An RNN keeps forgetting the past. This phenomenon is called "vanishing gradient"
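For the perplexity term in the list above, a minimal sketch of the usual definition: the exponential of the average negative log-probability that the model assigned to each actual token. The input is assumed to be the list of those per-token probabilities.

```python
import math


def perplexity(token_probs):
    """exp of the mean negative log-probability of the observed tokens."""
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)
```

A model that gives every token probability 0.25 has perplexity 4: it is as 'surprised' as if it were choosing uniformly among 4 options at each step.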
Label Smoothing Purpose
- prevent overfitting
- introduce noise
- let the model be a little unsure about its predictions.
It improves the accuracy and BLEU score of translation.
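One common formulation of label smoothing mixes the one-hot target with a uniform distribution. The sketch below assumes that formulation; the default `epsilon=0.1` matches the value reported in the 2017 paper, but these notes do not say which variant the lecture showed.

```python
def smooth_labels(one_hot, epsilon=0.1):
    """Move epsilon of the probability mass from the true class to all classes equally."""
    k = len(one_hot)
    return [(1.0 - epsilon) * y + epsilon / k for y in one_hot]
```

For a 4-class target `[0, 0, 1, 0]` the smoothed target is roughly `[0.025, 0.025, 0.925, 0.025]`: still a valid distribution, but the model is no longer pushed towards 100% confidence.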
References
https://cme295.stanford.edu/
Syllabus : https://cme295.stanford.edu/syllabus/
CheatSheet
https://cme295.stanford.edu/cheatsheet/
https://github.com/afshinea/stanford-cme-295-transformers-large-language-models/tree/main/en
https://www.youtube.com/watch?v=Ub3GoFaUcds
Text Book Super Study Guides
------------------------------------------------------
Some more relevant stuff:
Each layer has
1. Attention and
2. Feed Forward
Between two layers we have a high-dimensional 'hidden state vector' in activation space.
An LLM encodes concepts as distributed patterns across layers = Superposition.
Anthropic has a series of papers on superposition and monosemanticity
https://www.youtube.com/watch?v=F2jd5WuT-zg
https://www.neuronpedia.org
https://huggingface.co/collections/dlouapre/sparse-auto-encoders-saes-for-mechanistic-interpretability
https://huggingface.co/spaces/dlouapre/eiffel-tower-llama
------------------------------------------------------------