HashiCorp User Group Bangalore Meetup #1: Powering the Multi-Cloud Era



SQL++ is a query language for JSON data. https://www.couchbase.com/sqlplusplus/

https://techmilap.com/ is a free website for hosting events.

Vault can provide dynamic, temporary secrets to access data, one per identity used by the consumer, so later on we can audit who has accessed the data. In our case, pods use a ServiceAccount (SA), and we get a dynamic secret per ServiceAccount. So we cannot audit which pod accessed the data; we can only audit which ServiceAccount the data was accessed by. The dynamic secret has a short life, so one cannot use it again; the SA itself can be used as many times as we want.
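As a sketch of that flow, here is how a pod might fetch dynamic database credentials with the hvac Python client. The Vault address, the Kubernetes auth role `demo-role`, and the database role `readonly` are illustrative assumptions, not values from the meetup:

```python
# Minimal sketch: a pod authenticates to Vault with its ServiceAccount JWT
# and receives short-lived, dynamic database credentials.
# Assumes the `hvac` client and a Vault server with the Kubernetes auth
# method and database secrets engine already configured; the role names
# ("demo-role", "readonly") are illustrative placeholders.
import hvac

# The ServiceAccount token is mounted into every pod at this standard path.
jwt = open("/var/run/secrets/kubernetes.io/serviceaccount/token").read()

client = hvac.Client(url="https://vault.example.com:8200")

# Vault validates the JWT against the cluster; the audit log records the
# ServiceAccount (not the individual pod) as the accessing identity.
client.auth.kubernetes.login(role="demo-role", jwt=jwt)

# Each call returns fresh, short-lived credentials with their own lease.
creds = client.secrets.database.generate_credentials(name="readonly")
print(creds["data"]["username"], creds["lease_duration"])
```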

Vault secures data in transit with TLS. It also offers "encryption as a service" via its transit secrets engine: applications send data to Vault for encryption and decryption without ever handling the keys.
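A minimal sketch of the transit ("encryption as a service") workflow with hvac, assuming an already-authenticated client and a transit key named `app-key` (an illustrative name, created beforehand with `vault write -f transit/keys/app-key`):

```python
# Minimal sketch of Vault's transit secrets engine ("encryption as a
# service"): the application sends plaintext to Vault and gets ciphertext
# back, without ever handling the encryption key itself.
import base64
import hvac

client = hvac.Client(url="https://vault.example.com:8200", token="...")

# Transit expects the plaintext base64-encoded.
plaintext = base64.b64encode(b"sensitive-value").decode()

enc = client.secrets.transit.encrypt_data(name="app-key", plaintext=plaintext)
ciphertext = enc["data"]["ciphertext"]          # e.g. "vault:v1:..."

dec = client.secrets.transit.decrypt_data(name="app-key", ciphertext=ciphertext)
print(base64.b64decode(dec["data"]["plaintext"]))
```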

In Terraform, the state file is the most confidential artifact, since it can contain secrets in plain text.

Nomad is an alternative to K8s. It can also manage VMs using the QEMU driver. Consul is used for networking and service discovery. Fabio is for ingress and load balancing in Nomad.

Identity Provider


https://github.com/pando85/kaniop Kaniop is a Kubernetes operator for managing Kanidm. 

https://kanidm.com/ Kanidm is a modern, secure identity management system that provides authentication and authorization services with support for POSIX accounts, OAuth2, and more. It is simple and written in Rust.

IDP

(1)

https://github.com/JanssenProject/jans  

https://github.com/JanssenProject/jans/tree/main/jans-keycloak-link

https://imshakil.medium.com/janssen-mod-auth-openidc-module-to-test-openid-connect-single-sign-on-s…  

It is by Gluu.

(2) Vault itself supports OIDC: https://developer.hashicorp.com/vault/docs/secrets/identity/oidc-provider    https://brian-candler.medium.com/using-vault-as-an-openid-connect-identity-provider-ee0aaef2bba2

-------------

Why Choose Keycloak? Understanding the Need for an Identity… | by J3 | Jungletronics | Medium

Ory

Quickstart | Ory

Ory Kratos Helm Chart | k8s

ory/k8s: Kubernetes Helm Charts for the ORY ecosystem (GitHub)

Ory Hydra: OAuth 2.0 and OpenID Connect server | Ory

ory/kratos: Headless cloud-native authentication and identity management written in Go. Scales to a billion+ users. Replace homegrown, Auth0, Okta, Firebase with better UX and DX. Passkeys, Social Sign In, OIDC, Magic Link, Multi-Factor Auth, SMS, SAML, TOTP, and more. Runs everywhere, runs best on Ory Network. (GitHub)

ory/hydra: Internet-scale OpenID Certified™ OpenID Connect and OAuth 2.1 provider that integrates with your user management through headless APIs. Solve OIDC/OAuth2 use cases overnight. Consume as a service on Ory Network or self-host. Trusted by OpenAI and many others for scale and security. Written in Go. (GitHub)

The Top 7 Ory Kratos Alternatives

The Paper That Changed Everything: Attention is All You Need


Here are a few links:

The Paper

https://arxiv.org/pdf/1706.03762.pdf

------------------------

Medium

https://medium.com/@SimplifyingFutureTech/understanding-attention-is-all-you-need-750713a1631b

https://medium.com/codex/attention-is-all-you-need-explained-ebdb02c7f4d4

-------------

PoloClub

https://poloclub.github.io/transformer-explainer/

https://arxiv.org/abs/2408.04619

https://www.youtube.com/watch?v=ECR4oAwocjs

-----------

The last few videos of https://www.youtube.com/watch?v=2dH_qjc9mFg&list=PLKnIA16_RmvYuZauWaPlRTC54KxSNLtNn

https://hasgeek.com/fifthelephant/paper-reading-meet-up-december-2023/

https://www.linkedin.com/pulse/decoding-attention-all-you-need-how-transformers-ai-yuri-sylse/

--------------

An embedding is a representation of text in a multi-dimensional space.

A diffusion model adds noise and then learns to remove it. It is used for multimodal generation.

Multi-head = syntax + semantics + position. It improves expressiveness and captures richer patterns.

Attention is about which embeddings to look at; it does not change the embeddings themselves.

A few other miscellaneous links from the event https://luma.com/d0yhf0ib

1. IronClaw

https://github.com/nearai/ironclaw

https://www.ironclaw.com/

IronClaw is the secure, open-source alternative to OpenClaw that runs in encrypted enclaves on NEAR AI Cloud, i.e. in a TEE (Trusted Execution Environment).


VoIP in Agentic AI era


Once upon a time, the signaling stack was separated from voice as the packet-switched SS7 network, with its own protocol stack. SS7 over the TCP/IP stack is SIGTRAN. The VoIP signaling plane has protocols like H.323 (by ITU), SIP (by IETF), and MEGACO; SIP became the most popular. The VoIP data plane is RTP. Now, in the era of Agentic AI, we have business solutions for different verticals to integrate voice with STT, LLM, TTS, etc. Here are a few resource URLs.
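For orientation only, a hypothetical skeleton of that STT → LLM → TTS loop; none of these helpers correspond to a specific vendor API from the links below, they are placeholders showing where each component sits:

```python
# Hypothetical skeleton of the Agentic-AI voice loop described above:
# RTP audio in -> STT -> LLM -> TTS -> RTP audio out.
# All three helpers are placeholders, not real product APIs.

def speech_to_text(audio_frames: bytes) -> str:
    """Placeholder STT: e.g. a streaming recognizer fed with RTP payloads."""
    raise NotImplementedError

def llm_reply(transcript: str) -> str:
    """Placeholder LLM call: prompt the model with the caller's transcript."""
    raise NotImplementedError

def text_to_speech(text: str) -> bytes:
    """Placeholder TTS: returns synthesized audio to send back over RTP."""
    raise NotImplementedError

def handle_call(incoming_frames: bytes) -> bytes:
    transcript = speech_to_text(incoming_frames)   # data plane -> text
    answer = llm_reply(transcript)                 # reasoning / agent step
    return text_to_speech(answer)                  # text -> data plane
```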

All Relevant technologies  

https://www.voip-info.org/

https://telecom.altanai.com/

Signalwire

https://www.linkedin.com/posts/briankwest_github-signalwire-demosveronica-this-activity-7430982255675678720-jsTH/

https://developer.signalwire.com/sdks/agents-sdk/

https://github.com/signalwire-demos

https://signalwire.com/

https://postpromptviewer.signalwire.io/

FreeSWITCH

https://en.wikipedia.org/wiki/FreeSWITCH

https://signalwire.com/freeswitch

https://github.com/signalwire/freeswitch

https://developer.signalwire.com/freeswitch/FreeSWITCH-Explained/


https://github.com/amigniter/mod_audio_stream

https://github.com/sptmru/freeswitch_mod_audio_stream

https://medium.com/@srivastava.vikash/day-9-real-time-voice-ai-starts-here-streaming-audio-from-freeswitch-a45d69547164

https://www.cyberpunk.tools/jekyll/update/2025/11/18/add-ai-voice-agent-to-freeswitch.html


Asterisk

https://www.asterisk.org/

https://en.wikipedia.org/wiki/Asterisk_(PBX)

https://github.com/asterisk/asterisk

Plivo

https://www.plivo.com/

https://github.com/plivo

JsSIP

https://jssip.net/

https://github.com/versatica/JsSIP

https://en.wikipedia.org/wiki/JsSIP

Security

https://www.frafos.com/

OverSIP

https://oversip.versatica.com/

https://github.com/versatica/OverSIP

https://rubygems.org/gems/oversip/versions/2.0.1?locale=en

https://www.voip-info.org/oversip/

OfficeSIP

https://officesip-server.software.informer.com/

https://telecom.altanai.com/2014/10/13/sip-server-officesip/

https://sourceforge.net/projects/officesip/

https://github.com/vf1/sipserver

FlexiSIP

https://github.com/BelledonneCommunications/flexisip

https://www.linphone.org/en/flexisip-sip-server/

https://www.linhome.org/software-products/flexisip/

https://wiki.linphone.org/xwiki/wiki/public/view/Flexisip/

Tools

https://postpromptviewer.signalwire.io/

https://github.com/briankwest/libnemo_normalize

https://github.com/signalwire-demos/utils

https://github.com/xiph/rnnoise

FreePBX

https://www.hostinger.com/in/tutorials/freepbx-tutorial

https://www.freepbx.org/

https://en.wikipedia.org/wiki/FreePBX

https://github.com/freepbx

Others

https://medium.com/@dwilkie_34546/implementing-ai-powered-voice-at-somleng-a-technical-deep-dive-93edbb920e02

https://stringee.com/en/

https://www.kamailio.org/w/

https://github.com/resiprocate/resiprocate/wiki

https://www.kaplansoft.com/teksip/

AI

https://deepgram.com/


Transformers & Large Language Models - 1 of 9


• Background on NLP and tasks

NLP Tasks

1. Classification

- Sentiment analysis: Amazon reviews, IMDB critiques, Twitter

- Intent detection

- Language detection

- Topic modeling

2. "Multi"-Classification

- Part of speech tagging

- Named entity recognition (NER): Dataset = annotated Reuters news articles (CoNLL-2003, CoNLL++)

- Dependency parsing

- Constituency parsing

3. Generation

- Machine translation: Dataset = WMT'14

- Question answering

- Summarization

- Text generation

History of LLM

1980s RNN

1997 LSTM (Theoretical Foundation) 

2013 Word2Vec

2020s LLM

• Tokenization

1. Arbitrary (n/a)

2. Word: multiple tokens with similar meanings would need the same embedding, so word variations are not handled.

3. Sub-word: focuses on common roots. Increases sequence length. Tokenization is more complex.

4. Character-level: can correct misspelled words & CasINg. Sequence length is much longer. No OOV. (See the toy comparison below.)
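A toy illustration of these granularities in Python; the sub-word split is hand-written for illustration, not a trained BPE vocabulary:

```python
# Toy comparison of the tokenization granularities listed above.
text = "unbelievable unbelievably"

# Word-level: "unbelievable" and "unbelievably" become unrelated tokens.
print(text.split())              # ['unbelievable', 'unbelievably']

# Character-level: no OOV and robust to typos/casing, but very long sequences.
print(list(text))                # ['u', 'n', 'b', 'e', 'l', ...]

# Sub-word (hand-picked merges): variations share the common root "believ".
subword = ["un", "believ", "able", "un", "believ", "ably"]
print(subword)
```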

• Embeddings

Word (Token) Representation by vector

OHE = One Hot Encoding

cosine similarity 
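A small NumPy sketch of why dense embeddings beat one-hot encoding: with OHE every pair of distinct words has cosine similarity 0, while learned vectors give graded similarity. The 3-d vectors are made up for illustration:

```python
# One-hot vectors make every pair of distinct words equally dissimilar;
# learned embeddings place similar words close together, measured with
# cosine similarity.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# OHE: "cat" and "dog" are exactly as far apart as "cat" and "car".
cat_ohe, dog_ohe = np.array([1, 0, 0]), np.array([0, 1, 0])
print(cosine(cat_ohe, dog_ohe))            # 0.0

# Dense embeddings: similarity is graded.
cat = np.array([0.9, 0.1, 0.3])
dog = np.array([0.8, 0.2, 0.25])
car = np.array([-0.1, 0.9, 0.7])
print(cosine(cat, dog))                    # ~0.99, close to 1
print(cosine(cat, car))                    # ~0.19, much lower
```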

• Word2vec, RNN, LSTM

1. Word2Vec

It is an ANN trained on a proxy task.

1. CBOW (Continuous Bag of Words): you predict the target word from its context.

2. Skip-gram: you take the target word and predict the words around it.

Word order does not matter

Embeddings are not context-aware.

Example dimension size: 768.

Special token to indicate "end of sequence" 
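A minimal gensim run showing both proxy tasks (assuming gensim is installed; the toy corpus is only to exercise the API, not to learn meaningful vectors):

```python
# Word2Vec with gensim: sg=0 trains CBOW (predict the target word from
# its context); sg=1 trains skip-gram (predict the context from the target).
from gensim.models import Word2Vec

corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
]

model = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=1)
print(model.wv["cat"].shape)          # (50,) -- one dense vector per word
print(model.wv.most_similar("cat"))   # nearest neighbors by cosine similarity
```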

2. RNN Recurrent Neural Network

Connections form a temporal sequence.

H = Hidden state = A = Activation Vector = Context Vector. 

RNN is used for all 3 NLP tasks

1. Classification

2. "Multi"-Classification

3. Generation

An RNN keeps forgetting the past. This phenomenon is called the "vanishing gradient" problem.

Word order matters in an RNN. (A minimal recurrent step is sketched below.)
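A minimal recurrent step in NumPy, to make the hidden state concrete; dimensions are arbitrary:

```python
# One recurrent step: the same weights are reused at every time step,
# and h carries the context forward through the sequence.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h = 4, 3
W_xh = rng.normal(size=(d_h, d_in))   # input -> hidden
W_hh = rng.normal(size=(d_h, d_h))    # hidden -> hidden (the recurrence)
b = np.zeros(d_h)

def rnn_step(x, h):
    return np.tanh(W_xh @ x + W_hh @ h + b)

h = np.zeros(d_h)                      # initial context
for x in rng.normal(size=(5, d_in)):   # 5 "tokens"
    h = rnn_step(x, h)                 # repeated squashing through tanh is
print(h)                               # one reason old information fades
```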

3. LSTM = Long short-term memory

1. hidden state

2. cell state
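For reference, the standard LSTM update, showing how the cell state c_t (the long-term memory) is gated rather than overwritten, separately from the hidden state h_t:

```latex
% Standard LSTM update for one time step.
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{forget gate} \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{input gate} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{output gate} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{cell state} \\
h_t &= o_t \odot \tanh(c_t) && \text{hidden state}
\end{aligned}
```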

• Attention mechanism

Attention tries to create a direct link between the next word we are predicting and something from the past.

"self-attention" is main principle of "Attention is all you need" 2017 paper

"self-attention" = Instead of sequential, let direct connection with all part of text at once. 

Concept of Query, Key and Value

We compare Q to K, measure how similar they are, and then take the corresponding value.

Softmax converts unnormalized network outputs into probabilities over classes, such that each value is in [0, 1] and they sum to 1: softmax(z)_i = exp(z_i) / Σ_j exp(z_j).

Formula: given a query Q, we want to know which keys K the query should pay "attention" to, with respect to the associated values V.

attention = softmax( Q K^T / sqrt(d_k) ) V, where d_k = dimension of K
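The same formula directly in NumPy, with an optional mask to show the masked (causal) variant used in the decoder:

```python
# Scaled dot-product attention. `mask` implements the masked variant:
# positions a query may not attend to are pushed to -inf before the
# softmax, so they receive zero attention weight.
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V, mask=None):
    d_k = K.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)   # (n_q, n_k)
    if mask is not None:
        scores = np.where(mask, scores, -1e9)        # block masked positions
    return softmax(scores) @ V

# 4 tokens of dimension 8; causal mask = lower-triangular.
rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(4, 8))
causal = np.tril(np.ones((4, 4), dtype=bool))
print(attention(Q, K, V, mask=causal).shape)   # (4, 8)
```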

There are three attention layers

1. Attention layer at encoder to compute embeddings from input

2. Decoder-decoder attention, or the self-attention layer in the decoder. It is masked, because it can only look at the tokens that have already been translated. It determines: which other tokens of the output sentence are useful to predict the next token.

3. Cross-attention layer: the output is expressed as a function of what is seen in the input. The output of the last encoder is fed to the decoder.

We have a direct link to all tokens, so word order does not matter (unlike an RNN). So we have positional encoding, to inform the model of each word's position in the sequence (a sketch follows).
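A sketch of the sinusoidal positional encoding from the paper; the resulting matrix is simply added to the token embeddings:

```python
# Sinusoidal positional encoding: each position gets a unique pattern of
# sines and cosines, restoring the order information attention ignores.
import numpy as np

def positional_encoding(n_positions, d_model):
    pos = np.arange(n_positions)[:, None]                # (n, 1)
    i = np.arange(d_model)[None, :]                       # (1, d)
    angles = pos / np.power(10000, (2 * (i // 2)) / d_model)
    pe = np.zeros((n_positions, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])                 # even dims: sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])                 # odd dims: cosine
    return pe

print(positional_encoding(50, 512).shape)   # (50, 512), added to embeddings
```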


BOS Token: Beginning of Sequence. 

EOS Token: End of Sequence


• Transformer architecture

Self-attention is realized by the transformer = encoder and decoder.

1. The encoder computes meaningful embeddings from the input text. We have N such encoders. The input layer generates a position-aware embedding matrix with width d = model size and length n = length of the input sequence.

The encoder projects the input sequence onto 3 spaces via Wk, Wq, and Wv, so the model learns these projections.

attention = softmax( Q K^T / sqrt(d_k) ) V, where d_k = dimension of K

Projecting with Wq gives a matrix where each row represents the query Q of one token. A final matrix Wo projects the attention output back to the original dimension of the embedding.

In K^T, each column represents the key of one token.

When we multiply the query matrix with K^T, each row represents the projection of one query over each key, and softmax then gives a probability distribution per row.

Then multiply with the matrix V.

This is the self-attention mechanism: compute a representation of each token as a function of the other tokens. It is done by the attention layer.

Multi-Head Attention (MHA) means this computation is done several times in different ways, so the model can learn

- different representations

- different projections

so all tokens of the input text attend to each other.

In the decoder, this is a masked self-attention layer.

A Multi-Head Attention (MHA) layer performs attention computations across multiple heads, then projects the result in the output space.

Variations of MHA:
* Grouped-Query Attention (GQA)
* Multi-Query Attention (MQA)
These reduce computational overhead by sharing keys and values across attention heads.

"Head" is the term given to the projection matrices that we use to obtain Q, K, V. With more heads, the model learns different projections. It is like multiple filters in a convolution layer in computer vision.
h = number of heads
With h heads, the output of attention is h such matrices. Because of gradient descent starting from different initializations, each head converges to a different result, each objective with its own degrees of freedom. We concatenate the outputs of all heads along the columns (see the sketch below).
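Putting the pieces together, a compact NumPy sketch of MHA with random (untrained) projection matrices standing in for the learned Wq, Wk, Wv, Wo:

```python
# Multi-head attention: per-head projections of Q, K, V, scaled
# dot-product attention per head, concatenation along the columns,
# then the W_o projection back to the model dimension.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(X, h, rng):
    n, d = X.shape
    d_head = d // h
    heads = []
    for _ in range(h):
        # Each head gets its own projections (random here, learned in practice).
        W_q, W_k, W_v = (rng.normal(size=(d, d_head)) for _ in range(3))
        Q, K, V = X @ W_q, X @ W_k, X @ W_v
        heads.append(softmax(Q @ K.T / np.sqrt(d_head)) @ V)  # (n, d_head)
    W_o = rng.normal(size=(d, d))                  # project back to model dim
    return np.concatenate(heads, axis=-1) @ W_o    # (n, d)

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 16))                       # 4 tokens, d = 16
print(multi_head_attention(X, h=4, rng=rng).shape) # (4, 16)
```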

2. FFNN (Feed-Forward Neural Network): the model learns another kind of projection,

so we get a rich representation of each input token.

In LLMs, the hidden layer has a higher dimension, so the model has enough degrees of freedom to learn useful representations.
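A shape-level sketch of the position-wise FFNN; in the original paper d_model = 512 and d_ff = 2048, i.e. a 4x expansion:

```python
# Position-wise feed-forward block: expand to a wider hidden layer,
# apply a non-linearity, then project back down to the model dimension.
import numpy as np

def ffn(x, W1, b1, W2, b2):
    return np.maximum(0, x @ W1 + b1) @ W2 + b2   # ReLU in between

rng = np.random.default_rng(0)
d_model, d_ff = 512, 2048
x = rng.normal(size=(4, d_model))                  # 4 token representations
W1, b1 = rng.normal(size=(d_model, d_ff)), np.zeros(d_ff)
W2, b2 = rng.normal(size=(d_ff, d_model)), np.zeros(d_model)
print(ffn(x, W1, b1, W2, b2).shape)                # (4, 512)
```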

3. The encoder output goes to the decoder.

The decoder takes Q from the output side,

and K, V from the encoder.

We have N decoders.

New Terms

  • Perplexity is an evaluation metric for machine translation. It quantifies how "surprised" the model is to see certain words together. Lower is better.
  • OOV = out of vocabulary
  • An RNN keeps forgetting the past; this phenomenon is called the "vanishing gradient" problem.
  • Label smoothing purpose:

    - prevent overfitting

    - introduce noise

    - let the model be a little unsure about its predictions

    It improves accuracy and the BLEU score of translation (a toy example follows).
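A toy example of label smoothing on a one-hot target (the paper used eps = 0.1):

```python
# Label smoothing: move a small mass eps off the true class and spread it
# uniformly over all classes, so the model stays "a little unsure".
import numpy as np

def smooth_labels(one_hot, eps=0.1):
    k = one_hot.shape[-1]                      # number of classes
    return one_hot * (1 - eps) + eps / k

target = np.array([0.0, 0.0, 1.0, 0.0])
print(smooth_labels(target))                   # [0.025 0.025 0.925 0.025]
```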

References

https://cme295.stanford.edu/

Syllabus: https://cme295.stanford.edu/syllabus/

CheatSheet 

https://cme295.stanford.edu/cheatsheet/ 

https://github.com/afshinea/stanford-cme-295-transformers-large-language-models/tree/main/en 

https://www.youtube.com/watch?v=Ub3GoFaUcds

Textbook: Super Study Guides


------------------------------------------------------

Some more relevant stuff: 

Each layer has 

1. Attention and 

2. Feed-Forward


Between two layers we have a high-dimensional "hidden state vector" in activation space.


An LLM encodes concepts as distributed patterns across layers = superposition.

Anthropic has a series of papers on superposition and monosemanticity.
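A toy sketch of the sparse-autoencoder (SAE) idea behind the links below: decompose a dense hidden state into a much wider, sparse feature vector (ideally one interpretable concept per feature). Weights here are random; a real SAE is trained with a reconstruction loss plus an L1 sparsity penalty on the features:

```python
# Toy SAE forward pass: encode a dense hidden-state vector into a wider
# feature space, then reconstruct. Untrained random weights, for shape
# and intuition only.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_features = 64, 512                 # features >> model dim
W_enc = rng.normal(size=(d_model, d_features))
W_dec = rng.normal(size=(d_features, d_model))

h = rng.normal(size=d_model)                  # a hidden-state vector
f = np.maximum(0, h @ W_enc)                  # feature activations (ReLU);
h_hat = f @ W_dec                             # training makes f sparse
print((f > 0).mean())                         # fraction of active features
```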

https://www.youtube.com/watch?v=F2jd5WuT-zg

https://www.neuronpedia.org

https://huggingface.co/collections/dlouapre/sparse-auto-encoders-saes-for-mechanistic-interpretability

https://huggingface.co/spaces/dlouapre/eiffel-tower-llama

------------------------------------------------------------