A neural-network-based sparse embedding generator, and an ideal algorithm for keyword matching in RAG.
SPLADE
sparse embedding
retrieval
keyword matching
Author
Shataxi Dubey
Published
April 26, 2026
from fastembed import SparseTextEmbedding
from transformers import AutoTokenizer
There are multiple SPLADE models:
- model_id="naver/splade-v3": gated on Hugging Face, so it requires requesting access
- model_id="naver/splade-cocondenser-selfdistil": widely considered the best-performing SPLADE model
- model_id="prithivida/Splade_PP_en_v1": the model we will use in this post
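As a minimal sketch of how these pieces fit together (the two skill strings below are illustrative placeholders, not the notebook's real data), we can load the model with fastembed and produce one sparse embedding per text:

model = SparseTextEmbedding(model_name="prithivida/Splade_PP_en_v1")
tokenizer = AutoTokenizer.from_pretrained("prithivida/Splade_PP_en_v1")

# Placeholder skill strings; the notebook uses real JD / candidate text here.
jd_req_skills = "python kubernetes orchestration docker"
candidate_skills = "python docker aws"

# embed() yields one SparseEmbedding (indices + values) per input text.
jd_sparse, candidate_sparse = list(model.embed([jd_req_skills, candidate_skills]))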
SPLADE expands the input into all vocabulary terms that are close to the words in the JD skills. Because the word "orchestration" is closely related to music and opera, the tokens "music" and "opera" also appear among the token weights.
# Map index → token for a sparse result
token_weights = [tokenizer.convert_ids_to_tokens(int(idx)) for idx in jd_sparse.indices]
print('======JD required skills=====')
print(token_weights)
print(len(token_weights))
print(len(jd_req_skills.split()))

# Repeat the mapping for the candidate's skills
token_weights = [tokenizer.convert_ids_to_tokens(int(idx)) for idx in candidate_sparse.indices]
print('======Candidate skills=====')
print(token_weights)
print(len(token_weights))
print(len(candidate_skills.split()))
The cosine similarity computed here will never be negative, because every weight in these sparse vectors is non-negative: SPLADE derives token weights via log(1 + ReLU(x)), just as term frequency and inverse document frequency are always positive in BM25. If the cosine similarity is close to 0, the texts share almost no related terms; if it is close to 1, they are strongly related.
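For concreteness, here is one way to compute that cosine similarity directly from the two SparseEmbedding results above; the sparse_cosine helper is a sketch written for this post, not part of fastembed:

import numpy as np

def sparse_cosine(a, b):
    # Dot product over the intersection of active token ids.
    a_map = dict(zip(a.indices.tolist(), a.values.tolist()))
    dot = sum(a_map.get(int(i), 0.0) * float(v) for i, v in zip(b.indices, b.values))
    # Each norm involves only that vector's own non-zero weights.
    return dot / (np.linalg.norm(a.values) * np.linalg.norm(b.values))

print(sparse_cosine(jd_sparse, candidate_sparse))  # in [0, 1], since all weights are non-negative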
UniCOIL sparse embeddings (for comparison with SPLADE and BM25)
UniCOIL (a uni-dimensional variant of COIL, the COntextualized Inverted List model):
- Uses BERT to produce a CONTEXTUAL hidden state per token
- Projects each token's hidden state to a single scalar via Linear(hidden_size → 1) + ReLU
- Only tokens present in the input get a weight: NO vocabulary expansion
Where it sits between BM25 and SPLADE:
- BM25: exact tokens, statistical weights (TF×IDF), no neural network
- UniCOIL: exact tokens, CONTEXTUAL learned weights, neural (BERT backbone)
- SPLADE: expands to related vocabulary terms, contextual weights, neural (BERT backbone)
Example: "Kubernetes orchestration"
- BM25: activates [kubernetes, orchestration] (weights: TF×IDF counts)
- UniCOIL: activates [ku, ##ber, ##netes, or, ##che, ##stration, …] (the same WordPiece subwords SPLADE sees, but the weights are contextual scalars)
- SPLADE: activates [kubernetes, orchestration, container, cluster, k8, …] (expands to semantically related vocabulary)
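To see why UniCOIL activates WordPiece subwords rather than whole words, tokenize the example with a plain BERT tokenizer; the exact splits depend on the vocabulary, so the commented output is indicative only:

from transformers import AutoTokenizer

bert_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
print(bert_tok.tokenize("Kubernetes orchestration"))
# e.g. ['ku', '##ber', '##net', '##es', 'orchestra', '##tion'] -- subwords, not whole words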
NOTE: The linear layer below uses random weights since loading the full pre-trained checkpoint (castorini/unicoil-noexp-msmarco-passage) requires mapping custom state-dict keys. For production use, load that checkpoint. The architecture and token activation pattern are correct regardless.
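A minimal sketch of that architecture, assuming a bert-base-uncased backbone and a randomly initialised scoring head (load the castorini checkpoint's weights to get meaningful scores):

import torch
from transformers import AutoModel, AutoTokenizer

backbone = AutoModel.from_pretrained("bert-base-uncased")
tok_proj = torch.nn.Linear(backbone.config.hidden_size, 1)  # random weights, see the note above
bert_tok = AutoTokenizer.from_pretrained("bert-base-uncased")

inputs = bert_tok("Kubernetes orchestration", return_tensors="pt")
with torch.no_grad():
    hidden = backbone(**inputs).last_hidden_state        # (1, seq_len, hidden_size)
    weights = torch.relu(tok_proj(hidden)).squeeze(-1)   # (1, seq_len), one scalar per token

# Only tokens present in the input get a weight: no vocabulary expansion.
for tok, w in zip(bert_tok.convert_ids_to_tokens(inputs["input_ids"][0]), weights[0]):
    print(f"{tok:>12}  {w.item():.4f}")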