Loss Functions in LLMs and Embedding Models
January 10, 2024

Archived from an original LinkedIn post by Brian Greenforest.

Original Post

The choice of loss function is a significant difference between training a generative large language model (LLM) and training a vector search embedding model, such as the one used in the "Retrieve" phase of the RAG (Retrieval-Augmented Generation) architecture.

In the case of generative language models, the objective is usually to maximize the likelihood of the next token in a sequence, given the preceding context. This is typically done using maximum likelihood estimation (MLE) or related objectives. In practice, MLE means minimizing the cross-entropy (negative log-likelihood) of each true next token.
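The MLE objective can be made concrete with a minimal pure-Python sketch. At each position the loss is the negative log-likelihood of the true next token under a softmax over the model's scores, averaged over positions (i.e. cross-entropy). The function name `next_token_nll` and the toy logits are hypothetical; real training code computes the same quantity over batched tensors in a deep-learning framework.

```python
import math

def next_token_nll(logits_per_step, target_ids):
    """Average negative log-likelihood of the true next tokens (hypothetical helper).

    logits_per_step[t][v]: unnormalized score for vocabulary item v at position t.
    target_ids[t]: index of the token that actually came next at position t.
    """
    assert len(logits_per_step) == len(target_ids)
    total = 0.0
    for logits, target in zip(logits_per_step, target_ids):
        # log-softmax via log-sum-exp, subtracting the max for numerical stability
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += -(logits[target] - log_z)
    return total / len(target_ids)

# A model with no preference over a 3-token vocabulary pays log(3) per token.
print(round(next_token_nll([[0.0, 0.0, 0.0]], [0]), 4))  # 1.0986
```

Putting a high score on the correct token drives the loss toward zero; putting it on a wrong token makes the loss grow.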

On the other hand, for vector search embedding models designed for information retrieval, the loss function is tailored to make the model learn representations that place similar items close together in the vector space: a query near the documents relevant to it, and similar documents near each other. This structure is what lets the retrieval phase find relevant documents with a fast nearest-neighbor search.

Commonly used loss functions include:

* Contrastive Loss:
This loss function encourages similar documents to have similar representations and dissimilar documents to have dissimilar ones. It pulls positive pairs (similar documents) closer together in the embedding space while pushing negative pairs (dissimilar documents) apart, typically until they are at least a fixed margin away.

* Triplet Loss:
Triplet loss is another popular choice. It works on triplets of examples: an anchor (a query or prompt), a positive example (a relevant document), and a negative example (an irrelevant document). The model is trained so that the anchor ends up closer to the positive than to the negative, usually by at least a fixed margin; in effect, it shrinks the anchor-positive distance while growing the anchor-negative distance.

* Ranking-based Losses:
Various ranking-based losses capture the notion of relevance explicitly. They penalize the model when relevant documents are ranked below irrelevant ones.
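The three loss families above can be sketched in a few lines of pure Python. Several choices here are illustrative assumptions, not taken from the post: Euclidean distance for the contrastive and triplet losses, cosine similarity for the ranking loss, the margin and temperature defaults, and all function names. For the ranking family, the sketch uses an InfoNCE-style softmax loss, one common choice among many, swapped in for illustration.

```python
import math

# Illustrative sketch; names, distances, and defaults are hypothetical choices.

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def contrastive_loss(a, b, is_similar, margin=1.0):
    """Pairwise contrastive loss: pull similar pairs together,
    push dissimilar pairs apart until they are at least `margin` away."""
    d = euclidean(a, b)
    if is_similar:
        return d ** 2
    return max(0.0, margin - d) ** 2

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero once the negative is at least `margin` farther from the anchor than the positive."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

def softmax_ranking_loss(query, candidates, positive_idx, temperature=0.05):
    """InfoNCE-style ranking loss: cross-entropy of picking the relevant
    candidate among all candidates, scored by temperature-scaled cosine similarity."""
    scores = [cosine(query, c) / temperature for c in candidates]
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return -(scores[positive_idx] - log_z)

# Toy 2-d embeddings, made up for illustration.
query = [1.0, 0.0]
relevant = [0.9, 0.1]
irrelevant = [0.0, 1.0]
print(triplet_loss(query, relevant, irrelevant, margin=0.5))  # 0.0: already separated by the margin
```

All three losses reach (near) zero only when relevant items sit closer to the query than irrelevant ones, which is the geometry the retrieval phase relies on.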

These loss functions guide training toward embeddings that capture semantic similarity or relevance between queries and documents. The ultimate goal is a vector space where similar items are close together, so that the subsequent retrieval step can find relevant documents efficiently.
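Once such embeddings exist, retrieval reduces to nearest-neighbor search. A minimal sketch, assuming cosine similarity and a brute-force scan over a small in-memory dict; the document IDs, the vectors, and the `top_k` helper are made up for illustration, and production vector databases typically use approximate nearest-neighbor indexes instead of a full scan.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query_vec, doc_vecs, k=2):
    """Brute-force nearest-neighbor search: rank documents by cosine similarity."""
    scored = sorted(
        ((cosine(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()),
        reverse=True,
    )
    return [doc_id for _, doc_id in scored[:k]]

# Toy, hand-written 3-d "embeddings"; a trained embedding model would produce these.
docs = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "return-window": [0.8, 0.2, 0.1],
}
print(top_k([1.0, 0.0, 0.0], docs, k=2))  # ['refund-policy', 'return-window']
```

The search itself is trivial; its quality depends entirely on the training loss having placed related documents near each other.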

This is in contrast to generative language models, where the focus is on generating coherent and contextually appropriate sequences of tokens.

#RAG
#embedding
#vectorsearch
#semanticsearch
#ai
#mlops