Thursday, September 25, 2025

Text embedding made simple


UPDATE 2023-06-06: Use the new syntax to configure the BERT embedder.


“searching data using vector embeddings, unreal engine, high quality render, 4k, glossy, vivid colors, intricate detail” by Stable Diffusion


Embeddings are the basis for modern semantic search and neural ranking,
so the first step in developing such features is to convert your document
and query text to embeddings.

Once you have the embeddings, Vespa.ai makes it easy to use them efficiently
to find neighbors or evaluate machine-learned models.
Until now, though, you've had to create them yourself, either on the client side
or by writing your own Java component.
Now we're providing this building block as part of the platform as well.

On Vespa 8.54.61 or higher, simply add this to your services.xml file under <container>:

<component id="bert" type="bert-embedder">
    <transformer-model path="model/bert-embedder.onnx"/>
    <tokenizer-vocab path="model/vocab.txt"/>
</component>

The model files here can be any BERT-style model and its vocabulary.
We recommend this one:
huggingface.co/sentence-transformers/msmarco-MiniLM-L-6-v3.

With this deployed, you can automatically convert query text to an embedding
by writing embed(bert, "my text") wherever you would otherwise supply an embedding tensor. For example:

input.query(myEmbedding)=embed(bert, "Hello world")
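
As a rough sketch of how a client might send this, the Python snippet below builds a query URL that pairs the embed expression with a nearest neighbor search over the myEmbedding field. The endpoint (a local container on port 8080) and the rank profile name "semantic" are assumptions for illustration only; the rank profile is hypothetical and would need to declare query(myEmbedding) as tensor(x[384]) in your own schema.

```python
from urllib.parse import urlencode

# Assumption: a local Vespa container on the default port.
ENDPOINT = "http://localhost:8080/search/"


def build_query_params(text: str, hits: int = 10) -> dict:
    """Return query parameters that ask Vespa to embed `text` with the 'bert' component."""
    if '"' in text:
        # Quote escaping inside embed(...) is not handled in this sketch.
        raise ValueError("text must not contain double quotes")
    return {
        # nearestNeighbor(document field, query tensor name)
        "yql": "select * from sources * where "
               f"{{targetHits: {hits}}}nearestNeighbor(myEmbedding, myEmbedding)",
        # Vespa replaces the embed(...) expression with the embedding tensor.
        "input.query(myEmbedding)": f'embed(bert, "{text}")',
        # Hypothetical rank profile name; define it in your own schema.
        "ranking": "semantic",
    }


def build_query_url(text: str, hits: int = 10) -> str:
    """URL-encode the parameters onto the (assumed) search endpoint."""
    return ENDPOINT + "?" + urlencode(build_query_params(text, hits))


if __name__ == "__main__":
    print(build_query_url("Hello world"))
```

The URL can then be fetched with any HTTP client. Note that the client only ever sends text; the embedding itself is computed inside the container.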

To create an embedding from a document field, add

field myEmbedding type tensor(x[384]) {
    indexing: input myTextField | embed bert
}

to your schema outside the document block.
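
Because the embedding is computed at indexing time, a feed operation only needs to carry the text field. The Python sketch below builds (but does not send) such a request against Vespa's /document/v1 API. The host, the namespace "mynamespace", and the document type "doc" are placeholders assumed for illustration, not names taken from this post.

```python
import json
from urllib.request import Request

# Assumption: a local Vespa container on the default port.
BASE_URL = "http://localhost:8080"


def build_feed_request(doc_id: str, text: str,
                       namespace: str = "mynamespace",  # hypothetical namespace
                       doctype: str = "doc") -> Request:  # hypothetical document type
    """Build a POST request that feeds one document containing only text."""
    url = f"{BASE_URL}/document/v1/{namespace}/{doctype}/docid/{doc_id}"
    # Only myTextField is supplied; myEmbedding is derived from it by the
    # 'input myTextField | embed bert' indexing expression in the schema.
    body = json.dumps({"fields": {"myTextField": text}}).encode("utf-8")
    return Request(url, data=body, method="POST",
                   headers={"Content-Type": "application/json"})


if __name__ == "__main__":
    req = build_feed_request("1", "Hello world")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing the returned object to urllib.request.urlopen would send it to a running instance.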

Semantic search sample application

To get you started, we have created a complete, minimal sample application that uses this feature:
simple-semantic-search.

Further reading

This should make it easy to get started with embeddings. If you want to dig deeper into the topic,
be sure to check out this blog post series on
using pretrained transformer models for search,
and this post on efficiently
combining vector search with filters.
