The purpose of vectors
Why take the trouble and cost of incorporating vector embedding retrieval into an application?
The right answer is: Because it can improve retrieval quality – recall and relevance in information
retrieval terminology.
Vectors can help with this because they introduce semantics into an application: Rather than just
matching by words and exact fields, vectors allow fuzzily matching the meaning of a query to the
meaning of your content.
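As a minimal illustration, the basic operation behind this is comparing embedding vectors, typically with cosine similarity. A sketch in Python, using random stand-in vectors where a real application would use the output of an embedding model:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Proximity of two embedding vectors; 1.0 means identical direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# In a real application these vectors come from an embedding model, so that
# e.g. "affordable city bike" and "budget commuter bicycle" end up close in
# vector space despite sharing no words. Random stand-ins here.
rng = np.random.default_rng(0)
query_vec = rng.normal(size=384)
document_vec = rng.normal(size=384)

print(cosine_similarity(query_vec, document_vec))
```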
Simple vector search works badly
While this sounds great, vectors have limitations. The fuzzy matching nature of vector embeddings
makes them unsuitable for cases where precision is important – such as finding a particular product,
an exact address, or a specific warehouse shelf.
For textual data, the academic community has shown that even simple BM25 text search often
outperforms vector similarity, and the industry has converged on combining text and vector search,
both for finding candidates and for scoring them (relevance).
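As a sketch of what such hybrid search can look like, here is one common fusion scheme: min-max normalize the two score sets and blend them with a weight. The document ids and scores are illustrative stand-ins, and alternatives such as reciprocal rank fusion are also widely used:

```python
def hybrid_scores(bm25: dict[str, float], vector: dict[str, float],
                  alpha: float = 0.5) -> dict[str, float]:
    """Blend lexical and vector scores after min-max normalization.
    The candidate set is the union of what the two methods retrieved."""
    def normalize(scores: dict[str, float]) -> dict[str, float]:
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {doc: (s - lo) / span for doc, s in scores.items()}

    b, v = normalize(bm25), normalize(vector)
    return {doc: alpha * b.get(doc, 0.0) + (1 - alpha) * v.get(doc, 0.0)
            for doc in b.keys() | v.keys()}

# Raw scores from each retriever (doc id -> score); illustrative values.
bm25_hits = {"doc1": 12.3, "doc2": 8.1, "doc4": 5.0}
vector_hits = {"doc2": 0.91, "doc3": 0.88, "doc4": 0.52}

ranked = sorted(hybrid_scores(bm25_hits, vector_hits).items(),
                key=lambda item: item[1], reverse=True)
print(ranked)
```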
And in all applications, methods that use many more detailed vectors – such as one vector per token
or per paragraph – along with tensor math to retrieve and score them produce vastly superior results
to simply representing each content item by a single vector.
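A well-known example of such a method is late interaction scoring as used in ColBERT, where each query and document is represented by one vector per token, and the score is the sum of each query token's best match among the document's tokens. A minimal sketch, assuming normalized token vectors:

```python
import numpy as np

def late_interaction_score(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """MaxSim: for each query token vector, take the best-matching document
    token vector, then sum these maxima into one relevance score."""
    sims = query_vecs @ doc_vecs.T          # (query_tokens, doc_tokens)
    return float(sims.max(axis=1).sum())

rng = np.random.default_rng(0)
# One normalized vector per token, instead of one vector per document.
query_vecs = rng.normal(size=(4, 128))
doc_vecs = rng.normal(size=(120, 128))
query_vecs /= np.linalg.norm(query_vecs, axis=1, keepdims=True)
doc_vecs /= np.linalg.norm(doc_vecs, axis=1, keepdims=True)

print(late_interaction_score(query_vecs, doc_vecs))
```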
When you are working with vectors you are doing search
Using vectors effectively requires thinking in terms of information retrieval: A variety of conditions
generate candidates, and those candidates are scored (usually in multiple phases to save on compute) to
produce the most relevant results, which are then returned to the user or LLM. This process is very
different from a lookup in a database, and it is inherent in vector search, since every vector matches
every query, just with a different proximity score.
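A minimal sketch of this phased process: a cheap first phase scores every candidate with a single dot product, and a more expensive function (standing in for a costly model such as a cross-encoder) is applied only to the survivors:

```python
import numpy as np

def two_phase_rank(query_vec: np.ndarray, doc_vecs: np.ndarray,
                   expensive_score, n_candidates: int = 100, n_final: int = 10):
    """Phase 1: a cheap dot product scores every document.
    Phase 2: an expensive scoring function runs only on the survivors."""
    cheap = doc_vecs @ query_vec                         # one score per document
    candidates = np.argsort(cheap)[::-1][:n_candidates]  # top n by cheap score
    rescored = [(int(i), expensive_score(int(i))) for i in candidates]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored[:n_final]

rng = np.random.default_rng(0)
doc_vecs = rng.normal(size=(10_000, 128))
query_vec = rng.normal(size=128)

# Stand-in for a costly second-phase model; here just a precomputed lookup.
quality = rng.uniform(size=10_000)
print(two_phase_rank(query_vec, doc_vecs, lambda i: float(quality[i])))
```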
Since vector proximity alone does not provide good results, additional factors must be leveraged,
such as multiple detailed vectors, lexical matching, structured fields and so on, and these signals
must be combined into a final score using tensor math and small machine-learned models. Scaling this
process cost-effectively to large amounts of data and high query rates leads to a different
architecture than that of a database.
This is why search engines and databases are separate product categories even though text (and now vectors)
can be stored in databases: It is not about the storage; it is about the ranking.
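To make the ranking point concrete, here is a minimal sketch of combining several such signals into a final score with a small model. The signal names and weights are illustrative stand-ins; in practice the model is often a gradient-boosted tree ensemble or a small neural network trained offline on judgment or click data:

```python
import numpy as np

# Signals gathered per candidate at query time; names are illustrative.
# Columns: [bm25_score, vector_similarity, freshness_in_days, exact_field_match]
features = np.array([
    [12.3, 0.81,  2.0, 1.0],
    [15.1, 0.62, 40.0, 0.0],
    [ 7.4, 0.93,  1.0, 1.0],
])

# Weights of a small machine-learned model (here a linear one), trained
# offline; hard-coded stand-ins in this sketch.
weights = np.array([0.04, 1.2, -0.005, 0.3])

final_scores = features @ weights          # one relevance score per candidate
ranking = np.argsort(final_scores)[::-1]   # best candidate first
print(ranking, final_scores[ranking])
```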
Relevance is crucial for RAG applications
It is well understood that relevance is crucial in information retrieval applications for humans,
whether in explicit end-user text search or in implicit searches such as product recommendations.
What is less widely appreciated is that relevance is much more important when retrieving data for an LLM.
This is because, unlike humans, LLMs are not online learners: They do not continually pick up useful
information from going to meetings, reading their mail and so on. After training they learn nothing,
and so they are completely dependent on having all the information they need delivered to them by
information retrieval at the moment they need to perform a task.
No amount of model intelligence can compensate for missing the information needed to perform a task,
and therefore most of the work in creating RAG applications that reliably deliver quality lies in
search relevance: More precise modeling of the data, leveraging and updating more signals and vectors,
applying ever better machine-learned models to larger sets of candidates, and all the other
tasks familiar from information retrieval.
The tradeoffs between using a database and a search engine
For organizations that already have much of their data in a database, it can be attractive to simply
leverage the vector support added to that database rather than introducing new technology for vector
use cases. This allows reusing existing integrations, knowledge, operational practices and vendor
relationships.
However, if the purpose of using vectors is to improve quality or unlock new use cases, it is worth
considering that vector retrieval inherently means moving into the category of search, and that
achieving quality goals with vectors takes a lot more than simple retrieval by a single vector:
Most of the work and features needed lie in relevance, and these cannot be architecturally
separated from the data storage, since that would mean sending too much data over the network;
this is why search engines both store data and perform the relevance functions.
A fruitful resolution of this tradeoff may be to separate usage into two cases: Where vectors are
applied without particular quality requirements, the existing database can store them and serve
simple proximity lookups, while where quality matters, a vector-enabled search engine that supports
the necessary relevance work can be used.