
AI Tools & Technology·7 min·4 May 2025

What is a vector database and why do you need it for RAG?

If you want to build an AI application that answers questions based on your own documents, you will quickly encounter vector databases. They are the technical backbone of RAG systems, but how they work is not always obvious.

Vector databases are a fundamental part of modern AI applications that work with their own knowledge bases. They enable an AI to quickly and semantically search large amounts of text, without every document having to be included verbatim in the prompt. This article explains how that works and when you need a vector database.

What is a vector database?

A vector database stores data as vectors: lists of numbers that mathematically represent the meaning of a piece of text. These numerical representations are called embeddings. A sentence like "How do I request leave?" is converted into a vector of hundreds or thousands of numbers that capture the semantic meaning.

The key insight is that semantically similar texts also have similar vectors. That means you can search by meaning rather than by exact words. "How do I take vacation days?" and "Submit a leave application" are close to each other in the vector space, even though the exact words are different.
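The idea of "close in vector space" can be made concrete with cosine similarity, the most common distance measure for embeddings. The sketch below uses invented four-dimensional toy vectors (real embeddings have hundreds or thousands of dimensions, produced by an embedding model):

```python
from math import sqrt

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: close to 1.0 means
    the vectors point in almost the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings", invented for illustration only.
leave_question = [0.9, 0.1, 0.0, 0.2]   # "How do I take vacation days?"
leave_form     = [0.8, 0.2, 0.1, 0.3]   # "Submit a leave application"
invoice_howto  = [0.1, 0.9, 0.8, 0.0]   # "How do I send an invoice?"

print(cosine_similarity(leave_question, leave_form))     # high: related meaning
print(cosine_similarity(leave_question, invoice_howto))  # low: unrelated topic
```

The two leave-related sentences score much higher against each other than against the invoice sentence, even though they share no exact words; that is precisely the property a vector database exploits.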

Why do you need this for RAG?

In Retrieval-Augmented Generation (RAG), the system searches a knowledge base for relevant passages and passes those passages as context to the language model. The quality of that search determines the quality of the answers.

If you have thousands of documents, you cannot put them all in every prompt: that is too expensive and exceeds the context limit of the model. A vector database solves this: you quickly retrieve the most relevant passages based on the user's question, and only pass those to the model.

How does it work technically?

The process runs in two phases:

Indexing: You split your documents into smaller pieces (chunks). Each piece is converted into an embedding via an embedding model (for example text-embedding-3 from OpenAI or an open-source alternative). That embedding is stored in the vector database, together with the original text.

Retrieval: When a user asks a question, you also convert the question into an embedding. The vector database finds the chunks closest to this query embedding. Those chunks are passed as context to the language model.
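The two phases can be sketched with a tiny in-memory stand-in for a vector database. This is a simplified sketch, not a production store: the caller is assumed to supply embeddings from some embedding model, and similarity search is a linear scan rather than an approximate-nearest-neighbour index:

```python
from math import sqrt

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

class InMemoryVectorStore:
    """Minimal stand-in for a vector database: stores (embedding, text)
    pairs and returns the texts closest to a query embedding."""

    def __init__(self) -> None:
        self.entries: list[tuple[list[float], str]] = []

    def index(self, embedding: list[float], text: str) -> None:
        # Indexing phase: store the chunk's embedding with the original text.
        self.entries.append((embedding, text))

    def retrieve(self, query_embedding: list[float], k: int = 3) -> list[str]:
        # Retrieval phase: rank all chunks by similarity to the query
        # embedding and return the top k as context for the language model.
        ranked = sorted(self.entries,
                        key=lambda entry: cosine(entry[0], query_embedding),
                        reverse=True)
        return [text for _, text in ranked[:k]]

# Usage with toy vectors (real ones would come from an embedding model):
store = InMemoryVectorStore()
store.index([1.0, 0.0], "Leave requests go through the HR portal.")
store.index([0.0, 1.0], "Invoices are sent from the finance system.")
print(store.retrieve([0.9, 0.1], k=1))
```

A real vector database does the same conceptually, but with indexing structures that keep retrieval fast at millions of vectors.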

Which vector databases are available?

Popular options include:

  • Pinecone: cloud-native, fully managed service, easy to integrate
  • Weaviate: open-source with good filtering and hybrid search capabilities
  • Qdrant: open-source, high performance, well-suited for hosting on own infrastructure
  • pgvector: an extension for PostgreSQL that lets you run vector searches in your existing database

For smaller use cases or prototypes, pgvector is a pragmatic choice: you already have a database, you add vector functionality without managing a separate system. For larger or more complex applications, specialised vector databases are better suited.

Chunking: an underestimated detail

How you divide documents into chunks has a major impact on retrieval quality. Chunks that are too small miss context; chunks that are too large contain too much irrelevant information alongside the relevant part.

Good chunking respects the structure of the document: paragraph boundaries, headings and logical units. Overlapping chunks, where the end of one chunk is also the beginning of the next, help avoid losing context at a boundary.
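A minimal overlapping-chunk splitter might look like the sketch below. It splits on character counts for simplicity; a production splitter would, as noted above, also respect paragraph and heading boundaries:

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into chunks of roughly chunk_size characters, where the
    last `overlap` characters of one chunk also start the next, so context
    at a chunk boundary is not lost."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the final chunk already reaches the end of the text
    return chunks
```

With a chunk size of 200 and an overlap of 50, each new chunk begins 150 characters after the previous one, so every boundary region appears in two chunks.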

When do you not need a vector database?

For small knowledge bases of fewer than 50-100 documents, you can sometimes use simple keyword search or even place all documents directly in the context. Vector databases add complexity; that complexity is only justified when scale or quality requirements demand it.
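For such small collections, the "simple keyword search" alternative can be as little as ranking documents by word overlap with the query. The sketch below (invented example documents) needs no embeddings and no extra infrastructure:

```python
def keyword_search(query: str, documents: list[str], k: int = 3) -> list[str]:
    """Rank documents by how many distinct query words they contain.
    No semantic matching, but often good enough for a few dozen documents."""
    query_words = set(query.lower().split())

    def score(doc: str) -> int:
        return len(query_words & set(doc.lower().split()))

    return sorted(documents, key=score, reverse=True)[:k]

docs = [
    "how to request leave",
    "invoice template for clients",
    "leave policy overview",
]
print(keyword_search("request leave", docs, k=2))
```

Unlike vector search, this finds nothing for a query phrased in different words ("vacation days" would miss all three documents), which is exactly the trade-off the article describes.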

Conclusion

Vector databases are a powerful but technical component of RAG systems. They enable AI to quickly and semantically search large knowledge bases. Mach8 builds RAG architectures with the right choice of vector database technology for each specific use case.

Want to build a RAG system for your documentation? Get in touch with Mach8.
