What's the best vector database for building AI products?
Vector databases are the backbone of retrieval-augmented generation (RAG), a key technique enabling modern AI products to deliver accurate, context-aware answers from private data. This is our comprehensive comparison of leading vector databases, including Turbopuffer, Pinecone, Qdrant, pgvector, and many more.
Choosing the right vector database is critical for any AI product that must
ground responses in private data—customer records, team documentation, internal
metrics, and more. The best choice ensures that your AI can quickly find
accurate information using retrieval-augmented generation (RAG), while scaling
seamlessly and staying affordable.
When we set out to launch AI Copilots, our customizable AI chat
product for React, we faced the challenge of selecting a vector database
firsthand. Because our product manages the entire conversation loop, including
message persistence for each user, we needed a vector database that could serve
proprietary knowledge with multi-tenant isolation, real-time streaming,
scalability, and cost-effectiveness.
It’s a crowded market with many competing solutions, so we spent months testing
different approaches. In the end, we chose a hybrid approach where we run both
BM25 (keyword) and vector similarity searches, optionally followed by a
rerank step.
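The merge step can be sketched with reciprocal rank fusion (RRF), one common way to combine a keyword result list with a vector result list. This is an illustrative sketch rather than any provider's API; the constant k = 60 comes from the original RRF paper and should be tuned for your data.

```typescript
// Reciprocal rank fusion: each result contributes 1 / (k + rank) to its
// document's combined score, so items ranked highly by either search rise
// to the top without needing comparable raw scores.
type Ranked = { id: string };

function rrfMerge(bm25: Ranked[], vector: Ranked[], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of [bm25, vector]) {
    list.forEach(({ id }, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  // Highest combined score first; ids that appear in both lists win
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

A rerank model can then be applied to just the top few fused results, which keeps the rerank step cheap.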
In this post we'll outline the criteria we used and the tradeoffs we found, so
you can pick the best vector database for your AI in 2025.
During our research, we discovered that vector databases vary greatly in terms
of features, limitations, and performance. Comparing benchmark speeds alone
isn't enough, and a number of factors helped us make the right decision:
Performance & scalability: Performance is crucial for us, as we need to
provide responsive AI agents for our customers. While we weren't able to
benchmark every solution, we'll discuss available third-party benchmarks.
Features: We focused on indexing strategies and namespace support, which give
us the ability to split data by type and tenant. We also quickly identified
that hybrid search is essential for robust RAG solutions, and because our
agents run close to the user on edge runtimes, an HTTP API or edge-compatible
SDK was a must.
Limitations: Each option varies greatly in terms of limitations,
particularly when it comes to indexes and namespaces.
Enterprise compatibility: As with other enterprise service providers,
compliance and security are key. HIPAA, SOC2, single sign-on, and similar
enterprise features are non-negotiable requirements.
Cost: As a provider, cost of goods directly affects what we pass on to
customers, so pricing is a major factor—especially for systems with large data
ceilings. For consistency we've compared providers with a standard formula*.
Extension vs dedicated database: Building a vector search solution into
your existing database (e.g. Postgres) can be tempting, as it will simplify
looking up data, but may lead to resource contention and scalability issues if
not planned well. Using a separate, dedicated, vector database avoids these
issues, but requires ongoing data synchronization between sources.
* 1536 dimensions, 1 million reads, 1 million writes, and 10 namespaces
(where supported).
Thanks to its performance, low cost, extremely high limits, and enterprise
features without enterprise costs, Turbopuffer became the obvious choice for us
when building AI Copilots. We experienced firsthand the reason
why they're the choice of some of our favorite tools like Cursor, Notion, and
Linear.
Turbopuffer supports both vector and BM25 indexes, making it a great fit for
both search and RAG use cases. It's serverless, and you only pay for what you
use: storage, writes, and queries. You can pre-warm a namespace via API, which
ensures our Copilots respond instantly.
Multi-tenancy is simple and scalable. Each customer and project gets its own
namespace, and there are no hard limits. Since performance can degrade as vector
stores grow, isolating tenants like this actually improves performance.
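As a sketch, per-tenant namespace ids can be derived from customer and project identifiers. The naming scheme below is our own illustration, not a Turbopuffer requirement — any unique string works as a namespace id.

```typescript
// Build a namespace id per tenant and data kind, so each customer's
// vectors are physically isolated and each index stays small.
function namespaceFor(
  orgId: string,
  projectId: string,
  kind: "docs" | "chat",
): string {
  // Keep ids URL-safe, since namespace names end up in API paths
  const safe = (s: string) => s.replace(/[^a-zA-Z0-9_-]/g, "-");
  return [safe(orgId), safe(projectId), kind].join("_");
}
```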
SDKs are available in TypeScript, Python, and Go, and when we ran into an issue
with the TypeScript client, their team fixed it in hours.
Turbopuffer also includes enterprise-grade compliance features like HIPAA BAA,
SOC 2, and CMEK, even on the non-enterprise plan. Enterprise plans add BYOC
(bring your own cloud) and native multi-tenant support.
One of the most compelling reasons to use Turbopuffer is its cost. It came in an
order of magnitude cheaper than some other solutions, even when considering
open-source, self-hosted options. Using the standard pricing test, the cost
comes in
at under $10/month, with a minimum spend of $64/month. Their pricing calculator
is clear and predictable, with no hidden fees.
Pinecone supports vector similarity search with metadata filtering and offers
built-in embeddings at an extra cost. It's available on AWS, GCP, and Azure, and
scales to billions of vectors with solid reliability.
Pinecone's limits include up to 100k namespaces in their standard plan but only
20 indexes, with higher limits available on enterprise plans.
Pinecone pricing can be confusing, as there are many options: pod-based
pricing, serverless pricing, and extra add-ons for rerank, embedding, support,
and "assistant" features. There is a
pricing calculator available to
help you estimate cost. Based on our standard pricing test, the total cost comes
in at $41. They do offer a free tier and paid plans start at $50/month minimum
usage.
Pinecone's built-in inference covers embedding and re-ranking. We found the
available embedding models somewhat limiting and would prefer to use an external
embedding model anyway.
Qdrant supports filtering, clustering, and hybrid scoring, and works well with
high-cardinality metadata. You can self-host via Docker or Kubernetes, or use
their managed service.
The API is well-documented, with SDKs in several languages, including Rust,
which is somewhat rare.
Multi-tenancy is extremely
flexible with a multitude of sharding options.
Their cloud pricing is based on storage and compute use, with a small free tier
available. A pricing calculator is
available, and based on our standard test the price is $102 on AWS us-east
without quantization. With disk caching and quantization (which reduces memory
usage) enabled, this can be reduced to $27.
pgvector is ideal for teams already using Postgres that want to unify structured
data with vector search. You get full SQL support, transactional guarantees, and
the benefits of a mature ecosystem.
It's open source and free to use. Costs come down to whatever infrastructure
you're running Postgres on. For teams already running Postgres in production,
this is a low-friction entry point, although studying the different indexing
options will be prudent. Managing a vector workload on pgvector is not as easy
as with a dedicated option built specifically for vector search. It also
comes preinstalled on many popular vendors such as
Supabase,
AWS,
and Neon.
While pgvector can be a great choice if you're already familiar and comfortable
with Postgres, you should be aware of some risks. Having vector data that lives
next to your main content is very convenient, but vector indexes can use a lot
of memory, which can negatively affect both the performance and the cost of your
database. Depending on your usage scenario, you also must pick between IVFFlat
and HNSW, a tradeoff between query performance and memory usage.
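As a sketch, the DDL for the two index types looks roughly like this. The table and column names are hypothetical; the operator classes follow the pgvector README.

```typescript
// HNSW: better query performance, but slower builds and more memory.
const hnswIndex = `
  CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);
`;

// IVFFlat: cheaper to build and store; recall depends on the number of
// lists and on how many of them are probed at query time.
const ivfflatIndex = `
  CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 100);
`;
```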
To model your data efficiently, you may want to use
partitioning to
reduce the size of your indexes, especially in a multi-tenant situation. If
you're using an ORM such as Prisma, note that as of September 2025 it still
doesn't fully support pgvector and partitioning without workarounds.
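A minimal sketch of that partitioning idea, with hypothetical table and column names — LIST-partitioning by tenant keeps each partition's vector index small:

```typescript
// On PostgreSQL 11+, an index created on the partitioned parent is
// automatically created on every partition.
const tenantPartitioning = `
  CREATE TABLE documents (
    tenant_id text NOT NULL,
    body      text,
    embedding vector(1536)
  ) PARTITION BY LIST (tenant_id);

  CREATE TABLE documents_acme PARTITION OF documents
    FOR VALUES IN ('acme');

  CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);
`;
```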
Vectorize supports 50k namespaces and indexes per account and up to 5M vectors
per index.
The serverless model makes it easy to use and it integrates well with other
Cloudflare products. It's one of the easiest solutions to get up and running
with if you're already on the Workers platform. However, Vectorize does not yet
appear in the compatibility matrix for their Data Localization Suite, which can
make data residency compliance difficult or impossible.
Cloudflare also offers an AutoRAG feature built on top of Vectorize, R2, and
Workflows, which can work well for simple implementations. However, we found it
a little slow, and it's difficult to integrate from a SaaS provider perspective,
where we need to communicate indexing status to a multi-tenant dashboard.
Vectorize has an HTTP API, but the native SDK is only available inside Workers
itself.
Unfortunately, Vectorize does not support full-text indexes and allows only a
limited amount of metadata (attributes). This makes a hybrid approach very
difficult, as you'd need a different database for FTS.
Pricing is usage-based, and for our standard test of 1 million documents with 1
million reads and 1 million writes, the cost is $47. Embeddings can be generated
manually or by using Cloudflare's AI models.
Weaviate provides semantic search, hybrid scoring, gRPC, and GraphQL support. It
supports multi-modal inputs (text, image, video) and offers built-in embedding
options provided by third party integrations. You can self-host or use their
cloud service.
Weaviate has two pricing models: a classic cloud-deployable model where you pay
for "AIUs", and a more transparent serverless model. Serverless pricing is based
on stored vector dimensions and query usage, with a starting plan around
$25/month. Our test of 1536 dimensions with 1 million reads and writes works out
to $153, but if you choose the less performant compressed version, it's only
$25.
Milvus supports distributed deployments on Kubernetes and includes more indexing
strategies than any other competitor we could find, such as IVF, HNSW, and
DiskANN. It's best suited for enterprise-scale use cases where infrastructure is
not a bottleneck. Costs come down to your infrastructure and operations
complexity. Milvus uses collections rather than namespaces.
Zilliz, the company behind Milvus, offers a managed cloud with a pricing
calculator. Pricing for a serverless 1536-dimension vector with 1 million reads
and 1 million writes is $89. There is also a dedicated version, which estimates
a cost of $114. They also offer a free plan with up to 5 GB of storage.
SQLite is the world's most deployed database because it's fast and embeddable.
sqlite-vec is an extension to SQLite and also the successor to sqlite-vss, an
earlier and less performant solution by the same author. sqlite-vec is
particularly appealing in situations where each customer or user has their very
own database instance, leading to nearly unlimited horizontal scalability.
If you'd rather not deal with the infrastructure behind using SQLite in the
cloud, Turso.tech offers their own hosted SQLite solution with built-in vector
support.
Turso's solution requires no extension to install, and the API is simple since
vectors are just a native type in the database. However, it's important to note
that the libSQL implementation is not the same as sqlite-vec, so you won't be
able to migrate between the two.
If you want to accomplish a hybrid search with both full text search and vector
search, you'll need a separate full text search extension such as FTS5, which
also comes preloaded when using Turso.
Rather than using namespaces, you can use a separate database for each client to
achieve true multi-tenancy, or even per-user tenancy. With Turso, reads are
also done from local replicas, which makes latency extremely low.
Turso has extremely low pricing, including a free tier and up to 25 million
queries for only $5/month. Enterprise features such as SOC 2 and HIPAA will bump
you to their enterprise plan, which starts around $400/month.
If you're running SQLite in an embedded manner, performance will be limited to
whatever hardware is available on the client. This may be fine for local chat
memory or limited documentation, but it won't scale to millions of documents.
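At that small, embedded scale, even a brute-force scan is often fine. Conceptually (sqlite-vec does this far more efficiently in C), a nearest-neighbor lookup is just:

```typescript
// Cosine similarity between two vectors of equal length.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Score every stored vector against the query and keep the best k.
function topK(
  query: number[],
  docs: { id: string; vec: number[] }[],
  k = 3,
): { id: string; score: number }[] {
  return docs
    .map((d) => ({ id: d.id, score: cosine(query, d.vec) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```

This is O(n) per query, which is exactly why the approach stops being viable at millions of documents.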
Much like pgvector and sqlite-vec, MongoDB's solution exists within a database
you may already be familiar with. As with those solutions, be prepared for
increased memory usage for vector indexes.
Pricing is difficult to calculate with Atlas, as their calculator does not have
vector-specific pricing; it's usage-based on instance size. MongoDB does have
a free community edition under the
Server Side Public License.
MongoDB has many options for clients, supports
hybrid searching,
and is only limited by scaling strategy and hardware.
Chroma has full-text, metadata, and vector search available. Rather than
namespaces, it uses collections, with databases and tenants above that.
Interestingly, Chroma internally uses SQLite and object storage for much of its
functionality.
Chroma has many SDK clients as well as an HTTP API available for use with any
language. Chroma's documentation is lacking in some areas, especially the
open-source clients, but it's simple enough that we found getting started was no
problem.
Chroma's cloud offering has simple usage-based pricing with a nice pricing
calculator, which works out to $81 for a 1536-dimension vector with 1 million
writes and 1 million queries.
Redis 8.0 introduced a new native vector type that makes it one of the fastest
options in terms of raw speed. Redis also recently switched back to open source
under an AGPL license, after a controversial re-licensing that led to the rise
of forks such as Valkey.
If you're already familiar with Redis, then it's a solid choice, but as with
other solutions, you need to consider the size and shape of documents you wish
to store. Redis achieves its performance by keeping everything in memory, and
while this is super fast, it also means you need the hardware to support it. It
can also use SSD, but it will suffer some performance loss.
Redis offers up to 30 MB for free and 1 GB for $5/month. They also have flexible
options for hosting on AWS, Azure, and GCP.