Vector Database Startups to Watch

In an era where data is the new oil, and AI the new electricity, databases have become a critical component of the modern technology stack.

Traditionally, developers and companies relied on relational databases to store and manage structured data, but the proliferation of AI and large language models (LLMs) has introduced a new set of requirements.

This is where vector databases come into play.

Unlike traditional databases that store text or numerical values, vector databases handle multi-dimensional arrays—i.e. vectors—that represent complex data like images, audio, or natural language. These databases are particularly useful for efficiently searching, storing, and managing the enormous datasets used by machine learning models and large language models (LLMs).

What is a Vector Database?

A vector database is a specialized data management system designed to handle vectors—ordered arrays of numerical values—that are often used in machine learning and AI applications. These vectors serve as mathematical representations of complex, often unstructured, data such as text, images, or audio.

Vector databases enable faster and more efficient search by measuring the similarity between these vectors rather than looking for exact string matches.
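To make the idea of similarity search concrete, here is a minimal sketch in plain Python. It assumes cosine similarity as the measure (one common choice among several) and uses made-up three-dimensional vectors; the names `cosine_similarity`, `top_k`, and `index` are illustrative and not taken from any product covered here. It also scans every vector, whereas production systems typically rely on approximate indexes.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, index, k=2):
    """Return the k (key, score) pairs most similar to the query vector."""
    scored = [(key, cosine_similarity(query, vec)) for key, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Hypothetical embeddings; real ones usually have hundreds of dimensions.
index = {
    "kitten": [0.9, 0.1, 0.0],
    "cat": [0.8, 0.2, 0.1],
    "truck": [0.0, 0.1, 0.9],
}

query = [0.85, 0.15, 0.05]  # stand-in embedding for the query "feline"
results = top_k(query, index)
print(results)
```

The query shares no characters with "kitten" or "cat", yet both rank above "truck" because their vectors point in nearly the same direction.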

Use Cases of Vector Databases

  1. Semantic Search: Vector databases can power advanced search functionalities that go beyond keyword matching to understand the meaning behind the query.
  2. Recommendation Systems: Whether it's suggesting a product or a piece of content, these databases enable highly personalized recommendations based on user behavior and preferences.
  3. Natural Language Processing: From chatbots to sentiment analysis tools, vector databases facilitate real-time language understanding and processing.
  4. Computer Vision: They can also store and search through image embeddings, which helps with image recognition and other vision-based tasks.
  5. Fraud Detection: By analyzing patterns and anomalies, vector databases can significantly improve the accuracy of fraud detection systems.

As the need to process and understand unstructured data grows, vector databases will be a key area of AI to watch. As CB Insights highlights:

By 2025, around 80% of generated data will be unstructured

In this guide, let's look at the top vector database startups that are well-positioned to take advantage of the fact that nearly every tech company is adopting AI at an unprecedented pace.


Pinecone

Pinecone is one of the most well-funded and widely used vector database startups. Based in New York, the company was founded in 2019 by Edo Liberty, a former research director at AWS and Yahoo!.

  • Pinecone’s vector database is designed to store all the information and knowledge that AI models and Large Language Models (LLMs) use to function.
  • The company’s vector database provides a novel way to search through data that is well suited to AI models. It is fully managed, developer-friendly, and easily scalable.
  • The company offers a free starter plan and transparent, resource-based pricing that puts you in control. 

Pinecone bills by the hour: the cost is the per-hour price of a pod multiplied by the number of pods the index uses.
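As a rough worked example of that formula, the sketch below multiplies a per-hour pod price by the pod count and the hours billed. The price, the pod count, and the 730-hour month are all assumptions made up for illustration; they are not Pinecone's actual rates.

```python
def index_cost(price_per_pod_hour, pod_count, hours):
    """Cost = per-hour pod price x number of pods x hours billed."""
    return price_per_pod_hour * pod_count * hours

# Hypothetical figures, for illustration only.
HOURS_PER_MONTH = 730  # assumed average month (8,760 hours / 12)
cost = index_cost(price_per_pod_hour=0.10, pod_count=3, hours=HOURS_PER_MONTH)
print(f"Estimated monthly cost: ${cost:.2f}")
```

Under this model, doubling the number of pods doubles the bill, so index sizing is the main cost lever.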

Pinecone has raised $138 million in total funding, including a $100 million Series B round in April 2023 led by Andreessen Horowitz with participation from ICONIQ Growth and previous investors Menlo Ventures and Wing Venture Capital.

Vector Database for Vector Search | Pinecone
Search through billions of items for similar matches to any object, in milliseconds. It’s the next generation of search, an API call away.

Zilliz

Zilliz is a vector database startup that was founded in 2017 and is headquartered in San Francisco, California. The company develops high-performance vector database management systems for AI applications, including the management and processing of feature vectors for AI algorithms used to represent the deep semantics of unstructured data.

  • Zilliz’s flagship product is Milvus, which the company describes as the world’s most popular open-source vector database, with more than a thousand enterprise users.
  • Milvus is aimed at helping AI applications turn unstructured data into intelligent, usable information for applications such as new drug discovery, computer vision, recommendation engines, and chatbots.

Founded by Charles Xie, the company has raised a total of $113 million in venture capital funding to date, with its latest funding round being a $60 million extension to its initial $43 million Series B round led by Prosperity7 Ventures.

Vector Database built for enterprise-grade AI applications - Zilliz
Zilliz vector database management system - fully managed Milvus - supports billion-scale vector search and is trusted by over 1000 enterprise users.

Weaviate

Weaviate is an Amsterdam-based startup that develops a hybrid SaaS platform to build search and recommendation systems. 

  • Its open-source AI vector search engine comes with extensions for specific use cases such as semantic search, plugins for integrating it into applications, and a console for exploring your data.
  • Weaviate can be used stand-alone or with a variety of modules that can do the vectorization for you and extend the core capabilities.

The company has raised a total of $67.7 million in venture capital funding, with $50 million raised in its latest Series B funding round led by Index Ventures with participation from Battery Ventures and existing investors, including NEA, Cortical Ventures, Zetta Venture Partners, and ING Ventures.

Welcome | Weaviate - vector database

Chroma

Chroma is an AI-native open-source embedding database that operates out of San Francisco, California.

  • Chroma offers an open-source embedding database that allows users to store embeddings and search by nearest neighbors rather than by substrings like a traditional database. 
  • Chroma uses Sentence Transformers to create embeddings by default, but users can also use OpenAI embeddings, Cohere (multilingual) embeddings, or their own.
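The two ideas above, nearest-neighbor lookup and a swappable embedding function, can be sketched in a few lines of plain Python. This is not Chroma's API: `ToyCollection`, `letter_count_embedder`, and `length_embedder` are hypothetical names, and the default embedder is a deliberately crude stand-in for a real model such as Sentence Transformers.

```python
import math
from typing import Callable, Dict, List

Embedder = Callable[[str], List[float]]

def letter_count_embedder(text: str) -> List[float]:
    """Toy default embedder: a 26-dimensional letter-frequency vector."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

class ToyCollection:
    """Hypothetical in-memory collection with a pluggable embedder."""

    def __init__(self, embed: Embedder = letter_count_embedder):
        self.embed = embed
        self.vectors: Dict[str, List[float]] = {}

    def add(self, doc_id: str, text: str) -> None:
        # Embed once at write time and store only the vector.
        self.vectors[doc_id] = self.embed(text)

    def query(self, text: str, n_results: int = 1) -> List[str]:
        # Rank stored ids by Euclidean distance to the query embedding.
        q = self.embed(text)
        ranked = sorted(self.vectors, key=lambda i: math.dist(q, self.vectors[i]))
        return ranked[:n_results]

# Default embedder: "silent" is not a substring of "listen", but the two
# strings have identical letter counts, so their vectors coincide.
collection = ToyCollection()
collection.add("a", "listen")
collection.add("b", "vector database")
nearest = collection.query("silent")

# Swapping in a user-supplied embedder changes what "similar" means.
def length_embedder(text: str) -> List[float]:
    return [float(len(text))]

by_length = ToyCollection(embed=length_embedder)
by_length.add("short", "hi")
by_length.add("long", "a much longer sentence")
closest_by_length = by_length.query("ok")
```

Passing the embedder in as a function keeps storage and retrieval independent of the model, which is what makes it possible to move between a default model and a hosted one.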

Chroma offers a free version of its product for non-commercial use. For commercial use, the company offers a paid version with additional features and support. 

As of April 2023, Chroma has raised a total of $20.3 million in venture capital funding across two rounds. The latest funding round was a seed round in April 2023, which raised $18 million from investors including Akshay Kothari and Quiet Capital.

the AI-native open-source embedding database

Redis

Redis is an open-source, in-memory data structure store that is used as a database, cache, and message broker. Redis has a wide range of use cases, including real-time analytics, caching, messaging, and session management.

  • Redis Labs offers a commercial version of Redis called Redis Enterprise.
  • Redis Enterprise provides additional features such as active-active geo-distribution, multi-model support, and enterprise-grade security.
  • It also offers a managed service called Redis Cloud that provides fully managed Redis instances on public clouds such as AWS, GCP, and Azure.

Redis Labs has raised a total of $347 million in venture capital funding to date. The company’s latest funding round was a Series F round in May 2021 that raised $110 million.

Redis
Redis is an open source (BSD licensed), in-memory data structure store, used as a database, cache, and message broker



Supabase

Supabase is an open-source Firebase alternative built on Postgres. Its AI toolkit uses the pgvector extension to provide a vector store and embeddings support, which allows developers to store, index, and query vector embeddings at scale.

  • Supabase’s AI toolkit includes a Python client for managing unstructured embeddings.
  • Supabase’s Postgres-based vector store is intended to help developers build, integrate, and deploy secure, enterprise-grade AI applications quickly.
  • The company’s AI toolkit also includes support for Hugging Face and OpenAI.
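For readers unfamiliar with pgvector, the sketch below shows roughly what storing, indexing, and querying embeddings looks like at the SQL level. It only builds the SQL strings and parameters in Python and does not connect to a database. The table and column names are hypothetical, the `%s` placeholders assume a psycopg-style driver, and the HNSW index assumes a pgvector version recent enough to support it.

```python
from typing import Sequence

# Hypothetical schema, chosen only for this sketch.
SETUP_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS documents (
    id bigserial PRIMARY KEY,
    content text,
    embedding vector(3)  -- dimension must match the embedding model
);
-- Approximate nearest-neighbor index for L2 distance (assumes HNSW support).
CREATE INDEX ON documents USING hnsw (embedding vector_l2_ops);
"""

def to_vector_literal(values: Sequence[float]) -> str:
    """Format a sequence as pgvector's text form, e.g. '[1.0,2.0,3.0]'."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"

# `<->` is pgvector's Euclidean (L2) distance operator;
# `<=>` would order by cosine distance instead.
NEAREST_SQL = """
SELECT id, content
FROM documents
ORDER BY embedding <-> %s::vector
LIMIT %s;
"""

# Parameters a driver would bind to the two placeholders above.
params = (to_vector_literal([0.1, 0.2, 0.3]), 5)
print(NEAREST_SQL.strip(), params)
```

Because the embeddings sit in an ordinary Postgres table, the same query can join against or filter on regular relational columns.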

Supabase was founded by Paul Copplestone and John Trammell in 2020. The company has raised a total of $116M in funding. Its latest funding round was a Series B round in May 2022, where it raised $80M.

Supabase | The Open Source Firebase Alternative
Build production-grade applications with a Postgres database, Authentication, instant APIs, Realtime, Functions, Storage and Vector embeddings. Start for free.

Qdrant

Qdrant is a Berlin-based startup that provides an open-source vector search engine and database for unstructured data, which is an integral part of AI application development, particularly as it relates to using real-time data that hasn’t been categorized or labeled. 

  • Qdrant’s notable products include a vector similarity search engine and vector database, which provides a production-ready service with a convenient API to store, search, and manage points—vectors with an additional payload.
  • Qdrant’s pricing includes a free, open-source, self-hosted option with full features and community support; a managed cloud service starting from $25 per node per month, billed hourly; and an enterprise tier with custom features.
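The notion of a point, a vector with an attached payload, can be illustrated with a small sketch. This is plain Python rather than Qdrant's client API: the `Point` class, the `search` function, and the `must_match` parameter are hypothetical names, and the search is an exhaustive scan rather than an indexed lookup.

```python
import math
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

@dataclass
class Point:
    """A vector plus arbitrary metadata (the payload)."""
    id: int
    vector: List[float]
    payload: Dict[str, Any] = field(default_factory=dict)

def search(points: List[Point], query: List[float],
           must_match: Optional[Dict[str, Any]] = None, limit: int = 3) -> List[Point]:
    """Keep points whose payload matches every condition, then rank by distance."""
    conditions = must_match or {}
    candidates = [
        p for p in points
        if all(p.payload.get(key) == value for key, value in conditions.items())
    ]
    return sorted(candidates, key=lambda p: math.dist(p.vector, query))[:limit]

# Hypothetical data: two-dimensional vectors with a "city" payload field.
points = [
    Point(1, [0.1, 0.9], {"city": "Berlin"}),
    Point(2, [0.2, 0.8], {"city": "London"}),
    Point(3, [0.9, 0.1], {"city": "Berlin"}),
]

hits = search(points, query=[0.2, 0.8], must_match={"city": "Berlin"})
print([p.id for p in hits])
```

Point 2 is the closest vector overall, but the payload condition removes it, so only the Berlin points are ranked.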

Qdrant’s latest venture capital funding round was a $7.5 million seed financing from lead investor Unusual Ventures, with participation from 42cap, IBB Ventures, and angel investors, including Cloudera co-founder Amr Awadallah.

The company was founded in 2021 and has raised a total of $9.77 million in venture capital funding.

Qdrant - Vector Database
Qdrant is an Open-Source Vector Database and Vector Search Engine written in Rust. It provides fast and scalable vector similarity search service with convenient API.

SingleStore

SingleStore is a database management system that offers high-throughput transactions, low-latency analytics, and real-time AI capabilities.

  • The company’s platform provides a built-in vector database functionality that is well-suited for AI-based applications, chatbots, image recognition, and more. 
  • This eliminates the need for running a specialty vector database solely for vector workloads.
  • The company claims that its database platform can process a trillion rows per second, up to 1,000 times faster than some rivals. 

SingleStore has raised a total of $412 million in venture capital funding as of October 2022. The latest funding round was Series F-2 financing of $30 million.

SingleStoreDB | Real-Time Analytics. Real-Time Applications. Real-Time AI. Right Now.
Backed by streaming data ingestion, a unique table type that supports both transactional (OLTP) and analytical (OLAP) workloads and limitless point-in-time recovery, SingleStoreDB empowers the world’s makers to build, deploy and scale modern, real-time intelligent applications.

Vespa

Vespa is an open-source big data serving engine that provides a fully-featured search engine and vector database. It supports vector search (ANN), lexical search, and search in structured data, all in the same query.
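To show what combining the three kinds of search in one query can mean, here is a minimal sketch in plain Python. It is not Vespa's query language or ranking framework: the document fields, the `alpha` weighting, and the function names are assumptions for illustration, and the scan is exhaustive rather than approximate (ANN).

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def lexical_score(query_terms, text):
    """Fraction of query terms that appear as words in the text."""
    words = set(text.lower().split())
    return sum(term in words for term in query_terms) / len(query_terms)

def hybrid_search(docs, query_terms, query_vector, max_price, alpha=0.5):
    """Filter on a structured field, then rank by a blend of both scores."""
    hits = []
    for doc in docs:
        if doc["price"] > max_price:  # structured-data condition
            continue
        score = (alpha * lexical_score(query_terms, doc["text"])
                 + (1 - alpha) * cosine(query_vector, doc["vector"]))
        hits.append((doc["id"], score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

# Hypothetical catalogue with text, an embedding, and a structured field.
docs = [
    {"id": "a", "text": "red running shoes", "vector": [0.9, 0.1], "price": 60},
    {"id": "b", "text": "blue trail sneakers", "vector": [0.8, 0.3], "price": 80},
    {"id": "c", "text": "red leather boots", "vector": [0.2, 0.9], "price": 200},
]

ranked = hybrid_search(docs, ["red", "shoes"], [1.0, 0.0], max_price=100)
print([doc_id for doc_id, _ in ranked])
```

Document "b" contains none of the query words but still surfaces through vector similarity, while "c" matches a keyword but is excluded by the price condition.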

Vespa scales to any amount of data and traffic, and is built on a C++ core providing hardware-near optimizations and efficient utilization of any amount of memory and cores.

Vespa’s notable products include:

  • Search: Vector search (ANN), lexical search, and search in structured data can all be combined in the same query.
  • Recommendation and personalization: Vespa lets you build applications that evaluate recommender models over content items to select the best ones.
  • Conversational AI: Vespa integrates building blocks for large language models in scalable form.
  • Semi-structured navigation: Vespa provides all the features required for structured navigation - grouping data dynamically for navigation and filtering - in combination with search and recommendation.

Vespa is a spinout from Yahoo, which announced in October 2023 that it would turn Vespa into an independent company.

Vespa - the big data serving engine

Marqo

Marqo is an Australian startup that specializes in vector search. The company was founded in 2022 by Jesse Clark and Tom Hamer, two former Amazon engineers.

  • Marqo’s core selling point is that it promises a full array of vector search smarts out of the box, including vector generation, storage, and retrieval. 
  • This means that Marqo’s users can bypass third-party vector-generation tools from the likes of OpenAI or Hugging Face and get everything via a single API.
  • Marqo provides an end-to-end system that brings all of these components together, solving a major pain point for developers.

Marqo’s latest seed funding round was led by Blackbird Ventures and Creator Fund, with participation from January Capital and Cohere co-founders Ivan Zhang and Aidan Gomez. The company raised $5.2 million in seed funding.

Marqo | Multimodal Vector Search
Marqo is an end-to-end, multimodal vector search engine. Users can store and query unstructured data such as text, images, and code through a single easy-to-use API.