AI Engineering

MongoDB Atlas Vector Search: RAG Without the Complexity

MongoDB is redefining how AI engineers build RAG applications by eliminating the need for separate vector databases. With MongoDB Atlas Vector Search, you can store embeddings alongside your operational data, scale vector workloads independently, and integrate seamlessly with all major AI frameworks—all in one platform.

"What we've done is we've added in HNSW indexes into MongoDB Atlas which allows you to do approximate nearest neighbor Vector search over data that's stored in your database."

Ben Flast, MongoDB

Watch (00:05:16)
  • 4,096 max vector dimensions
  • 5+ framework integrations
  • 100+ global regions
  • Forever-free M0 tier


Key Takeaways

Unified Platform

  • MongoDB combines transactional database and vector search capabilities
  • Eliminates the need for separate vector databases and complex ETL pipelines
  • Store embeddings alongside operational data in JSON documents

Independent Scaling

  • Scale vector workloads independently from operational data
  • Search nodes optimize resource allocation and cost
  • Vector indexes stored on dedicated infrastructure

Framework Integration

  • Official integrations with LangChain, LlamaIndex, Microsoft Semantic Kernel, AWS Bedrock
  • Plug-and-play RAG development with familiar tools
  • Multiple primitives: vector stores, chat history, semantic caching

Flexible Document Model

  • Store complex, nested JSON structures with embeddings (see the sketch after this list)
  • No schema migrations or data transformation required
  • Naturally horizontally scalable through sharding
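
As a concrete illustration, here is a minimal sketch (assuming PyMongo and a hypothetical shop.customers collection) of nested operational data and an embedding living in the same document:

from pymongo import MongoClient

client = MongoClient("mongodb+srv://...")   # your Atlas connection string
customers = client["shop"]["customers"]     # hypothetical database/collection

# Operational fields, nested structures, and the embedding side by side
customers.insert_one({
    "name": "Ada Lovelace",
    "orders": [
        {"sku": "A-100", "qty": 2, "total": 59.98},  # nested, no migration needed
    ],
    "preferences": {"newsletter": True, "language": "en"},
    "profile_embedding": [0.0231, -0.1452, 0.0876],  # truncated for illustration
})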

Free to Try

  • MongoDB Atlas M0 free tier includes full vector search capabilities
  • Easy to prototype and test RAG applications
  • No credit card required for development

Production-Ready

  • 100+ regions across AWS, Google Cloud, Azure
  • 99.95% uptime SLA with comprehensive security controls
  • SOC 2, HIPAA, GDPR compliance

The Problem with Traditional RAG

Most RAG implementations follow a familiar pattern: user prompt → embedding model → vector database → LLM → response. While this works for basic chatbots and Q&A systems, tomorrow's AI applications need more context.
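
For reference, a minimal sketch of that vanilla pattern in Python, assuming langchain_openai models and a hypothetical rag_demo.docs collection with a prebuilt Atlas vector index named vector_index:

from pymongo import MongoClient
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

client = MongoClient("mongodb+srv://...")   # your Atlas connection string
docs = client["rag_demo"]["docs"]           # hypothetical chunked-document collection

def answer(question: str) -> str:
    # 1. Embed the user prompt
    query_vector = OpenAIEmbeddings().embed_query(question)
    # 2. Retrieve the nearest chunks with $vectorSearch
    hits = docs.aggregate([{
        "$vectorSearch": {
            "index": "vector_index",        # prebuilt Atlas vector index
            "path": "content_embedding",
            "queryVector": query_vector,
            "numCandidates": 100,
            "limit": 5,
        }
    }])
    context = "\n".join(hit["content"] for hit in hits)
    # 3. Augment the prompt and generate
    prompt = f"Answer using this context:\n{context}\n\nQuestion: {question}"
    return ChatOpenAI().invoke(prompt).content

Everything that follows is about enriching the retrieval step with more context than a standalone vector database can provide.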

"If you took kind of vanilla LLM connected to nothing and asked it how much money is in your bank account it wouldn't know... but all of that said if we want to make useful applications with these LLMs then without context there's only so much you can do."

The limitation of LLMs without context

00:02:15
"RAG stands for retrieval augmented Generation. You take a generic AI or model um that you know today we're generally talking about llms but it has a training cut off it you know it's um missing your private data Maybe it hallucinates maybe it doesn't but overall it's not personalized and you take your data right and you augment it at the the time of prompting to give it the context that it needs to answer the questions."

Defining RAG and why it's necessary

00:01:49

Traditional RAG Limitations

Data Silos: Vector data stored separately from operational data

ETL Complexity: Synchronization between databases requires complex pipelines

Limited Context: Chat history, user preferences, and transactional data difficult to incorporate

Scaling Challenges: Vector and transactional workloads compete for resources

Technical Deep-Dive: HNSW Vector Search

MongoDB Atlas Vector Search uses HNSW (Hierarchical Navigable Small World), a state-of-the-art algorithm for approximate nearest neighbor search. HNSW builds a multi-layer graph structure that enables fast, high-recall vector similarity search even at billion-vector scale.
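
An HNSW-backed index can be defined programmatically; this sketch assumes PyMongo 4.7+ (for the vectorSearch index type), a hypothetical finance database, and 1,536-dimension embeddings:

from pymongo import MongoClient
from pymongo.operations import SearchIndexModel

collection = MongoClient("mongodb+srv://...")["finance"]["financial_reports"]

index = SearchIndexModel(
    name="vector_index",
    type="vectorSearch",                  # requires PyMongo 4.7+
    definition={
        "fields": [
            {
                "type": "vector",
                "path": "content_embedding",
                "numDimensions": 1536,    # must match your embedding model
                "similarity": "cosine",   # or "euclidean" / "dotProduct"
            },
            # Fields referenced in $vectorSearch pre-filters must be indexed too
            {"type": "filter", "path": "quarter"},
            {"type": "filter", "path": "sector"},
        ]
    },
)
collection.create_search_index(model=index)

Declaring quarter and sector as filter fields is what enables the pre-filtered query shown below.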

HNSW Algorithm Specifications

Key Specifications

  • Algorithm: HNSW (Hierarchical Navigable Small World)
  • Maximum Dimensions: 4,096 (covers all major embedding models)
  • Similarity Functions: cosine, euclidean, dotProduct
  • Supported Models: OpenAI, Cohere, Google Vertex AI embeddings

Performance Characteristics

  • Latency: 10-100ms for typical queries
  • Throughput: 100-1000+ queries/second
  • Scalability: Tested to billions of vectors
  • Recall: 95-99% with proper HNSW tuning
"You can store vectors that are up to 4,096 Dimensions."

Vector dimension support

00:05:36
"You can use our $vectorSearch aggregation stage to to go ahead and compute an approximate nearest neighbor search."

Vector search query syntax

00:06:26

$vectorSearch Aggregation Pipeline

MongoDB Query
db.financial_reports.aggregate([
  {
    "$vectorSearch": {
      "index": "vector_index",              // Atlas vector index name
      "path": "content_embedding",          // field holding the vectors
      "queryVector": [0.0231, -0.1452, 0.0876, ...],
      "numCandidates": 100,                 // candidates considered by HNSW
      "limit": 10,                          // results returned
      "filter": {                           // pre-filter on indexed filter fields
        "$and": [
          { "quarter": { "$eq": "Q4 2024" } },
          { "sector": { "$eq": "Technology" } }
        ]
      }
    }
  },
  {
    "$project": {
      "symbol": 1,
      "quarter": 1,
      "content": 1,
      "score": { "$meta": "vectorSearchScore" }   // similarity score
    }
  }
])

numCandidates: Controls the accuracy vs. speed tradeoff; it is the number of HNSW candidates considered before the top results are returned (timed in the sketch below)

limit: Number of results to return

filter: Metadata pre-filter applied before the vector search runs (fields must be declared as filter fields in the index)
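
To feel out the numCandidates tradeoff on your own data, a rough timing sketch (hypothetical collection, truncated query vector) could look like:

import time
from pymongo import MongoClient

reports = MongoClient("mongodb+srv://...")["finance"]["financial_reports"]
query_vector = [0.0231, -0.1452, 0.0876]   # truncated; use a real embedding here

# Higher numCandidates explores more of the HNSW graph: better recall, more latency
for num_candidates in (50, 100, 500, 1000):
    start = time.perf_counter()
    list(reports.aggregate([{
        "$vectorSearch": {
            "index": "vector_index",
            "path": "content_embedding",
            "queryVector": query_vector,
            "numCandidates": num_candidates,
            "limit": 10,
        }
    }]))
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"numCandidates={num_candidates}: {elapsed_ms:.1f} ms")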

Search Nodes: Independent Scaling

One of MongoDB's most innovative features is the ability to scale vector search independently from transactional workloads using dedicated search nodes.

"What we've done is we've added in a new type of node into the platform that allows you to store your vector indexes on those nodes and scale them independently from the infrastructure that's storing your transactional data."

Search nodes architecture

00:07:14

Traditional Cluster

[Primary] ←→ [Secondary 1]
     ↓            ↓
[Transactional Data + Vector Indexes]

Vector and transactional workloads compete for resources

With Search Nodes

[Primary] ←→ [Secondary]      [Search Node 1]
     ↓            ↓                  ↓
 [Transactional Data]       [Vector Indexes Only]

Independent scaling based on workload patterns

Search Nodes Benefits

✓ Isolated Compute: Dedicated resources for vector workloads

✓ Independent Scaling: Scale search separately from OLTP

✓ Cost Optimization: Right-size resources for each workload

✓ Improved Performance: High-throughput scenarios optimized

AI Framework Integrations

MongoDB provides official integrations with all major AI frameworks, making it easy to build production-ready RAG applications.

LangChain

Vector store, chat message history, semantic caching, document loaders

MongoDBAtlasVectorSearch • MongoDBChatMessageHistory • MongoDBCache • MongoDBDocumentLoader
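
As one example, persisting chat history takes only a few lines; this sketch assumes the langchain-mongodb package and a hypothetical session key:

from langchain_mongodb import MongoDBChatMessageHistory

# Persist per-session conversation history in the same Atlas cluster
history = MongoDBChatMessageHistory(
    connection_string="mongodb+srv://...",
    session_id="user-42",                  # hypothetical session key
    database_name="chat",
    collection_name="message_store",
)
history.add_user_message("What was Q4 revenue?")
history.add_ai_message("Q4 revenue was ...")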

LlamaIndex

Vector similarity search with metadata filtering, hybrid search

MongoDBVectorStore • Automatic embedding generation • Vector + keyword search

Microsoft Semantic Kernel

Memory storage connectors for AI applications

Vector search capabilities • Memory abstractions

AWS Bedrock

MongoDB Atlas as vector store for Bedrock agents

Native integration • Enterprise-ready

Semantic Caching with LangChain

Semantic caching can reduce LLM API calls by 30-50% on workloads with repeated or similar queries, while maintaining response quality. Cached queries return in 10-50 ms vs. 500-2,000 ms for fresh LLM generation.

from langchain_core.globals import set_llm_cache
from langchain_mongodb import MongoDBAtlasSemanticCache
from langchain_openai import OpenAIEmbeddings   # any embeddings model works

# Cache responses keyed by embedding similarity rather than exact text match
set_llm_cache(MongoDBAtlasSemanticCache(
    connection_string="mongodb+srv://...",
    collection_name="semantic_cache",
    embedding=OpenAIEmbeddings(),
    score_threshold=0.90,   # minimum similarity for a cache hit
))
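
Once the cache is registered, sufficiently similar prompts are answered from MongoDB instead of a fresh LLM call; a usage sketch (assuming langchain_openai):

from langchain_openai import ChatOpenAI

llm = ChatOpenAI()
llm.invoke("Summarize our Q4 earnings call.")   # cache miss: full LLM round trip
llm.invoke("Summarise our Q4 earnings call?")   # near-duplicate: served from cache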

Real-World Use Cases

AI Meeting Assistants

Teammates that listen to meetings, track tasks, and proactively provide contextual information

"A cool startup that's using us right now... they're building an AI teammate and not like a coding teammate but instead one that listens to your meeting tracks what you're doing fetches additional information and prompts you the user with that information that you may need to complete a task like write an email or schedule a project."

Watch (00:11:00)

Enterprise Knowledge Bases

Intelligent search across company documents, wikis, and internal resources with role-based access control

E-commerce Personalization

Combine purchase history, browsing behavior, and product descriptions in unified queries

Customer Support Chatbots

Full customer context in every query with conversation history and order history

Financial Research & Analysis

Query financial reports, earnings calls, and market data with semantic search

MongoDB vs. Traditional Vector Databases

| Feature              | MongoDB Atlas  | Standalone Vector DBs      |
| -------------------- | -------------- | -------------------------- |
| Transactional Data   | Native support | Requires separate database |
| Vector Search        | Built-in       | Primary feature            |
| Data Synchronization | Automatic      | Complex ETL pipelines      |
| Scaling              | Search nodes   | Scale entire cluster       |
| ACID Transactions    | Yes            | Limited or none            |
| Free Tier            | Yes (forever)  | Usually limited trials     |

Getting Started

MongoDB Atlas offers a forever-free tier (M0 cluster) with full vector search capabilities. Perfect for development, testing, and small-scale production applications.

M0 Free Tier

  • Storage: 512 MB
  • RAM/CPU: Shared
  • Vector Search: Full capabilities
  • Cost: $0 forever

Quick Start Steps

  1. Create Atlas account (no credit card)
  2. Create M0 free cluster
  3. Enable Vector Search
  4. Install integration (LangChain/LlamaIndex)
  5. Connect and build (see the sketch below)
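
For step 5, a minimal connection check with PyMongo (placeholder credentials):

from pymongo import MongoClient

client = MongoClient("mongodb+srv://<user>:<password>@<cluster>.mongodb.net/")
client.admin.command("ping")   # verifies connectivity before you build
print("Connected to Atlas")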

Source Video

RAG and the MongoDB Document Model

Ben Flast • MongoDB

Video ID: 2Ey275TX4ZU • Duration: ~13 minutes
Watch on YouTube

Research Note: All quotes in this report are timestamped and link to exact moments in the video for validation. This analysis covers MongoDB's HNSW vector search implementation, search nodes architecture, framework integrations (LangChain, LlamaIndex, Semantic Kernel, AWS Bedrock), and real-world use cases including AI meeting assistants, enterprise knowledge bases, and customer support chatbots.

Key Concepts: RAG, HNSW algorithm, vector search, semantic caching, chat history storage, document model, search nodes, ACID transactions, LangChain, LlamaIndex, independent scaling, unified platform