MongoDB Atlas Setup

Step-by-step guide to setting up MongoDB Atlas for the HRMS AI service

This guide provides detailed instructions for setting up MongoDB Atlas as the vector database for the HRMS AI service.

Overview

| Setting | Value |
|---|---|
| Purpose | Vector storage for AI/RAG |
| Database | hrms_ai |
| Collections | documents, document_chunks |
| Vector Index | 1536 dimensions (OpenAI) |

Why MongoDB Atlas?

  1. Built-in vector search: Native $vectorSearch aggregation stage
  2. Managed service: No infrastructure to maintain
  3. Free tier available: M0 for development/staging
  4. GCP integration: Deploy in same region as Cloud Run

Step 1: Create Atlas Account and Project

  1. Go to MongoDB Atlas
  2. Sign up or log in
  3. Create new project: Bluewoo HRMS

Step 2: Create Cluster

Staging Cluster (Free)

  1. Click Build a Database
  2. Select M0 (Free)
  3. Provider: Google Cloud
  4. Region: Belgium (europe-west1)
  5. Cluster name: hrms-ai-staging
  6. Click Create

Production Cluster

  1. Click Build a Database
  2. Select M10 (or higher)
  3. Provider: Google Cloud
  4. Region: Belgium (europe-west1)
  5. Cluster name: hrms-ai-prod
  6. Configure:
    • Storage: 10GB (auto-scale)
    • Backup: Enable continuous backup
  7. Click Create

Step 3: Create Database User

  1. Go to Database Access (left sidebar)
  2. Click Add New Database User
  3. Authentication: Password
  4. Username: hrms_ai_app
  5. Password: Generate secure password (save it!)
  6. Database User Privileges:
    • Select Built-in Role: readWrite
    • Database: hrms_ai
  7. Click Add User

Step 4: Configure Network Access

For Development (Allow All)

  1. Go to Network Access (left sidebar)
  2. Click Add IP Address
  3. Click Allow Access from Anywhere (0.0.0.0/0)
  4. Click Confirm

⚠️ Production: Restrict to GCP Cloud Run egress IPs or use VPC Peering.

Option A: IP Access List with GCP NAT

  1. Create Cloud NAT for Cloud Run egress
  2. Add NAT IP to Atlas allowlist

Option B: VPC Peering (Most secure)

  1. Go to Network Access → Peering
  2. Follow Atlas VPC Peering setup for GCP

Step 5: Get Connection String

  1. Go to Database → Click Connect on your cluster
  2. Select Drivers
  3. Driver: Node.js / Version: 5.5 or later
  4. Copy connection string:
mongodb+srv://hrms_ai_app:<password>@hrms-ai-staging.xxxxx.mongodb.net/?retryWrites=true&w=majority
  5. Replace <password> with the actual password (URL-encode any special characters)
  6. Add the database name before the query string:
mongodb+srv://hrms_ai_app:<password>@hrms-ai-staging.xxxxx.mongodb.net/hrms_ai?retryWrites=true&w=majority
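
Passwords that contain characters such as @, : or / have to be percent-encoded before they go into the URI. The following is a minimal TypeScript sketch of assembling the string; buildMongoUri and its parameter names are hypothetical helpers (not part of the MongoDB driver), and the host is the same placeholder used above.

```typescript
// Hypothetical helper (not part of the MongoDB driver): assembles an SRV
// connection string and percent-encodes the credentials.
interface MongoUriParts {
  username: string;
  password: string;
  host: string; // cluster host copied from the Atlas Connect dialog
  database: string;
}

function buildMongoUri(parts: MongoUriParts): string {
  const user = encodeURIComponent(parts.username);
  const pass = encodeURIComponent(parts.password);
  return (
    `mongodb+srv://${user}:${pass}@${parts.host}/${parts.database}` +
    `?retryWrites=true&w=majority`
  );
}

// Usage: "@", "/" and ":" in the password become %40, %2F and %3A.
const exampleUri = buildMongoUri({
  username: "hrms_ai_app",
  password: "p@ss/w:rd", // made-up password for illustration
  host: "hrms-ai-staging.xxxxx.mongodb.net",
  database: "hrms_ai",
});
```

Building the string in code keeps the encoding step from being forgotten when a generated password happens to contain reserved characters.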

Step 6: Create Database and Collections

Using mongosh

# Connect to the cluster (run this in your terminal)
mongosh "mongodb+srv://hrms-ai-staging.xxxxx.mongodb.net/" \
  --username hrms_ai_app \
  --password <PASSWORD>

// Inside mongosh: create the database and collections
use hrms_ai

db.createCollection("documents")
db.createCollection("document_chunks")

// Verify
show collections
// Should show: documents, document_chunks

Using Atlas UI

  1. Go to Database → Browse Collections
  2. Click Add My Own Data
  3. Database: hrms_ai
  4. Collection: documents
  5. Click Create
  6. Repeat for document_chunks

Step 7: Create Vector Search Index

This is the critical step for RAG functionality.

  1. Go to Database → Browse Collections → Select hrms_ai.document_chunks
  2. Click Search Indexes tab
  3. Click Create Search Index
  4. Select Atlas Vector Search, then JSON Editor
  5. Index name: vector_index
  6. Paste this configuration:
{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 1536,
      "similarity": "cosine"
    },
    {
      "type": "filter",
      "path": "tenantId"
    },
    {
      "type": "filter",
      "path": "metadata.sourceType"
    }
  ]
}
  7. Click Create Search Index
  8. Wait for index status to become Active (may take a few minutes)

The $vectorSearch stage queries an index of type vectorSearch. The embedding is declared as a vector field, and every field used in the query's filter must be declared as a filter field. sourceType is indexed at metadata.sourceType because that is where the document_chunks schema stores it.

Using mongosh

db.document_chunks.createSearchIndex(
  "vector_index",
  "vectorSearch",
  {
    fields: [
      {
        type: "vector",
        path: "embedding",
        numDimensions: 1536,
        similarity: "cosine"
      },
      {
        type: "filter",
        path: "tenantId"
      },
      {
        type: "filter",
        path: "metadata.sourceType"
      }
    ]
  }
);
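
If the index is created from application or migration code rather than by hand, it can help to build the definition from one place so the dimension count and filter paths stay in sync. The sketch below produces a definition in the format of a vectorSearch-type index, which is what the $vectorSearch stage queries; buildVectorIndexDefinition and INDEX_DIMENSIONS are hypothetical names, and the metadata.sourceType path assumes chunks keep sourceType under metadata as in the DocumentChunk schema.

```typescript
// Field shapes of an Atlas Vector Search ("vectorSearch" type) index definition.
type VectorIndexField =
  | {
      type: "vector";
      path: string;
      numDimensions: number;
      similarity: "cosine" | "euclidean" | "dotProduct";
    }
  | { type: "filter"; path: string };

// Must match the length of the embeddings stored in document_chunks.
const INDEX_DIMENSIONS = 1536;

// Hypothetical helper: one vector field plus one filter field per path.
function buildVectorIndexDefinition(
  filterPaths: string[]
): { fields: VectorIndexField[] } {
  const vectorField: VectorIndexField = {
    type: "vector",
    path: "embedding",
    numDimensions: INDEX_DIMENSIONS,
    similarity: "cosine",
  };
  const filterFields = filterPaths.map(
    (path): VectorIndexField => ({ type: "filter", path })
  );
  return { fields: [vectorField, ...filterFields] };
}

// Assumption: these are the only fields used in $vectorSearch filters.
const vectorIndexDefinition = buildVectorIndexDefinition([
  "tenantId",
  "metadata.sourceType",
]);
```

The resulting object is plain JSON, so it can be pasted into the Atlas JSON Editor or passed as the definition when creating the index programmatically.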

Step 8: Create Standard Indexes

Add indexes for common queries:

// Tenant isolation
db.documents.createIndex({ tenantId: 1 });
db.document_chunks.createIndex({ tenantId: 1 });

// Document lookup
db.documents.createIndex({ tenantId: 1, sourceType: 1, sourceId: 1 }, { unique: true });
db.document_chunks.createIndex({ tenantId: 1, documentId: 1 });

// Verify indexes
db.documents.getIndexes();
db.document_chunks.getIndexes();
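
The unique index on tenantId, sourceType and sourceId means each source record maps to at most one entry in documents, which makes re-ingestion a natural fit for an upsert. The sketch below only builds the arguments for such a call; buildDocumentUpsert is a hypothetical helper, and using an upsert for ingestion is an assumption, not something this guide prescribes.

```typescript
// Fields of a source record; the first three form the unique key.
interface SourceDocument {
  tenantId: string;
  sourceType: "document" | "employee" | "policy";
  sourceId: string;
  title: string;
  content: string;
  metadata: Record<string, unknown>;
}

// Hypothetical helper: returns filter, update and options for an
// updateOne(filter, update, options) call on the documents collection.
function buildDocumentUpsert(doc: SourceDocument, now: Date = new Date()) {
  const { tenantId, sourceType, sourceId, ...rest } = doc;
  return {
    // Matches exactly the fields covered by the unique index.
    filter: { tenantId, sourceType, sourceId },
    update: {
      // Overwritten on every ingestion run.
      $set: { ...rest, updatedAt: now },
      // Written only when the document is first created.
      $setOnInsert: { createdAt: now },
    },
    options: { upsert: true },
  };
}
```

Because the key fields appear only in the filter, MongoDB copies them into the new document on insert and leaves them untouched on update.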

Document Schema

documents Collection

import { ObjectId } from 'mongodb';

interface Document {
  _id: ObjectId;
  tenantId: string;
  sourceType: 'document' | 'employee' | 'policy';
  sourceId: string;
  title: string;
  content: string;
  metadata: Record<string, unknown>;
  createdAt: Date;
  updatedAt: Date;
}

document_chunks Collection

interface DocumentChunk {
  _id: ObjectId;
  tenantId: string;
  documentId: string;
  content: string;
  embedding: number[]; // 1536 dimensions
  chunkIndex: number;
  metadata: {
    sourceType: string;
    sourceId: string;
    title: string;
  };
  createdAt: Date;
}
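
A chunk whose embedding does not have exactly 1536 numeric values will not be usable with a 1536-dimension vector index, so it is worth checking before insert. This is a minimal sketch of such a guard; validateEmbedding and EXPECTED_DIMENSIONS are hypothetical names, not part of the service described here.

```typescript
// Must match the dimension count configured in the vector index.
const EXPECTED_DIMENSIONS = 1536;

// Hypothetical guard: returns a list of problems instead of throwing, so the
// caller can decide whether to skip one chunk or fail the whole batch.
function validateEmbedding(
  embedding: number[],
  expected: number = EXPECTED_DIMENSIONS
): string[] {
  const problems: string[] = [];
  if (embedding.length !== expected) {
    problems.push(`expected ${expected} dimensions, got ${embedding.length}`);
  }
  if (!embedding.every((value) => Number.isFinite(value))) {
    problems.push("embedding contains NaN or infinite values");
  }
  return problems;
}
```

An empty result means the vector has the expected shape; it says nothing about whether the embedding came from the right model.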

Vector Search Query

Example RAG query in the AI service:

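// Assumed relative path to apps/ai/src/services/mongodb.ts (see Service Code)
import { getChunksCollection } from './mongodb';

async function searchSimilarDocuments(
  tenantId: string,
  queryEmbedding: number[],
  limit: number = 5
) {
  const collection = await getChunksCollection();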
  
  const results = await collection.aggregate([
    {
      $vectorSearch: {
        index: 'vector_index',
        path: 'embedding',
        queryVector: queryEmbedding,
        numCandidates: limit * 10,
        limit: limit,
        filter: {
          tenantId: tenantId
        }
      }
    },
    {
      $project: {
        content: 1,
        metadata: 1,
        score: { $meta: 'vectorSearchScore' }
      }
    }
  ]).toArray();
  
  return results;
}
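
Keeping the pipeline construction in a pure function makes the tenant filter and the numCandidates arithmetic easy to unit-test without a database. The sketch below is one way to do that; buildVectorSearchPipeline and its options are hypothetical, the optional sourceType filter assumes metadata.sourceType is indexed as a filter field, and the cap reflects the Atlas limit of 10000 for numCandidates.

```typescript
interface VectorSearchOptions {
  tenantId: string;
  queryVector: number[];
  limit?: number;
  sourceType?: string; // optional; assumes "metadata.sourceType" is a filter field
}

// Atlas rejects numCandidates above this value.
const MAX_NUM_CANDIDATES = 10000;

function buildVectorSearchPipeline(
  opts: VectorSearchOptions
): Record<string, unknown>[] {
  const limit = Math.min(opts.limit ?? 5, MAX_NUM_CANDIDATES);

  // The tenant clause is always present so results never cross tenants.
  const clauses: Record<string, unknown>[] = [
    { tenantId: { $eq: opts.tenantId } },
  ];
  if (opts.sourceType !== undefined) {
    clauses.push({ "metadata.sourceType": { $eq: opts.sourceType } });
  }
  const filter = clauses.length === 1 ? clauses[0] : { $and: clauses };

  return [
    {
      $vectorSearch: {
        index: "vector_index",
        path: "embedding",
        queryVector: opts.queryVector,
        // Oversample by 10x, as in the example above, within the Atlas cap.
        numCandidates: Math.min(limit * 10, MAX_NUM_CANDIDATES),
        limit,
        filter,
      },
    },
    {
      $project: {
        content: 1,
        metadata: 1,
        score: { $meta: "vectorSearchScore" },
      },
    },
  ];
}
```

The returned array can be passed straight to collection.aggregate(...), so the query function shrinks to fetching the collection and calling toArray().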

Connection in AI Service

Environment Variable

MONGODB_URI="mongodb+srv://hrms_ai_app:<password>@hrms-ai-prod.xxxxx.mongodb.net/hrms_ai?retryWrites=true&w=majority"
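
A mistyped or half-edited MONGODB_URI usually surfaces only as a connection or authentication error at runtime. As a rough illustration, a startup check along these lines can catch the most common slips early; checkMongoUri is a hypothetical helper that inspects the string only and does not contact Atlas.

```typescript
// Hypothetical pre-flight check: returns human-readable problems
// (an empty array means the string looks plausible).
function checkMongoUri(uri: string, expectedDb: string = "hrms_ai"): string[] {
  // scheme, optional credentials, host, optional "/database"
  const pattern =
    /^mongodb(?:\+srv)?:\/\/(?:[^@\/]+@)?[^\/?]+(?:\/([^?]*))?/;
  const match = pattern.exec(uri);
  if (!match) {
    return ["not a mongodb:// or mongodb+srv:// connection string"];
  }

  const problems: string[] = [];
  if (/<[^>]*>/.test(uri)) {
    problems.push("placeholder such as <password> was not replaced");
  }
  const dbName = match[1] ?? "";
  if (dbName !== expectedDb) {
    problems.push(`database is "${dbName}", expected "${expectedDb}"`);
  }
  return problems;
}
```

Running this once at startup and logging the problems (never the URI itself) gives a clearer message than a driver timeout.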

Service Code

// apps/ai/src/services/mongodb.ts
import { MongoClient, Db } from 'mongodb';

let client: MongoClient | null = null;
let db: Db | null = null;

export async function connectMongo(): Promise<Db> {
  if (db) return db;
  
  const uri = process.env.MONGODB_URI;
  if (!uri) {
    throw new Error('MONGODB_URI not configured');
  }
  
  client = new MongoClient(uri);
  await client.connect();
  
  db = client.db('hrms_ai');
  
  // Verify connection
  await db.command({ ping: 1 });
  console.log('Connected to MongoDB Atlas');
  
  return db;
}

export async function getDocumentsCollection() {
  const database = await connectMongo();
  return database.collection('documents');
}

export async function getChunksCollection() {
  const database = await connectMongo();
  return database.collection('document_chunks');
}

export async function disconnectMongo() {
  if (client) {
    await client.close();
    client = null;
    db = null;
  }
}

Cluster Tiers and Pricing

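| Tier | RAM | Storage | Use Case | Monthly Cost |
|---|---|---|---|---|
| M0 | Shared | 512 MB | Development | Free |
| M2 | Shared | 2 GB | Small staging | ~$9 |
| M5 | Shared | 5 GB | Large staging | ~$25 |
| M10 | 2 GB | 10 GB | Small production | ~$57 |
| M20 | 4 GB | 20 GB | Production | ~$140 |
| M30 | 8 GB | 40 GB | High traffic | ~$280 |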
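
When choosing a tier, a back-of-the-envelope estimate of how many chunks fit in the storage quota is useful. The sketch below counts only the raw vector bytes, assuming each of the 1536 values is stored as an 8-byte double; real capacity is lower because BSON overhead, chunk text, metadata and indexes also take space. maxChunksForStorage is a hypothetical helper.

```typescript
// Assumption: each embedding value is stored as an 8-byte double.
const BYTES_PER_DOUBLE = 8;

// Rough upper bound on the number of chunks that fit in a storage quota,
// counting raw vector bytes only.
function maxChunksForStorage(
  storageBytes: number,
  dimensions: number = 1536
): number {
  const vectorBytes = dimensions * BYTES_PER_DOUBLE; // 12288 bytes at 1536 dims
  return Math.floor(storageBytes / vectorBytes);
}

// M0 tier: 512 MB of storage.
const m0Bytes = 512 * 1024 * 1024;
const m0MaxChunks = maxChunksForStorage(m0Bytes); // 43690
```

Even this optimistic bound puts M0 at a few tens of thousands of chunks, which is a reasonable signal for when staging data needs a larger tier.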

Monitoring

Atlas Metrics

  1. Go to Database → Metrics
  2. Key metrics:
    • Operations/second
    • Document reads/writes
    • Index size
    • Query targeting (documents scanned per document returned; should stay close to 1)

Vector Search Metrics

  1. Go to Search → Your index
  2. Monitor:
    • Index size
    • Search latency
    • Query volume

Troubleshooting

Connection Timeout

MongoNetworkError: connection timed out

Fix: Check the IP access list under Network Access. Add your current IP address, or 0.0.0.0/0 for development only.
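
When the access list contains CIDR ranges, it is not always obvious whether a given egress IP is covered. The following sketch checks an IPv4 address against a list of entries; the function names are hypothetical and the addresses in the usage lines come from a documentation-only range, not from any real deployment.

```typescript
// Converts a dotted IPv4 address to an unsigned 32-bit integer.
function ipv4ToInt(ip: string): number {
  const parts = ip.split(".");
  const valid =
    parts.length === 4 &&
    parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255);
  if (!valid) {
    throw new Error(`invalid IPv4 address: ${ip}`);
  }
  const [a, b, c, d] = parts.map(Number);
  return ((a << 24) | (b << 16) | (c << 8) | d) >>> 0;
}

// True if ip falls inside cidr; a bare address is treated as /32.
function isInCidr(ip: string, cidr: string): boolean {
  const [base, bitsText = "32"] = cidr.split("/");
  const bits = Number(bitsText);
  if (!Number.isInteger(bits) || bits < 0 || bits > 32) {
    throw new Error(`invalid prefix length in: ${cidr}`);
  }
  // A shift by 32 is a no-op in JavaScript, so /0 is handled explicitly.
  const mask = bits === 0 ? 0 : (0xffffffff << (32 - bits)) >>> 0;
  const ipNet = (ipv4ToInt(ip) & mask) >>> 0;
  const baseNet = (ipv4ToInt(base) & mask) >>> 0;
  return ipNet === baseNet;
}

// True if any access-list entry covers the address.
function isIpAllowed(ip: string, accessList: string[]): boolean {
  return accessList.some((entry) => isInCidr(ip, entry));
}

// Usage with illustrative addresses (203.0.113.0/24 is reserved for docs).
const covered = isIpAllowed("203.0.113.7", ["203.0.113.0/24"]);
const notCovered = isIpAllowed("198.51.100.7", ["203.0.113.0/24"]);
```

This only mirrors the matching logic locally; the authoritative list is still the one shown under Network Access in Atlas.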

Authentication Failed

MongoServerError: Authentication failed

Fix:

  1. Verify username/password
  2. Check user has access to hrms_ai database
  3. URL-encode special characters in password

Vector Search Not Working

MongoServerError: $vectorSearch is not allowed

Fix:

  1. Ensure you're using Atlas (not self-hosted MongoDB)
  2. Verify vector index exists and is Active
  3. Check index name matches query

Index Not Ready

Index is not ready for queries

Fix: Wait for index status to become "Active" (can take several minutes for large collections).


Security Best Practices

  1. Use dedicated user: Don't use Atlas admin account
  2. Restrict network: Use VPC Peering for production
  3. Enable audit logs: For compliance tracking
  4. Rotate credentials: Update password periodically
  5. Use Secret Manager: Store connection string securely
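
Storing the connection string securely also means keeping it out of logs and error messages. As a small illustration, a helper like the one below masks the password before a URI is ever printed; redactMongoUri is a hypothetical name, and the password in the usage line is made up.

```typescript
// Hypothetical helper: replaces the password portion of a MongoDB URI
// with "***" and leaves URIs without credentials unchanged.
function redactMongoUri(uri: string): string {
  return uri.replace(
    /^(mongodb(?:\+srv)?:\/\/[^:@\/]+):[^@]*@/,
    "$1:***@"
  );
}

// Usage: safe to include in a startup log line.
const redacted = redactMongoUri(
  "mongodb+srv://hrms_ai_app:s3cret@hrms-ai-prod.xxxxx.mongodb.net/hrms_ai"
);
```

The username and host remain visible, which is usually enough to tell staging from production in a log without exposing the secret.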