The `/api/v1/ai` router powers everything related to the in-app assistant. It exposes exactly two routes today: a public health probe and a unified chat endpoint that can serve both traditional JSON responses and server-sent event (SSE) streams.

| Flow | Endpoint | Description |
|---|---|---|
| Health | GET /api/v1/ai/health | Returns provider, model, and capability metadata for monitoring. |
| Chat | POST /api/v1/ai/chat | Sends a user message, optional attachments, and returns the assistant reply (streaming or buffered). |
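A quick probe of the health route might look like the following sketch; `$API_BASE` is a placeholder for your deployment's base URL, not a value defined by this service:

```shell
# Sketch: unauthenticated health probe (the route is public).
# $API_BASE is a stand-in for your deployment, e.g. https://api.example.com
curl -s "$API_BASE/api/v1/ai/health"
# The JSON body carries the provider, model, and capability metadata
# described in the table above; exact field names depend on the deployment.
```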
## Authentication

- Registered users send `Authorization: Bearer <Auth0 access token>` (the same token issued by the Auth service).
- Guest devices omit the bearer token and instead send:
  - `x-device-id` (required): stable per installation.
  - `x-platform`: `android`, `ios`, or `web`.
  - `x-user-id`, `x-user-email`, `x-user-phone`: optional hints that help map the device to an existing profile faster.

Either path resolves the caller to a `userId` and an `isGuest` flag.
## Rate limits & quotas

- Hard rate limit: 20 chat requests per minute per user/device. Exceeding it returns `429` with the standard error envelope plus code `RATE_LIMIT_EXCEEDED`.
- Soft quota: `usageLimitMiddleware` checks each message against `FREE_MESSAGE_THRESHOLD`. Crossing the free allowance yields `429` + `FREE_LIMIT_EXCEEDED` with `details.requires_signup` set for guests so clients can prompt for signup.
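Because both limits surface as `429`, clients must branch on `error.code` rather than the status alone. A minimal POSIX sketch, with a sample error envelope inlined (the field names come from this doc; the exact JSON layout is assumed):

```shell
# Sketch: disambiguate the two 429 causes via error.code.
# RESP stands in for a captured 429 response body.
RESP='{"success":false,"error":{"code":"FREE_LIMIT_EXCEEDED","details":{"requires_signup":true}}}'
CODE=$(printf '%s' "$RESP" | sed 's/.*"code":"\([^"]*\)".*/\1/')
if [ "$CODE" = "RATE_LIMIT_EXCEEDED" ]; then
  echo "retry later"                 # hard limit: back off and retry
elif [ "$CODE" = "FREE_LIMIT_EXCEEDED" ]; then
  echo "prompt signup"               # soft quota: guests should be asked to sign up
fi
# prints: prompt signup
```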
## Request payload

- `message` is required (1–2000 chars).
- `conversation_id` (optional) must be a valid MongoDB ObjectId; omit it to start a new thread.
- `model` (optional) overrides the default configured model. The backend still enforces provider allowlists.
- `attachments` accept up to 20MB each. Either pass a previously uploaded `fileId` or inline `data` + `mimeType` + `filename`.
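Putting those fields together, a full request body might look like this. The field names are from the list above; every value here (the ObjectId, `fileId`, model name, base64 placeholder) is hypothetical:

```json
{
  "message": "What does this contract say about termination?",
  "conversation_id": "665f1c2ab7e3d90012a4f001",
  "model": "gpt-4o",
  "attachments": [
    { "fileId": "f_abc123" },
    { "data": "<base64>", "mimeType": "application/pdf", "filename": "contract.pdf" }
  ]
}
```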
## Attachments

Inline attachments are decoded, size-checked (20MB), and converted into the AI SDK's multimodal format before the LLM call. Referenced files go through the file-upload service and are re-hydrated via GCS if necessary. Non-image/PDF files are sent as `type: 'file'` parts so models like Claude Sonnet and GPT-4o can read them.
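Building the inline form by hand is just base64 plus the two metadata fields. A minimal sketch (the `printf` creating `report.pdf` is only a stand-in so the commands run end to end):

```shell
# Sketch: assemble an inline attachment object (base64 "data" + mimeType + filename).
printf 'Hello' > report.pdf                 # stand-in for a real PDF
DATA=$(base64 < report.pdf | tr -d '\n')    # one line, as a JSON string expects
printf '{"filename":"report.pdf","mimeType":"application/pdf","data":"%s"}\n' "$DATA"
# prints: {"filename":"report.pdf","mimeType":"application/pdf","data":"SGVsbG8="}
```

Remember the 20MB cap applies to the decoded size; anything larger should go through the file-upload service and be referenced by `fileId` instead.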
## Streaming vs non-streaming

- Set `X-Stream-Response: true` (or `Stream: true`) to receive SSE chunks in `text/event-stream` format. Chunks arrive as JSON objects keyed by `type` (`token`, `tool_call`, `tool_result`, `done`, `error`).
- Omit the header (or set it to `false`) to get the regular JSON envelope with the full assistant message, token count, and tool call metadata.
- You can reuse the same endpoint for both behaviors, which simplifies client routing.
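A captured stream can be reduced to the assistant's text by keeping only the `token` chunks. A POSIX sketch with sample chunks inlined in place of a live stream (the chunk shape is assumed from the `type` list above):

```shell
# Sketch: pull token text out of SSE "data:" lines; sample chunks stand in
# for a real stream.
printf 'data: {"type":"token","content":"Hel"}\ndata: {"type":"token","content":"lo"}\ndata: {"type":"done"}\n' \
  | grep '"type":"token"' \
  | sed 's/.*"content":"\([^"]*\)".*/\1/'
# prints:
# Hel
# lo
```

A real client would concatenate the fragments and stop on the `done` (or `error`) chunk.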
## Example curl (buffered)
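A buffered request as a registered user might look like this; `$API_BASE` and `$ACCESS_TOKEN` are placeholders for your deployment URL and Auth0 token:

```shell
# Sketch: buffered (non-streaming) chat as a registered user.
curl -s -X POST "$API_BASE/api/v1/ai/chat" \
  -H "Authorization: Bearer $ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"message": "Summarize my last conversation"}'
# Returns the full JSON envelope with the complete assistant message.
```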
## Example curl (streaming guest)
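A streaming request as a guest device might look like this; `$API_BASE` and the device id are placeholders, and `-N` disables curl's output buffering so SSE chunks appear as they arrive:

```shell
# Sketch: streaming chat as a guest (no bearer token; device headers instead).
curl -sN -X POST "$API_BASE/api/v1/ai/chat" \
  -H "Content-Type: application/json" \
  -H "X-Stream-Response: true" \
  -H "x-device-id: device-1234" \
  -H "x-platform: web" \
  -d '{"message": "Hello!"}'
# Emits text/event-stream chunks (token, tool_call, tool_result, done, error).
```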
## Error handling

All responses use the shared `{ success, data|error, meta }` envelope. Expect:

- `401` when neither bearer auth nor device headers are present.
- `429` for both rate limiting and free-tier depletion (check `error.code`).
- `500` when upstream providers (LLM, RAG, storage) fail. The controller still attempts to send a friendly fallback message when the LLM output is empty.
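Since every response shares the envelope, a client can branch on `success` first and only then inspect `error.code`. A minimal sketch with a sample success response inlined (the `data` field names are hypothetical; only the envelope shape comes from this doc):

```shell
# Sketch: branch on the shared envelope; RESP stands in for a response body.
RESP='{"success":true,"data":{"message":"Hi there!"},"meta":{}}'
if printf '%s' "$RESP" | grep -q '"success":true'; then
  echo "ok"
else
  # On failure, surface the machine-readable code (e.g. RATE_LIMIT_EXCEEDED).
  printf '%s' "$RESP" | sed 's/.*"code":"\([^"]*\)".*/\1/'
fi
# prints: ok
```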