RAG (Retrieval-Augmented Generation)
RAG (Retrieval-Augmented Generation)
RAG lets you upload files, automatically convert them into searchable chunks, and then query them with plain-language questions. The API returns the most relevant passages, which you can feed directly into a chat completion to get answers grounded in your own content.
The full flow is three steps:
- Upload a file and get a
file_id - Wait for embeddings to finish processing
- Search with a natural-language query
Step 1 — Upload a file
Call POST /v1/files to register the file. You get back a file_id and a short-lived signed_url — use the signed URL to PUT the actual bytes directly to storage.
curl
Python
Node.js
Init upload fields
Init upload response
The signed_url expires in a few minutes. PUT your file bytes immediately after receiving it.
Step 2 — Wait for embeddings
After the PUT completes, the API automatically chunks and embeds your file (because embed: true). You can poll GET /v1/files/{file_id} to watch progress.
curl
Python
Node.js
Status fields
Once embedding_status is completed, the file is ready to search.
Step 3 — Search your files
Send a natural-language query to POST /v1/files/search. The API converts your query into a vector, finds the closest chunks, and returns the raw text with relevance scores.
curl
Python
Node.js
Search fields
Search response
End-to-end: RAG chat
Combine search results with a chat completion to answer questions from your documents.
Python
Node.js
Filtering by metadata
Tag files at upload time and filter at search time — useful when you have documents from different departments, clients, or time periods.
Re-triggering embeddings
If embed was set to false at upload time, or if embedding failed, you can kick it off manually:
The response returns per-file status:
You can pass "wait": true to block until all embeddings finish (useful for small files in scripts).
List your files
Response includes files (array of file status objects), total, limit, and offset.