RAG (Retrieval-Augmented Generation)
RAG (Retrieval-Augmented Generation)
RAG (Retrieval-Augmented Generation)
RAG lets you upload files, automatically convert them into searchable chunks, and then query them with plain-language questions. The API returns the most relevant passages, which you can feed directly into a chat completion to get answers grounded in your own content.
The full flow is three steps:
file_idCall POST /v1/files to register the file. You get back a file_id and a short-lived signed_url — use the signed URL to PUT the actual bytes directly to storage.
The signed_url expires in a few minutes. PUT your file bytes immediately after receiving it.
After the PUT completes, the API automatically chunks and embeds your file (because embed: true). You can poll GET /v1/files/{file_id} to watch progress.
Once embedding_status is completed, the file is ready to search.
Send a natural-language query to POST /v1/files/search. The API converts your query into a vector, finds the closest chunks, and returns the raw text with relevance scores.
Combine search results with a chat completion to answer questions from your documents.
Tag files at upload time and filter at search time — useful when you have documents from different departments, clients, or time periods.
If embed was set to false at upload time, or if embedding failed, you can kick it off manually:
The response returns per-file status:
You can pass "wait": true to block until all embeddings finish (useful for small files in scripts).
Response includes files (array of file status objects), total, limit, and offset.