Ask a question (RAG)
Retrieves relevant knowledge chunks from the collection and uses an LLM to generate a grounded answer. Supports both synchronous JSON responses and streaming via SSE (stream: true). Token usage is tracked and counted against the team’s quota.
Authorizations
Pass your API key as a Bearer token. Example: Authorization: Bearer sk-xxxxxxxxxxxx
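As a minimal sketch of the header described above, the helper below builds the request headers from an API key. The function name `auth_headers` and the added `Content-Type` header are assumptions made for illustration; only the `Authorization: Bearer ...` form comes from this page.

```python
def auth_headers(api_key: str) -> dict:
    """Build request headers carrying the API key as a Bearer token.

    The JSON Content-Type is an assumption, based on the request
    having a JSON body; it is not stated on this page.
    """
    if not api_key or api_key != api_key.strip():
        raise ValueError("api_key must be a non-empty string without surrounding whitespace")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```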
Path Parameters
Collection ID
Body
The question to answer
"What does the document say about data retention?"
LLM model identifier to use for answering
"gpt-4o"
Answer mode. fast uses fewer retrieved chunks for a quicker response; full retrieves more context for a thorough answer.
Allowed values: fast, full
If true, the response is streamed as Server-Sent Events (SSE). Each event has a type field: delta (text chunk), done (final metadata), or error.
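The streamed event types can be consumed along the following lines. This is a sketch under stated assumptions, not the service's documented wire format: it assumes each SSE `data:` payload is a JSON object containing the `type` field, and that `delta` events carry their text in a field named `text` (a hypothetical name). The parser takes any iterable of already-decoded lines, so it is independent of the HTTP client used.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded JSON payload per SSE event (events end at a blank line)."""
    data_lines = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_lines:
                yield json.loads("\n".join(data_lines))
                data_lines = []
            continue
        if line.startswith(":"):
            continue  # SSE comment / keep-alive line
        field, _, value = line.partition(":")
        if field == "data":
            # The SSE format allows one optional space after the colon.
            data_lines.append(value[1:] if value.startswith(" ") else value)
    if data_lines:
        yield json.loads("\n".join(data_lines))


def collect_answer(lines: Iterable[str]) -> Tuple[str, Optional[dict]]:
    """Concatenate delta chunks and return (answer_text, done_event)."""
    parts = []
    done_event = None
    for event in iter_sse_events(lines):
        kind = event.get("type")
        if kind == "delta":
            parts.append(event.get("text", ""))  # "text" is an assumed field name
        elif kind == "done":
            done_event = event
            break
        elif kind == "error":
            raise RuntimeError(f"stream reported an error: {event}")
    return "".join(parts), done_event
```

With an HTTP client that exposes the response body line by line, those lines could be passed straight to `collect_answer`.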
Number of knowledge chunks to retrieve before generating the answer
1 <= x <= 50
Restrict retrieval to the most current document versions only
Optional prior conversation turns for multi-turn context
[
{
"role": "user",
"content": "Summarise the document."
},
{
"role": "assistant",
"content": "The document covers..."
}
]
Optional metadata filters to scope retrieval
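The constraints on the body fields above can be checked client-side before sending a request. The sketch below is illustrative only: the helper names are hypothetical, and because the JSON field names are not shown on this page, the functions validate bare values rather than a named request body. The history check requires only the `role` and `content` keys seen in the example and does not restrict role values.

```python
from typing import Any, Dict, List

ALLOWED_MODES = {"fast", "full"}  # answer modes listed on this page


def check_mode(mode: str) -> str:
    """Return mode unchanged if it is one of the documented answer modes."""
    if mode not in ALLOWED_MODES:
        raise ValueError(f"mode must be one of {sorted(ALLOWED_MODES)}, got {mode!r}")
    return mode


def check_chunk_count(n: int) -> int:
    """Enforce the documented range 1 <= x <= 50 for retrieved chunks."""
    if isinstance(n, bool) or not isinstance(n, int) or not 1 <= n <= 50:
        raise ValueError(f"chunk count must be an integer in [1, 50], got {n!r}")
    return n


def check_history(turns: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Require a non-empty string 'role' and 'content' on every prior turn."""
    for i, turn in enumerate(turns):
        for key in ("role", "content"):
            value = turn.get(key)
            if not isinstance(value, str) or not value:
                raise ValueError(f"turn {i}: missing or empty {key!r}")
    return turns
```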
Response
Answer generated successfully. When stream=false, returns a JSON body. When stream=true, returns an SSE stream (Content-Type: text/event-stream).
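Since the same endpoint returns either a JSON body or an SSE stream, a client may want to branch on the response's Content-Type header. The helper below is a small sketch of that check; the name `is_event_stream` is hypothetical, and it assumes the header may carry parameters such as a charset.

```python
def is_event_stream(content_type: str) -> bool:
    """Return True when a Content-Type header value denotes an SSE stream."""
    # Drop any parameters (e.g. "; charset=utf-8") and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "text/event-stream"
```

A caller would parse the body as JSON when this returns False and read it incrementally as events when it returns True.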