Retrieves relevant knowledge chunks from the collection and uses an LLM to generate a grounded answer. Supports both synchronous JSON responses and streaming via SSE (stream: true). Token usage is tracked and counted against the team’s quota.
Pass your API key as a Bearer token. Example: Authorization: Bearer sk-xxxxxxxxxxxx
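As a minimal sketch of the authentication scheme described above, the helper below builds the request headers. The function name `build_auth_headers` is hypothetical, and the `Content-Type: application/json` header is an assumption (the reference only states that the key is passed as a Bearer token).

```python
def build_auth_headers(api_key: str) -> dict[str, str]:
    """Return HTTP headers that carry the API key as a Bearer token."""
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    return {
        "Authorization": f"Bearer {api_key}",
        # Assumption: the request body is sent as JSON.
        "Content-Type": "application/json",
    }


# Placeholder key taken from the example above; substitute a real key.
headers = build_auth_headers("sk-xxxxxxxxxxxx")
```

The resulting dictionary can be passed to whichever HTTP client you use.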
Collection ID
The question to answer
"What does the document say about data retention?"
LLM model identifier to use for answering
"gpt-4o"
Answer mode. fast uses fewer retrieved chunks for a quicker response; full retrieves more context for a thorough answer.
fast, full
If true, the response is streamed as Server-Sent Events (SSE). Each event has a type field: delta (text chunk), done (final metadata), or error.
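The streaming behaviour can be consumed roughly as follows. This is a sketch under stated assumptions, not the service's documented wire format: it assumes each SSE event carries a JSON object in its `data:` field, that the `type` field lives inside that object, and that delta events hold their text under a key named `text`. The function names and the sample events are made up for illustration.

```python
import json
from typing import Iterable, Iterator, Optional


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded JSON payload per SSE event.

    In the SSE format, an event's `data:` lines are joined with newlines
    and the event is dispatched when a blank line is reached.
    """
    data_lines: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data_lines:
                yield json.loads("\n".join(data_lines))
                data_lines = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # A single leading space after the colon is not part of the value.
            data_lines.append(value[1:] if value.startswith(" ") else value)
        # Comment lines and other SSE fields are ignored in this sketch.


def collect_answer(lines: Iterable[str]) -> tuple[str, Optional[dict]]:
    """Concatenate delta chunks until a done event; raise on an error event."""
    parts: list[str] = []
    metadata: Optional[dict] = None
    for event in iter_sse_events(lines):
        kind = event.get("type")
        if kind == "delta":
            parts.append(event.get("text", ""))  # "text" is an assumed key
        elif kind == "done":
            metadata = event
            break
        elif kind == "error":
            raise RuntimeError(f"stream reported an error: {event}")
    return "".join(parts), metadata


# Made-up sample stream, used only to exercise the parser.
sample_stream = [
    'data: {"type": "delta", "text": "Hello, "}',
    "",
    'data: {"type": "delta", "text": "world."}',
    "",
    'data: {"type": "done"}',
    "",
]
answer, final_metadata = collect_answer(sample_stream)
```

In a real client, `lines` would be the decoded lines of the HTTP response body, read incrementally so that delta text can be shown as it arrives.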
Number of knowledge chunks to retrieve before generating the answer
1 <= x <= 50
Restrict retrieval to the most current document versions only
Optional prior conversation turns for multi-turn context
[
  {
    "role": "user",
    "content": "Summarise the document."
  },
  {
    "role": "assistant",
    "content": "The document covers..."
  }
]
Optional metadata filters to scope retrieval
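The parameters described above could be assembled into a request body along these lines. Only `stream` is named in this reference; every other field name used here (`question`, `model`, `mode`, `top_k`, `history`, `filters`) is a guess made for illustration, as is the function name, so check them against the actual schema before use. Optional fields are omitted when not supplied, so the sketch makes no claim about server-side defaults.

```python
from typing import Optional

ALLOWED_MODES = {"fast", "full"}


def build_request_body(
    question: str,
    *,
    model: Optional[str] = None,
    mode: Optional[str] = None,
    top_k: Optional[int] = None,
    stream: bool = False,
    history: Optional[list] = None,
    filters: Optional[dict] = None,
) -> dict:
    """Validate the documented constraints and build a JSON-ready dict."""
    if not question:
        raise ValueError("question must be a non-empty string")
    if mode is not None and mode not in ALLOWED_MODES:
        raise ValueError(f"mode must be one of {sorted(ALLOWED_MODES)}")
    if top_k is not None and not 1 <= top_k <= 50:
        raise ValueError("top_k must satisfy 1 <= top_k <= 50")

    body: dict = {"question": question, "stream": stream}
    # Hypothetical field names; only include the values the caller supplied.
    optional = {
        "model": model,
        "mode": mode,
        "top_k": top_k,
        "history": history,
        "filters": filters,
    }
    body.update({key: value for key, value in optional.items() if value is not None})
    return body


body = build_request_body(
    "What does the document say about data retention?",
    model="gpt-4o",
    mode="full",
    top_k=8,
    history=[
        {"role": "user", "content": "Summarise the document."},
        {"role": "assistant", "content": "The document covers..."},
    ],
)
```

Validating the mode and the chunk count on the client side surfaces mistakes before a request is sent and counted against the quota.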
Answer generated successfully. When stream=false, returns a JSON body. When stream=true, returns an SSE stream (Content-Type: text/event-stream).
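Since a successful response takes one of two forms, a client may want to branch on the Content-Type header before reading the body. The helper below is a small sketch of that check; its name is hypothetical, and it assumes the non-streaming response is served as `application/json`.

```python
def response_kind(content_type: str) -> str:
    """Classify a successful response as "sse" or "json" from its Content-Type."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "text/event-stream":
        return "sse"
    if media_type == "application/json":  # assumed type of the JSON body
        return "json"
    raise ValueError(f"unexpected Content-Type: {content_type!r}")


kind = response_kind("text/event-stream; charset=utf-8")
```

A caller would then read the body incrementally for `"sse"` and parse it in one step for `"json"`.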