Models & Limits
For tool planning, summarization, and completion, we offer two models: kirha and kirha-flash.
- kirha is our primary model, optimized for high-quality, nuanced, context-aware responses. It is ideal for complex tasks that require reasoning, depth, or extended context. It is currently based on OpenAI GPT-4.1 and is limited to 150 completions per day.
- kirha-flash is a faster, lightweight variant designed for low-latency use cases. It is best suited for high-throughput applications or situations where response time is critical. It is currently based on Gemini 2.5 Flash and is limited to 500 completions per day.
Each model has a daily completion limit, and completions are currently rate-limited to 20 requests per minute. If you need a higher quota, increased rate limits, or support for a new model, feel free to reach out to us with your use case.
| Model ID | Daily quota | Rate limit |
|---|---|---|
| kirha | 150 completions per day | 20 requests per minute |
| kirha-flash | 500 completions per day | 20 requests per minute |
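If you want to avoid hitting these limits from the client side, you can track your own usage before sending requests. The sketch below is a hypothetical helper (the `QuotaTracker` class is not part of any Kirha SDK) that enforces the quotas from the table locally. It assumes that the 20 requests per minute limit applies per model, since the table lists it on each row, and that the daily quota resets 24 hours after the tracker starts counting. The server's actual reset time is not documented here, so treat the daily window as an approximation.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict

# Limits copied from the table above. QuotaTracker is a hypothetical
# client-side helper, not part of any Kirha API or SDK.
DAILY_QUOTA: Dict[str, int] = {"kirha": 150, "kirha-flash": 500}
REQUESTS_PER_MINUTE = 20
WINDOW_SECONDS = 60.0
DAY_SECONDS = 86_400.0


class QuotaTracker:
    """Local sliding-window (per minute) and daily counters per model.

    Assumptions: the per-minute limit is applied per model, and the daily
    counter resets on a rolling 24 h window from the last reset.
    """

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # Timestamps of accepted calls within the last minute, per model.
        self._recent: Dict[str, Deque[float]] = {m: deque() for m in DAILY_QUOTA}
        self._daily_counts: Dict[str, int] = {m: 0 for m in DAILY_QUOTA}
        self._day_start = clock()

    def try_acquire(self, model: str) -> bool:
        """Record a call and return True if it fits both limits, else False."""
        if model not in DAILY_QUOTA:
            raise ValueError(f"unknown model id: {model!r}")
        now = self._clock()

        # Reset all daily counters once 24 h have passed since the last reset.
        if now - self._day_start >= DAY_SECONDS:
            self._daily_counts = {m: 0 for m in DAILY_QUOTA}
            self._day_start = now

        # Drop timestamps that have left the 60 s window for this model.
        window = self._recent[model]
        while window and now - window[0] >= WINDOW_SECONDS:
            window.popleft()

        if len(window) >= REQUESTS_PER_MINUTE:
            return False  # per-minute limit reached; retry later
        if self._daily_counts[model] >= DAILY_QUOTA[model]:
            return False  # daily quota exhausted for this model

        window.append(now)
        self._daily_counts[model] += 1
        return True
```

A caller would check `tracker.try_acquire("kirha")` before each request and back off (or fall back to kirha-flash) when it returns False. Counting locally only approximates the server's view, so you should still handle rate-limit errors returned by the API.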