Models & Limits

For tool planning, summarization, and completion, we offer two models: kirha and kirha-flash.

kirha is our primary model, optimized for high-quality, nuanced, and context-aware responses. It's ideal for complex tasks that require reasoning, depth, or extended context. It is currently based on OpenAI GPT-4.1 and is limited to 150 completions per day.
kirha-flash is a faster, lightweight variant designed for low-latency use cases. It's best suited for high-throughput applications or situations where response time is critical. It is currently based on Gemini 2.5 Flash and is limited to 500 completions per day.

Each model has a daily completion limit, and completions are currently rate-limited to 20 requests per minute. If you need a higher quota, increased rate limits, or support for a new model, feel free to reach out to us with your use case.

Model ID	Daily quota	Rate limit
kirha	150 completions per day	20 requests per minute
kirha-flash	500 completions per day	20 requests per minute
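Because both models share the same per-minute rate limit but have different daily quotas, a client may want to throttle itself before sending requests rather than relying on rejected calls. The Python sketch below is a minimal, hypothetical client-side guard: the `QuotaGuard` class and its `try_acquire` and `reset_daily` methods are illustrative names, not part of the Kirha API. It assumes the per-minute limit behaves like a 60-second sliding window tracked per model, and that you know when your daily quota resets; neither detail is specified on this page. The service itself remains the authority on limits.

```python
import time
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict

# Limits copied from the table above; keep them in sync with the documentation.
DAILY_QUOTA: Dict[str, int] = {"kirha": 150, "kirha-flash": 500}
REQUESTS_PER_MINUTE = 20
WINDOW_SECONDS = 60.0


@dataclass
class QuotaGuard:
    """Hypothetical client-side guard; the service enforces the real limits."""

    model: str
    # Injectable clock so the guard can be tested without waiting.
    clock: Callable[[], float] = time.monotonic
    _recent: Deque[float] = field(default_factory=deque, init=False)
    _used_today: int = field(default=0, init=False)

    def try_acquire(self) -> bool:
        """Record one request and return True if it fits both limits."""
        now = self.clock()
        # Assumption: the per-minute limit is a 60-second sliding window.
        while self._recent and now - self._recent[0] >= WINDOW_SECONDS:
            self._recent.popleft()
        if self._used_today >= DAILY_QUOTA[self.model]:
            return False  # daily completion quota exhausted
        if len(self._recent) >= REQUESTS_PER_MINUTE:
            return False  # per-minute rate limit reached
        self._recent.append(now)
        self._used_today += 1
        return True

    def reset_daily(self) -> None:
        # Call when the daily quota rolls over; the reset time is an assumption
        # because this page does not state it.
        self._used_today = 0
```

For example, `QuotaGuard("kirha-flash").try_acquire()` returns `False` once 20 calls have been recorded within the last 60 seconds, or once 500 calls have been counted since the last `reset_daily()`. Rejecting locally keeps the check cheap; when it returns `False`, the caller can wait, fall back to the other model, or queue the request.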