🧪 Evaluate Your Config

Submit your SOUL.md and see where you rank

Upload your agent config files, choose models and scope, then submit for evaluation. Results appear on the leaderboard automatically.

1. Upload Config
SOUL.md (required), plus optional AGENTS.md and TOOLS.md

❓ FAQ

How does the evaluation work?
Your config files (SOUL.md, AGENTS.md, TOOLS.md) are used as the agent's system prompt. We run 5–40 realistic tasks against the agent, and an LLM judge scores its responses across multiple dimensions, including memory, emotional intelligence, and safety.
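As a rough illustration of the flow described above, the sketch below joins the config files into one system prompt and averages per-dimension judge scores across tasks. It is not the evaluator's actual code: the function names, the file ordering, the `# filename` separators, and the use of a plain mean are all assumptions made for this example.

```python
from statistics import mean

# Assumed ordering; only SOUL.md is required by the submission form.
CONFIG_ORDER = ["SOUL.md", "AGENTS.md", "TOOLS.md"]


def build_system_prompt(files):
    """Join the provided config files into a single system prompt.

    `files` maps a filename to its text, e.g. {"SOUL.md": "..."}.
    (Hypothetical helper; the real evaluator may combine files differently.)
    """
    if "SOUL.md" not in files:
        raise ValueError("SOUL.md is required")
    parts = []
    for name in CONFIG_ORDER:
        if name in files:
            # Label each section so the files stay distinguishable.
            parts.append(f"# {name}\n{files[name].strip()}")
    return "\n\n".join(parts)


def aggregate_scores(judgements):
    """Average judge scores per dimension across all tasks.

    `judgements` is a list with one dict per task, mapping a dimension
    name (e.g. "memory", "safety") to a numeric score.
    """
    by_dimension = {}
    for task_scores in judgements:
        for dimension, score in task_scores.items():
            by_dimension.setdefault(dimension, []).append(score)
    return {dim: mean(scores) for dim, scores in by_dimension.items()}
```

For example, `build_system_prompt({"SOUL.md": "Be kind."})` yields a prompt with a single labeled section, and `aggregate_scores` collapses a list of per-task score dicts into one score per dimension.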
Is my config data safe?
Your config is submitted as a public GitHub Issue, so anyone can read it. If it contains sensitive information, redact that information before you submit. We use your config only for evaluation.
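If you want to automate redaction before submitting, a minimal sketch along these lines may help. The patterns are illustrative assumptions (an `sk-`-style token and email addresses), not a complete list of what counts as sensitive, so review the output by hand before uploading.

```python
import re

# Hypothetical patterns for this example; extend them to match
# whatever secrets your own config might contain.
SENSITIVE_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9_-]{16,}"),      # API-key-like tokens
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),   # email addresses
]


def redact(text, placeholder="[REDACTED]"):
    """Replace every match of a sensitive pattern with a placeholder."""
    for pattern in SENSITIVE_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text
```

Run `redact` over the contents of SOUL.md, AGENTS.md, and TOOLS.md, then check the result manually, since pattern matching can miss secrets in unexpected formats.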
How long does evaluation take?
Quick (~5 tasks): ~5 minutes. Standard (~19 tasks): ~30 minutes. Full (~40 tasks): ~2 hours. Times may vary based on queue load.
Which models can I test with?
Currently supported: GPT-4.1, GPT-4o, GPT-4o Mini (OpenAI), Claude Sonnet 4 (Anthropic), Gemini 2.5 Pro (Google), DeepSeek V3. We're adding more models regularly.
Can I submit multiple configs?
Yes! Each submission is independent. You can iterate on your config and see how changes affect scores.