📋 Task Overview

100 Tasks · 10 Categories · Personal AI Agent Evaluation

Each task simulates multi-session, multi-day interactions, testing memory, judgment, safety, and real-world competence.

Total Tasks: 100
Categories: 10
Sessions/Task: 2-4
Eval Dimensions: 7