📋 Task Overview

100 Tasks · 10 Categories · Personal AI Agent Evaluation

Each task simulates multi-session, multi-day interactions — testing memory, judgment, safety, and real-world competence.

Total Tasks: 100