CSVAnalystEnv

An OpenEnv-compatible benchmark for tabular reasoning agents.
  • 13 evaluation tasks
  • 3 difficulty levels
  • 100% OpenEnv compliant
  • Live: FastAPI HTTP

How it works

Agents interact with a fixed CSV dataset of e-commerce orders. Instead of writing raw code, agents must explore the data and find the answer through a constrained action space, with actions such as filter_rows and groupby_aggregate.
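To make the idea of a constrained action space concrete, here is a minimal sketch of how two such actions might be dispatched. Only the names filter_rows and groupby_aggregate come from the text above; the action format (a dict with "tool" and "args"), the argument names, and the sample rows are assumptions for illustration and may not match the environment's real schema.

```python
from collections import defaultdict

# Hypothetical sample rows; the real dataset's columns are not listed here.
ORDERS = [
    {"order_id": 1, "category": "books", "amount": 12.0},
    {"order_id": 2, "category": "toys", "amount": 30.0},
    {"order_id": 3, "category": "books", "amount": 8.0},
]


def filter_rows(rows, column, value):
    """Keep only the rows whose `column` equals `value`."""
    return [row for row in rows if row[column] == value]


def groupby_aggregate(rows, by, target, agg):
    """Group rows by `by` and aggregate `target` with sum, count, or mean."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[by]].append(row[target])
    funcs = {
        "sum": sum,
        "count": len,
        "mean": lambda values: sum(values) / len(values),
    }
    return {key: funcs[agg](values) for key, values in groups.items()}


# The whitelist is what makes the action space "constrained":
# anything not listed here is rejected instead of executed.
TOOLS = {"filter_rows": filter_rows, "groupby_aggregate": groupby_aggregate}


def apply_action(rows, action):
    """Dispatch an assumed {"tool": ..., "args": {...}} action to a tool."""
    tool = TOOLS.get(action.get("tool"))
    if tool is None:
        raise ValueError(f"unknown tool: {action.get('tool')!r}")
    return tool(rows, **action.get("args", {}))
```

For example, apply_action(ORDERS, {"tool": "filter_rows", "args": {"column": "category", "value": "books"}}) returns the two book orders, while an action naming any other tool raises ValueError rather than running arbitrary code.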

The environment enforces strict programmatic grading, limits episode length, and shapes behavior via normalized rewards (+1 for success, penalties for invalid tool use).
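The reward shaping described above could look roughly like the following sketch. The +1 success reward is taken from the text; the penalty size, the step limit, and the function name score_step are assumed values chosen only for illustration, not the environment's actual settings.

```python
MAX_STEPS = 10          # assumed episode length limit
INVALID_PENALTY = -0.1  # assumed penalty for an invalid tool call
SUCCESS_REWARD = 1.0    # stated in the text: +1 for success


def score_step(step_index, action_valid, answer_correct):
    """Return (reward, done) for one step of an episode.

    step_index is zero-based. A correct answer ends the episode with the
    success reward; an invalid action is penalized; the episode also ends
    once the step limit is reached.
    """
    if answer_correct:
        return SUCCESS_REWARD, True
    reward = 0.0 if action_valid else INVALID_PENALTY
    done = step_index + 1 >= MAX_STEPS
    return reward, done
```

Keeping rewards inside a fixed range such as [-1, 1] is one common way to make scores comparable across tasks of different difficulty, which is presumably what "normalized" refers to here.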

Core Endpoints

  • GET /tasks lists the question bank.
  • POST /reset begins an episode.
  • POST /step submits an action and returns the next observation.
  • GET /state returns the full episode transcript.
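The four endpoints above suggest a simple episode loop: list tasks, reset, step, then inspect state. The sketch below only builds the HTTP requests with Python's standard library, so it can be read and checked without a running server. The base URL, the task_id field, and the action payload are assumptions; the real request schemas should be taken from the API docs.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # assumed; replace with the real host


def build_request(method, path, payload=None):
    """Build a JSON request for one endpoint without sending it."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(BASE_URL + path, data=data, headers=headers, method=method)


def episode_requests(task_id, action):
    """Return the requests for one minimal episode, in call order."""
    return [
        build_request("GET", "/tasks"),                         # list the question bank
        build_request("POST", "/reset", {"task_id": task_id}),  # assumed payload shape
        build_request("POST", "/step", action),                 # assumed payload shape
        build_request("GET", "/state"),                         # fetch the transcript
    ]
```

Each request can be sent with urllib.request.urlopen(req) once the server is reachable; in a real agent the /step call would repeat until the episode ends.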