
Today's agents typically boast near-80% accuracy on academic benchmarks, which are not representative of real-world applications. Just as self-driving vehicles must meet a much higher standard than human-driven cars, autonomous agents must achieve a similarly high level of trustworthiness (99.9% or better) to be truly hands-off.
By releasing and supporting Pasta-1T, we aim to help the community bridge the usefulness gap so that agents can graduate from simple benchmarks to reliably solving complex tasks autonomously.
Specifically, this release includes:
All data processing scripts are open source and available on our repository. The dataset will be available through the open standard Minari. This is just the first of what we hope will be many contributions to the open community focused on improving AI agents.
For now, users must complete a form to request access, as we need to track usage and address potential ethical and copyright issues. After peer review, the dataset will be made openly accessible through the proper academic channels.
The dataset consists of web trajectories generated by web agents interacting with websites. These interactions are collected through Silverstream AI's autonomous web agents in a setup that uses a language model-driven exploration policy to perform a self-defined curriculum of tasks within a browser environment. Each data point includes a self-defined task, its expected consequences, and the set of states and actions to achieve it. We ensure that the task is feasible and valuable and that the agent successfully completes it. This approach helps maintain consistency across tasks while capturing diverse interaction patterns.

We apply some filtering to ensure the dataset consists of high-quality and meaningful interactions. We discard trajectories that terminate too quickly without reaching a meaningful number of steps or state changes. This ensures that only trajectories demonstrating substantial interaction and complexity are retained to inform learning signals such as rewards.
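The length filter described above can be sketched as follows. This is a minimal illustration, not the actual pipeline: the thresholds `MIN_STEPS` and `MIN_STATE_CHANGES`, the function names, and the representation of each browser state as a string snapshot are all assumptions, since the source does not give the real values or data format.

```python
from typing import Sequence

# Hypothetical thresholds: the source does not state the actual values.
MIN_STEPS = 3
MIN_STATE_CHANGES = 2


def count_state_changes(states: Sequence[str]) -> int:
    """Count consecutive pairs where the state differs from the previous one."""
    return sum(1 for prev, cur in zip(states, states[1:]) if prev != cur)


def keep_trajectory(states: Sequence[str]) -> bool:
    """Keep a trajectory only if it is long enough and actually changes state.

    `states` holds one snapshot per visited state, so a trajectory
    with N actions has N + 1 entries.
    """
    num_steps = max(len(states) - 1, 0)
    return (
        num_steps >= MIN_STEPS
        and count_state_changes(states) >= MIN_STATE_CHANGES
    )
```

A trajectory such as `["home", "search", "results", "item"]` would pass this check, while one that stops after a single step or never changes state would be discarded.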

Each trajectory is evaluated using an LLM as a judge, an approach inspired by Generalized Value Functions. The LLM judge assesses the trajectory against:
Trajectories are scored on a 1–5 scale:
Only trajectories scoring 4 or 5 are included in the dataset. To ensure non-triviality, tasks must require at least three meaningful and independent steps. The differential world model analyzes changes in the browser state at each step to confirm that interactions involve distinct and significant actions, avoiding superficial or repetitive tasks.
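One way to picture the acceptance rule is the sketch below. The score threshold (4) and the minimum of three meaningful steps come from the text; everything else is an assumption made for illustration, including the representation of a browser state as a flat dictionary, the helper names, and the use of a simple key-by-key diff as a stand-in for the differential world model, whose actual design is not described here.

```python
from typing import Any, Dict, List

MIN_SCORE = 4             # from the text: only scores of 4 or 5 are kept
MIN_MEANINGFUL_STEPS = 3  # from the text: at least three meaningful steps

State = Dict[str, Any]    # hypothetical flat view of the browser state


def state_diff(before: State, after: State) -> Dict[str, Any]:
    """Map each changed key to its new value (None if the key was removed)."""
    keys = set(before) | set(after)
    return {k: after.get(k) for k in keys if before.get(k) != after.get(k)}


def meaningful_steps(states: List[State]) -> int:
    """Count steps whose state change is non-empty and not seen before."""
    seen: List[Dict[str, Any]] = []
    count = 0
    for before, after in zip(states, states[1:]):
        diff = state_diff(before, after)
        if diff and diff not in seen:  # skip no-ops and repeated changes
            seen.append(diff)
            count += 1
    return count


def accept(score: int, states: List[State]) -> bool:
    """Apply both the judge-score threshold and the non-triviality check."""
    return (
        score >= MIN_SCORE
        and meaningful_steps(states) >= MIN_MEANINGFUL_STEPS
    )
```

Under these assumptions, a trajectory that toggles the same element back and forth counts each distinct change only once, so it would fail the non-triviality check even with a high judge score.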
The dataset is a collection of episodes:
Each episode in the dataset includes the following components:
A trajectory represents a step-by-step interaction between the agent and the webpage. A trajectory is a sequence of transitions, with each transition consisting of:
Observation:
Action:
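As a rough mental model, the structure described above could be represented like this. The class and field names are hypothetical and chosen only for illustration: the text establishes that an episode carries a task, its expected consequences, and a trajectory of transitions made of an observation and an action, but it does not specify the concrete field names or formats, so plain strings are used as placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Transition:
    """One step of the agent on the webpage (field formats are assumed)."""
    observation: str  # what the agent saw before acting
    action: str       # what the agent did in response


@dataclass
class Episode:
    """A self-defined task plus the trajectory that accomplishes it."""
    task: str
    expected_consequences: str
    trajectory: List[Transition] = field(default_factory=list)

    def num_steps(self) -> int:
        """Number of transitions in the trajectory."""
        return len(self.trajectory)


# Illustrative, made-up episode; not taken from the dataset.
example = Episode(
    task="Find the contact page",
    expected_consequences="The contact page is displayed",
    trajectory=[
        Transition(observation="home page", action="click 'About'"),
        Transition(observation="about page", action="click 'Contact'"),
    ],
)
```

Keeping the task and its expected consequences next to the trajectory means each episode can be judged on its own, without reference to the rest of the dataset.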
Want to explore the Pasta-1T dataset and join the Silverstream AI community? Fill out this form to request access. For collaboration inquiries or more information, feel free to contact us.