Evals & Datasets
Grounded exposes eval definitions, eval runs, datasets, and dataset rows through `@grounded/client`. All examples below assume an initialized `grounded` client.
Eval Methods
```typescript
grounded.evals.list({ agentId? })
grounded.evals.create(input)
grounded.evals.run(evalId, snapshotId)
grounded.evals.listRuns({ agentId?, snapshotId?, evalId? })
```

Dataset Methods
```typescript
grounded.datasets.list({ agentId? })
grounded.datasets.get(datasetId)
grounded.datasets.create(input)
grounded.datasets.listRows(datasetId)
```

Current Eval Model
The backend currently supports classification evals.
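Conceptually, a classification eval derives one predicted label and one ground-truth label per session. The sketch below shows how the two config sources used in this doc could be read; the `Session` shape and helper are illustrative only, not part of the client, and assume `timeline_last_agent_json` reads a boolean field from the agent's final JSON message while `tags_contains` checks session tags:

```typescript
type Session = {
  lastAgentJson: Record<string, unknown>; // final JSON message from the agent
  tags: string[];                         // tags attached to the session
};

// Illustrative only: derive the predicted and ground-truth labels for one
// session, mirroring the `prediction` and `groundTruth` config sources.
function scoreSession(session: Session, field: string, tag: string) {
  const predicted = session.lastAgentJson[field] === true; // timeline_last_agent_json
  const actual = session.tags.includes(tag);               // tags_contains
  return { predicted, actual };
}
```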
```typescript
const evalDef = await grounded.evals.create({
  agentId: 'agt_123',
  name: 'refund classification',
  config: {
    type: 'classification',
    prediction: {
      source: 'timeline_last_agent_json',
      field: 'resolved',
    },
    groundTruth: {
      source: 'tags_contains',
      tag: 'resolved',
    },
    metrics: ['f1', 'precision', 'recall', 'accuracy'],
    primaryMetric: 'f1',
  },
});
```

Create A Dataset
```typescript
const dataset = await grounded.datasets.create({
  agentId: 'agt_123',
  evalId: evalDef.id,
  name: 'refund support benchmark',
  rows: [
    {
      inputJson: {
        userPrompt: 'where is my refund?',
      },
      groundTruthJson: {
        tags: ['resolved'],
      },
      metadataJson: {
        segment: 'billing',
      },
    },
  ],
});
```

Dataset rows use the current backend shape:

- `inputJson`
- `groundTruthJson`
- `metadataJson`
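Because the eval above derives ground truth from `tags_contains`, each row's `groundTruthJson` needs a `tags` array. A quick local sanity check before uploading might look like this (the `DatasetRow` type and helper are hypothetical, not part of the client):

```typescript
type DatasetRow = {
  inputJson: Record<string, unknown>;
  groundTruthJson: { tags?: string[] };
  metadataJson?: Record<string, unknown>;
};

// Hypothetical helper: verify every row carries the tags array that a
// tags_contains ground-truth source will read.
function rowsHaveTags(rows: DatasetRow[]): boolean {
  return rows.every((row) => Array.isArray(row.groundTruthJson.tags));
}
```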
Run An Eval
Running an eval is a synchronous backend computation on an existing snapshot.
```typescript
const result = await grounded.evals.run(evalDef.id, 'snap_456');
console.log(result.runId);
console.log(result.metrics);
console.log(result.confusionMatrix);
```

The response shape is:
```typescript
type EvalRunResult = {
  runId: string;
  metrics: {
    f1: number;
    precision: number;
    recall: number;
    accuracy: number;
  };
  confusionMatrix: {
    tp: number;
    fp: number;
    tn: number;
    fn: number;
  };
  sessionCount: number;
  errorCount: number;
};
```

Inspect Historical Runs
```typescript
const runs = await grounded.evals.listRuns({
  agentId: 'agt_123',
  evalId: evalDef.id,
});
```

Common Flow
```typescript
const evalDef = await grounded.evals.create({
  agentId: 'agt_123',
  name: 'resolved classification',
  config: {
    type: 'classification',
    prediction: {
      source: 'timeline_last_agent_json',
      field: 'resolved',
    },
    groundTruth: {
      source: 'tags_contains',
      tag: 'resolved',
    },
    metrics: ['f1', 'precision', 'recall', 'accuracy'],
    primaryMetric: 'f1',
  },
});

await grounded.datasets.create({
  agentId: 'agt_123',
  evalId: evalDef.id,
  name: 'refund benchmark',
  rows: [
    {
      inputJson: { userPrompt: 'where is my refund?' },
      groundTruthJson: { tags: ['resolved'] },
    },
  ],
});

const run = await grounded.evals.run(evalDef.id, 'snap_456');
```
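The four metrics in `EvalRunResult` follow directly from the confusion matrix, so a run can be cross-checked locally. A minimal sketch (not part of the client; it ignores zero-denominator edge cases):

```typescript
type ConfusionMatrix = { tp: number; fp: number; tn: number; fn: number };

// Recompute the metrics reported in EvalRunResult from the raw counts.
function metricsFrom({ tp, fp, tn, fn }: ConfusionMatrix) {
  const precision = tp / (tp + fp);
  const recall = tp / (tp + fn);
  const f1 = (2 * precision * recall) / (precision + recall);
  const accuracy = (tp + tn) / (tp + fp + tn + fn);
  return { f1, precision, recall, accuracy };
}
```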