
Evaluation Handoff

This page describes what future Teleodynamic AI evaluation packets should contain. It defines a public-safe field specification and a static sample packet. It does not implement a live evaluation runtime, telemetry ingestion, or autonomous evaluation.

Handoff specification — no live evaluation runtime

Research framing: This page is static public explanation. There is no live teleodynamic runtime on Carcinus.org, and this page does not claim artificial life, consciousness, sentience, biological agency, production-safety proof, or UAIX certification.

1. Evaluation Packet Overview

An evaluation packet is a public-safe data bundle that an external system could export to Carcinus.org for public review. It contains evidence about the system's operation, resource usage, structural changes, and safety boundaries — but only what is safe to make public. The packet format is a handoff specification; nothing on this page ingests, processes, or evaluates live packets.

2. Resource Budget Summary

External systems should report their endogenous resource allocations in public-safe terms: compute budget, storage budget, API call budget, and maintenance budget per component. For each budget, the report should show the amount allocated, the amount consumed, and the applicable threshold. Only aggregate public-safe figures should be included: no internal cost details, no pricing data, no proprietary consumption models.
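As an illustration only, the allocated / consumed / threshold report could be derived as in the sketch below. The class name `BudgetLine`, the helper `summarize_budget`, and the status labels other than "OK" are assumptions made for this example; the page does not define them. The threshold is read here as a fraction of the allocation, which is also an assumption.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class BudgetLine:
    """One aggregate, public-safe budget figure (hypothetical structure)."""
    name: str
    allocated: float
    consumed: float
    threshold: float  # assumed: fraction of the allocation that triggers a warning

    def __post_init__(self):
        if self.allocated <= 0:
            raise ValueError("allocated must be positive")

    def utilization(self) -> float:
        """Fraction of the allocation that has been consumed."""
        return self.consumed / self.allocated

    def status(self) -> str:
        used = self.utilization()
        if used > 1.0:
            return "EXCEEDED"        # hypothetical label
        if used >= self.threshold:
            return "NEAR_THRESHOLD"  # hypothetical label
        return "OK"


def summarize_budget(line: BudgetLine) -> dict:
    """Return only aggregate figures: no costs, pricing, or consumption models."""
    report = asdict(line)
    report["utilization"] = round(line.utilization(), 4)
    report["status"] = line.status()
    return report


compute = BudgetLine("computeBudget", allocated=1000, consumed=720, threshold=0.8)
summary = summarize_budget(compute)
print(summary)
```

With the figures from the sample packet on this page (1000 allocated, 720 consumed, threshold 0.8), utilization is 0.72 and the status is "OK".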

3. Fast Loop Summary

The fast loop (parameter adaptation / inference) summary should report aggregate statistics: number of parameter updates per unit time, average update magnitude, convergence metrics, and resource consumption. It should not include model weights, training data, gradient traces, or internal representations. The summary is a public evidence envelope, not a model dump.
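A minimal sketch of how such aggregates might be produced, assuming the exporting system already holds a list of per-update magnitudes and a precomputed convergence metric. The function name `summarize_fast_loop` and the `updatesPerSecond` key are hypothetical; the other keys mirror the sample packet on this page.

```python
from statistics import fmean


def summarize_fast_loop(update_magnitudes, period_seconds, convergence_metric):
    """Reduce raw update magnitudes to aggregate statistics.

    Only counts and averages leave this function. Weights, gradients,
    and training data are never part of the input or the output.
    """
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    count = len(update_magnitudes)
    return {
        "updatesThisPeriod": count,
        "updatesPerSecond": count / period_seconds,  # hypothetical field
        "avgUpdateMagnitude": fmean(update_magnitudes) if count else 0.0,
        "convergenceMetric": convergence_metric,
    }


fast_summary = summarize_fast_loop([0.0002, 0.0004], period_seconds=2.0,
                                   convergence_metric=0.94)
print(fast_summary)
```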

4. Slow Loop Proposal Summary

The slow loop (structural change) summary should report structural proposals considered and their outcomes: accepted, modified, rejected by No-op, or deferred. Each proposal should include a description, resource projection, constraint analysis, and final decision reason. The summary should make visible the rejection path — proposals that were evaluated and refused — as this is key evidence for the No-op safety concept.
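The outcome counts could be tallied as in the following sketch. The proposal records, the `outcome` key, and the function name are assumptions for illustration. The `deferred` counter follows the outcomes listed above, although the sample packet on this page does not show it.

```python
from collections import Counter

# Outcome labels: the first three mirror the sample packet; "deferred" is assumed.
OUTCOMES = ("accepted", "modified", "noOpRejected", "deferred")


def summarize_slow_loop(proposals):
    """Count structural proposals by final outcome, keeping rejections visible."""
    counts = Counter(p["outcome"] for p in proposals)
    unknown = set(counts) - set(OUTCOMES)
    if unknown:
        raise ValueError(f"unknown outcomes: {sorted(unknown)}")
    summary = {"proposalsConsidered": len(proposals)}
    summary.update({outcome: counts.get(outcome, 0) for outcome in OUTCOMES})
    return summary


# Hypothetical proposals, each with a description and a decision reason.
proposals = [
    {"description": "merge module-a and module-b", "outcome": "accepted",
     "reason": "duplicate function"},
    {"description": "add cache layer", "outcome": "modified",
     "reason": "reduced scope to fit budget"},
    {"description": "split module-x", "outcome": "noOpRejected",
     "reason": "budget insufficient"},
]
slow_summary = summarize_slow_loop(proposals)
print(slow_summary)
```

Reporting a zero for every outcome, rather than omitting empty categories, keeps the rejection path explicit even in periods where nothing was refused.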

5. Structural Operator Trace

The structural operator trace is a timestamped log of all structural operators applied: Split, Merge, Add, Retire, and No-op. Each entry should include the operator type, target component, reason, resource budget impact, and outcome. The trace provides the public evidence backbone for evaluating whether a system is maintaining constraint closure or merely accumulating complexity.
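One possible shape for a trace entry is sketched below. The class names, field names, and the `timestampUtc` and `budgetImpact` output keys are hypothetical; only the five operator types and the listed entry contents come from the description above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum


class Operator(Enum):
    SPLIT = "split"
    MERGE = "merge"
    ADD = "add"
    RETIRE = "retire"
    NO_OP = "no-op"


@dataclass(frozen=True)
class TraceEntry:
    """One timestamped structural operator application (hypothetical layout)."""
    timestamp: datetime
    op: Operator
    target: str
    reason: str
    budget_impact: float  # assumed unit: change in consumed budget
    outcome: str

    def to_public_dict(self) -> dict:
        if self.timestamp.tzinfo is None:
            raise ValueError("timestamp must be timezone-aware")
        utc = self.timestamp.astimezone(timezone.utc)
        return {
            "timestampUtc": utc.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "op": self.op.value,
            "target": self.target,
            "reason": self.reason,
            "budgetImpact": self.budget_impact,
            "outcome": self.outcome,
        }


entry = TraceEntry(
    timestamp=datetime(2026, 6, 2, tzinfo=timezone.utc),
    op=Operator.MERGE,
    target="module-a+module-b",
    reason="duplicate function",
    budget_impact=-15.0,  # illustrative value only
    outcome="applied",
)
print(entry.to_public_dict())
```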

6. No-op Trace

The No-op trace is a specialized subset of the structural operator trace that focuses on rejected proposals. Each No-op event should record what was proposed, what resource threshold it would have exceeded, and why rejection was the safest decision. A system that never records No-ops either is not evaluating proposals against resource budgets or is accepting everything; both are red flags in a teleodynamic evaluation framework.
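To make the subset relationship concrete, the sketch below filters No-op entries out of a structural operator trace and flags a trace that contains none. Both function names are hypothetical, and the entries reuse the `structuralActions` values from the sample packet on this page.

```python
def no_op_trace(trace):
    """Return only the No-op entries of a structural operator trace."""
    return [entry for entry in trace if entry["op"] == "no-op"]


def lacks_no_ops(trace):
    """True when a non-empty trace records no rejected proposals at all.

    This is a prompt for human review, not proof of a problem.
    """
    return bool(trace) and not no_op_trace(trace)


trace = [
    {"op": "merge", "target": "module-a+module-b", "reason": "duplicate function"},
    {"op": "no-op", "target": "proposed-split-x", "reason": "budget insufficient"},
]

rejected = no_op_trace(trace)
print(len(rejected), lacks_no_ops(trace))
```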

7. Claim Boundary Checklist

Every evaluation packet should include a claim boundary checklist that verifies:

8. Public Symbol Evidence

Evidence should be linked through public symbols — stable URLs, content hashes, or public registries — not through opaque internal embeddings. Each evidence reference in the packet should resolve to a publicly inspectable resource. If evidence is not public, the packet must state that explicitly rather than providing a non-resolving link.
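A small sketch of an evidence reference that either resolves publicly or says explicitly that it does not. The function name and all dictionary keys are assumptions. SHA-256 is used as one common choice of content hash; the page does not prescribe a hash algorithm.

```python
import hashlib


def evidence_reference(label, url=None, content=None):
    """Build one evidence reference for a packet.

    If no public URL exists, the reference says so explicitly instead of
    carrying a link that will not resolve.
    """
    if url is None:
        return {"label": label, "public": False,
                "note": "evidence exists but is not public"}
    ref = {"label": label, "public": True, "url": url}
    if content is not None:
        # The hash lets a reviewer confirm the fetched bytes match the packet.
        ref["sha256"] = hashlib.sha256(content).hexdigest()
    return ref


public_ref = evidence_reference(
    "resource-report",
    url="https://example.com/evidence-001.json",
    content=b'{"ok": true}',
)
private_ref = evidence_reference("internal-audit-log")
print(public_ref)
print(private_ref)
```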

9. Reviewer Reconstruction Notes

Each evaluation packet should include plain-language notes allowing a human reviewer to reconstruct the chain of evidence: what data was collected, from which source, at which timestamps, using which method. The notes should be written for an audience that has access to the public evidence anchors but not to the system's internal state. Reviewers must be able to distinguish between "evidence available" and "evidence claimed but not verifiable."

10. Human Comprehension Notes

The packet should include a section written in plain English (or the reviewer's language) that explains what the packet means in non-technical terms: what changed, why it matters, what risks are introduced or mitigated, and what the reviewer should look at next. This section is the bridge between machine-readable evidence and human understanding. It should not require specialized knowledge of the external system's architecture to understand.

11. Safety Boundary Flags

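Each packet should include a set of boolean safety flags. These flags indicate whether specific safety constraints were violated, approached, or maintained during the evaluation period. Example flags, as used in the sample packet on this page: resourceBudgetExceeded, structuralAnomalyDetected, noOpFrequencyIncreasing, evidenceChainBroken, claimBoundaryViolated, humanReviewRequested.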

These flags are proposed handoff fields, not live monitoring metrics on Carcinus.org.
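As a hedged sketch, a reviewer-side check of these flags might look like the following. The flag names are taken from the `safetyBoundaryFlags` field of the sample packet on this page; the function name and the shape of its result are hypothetical.

```python
# Flag names copied from the sample packet; treating them as the expected
# set is an assumption of this sketch.
EXPECTED_FLAGS = (
    "resourceBudgetExceeded",
    "structuralAnomalyDetected",
    "noOpFrequencyIncreasing",
    "evidenceChainBroken",
    "claimBoundaryViolated",
    "humanReviewRequested",
)


def review_flags(flags):
    """Report missing flags, non-boolean values, and flags that are set."""
    return {
        "missing": [name for name in EXPECTED_FLAGS if name not in flags],
        "nonBoolean": [name for name, value in flags.items()
                       if not isinstance(value, bool)],
        "raised": [name for name, value in flags.items() if value is True],
    }


sample_flags = {
    "resourceBudgetExceeded": False,
    "structuralAnomalyDetected": False,
    "noOpFrequencyIncreasing": False,
    "evidenceChainBroken": False,
    "claimBoundaryViolated": False,
    "humanReviewRequested": True,
}
flag_review = review_flags(sample_flags)
print(flag_review)
```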

12. Export Formats

Evaluation packets should be exportable in the following public-safe formats. These are specifications, not implemented download endpoints on this site:

A future Evaluation Packet Template Builder could produce downloadable example packets in these formats using predefined sample data only — no live ingestion, no uploads, no agent execution.

Sample Evaluation Packet (Static Example)

This is a static example using hardcoded public-safe sample data. It does not represent any real system.

Static example evaluation packet with public-safe fields only.
Field | Value
packetId | eval-2026-06-02-001-sample
createdUtc | 2026-06-02T00:00:00Z
agentSlug | example-agent
claimStatus | framing / research-handoff
resourceBudgetSummary | {"computeBudget": 1000, "consumed": 720, "threshold": 0.8, "status": "OK"}
fastLoopSummary | {"updatesThisPeriod": 45000, "avgUpdateMagnitude": 0.0003, "convergenceMetric": 0.94}
slowLoopSummary | {"proposalsConsidered": 3, "accepted": 1, "modified": 1, "noOpRejected": 1}
structuralActions | [{"op":"merge","target":"module-a+module-b","reason":"duplicate function"},{"op":"no-op","target":"proposed-split-x","reason":"budget insufficient"}]
noOpCount | 1
blockedActionCount | 0
publicSymbolAnchors | ["https://example.com/evidence-001.json"]
reviewerStatus | pending-human-review
safetyBoundaryFlags | {"resourceBudgetExceeded": false, "structuralAnomalyDetected": false, "noOpFrequencyIncreasing": false, "evidenceChainBroken": false, "claimBoundaryViolated": false, "humanReviewRequested": true}
evidenceLinks | ["https://example.com/public/evidence/ev-001"]
caveats | Sample data only. Not a real evaluation. No live system represented.
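For readers who want to check a static packet like this one by hand, the sketch below tests that a few fields are present and that `noOpCount` agrees with the No-op entries in `structuralActions` and with `slowLoopSummary`. The choice of required fields and the function name `check_packet` are assumptions; the page does not define validation rules.

```python
# Assumed minimum field set for this sketch; not defined by the specification.
REQUIRED_FIELDS = ("packetId", "createdUtc", "agentSlug", "claimStatus",
                   "structuralActions", "noOpCount", "caveats")


def check_packet(packet):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = [f"missing field: {name}"
                for name in REQUIRED_FIELDS if name not in packet]
    actions = packet.get("structuralActions", [])
    no_ops = sum(1 for action in actions if action.get("op") == "no-op")
    if packet.get("noOpCount") != no_ops:
        problems.append("noOpCount does not match structuralActions")
    rejected = packet.get("slowLoopSummary", {}).get("noOpRejected")
    if rejected is not None and rejected != no_ops:
        problems.append("slowLoopSummary.noOpRejected does not match structuralActions")
    return problems


sample_packet = {
    "packetId": "eval-2026-06-02-001-sample",
    "createdUtc": "2026-06-02T00:00:00Z",
    "agentSlug": "example-agent",
    "claimStatus": "framing / research-handoff",
    "slowLoopSummary": {"proposalsConsidered": 3, "accepted": 1,
                        "modified": 1, "noOpRejected": 1},
    "structuralActions": [
        {"op": "merge", "target": "module-a+module-b", "reason": "duplicate function"},
        {"op": "no-op", "target": "proposed-split-x", "reason": "budget insufficient"},
    ],
    "noOpCount": 1,
    "caveats": "Sample data only. Not a real evaluation. No live system represented.",
}
problems = check_packet(sample_packet)
print(problems)
```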
