Lineage Visualization
The following diagram illustrates the flow of data through the four codified stages of the Codatta protocol.
In Stage 1, the system accepts atomic contributions (samples or labels), builds a Contribution Fingerprint (CF) record for each accepted contribution, and then anchors the CF’s fingerprint on-chain.
Lifecycle Stages
1. Assetification
The foundational stage where raw contributions are transformed into trusted, immutable assets. This process involves three critical steps tracked by the lineage graph:
- Submission: Contributors initiate this stage by providing samples or labels, along with relevant metadata.
- Screening & validation: Codatta runs automatic pre‑scan checks including heuristic-based checks for format, policy compliance, and deduplication, plus semantic checks with AI agents. Where needed, human review is triggered. Submissions that fail at this step are rejected and never produce a CF.
- CF creation & anchoring: For each accepted atomic contribution, the system builds a CF commit (canonicalizing the content hash, attaching evidence & signals, and binding contributor identity), then anchors the CF’s fingerprint on-chain (per‑CF or in a batched Merkle root) on the selected network.
See also: Contribution Fingerprint (CF), which covers how each atomic contribution becomes a signed, anchored record.
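The CF creation and anchoring step can be sketched roughly as follows. This is a minimal illustration, not Codatta's actual implementation: the record fields, the canonical-JSON encoding, the choice of SHA-256, and the helper names `canonical_hash`, `build_cf`, and `merkle_root` are all assumptions made for the example. Signing and the on-chain transaction itself are omitted.

```python
import hashlib
import json


def canonical_hash(content: dict) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys, no extra whitespace)."""
    encoded = json.dumps(content, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def build_cf(content: dict, contributor: str, evidence: list[str]) -> dict:
    """Assemble a hypothetical CF record and derive its fingerprint."""
    record = {
        "content_hash": canonical_hash(content),  # canonicalized content hash
        "contributor": contributor,               # binds contributor identity
        "evidence": sorted(evidence),             # attached evidence and signals
    }
    # The fingerprint covers the whole record, not only the raw content.
    return {**record, "fingerprint": canonical_hash(record)}


def merkle_root(fingerprints: list[str]) -> str:
    """Fold hex fingerprints pairwise into one root for batched anchoring.

    Assumption: an odd level duplicates its last node, one common convention.
    """
    if not fingerprints:
        raise ValueError("cannot anchor an empty batch")
    level = [bytes.fromhex(fp) for fp in fingerprints]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()
```

In this sketch, per-CF anchoring would publish a single `fingerprint`, while batched anchoring would publish only the value returned by `merkle_root` for the whole batch.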
2. Data Assembling
In this stage, individual anchored assets are curated and combined to form comprehensive datasets.
- Composition: The lineage view visualizes how multiple atomic assets (from Stage 1) flow into a larger “Assembly.”
- Versioning: Assembling allows for different versions of a dataset (e.g., “Training Set v1.0”) to be created from a pool of assets without altering the underlying provenance of the original contributions.
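One way to picture versioning without touching provenance is an assembly that only references CF fingerprints. The sketch below is an assumption-laden illustration: the `Assembly` class, its fields, the `manifest_id` derivation, and the `assemble` helper are hypothetical and are not part of the documented protocol.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Assembly:
    name: str
    version: str
    cf_fingerprints: tuple[str, ...]  # references only; CF records are never modified

    @property
    def manifest_id(self) -> str:
        """Deterministic identifier derived from name, version, and membership."""
        payload = "\n".join((self.name, self.version, *self.cf_fingerprints))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def assemble(name: str, version: str, pool: dict[str, dict], selected: list[str]) -> Assembly:
    """Create a dataset version from anchored assets already present in the pool."""
    missing = [fp for fp in selected if fp not in pool]
    if missing:
        raise KeyError(f"not anchored in pool: {missing}")
    # Sort and de-duplicate so the same selection always yields the same manifest.
    return Assembly(name, version, tuple(sorted(set(selected))))
```

Because each version stores only fingerprints, "Training Set v1.0" and a later version can share assets while the underlying CF records stay exactly as they were anchored.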
3. Publication & Adoption
Once assembled, datasets are released for utility. This stage tracks the downstream reach of the data.
- Publication: Visualizes where the dataset has been pushed (e.g., HuggingFace, decentralized storage).
- Adoption: Tracks commercial or research usage, such as a specific AI model integrating the dataset for training. This node represents the “proof of utility.”
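The lineage view described above can be thought of as a directed graph in which every node points back to the nodes it was derived from. The following sketch is illustrative only: the node identifiers, the `PARENTS` mapping, and the `upstream_assets` helper are hypothetical, chosen to show how an adoption node could be traced back to the anchored assets behind it.

```python
from collections import deque

# Hypothetical lineage edges: each node maps to the nodes it was derived from.
PARENTS = {
    "adoption:model-x": ["publication:huggingface"],
    "publication:huggingface": ["assembly:training-set-v1.0"],
    "assembly:training-set-v1.0": ["cf:aa", "cf:bb"],
    "cf:aa": [],
    "cf:bb": [],
}


def upstream_assets(node: str, parents: dict[str, list[str]]) -> list[str]:
    """Breadth-first walk from a node back to the nodes that have no parents."""
    seen = {node}
    queue = deque([node])
    assets = []
    while queue:
        current = queue.popleft()
        sources = parents.get(current, [])
        if not sources:
            assets.append(current)  # a root of the lineage, i.e. an anchored asset
        for src in sources:
            if src not in seen:
                seen.add(src)
                queue.append(src)
    return sorted(assets)


print(upstream_assets("adoption:model-x", PARENTS))  # ['cf:aa', 'cf:bb']
```

The same walk would apply to any downstream node, which is what lets later events be attributed to specific contributions.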
4. Payouts
The final stage closes the economic loop. When an adoption event generates revenue (e.g., a licensing fee), it is visualized as a Payout Event.
- Value Flow: A connection is drawn from the Payout node back to the original Anchored Assets.
- Distribution to Backers: Funds are distributed to the current ownership holders at the time of the payout snapshot.
- Initial Owners: Originally, ownership belongs to the contributors (submitters), validators, and initial stakers.
- Backers: Because fractional ownership is tradable, the current holder may differ from the original creator. The lineage view identifies these current owners as Backers—investors or entities who have acquired ownership rights on the open market. The system ensures rewards reach the wallet currently holding the asset fraction.
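A snapshot-based distribution could look like the sketch below. It assumes a pro rata split by ownership units held at the snapshot, with amounts expressed in the smallest currency unit; the source does not specify the split rule, so both the rule and the `distribute` helper are hypothetical.

```python
def distribute(amount: int, holdings: dict[str, int]) -> dict[str, int]:
    """Split `amount` pro rata across the wallets in a payout snapshot.

    `holdings` maps wallet id to ownership units at snapshot time. The
    largest-remainder method keeps the payouts summing exactly to `amount`.
    """
    total = sum(holdings.values())
    if total <= 0:
        raise ValueError("snapshot has no ownership units")
    # Integer floor share for every current holder.
    payouts = {wallet: amount * units // total for wallet, units in holdings.items()}
    leftover = amount - sum(payouts.values())
    # Hand out leftover units by largest remainder; ties are broken by wallet id.
    by_remainder = sorted(holdings, key=lambda w: (-(amount * holdings[w] % total), w))
    for wallet in by_remainder[:leftover]:
        payouts[wallet] += 1
    return payouts


snapshot = {"contributor": 50, "validator": 30, "backer": 20}
print(distribute(1000, snapshot))  # {'contributor': 500, 'validator': 300, 'backer': 200}
```

Only the snapshot matters here: if a contributor sold part of their fraction before the payout, the corresponding units appear under the backer's wallet and the reward follows them.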