Data

Anyone can permissionlessly leverage OpenRank Protocol to compute on various app-specific reputation graphs. There are two main operations involved:

Data Sourcing: Any data indexer or open data set maintainer can bring their data sets to the pre-processing stage.

Data Pre-Processing: Using the source data, developers and data scientists can perform the desired transformations to create app-specific transactions, credentials, attestation schemas, or social graph data for generating reputation scores and rankings. The transformed data is a reputation graph, the input to OpenRank.

Data Sets for OpenRank

OpenRank computes on reputation graph data which can be constructed from peer-to-peer trust signals. A context-specific p2p reputation graph expresses this statement - "who trusts whom by how much, for what".

  • Who: the one that extends the trust; the truster

  • Whom: the one that receives the trust; the trustee

  • How much: the level of trust from the truster to the trustee

  • What: the type of trust that the truster extends; the role or context
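Concretely, each signal maps to a weighted, directed, labelled edge. A minimal sketch in Python (the field names, context labels, and values are illustrative, not part of the protocol):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrustEdge:
    truster: str   # "who": the peer extending the trust
    trustee: str   # "whom": the peer receiving the trust
    weight: float  # "how much": the level of trust
    context: str   # "what": the role or context of the trust

# A context-specific reputation graph is a collection of such edges.
graph = [
    TrustEdge("alice", "bob", 0.8, "software_security"),
    TrustEdge("bob", "carol", 0.5, "software_security"),
]
```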

Implicit and Explicit Reputation Graphs

Peer-to-peer trust signals can be captured in several ways.

Implicit - we can take existing data and derive reputation graphs from it. This is very useful for bootstrapping ranking and reputation from existing large-scale data sets.

A good candidate for implicit signals is data from social graphs like Farcaster or Lens. This data can be used to compute rankings and recommendations for social networks and apps. An example reputation graph can be a linear combination of engagement actions between X and Y, such as comments, tips, recasts, and follows.

Another example of implicit data is onchain transaction data: If X sends tokens to Y, X may be assumed to trust Y, and the value of the asset determines the level of trust.

Implicit trust can be derived from any relevant verifiable data set. However, this requires implied assumptions around peer-to-peer trust heuristics, which may not work for use cases where the threshold for trust signals is high or where trust must be expressed objectively and explicitly.

Explicit - we can use attestations and credentials as explicit p2p signals. The schemas or scope of these assertions help form a reputation graph. For example, the MetaMask Snaps Permissionless Distribution (SPD) platform gives users an option to express their trust opinions about other users for software security or software developer skills.

A web2 analogy for this is the 5-star rating system on marketplaces, where users leave reviews for a counterparty. These reviews enable ranking and reputation compute, which can aid in trust and safety, and in search and discovery.

Once a developer decides where to source the data for ranking and reputation compute, the data must be pre-processed for computation. This usually requires applying transformation and linear combination operations to the source data. The output of these operations is the reputation graph for the desired use case.

For instance, let's consider an onchain graph constructed from peer-to-peer token transfers between EOAs. To create a quality input data set for OpenRank computation, we must create a graph with only EOAs as nodes (vertices). The directed edges should capture the value of token transfers between any two peers. Pre-processing operations involve removing contract events and filtering out undesirable EOAs (those labelled as spam/scam/CEX) from the graph.
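A sketch of this pre-processing pass, assuming the raw transfers arrive as from/to/value records and that contract and spam/scam/CEX labels are available from an external labelling source (all addresses and names here are illustrative):

```python
from collections import defaultdict

# Illustrative inputs: raw transfer records plus address labels sourced elsewhere.
raw_transfers = [
    {"from": "0xA", "to": "0xB", "value": 120.0},
    {"from": "0xB", "to": "0xC", "value": 40.0},
    {"from": "0xA", "to": "0xDEX", "value": 999.0},  # contract interaction
    {"from": "0xSPAM", "to": "0xB", "value": 1.0},   # flagged sender
]
contract_addresses = {"0xDEX"}  # removed: only EOAs may be nodes
flagged_eoas = {"0xSPAM"}       # removed: labelled spam/scam/CEX

def build_transfer_graph(transfers, contracts, flagged):
    """Aggregate transfer value into directed EOA-to-EOA edges."""
    edges = defaultdict(float)
    for t in transfers:
        src, dst = t["from"], t["to"]
        # Keep only peer-to-peer transfers between acceptable EOAs.
        if src in contracts or dst in contracts:
            continue
        if src in flagged or dst in flagged:
            continue
        edges[(src, dst)] += t["value"]  # edge weight = total value transferred
    return dict(edges)

print(build_transfer_graph(raw_transfers, contract_addresses, flagged_eoas))
# {('0xA', '0xB'): 120.0, ('0xB', '0xC'): 40.0}
```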

Another example of pre-processing data is using onchain attestations of a particular schema. The first step is to choose the specific schema that captures the peer-to-peer trust heuristic, for example, X attesting that Y is a good software developer. Once we have all the attestations related to this schema, we can combine them with any other useful heuristics or schemas via a linear combination to formulate an [i,j,v] matrix.
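A sketch of that linear combination, with two hypothetical schemas and developer-chosen weights; the output is the list of [i,j,v] triples forming the reputation graph:

```python
from collections import defaultdict

# Illustrative attestations: (schema, attester i, subject j).
attestations = [
    ("good_software_developer", "X", "Y"),
    ("good_software_developer", "X", "Z"),
    ("audited_code_of",         "Y", "Z"),
]
# Per-schema weights chosen by the developer for the linear combination.
schema_weights = {"good_software_developer": 1.0, "audited_code_of": 2.0}

values = defaultdict(float)
for schema, i, j in attestations:
    values[(i, j)] += schema_weights[schema]

# Flatten into [i, j, v] triples: the reputation graph input for OpenRank.
ijv = [[i, j, v] for (i, j), v in values.items()]
print(ijv)  # [['X', 'Y', 1.0], ['X', 'Z', 1.0], ['Y', 'Z', 2.0]]
```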

Data Verifiability and Provenance

Sourcing Data

OpenRank computes on openly available data sets such as onchain transactions, attestations or open social graphs. There are at least two ways for developers to use open or public data sets - trustless and trusted.

In the trustless approach, the correctness and veracity of data can be proved using the system that hosts the data. For example, onchain transactions are verified by means of inclusion in a block considered part of the canonical chain.

In the trusted approach, the correctness and veracity of data is not directly proven; instead, the dataset is published by a trusted party, who signs it. The consumer of the dataset establishes a trust relationship with the publisher of the data.
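A minimal sketch of the trusted flow, using Ed25519 signatures via the PyNaCl library; the dataset framing, digest scheme, and key distribution here are assumptions, not a specified OpenRank format:

```python
import hashlib
from nacl.signing import SigningKey

# Publisher side: sign a digest of the dataset bytes.
publisher_key = SigningKey.generate()
dataset = b'[["X","Y",1.0],["Y","Z",2.0]]'
digest = hashlib.sha256(dataset).digest()
signed = publisher_key.sign(digest)

# Consumer side: trust is rooted in the publisher's known verify key.
verify_key = publisher_key.verify_key
verify_key.verify(signed)  # raises BadSignatureError if tampered with
assert signed.message == hashlib.sha256(dataset).digest()
print("dataset accepted: signed by the trusted publisher")
```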

The right approach often depends on the use case. For example, high-stakes use cases (such as p2p lending) require a trust-minimized setup, where trustless source data is essential. Low-risk use cases such as social graph data for feeds and rankings can be sourced via reliable third-party data providers that developers and apps already leverage, as long as the data set is open and verifiable.

Pre-processing Data

After identifying the data source, there may be several pre-processing operations before arriving at the desired reputation graph for OpenRank compute. For instance, let's consider a profile ranking system on Farcaster powered by OpenRank. The first step is to get the entire social graph data from a node (a Farcaster Hubble instance). Next, we transform the data into different peer-to-peer actions, such as X likes Y's casts, X recasts Y, or X mentions Y. Finally, we compute a linear combination of these user actions to form a reputation graph: an [i,j] matrix capturing peer-to-peer engagement on Farcaster.
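A sketch of these two transformation steps, with illustrative message shapes and developer-chosen action weights (neither reflects the actual hub wire format or production weights):

```python
from collections import defaultdict

# Step 1: transform raw hub messages into peer-to-peer actions.
raw_messages = [
    {"type": "ReactionAdd", "kind": "like",   "actor": "X", "target_author": "Y"},
    {"type": "ReactionAdd", "kind": "recast", "actor": "X", "target_author": "Y"},
    {"type": "CastAdd",     "mentions": ["X"], "author": "Y"},
]

def extract_actions(messages):
    actions = []
    for m in messages:
        if m["type"] == "ReactionAdd":
            actions.append((m["kind"], m["actor"], m["target_author"]))
        elif m["type"] == "CastAdd":
            for mentioned in m["mentions"]:
                actions.append(("mention", m["author"], mentioned))
    return actions

# Step 2: linear combination of the actions, with developer-chosen weights.
weights = {"like": 1.0, "recast": 3.0, "mention": 2.0}

def combine(actions):
    graph = defaultdict(float)
    for kind, i, j in actions:
        graph[(i, j)] += weights[kind]
    return dict(graph)

print(combine(extract_actions(raw_messages)))
# {('X', 'Y'): 4.0, ('Y', 'X'): 2.0}
```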

Each of these pre-processing steps operates on one set of data (originating in the source Farcaster graph) and produces another set of data. The output from one step is used as the input to a subsequent step, with the final data set becoming the input for OpenRank compute. A simple way to verify that the reputation graph data has not been tampered with during the pre-processing phase is for anyone to run the same transformation and linear combination operations on the public dataset and compare the outputs. The provenance record of the input data for compute is signed by the data provider when they request compute from OpenRank.
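One way to make that comparison cheap is to compare content digests of the outputs rather than the full datasets. A sketch, assuming a canonical JSON serialization of [i,j,v] triples (the serialization choice is an assumption):

```python
import hashlib
import json

def dataset_digest(edges):
    """Hash a canonical serialization of an [i, j, v] dataset."""
    canonical = json.dumps(sorted(edges), separators=(",", ":")).encode()
    return hashlib.sha256(canonical).hexdigest()

# Provider side: publish the digest alongside the reputation graph.
provider_output = [["X", "Y", 4.0], ["Y", "X", 2.0]]
published_digest = dataset_digest(provider_output)

# Verifier side: re-run the same public transformations on the same public
# source data, then compare digests; a mismatch indicates tampering.
recomputed_output = [["X", "Y", 4.0], ["Y", "X", 2.0]]
assert dataset_digest(recomputed_output) == published_digest
```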

In the future, a trustless setup for data pre-processing will be made possible by submitting a proof of transformation along with the output dataset as a part of the provenance record.

With a data verifiability system, the end-user who consumes OpenRank compute can check the data provenance of the input data itself. The verifier first backtracks through the pre-processing graph to all terminal input datasets used by the graph; then, for each interim dataset it encounters, it checks the signature or proof submitted by the data provider.
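A sketch of that walk, assuming each provenance record lists its parent datasets and carries a procedure for verifying its signature or proof (the record structure is illustrative):

```python
def verify_provenance(dataset_id, records):
    """Recursively verify a dataset's provenance back to its terminal inputs.

    `records` maps dataset id -> {"parents": [...], "check": callable}, where
    `check` verifies that record's signature or correctness proof.
    """
    record = records[dataset_id]
    if not record["check"]():
        raise ValueError(f"provenance check failed for {dataset_id}")
    for parent in record["parents"]:  # backtrack toward terminal inputs
        verify_provenance(parent, records)

# Illustrative provenance graph: source -> actions -> reputation graph.
records = {
    "farcaster_source":   {"parents": [], "check": lambda: True},
    "engagement_actions": {"parents": ["farcaster_source"], "check": lambda: True},
    "reputation_graph":   {"parents": ["engagement_actions"], "check": lambda: True},
}
verify_provenance("reputation_graph", records)  # raises on any failed check
```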

To summarize, while submitting the reputation graph to the compute node, a data provider is responsible for:

  • Verification of all input data it has used. This may be signature verification or correctness proof verification.

  • Correct execution of the core logic inside the pre-processing stage.

Incremental Processing

In a steady state, most datasets will continue to get incremental updates. A data node may maintain a pipeline to incorporate incremental updates.

Using such update or maintenance actions, a data provider will produce a new version of the output dataset (reputation graph) for the next epoch of compute. This fresh input data set will be marginally different from the previous version. Instead of publishing a full snapshot of the new version, the data provider may opt to publish the output as deltas (patches), which anyone holding the previous version of the data can apply to mutate it into the subsequent version.

Even in this case, the data provider should still periodically emit full snapshots. Using a base snapshot and all the deltas published after the snapshot, anyone can reconstruct the up-to-date version of the data. Multiple checkpointing strategies exist for these full/diff data publication methods. OpenRank does not mandate any given strategy. Instead, data providers may clarify, given the desired point in time, how to recreate the dataset as of that point in time by combining a base snapshot with a series of deltas, as well as how to obtain such snapshots and deltas.
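A sketch of that reconstruction, assuming deltas are published as upserts and removals of weighted edges (the delta format is illustrative):

```python
def apply_delta(graph, delta):
    """Mutate a {(i, j): v} graph with one published delta."""
    for (i, j), v in delta.get("upserts", {}).items():
        graph[(i, j)] = v        # new or updated edge weight
    for (i, j) in delta.get("removals", []):
        graph.pop((i, j), None)  # edge dropped in this epoch
    return graph

# Base snapshot published at some checkpoint, followed by per-epoch deltas.
snapshot = {("X", "Y"): 4.0, ("Y", "X"): 2.0}
deltas = [
    {"upserts": {("X", "Z"): 1.0}},
    {"upserts": {("X", "Y"): 5.0}, "removals": [("Y", "X")]},
]

graph = dict(snapshot)
for delta in deltas:  # apply in publication order
    graph = apply_delta(graph, delta)
print(graph)  # {('X', 'Y'): 5.0, ('X', 'Z'): 1.0}
```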

Current Integrations and Roadmap

For the current integrations (see details in this section), Karma3 Labs is sourcing and pre-processing data with the help of 3rd party data providers.

OpenRank will offer an opportunity for data infrastructure providers to participate in the protocol and start offering indexed onchain or open public datasets for ranking and reputation use cases. Developers who require ranking and reputation compute will simply rely on data and compute infrastructure powered by the OpenRank protocol.

In the future, a custom VM will be enabled to handle data verification, provenance, and cheap transformation operations for pre-processing large data sets ahead of the core compute.
