Verifiable data

Introduction

The ability to prove the integrity and authenticity of shared data is a key component to establishing trust online. Given that we produce so much data and are constantly sharing and moving that data around, it is a complex task to identify a solution that will work for the vast majority of internet users across a variety of different contexts.

The fundamental problem to address is how to establish authority on a piece of data, and how to enable mechanisms to trust those authorities in a broad set of contexts. Solving this problem on a basic level allows entities to have greater trust in the data they’re sharing, and for relying parties to understand the integrity and authenticity of the data being shared.

We use the overarching term verifiable data to refer to this problem domain. Verifiable data can be further expanded into three key pillars:

  1. Verifiable data

  2. Verifiable relationships

  3. Verifiable processes

Verifiable data

This refers to the authenticity and integrity of the actual data elements being shared.

https://www.datocms-assets.com/38428/1617659489-verifiable-data.png?auto=format

Verifiable relationships

This refers to the ability to audit and understand the connections between various entities as well as how each of these entities are represented in data.

https://www.datocms-assets.com/38428/1620707997-verifiable-relationships.svg

Verifiable processes

This describe the ability to verify any digital process such as onboarding a user or managing a bank account (particularly with respect to how data enables the process to be managed and maintained).

https://www.datocms-assets.com/38428/1620708178-verifiable-processes.svg

These closely-related, interdependent concepts rely on verifiable data technology becoming a reality.

Verifiable credentials

The basic data model of W3C verifiable credentials may be familiar to developers and architects that are used to working with attribute-based credentials and data technologies. The issuer, or the authority on some information about a subject (e.g. a person), issues a credential containing this information in the form of claims to a holder. The holder is responsible for storing and managing that credential, and in most instances uses a piece of software that acts on their behalf, such as a digital wallet. When a verifier (sometimes referred to as a relying party) needs to validate some information, they can request from the holder some data to meet their verification requirements. The holder unilaterally determines if they wish to act upon the request and is free to present the claims contained in their verifiable credentials using any number of techniques to preserve their privacy.

Verifiable credentials form the foundation for verifiable data in the emerging web of trust. They can be thought of as a container for many different types of information as well as different types of credentials. Because it is an open standard at the W3C, verifiable credentials are able to be widely implemented by many different software providers, institutions, governments, and businesses. Due to the wide applicability of these standards, similar content integrity protection and guarantees are provided regardless of the implementation.

Semantics and schemas

The authenticity and integrity-providing mechanisms presented by verifiable credentials provide additional benefits beyond the evaluation of verifiable data. They also provide a number of extensibility mechanisms that allow data to be linked to other kinds of data in order to be more easily understood in the context of relationships and processes.

One concrete example of this is the application of data schemas or data vocabularies. Schemas are a set of types and properties that are used to describe data. In the context of data sharing, schemas are an incredibly useful and necessary tool in order to represent data accurately from the point of creation to sharing and verification. In essence, data schemas in the Verifiable Credential ecosystem are only useful if they are strongly reused by many different parties. If each implementer of Verifiable Credentials chooses to describe and represent data in a slightly different way, it creates incoherence and inconsistency in data and threatens to diminish the potential of ubiquitous adoption of open standards and schemas.

Verifiable credentials make use of JSON-LD to extend the data model to support dynamic data vocabularies and schemas. This allows us to not only use existing JSON-LD schemas, but to utilize the mechanism defined by JSON-LD to create and share new schemas as well. To a large extent this is what JSON-LD was designed for; the adoption and reuse of common data vocabularies.

This type of verifiable credential is best characterised as a kind of Linked Data Proof. It allows issuers to make statements that can be shared without loss of trust because their authorship can be verified by a third party. Linked Data Proofs define the capability for verifying the authenticity and integrity of Linked Data documents with mathematical proofs and asymmetric cryptography. It provides a simple security protocol which is native to JSON-LD. Due to the nature of linked data, they are built to compactly represent proof chains and allow a Verifiable Credential to be easily protected on a more granular basis; on a per-attribute basis rather than a per-credential basis.

This mechanism becomes particularly useful when evaluating a chain of trusted credentials belonging to organisations and individuals. A proof chain is used when the same data needs to be signed by multiple entities and the order in which the proofs were generated matters. For example, such as in the case of a notary counter-signing a proof that had been created on a document. Where order needs to be preserved, a proof chain is represented by including an ordered list of proofs with a “proof chain” key in a Verifiable Credential. This kind of embedded proof can be used to establish the integrity of verifiable data chains.

Overall, the ability for data to be shared across contexts whilst retaining its integrity and semantics is a critical building block of the emerging web of trust.