Web semantics

One of the oldest challenges in building digital infrastructure has been to consistently attach meaning and context to data. The semantic web is a set of technologies whose goal is to make all data on the web machine-readable. Its use allows for a shared understanding of data that enables a variety of real-world applications and use cases.

Semantic web technologies

The vision for the semantic web has been remarkably consistent throughout its evolution, although the specifics of how to accomplish it, and at what layer, have developed over the years. W3C’s semantic web stack offers an overview of these foundational technologies and the function of each component in the stack.

The ultimate goal is to enable computers to do more useful work and to develop systems that can support trusted interactions over the network. The shared architecture defined by the W3C supports the internet’s ability to become a global database based on linked data.

Semantic Web technologies enable creating data stores, building vocabularies, and writing rules for handling data. Linked data is empowered by the following technologies:

Linked data

Linked data is the theory behind much of the semantic web effort. It describes a general mechanism for publishing structured data on the internet using vocabularies like schema.org that can be connected together and interpreted by machines.

Using linked data, statements encoded in triples (subject → predicate → object) can be spread across different websites in a standard way. These statements form the substrate of knowledge that spans the entire internet.

The reality is that the bulk of useful information on the internet today is unstructured data, or data that is not organised in a way that makes it useful to anyone beyond the creators of that data. This is fine for cases where data remains in a single context throughout its lifecycle, but it becomes problematic when trying to share data across contexts while retaining its semantic meaning. The vision for linked data is for the internet to become a kind of global database where all data can be represented and understood in a similar way.

Figure: an example triple, DID 1 → issues a credential → DID 2.
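To make the idea of triples spread across websites concrete, here is a minimal sketch in Python. It assumes two hypothetical sites that each publish a few statements; the `did:example:` identifiers and the `https://example.org/vocab#` predicates are placeholders invented for illustration, while `https://schema.org/name` is an existing schema.org property. It only shows how statements from separate sources can be merged and followed as links, not how any particular linked data library works.

```python
from collections import namedtuple

# A statement in subject -> predicate -> object form.
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

# Statements published by two hypothetical, independent websites.
site_a = [
    Triple("did:example:issuer", "https://schema.org/name", "Example University"),
    Triple("did:example:issuer", "https://example.org/vocab#issued", "urn:example:credential:1"),
]
site_b = [
    Triple("urn:example:credential:1", "https://example.org/vocab#about", "did:example:alice"),
    Triple("did:example:alice", "https://schema.org/name", "Alice"),
]


def merge(*sources):
    """Combine triples from several sources into one de-duplicated graph."""
    graph = set()
    for source in sources:
        graph.update(source)
    return graph


def objects(graph, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return sorted(
        t.object for t in graph if t.subject == subject and t.predicate == predicate
    )


graph = merge(site_a, site_b)

# Follow links across both sites: issuer -> credential -> subject -> name.
credential = objects(graph, "did:example:issuer", "https://example.org/vocab#issued")[0]
holder = objects(graph, credential, "https://example.org/vocab#about")[0]
print(objects(graph, holder, "https://schema.org/name"))  # ['Alice']
```

Because both sites use the same identifiers for the same things, the merged graph can be traversed as if it were a single database.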

One of the biggest challenges to realising the vision of the internet as a global database is establishing a common set of underlying semantics that all of this data can share. A proliferation of data becomes much less useful if the data is redundant, unorganised, or otherwise messy and complicated. Ultimately, we need to double down on the use of common data vocabularies and common data schemas.

Common data schemas combined with the security features of verifiable data will make fraud more difficult and make it easier to transmit and consume data so that trust-based decisions can be made. Moreover, the proliferation of common data vocabularies will help make data portability a reality, allowing data to be moved across contexts while retaining the semantics of its original context.

Semantics and schemas

By enriching data with additional context and meaning, more people (and machines) can understand and use that data to greater effect. One concrete example of this is the application of data schemas or data vocabularies. Schemas are a set of types and properties that are used to describe data. They are an incredibly useful and necessary tool for representing data accurately, but they are only useful if they are widely reused by many different parties. When each implementer describes and represents data in a slightly different way, it creates incoherence and inconsistency in data and threatens to diminish the potential of ubiquitous adoption of open standards and schemas.
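The cost of each implementer describing data slightly differently can be sketched as follows. The two records and their field names are hypothetical; `https://schema.org/name` and `https://schema.org/email` are existing schema.org properties used here as the shared vocabulary. This is an illustration of the principle only, assuming a simple one-to-one mapping between local field names and shared properties.

```python
# Two hypothetical implementers describing the same person differently.
record_a = {"fullName": "Alice Example", "mail": "alice@example.com"}
record_b = {"name": "Alice Example", "email_address": "alice@example.com"}

# Each implementer maps its local field names onto shared schema.org properties.
MAPPING_A = {
    "fullName": "https://schema.org/name",
    "mail": "https://schema.org/email",
}
MAPPING_B = {
    "name": "https://schema.org/name",
    "email_address": "https://schema.org/email",
}


def to_shared_vocabulary(record, mapping):
    """Re-key a record so every field is named by a shared property IRI."""
    return {mapping[key]: value for key, value in record.items()}


normalised_a = to_shared_vocabulary(record_a, MAPPING_A)
normalised_b = to_shared_vocabulary(record_b, MAPPING_B)

print(record_a == record_b)          # False: the raw records do not line up
print(normalised_a == normalised_b)  # True: the shared vocabulary does
```

Without the shared properties, a consumer would need a custom mapping for every producer it encounters.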

The VC data model defines two concrete data syntaxes: JSON and JSON-LD (Linked Data). We have chosen to use JSON-LD in our implementation, as it allows the VC data model to be extensible and interoperable while remaining distributed in its architecture. To learn more about our approach, read our blog JWT vs Linked Data Proofs: comparing Verifiable Credentials.

Verifiable credentials make use of JSON-LD to extend the data model to support dynamic data vocabularies and schemas. This allows us to not only use existing JSON-LD schemas, but to utilise the mechanism defined by JSON-LD to create and share new schemas.

This type of verifiable credential is best characterised as a kind of linked data proof. It allows issuers to make statements that can be shared without loss of trust because their authorship can be verified by a third party. Linked data proofs define the capability for verifying the authenticity and integrity of digital documents with mathematical proofs and asymmetric cryptography. They provide a simple security protocol which is native to JSON-LD. Due to the nature of linked data, they are built to compactly represent proof chains and allow a verifiable credential to be easily protected on a more granular basis: per attribute rather than per credential.
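As a rough illustration of why integrity can be checked per attribute, the sketch below computes SHA-256 digests over a deterministic serialisation of a document. It is a stand-in only: it uses sorted-key JSON rather than the RDF canonicalisation that linked data proofs rely on, and it stops at the digest rather than producing an asymmetric signature, because the Python standard library has no signing primitives. The function names and sample claims are hypothetical.

```python
import hashlib
import json


def digest(document):
    """Hash a deterministic serialisation so key order does not matter."""
    canonical = json.dumps(document, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def attribute_digests(document):
    """Digest each attribute separately so one claim can be checked alone."""
    return {key: digest({key: value}) for key, value in document.items()}


claims = {"name": "Alice", "degree": "BSc"}
reordered = {"degree": "BSc", "name": "Alice"}
tampered = {"name": "Alice", "degree": "PhD"}

# The same statements in a different order produce the same digest.
print(digest(claims) == digest(reordered))  # True

# Any change to the content produces a different digest.
print(digest(claims) == digest(tampered))   # False

# Per-attribute digests pinpoint which claim was altered.
original = attribute_digests(claims)
changed = attribute_digests(tampered)
print([key for key in claims if original[key] != changed[key]])  # ['degree']
```

In a real proof, the issuer would sign such a digest with a private key and a verifier would check the signature with the corresponding public key.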

JSON-LD

JSON-LD is a serialisation format that extends JSON to support linked data, enabling the sharing and discovery of data in web-based environments. Its purpose is to be isomorphic to RDF, which has broad usability across the web and supports additional technologies for querying and language classification. RDF has been used to manage industry ontologies for the last couple of decades, so creating a representation in JSON is incredibly useful in certain applications such as those found in the context of verifiable credentials (VCs).
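The relationship between JSON-LD and RDF-style triples can be sketched with a small example. The document below uses the standard JSON-LD keywords `@context` and `@id`, and maps two short terms to existing schema.org properties; the subject identifier is a placeholder. The `expand_terms` and `to_triples` helpers are hypothetical and handle only a flat document with an in-line term map, so they are a simplification and not the full JSON-LD expansion algorithm.

```python
document = {
    "@context": {
        "name": "https://schema.org/name",
        "jobTitle": "https://schema.org/jobTitle",
    },
    "@id": "did:example:alice",
    "name": "Alice",
    "jobTitle": "Engineer",
}


def expand_terms(doc):
    """Replace short terms with the full IRIs given in the document's @context."""
    context = doc.get("@context", {})
    expanded = {}
    for key, value in doc.items():
        if key == "@context":
            continue
        # Keywords such as @id pass through; mapped terms become full IRIs.
        expanded[context.get(key, key)] = value
    return expanded


def to_triples(doc):
    """Turn a flat JSON-LD document into (subject, predicate, object) tuples."""
    expanded = expand_terms(doc)
    subject = expanded["@id"]
    return [(subject, p, o) for p, o in expanded.items() if p != "@id"]


for triple in to_triples(document):
    print(triple)
```

The short keys stay convenient for JSON developers, while the context supplies the globally unique meaning that linked data requires.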

The Linked Data Proofs representation of verifiable credentials makes use of a simple security protocol which is native to JSON-LD. The primary benefit of the JSON-LD format used by LD-Proofs is that it builds on a common set of semantics that allows for broader ecosystem interoperability of issued credentials. It provides a standard vocabulary that makes data in a credential more portable as well as easy to consume and understand across different contexts. In order to create a crawlable web of verifiable data, it’s important that we prioritise strong reuse of data schemas as a key driver of interoperability efforts. Without it, we risk building a system where many different data schemas are used to represent exactly the same information, creating the kinds of data silos that we see on the majority of the internet today. JSON-LD makes semantics a first-class principle and is therefore a solid basis for constructing VC implementations.

JSON-LD is also widely adopted on the web today, with W3C reporting it is used by 30% of the web and Google making it the de facto technology for search engine optimisation. When it comes to Verifiable Credentials, it's advantageous to extend and integrate the work around VCs with the existing burgeoning ecosystem of linked data.

There are several available guides online that describe JSON-LD in detail. We’re going to focus on two required properties of JSON-LD that consistently show up in the VC data model:

Referenced contexts must be whitelisted for credential issuance to succeed. In-line context definitions result in an error. If you need to introduce a different context for your production implementation, please contact us.

  • type: Expresses what kind of information is in the document: is it a verifiable credential? Is it a presentation? Is it an object containing credentials, or presentations? For convenience and semantic interoperability, type is often specified as a set of terms that are defined in the JSON-LD @context.

In addition to JSON-LD properties that are defined in the credential, Web Credentials created on MATTR VII include the DID and in some cases the domain name of the issuer, as well as some kind of identifier for the subject (typically also a DID).
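The properties described above can be seen together in a small sketch. The credential below follows the general shape of the VC data model, whose base context is `https://www.w3.org/2018/credentials/v1` and whose base type is `VerifiableCredential`. Everything else is an assumption made for illustration: the `ALLOWED_CONTEXTS` set, the `check_credential` helper, the `AlumniCredential` type, and the `did:example:` identifiers are hypothetical, and the checks do not represent the validation that any particular platform performs.

```python
VC_BASE_CONTEXT = "https://www.w3.org/2018/credentials/v1"

# Hypothetical allowlist of context URLs accepted at issuance.
ALLOWED_CONTEXTS = {VC_BASE_CONTEXT, "https://schema.org"}

credential = {
    "@context": [VC_BASE_CONTEXT, "https://schema.org"],
    "type": ["VerifiableCredential", "AlumniCredential"],
    "issuer": "did:example:issuer",  # DID of the issuer
    "credentialSubject": {
        "id": "did:example:alice",   # identifier for the subject
        "alumniOf": "Example University",
    },
}


def as_list(value):
    """JSON-LD allows a single value or a list; normalise to a list."""
    return value if isinstance(value, list) else [value]


def check_credential(cred):
    """Return a list of problems found in the @context and type properties."""
    errors = []
    for entry in as_list(cred.get("@context", [])):
        if not isinstance(entry, str):
            errors.append("in-line context definitions are not accepted")
        elif entry not in ALLOWED_CONTEXTS:
            errors.append(f"context not on allowlist: {entry}")
    if "VerifiableCredential" not in as_list(cred.get("type", [])):
        errors.append("type must include VerifiableCredential")
    return errors


print(check_credential(credential))  # []

# An in-line context definition is reported as an error.
inline = dict(credential)
inline["@context"] = [VC_BASE_CONTEXT, {"alumniOf": "https://schema.org/alumniOf"}]
print(check_credential(inline))
```

Restricting contexts to known URLs keeps every term in a credential tied to a published, shared definition.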