Web Semantics

One of the oldest challenges in building digital infrastructure has been consistently establishing the meaning and context of data. The semantic web is a set of technologies whose goal is to make all data on the web machine-readable. Using these technologies creates a shared understanding of data that enables a variety of real-world applications and use cases.

Semantic web technologies

The vision for the semantic web has been remarkably consistent throughout its evolution, although the specifics of how to accomplish it, and at what layer, have developed over the years. The W3C’s semantic web stack offers an overview of these foundational technologies and the function of each component in the stack.

The ultimate goal is to enable computers to do more useful work and to develop systems that can support trusted interactions over the network. The shared architecture defined by the W3C supports the internet’s ability to become a global database based on linked data. Semantic Web technologies enable creating data stores, building vocabularies, and writing rules for handling data. Linked data is empowered by the following technologies:

  • Resource Description Framework (RDF) provides the foundation for publishing and linking data. It’s a standard data model for representing information resources on the internet and describing the relationships between data and other pieces of information in a graph format.

  • Web Ontology Language (OWL) is a language used to build data vocabularies, or “ontologies”, that represent rich knowledge or logic.

  • Simple Knowledge Organization System (SKOS) is a standard way to represent knowledge organisation systems such as classification systems in RDF.

  • SPARQL Protocol and RDF Query Language (SPARQL) is the query language for the Semantic Web; it can retrieve and manipulate data stored in an RDF graph. Query languages go hand-in-hand with databases: if the Semantic Web is viewed as a global database, it is easy to understand why one would need a query language for that data. The sketch after this list shows a small RDF graph being built and queried with SPARQL.
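
To make the RDF and SPARQL bullets concrete, here is a minimal sketch in Python using the rdflib library (an assumption; any RDF toolkit would work similarly). It builds a small graph of triples and runs a SPARQL query over it; the URIs are hypothetical.

    from rdflib import Graph, Literal, URIRef
    from rdflib.namespace import FOAF, RDF

    g = Graph()

    # Each RDF statement is a (subject, predicate, object) triple.
    alice = URIRef("https://example.org/people/alice")  # hypothetical URI
    g.add((alice, RDF.type, FOAF.Person))
    g.add((alice, FOAF.name, Literal("Alice")))
    g.add((alice, FOAF.knows, URIRef("https://example.org/people/bob")))

    # SPARQL retrieves data by matching triple patterns against the graph.
    results = g.query(
        """
        SELECT ?name WHERE {
            ?person a foaf:Person ;
                    foaf:name ?name .
        }
        """,
        initNs={"foaf": FOAF},
    )

    for row in results:
        print(row.name)  # -> Alice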

Linked data

Linked data is the theory behind much of the semantic web effort. It describes a general mechanism for publishing structured data on the internet using vocabularies like schema.org, so that data can be connected together and interpreted by machines.

Using linked data, statements encoded as triples (subject → predicate → object) can be spread across different websites in a standard way; together, these statements form a substrate of knowledge that spans the entire internet. The reality, however, is that the bulk of useful information on the internet today is unstructured data: data that is not organised in a way that makes it useful to anyone beyond its creators. This is fine where data remains in a single context throughout its lifecycle, but it becomes problematic when trying to share data across contexts while retaining its semantic meaning. The vision for linked data is for the internet to become a kind of global database where all data can be represented and understood in a similar way.
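
As an illustration of triples spread across websites, the following sketch (again using rdflib, with hypothetical URLs) merges statements published by two different sources. Because both refer to the same subject URI, a consumer can combine them into a single graph:

    from rdflib import Graph

    # Two fragments of linked data, as they might be published
    # on two different sites; both describe the same subject URI.
    site_a = """
    @prefix schema: <https://schema.org/> .
    <https://example.org/people/alice> schema:name "Alice" .
    """

    site_b = """
    @prefix schema: <https://schema.org/> .
    <https://example.org/people/alice> schema:worksFor <https://example.com/acme> .
    """

    g = Graph()
    g.parse(data=site_a, format="turtle")
    g.parse(data=site_b, format="turtle")

    # The merged graph now holds knowledge neither source had on its own.
    print(len(g))  # -> 2 triples about the same subject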

[Figure: linked data focus — https://www.datocms-assets.com/38428/1620707068-linked-data-focus.svg]

One of the biggest challenges to realising the vision of the internet as a global database is establishing a common set of underlying semantics that all of this data can share. A proliferation of data becomes much less useful if the data is redundant, unorganised, or otherwise messy and complicated. Ultimately, we need to double down on the use of common data vocabularies and common data schemas.

Common data schemas, combined with the security features of verifiable data, will make fraud more difficult and make data easier to transmit and consume, so that trust-based decisions can be made. Moreover, the proliferation of common data vocabularies will help make data portability a reality, allowing data to be moved across contexts while retaining the semantics of its original context.

[Figure: linked data expand — https://www.datocms-assets.com/38428/1620707105-linked-data-expand.svg]

Semantics and schemas

By enriching data with additional context and meaning, more people (and machines) can understand and use that data to greater effect. One concrete example of this is the application of data schemas, or data vocabularies. Schemas are a set of types and properties used to describe data. They are an incredibly useful and necessary tool for representing data accurately, but they are only useful if they are broadly reused by many different parties. When each implementer describes and represents data in a slightly different way, it creates incoherence and inconsistency in data and undermines the potential for ubiquitous adoption of open standards and schemas.

The VC data model defines two concrete data syntaxes, JSON and JSON-LD (Linked Data). We have chosen to use JSON-LD in our implementation, as it allows the VC data model to be extensible and interoperable while remaining distributed in its architecture. To learn more about our approach, read our blog post, JWT vs Linked Data Proofs: comparing Verifiable Credentials.

Verifiable credentials make use of JSON-LD to extend the data model to support dynamic data vocabularies and schemas. This allows us not only to use existing JSON-LD schemas, but also to utilise the mechanism defined by JSON-LD to create and share new schemas. A sketch of a credential in this form follows below.
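
For illustration, here is a minimal verifiable credential expressed as JSON-LD (shown as a Python dict to match the earlier sketches). The second context URL, the credential type, and the DIDs are hypothetical; only the base VC context is part of the standard:

    # A minimal VC sketch; field values below are placeholders.
    credential = {
        "@context": [
            "https://www.w3.org/2018/credentials/v1",  # base VC context, always first
            "https://example.org/contexts/degree/v1",  # hypothetical extension vocabulary
        ],
        "type": ["VerifiableCredential", "UniversityDegreeCredential"],
        "issuer": "did:example:issuer123",
        "issuanceDate": "2021-05-11T10:00:00Z",
        "credentialSubject": {
            "id": "did:example:subject456",
            "degree": "Bachelor of Science",
        },
    }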

This type of verifiable credential is best characterised as a kind of Linked Data Proof. It allows issuers to make statements that can be shared without loss of trust, because their authorship can be verified by a third party. Linked Data Proofs define the capability to verify the authenticity and integrity of digital documents using mathematical proofs and asymmetric cryptography, and they provide a simple security protocol which is native to JSON-LD. Due to the nature of linked data, they are built to compactly represent proof chains and allow a verifiable credential to be protected on a more granular, per-attribute basis rather than a per-credential basis.
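
A Linked Data Proof is attached to the credential under a proof property. The following sketch shows the general shape of such a block; the suite name Ed25519Signature2018 is one commonly used option, and all values are placeholders rather than a real signature:

    # General shape of a Linked Data Proof block; values are placeholders.
    proof = {
        "type": "Ed25519Signature2018",
        "created": "2021-05-11T10:00:00Z",
        "verificationMethod": "did:example:issuer123#key-1",  # issuer's public key
        "proofPurpose": "assertionMethod",
        "jws": "eyJhbGciOiJFZERTQSJ9..placeholder",  # detached signature value
    }

    # Attached to the credential from the previous sketch:
    credential["proof"] = proof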

JSON-LD

JSON-LD is a serialisation format that extends JSON to support linked data, enabling the sharing and discovery of data in web-based environments. It is designed to be isomorphic to RDF, which has broad usability across the web and supports additional technologies for querying and language classification. RDF has been used to manage industry ontologies for the last couple of decades, so creating a representation in JSON is incredibly useful in certain applications, such as those found in the context of verifiable credentials (VCs).
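
This isomorphism can be demonstrated with a JSON-LD processor; the sketch below uses the pyld library (an assumption; any conformant processor would do) to expand a document and serialise the same statement as an RDF triple. The URIs are hypothetical:

    from pyld import jsonld

    doc = {
        "@context": {"name": "https://schema.org/name"},
        "@id": "https://example.org/people/alice",
        "name": "Alice",
    }

    # Expansion replaces context-dependent terms with full URIs...
    print(jsonld.expand(doc))

    # ...and the same document can be serialised as an RDF triple:
    print(jsonld.to_rdf(doc, {"format": "application/n-quads"}))
    # <https://example.org/people/alice> <https://schema.org/name> "Alice" .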

The Linked Data Proofs representation of verifiable credentials makes use of a simple security protocol which is native to JSON-LD. The primary benefit of the JSON-LD format used by LD-Proofs is that it builds on a common set of semantics that allows for broader ecosystem interoperability of issued credentials. It provides a standard vocabulary that makes the data in a credential more portable, as well as easy to consume and understand across different contexts. In order to create a crawl-able web of verifiable data, it’s important that we prioritise strong reuse of data schemas as a key driver of interoperability efforts. Without it, we risk building a system where many different data schemas are used to represent exactly the same information, creating the kinds of data silos that we see across the majority of the internet today. JSON-LD makes semantics a first-class principle and is therefore a solid basis for constructing VC implementations.

JSON-LD is also widely adopted on the web today, with W3Techs reporting that it is used by roughly 30% of websites and Google making it the de facto technology for search engine optimisation. When it comes to verifiable credentials, it's advantageous to extend and integrate the work around VCs with this existing, burgeoning ecosystem of linked data.

There are several guides available online that describe JSON-LD in detail. We’re going to focus on two required properties of JSON-LD that consistently show up in the VC data model:

  • @context: When two software systems exchange data, they need to use terminology that both systems understand; this is referred to as the context of their data exchange. For example, in a given context "lastName" may be an attribute that refers to a person’s surname, whereas in a different context it might be the most recent name chronologically assigned (see the sketch after this list).

    Contexts map terms to URIs that explain what those terms mean in that context. The @context element is expressed as an ordered set, and all VCs share the same common context, which always appears first in the set: https://www.w3.org/2018/credentials/v1. Contexts are often cached for the sake of performance.

  • type: Expresses what kind of information is in the document: is it a verifiable credential? Is it a presentation? Is it an object containing credentials, or presentations? For convenience and semantic interoperability, type is often specified as a set of terms that are defined in the JSON-LD @context.
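
As a sketch of how @context resolves the "lastName" ambiguity described above, the same JSON key can be expanded under two different contexts, yielding two different, unambiguous URIs. This again uses the pyld library, and the second context URI is hypothetical:

    from pyld import jsonld

    # "lastName" as a surname, mapped to a schema.org property:
    surname_doc = {
        "@context": {"lastName": "https://schema.org/familyName"},
        "lastName": "Ihaka",
    }
    print(jsonld.expand(surname_doc))
    # [{'https://schema.org/familyName': [{'@value': 'Ihaka'}]}]

    # The same key under a different (hypothetical) context:
    latest_doc = {
        "@context": {"lastName": "https://example.org/vocab#mostRecentName"},
        "lastName": "Ihaka",
    }
    print(jsonld.expand(latest_doc))
    # [{'https://example.org/vocab#mostRecentName': [{'@value': 'Ihaka'}]}]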

In addition to the JSON-LD properties defined in the credential, Web Credentials created on MATTR VII include the DID and, in some cases, the domain name of the issuer, as well as some kind of identifier for the subject (typically also a DID).