This document is experimental.

There is a reference implementation that is capable of demonstrating the features described in this document.

Introduction

CBOR is a compact binary data serialization and messaging format. This specification defines CBOR-LD 1.0, a CBOR-based format to serialize Linked Data. The encoding is designed to leverage the existing JSON-LD ecosystem, which is deployed on hundreds of millions of systems today, to provide a compact serialization format for those seeking efficient encoding schemes for Linked Data. By utilizing semantic compression schemes, compression ratios more than 60% better than those of generalized compression schemes are possible. This format is primarily intended to be a way to use Linked Data in storage- and bandwidth-constrained programming environments, to build interoperable semantic wire-level protocols, and to efficiently store Linked Data in CBOR-based storage engines.

How to Read this Document

This document is a detailed specification for a serialization of Linked Data in CBOR. The document is primarily intended for the following audiences:

Contributing

There are a number of ways that one may participate in the development of this specification:

Design Goals and Rationale

CBOR-LD satisfies the following design goals:

Simplicity
An encoder or decoder should be simple to implement given an existing JSON-LD implementation.
Efficient Storage
The encoding process should generate an aggressively compact Linked Data binary format.
Generalized Algorithm
The encoding algorithm must be generalized.
Semantic Compression
The encoding format should maximize compression of Linked Data URLs (terms and values). Focusing here ensures that the algorithms can achieve compression ratios better than generalized compression algorithms.
Raw Binary
Base-encoded binary values, and other compressible data types, should be translated to their raw binary forms from base-encoded formats when possible without sacrificing generality.

Similarly, the following are non-goals.

The following minefields have been identified while working on this specification:

Basic Concept

The general CBOR-LD encoding algorithm takes a JSON-LD Document and does the following:

CBOR Tags for CBOR-LD

The first step in decoding a CBOR-LD payload is to recreate the term codec map that was used to encode it by processing the contexts in the payload. However, the contexts needed to create the term codec map can have their URLs encoded as integers by CBOR-LD. If a CBOR-LD payload contains context URLs compressed in such a way, the consumer of the CBOR-LD needs to know what compression tables (maps from JSON-LD terms to integers) were used to compress the context URLs during creation to be able to reconstruct the term codec map. The following sections define the exact mechanism by which this can be accomplished, allowing an arbitrary CBOR-LD consumer to decompress any CBOR-LD payload that conforms to this specification.

To this end, we use the range of CBOR tags 1536-1791 (0x0600-0x06FF) for CBOR-LD. Data that includes the tag value is used to look up which compression table(s) are needed to decompress the CBOR-LD context URLs.

This exact range of tag values has not yet been officially registered with the IANA CBOR Tag Registry. The exact range is subject to change.

CBOR-LD Varint

Different CBOR-LD use cases can require different compression table material for consumption, and the number of such use cases is unbounded, while the number of CBOR tag values available is fixed. To allow unbounded extension within that fixed range, we define the following.

Implementers MUST interpret the last byte of the two-byte CBOR tag value on a CBOR-LD payload as the beginning of a varint. If the CBOR tag is in the range `0x0600`–`0x067F`, the last byte of the CBOR tag is a one-byte varint. If the CBOR tag is `0x0680` or greater, the first item in the CBOR payload MUST be a major type 2 byte string containing the rest of the varint. See the Get CBOR-LD Varint Structure Algorithm for more information.

The value of this varint is then used to look up a CBOR-LD Varint Registry Entry in the CBOR-LD Varint Registry.
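As an illustration of the rule above, the following minimal sketch recovers the registry entry value from a tag and any trailing varint bytes. The function name `decode_registry_entry_id` is hypothetical, and the varint layout (7 data bits per byte, least-significant group first, high bit set on every byte except the last) is an assumption, since this section does not spell out the varint encoding.

```python
def decode_registry_entry_id(tag: int, rest: bytes = b"") -> int:
    """Recover the registry entry value from a CBOR-LD tag.

    `tag` is the two-byte tag value (0x0600-0x06FF); `rest` holds the
    contents of the leading byte string, if any.
    Assumption: LEB128-style varint, least-significant group first.
    """
    if not 0x0600 <= tag <= 0x06FF:
        raise ValueError("tag is outside the CBOR-LD range")
    first = tag & 0xFF  # the last byte of the tag starts the varint
    if first < 0x80:
        # Tags 0x0600-0x067F carry a complete one-byte varint.
        if rest:
            raise ValueError("unexpected bytes after a one-byte varint")
        return first
    value, shift = 0, 0
    data = bytes([first]) + rest
    for index, byte in enumerate(data):
        value |= (byte & 0x7F) << shift
        shift += 7
        if byte & 0x80 == 0:  # high bit clear: final varint byte
            if index != len(data) - 1:
                raise ValueError("trailing bytes after varint")
            return value
    raise ValueError("truncated varint")
```

Under these assumptions, tag `0x0664` alone yields entry 100, while tag `0x0680` followed by the single byte `0x01` yields entry 128.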

CBOR-LD Varint Registry

The CBOR-LD Varint Registry is a global list that provides consumers of CBOR-LD payloads the information they need to reconstruct the term codec map required for decompression. A CBOR-LD Varint Registry Entry contains the following:

  1. Registry Entry Value: a non-negative integer.
  2. Use Case: what type of CBOR-LD payload this entry is used for.
  3. `typeTables`: an array containing what `Type Tables` are to be used for this type of payload.
  4. `processingModel`: what processing model is used for this registry entry. A processing model specifies how auto-generated CBOR-LD values are created from JSON-LD contexts as well as what type encoders are used alongside the `Type Tables` (e.g. how to partially compress an `xsd:dateTime` value that does not appear in `Type Table`). The default processing model, which will be defined later in this specification, will be used unless otherwise specified in the Registry Entry.

The `typeTables` associated with a CBOR-LD Varint Registry Entry MUST be an array of URLs or JSON objects. The only exception is the string "callerProvidedTable", which may appear in this array, denoting that for this use case, a `Type Table` is required which is not globally defined.

Dereferencing one of these URLs MUST result in a JSON object with the following properties:
  1. `type`: a JSON-LD type.
  2. `table`: a JSON object that maps values of the above type to integers.

If a JSON object is present in the `typeTables` array, it MUST be in the above format.
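The shape requirements above can be checked mechanically. The sketch below is one possible validator, not part of this specification: the function name `check_type_tables` is hypothetical, and treating any string with a URL scheme as a URL is a simplifying assumption. The sample data is taken from registry entry 100.

```python
from urllib.parse import urlparse

# typeTables array of registry entry 100, as listed in the registry.
ENTRY_100_TYPE_TABLES = [
    {"type": "context", "table": {
        "https://www.w3.org/ns/credentials/v2": 32768,
        "https://w3id.org/vc-barcodes/v1": 32769,
        "https://w3id.org/utopia/v2": 32770}},
    {"type": "https://w3id.org/security#cryptosuiteString", "table": {
        "ecdsa-rdfc-2019": 1, "ecdsa-sd-2023": 2,
        "eddsa-rdfc-2022": 3, "ecdsa-xi-2023": 4}},
]

def check_type_tables(type_tables):
    """Return a list of problems found in a `typeTables` array (empty if none)."""
    problems = []
    for index, item in enumerate(type_tables):
        if isinstance(item, str):
            # Assumption: a string counts as a URL if it has a scheme.
            if item != "callerProvidedTable" and not urlparse(item).scheme:
                problems.append(f"item {index}: not a URL or callerProvidedTable")
        elif isinstance(item, dict):
            if not isinstance(item.get("type"), str):
                problems.append(f"item {index}: `type` must be a string")
            table = item.get("table")
            values_ok = isinstance(table, dict) and all(
                isinstance(v, int) and not isinstance(v, bool)
                for v in table.values())
            if not values_ok:
                problems.append(f"item {index}: `table` must map values to integers")
        else:
            problems.append(f"item {index}: must be a URL or a JSON object")
    return problems
```

Calling `check_type_tables(ENTRY_100_TYPE_TABLES)` returns an empty list, since both objects carry a `type` string and an integer-valued `table`.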

Registry

The following is the current CBOR-LD registry:

| Registry Entry Id | Use Case | typeTables | Processing Model |
| --- | --- | --- | --- |
| 0 | Uncompressed CBOR-LD | None | DEFAULT |
| 1 | Compressed CBOR-LD, default use case. | DEFAULT | DEFAULT |
| 100 | Verifiable Credential Barcodes Specification Test Vectors | `[ { type: "context", table: { "https://www.w3.org/ns/credentials/v2": 32768, "https://w3id.org/vc-barcodes/v1": 32769, "https://w3id.org/utopia/v2": 32770 } }, { type: "https://w3id.org/security#cryptosuiteString", table: { "ecdsa-rdfc-2019": 1, "ecdsa-sd-2023": 2, "eddsa-rdfc-2022": 3, "ecdsa-xi-2023": 4 } } ]` | DEFAULT |

Algorithms

JSON-LD to CBOR-LD Algorithm

This algorithm takes a JSON-LD object `jsonldDocument`, an `options` object, and an integer `registryEntryId` as input.

  1. Let `result` be an empty CBOR-encoded byte array.
  2. Set {`varintTagValue`, `varintBytesValue`} to the return value of the "Get CBOR-LD Varint Structure Algorithm", passing `registryEntryId` as input.
  3. If the "Get CBOR-LD Varint Structure Algorithm" resulted in an error, set `result` to the return value of the "Uncompressed CBOR-LD Buffer Algorithm".
  4. Otherwise:
    1. Initialize `typeTables` to an empty map.
    2. For each entry in the `typeTables` array in the CBOR-LD Varint Registry Entry associated with `registryEntryId`, dereference the URL if necessary and add an entry to `typeTables` that maps `type` to `table` from the resulting document. If "callerProvidedTable" appears in that array, populate `typeTables` with `options.callerProvidedTable` as well.
    3. Set `result` to the return value of the "Compressed CBOR-LD Buffer Algorithm", passing `typeTables` as `options.typeTable`, `varintTagValue` as `options.varintTagValue`, and `varintBytesValue` as `options.varintBytesValue`.
  5. Return `result`.
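The type table collection in step 4 could look roughly like the following sketch. The names `resolve_type_tables` and `dereference` are hypothetical; `dereference` stands in for whatever URL loader an implementation uses. It is also an assumption that `options.callerProvidedTable` is itself a map from types to tables, which this algorithm does not state explicitly.

```python
def resolve_type_tables(entry_type_tables, options, dereference):
    """Build the type -> table map for one registry entry.

    entry_type_tables: the `typeTables` array of the registry entry.
    options: caller options; may carry "callerProvidedTable".
    dereference: hypothetical callable that fetches a URL and returns
        a {"type": ..., "table": ...} object.
    """
    type_tables = {}
    for item in entry_type_tables:
        if item == "callerProvidedTable":
            # Assumption: the caller supplies a map of type -> table.
            type_tables.update(options["callerProvidedTable"])
            continue
        # Strings are URLs to dereference; objects are used directly.
        document = dereference(item) if isinstance(item, str) else item
        type_tables[document["type"]] = document["table"]
    return type_tables
```

Passing the loader in as a parameter keeps the sketch free of network code and makes it straightforward to substitute a cache or a static document loader.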

Uncompressed CBOR-LD Buffer Algorithm

This algorithm takes a JSON-LD object `jsonldDocument`, integer `registryEntryId` and `options` as input.

  1. Let `result` be an empty CBOR-encoded byte array.
  2. Set the first three bytes of `result` to 0xD90600, the CBOR encoding of tag 0x0600 (CBOR-LD: 0x06, Uncompressed: 0x00).
  3. For every key-value pair in `jsonldDocument`, append to `result` by converting the pair to the associated CBOR-LD header and value. For complex values (maps, arrays), recursively convert the value to a form that losslessly encodes and decodes back to JSON-LD.
  4. Return `result`.
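To make the uncompressed path concrete, here is a minimal sketch that writes the encoded form of tag 0x0600 (the bytes 0xD9 0x06 0x00) followed by a plain CBOR encoding of the document. The function names are hypothetical, and the small hand-written encoder is an assumption made to keep the example self-contained; it covers only JSON value types, and a real implementation would normally rely on a full CBOR library.

```python
import struct

def _head(major: int, n: int) -> bytes:
    """Encode a CBOR initial byte plus argument for the given major type."""
    if n < 24:
        return bytes([(major << 5) | n])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if n < 1 << (8 * size):
            return bytes([(major << 5) | info]) + n.to_bytes(size, "big")
    raise ValueError("integer too large for this sketch")

def encode_cbor(value) -> bytes:
    """Encode a JSON-compatible Python value as CBOR."""
    if value is None:
        return b"\xf6"
    if value is True:
        return b"\xf5"
    if value is False:
        return b"\xf4"
    if isinstance(value, int):
        return _head(0, value) if value >= 0 else _head(1, -1 - value)
    if isinstance(value, float):
        return b"\xfb" + struct.pack(">d", value)  # 64-bit float
    if isinstance(value, str):
        data = value.encode("utf-8")
        return _head(3, len(data)) + data
    if isinstance(value, list):
        return _head(4, len(value)) + b"".join(encode_cbor(v) for v in value)
    if isinstance(value, dict):
        return _head(5, len(value)) + b"".join(
            encode_cbor(k) + encode_cbor(v) for k, v in value.items())
    raise TypeError(f"unsupported type: {type(value).__name__}")

def uncompressed_cborld(jsonld_document: dict) -> bytes:
    """Prefix the CBOR-encoded document with tag 0x0600 (0xD9 0x06 0x00)."""
    return bytes([0xD9, 0x06, 0x00]) + encode_cbor(jsonld_document)
```

For instance, `uncompressed_cborld({"a": 1}).hex()` gives `d90600a1616101`: the three tag bytes, a one-entry map header, the text string `a`, and the integer 1.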

Compressed CBOR-LD Buffer Algorithm

This algorithm takes a JSON-LD object `jsonldDocument` and `options` as input. The `options` MUST contain:

`typeTable`
A map of JSON-LD types to maps. Each of these maps will map values of the associated type to their encoded CBOR-LD values.
`varintTagValue`
The CBOR tag value that will be used in the resulting payload.
`varintBytesValue`
The bytes that will make up the rest of the CBOR-LD varint. `options.varintBytesValue` MAY be `null`.
  1. Let `result` be an empty CBOR-encoded byte array.
  2. Set the first three bytes of `result` to `options.varintTagValue`.
  3. If `options.varintBytesValue` is not `null`, set the next bytes of `result` to `options.varintBytesValue`.
  4. Initialize `termCodecMap` to the result of the Get Term Codec Map Algorithm, passing `options.contextUrls` and `options.termMap` as input.
  5. Add to `result` by recursively processing every name-value pair in `jsonldDocument`
    1. Let `termHint` be the value associated with the JSON name in the `termCodecMap`.
    2. Set the CBOR key to the `termHint.value` value.
    3. Set the CBOR value to the result of the `termHint.valueCompressor` function.
  6. Return `result`.
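The recursive processing in step 5 can be sketched as follows. This is an illustrative reading rather than the normative procedure: the function name `compress_node` is hypothetical, the result is an in-memory map meant to be handed to a CBOR encoder rather than a byte array, and leaving unknown terms unchanged is an assumption, since the steps above do not say how to treat names that are missing from `termCodecMap`.

```python
def compress_node(node, term_codec_map):
    """Replace JSON names with term ids and compress leaf values.

    term_codec_map maps a term to {"value": int, "valueCompressor": callable}.
    """
    if isinstance(node, list):
        return [compress_node(item, term_codec_map) for item in node]
    if not isinstance(node, dict):
        return node
    out = {}
    for name, value in node.items():
        term_hint = term_codec_map.get(name)
        if term_hint is None:
            # Assumption: names without a term hint pass through unchanged.
            out[name] = compress_node(value, term_codec_map)
        elif isinstance(value, (dict, list)):
            # Nested structures are processed recursively.
            out[term_hint["value"]] = compress_node(value, term_codec_map)
        else:
            out[term_hint["value"]] = term_hint["valueCompressor"](value)
    return out
```

With a map that assigns `cryptosuite` the value 0 and `proof` the value 1, the document `{"proof": {"cryptosuite": "ecdsa-rdfc-2019"}}` becomes `{1: {0: 1}}` when the `cryptosuite` compressor looks the string up in a type table that maps it to 1.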

Get Term Codec Map Algorithm

This algorithm takes a map `typeTable` and returns a CBOR-LD term codec map that maps JSON-LD terms to their associated byte values and value compression functions.

  1. Let `result` be an ordered map.
  2. For each value in `contextUrls`, dereference the JSON-LD contexts and process every entry.
    1. Set the entry key to the JSON-LD term key.
    2. Set the entry value to an unordered map with two entries.
      1. The first entry should be set to `value` with an undefined value.
      2. The second entry should be set to `valueCompressor`, whose value is a known compressor function associated with the `@type` property from `typeTable`, a type-specific generic compressor associated with the processing model in use, or the generic CBOR compressor function, which returns the bytes associated with a typical CBOR encoding of the given datatype.
  3. Let `sortedTerms` be the list obtained by sorting all of the keys in `result`.
  4. For every value in the list of `sortedTerms` set the associated `termHint.value` value to the associated index of `sortedTerms`.
  5. Return `result`.
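A rough sketch of these steps is shown below. It assumes the contexts have already been dereferenced into Python dictionaries, so no network access is involved, and it models only two compressor choices: a table lookup when the term's `@type` appears in `typeTable`, and a pass-through otherwise. The function name and both compressor behaviours are illustrative assumptions.

```python
def get_term_codec_map(contexts, type_table):
    """Map each term to an index and a value compressor.

    contexts: list of already-dereferenced JSON-LD context maps (assumption).
    type_table: map of JSON-LD type -> {value: integer}.
    """
    result = {}
    for context in contexts:
        for term, definition in context.items():
            term_type = definition.get("@type") if isinstance(definition, dict) else None
            table = type_table.get(term_type)
            if table is not None:
                # Known type: look the value up, falling back to the raw value.
                compressor = lambda v, t=table: t.get(v, v)
            else:
                # Stand-in for the generic CBOR compressor: pass through.
                compressor = lambda v: v
            result[term] = {"value": None, "valueCompressor": compressor}
    # Assign each term the index of its position in the sorted term list.
    for index, term in enumerate(sorted(result)):
        result[term]["value"] = index
    return result
```

Because the indexes come from a sort over all collected terms, an encoder and a decoder that process the same contexts derive the same term values without exchanging the map itself.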

Get Context URLs Algorithm

  1. Let `result` be an ordered map.
  2. Walk the JSON tree, for each JSON name-value pair:
    1. If the name is `@context`
      1. Add all values that are referenced by a URL to `result` where the key in the map is set to the JSON value associated with `@id`.
    2. If a non-URL value is detected, throw an ERR_NON_URL_JSONLD_CONTEXT_DETECTED error.
  3. Return `result`.
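The tree walk can be sketched as below. The function and exception names are hypothetical. Keying the ordered map by the context URL itself is an assumption, because the keying described in step 2.1.1 leaves room for interpretation. Python dictionaries preserve insertion order, so a plain `dict` serves as the ordered map.

```python
class NonUrlContextError(ValueError):
    """Raised for ERR_NON_URL_JSONLD_CONTEXT_DETECTED."""

def get_context_urls(node, result=None):
    """Collect every context URL referenced anywhere in a JSON-LD document."""
    if result is None:
        result = {}
    if isinstance(node, list):
        for item in node:
            get_context_urls(item, result)
    elif isinstance(node, dict):
        for name, value in node.items():
            if name == "@context":
                contexts = value if isinstance(value, list) else [value]
                for context in contexts:
                    if not isinstance(context, str):
                        # Inline (embedded) contexts are not URLs.
                        raise NonUrlContextError(
                            "ERR_NON_URL_JSONLD_CONTEXT_DETECTED")
                    # Assumption: key the map by the URL itself.
                    result.setdefault(context, context)
            else:
                get_context_urls(value, result)
    return result
```

Contexts are recorded in the order they are first encountered, and repeated URLs are stored once.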

Get CBOR-LD Varint Structure Algorithm

This algorithm takes as input an integer `registryEntryId`.
  1. If `registryEntryId` is less than 128:
    1. Set `varintEncoded` to the byte encoding of `registryEntryId`.
    2. Set `varintTagValue` to the result of appending `varintEncoded` to the end of the bytes 0xD906, and set `varintBytesValue` to `null`.
  2. Otherwise:
    1. Set `varintArray` to an array containing the varint representation of `registryEntryId`.
    2. Set `varintTagValue` to `varintArray[0]` appended to the end of the bytes 0xD906.
    3. Set `varintBytesValue` to a CBOR byte string (major type 2) containing the rest of `varintArray`.
  3. Return {varintTagValue, varintBytesValue}.
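The steps above can be sketched as follows. The function name is hypothetical, and the varint layout (7 data bits per byte, least-significant group first, high bit set on all bytes but the last) is an assumption. The sketch returns the remaining varint bytes unwrapped; how they are wrapped as a CBOR item is left to the caller.

```python
def get_varint_structure(registry_entry_id: int):
    """Return (varint_tag_value, varint_bytes_value) for a registry entry id.

    varint_tag_value is the three-byte encoded tag: 0xD9 0x06 plus the
    first varint byte. varint_bytes_value holds the remaining varint
    bytes, or None when the id fits in a single byte.
    Assumption: LEB128-style varint, least-significant group first.
    """
    if registry_entry_id < 0:
        raise ValueError("registry entry id must be non-negative")
    varint = []
    remaining = registry_entry_id
    while True:
        group = remaining & 0x7F
        remaining >>= 7
        if remaining:
            varint.append(group | 0x80)  # more bytes follow
        else:
            varint.append(group)         # final byte, high bit clear
            break
    tag_value = bytes([0xD9, 0x06, varint[0]])
    bytes_value = bytes(varint[1:]) or None
    return tag_value, bytes_value
```

Ids below 128 produce a single varint byte, so the tag alone carries the id; for example, id 100 gives the tag bytes `d90664` and no extra bytes, while id 128 gives `d90680` plus the byte `0x01`.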

Term Codec Registry

Note: This term codec registry is deprecated and has been replaced by the CBOR-LD Varint Registry.

The following is a registry of well-known term codecs. These will be registered on a first-come, first-served basis.

| Value | Context URL | Context Name |
| --- | --- | --- |
| 0x00 - 0x0F | RESERVED | Reserved for future use. |
| 0x10 | https://www.w3.org/ns/activitystreams | ActivityStreams 2.0 |
| 0x11 | https://www.w3.org/2018/credentials/v1 | Verifiable Credentials Data Model v1 |
| 0x12 | https://www.w3.org/ns/did/v1 | Decentralized Identifiers (DID) Core Spec v1 |
| 0x13 | https://w3id.org/security/suites/ed25519-2018/v1 | Ed25519Signature2018 Suite |
| 0x14 | https://w3id.org/security/suites/ed25519-2020/v1 | Ed25519Signature2020 Suite |
| 0x15 | https://w3id.org/cit/v1 | Concealed Id Token |
| 0x16 | https://w3id.org/age/v1 | Age Verification |
| 0x17 | https://w3id.org/security/suites/x25519-2020/v1 | X25519KeyAgreementKey2020 Suite |
| 0x18 | https://w3id.org/veres-one/v1 | Veres One DID Method |
| 0x19 | https://w3id.org/webkms/v1 | WebKMS (Key Management System) |
| 0x1A | https://w3id.org/zcap/v1 | Authorization Capabilities (zCap) |
| 0x1B | https://w3id.org/security/suites/hmac-2019/v1 | Sha256HmacKey2019 Crypto Suite |
| 0x1C | https://w3id.org/security/suites/aes-2019/v1 | AesKeyWrappingKey2019 Crypto Suite |
| 0x1D | https://w3id.org/vaccination/v1 | Vaccination Certificate Vocabulary v0.1 |
| 0x1E | https://w3id.org/vc-revocation-list-2020/v1 | Verifiable Credentials Revocation List 2020 |
| 0x1F | https://w3id.org/dcc/v1 | DCC (Decentralized Credentials Consortium) Core Context |
| 0x20 | https://w3id.org/vc/status-list/v1 | Verifiable Credentials Status List |
| 0x21 | https://www.w3.org/ns/credentials/v2 | Verifiable Credentials Data Model v2 |
| 0x22 - 0x2F | | Available for use. |
| 0x30 | https://w3id.org/security/data-integrity/v1 | Data Integrity v1.0 |
| 0x31 | https://w3id.org/security/multikey/v1 | Multikey v1.0 |
| 0x32 | | Reserved for future use. |
| 0x33 | https://w3id.org/security/data-integrity/v2 | Data Integrity v2.0 |
| 0x34 - 0x36 | RESERVED | Reserved for future use. |