Preservation of Guix Schema

Overview

The database consists of two primary tables, “fods” (fixed-output derivations) and “commits”. They have a many-to-many relationship managed by the table “fod_commit_links”. The table “fod_references” contains some more information about each fixed-output derivation (described below). Finally, the table “failed_fetches” is a log for when the “identify” procedure cannot fetch the source of a fixed-output derivation.

Graphical
          representation of table relationships.  The table “fods” has
          a many-to-many relationship to “commits”, and a one-to-many
          relationship to both “fod_references” and “failed_fetches”.
Figure 1: Database table relationships.

Tables in Detail

fods

The rows of this table represent fixed-output derivations from the Guix package graph.

fod_id

INTEGER PRIMARY KEY

A unique integer identifying the fixed-output derivation.

algorithm

TEXT NOT NULL

The algorithm used to compute the hash (e.g., “sha256”).

hash

TEXT NOT NULL

The hash of the source.

swhid

TEXT

The Software Heritage ID (SWHID) of the source.

is_in_swh

INTEGER (Boolean)

Whether or not the source is in the Software Heritage archive. If this is FALSE, then the source could not be found in the archive. If this is NULL, then it has not been checked.

is_in_disarchive

INTEGER (Boolean)

Reserved for future use. 🙂

derivation

TEXT NOT NULL

The serialized derivation à la write-derivation.

commits

The rows of this table represent commits in the Guix Git repository.

commit_id

INTEGER PRIMARY KEY

A unique integer identifying the commit.

hash

TEXT NOT NULL UNIQUE

The commit ID (hash) from Git.

push_time

INTEGER (Unix timestamp)

The time the commit was pushed. This comes from the Guix Data Service, which in turn gets it from the guix-commits mailing list.

fod_commit_links

A simple table linking fixed-output derivations to commits. Every record is unique (the primary key is both columns).

fod_id

INTEGER NOT NULL REFERENCES fods (fod_id)

A pointer to a fixed-output derivation.

commit_id

INTEGER NOT NULL REFERENCES commits (commit_id)

A pointer to a commit.

fod_references

A simple table attaching “references” to fixed-output derivations. Every record is unique (the primary key is both columns).

fod_id

INTEGER NOT NULL REFERENCES fods (fod_id)

A pointer to a fixed-output derivation.

reference

TEXT NOT NULL

A serialized form of the “uri” field of the origin record that corresponds to the fixed-output derivation. For a plain URL, this will be a serialized string: "https://...". For a Git reference, it will look like:

(git-reference
  (url "...")
  (commit "...")
  (recursive? #f))

It will be similar for other kinds of references (Subversion, CVS, etc.). Oftentimes the reference will have more information and be easier to use than the serialized derivation, which makes it extremely useful.

failed_fetched

This table records failed attempts at identifying (finding the SWHID) of fixed-output derivations. It’s main purpose is to prevent trying the same fixed-output derivation over and over. Every record is unique (the primary key is both columns).

fod_id

INTEGER NOT NULL REFERENCES fods (fod_id)

A pointer to a fixed-output derivation.

fetch_time

INTEGER (Unix timestamp)

The time that the identification attempt failed.

Raw Schema

CREATE TABLE commits
  (commit_id INTEGER PRIMARY KEY,
   hash TEXT NOT NULL UNIQUE,
   push_time INTEGER);

CREATE TABLE fods
  (fod_id INTEGER PRIMARY KEY,
   algorithm TEXT NOT NULL,
   hash TEXT NOT NULL,
   swhid TEXT,
   is_in_swh INTEGER,
   is_in_disarchive INTEGER,
   derivation TEXT NOT NULL,
   UNIQUE (hash, algorithm));

CREATE TABLE fod_commit_links
  (fod_id INTEGER NOT NULL REFERENCES fods (fod_id),
   commit_id INTEGER NOT NULL REFERENCES commits (commit_id),
   PRIMARY KEY (fod_id, commit_id));

CREATE INDEX fod_commit_links_reverse_index ON fod_commit_links
  (commit_id);

CREATE TABLE fod_references
  (fod_id INTEGER NOT NULL REFERENCES fods (fod_id),
   reference TEXT NOT NULL,
   PRIMARY KEY (fod_id, reference));

CREATE TABLE failed_fetches
  (fod_id INTEGER NOT NULL REFERENCES fods (fod_id),
   fetch_time INTEGER NOT NULL,
   PRIMARY KEY (fod_id, fetch_time));