The database consists of two primary tables, “fods” (fixed-output derivations) and “commits”. They have a many-to-many relationship managed by the table “fod_commit_links”. The table “fod_references” contains some more information about each fixed-output derivation (described below). Finally, the table “failed_fetches” is a log for when the “identify” procedure cannot fetch the source of a fixed-output derivation.
This query lists all the Git references that are missing from the Software Heritage Archive.
SELECT f.algorithm, f.hash, fr.reference FROM fods f LEFT JOIN fod_references fr USING (fod_id) WHERE NOT f.is_in_swh AND fr.reference_type = 'git'
This query lists all the Git references that could not be processed (because they were unavailable during processing or because of a bug in the processing script). This query excludes recursive Git references, as they cannot be processed yet.
SELECT f.algorithm, f.hash, fr.reference FROM fods f LEFT JOIN fod_references fr USING (fod_id) WHERE f.is_in_swh IS NULL AND fr.reference_type = 'git' AND fr.reference NOT LIKE '%(recursive? #t)%'
This query lists all the “.tar.gz” references that are missing from the Software Heritage Archive.
SELECT f.algorithm, f.hash, fr.reference FROM fods f LEFT JOIN fod_references fr USING (fod_id) WHERE NOT f.is_in_swh AND fr.reference_type = 'tar-gz'
This query lists all the “.tar.xz” references that could not be processed (because they were unavailable during processing or because of a bug in the processing script).
SELECT f.algorithm, f.hash, fr.reference FROM fods f LEFT JOIN fod_references fr USING (fod_id) WHERE f.is_in_swh IS NULL AND fr.reference_type = 'tar-xz'
The rows of this table represent fixed-output derivations from the Guix package graph.
INTEGER PRIMARY KEY
A unique integer identifying the fixed-output derivation.
TEXT NOT NULL
The algorithm used to compute the hash (e.g., “sha256”).
TEXT NOT NULL
The hash of the source.
TEXT
The Software Heritage ID (SWHID) of the source. Some
fixed-output derivations corresponds to multiple SWHIDs. In
that case, this column will
contain 'composite'
,
and the SWHIDs will be stored in the “subfods” table.
INTEGER
(Boolean)
Whether or not the source is in the Software Heritage
archive. If this is FALSE
, then the source
could not be found in the archive. If this
is NULL
, then it has not been checked.
INTEGER
(Boolean)
Reserved for future use. 🙂
TEXT NOT NULL
The serialized derivation à la write-derivation
.
The rows of this table represent commits in the Guix Git repository.
INTEGER PRIMARY KEY
A unique integer identifying the commit.
TEXT NOT NULL UNIQUE
The commit ID (hash) from Git.
INTEGER
(Unix timestamp)
The time the commit was pushed. This comes from the Guix Data Service, which in turn gets it from the guix-commits mailing list.
A simple table linking fixed-output derivations to commits. Every record is unique (the primary key is both columns).
INTEGER NOT NULL REFERENCES fods (fod_id)
A pointer to a fixed-output derivation.
INTEGER NOT NULL REFERENCES commits (commit_id)
A pointer to a commit.
This table stores is used to attach multiple SWHIDs to a single fixed-output derivation. This is used, for example, to represent composing multiple directories together from a single Subversion repository.
INTEGER NOT NULL REFERENCES fods (fod_id)
A pointer to the parent fixed-output derivation.
INTEGER NOT NULL
The position of this component in the list.
TEXT NOT NULL
The location to put this component (as a file or directory name).
TEXT NOT NULL
The SWHID of this component.
INTEGER
(Boolean)
Whether or not this component is in the Software Heritage
archive. If this is FALSE
, then the component
could not be found in the archive. If this
is NULL
, then it has not been checked.
A simple table attaching “references” to fixed-output derivations. Every record is unique (the primary key is both columns).
INTEGER NOT NULL REFERENCES fods (fod_id)
A pointer to a fixed-output derivation.
TEXT
The type of the reference. See the schema below for
possible values. Note that this column may
be NULL
, which means that the type is
unknown.
BOOLEAN
A flag indicating that a reference is probably a mistake and should not be used.
TEXT NOT NULL
A serialized form of the “uri” field of the origin
record that corresponds to the fixed-output derivation.
For a plain URL, this will be a serialized
string: "https://..."
. For a Git reference,
it will look like:
(git-reference (url "...") (commit "...") (recursive? #f))
It will be similar for other kinds of references (Subversion, CVS, etc.). Oftentimes the reference will have more information and be easier to use than the serialized derivation, which makes it extremely useful.
This table records failed attempts at identifying (finding the SWHID) of fixed-output derivations. It’s main purpose is to prevent trying the same fixed-output derivation over and over. Every record is unique (the primary key is both columns).
INTEGER NOT NULL REFERENCES fods (fod_id)
A pointer to a fixed-output derivation.
INTEGER
(Unix timestamp)
The time that the identification attempt failed.
TEXT
The type of the failure. See the schema below for possible values.
CREATE TABLE commits (commit_id INTEGER PRIMARY KEY, hash TEXT NOT NULL UNIQUE, push_time INTEGER); CREATE TABLE fods (fod_id INTEGER PRIMARY KEY, algorithm TEXT NOT NULL, hash TEXT NOT NULL, swhid TEXT, is_in_swh INTEGER, is_in_disarchive INTEGER, derivation TEXT NOT NULL, UNIQUE (hash, algorithm)); CREATE TABLE fod_commit_links (fod_id INTEGER NOT NULL REFERENCES fods (fod_id), commit_id INTEGER NOT NULL REFERENCES commits (commit_id), PRIMARY KEY (fod_id, commit_id)); CREATE INDEX fod_commit_links_reverse_index ON fod_commit_links (commit_id); CREATE TABLE subfods (fod_id INTEGER NOT NULL REFERENCES fods (fod_id), ordinal INTEGER NOT NULL, location TEXT NOT NULL, swhid TEXT NOT NULL, is_in_swh INTEGER, UNIQUE (fod_id, ordinal)); CREATE TABLE reference_types (reference_type TEXT NOT NULL PRIMARY KEY); INSERT INTO reference_types VALUES ('git'), ('svn'), ('hg'), ('cvs'), ('bzr'), ('tar-gz'), ('tar-xz'), ('tar-bz2'), ('tar'), ('zip'), ('text'); CREATE TABLE fod_references (fod_id INTEGER NOT NULL REFERENCES fods (fod_id), reference_type TEXT REFERENCES reference_types (reference_type), -- The 'is_error' flag is set for rows that are clearly a mistake. -- Sometimes, if the reference is wrong but the file is already in -- the Guix store, a committer won't notice that the reference is -- bogus. This flag lets us ignore those references. is_error BOOLEAN NOT NULL DEFAULT FALSE, reference TEXT NOT NULL, PRIMARY KEY (fod_id, reference)); CREATE TABLE failure_types (failure_type TEXT NOT NULL PRIMARY KEY); INSERT INTO failure_types VALUES ('bail'), ('verify'), ('fetch'), ('disarchive'); CREATE TABLE failed_fetches (fod_id INTEGER NOT NULL, reference TEXT NOT NULL, fetch_time INTEGER NOT NULL, failure_type TEXT REFERENCES failure_types (failure_type), PRIMARY KEY (fod_id, reference) FOREIGN KEY (fod_id, reference) REFERENCES fod_references (fod_id, reference));