Command-Line Interface

Commands that interact with Elasticsearch (index, reindex, copy and expire) support the netrc file.
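As a reminder of what a netrc entry looks like, here is a minimal sketch that parses one with Python's standard netrc module. How ocdsindex itself reads the file is not described here; the entry below is hypothetical and reuses the placeholder host, user and pass values from the HOST examples on this page.

```python
import netrc
import os
import tempfile

# Hypothetical entry: "host", "user" and "pass" are placeholders, not real credentials.
content = "machine host login user password pass\n"

with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "netrc")
    with open(path, "w") as f:
        f.write(content)
    # authenticators() returns a (login, account, password) tuple for the machine.
    login, _account, password = netrc.netrc(path).authenticators("host")

print(login, password)
```

In practice the file usually lives at ~/.netrc and should be readable only by its owner.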

sphinx

Crawls a directory of built OCDS documentation and prints, as JSON, the base URL and the documents to index.

ocdsindex sphinx DIRECTORY BASE_URL
  • DIRECTORY: the directory to crawl, containing language directories and HTML files

  • BASE_URL: the URL of the website whose files are crawled

Example:

ocdsindex sphinx path/to/standard/build/ https://standard.open-contracting.org/staging/1.1-dev/ > data.json

The output looks like:

{
  "base_url": "https://standard.open-contracting.org/staging/1.1-dev/",
  "created_at": 1577880000,
  "documents": {
    "en": [
      {
        "url": "https://standard.open-contracting.org/staging/1.1-dev/en/#about",
        "title": "Open Contracting Data Standard: Documentation - About",
        "text": "The Open Contracting Data Standard …"
      }
    ]
  }
}

The "documents" object has one key per language, and each language's array has one object per document.
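To check a data.json file before indexing it, the structure can be summarized in a few lines. This is a minimal sketch that assumes only the keys shown in the sample output; the summarize helper is hypothetical and not part of ocdsindex.

```python
import json

# Sample mirroring the output shape shown above; a real run would read data.json instead.
sample = """
{
  "base_url": "https://standard.open-contracting.org/staging/1.1-dev/",
  "created_at": 1577880000,
  "documents": {
    "en": [
      {
        "url": "https://standard.open-contracting.org/staging/1.1-dev/en/#about",
        "title": "Open Contracting Data Standard: Documentation - About",
        "text": "The Open Contracting Data Standard …"
      }
    ]
  }
}
"""


def summarize(data):
    """Return the number of documents per language code."""
    return {language: len(documents) for language, documents in data["documents"].items()}


data = json.loads(sample)
counts = summarize(data)
print(data["base_url"], counts)
```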

extension-explorer

Prints, as JSON, the base URL and the documents to index from the Extension Explorer.

ocdsindex extension-explorer FILE
  • FILE: the Extension Explorer's extensions.json file

Example:

ocdsindex extension-explorer path/to/extension_explorer/data/extensions.json > data.json

index

Adds documents to Elasticsearch indices.

ocdsindex index HOST FILE
  • HOST: the connection URI for Elasticsearch, like https://user:pass@host:9200

  • FILE: the file containing the output of the sphinx or extension-explorer command

Example:

ocdsindex index https://user:pass@host:9200 data.json

reindex

Reindexes documents into a new versioned index.

For each ocdsindex_XX alias, creates a new ocdsindex_XX-NNNN index, copies all documents into it, atomically updates the alias to point to the new index, and deletes the old index.

ocdsindex reindex HOST
  • HOST: the connection URI for Elasticsearch, like https://user:pass@host:9200

Example:

ocdsindex reindex https://user:pass@host:9200
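The atomic alias update in the reindex description can be illustrated with the request body that Elasticsearch's _aliases endpoint accepts. This is a sketch of the general technique, not necessarily how ocdsindex implements it. The alias_swap_body helper and the index names are hypothetical, and it is an assumption that XX is a language code such as "en".

```python
import json


def alias_swap_body(alias, old_index, new_index):
    """Build a request body for Elasticsearch's _aliases endpoint.

    Both actions travel in one request, so the alias is switched from the
    old index to the new index in a single step.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical names following the ocdsindex_XX and ocdsindex_XX-NNNN patterns.
body = alias_swap_body("ocdsindex_en", "ocdsindex_en-0001", "ocdsindex_en-0002")
print(json.dumps(body, indent=2))
```

Because searches go through the alias, clients never need to know the name of the versioned index behind it.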

copy

For each document whose base URL is SOURCE, adds a copy whose base URL is DESTINATION.

ocdsindex copy HOST SOURCE DESTINATION
  • HOST: the connection URI for Elasticsearch, like https://user:pass@host:9200

  • SOURCE: the base URL of the documents to copy

  • DESTINATION: the base URL of the documents to create

Example:

ocdsindex copy https://user:pass@host:9200 https://standard.open-contracting.org/staging/latest/ https://standard.open-contracting.org/latest/
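The effect of copy can be sketched on plain dictionaries. This assumes each stored document carries "base_url" and "url" fields, and that the copy's URL is rebased along with its base URL; both are assumptions. The copy_documents helper is hypothetical and runs in memory rather than against Elasticsearch.

```python
def copy_documents(documents, source, destination):
    """Return copies of the documents under `source`, rebased onto `destination`."""
    copies = []
    for document in documents:
        if document["base_url"] == source:
            copy = dict(document)
            copy["base_url"] = destination
            # Rewrite the document URL only if it starts with the source base URL.
            if copy["url"].startswith(source):
                copy["url"] = destination + copy["url"][len(source):]
            copies.append(copy)
    return copies


SOURCE = "https://standard.open-contracting.org/staging/latest/"
DESTINATION = "https://standard.open-contracting.org/latest/"

documents = [
    {"base_url": SOURCE, "url": SOURCE + "en/#about", "title": "About"},
    {
        "base_url": "https://standard.open-contracting.org/1.1/",
        "url": "https://standard.open-contracting.org/1.1/en/#about",
        "title": "About",
    },
]

copies = copy_documents(documents, SOURCE, DESTINATION)
print(copies)
```

The originals are left untouched, which matches the description: documents are added, not moved.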

expire

Deletes from Elasticsearch indices any documents that were crawled more than 180 days ago.

ocdsindex expire HOST --exclude-file FILENAME
  • HOST: the connection URI for Elasticsearch, like https://user:pass@host:9200

  • --exclude-file FILENAME: exclude any document whose base URL is equal to a line in this file

Example:

ocdsindex expire https://user:pass@host:9200 --exclude-file exclude.txt

Where exclude.txt contains:

https://standard.open-contracting.org/latest/
https://standard.open-contracting.org/1.1/
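The selection rule for expire can be sketched as a filter over in-memory documents. It assumes each document has a "base_url" field and a "created_at" field holding a Unix timestamp in seconds, as the sample sphinx output suggests. The read_excluded and expired helpers are hypothetical.

```python
import time

EXPIRY_SECONDS = 180 * 24 * 60 * 60  # 180 days


def read_excluded(lines):
    """Return the set of base URLs listed one per line, ignoring blank lines."""
    return {line.strip() for line in lines if line.strip()}


def expired(documents, excluded, now=None):
    """Return documents crawled more than 180 days ago whose base URL is not excluded."""
    now = time.time() if now is None else now
    threshold = now - EXPIRY_SECONDS
    return [
        document
        for document in documents
        if document["created_at"] < threshold and document["base_url"] not in excluded
    ]


excluded = read_excluded([
    "https://standard.open-contracting.org/latest/\n",
    "https://standard.open-contracting.org/1.1/\n",
])

crawled = 1577880000
now = crawled + 200 * 24 * 60 * 60  # pretend it is 200 days after the crawl

documents = [
    {"base_url": "https://standard.open-contracting.org/latest/", "created_at": crawled},
    {"base_url": "https://standard.open-contracting.org/staging/1.1-dev/", "created_at": crawled},
    {"base_url": "https://standard.open-contracting.org/staging/1.1-dev/", "created_at": now - 60},
]

to_delete = expired(documents, excluded, now=now)
print(to_delete)
```

Only the old staging document is selected: the first is protected by the exclude file and the third is too recent.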