Extract

The extract_* functions return the documents to index, as a list of dicts.

Each dict sets these keys:

url: The remote URL of the document, which might include a fragment identifier
title: The title of the document, which might combine the page title and the heading text
text: The plain text content of the document
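For illustration, a document can be modelled as a typed dict. The Document type, the is_document helper and the example values below are hypothetical and are not part of ocdsindex. They only show the shape described above, including a URL with a fragment identifier.

```python
from typing import TypedDict


class Document(TypedDict):
    """Hypothetical type for one document to index (not part of ocdsindex)."""

    url: str    # remote URL, possibly with a fragment identifier
    title: str  # e.g. the page title combined with the heading text
    text: str   # plain text content


def is_document(value: object) -> bool:
    """Return True if value is a dict with exactly the url, title and text keys, all strings."""
    return (
        isinstance(value, dict)
        and set(value) == {"url", "title", "text"}
        and all(isinstance(v, str) for v in value.values())
    )


# Hypothetical example: one section of a documentation page.
example: Document = {
    "url": "https://example.com/en/guide/#install",
    "title": "Guide - Install",
    "text": "Run the installer.",
}

print(is_document(example))
print(is_document({"url": "https://example.com/"}))
```

The check is strict about the key set, so a dict missing title or text is rejected.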

ocdsindex.extract.extract_sphinx(url, tree)

Extract one document per section of the page.

Parameters:

url: the URL of the page
tree: the parsed HTML tree of the page

Returns:

a list of dicts representing the documents to index

Return type:

list
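The one-document-per-section approach can be sketched with the standard library. This is not the ocdsindex implementation: the function name extract_sections, the use of xml.etree.ElementTree on well-formed markup instead of an HTML parser, the <section id="..."> markup, and the "page title - heading" title format are all assumptions for illustration. Each section's text excludes its heading and any nested sections, since those become documents of their own.

```python
import xml.etree.ElementTree as ET

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


def _own_text(section: ET.Element) -> str:
    """Return the section's text, excluding its headings and any nested sections."""
    parts: list[str] = []

    def walk(element: ET.Element) -> None:
        if element.text:
            parts.append(element.text)
        for child in element:
            # Nested sections are indexed separately; headings go into the title.
            if child.tag != "section" and child.tag not in HEADINGS:
                walk(child)
            # Text after a child belongs to the parent, so keep it either way.
            if child.tail:
                parts.append(child.tail)

    walk(section)
    return " ".join(" ".join(parts).split())  # collapse whitespace


def extract_sections(url: str, root: ET.Element) -> list[dict[str, str]]:
    """Hypothetical extractor: one {url, title, text} dict per <section> in root."""
    page_title = (root.findtext(".//title") or "").strip()
    documents = []
    for section in root.iter("section"):  # document order, including nested sections
        heading = next((child for child in section if child.tag in HEADINGS), None)
        heading_text = " ".join("".join(heading.itertext()).split()) if heading is not None else ""
        fragment = section.get("id")
        documents.append(
            {
                "url": f"{url}#{fragment}" if fragment else url,
                "title": f"{page_title} - {heading_text}" if heading_text else page_title,
                "text": _own_text(section),
            }
        )
    return documents


PAGE = (
    "<html><head><title>Guide</title></head><body>"
    '<section id="guide"><h1>Guide</h1><p>Intro text.</p>'
    '<section id="install"><h2>Install</h2><p>Run the installer.</p></section>'
    "</section></body></html>"
)

for doc in extract_sections("https://example.com/guide/", ET.fromstring(PAGE)):
    print(doc["url"], "|", doc["title"], "|", doc["text"])
```

Keeping nested sections out of the parent's text avoids indexing the same passage twice, and the fragment identifier lets a search result link straight to the matching section.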