CSOCatalogue

CSOCatalogue(cache=True, sanitise=False)

Browse and search available CSO datasets.

The catalogue provides access to the CSO’s table of contents and lets you search it by table code, title, variable names, time coverage, update date, organisation, and exceptional release status. It also provides hierarchical navigation through subjects, products, and datasets.

Parameters

Name Type Description Default
cache bool Whether to cache API responses. True
sanitise bool Whether to sanitise variable names for consistency. When True, applies standardised transformations to variable names. False

Examples

>>> catalogue = CSOCatalogue()
>>> toc = catalogue.toc()
>>> results = catalogue.search(title="population")
>>> print(results.head())

Methods

Name Description
search Search the table of contents for datasets matching criteria.
toc Get the table of contents for all CSO datasets.

search

CSOCatalogue.search(
    code=None,
    title=None,
    variables=None,
    time_variable=None,
    time_range=None,
    from_date=None,
    organisation=None,
    exceptional=None,
)

Search the table of contents for datasets matching criteria.

All criteria are combined with AND logic, so a dataset must satisfy every criterion that is supplied. Text criteria accept boolean expressions built from the AND, OR, and NOT operators and parentheses. Wrap a phrase in quotation marks to match it exactly.
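
As an illustration of how such expressions can be evaluated, here is a minimal sketch of a recursive-descent matcher. It is not CSOCatalogue's own implementation: the names `tokenize` and `matches`, the case-insensitive substring matching, and the treatment of a bare NOT (as in "population NOT census") as AND NOT are all assumptions made for the example.

```python
import re

# A quoted phrase, a parenthesis, or a bare word.
TOKEN = re.compile(r'"([^"]*)"|([()])|([^\s()"]+)')


def tokenize(query):
    """Split a query into (kind, text) tokens: OP, PAREN, or TERM."""
    tokens = []
    for phrase, paren, word in TOKEN.findall(query):
        if paren:
            tokens.append(("PAREN", paren))
        elif word in ("AND", "OR", "NOT"):
            tokens.append(("OP", word))
        elif word:
            tokens.append(("TERM", word))
        else:
            tokens.append(("TERM", phrase))  # quoted phrase, kept whole
    return tokens


def matches(query, text):
    """Return True if `text` satisfies the boolean expression in `query`."""
    tokens = tokenize(query)
    haystack = text.lower()
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else (None, None)

    def advance():
        nonlocal pos
        pos += 1

    def parse_or():
        result = parse_and()
        while peek() == ("OP", "OR"):
            advance()
            right = parse_and()  # always parse, so tokens are consumed
            result = result or right
        return result

    def parse_and():
        result = parse_not()
        # Assumption: "a NOT b" is read as "a AND NOT b".
        while peek() in (("OP", "AND"), ("OP", "NOT")):
            if peek() == ("OP", "AND"):
                advance()
            right = parse_not()
            result = result and right
        return result

    def parse_not():
        kind, value = peek()
        if (kind, value) == ("OP", "NOT"):
            advance()
            return not parse_not()
        if (kind, value) == ("PAREN", "("):
            advance()
            result = parse_or()
            if peek() != ("PAREN", ")"):
                raise ValueError("missing closing parenthesis")
            advance()
            return result
        if kind == "TERM":
            advance()
            return value.lower() in haystack  # substring match
        raise ValueError(f"unexpected token: {value!r}")

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected trailing input")
    return result


print(matches("(Cork OR Dublin) AND Population", "Dublin Population 2022"))  # True
print(matches("population NOT census", "Census of Population"))  # False
```

OR binds more loosely than AND in this sketch, which in turn binds more loosely than NOT. This is the conventional precedence, but the source does not state which precedence the library uses.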

Parameters

Name Type Description Default
code str | None Filter by table code (substring match). None
title str | None Filter by title. Supports boolean expressions. None
    - "population": titles containing "population"
    - "population AND county": must contain both
    - "population OR census": must contain either
    - "population NOT census": contains "population" but not "census"
    - '"exact phrase"': matches the exact phrase
variables str | None Filter by variable names. Supports boolean expressions with the AND, OR, and NOT operators and parentheses. None
    - "County": datasets with a variable containing "County"
    - "County AND Year": must have both
    - "Cork OR Dublin": must have either
    - "County AND NOT Electoral": has County but not Electoral
    - "(Cork OR Dublin) AND Population": a compound expression
time_variable str | None Filter by time variable label. Supports the same boolean expressions as title. None
time_range str | None Filter by time range. Returns datasets whose date range overlaps the specified date or range. None
    - Single date: "2023", "January 2023", "2023Q1", "2023-01-15"
    - Date range, written as a tuple-style string: "(2020, 2023)" or "(January 2020, December 2023)"
from_date str | None Only include datasets updated on or after this date. None
organisation str | None Filter by organisation name (substring match). None
exceptional bool | None Filter by exceptional release status. None

Returns

Name Type Description
pd.DataFrame A DataFrame containing matching datasets.

Examples

>>> catalogue = CSOCatalogue()
>>> results = catalogue.search(title="census AND population")
>>> results = catalogue.search(variables="County AND NOT Electoral")
>>> # Find datasets covering 2020-2023
>>> results = catalogue.search(time_range="(2020, 2023)")
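
The overlap rule behind time_range can be pictured with plain year numbers. The sketch below is illustrative only: `as_range` and `ranges_overlap` are hypothetical helpers, and the library's handling of months, quarters, and full dates is not shown in the source, so it is left out here.

```python
def as_range(value):
    """Treat a single value as a range that starts and ends on that value."""
    return value if isinstance(value, tuple) else (value, value)


def ranges_overlap(dataset_range, query_range):
    """Return True if two inclusive (start, end) ranges share at least one point."""
    d_start, d_end = dataset_range
    q_start, q_end = query_range
    return d_start <= q_end and q_start <= d_end


# A dataset covering 2016-2022 overlaps a query for 2020-2023 ...
print(ranges_overlap((2016, 2022), as_range((2020, 2023))))  # True
# ... but a dataset covering 2011-2016 does not overlap the single year 2023.
print(ranges_overlap((2011, 2016), as_range(2023)))  # False
```

Two inclusive ranges overlap exactly when each one starts no later than the other ends, so a single comparison pair covers partial overlap, containment, and single-date queries alike.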

toc

CSOCatalogue.toc(from_date=None)

Get the table of contents for all CSO datasets.

Parameters

Name Type Description Default
from_date str | None Only return tables modified after this date (YYYY-MM-DD). When None, 2000-01-01 is used so that all datasets are included. None

Returns

Name Type Description
pd.DataFrame A DataFrame with columns: Code, Title, Variables, Time Variable, Date Range, Updated, Organisation, Exceptional.

Examples

>>> catalogue = CSOCatalogue()
>>> toc = catalogue.toc(from_date="2023-01-01")
>>> print(len(toc))
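
To make the from_date behaviour concrete, the following sketch applies the same kind of cutoff to a list of plain dictionaries. It runs without network access and is not the library's code. The rows and their codes are made up, `filter_by_from_date` is a hypothetical helper, and both the ISO format of the Updated values and the inclusive cutoff are assumptions.

```python
from datetime import date

DEFAULT_FROM_DATE = "2000-01-01"  # fallback date given in the parameter description


def filter_by_from_date(rows, from_date=None):
    """Keep rows whose 'Updated' date is on or after the cutoff (assumed inclusive)."""
    cutoff = date.fromisoformat(from_date or DEFAULT_FROM_DATE)
    return [row for row in rows if date.fromisoformat(row["Updated"]) >= cutoff]


# Made-up rows that use two of the column names listed under Returns.
rows = [
    {"Code": "T001", "Updated": "2022-06-30"},
    {"Code": "T002", "Updated": "2023-01-01"},
    {"Code": "T003", "Updated": "2024-03-15"},
]

recent = filter_by_from_date(rows, from_date="2023-01-01")
print([row["Code"] for row in recent])  # ['T002', 'T003']
print(len(filter_by_from_date(rows)))   # 3
```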