CSODataset

CSODataset(
    table_code,
    *,
    filters=None,
    include_ids=None,
    drop_filtered_cols=False,
    drop_national_data=False,
    convert_dates=False,
    sanitise=False,
    cache=True,
)

A dataset from Ireland’s Central Statistics Office.

This class provides a convenient interface for loading CSO datasets, with optional spatial data integration. Data is loaded lazily on first access and then cached, so later calls reuse the loaded result.
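The load-on-first-access behaviour can be pictured with a minimal sketch. The class `LazyTable` and its `fetch` callable are hypothetical stand-ins, not part of this library; the sketch only illustrates the general pattern of deferring work until the data is first requested and reusing the result afterwards.

```python
from typing import Callable, Optional


class LazyTable:
    """Hypothetical stand-in that defers loading until first access."""

    def __init__(self, fetch: Callable[[], list]) -> None:
        self._fetch = fetch                  # not called at construction time
        self._rows: Optional[list] = None    # cache slot, empty until first access
        self.fetch_count = 0                 # counter kept for demonstration only

    def rows(self) -> list:
        if self._rows is None:               # first access: load and remember
            self._rows = self._fetch()
            self.fetch_count += 1
        return self._rows                    # later accesses reuse the cached result


table = LazyTable(lambda: [{"County": "Cork", "value": 1}])
before = table.fetch_count   # nothing has been loaded yet
table.rows()
table.rows()
after = table.fetch_count    # the fetch ran once, despite two calls
```

Under this pattern, constructing the object is cheap and the cost of loading is paid once, on the first call that needs the data.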

Parameters

Name Type Description Default
table_code str The CSO table code (e.g., ‘FY003A’). required
filters dict Filters to apply to the dataset dimensions. None
include_ids str or list of str Which ID columns to include in the output. One of: “all” (include the ID column for every dimension); “spatial_only” (include only the ID column for the spatial dimension); “none” (exclude all ID columns; this is the default behaviour); or a list of column names, which includes the ID columns for just those columns (e.g., [“County”, “Sex”] adds “County ID” and “Sex ID”). If the list contains names that do not correspond to dimensions in the dataset, a ValidationError is raised. None
drop_filtered_cols bool Whether to drop columns for filtered dimensions. False
drop_national_data bool Whether to exclude national-level (Ireland) rows. False
convert_dates bool Whether to parse temporal columns as datetime. False
sanitise bool Whether to sanitise column names for consistency. When True, applies standardised transformations: replacing ‘&’ with ‘and’, normalising slashes and spaces, and applying standard name mappings. False
cache bool Whether to cache API responses. True
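To make the `sanitise` transformations more concrete, here is a rough sketch of what such a clean-up could look like. The function `sanitise_name`, the exact slash and whitespace rules, and the contents of `NAME_MAP` are all assumptions made for illustration; the library's actual rules and name mappings are not listed in this reference.

```python
import re

# Hypothetical mapping; the library's real "standard name mappings" may differ.
NAME_MAP = {"Census Year": "Year"}


def sanitise_name(name: str) -> str:
    """Illustrative clean-up of a column name (not the library's implementation)."""
    name = name.replace("&", " and ")          # '&' becomes 'and'
    name = re.sub(r"\s*/\s*", "/", name)       # assumed rule: no spaces around slashes
    name = re.sub(r"\s+", " ", name).strip()   # collapse runs of whitespace
    return NAME_MAP.get(name, name)            # apply a name mapping if one exists


cleaned = [sanitise_name(n) for n in ["Age  &  Sex", "Urban / Rural", "Census  Year"]]
```

The ordering matters in this sketch: whitespace is collapsed before the mapping lookup so that differently spaced variants of the same name resolve to one entry.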

Examples

>>> dataset = CSODataset(
...     "FY003A",
...     filters={"CensusYear": ["2022"], "Sex": ["Both sexes"]},
...     include_ids="spatial_only",
... )
>>> df = dataset.df()
>>> gdf = dataset.gdf()
>>> # Include specific ID columns
>>> dataset = CSODataset("FY003A", include_ids=["County", "Sex"])
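The `include_ids` options can be read as a small decision table. The sketch below spells that table out under stated assumptions: `resolve_id_columns` is a hypothetical helper, the local `ValidationError` class merely stands in for the library's exception, and rejecting an unrecognised string is a guess rather than documented behaviour. Only the "<name> ID" naming and the error for unknown list entries come from the parameter description.

```python
from typing import List, Sequence, Union


class ValidationError(ValueError):
    """Local stand-in for the library's ValidationError."""


def resolve_id_columns(
    include_ids: Union[str, Sequence[str], None],
    dimensions: Sequence[str],
    spatial_dimension: str,
) -> List[str]:
    """Return the ID column names implied by an include_ids value (illustrative)."""
    if include_ids is None or include_ids == "none":
        return []                                    # no ID columns
    if include_ids == "all":
        return [f"{d} ID" for d in dimensions]       # one ID column per dimension
    if include_ids == "spatial_only":
        return [f"{spatial_dimension} ID"]           # only the spatial dimension
    if isinstance(include_ids, str):
        # Assumption: any other bare string is treated as invalid.
        raise ValidationError(f"Unknown include_ids option: {include_ids!r}")
    unknown = [c for c in include_ids if c not in dimensions]
    if unknown:
        raise ValidationError(f"Not dimensions of this dataset: {unknown}")
    return [f"{c} ID" for c in include_ids]


dims = ["County", "Sex", "CensusYear"]               # hypothetical dimension names
picked = resolve_id_columns(["County", "Sex"], dims, "County")
spatial = resolve_id_columns("spatial_only", dims, "County")
```

Passing a list that names a non-dimension, such as `["Age"]` here, raises the stand-in `ValidationError`, mirroring the documented failure mode.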

Methods

Name Description
describe Print a summary of the dataset metadata.
df Get the dataset as a DataFrame.
gdf Get the dataset as a GeoDataFrame with spatial data.

describe

CSODataset.describe()

Print a summary of the dataset metadata.

Displays information about the dataset including its code, title, variables, units, tags, and other relevant metadata.

Examples

>>> dataset = CSODataset("FY003A")
>>> dataset.describe()

df

CSODataset.df(pivot_format='long', *, copy=True)

Get the dataset as a DataFrame.

Parameters

Name Type Description Default
pivot_format str The output format for the data. Options: “long” (default), “wide”, “tidy”. 'long'
copy bool Whether to return a copy of the cached DataFrame. Defaults to True to prevent accidental mutation of cached data. Set to False for better performance if you won’t modify the result. True
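The trade-off behind `copy` can be shown with a small stand-alone sketch. `CachedResult` is a hypothetical class that uses plain Python lists and `copy.deepcopy` in place of a pandas DataFrame; it is meant only to show why handing out the cached object itself lets a caller's edits leak into later results.

```python
import copy as _copy


class CachedResult:
    """Hypothetical holder for a cached result (stand-in for a cached DataFrame)."""

    def __init__(self, rows):
        self._cached = rows

    def get(self, *, copy: bool = True):
        # copy=True: return an independent copy; copy=False: return the cache itself.
        return _copy.deepcopy(self._cached) if copy else self._cached


result = CachedResult([{"County": "Cork", "value": 10}])

safe = result.get()                      # independent copy
safe[0]["value"] = -1                    # edit stays local to the caller
after_safe = result.get()[0]["value"]    # cache is unchanged

shared = result.get(copy=False)          # same object as the cache
shared[0]["value"] = -1                  # edit reaches the cache
after_shared = result.get()[0]["value"]  # later callers see the edit
```

Skipping the copy avoids duplicating the data, which is why `copy=False` is the faster choice for read-only use.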

Returns

Name Type Description
pd.DataFrame The dataset as a pandas DataFrame.

Examples

>>> df = dataset.df("wide")

gdf

CSODataset.gdf(pivot_format='long', *, copy=True)

Get the dataset as a GeoDataFrame with spatial data.

The returned GeoDataFrame contains all rows from the dataset, including aggregate regions (e.g., “State”, “Leinster”) that may not have corresponding geometries in the spatial data. These rows will have null (None) geometries.
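One way to picture how null geometries arise is a left-join-style lookup, sketched below in plain Python. The rows, the `boundaries` lookup, and the placeholder geometry strings are hypothetical, and whether the library actually performs a left join internally is an assumption; the sketch only reproduces the documented outcome that every data row is kept and unmatched rows carry no geometry.

```python
# Hypothetical data rows: one aggregate region plus two counties.
rows = [
    {"County": "State", "value": 100},
    {"County": "Cork", "value": 60},
    {"County": "Kerry", "value": 40},
]

# Hypothetical boundary lookup; strings stand in for real geometry objects.
boundaries = {"Cork": "<Cork boundary>", "Kerry": "<Kerry boundary>"}

# Keep every data row; rows without a matching boundary get geometry None.
joined = [{**row, "geometry": boundaries.get(row["County"])} for row in rows]

# Aggregate rows are the ones left without a geometry.
missing = [r["County"] for r in joined if r["geometry"] is None]

# Rows that can actually be drawn on a map.
mappable = [r for r in joined if r["geometry"] is not None]
```

Filtering to rows with a geometry before mapping keeps aggregate totals out of the plot while leaving them available in the full result.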

Parameters

Name Type Description Default
pivot_format str The output format for the data. Options: “long” (default), “wide”, “tidy”. 'long'
copy bool Whether to return a copy of the cached GeoDataFrame. Defaults to True to prevent accidental mutation of cached data. Set to False for better performance if you won’t modify the result. True

Returns

Name Type Description
gpd.GeoDataFrame The dataset as a GeoDataFrame with geometry column. Rows for aggregate regions without spatial boundaries will have null geometries.

Examples

>>> gdf = dataset.gdf()
>>> gdf.plot(column="value")
>>> # Check for null geometries
>>> gdf[gdf.geometry.isna()]