CSODataset

CSODataset(
    table_code,
    *,
    filters=None,
    include_ids=None,
    drop_filtered_cols=False,
    drop_national_data=False,
    convert_dates=False,
    sanitise=False,
    cache=True,
)

A dataset from Ireland’s Central Statistics Office.

This class provides a convenient interface for loading CSO datasets, with optional spatial data integration. Data is lazily loaded on first access, and results are cached for subsequent access.
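The lazy-load-then-cache behaviour described above can be pictured with a small standard-library sketch. `LazyDataset`, `_fetch`, and `rows` are hypothetical names used only for illustration and are not part of CSODataset; how the class actually implements its loading is not shown on this page.

```python
from functools import cached_property


class LazyDataset:
    """Hypothetical stand-in illustrating lazy loading; not CSODataset itself."""

    def __init__(self, table_code):
        self.table_code = table_code
        self.fetch_count = 0  # counts how often the simulated download runs

    def _fetch(self):
        # Placeholder for the network request a real loader would make.
        self.fetch_count += 1
        return [{"table": self.table_code, "value": 1}]

    @cached_property
    def rows(self):
        # Runs on first access only; the result is then stored on the instance.
        return self._fetch()


dataset = LazyDataset("FY003A")
fetches_before_access = dataset.fetch_count  # constructing the object fetched nothing
first = dataset.rows   # triggers the single fetch
second = dataset.rows  # served from the cached result
```

In this sketch, constructing the object is cheap and the expensive step happens once, on the first access that needs the data.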

Parameters

Name	Type	Description	Default
table_code	str	The CSO table code (e.g., ‘FY003A’).	required
filters	dict	Filters to apply to the dataset dimensions.	`None`
include_ids	str or list of str	Which ID columns to include in the output. One of: “all” (include the ID column for every dimension); “spatial_only” (include only the ID column for the spatial dimension); “none” (exclude all ID columns; this is the default behaviour); or a list of column names, which includes only the ID columns for those dimensions (e.g., [“County”, “Sex”] to include “County ID” and “Sex ID”). A ValidationError is raised if the list contains names that do not correspond to dimensions in the dataset.	`None`
drop_filtered_cols	bool	Whether to drop columns for filtered dimensions.	`False`
drop_national_data	bool	Whether to exclude national-level (Ireland) rows.	`False`
convert_dates	bool	Whether to parse temporal columns as datetime.	`False`
sanitise	bool	Whether to sanitise column names for consistency. When True, applies standardised transformations: replacing ‘&’ with ‘and’, normalising slashes and spaces, and applying standard name mappings.	`False`
cache	bool	Whether to cache API responses.	`True`
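To make the `sanitise` description more concrete, here is a rough sketch of the kind of transformation it names. The function `sanitise_name`, the exact slash and whitespace rules, and the contents of `NAME_MAPPINGS` are all assumptions made for illustration; the library's real rules and mapping table are not documented here and may differ.

```python
import re

# Hypothetical mapping table; the library's actual mappings are not shown on this page.
NAME_MAPPINGS = {"CensusYear": "Census Year"}


def sanitise_name(name):
    """Illustrative column-name clean-up (assumed rules, not the library's)."""
    name = name.replace("&", " and ")         # '&' becomes the word 'and'
    name = re.sub(r"\s*/\s*", "/", name)      # drop spaces around slashes
    name = re.sub(r"\s+", " ", name).strip()  # collapse runs of whitespace
    return NAME_MAPPINGS.get(name, name)      # apply a standard name if one exists


cleaned = [
    sanitise_name(n)
    for n in ["Age & Sex", "R&D", "Males / Females", "CensusYear"]
]
```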

Examples

>>> dataset = CSODataset(
...     "FY003A",
...     filters={"CensusYear": ["2022"], "Sex": ["Both sexes"]},
...     include_ids="spatial_only",
... )
>>> df = dataset.df()
>>> gdf = dataset.gdf()
>>> # Include specific ID columns
>>> dataset = CSODataset("FY003A", include_ids=["County", "Sex"])
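The `include_ids` rules from the parameter table can be summarised as a small resolver. This is only a sketch of the documented behaviour: `resolve_id_columns` is a hypothetical helper, the local `ValidationError` class stands in for the library's exception, and treating `None` the same as “none” is an assumption based on the default shown in the signature.

```python
class ValidationError(ValueError):
    """Local stand-in for the ValidationError mentioned in the parameter table."""


def resolve_id_columns(include_ids, dimensions, spatial_dimension):
    """Return the ID column names implied by an include_ids value (illustrative)."""
    if include_ids is None or include_ids == "none":
        return []  # assumption: None behaves like "none"
    if include_ids == "all":
        return [f"{dim} ID" for dim in dimensions]
    if include_ids == "spatial_only":
        return [f"{spatial_dimension} ID"]
    if isinstance(include_ids, str):
        raise ValidationError(f"Unrecognised include_ids value: {include_ids!r}")
    # Otherwise treat the value as a list of dimension names.
    unknown = [name for name in include_ids if name not in dimensions]
    if unknown:
        raise ValidationError(f"Not dimensions of this dataset: {unknown}")
    return [f"{name} ID" for name in include_ids]


dims = ["County", "Sex", "CensusYear"]  # example dimension names
selected = resolve_id_columns(["County", "Sex"], dims, "County")
spatial = resolve_id_columns("spatial_only", dims, "County")
```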

Methods

Name	Description
describe	Print a summary of the dataset metadata.
df	Get the dataset as a DataFrame.
gdf	Get the dataset as a GeoDataFrame with spatial data.

describe

CSODataset.describe()

Print a summary of the dataset metadata.

Displays information about the dataset including its code, title, variables, units, tags, and other relevant metadata.

Examples

>>> dataset = CSODataset("FY003A")
>>> dataset.describe()

df

CSODataset.df(pivot_format='long', *, copy=True)

Get the dataset as a DataFrame.

Parameters

Name	Type	Description	Default
pivot_format	str	The output format for the data. Options: “long” (default), “wide”, “tidy”.	`'long'`
copy	bool	Whether to return a copy of the cached DataFrame. Defaults to True to prevent accidental mutation of cached data. Set to False for better performance if you won’t modify the result.	`True`

Returns

Name	Type	Description
	pd.DataFrame	The dataset as a pandas DataFrame.

Examples

>>> df = dataset.df("wide")
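The trade-off behind `copy` can be illustrated without the library. In the sketch below, `CachedTable` and its `data` method are hypothetical stand-ins that use plain lists of dicts instead of a DataFrame; the point is only that a copy protects the cache, while `copy=False` hands back the cached object itself.

```python
from copy import deepcopy


class CachedTable:
    """Hypothetical stand-in for an object that caches its loaded rows."""

    def __init__(self, rows):
        self._cache = rows

    def data(self, *, copy=True):
        # copy=True returns an independent copy; copy=False returns the cache itself.
        return deepcopy(self._cache) if copy else self._cache


table = CachedTable([{"County": "Cork", "value": 10}])  # illustrative values

safe = table.data()
safe[0]["value"] = -1                              # edits the copy only
value_after_safe_edit = table.data()[0]["value"]   # cache is unchanged

shared = table.data(copy=False)
shared[0]["value"] = -1                            # edits the cached rows in place
value_after_shared_edit = table.data()[0]["value"] # later callers see the edit
```

Skipping the copy saves time and memory on large tables, but any in-place modification then leaks into every later call.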

gdf

CSODataset.gdf(pivot_format='long', *, copy=True)

Get the dataset as a GeoDataFrame with spatial data.

The returned GeoDataFrame contains all rows from the dataset, including aggregate regions (e.g., “State”, “Leinster”) that may not have corresponding geometries in the spatial data. These rows will have null (None) geometries.

Parameters

Name	Type	Description	Default
pivot_format	str	The output format for the data. Options: “long” (default), “wide”, “tidy”.	`'long'`
copy	bool	Whether to return a copy of the cached GeoDataFrame. Defaults to True to prevent accidental mutation of cached data. Set to False for better performance if you won’t modify the result.	`True`

Returns

Name	Type	Description
	gpd.GeoDataFrame	The dataset as a GeoDataFrame with geometry column. Rows for aggregate regions without spatial boundaries will have null geometries.

Examples

>>> gdf = dataset.gdf()
>>> gdf.plot(column="value")
>>> # Check for null geometries
>>> gdf[gdf.geometry.isna()]
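Why aggregate rows end up with null geometries can be seen in a plain-Python sketch of a left join between data rows and boundaries. The rows, the `boundaries` lookup, and the placeholder geometry strings are invented for illustration; how the library performs its join internally is not described on this page.

```python
# Illustrative data rows, including an aggregate region ("State").
rows = [
    {"County": "State", "value": 3},
    {"County": "Cork", "value": 1},
    {"County": "Kerry", "value": 2},
]

# Placeholder geometries; real ones would be polygon objects, not strings.
boundaries = {"Cork": "<Cork polygon>", "Kerry": "<Kerry polygon>"}

# Left join: every data row is kept; rows without a boundary get None.
joined = [{**row, "geometry": boundaries.get(row["County"])} for row in rows]

missing = [row["County"] for row in joined if row["geometry"] is None]
mappable = [row for row in joined if row["geometry"] is not None]
```

With a GeoDataFrame, the equivalent filter for mappable rows is `gdf[gdf.geometry.notna()]`, which may be useful before plotting.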