CSODataset
CSODataset(table_code, *, filters=None, include_ids=None, drop_filtered_cols=False, drop_national_data=False, convert_dates=False, sanitise=False, cache=True)
A dataset from Ireland’s Central Statistics Office.
This class provides a convenient interface for loading CSO datasets, with optional spatial data integration. Data is loaded lazily on first access and cached for subsequent calls.
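The lazy-load-then-cache behaviour can be sketched with `functools.cached_property`. This is a minimal, hypothetical stand-in, not the class's actual implementation: `LazyDataset`, `_fetch`, and `fetch_count` are illustrative names, and the sketch models only in-memory reuse, not the separate `cache=True` response cache.

```python
from functools import cached_property


class LazyDataset:
    """Hypothetical stand-in illustrating lazy, cached loading."""

    def __init__(self, table_code: str) -> None:
        # Construction only records the table code; nothing is fetched yet.
        self.table_code = table_code
        self.fetch_count = 0  # how many times the (simulated) API was hit

    def _fetch(self) -> list:
        # Placeholder for a real request to the CSO API.
        self.fetch_count += 1
        return [{"County": "Dublin", "value": 1}]

    @cached_property
    def _data(self) -> list:
        # Runs on first access only; the result is stored on the instance.
        return self._fetch()

    def df(self) -> list:
        return self._data
```

Constructing the object costs nothing; the first `df()` call triggers one fetch, and later calls reuse the stored result.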
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| table_code | str | The CSO table code (e.g., ‘FY003A’). | required |
| filters | dict | Mapping of dimension names to the values to keep, e.g. {“CensusYear”: [“2022”]}. | None |
| include_ids | str or list[str] | Which ID columns to include in the output. One of: “all”, include the ID column for every dimension; “spatial_only”, include only the ID column for the spatial dimension; “none” (default), exclude all ID columns; or a list of dimension names, include the ID column for each listed dimension (e.g., [“County”, “Sex”] adds “County ID” and “Sex ID”). A ValidationError is raised if the list contains a name that is not a dimension of the dataset. | None |
| drop_filtered_cols | bool | Whether to drop columns for filtered dimensions. | False |
| drop_national_data | bool | Whether to exclude national-level (Ireland) rows. | False |
| convert_dates | bool | Whether to parse temporal columns as datetime. | False |
| sanitise | bool | Whether to sanitise column names for consistency. When True, applies standardised transformations: replacing ‘&’ with ‘and’, normalising slashes and spaces, and applying standard name mappings. | False |
| cache | bool | Whether to cache API responses. Defaults to True. | True |
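The `include_ids` and `sanitise` rules in the table can be sketched as two plain functions. This is a hypothetical illustration, not the library's code: `resolve_id_columns`, `sanitise_name`, `NAME_MAP`, and the local `ValidationError` class are illustrative names; the "<dimension> ID" naming follows the [“County”, “Sex”] example above, and the exact slash and whitespace normalisation shown is an assumption.

```python
from __future__ import annotations

import re


class ValidationError(ValueError):
    """Hypothetical stand-in for the library's ValidationError."""


ID_SUFFIX = " ID"  # "County" -> "County ID", following the table's example


def resolve_id_columns(
    include_ids: str | list[str] | None,
    dimensions: list[str],
    spatial_dim: str | None,
) -> list[str]:
    """Return the ID column names to keep for an include_ids setting."""
    if include_ids is None or include_ids == "none":
        return []
    if include_ids == "all":
        return [f"{d}{ID_SUFFIX}" for d in dimensions]
    if include_ids == "spatial_only":
        return [f"{spatial_dim}{ID_SUFFIX}"] if spatial_dim else []
    if isinstance(include_ids, list):
        # Reject names that are not dimensions of this dataset.
        unknown = [c for c in include_ids if c not in dimensions]
        if unknown:
            raise ValidationError(f"Not dimensions of this dataset: {unknown}")
        return [f"{c}{ID_SUFFIX}" for c in include_ids]
    raise ValidationError(f"Unsupported include_ids value: {include_ids!r}")


# Hypothetical example of a "standard name mapping".
NAME_MAP = {"CensusYear": "Census Year"}


def sanitise_name(name: str) -> str:
    """Apply sanitise-style clean-ups to one column name."""
    name = name.replace("&", "and")
    name = re.sub(r"\s*/\s*", "/", name)  # assumed: drop spaces around slashes
    name = re.sub(r"\s+", " ", name).strip()  # collapse repeated whitespace
    return NAME_MAP.get(name, name)
```

For example, `resolve_id_columns(["County", "Sex"], ["County", "Sex", "CensusYear"], "County")` yields `["County ID", "Sex ID"]`, and `sanitise_name("Births & Deaths")` yields `"Births and Deaths"`.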
Examples
>>> dataset = CSODataset(
... "FY003A",
... filters={"CensusYear": ["2022"], "Sex": ["Both sexes"]},
... include_ids="spatial_only",
... )
>>> df = dataset.df()
>>> gdf = dataset.gdf()
>>> # Include specific ID columns
>>> dataset = CSODataset("FY003A", include_ids=["County", "Sex"])
Methods
| Name | Description |
|---|---|
| describe | Print a summary of the dataset metadata. |
| df | Get the dataset as a DataFrame. |
| gdf | Get the dataset as a GeoDataFrame with spatial data. |
describe
CSODataset.describe()
Print a summary of the dataset metadata.
Displays information about the dataset, including its code, title, variables, units, tags, and other relevant metadata.
Examples
>>> dataset = CSODataset("FY003A")
>>> dataset.describe()
df
CSODataset.df(pivot_format='long', *, copy=True)
Get the dataset as a DataFrame.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| pivot_format | str | The output format for the data. Options: “long” (default), “wide”, “tidy”. | 'long' |
| copy | bool | Whether to return a copy of the cached DataFrame. Defaults to True to prevent accidental mutation of cached data. Set to False for better performance if you won’t modify the result. | True |
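The `copy` parameter guards the cached frame against in-place edits. A minimal sketch of the trade-off, using a hypothetical `CachedFrame` holder rather than `CSODataset` itself:

```python
import pandas as pd


class CachedFrame:
    """Hypothetical holder showing why copy=True protects a cached DataFrame."""

    def __init__(self, frame: pd.DataFrame) -> None:
        self._cached = frame

    def df(self, *, copy: bool = True) -> pd.DataFrame:
        # copy=True hands out an independent DataFrame; copy=False returns
        # the cached object itself, so edits to it change the cache.
        return self._cached.copy() if copy else self._cached
```

With `copy=False` you avoid duplicating the frame, but any in-place change (such as assigning through `.loc`) is written into the cache and seen by every later call.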
Returns
| Name | Type | Description |
|---|---|---|
| | pd.DataFrame | The dataset as a pandas DataFrame. |
Examples
>>> df = dataset.df("wide")
gdf
CSODataset.gdf(pivot_format='long', *, copy=True)
Get the dataset as a GeoDataFrame with spatial data.
The returned GeoDataFrame contains all rows from the dataset, including aggregate regions (e.g., “State”, “Leinster”) that may not have corresponding geometries in the spatial data. These rows will have null (None) geometries.
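One way aggregate rows end up with null geometries is a left join of the data onto a boundary table: every data row is kept, and rows with no matching boundary get a missing geometry. The sketch below assumes such a join keyed on a region-name column; the source does not say how `gdf()` performs it, the data are made up, and WKT strings stand in for shapely geometries so that only pandas is needed. In a real GeoDataFrame the missing values would typically appear as None rather than NaN; `isna()` catches both.

```python
import pandas as pd

# Hypothetical data: CSO rows include aggregates such as "State" and "Leinster".
data = pd.DataFrame({
    "County": ["Dublin", "Cork", "Leinster", "State"],
    "value": [10, 8, 25, 60],
})

# Hypothetical boundaries: only individual counties have geometries.
boundaries = pd.DataFrame({
    "County": ["Dublin", "Cork"],
    "geometry": ["POLYGON ((0 0, 1 0, 1 1, 0 0))", "POLYGON ((2 2, 3 2, 3 3, 2 2))"],
})

# A left join keeps every data row; unmatched rows get a missing geometry.
joined = data.merge(boundaries, on="County", how="left")
missing = joined[joined["geometry"].isna()]
```

Here `missing` holds the "Leinster" and "State" rows, mirroring the null-geometry check in the examples below.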
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| pivot_format | str | The output format for the data. Options: “long” (default), “wide”, “tidy”. | 'long' |
| copy | bool | Whether to return a copy of the cached GeoDataFrame. Defaults to True to prevent accidental mutation of cached data. Set to False for better performance if you won’t modify the result. | True |
Returns
| Name | Type | Description |
|---|---|---|
| | gpd.GeoDataFrame | The dataset as a GeoDataFrame with a geometry column. Rows for aggregate regions without spatial boundaries have null geometries. |
Examples
>>> gdf = dataset.gdf()
>>> gdf.plot(column="value")
>>> # Check for null geometries
>>> gdf[gdf.geometry.isna()]