Tutorial

This tutorial gives a short overview of what the pycsodata package can do, using the dataset NDQ05 as a running example.

Installing the package

Installation is via pip:

pip install pycsodata

Finding datasets in the catalogue

We can get a DataFrame listing all CSO datasets, sorted by the date they were last updated, as follows:

from pycsodata import CSOCatalogue

cat = CSOCatalogue()

# Load catalogue's entire table of contents
toc = cat.toc()
toc.head()
   Code      Title                                              Variables                                          Time Variable  Date Range   Updated     Organisation                        Exceptional
0  TFA13     Road Freight Transport Activity                    [Year, Main use of Vehicle, Axle Configuration...  Year           2006 - 2024  2026-02-03  Central Statistics Office, Ireland  False
1  TFA23     Road Freight Activity                              [Year, NST Group]                                  Year           2009 - 2024  2026-02-03  Central Statistics Office, Ireland  False
2  TALISA51  Early Learning & Care (ELC) workers preferred ...  [Year, Group, Preferred number of weeks worked...  Year           2024         2026-02-03  Central Statistics Office, Ireland  False
3  TALISA50  Early Learning & Care (ELC) workers current nu...  [Year, Group, Current number of weeks worked p...  Year           2024         2026-02-03  Central Statistics Office, Ireland  False
4  TALISA49  Early Learning & Care (ELC) workers preferred ...  [Year, Group, Preferred working hours]             Year           2024         2026-02-03  Central Statistics Office, Ireland  False

Now, suppose that we are interested in finding datasets on new dwelling completions at county level. We can search for these in the catalogue as follows:

# Search the catalogue by its various fields
results = cat.search(
    title="new dwelling",
    variables="county OR counties OR 'local authority'")
results.head()
   Code             Title                                              Variables                                          Time Variable  Date Range       Updated     Organisation                        Exceptional
0  NDQ06            New Dwelling Completion                            [Quarter, Type of House, Local Authority]          Quarter        2011Q1 - 2025Q4  2026-01-29  Central Statistics Office, Ireland  False
1  NDQ05            New Dwelling Completion                            [Quarter, Local Authority]                         Quarter        2011Q1 - 2025Q4  2026-01-29  Central Statistics Office, Ireland  False
2  BHQ17            Planning Permissions Granted for New Houses an...  [Quarter, County, Type of Dwelling]                Quarter        1975Q1 - 2025Q3  2025-12-10  Central Statistics Office, Ireland  False
3  BHA14            Planning Permissions Granted for New Houses an...  [Year, Type of Dwelling, County]                   Year           1975 - 2024      2025-03-12  Central Statistics Office, Ireland  False
4  SAP2022T6T10CTY  Households with Renewable Energy Source            [Census Year, Administrative Counties, Renewab...  Census Year    2022             2023-09-15  Central Statistics Office, Ireland  False
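
Since the search results are shown above as a table with named columns, they can presumably be narrowed further with ordinary pandas operations. The sketch below rebuilds a few rows of that result by hand (three columns only) so that it runs without network access; with the real package, the `results` object returned by `cat.search(...)` would take the place of the hand-built frame.

```python
import pandas as pd

# Hand-built stand-in for part of the search results shown above.
# Only three of the columns are reproduced here.
results = pd.DataFrame({
    "Code": ["NDQ06", "NDQ05", "BHQ17", "BHA14"],
    "Title": [
        "New Dwelling Completion",
        "New Dwelling Completion",
        "Planning Permissions Granted for New Houses an...",
        "Planning Permissions Granted for New Houses an...",
    ],
    "Time Variable": ["Quarter", "Quarter", "Quarter", "Year"],
})

# Keep only quarterly tables whose title mentions completions
is_quarterly = results["Time Variable"] == "Quarter"
is_completion = results["Title"].str.contains("completion", case=False)
codes = results.loc[is_quarterly & is_completion, "Code"].tolist()
print(codes)
```

This leaves the two table codes NDQ06 and NDQ05, the second of which is used in the rest of this tutorial.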

Loading datasets

We can load the dataset with table code NDQ05 and print its metadata as follows:

from pycsodata import CSODataset

# Load the CSO dataset with code "NDQ05"
ds = CSODataset("NDQ05")

# Print its metadata
ds.describe()
Code:                NDQ05
Title:               New Dwelling Completion

Variables:           [1] Statistic
                        (1) New Dwelling Completion
                            Unit: Number
                     [2] Quarter
                     [3] Local Authority

Tags:                Official Statistics, Geographic Data
Time Variable:       Quarter
Geographic Variable: Local Authority

Last Updated:        2026-01-29
Reason for Release:  Planned Routine Revision

Notes:             * Classification into local authorities has taken into account
                     boundary changes between Cork City and Cork County which came
                     into effect in May 2019. All data within this table for all
                     quarters is now based on these new boundaries.
                   * For more information, please go to the statistical release page
                     (https://www.cso.ie/en/statistics/construction/newdwellingcompletions/)
                     on our website.

Contact Name:        Steven Conroy
Contact Email:       housing@cso.ie
Contact Phone:       (+353) 1 498 4311
Copyright:           Central Statistics Office, Ireland (https://www.cso.ie/)

We can then load this dataset into a pandas DataFrame:

df = ds.df()
df.head()
   Statistic                Quarter  Local Authority        value
0  New Dwelling Completion  2011Q1   Ireland               1875.0
1  New Dwelling Completion  2011Q1   Cork City Council       39.0
2  New Dwelling Completion  2011Q1   Clare County Council    52.0
3  New Dwelling Completion  2011Q1   Cavan County Council    61.0
4  New Dwelling Completion  2011Q1   Cork County Council    168.0

Pivoting

The df() method accepts three pivot_format options: "long", "wide", and "tidy". The default is "long", which keeps Statistic and the Time Variable (typically Year, Quarter, or similar) as ordinary columns and stores every observation in a single value column.

Alternatively, pivot_format="wide" can be used to give each time period its own column:

from pycsodata import CSODataset

# Load the CSO dataset with code "NDQ05"
ds = CSODataset("NDQ05")

df = ds.df("wide")
df.head()
   Statistic                Local Authority         2011Q1   2011Q2   2011Q3   2011Q4   2012Q1   2012Q2   2012Q3   2012Q4  ...   2023Q3   2023Q4   2024Q1   2024Q2   2024Q3   2024Q4   2025Q1   2025Q2   2025Q3   2025Q4
0  New Dwelling Completion  Ireland                 1875.0   1791.0   1687.0   1641.0   1131.0   1117.0   1205.0   1458.0  ...   8393.0  10207.0   5797.0   6813.0   8878.0   8659.0   5914.0   9163.0   9213.0  11994.0
1  New Dwelling Completion  Cork City Council         39.0     36.0     51.0     25.0     20.0     27.0     26.0     22.0  ...    156.0    319.0    237.0    301.0    306.0    376.0    278.0    327.0    495.0    399.0
2  New Dwelling Completion  Clare County Council      52.0     49.0     47.0     57.0     44.0     38.0     39.0     39.0  ...     95.0    123.0    127.0    102.0    159.0    146.0    129.0    187.0    145.0    164.0
3  New Dwelling Completion  Cavan County Council      61.0     28.0     45.0     37.0     20.0     23.0     23.0     33.0  ...     92.0     70.0     50.0     69.0     60.0     65.0     57.0     80.0    115.0     98.0
4  New Dwelling Completion  Cork County Council      168.0    200.0    164.0    194.0    122.0    115.0    129.0    148.0  ...    539.0    522.0    458.0    532.0    678.0    647.0    442.0    512.0    544.0    730.0

5 rows × 62 columns

Alternatively, pivot_format="tidy" can be used to give each Statistic its own column:

df = ds.df("tidy")
df.head()
   Quarter  Local Authority       New Dwelling Completion
0  2011Q1   Ireland                                1875.0
1  2011Q1   Cork City Council                        39.0
2  2011Q1   Clare County Council                     52.0
3  2011Q1   Cavan County Council                     61.0
4  2011Q1   Cork County Council                     168.0
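
To make the relationship between the three layouts concrete, the sketch below reproduces them with plain pandas on a handful of values copied from the tables above. It is only an illustration of the reshaping, not a description of how pycsodata builds these frames internally, and the row order it produces (sorted by the pivot index) may differ from the package's output.

```python
import pandas as pd

# A few rows in the default "long" layout (values copied from the tables above)
long_df = pd.DataFrame({
    "Statistic": ["New Dwelling Completion"] * 4,
    "Quarter": ["2011Q1", "2011Q2", "2011Q1", "2011Q2"],
    "Local Authority": ["Ireland", "Ireland",
                        "Cork City Council", "Cork City Council"],
    "value": [1875.0, 1791.0, 39.0, 36.0],
})

# "wide"-like layout: one column per time period
wide_df = long_df.pivot(
    index=["Statistic", "Local Authority"], columns="Quarter", values="value"
).reset_index()
wide_df.columns.name = None  # drop the leftover "Quarter" axis label

# "tidy"-like layout: one column per statistic
tidy_df = long_df.pivot(
    index=["Quarter", "Local Authority"], columns="Statistic", values="value"
).reset_index()
tidy_df.columns.name = None  # drop the leftover "Statistic" axis label

print(wide_df)
print(tidy_df)
```

With a single statistic, as in NDQ05, the tidy layout simply renames the value column; the difference becomes more visible for tables that carry several statistics.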

Including ID codes

The include_ids argument controls which dimensions of the dataset get an additional ID code column. The options are "none" (the default), "all", "spatial_only" (which applies only to datasets that include spatial data), or a list of the columns for which IDs should be included. For example:

from pycsodata import CSODataset

# Include all ID code columns
ds = CSODataset("NDQ05", include_ids="all")

df = ds.df()
df.head()
   Statistic                Statistic ID  Quarter  Quarter ID  Local Authority       Local Authority ID                     value
0  New Dwelling Completion  NDQ05         2011Q1   20111       Ireland               -                                     1875.0
1  New Dwelling Completion  NDQ05         2011Q1   20111       Cork City Council     2ae19629-1434-13a3-e055-000000000001    39.0
2  New Dwelling Completion  NDQ05         2011Q1   20111       Clare County Council  2ae19629-14a2-13a3-e055-000000000001    52.0
3  New Dwelling Completion  NDQ05         2011Q1   20111       Cavan County Council  2ae19629-149d-13a3-e055-000000000001    61.0
4  New Dwelling Completion  NDQ05         2011Q1   20111       Cork County Council   2ae19629-14a3-13a3-e055-000000000001   168.0
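
One plausible use for the ID columns is as join keys, since an identifier is less likely than a display label to vary in spelling between tables. The following sketch merges two small hand-built frames on Local Authority ID, using IDs and values copied from the outputs above; the frame names and the "change" column are illustrative only.

```python
import pandas as pd

# IDs copied from the include_ids="all" output above
ids = {
    "Cork City Council": "2ae19629-1434-13a3-e055-000000000001",
    "Clare County Council": "2ae19629-14a2-13a3-e055-000000000001",
    "Cavan County Council": "2ae19629-149d-13a3-e055-000000000001",
    "Cork County Council": "2ae19629-14a3-13a3-e055-000000000001",
}

# Completions in the first and the latest quarter (from the wide table)
first = pd.DataFrame({
    "Local Authority ID": list(ids.values()),
    "Local Authority": list(ids.keys()),
    "2011Q1": [39.0, 52.0, 61.0, 168.0],
})
latest = pd.DataFrame({
    "Local Authority ID": list(ids.values()),
    "2025Q4": [399.0, 164.0, 98.0, 730.0],
})

# Join on the ID rather than the name; validate guards against duplicate keys
merged = first.merge(latest, on="Local Authority ID", validate="one_to_one")
merged["change"] = merged["2025Q4"] - merged["2011Q1"]
print(merged[["Local Authority", "2011Q1", "2025Q4", "change"]])
```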

Dropping national aggregates

The drop_national_data argument attempts to drop national aggregate rows. This may not work for every dataset, but in this case the rows with Local Authority "Ireland" are dropped successfully:

from pycsodata import CSODataset

# Drop rows corresponding to national data
ds = CSODataset("NDQ05", drop_national_data=True)

df = ds.df()
df.head()
   Statistic                Quarter  Local Authority        value
0  New Dwelling Completion  2011Q1   Cork City Council       39.0
1  New Dwelling Completion  2011Q1   Clare County Council    52.0
2  New Dwelling Completion  2011Q1   Cavan County Council    61.0
3  New Dwelling Completion  2011Q1   Cork County Council    168.0
4  New Dwelling Completion  2011Q1   Carlow County Council   17.0
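
If drop_national_data does not catch the national rows for a particular table, they can be removed with plain pandas once the DataFrame is loaded. The sketch below uses rows copied from the earlier output. The NATIONAL_LABELS set is a hypothetical helper, not part of pycsodata, and the label used for the national row may differ in other tables.

```python
import pandas as pd

# A few rows copied from the default output of NDQ05
df = pd.DataFrame({
    "Statistic": ["New Dwelling Completion"] * 3,
    "Quarter": ["2011Q1"] * 3,
    "Local Authority": ["Ireland", "Cork City Council", "Clare County Council"],
    "value": [1875.0, 39.0, 52.0],
})

# Hypothetical helper: labels that mark a national aggregate row.
# "Ireland" is what NDQ05 uses; other tables may need other labels.
NATIONAL_LABELS = {"Ireland"}

# Keep every row whose Local Authority is not a national label
local_only = df[~df["Local Authority"].isin(NATIONAL_LABELS)].reset_index(drop=True)
print(local_only)
```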

Applying filters

The data can be filtered on any of its dimensions by passing the filters argument, a dictionary that maps each dimension name to the list of values to keep:

from pycsodata import CSODataset

# Filter the data
ds = CSODataset("NDQ05", filters={
        "Quarter":["2025Q1", "2025Q2", "2025Q3", "2025Q4"],
        "Local Authority":["Wicklow County Council"]})
df = ds.df()
df.head()
   Statistic                Quarter  Local Authority         value
0  New Dwelling Completion  2025Q1   Wicklow County Council  305.0
1  New Dwelling Completion  2025Q2   Wicklow County Council  273.0
2  New Dwelling Completion  2025Q3   Wicklow County Council  336.0
3  New Dwelling Completion  2025Q4   Wicklow County Council  466.0
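
When a filter covers a run of consecutive quarters, the list of labels can be generated rather than typed out. This is a minimal sketch using pandas; it assumes the filter values are the same "2025Q1"-style labels that appear in the Quarter column, as in the example above.

```python
import pandas as pd

# Generate the quarter labels for 2025 instead of listing them by hand
quarters = [str(p) for p in pd.period_range("2025Q1", "2025Q4", freq="Q")]

# Same structure as the filters dictionary used above
filters = {
    "Quarter": quarters,
    "Local Authority": ["Wicklow County Council"],
}
print(filters)
```

The resulting dictionary could then be passed as filters=filters when constructing the CSODataset.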

The drop_filtered_cols argument can be used to drop the filtered columns when they no longer provide useful information:

# Filter the data and drop filtered columns
ds = CSODataset("NDQ05", filters={
        "Local Authority":["Wicklow County Council"]
        }, drop_filtered_cols=True)
df = ds.df()
df.head()
   Statistic                Quarter  value
0  New Dwelling Completion  2011Q1    74.0
1  New Dwelling Completion  2011Q2    37.0
2  New Dwelling Completion  2011Q3    60.0
3  New Dwelling Completion  2011Q4    47.0
4  New Dwelling Completion  2012Q1    23.0

Working with dates and times

By default, all columns except the value column contain strings (object dtype). The Time Variable column can be converted to a pandas date-like dtype (a quarterly period dtype in this example) by setting convert_dates=True when loading a dataset:

from pycsodata import CSODataset

ds = CSODataset("NDQ05", drop_national_data=True, convert_dates=True)
df = ds.df("tidy")
df.dtypes
Quarter                    period[Q-DEC]
Local Authority                   object
New Dwelling Completion          float64
dtype: object

This allows us to easily perform additional manipulations, like extracting the Year from the Quarter, and then summing all new dwelling completions in each year:

df["Year"] = df["Quarter"].dt.year
df_grouped = df.groupby('Year')['New Dwelling Completion'].sum().reset_index()
df_grouped.head()
   Year  New Dwelling Completion
0  2011                   6994.0
1  2012                   4911.0
2  2013                   4575.0
3  2014                   5518.0
4  2015                   7219.0
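
For readers who want to see the same steps without the package, the sketch below performs a comparable conversion with plain pandas on the national ("Ireland") figures copied from the wide table earlier in this tutorial. It is an approximation of what convert_dates=True appears to do for quarterly data, not the package's own code.

```python
import pandas as pd

# National ("Ireland") figures for 2011 and 2012, copied from the wide table
national = pd.DataFrame({
    "Quarter": ["2011Q1", "2011Q2", "2011Q3", "2011Q4",
                "2012Q1", "2012Q2", "2012Q3", "2012Q4"],
    "New Dwelling Completion": [1875.0, 1791.0, 1687.0, 1641.0,
                                1131.0, 1117.0, 1205.0, 1458.0],
})

# Parse labels such as "2011Q1" into quarterly periods
national["Quarter"] = pd.PeriodIndex(national["Quarter"].tolist(), freq="Q")

# Extract the year and total the completions per year
national["Year"] = national["Quarter"].dt.year
yearly = national.groupby("Year")["New Dwelling Completion"].sum().reset_index()
print(yearly)
```

The two yearly totals, 6994.0 and 4911.0, agree with the first two rows of the grouped output above, which were obtained by summing over the individual local authorities.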

We can then plot this:

import matplotlib.pyplot as plt

plt.plot(df_grouped['Year'], df_grouped['New Dwelling Completion'], marker='o')
plt.title('New Dwelling Completions in Ireland by Year')
plt.xlabel('Year')
plt.ylabel('Number of Completions')
plt.show()

Line plot showing total annual new dwelling completions in Ireland for each year between 2011 and 2025

Loading spatial data

Since the output of ds.describe() for NDQ05 includes the tag Geographic Data, we can get a geopandas GeoDataFrame for this dataset as follows:

from pycsodata import CSODataset

# Get data just for most recent quarter as a GeoDataFrame
ds = CSODataset("NDQ05",
                filters={"Quarter":["2025Q4"]},
                drop_filtered_cols=True,
                drop_national_data=True,
                convert_dates=True
                )
gdf = ds.gdf("tidy")
gdf.head()
   Local Authority        New Dwelling Completion  geometry
0  Cork City Council                        399.0  POLYGON ((-8.38406 51.90423, -8.38062 51.90335...
1  Clare County Council                     164.0  MULTIPOLYGON (((-8.31674 52.98514, -8.30983 52...
2  Cavan County Council                      98.0  POLYGON ((-7.75102 54.10173, -7.75103 54.10212...
3  Cork County Council                      730.0  MULTIPOLYGON (((-8.18321 52.28768, -8.17858 52...
4  Carlow County Council                    101.0  POLYGON ((-6.9719 52.80941, -6.97141 52.80927,...

We can then easily plot the data on a choropleth map:

import matplotlib.pyplot as plt

gdf.plot(column='New Dwelling Completion', legend=True, cmap='OrRd')
plt.title('New Dwelling Completions by Local Authority - 2025 Q4')
plt.show()

Choropleth map of Ireland showing total new dwelling completions per Local Authority (County Council) for 2025 Q4.

Managing the cache

The cache, which is shared across all CSODataset and CSOCatalogue instances, is managed through the CSOCache class:

from pycsodata import CSOCache

cache = CSOCache()

cache.info()
CacheInfo(size=4, maxsize=256, ttl_seconds=86400, hit_rate=77.8%)

To flush the cache, simply call .flush():

cache.flush()
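
The fields reported by cache.info() (size, maxsize, ttl_seconds, hit_rate) are typical of a size-limited cache whose entries expire after a fixed time. The toy class below illustrates what those terms generally mean. It is a hypothetical sketch written for this tutorial: the class name, its methods other than flush, and the least-recently-used eviction policy are all assumptions, and it is not pycsodata's implementation.

```python
import time
from collections import OrderedDict


class TinyTTLCache:
    """Illustrative only: NOT the cache used by pycsodata."""

    def __init__(self, maxsize=256, ttl_seconds=86400, clock=time.monotonic):
        self.maxsize = maxsize          # maximum number of entries kept
        self.ttl_seconds = ttl_seconds  # lifetime of each entry
        self._clock = clock
        self._store = OrderedDict()     # key -> (expiry_time, value)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        entry = self._store.get(key)
        if entry is not None and entry[0] > self._clock():
            self._store.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return entry[1]
        self._store.pop(key, None)        # drop an expired entry, if any
        self.misses += 1
        return None

    def set(self, key, value):
        self._store[key] = (self._clock() + self.ttl_seconds, value)
        self._store.move_to_end(key)
        while len(self._store) > self.maxsize:
            self._store.popitem(last=False)  # evict the least recently used

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

    def flush(self):
        self._store.clear()
        self.hits = self.misses = 0


cache = TinyTTLCache(maxsize=2, ttl_seconds=60)
cache.set("NDQ05", "data")
first = cache.get("NDQ05")   # found and not expired: counts as a hit
second = cache.get("NDQ06")  # never stored: counts as a miss
print(first, second, cache.hit_rate())
```

In this reading, a hit rate such as the 77.8% shown above would mean that roughly seven in nine lookups were answered from the cache rather than fetched again.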