What and Why?
The pseudonymisation solution of Research Services is being updated in phases during 2025–2026. The reasons for the update are changes to Statistics Finland’s internal pseudo identifiers and the need to strengthen data protection. The new pseudo identifiers will provide significantly better protection and allow more flexibility in the use of special characters and identifier lengths.
The renewal work has been divided into two phases:
- The first phase, which started earlier this year, covered the pseudonymisation of company identifiers and other identifiers.
- We are now in the second phase, which focuses on protecting personal identifiers.
Centralised Work by Research Services
Research Services will centrally replace identifiers in continuously updated ready-made datasets. This work began on 5 December, and re-pseudonymisation is being carried out as quickly as possible.
For users, the change looks like this: files in the dataset folders are replaced with new versions of the same name, in which the old shnro identifier has been replaced with the new hid_e identifier. The old version is moved to the shnro_suojattu subfolder, where it remains available for the transition period, until 30 April 2026.
What Researchers Need to Do Themselves
Manually delivered datasets (such as customised datasets, non-continuous ready-made datasets, extracts from ready-made datasets, external datasets, and researchers’ own work files) unfortunately cannot be processed centrally because of limited resources. Research projects must therefore replace the pseudo identifiers in these datasets themselves. Link tables will be provided for this purpose during the transition period.
Workload
We have received extensive feedback from researchers about the scale and complexity of the task. We fully understand the frustration caused by the additional time-consuming work. Concerns about limited storage space are also valid. We are currently exploring ways to reduce the workload for research projects. We welcome ideas and feedback on this matter and apologise for the inconvenience caused.
Practical Guidance
Instructions for Changes
Link tables will be made available to projects in FIONA. With them, projects can replace old identifiers with new ones on the W-drive, after which the dataset can be transferred to the D-drive on request by email. Separate instructions for re-pseudonymisation are available in the FIONA folder D:\keys.
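As a rough illustration of the replacement step, the sketch below streams a CSV file row by row and swaps the old identifier column for the new one using a link table. It is not the official procedure described in the D:\keys instructions. It assumes, hypothetically, that both the dataset and the link table are CSV files with columns literally named shnro and hid_e; the function names are made up for this example.

```python
import csv
import io


def load_link_table(link_file):
    """Return a dict mapping old shnro values to new hid_e values.

    Assumes (hypothetically) a CSV link table with columns 'shnro' and 'hid_e'.
    """
    lookup = {}
    for row in csv.DictReader(link_file):
        lookup[row["shnro"]] = row["hid_e"]
    return lookup


def repseudonymise(in_file, out_file, lookup, old_col="shnro", new_col="hid_e"):
    """Copy rows from in_file to out_file, replacing old_col with new_col.

    Rows are processed one at a time, so the whole dataset is never held in
    memory. Returns the old identifiers that had no match in the link table.
    """
    reader = csv.DictReader(in_file)
    # Keep the column order, but rename the identifier column in the header.
    fieldnames = [new_col if name == old_col else name for name in reader.fieldnames]
    writer = csv.DictWriter(out_file, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    unmatched = []
    for row in reader:
        old_id = row.pop(old_col)
        new_id = lookup.get(old_id)
        if new_id is None:
            unmatched.append(old_id)
            new_id = ""  # leave the identifier empty rather than guess
        row[new_col] = new_id
        writer.writerow(row)
    return unmatched


# Small in-memory demonstration with made-up identifiers.
link = io.StringIO("shnro,hid_e\nA1,X9\nA2,X7\n")
data = io.StringIO("shnro,year,income\nA1,2020,100\nA3,2020,200\n")
result = io.StringIO()
missing = repseudonymise(data, result, load_link_table(link))
```

Returning the unmatched identifiers instead of silently dropping rows makes it easier to check the result before asking for the dataset to be transferred to the D-drive.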
Duplicate Rows in Link Tables
The shnro-hid_e link file contains about 700 hid_e identifiers that are each linked to two different shnro identifiers. These are exceptions, typically caused by changes to personal identity codes (hetu) that were not updated in time. Such cases are corrected continuously, but some corrections will always be pending. For now, it is advisable to look up hid_e by shnro (rather than the other way around), as this should yield only one match. We are investigating additional solutions to this issue.
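The lookup direction can be checked with a short sketch like the one below. It assumes, hypothetically, that the link table has been read into a list of dictionaries with the keys shnro and hid_e; the function names are illustrative only. The first function lists the hid_e values that are linked to more than one shnro, and the second builds the shnro-to-hid_e lookup and stops if a shnro ever maps to two different hid_e values.

```python
from collections import defaultdict


def find_ambiguous_hid_e(link_rows):
    """Return {hid_e: sorted shnro values} for hid_e linked to more than one shnro."""
    seen = defaultdict(set)
    for row in link_rows:
        seen[row["hid_e"]].add(row["shnro"])
    return {hid: sorted(olds) for hid, olds in seen.items() if len(olds) > 1}


def shnro_to_hid_e(link_rows):
    """Build the shnro -> hid_e lookup, raising if a shnro has two hid_e values."""
    lookup = {}
    for row in link_rows:
        old, new = row["shnro"], row["hid_e"]
        # setdefault stores the first hid_e seen; a different later value is an error.
        if lookup.setdefault(old, new) != new:
            raise ValueError(f"shnro {old!r} maps to more than one hid_e")
    return lookup


# Made-up rows: X1 is linked to two shnro values, as in the exceptional cases.
rows = [
    {"shnro": "A1", "hid_e": "X1"},
    {"shnro": "A2", "hid_e": "X1"},
    {"shnro": "A3", "hid_e": "X3"},
]
ambiguous = find_ambiguous_hid_e(rows)
lookup = shnro_to_hid_e(rows)
```

Looking up by shnro still works for the duplicated cases: both old identifiers simply resolve to the same hid_e.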
Annual Foldering Brings Further Code Changes
As previously announced, the data protection project of Research Services will introduce annual foldering of ready-made datasets in spring 2026. Annual foldering means that datasets suitable for division by year will be organised into yearly folders in FIONA. This enables granting access rights only for the period relevant to the research, without additional costs to projects.
With annual foldering, the folder locations of all ready-made datasets in FIONA will change, so research projects will need to adjust their analysis code. In addition, some datasets previously available as continuous time series will be split into separate yearly files, which will also affect analysis code. More detailed information on these changes and the new dataset locations will be provided early in the year.
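One way to prepare analysis code for this kind of change is to keep dataset locations in a single place and to read yearly files through a small helper, so that a new folder layout only requires one edit. The sketch below is an illustration under assumptions: the actual folder structure and file names have not been announced, so the layout used here (root/dataset/year/dataset_year.csv) and the function names are purely hypothetical.

```python
import csv
from pathlib import Path


def yearly_files(root, dataset, years, pattern="{dataset}_{year}.csv"):
    """Return the existing yearly files for a dataset, in the order of `years`.

    The layout root/dataset/year/file is a hypothetical placeholder.
    Years without a file (for example, outside the granted period) are skipped.
    """
    paths = []
    for year in years:
        name = pattern.format(dataset=dataset, year=year)
        path = Path(root) / dataset / str(year) / name
        if path.exists():
            paths.append(path)
    return paths


def read_years(paths):
    """Yield rows from each yearly file in turn, like one continuous series."""
    for path in paths:
        with open(path, newline="", encoding="utf-8") as handle:
            yield from csv.DictReader(handle)
```

Analysis scripts would then call something like read_years(yearly_files(root, "example_dataset", range(2015, 2021))), where the dataset name is again a placeholder, instead of hard-coding a path to a single time-series file.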
Storage Capacity
Re-pseudonymisation of datasets is carried out on the W-drive, which has limited capacity. Large datasets have therefore caused difficulties for research projects. We aim to transfer re-pseudonymised datasets quickly from the W-drive to the D-drive to free up space, and we are exploring options to increase storage capacity.
Timeline
The deadline for the pseudonymisation update is 30 April 2026. By this date, all identifiers must be converted to new ones.
Datasets protected with old shnro identifiers will no longer be produced by Research Services, meaning no new data will be delivered to FIONA’s shnro_suojattu folders.
Annual foldering is scheduled for February 2026, with a planned transition period between the old and new folder structures during spring 2026.