[2506.13792] ICE-ID: A Novel Historical Census Dataset for Longitudinal Identity Resolution

Summary

ICE-ID is a historical census dataset of 984,028 records from 16 Icelandic census waves (1703-1920). It is built as a benchmark for longitudinal identity resolution: deciding which records, taken decades apart, refer to the same person.

Why It Matters

Standard entity-resolution benchmarks focus on product matching and citation data. ICE-ID adds challenges those datasets do not capture: hierarchical geography, patronymic naming, sparse kinship links, and drift in records over many decades. That makes it a more realistic test bed for researchers and practitioners building models that must identify the same person across time.

Key Takeaways

  • ICE-ID includes 984,028 records from 16 Icelandic census waves spanning 220 years (1703-1920), with 226,864 expert-curated person identifiers.
  • The dataset addresses unique challenges like hierarchical geography and patronymic naming conventions.
  • It provides tools for interactive exploration and analysis of identity resolution.
  • Baseline model comparisons are included to benchmark performance against classical datasets.
  • The dataset is publicly available for research and development purposes.
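
To make the geography and naming points concrete, here is a minimal sketch of how one census record might be represented. It is an illustration under assumptions, not the dataset's actual schema: the class `CensusRecord`, its field names, and the helpers `split_patronymic` and `shared_geo_depth` are all hypothetical. The suffix rule is a simplification of Icelandic patronymics, where a surname is typically a parent's name in the genitive plus "son" or "dóttir".

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class CensusRecord:
    """Hypothetical record layout; field names are assumptions, not ICE-ID's schema."""
    record_id: int
    census_year: int
    given_name: str
    patronymic: str
    farm: str
    parish: str
    district: str
    county: str
    # Kinship links are sparse, so each one may be missing.
    father_id: Optional[int] = None
    mother_id: Optional[int] = None
    partner_id: Optional[int] = None

    def geo_path(self, depth: int = 4) -> Tuple[str, ...]:
        """Return the location from coarsest to finest level, cut at `depth`."""
        return (self.county, self.district, self.parish, self.farm)[:depth]


# Longer suffixes come first so "dóttir" is tried before "son".
PATRONYMIC_SUFFIXES = ("dóttir", "dottir", "son")


def split_patronymic(surname: str) -> Optional[Tuple[str, str]]:
    """Split a surname into (stem, suffix), or return None if no suffix matches.

    The stem is a genitive form (e.g. "Jóns"), not the parent's nominative name.
    """
    lowered = surname.lower()
    for suffix in PATRONYMIC_SUFFIXES:
        if lowered.endswith(suffix) and len(surname) > len(suffix):
            return surname[: -len(suffix)], suffix
    return None


def shared_geo_depth(a: CensusRecord, b: CensusRecord) -> int:
    """Count how many levels, starting from county, two records have in common."""
    depth = 0
    for level_a, level_b in zip(a.geo_path(), b.geo_path()):
        if level_a != level_b:
            break
        depth += 1
    return depth
```

A matcher could use `shared_geo_depth` as a soft feature, since a person who moves between farms often stays within the same parish or district. Because many unrelated people share a patronymic stem, the stem alone is weak evidence of identity.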

Computer Science > Artificial Intelligence

arXiv:2506.13792 (cs)
[Submitted on 11 Jun 2025 (v1), last revised 23 Feb 2026 (this version, v2)]

Title: ICE-ID: A Novel Historical Census Dataset for Longitudinal Identity Resolution

Authors: Gonçalo Hora de Carvalho, Lazar S. Popov, Sander Kaatee, Mário S. Correia, Kristinn R. Thórisson, Tangrui Li, Pétur Húni Björnsson, Eiríkur Smári Sigurðarson, Jilles S. Dibangoye

Abstract: We introduce ICE-ID, a benchmark dataset comprising 984,028 records from 16 Icelandic census waves spanning 220 years (1703-1920), with 226,864 expert-curated person identifiers. ICE-ID combines hierarchical geography (farm → parish → district → county), patronymic naming conventions, sparse kinship links (partner, father, mother), and multi-decadal temporal drift, challenges not captured by standard product-matching or citation datasets. This paper presents an artifact-backed analysis of temporal coverage, missingness, identifier ambiguity, candidate-generation efficiency, and cluster distributions, and situates ICE-ID against classical ER benchmarks (Abt-Buy, Amazon-Google, DBLP-ACM, DBLP-Scholar, Walmart-Amazon, iTunes-Amazon, Beer, Fodors-Zagats). We also define a deployment-faithful temporal OOD protocol and release the dataset, splits, reg...
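
The abstract mentions candidate-generation efficiency and a temporal OOD protocol without giving details in this excerpt. The sketch below shows one common way such ideas are expressed in entity-resolution work, and it should be read as an assumption rather than the paper's method. It blocks records on a name key, scores the blocking with reduction ratio and pair completeness, and splits records by census year. The toy records, field names, and function names are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Toy records; every field name and value is made up for illustration.
records = [
    {"rid": 1, "year": 1801, "first": "Jón", "patronymic": "Jónsson", "person": "p1"},
    {"rid": 2, "year": 1816, "first": "Jón", "patronymic": "Jónsson", "person": "p1"},
    {"rid": 3, "year": 1816, "first": "Guðrún", "patronymic": "Jónsdóttir", "person": "p2"},
    {"rid": 4, "year": 1835, "first": "Guðrún", "patronymic": "Jónsdóttir", "person": "p2"},
    {"rid": 5, "year": 1835, "first": "Jón", "patronymic": "Jónsson", "person": "p3"},
]


def pairs_within_groups(recs, key_fn):
    """Group records by key_fn and return all unordered record-id pairs per group."""
    groups = defaultdict(list)
    for rec in recs:
        groups[key_fn(rec)].append(rec["rid"])
    pairs = set()
    for ids in groups.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def blocking_key(rec):
    """Assumed blocking key: lower-cased given name plus patronymic."""
    return (rec["first"].lower(), rec["patronymic"].lower())


def blocking_metrics(recs):
    """Return (reduction_ratio, pair_completeness) for the blocking key above."""
    n = len(recs)
    all_pairs = n * (n - 1) // 2
    candidates = pairs_within_groups(recs, blocking_key)
    truth = pairs_within_groups(recs, lambda rec: rec["person"])
    # Share of the full pairwise comparison space that blocking avoids.
    reduction_ratio = 1 - len(candidates) / all_pairs if all_pairs else 0.0
    # Share of true matching pairs that survive blocking.
    pair_completeness = len(candidates & truth) / len(truth) if truth else 1.0
    return reduction_ratio, pair_completeness


def temporal_split(recs, cutoff_year):
    """Earlier waves go to train, later waves to test (an assumed protocol)."""
    train = [rec for rec in recs if rec["year"] <= cutoff_year]
    test = [rec for rec in recs if rec["year"] > cutoff_year]
    return train, test


rr, pc = blocking_metrics(records)
train, test = temporal_split(records, cutoff_year=1816)
```

On the toy data, the name key groups records 1, 2, and 5 together even though record 5 belongs to a different person. That is the kind of identifier ambiguity shared patronymics produce, and it is why blocking only proposes candidates and a downstream matcher still has to decide.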
