[2603.04595] A Late-Fusion Multimodal AI Framework for Privacy-Preserving Deduplication in National Healthcare Data Environments

Computer Science > Machine Learning

arXiv:2603.04595 (cs) [Submitted on 4 Mar 2026]

Title: A Late-Fusion Multimodal AI Framework for Privacy-Preserving Deduplication in National Healthcare Data Environments

Authors: Mohammed Omer Shakeel Ahmed

Abstract: Duplicate records pose significant challenges in customer relationship management (CRM) and healthcare, often leading to inaccurate analytics, degraded user experiences, and compliance risks. Traditional deduplication methods rely heavily on direct identifiers such as names, emails, or Social Security Numbers (SSNs), making them ineffective under strict privacy regulations like GDPR and HIPAA, where such personally identifiable information (PII) is restricted or masked. In this research, I propose a novel, scalable, multimodal AI framework for detecting duplicates without depending on sensitive information. The system leverages three distinct modalities: semantic embeddings derived from textual fields (names, cities) using pre-trained DistilBERT models, behavioral patterns extracted from user login timestamps, and device metadata encoded through categorical embeddings. These heterogeneous modalities are combined using a late-fusion approach and clustered via DBSCAN, an unsupervised density-based algorithm. The proposed model is evaluate...
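To make the pipeline in the abstract concrete, here is a minimal sketch of late fusion followed by DBSCAN clustering. This is an illustration, not the paper's implementation: the toy 2-dimensional vectors stand in for the DistilBERT text embeddings, login-timestamp features, and categorical device embeddings the paper describes, and the `eps`/`min_samples` values are arbitrary choices for this example.

```python
import numpy as np
from sklearn.preprocessing import normalize
from sklearn.cluster import DBSCAN

# Toy per-modality feature vectors for four records. In the paper these would
# be DistilBERT embeddings of textual fields, features from login timestamps,
# and categorical embeddings of device metadata (the values here are made up).
text_emb = np.array([[0.90, 0.10], [0.88, 0.12], [0.10, 0.90], [0.12, 0.88]])
behavior = np.array([[1.00, 0.00], [0.98, 0.02], [0.00, 1.00], [0.05, 0.95]])
device   = np.array([[0.50, 0.50], [0.52, 0.48], [0.20, 0.80], [0.21, 0.79]])

# Late fusion: each modality is encoded separately, L2-normalised so no single
# modality dominates the distance metric, then concatenated per record.
fused = np.hstack([normalize(m) for m in (text_emb, behavior, device)])

# Unsupervised density-based clustering; records that land in the same DBSCAN
# cluster are flagged as duplicate candidates, label -1 marks noise/singletons.
labels = DBSCAN(eps=0.3, min_samples=2).fit_predict(fused)
print(labels)  # → [0 0 1 1]: records 0/1 and 2/3 form duplicate pairs
```

Because fusion happens after each modality is encoded (rather than mixing raw inputs), a missing or masked modality can in principle be dropped or re-weighted without retraining the other encoders, which is the usual motivation for late over early fusion.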

Originally published on March 06, 2026. Curated by AI News.
