[2603.13566] EmDT: Embedding Diffusion Transformer for Tabular Data Generation in Fraud Detection

arXiv - Machine Learning May 01, 2026 3 min read

About this article

Abstract page for arXiv paper 2603.13566: EmDT: Embedding Diffusion Transformer for Tabular Data Generation in Fraud Detection

arXiv:2603.13566 (stat). Submitted on 13 Mar 2026 (v1), last revised 29 Apr 2026 (this version, v2).

Title: EmDT: Embedding Diffusion Transformer for Tabular Data Generation in Fraud Detection

Authors: En-Ya Kuo, Sebastien Motsch

Abstract: Imbalanced datasets pose a difficulty in fraud detection, as classifiers are often biased toward the majority class and perform poorly on rare fraudulent transactions. Synthetic data generation is therefore commonly used to mitigate this problem. In this work, we propose the Clustered Embedding Diffusion-Transformer (EmDT), a diffusion model designed to generate fraudulent samples. Our key innovation is to leverage UMAP clustering to identify distinct fraudulent patterns, and train a Transformer denoising network with sinusoidal positional embeddings to capture feature relationships throughout the diffusion process. Once the synthetic data has been generated, we employ a standard decision-tree-based classifier (e.g., XGBoost) for classification, as this type of model remains better suited to tabular datasets. Experiments on a credit card fraud detection dataset demonstrate that EmDT significantly improves downstream classification performance compared to existing oversampling and generative methods, while maintaining comparable privac...
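The abstract says the Transformer denoiser uses sinusoidal positional embeddings to capture feature relationships, but it does not spell out the construction. The following is a minimal sketch assuming the standard sin/cos encoding of Vaswani et al. (2017), applied to each column index (treating every feature of a row as one token) and, optionally, to the diffusion timestep. The helper names `sinusoidal_embedding` and `embed_row`, and the choice to broadcast the raw feature value in place of a learned projection, are hypothetical simplifications rather than the paper's design.

```python
from __future__ import annotations

import math


def sinusoidal_embedding(position: float, dim: int, base: float = 10000.0) -> list[float]:
    """Sin/cos encoding of a scalar position (Vaswani et al., 2017).

    Index 2i holds sin(position / base**(2i/dim)); index 2i+1 holds the matching cos.
    """
    if dim <= 0 or dim % 2 != 0:
        raise ValueError("dim must be a positive even integer")
    emb: list[float] = []
    for i in range(dim // 2):
        freq = 1.0 / (base ** (2 * i / dim))
        emb.append(math.sin(position * freq))
        emb.append(math.cos(position * freq))
    return emb


def embed_row(row: list[float], dim: int, timestep: int | None = None) -> list[list[float]]:
    """Turn one tabular row into a sequence of dim-wide tokens, one per feature.

    Hypothetical simplification: token_j = x_j (broadcast to every dimension)
    + PE(j), plus PE(timestep) when a diffusion step is given. A trained model
    would normally use a learned projection of x_j instead of broadcasting it.
    """
    t_emb = sinusoidal_embedding(timestep, dim) if timestep is not None else [0.0] * dim
    tokens = []
    for j, value in enumerate(row):
        pos_emb = sinusoidal_embedding(j, dim)  # identifies which column this token is
        tokens.append([value + p + t for p, t in zip(pos_emb, t_emb)])
    return tokens
```

For example, `embed_row([0.5, -1.0, 2.0], dim=4)` returns three 4-wide tokens, the first being `[0.5, 1.5, 0.5, 1.5]`, because the encoding of position 0 is `[0, 1, 0, 1]`. A fixed sinusoidal scheme gives each column a distinct, deterministic identity without extra learned parameters; whether EmDT adds or concatenates these embeddings is not stated in the abstract.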
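EmDT is described as a diffusion model, but the abstract gives neither its noise schedule nor its training objective. As an illustration only, the sketch below assumes the standard DDPM forward process (Ho et al., 2020) with a linear beta schedule, in which a clean row x_0 is corrupted in closed form as x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, where abar_t = prod over s <= t of (1 - beta_s) and eps is standard Gaussian noise. The schedule endpoints (1e-4, 0.02), T = 1000, and the function names are assumptions, not values taken from the paper.

```python
from __future__ import annotations

import math
import random


def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> list[float]:
    """Linearly spaced noise variances beta_1..beta_T (assumed DDPM-style range)."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + k * step for k in range(num_steps)]


def cumulative_alphas(betas: list[float]) -> list[float]:
    """abar_t = prod_{s <= t} (1 - beta_s); shrinks toward 0 as t grows."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(x0: list[float], t: int, abar: list[float], rng: random.Random) -> tuple[list[float], list[float]]:
    """Draw x_t ~ q(x_t | x_0) = N(sqrt(abar_t) x_0, (1 - abar_t) I).

    Returns (x_t, eps). In DDPM training, a denoiser (here, hypothetically, the
    Transformer) is fit to predict eps from (x_t, t) with a mean-squared error.
    """
    a = abar[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return x_t, eps
```

With these assumed settings abar_T ends up well below 1e-3, so x_T is close to pure noise; generation then runs the learned denoiser backwards from Gaussian noise to produce synthetic rows.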
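The abstract says UMAP clustering is used to identify distinct fraudulent patterns, but it does not say how many synthetic rows are generated or how they are divided among those patterns. The sketch below is one plausible, hypothetical bookkeeping step. It first computes how many synthetic fraud rows s are needed to reach a chosen fraud fraction r, solving (n_fraud + s) / (n_fraud + n_legit + s) = r to get s = (r * n_total - n_fraud) / (1 - r). It then splits that budget across clusters in proportion to their size, using largest-remainder rounding. The cluster labels are taken as given; in practice they might come from clustering a UMAP embedding of the fraud rows (for example `umap.UMAP` from the umap-learn package followed by a clustering algorithm). The target fraction, the proportional rule, and the names `synthetic_budget` and `allocate_by_cluster` are assumptions, not the paper's method.

```python
from __future__ import annotations

import math
from collections import Counter
from typing import Hashable, Iterable


def synthetic_budget(n_fraud: int, n_legit: int, target_fraud_fraction: float) -> int:
    """Smallest s with (n_fraud + s) / (n_fraud + n_legit + s) >= target (hypothetical rule)."""
    if not 0.0 < target_fraud_fraction < 1.0:
        raise ValueError("target_fraud_fraction must lie in (0, 1)")
    n_total = n_fraud + n_legit
    s = (target_fraud_fraction * n_total - n_fraud) / (1.0 - target_fraud_fraction)
    # Small tolerance so float round-off does not push an exact answer up by one.
    return max(0, math.ceil(s - 1e-9))


def allocate_by_cluster(cluster_labels: Iterable[Hashable], budget: int) -> dict:
    """Split `budget` across clusters proportionally to size (largest-remainder rounding)."""
    counts = Counter(cluster_labels)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("need at least one labelled fraud row")
    raw = {c: budget * k / n for c, k in counts.items()}
    alloc = {c: math.floor(v) for c, v in raw.items()}
    leftover = budget - sum(alloc.values())
    # Hand the remaining units to the clusters with the largest fractional parts.
    for c in sorted(raw, key=lambda c: raw[c] - alloc[c], reverse=True)[:leftover]:
        alloc[c] += 1
    return alloc
```

With illustrative numbers (not from the paper's dataset), 100 fraud and 9,900 legitimate rows with a target fraction of 0.5 give a budget of 9,800, and fraud clusters of sizes 50, 30 and 20 would receive 4,900, 2,940 and 1,960 synthetic rows. Allocating per cluster keeps rarer fraud patterns represented in the synthetic set instead of letting the dominant pattern absorb the whole budget.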

Originally published on May 01, 2026. Curated by AI News.
