[2604.00339] When Career Data Runs Out: Structured Feature Engineering and Signal Limits for Founder Success Prediction
Computer Science > Machine Learning
arXiv:2604.00339 (cs) [Submitted on 1 Apr 2026]

Title: When Career Data Runs Out: Structured Feature Engineering and Signal Limits for Founder Success Prediction
Authors: Yagiz Ihlamur

Abstract: Predicting startup success from founder career data is hard. The signal is weak, the labels are rare (9%), and most founders who succeed look almost identical to those who fail. We engineer 28 structured features directly from raw JSON fields -- jobs, education, exits -- and combine them with a deterministic rule layer and XGBoost boosted stumps. Our model achieves Val F0.5 = 0.3030, Precision = 0.3333, Recall = 0.2222 -- a +17.7pp improvement over the zero-shot LLM baseline. We then run a controlled experiment: extract 9 features from the prose field using Claude Haiku, at 67% and 100% dataset coverage. LLM features capture 26.4% of model importance but add zero CV signal (delta = -0.05pp). The reason is structural: anonymised_prose is generated from the same JSON fields we parse directly -- it is a lossy re-encoding, not a richer source. The ceiling (CV ~= 0.25, Val ~= 0.30) reflects the information content of this dataset, not a modeling limitation. In characterizing where the signal runs out and why, this work functions as a benchmark diagnostic -- one that points...
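As a sanity check on the reported headline metric, the F0.5 score can be recomputed from the stated precision and recall. This is a minimal sketch of the standard F-beta formula, not code from the paper; the function name and rounding are illustrative.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """F-beta score: (1 + beta^2) * P * R / (beta^2 * P + R).

    With beta = 0.5, precision is weighted more heavily than recall,
    which suits a setting where positive labels are rare (9%).
    """
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Reported values from the abstract: Precision = 0.3333, Recall = 0.2222
score = f_beta(0.3333, 0.2222)
print(round(score, 4))  # consistent with the reported Val F0.5 = 0.3030
```

The numbers are internally consistent: plugging the reported precision and recall into the F0.5 formula reproduces the stated 0.3030.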