[P] I trained an XGBoost model with DuckLake and ADBC

Reddit - Machine Learning · 1 min read

Summary

The article describes training an XGBoost model on data stored in DuckLake and read through Apache Arrow ADBC. It focuses on passing Arrow tables to XGBoost and on splitting the data without scikit-learn.

Why It Matters

This is relevant to data scientists and machine learning practitioners because it shows a way to handle large datasets and train models in a memory-efficient manner. Combining Apache Arrow ADBC with DuckLake reflects a growing trend of using lakehouse architectures for data processing.

Key Takeaways

  • XGBoost can train directly on Arrow tables.
  • Apache Arrow ADBC can stream data that is larger than memory.
  • Custom functions can replace scikit-learn for splitting the data.
  • DuckLake provides the lakehouse storage layer that the data is read from.
  • Using Arrow tables keeps memory overhead low.
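
The summary does not show the author's splitting code, so the following is only a minimal sketch of one way a custom function could stand in for scikit-learn's splitter when data arrives in batches. It assigns each row to train or test by hashing a row ID, so the decision needs no global shuffle and never requires all rows in memory at once. The names `assign_split` and `split_batches`, the `id` key, and the use of plain dictionaries in place of Arrow record batches are all assumptions made for illustration.

```python
import hashlib


def assign_split(row_id, test_fraction=0.2, seed="split-v1"):
    """Return "test" or "train" for a row, using only its ID (hypothetical helper)."""
    digest = hashlib.sha256(f"{seed}:{row_id}".encode("utf-8")).digest()
    # Turn the first 8 bytes of the hash into a roughly uniform fraction.
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "test" if bucket < test_fraction else "train"


def split_batches(batches, id_key="id", test_fraction=0.2):
    """Yield (train_rows, test_rows) for each batch, one batch at a time."""
    for batch in batches:
        train, test = [], []
        for row in batch:
            is_test = assign_split(row[id_key], test_fraction) == "test"
            (test if is_test else train).append(row)
        yield train, test


# Stand-in for streamed batches: 10 batches of 100 dictionary rows each.
rows = [{"id": i, "x": i * 2} for i in range(1000)]
batches = [rows[i:i + 100] for i in range(0, 1000, 100)]

n_train = n_test = 0
for train_rows, test_rows in split_batches(batches):
    n_train += len(train_rows)
    n_test += len(test_rows)

print(n_train, n_test)
```

Because the assignment depends only on the ID and the seed string, a given row lands in the same split on every run. In a real pipeline the same rule would be applied to each streamed batch rather than to Python dictionaries.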
