[P] Using YouTube as a data source (lessons from building a coffee domain dataset)

Reddit - Machine Learning 1 min read

About this article

I started working on a small coffee coaching app recently - something that could answer questions around brew methods, grind size, extraction, etc. I was looking for good data and realized most written sources are either shallow or scattered. YouTube, on the other hand, has insanely high-quality content (James Hoffmann, Lance Hedrick, etc.), but it’s not usable out of the box for RAG. Transcripts are messy, chunking is inconsistent, getting everything into a usable format took way more effort...

You've been blocked by network security.To continue, log in to your Reddit account or use your developer tokenIf you think you've been blocked by mistake, file a ticket below and we'll look into it.Log in File a ticket

Originally published on March 30, 2026. Curated by AI News.

Related Articles

[2601.13227] Insider Knowledge: How Much Can RAG Systems Gain from Evaluation Secrets?
Llms

[2601.13227] Insider Knowledge: How Much Can RAG Systems Gain from Evaluation Secrets?

Abstract page for arXiv paper 2601.13227: Insider Knowledge: How Much Can RAG Systems Gain from Evaluation Secrets?

arXiv - AI · 3 min ·
[2601.22440] AI and My Values: User Perceptions of LLMs' Ability to Extract, Embody, and Explain Human Values from Casual Conversations
Llms

[2601.22440] AI and My Values: User Perceptions of LLMs' Ability to Extract, Embody, and Explain Human Values from Casual Conversations

Abstract page for arXiv paper 2601.22440: AI and My Values: User Perceptions of LLMs' Ability to Extract, Embody, and Explain Human Value...

arXiv - AI · 4 min ·
[2601.13222] Incorporating Q&A Nuggets into Retrieval-Augmented Generation
Nlp

[2601.13222] Incorporating Q&A Nuggets into Retrieval-Augmented Generation

Abstract page for arXiv paper 2601.13222: Incorporating Q&A Nuggets into Retrieval-Augmented Generation

arXiv - AI · 3 min ·
[2512.01707] StreamGaze: Gaze-Guided Temporal Reasoning and Proactive Understanding in Streaming Videos
Llms

[2512.01707] StreamGaze: Gaze-Guided Temporal Reasoning and Proactive Understanding in Streaming Videos

Abstract page for arXiv paper 2512.01707: StreamGaze: Gaze-Guided Temporal Reasoning and Proactive Understanding in Streaming Videos

arXiv - AI · 4 min ·
More in Nlp: This Week Guide Trending

No comments

No comments yet. Be the first to comment!

Stay updated with AI News

Get the latest news, tools, and insights delivered to your inbox.

Daily or weekly digest • Unsubscribe anytime