[2602.24262] Coverage-Aware Web Crawling for Domain-Specific Supplier

[2602.24262] Coverage-Aware Web Crawling for Domain-Specific Supplier Discovery via a Web--Knowledge--Web Pipeline

arXiv - Machine Learning March 02, 2026 3 min read

About this article

Abstract page for arXiv paper 2602.24262: Coverage-Aware Web Crawling for Domain-Specific Supplier Discovery via a Web--Knowledge--Web Pipeline

Computer Science > Machine Learning arXiv:2602.24262 (cs) [Submitted on 27 Feb 2026] Title:Coverage-Aware Web Crawling for Domain-Specific Supplier Discovery via a Web--Knowledge--Web Pipeline Authors:Yijiashun Qi, Yijiazhen Qi, Tanmay Wagh View a PDF of the paper titled Coverage-Aware Web Crawling for Domain-Specific Supplier Discovery via a Web--Knowledge--Web Pipeline, by Yijiashun Qi and 2 other authors View PDF HTML (experimental) Abstract:Identifying the full landscape of small and medium-sized enterprises (SMEs) in specialized industry sectors is critical for supply-chain resilience, yet existing business databases suffer from substantial coverage gaps -- particularly for sub-tier suppliers and firms in emerging niche markets. We propose a \textbf{Web--Knowledge--Web (W$\to$K$\to$W)} pipeline that iteratively (1)~crawls domain-specific web sources to discover candidate supplier entities, (2)~extracts and consolidates structured knowledge into a heterogeneous knowledge graph, and (3)~uses the knowledge graph's topology and coverage signals to guide subsequent crawling toward under-represented regions of the supplier space. To quantify discovery completeness, we introduce a \textbf{coverage estimation framework} inspired by ecological species-richness estimators (Chao1, ACE) adapted for web-entity populations. Experiments on the semiconductor equipment manufacturing sector (NAICS 333242) demonstrate that the W$\to$K$\to$W pipeline achieves the highest precision (0.138...

Originally published on March 02, 2026. Curated by AI News.

Nlp

[D] Simple Questions Thread

Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instea...

Reddit - Machine Learning · 1 min · about 6 hours ago

Llms

Which LLM is the best for writing a scientific paper?

I'll need to write a scientifc research paper for university. We're allowed and encouraged to use AI for our work. Be it for language or ...

Reddit - Artificial Intelligence · 1 min · about 6 hours ago

Llms

The Claude Code leak accidentally published the first complete blueprint for production AI agents. Here's what it tells us about where this is all going.

Most coverage of the Claude Code leak focuses on the drama or the hidden features. But the bigger story is that this is the first time we...

Reddit - Artificial Intelligence · 1 min · about 8 hours ago

Llms

[For Hire] Junior AI/ML Engineer | RAG · LLMs · FastAPI · Vector DBs | Remote

Posting this for a friend who isn't on Reddit. A recent graduate, entry level, no commercial production experience but spent the past yea...

Reddit - ML Jobs · 1 min · about 10 hours ago

[2602.24262] Coverage-Aware Web Crawling for Domain-Specific Supplier Discovery via a Web--Knowledge--Web Pipeline

About this article

Related Articles

[D] Simple Questions Thread

Which LLM is the best for writing a scientific paper?

The Claude Code leak accidentally published the first complete blueprint for production AI agents. Here's what it tells us about where this is all going.

[For Hire] Junior AI/ML Engineer | RAG · LLMs · FastAPI · Vector DBs | Remote

No comments

Stay updated with AI News