[D] What surprised us while collecting training data from the public web
been pulling training data from public web sources for a bit now. needed it to scale, not return complete garbage, and not immediately blow the budget. tested three things.

bright data first. like yeah, it's genuinely good, the infra is kind of insane honestly. but it's also very clearly built for companies bigger than us. setup wasn't straightforward either: if all you actually want is clean text from a url, it felt like a lot of stuff between you and that.

firecrawl.dev is nicer to get started with. api ...