SMRTR Tech | Feb 23, 2026 | lobste.rs

Crawling a billion web pages in just over 24 hours

SMRTR summary

A developer crawled over one billion web pages in 25.5 hours for $462 using a cluster of 12 AWS machines, showing that large-scale web crawling is far cheaper than in 2012, when similar experiments cost $41,000. The project found that HTML parsing has become a major bottleneck because the average page size has grown from 51 KB to 242 KB, and that SSL encryption now consumes 25% of CPU time during crawling.
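To put the headline figures in perspective, the rough arithmetic can be sketched as below. This is only a back-of-the-envelope calculation from the numbers in the summary: it assumes exactly one billion pages (the summary says "over"), assumes load was spread evenly across the 12 machines, and uses a hypothetical helper named `crawl_stats` that is not from the original article.

```python
def crawl_stats(pages, hours, cost_usd, machines):
    """Derive rough throughput and unit-cost figures for a crawl.

    Assumes pages were fetched at a constant rate and spread evenly
    across machines, which a real crawl will only approximate.
    """
    seconds = hours * 3600
    pages_per_sec = pages / seconds
    return {
        "pages_per_sec": pages_per_sec,
        "pages_per_sec_per_machine": pages_per_sec / machines,
        "usd_per_million_pages": cost_usd / (pages / 1_000_000),
    }


# Figures taken from the summary; the page count is treated as a lower bound.
stats = crawl_stats(pages=1_000_000_000, hours=25.5, cost_usd=462, machines=12)

# Growth in average page size reported in the summary (51 KB to 242 KB).
page_size_growth = 242 / 51

for name, value in stats.items():
    print(f"{name}: {value:,.2f}")
print(f"page_size_growth: {page_size_growth:.2f}x")
```

Under these assumptions the cluster sustained roughly 10,900 pages per second overall, or about 900 per machine, at about $0.46 per million pages, while the average page grew to about 4.7 times its earlier size.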

SMRTR provides this summary for quick context. The original article belongs to lobste.rs.
