TimeCapsuleLLM: LLM trained only on data from 1800-1875
SMRTR summary
TimeCapsuleLLM is a language model trained exclusively on historical texts from London dated 1800-1875, with the aim of eliminating modern bias and authentically replicating the vocabulary and worldview of that era. Early iterations produced incoherent Victorian-style sentences; by version 1, the model was connecting real historical events with actual figures. Version 2 uses a 90GB dataset containing over 136,000 historical documents. The project suggests that training from scratch on period-specific data yields a more authentic historical model than fine-tuning a modern one.
SMRTR provides this summary for quick context. The original article belongs to Hacker News.