How Did Open Food Facts Fix OCR-Extracted Ingredients Using Open-Source LLMs?
SMRTR summary
Open Food Facts has developed an Ingredients Spellcheck feature that uses a custom-trained large language model to improve the accuracy of ingredient lists extracted from product images. The system corrects errors in OCR-extracted text, reducing unrecognized ingredients by 11%. The project involved creating guidelines, building a benchmark dataset, and designing an evaluation algorithm. A fine-tuned Mistral-7B model achieved results comparable to proprietary LLMs. The spellcheck runs as a batch process, and corrected data is stored for user review, which improves database accuracy while keeping the community involved in quality control.
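To make the "unrecognized ingredients" metric concrete, here is a minimal Python sketch of how such a ratio could be measured before and after correction. It is an illustration only, not the project's actual evaluation algorithm: the `KNOWN` vocabulary, the sample strings, and the helper names `split_ingredients` and `unrecognized_ratio` are all hypothetical, and real ingredient parsing is considerably more involved than splitting on commas.

```python
def split_ingredients(text: str) -> list[str]:
    """Split a comma-separated ingredient list into normalized items."""
    return [part.strip().lower() for part in text.split(",") if part.strip()]


def unrecognized_ratio(text: str, known: set[str]) -> float:
    """Return the share of ingredients that are absent from the known vocabulary."""
    items = split_ingredients(text)
    if not items:
        return 0.0
    unknown = [item for item in items if item not in known]
    return len(unknown) / len(items)


# Hypothetical vocabulary standing in for a real ingredient taxonomy.
KNOWN = {"sugar", "wheat flour", "palm oil", "salt"}

# Made-up OCR output with typical character confusions (o -> 0, l -> 1),
# and the text a spellcheck model might return for it.
ocr_text = "Sugar, wheat fl0ur, palm oil, sa1t"
corrected_text = "Sugar, wheat flour, palm oil, salt"

before = unrecognized_ratio(ocr_text, KNOWN)
after = unrecognized_ratio(corrected_text, KNOWN)
print(f"unrecognized before: {before:.0%}, after: {after:.0%}")
```

Comparing the ratio on the raw OCR text with the ratio on the corrected text gives a simple, model-agnostic way to check whether a correction step helps, which is the kind of comparison the reported reduction implies.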
SMRTR provides this summary for quick context. The original article belongs to Daily.dev.