You can now run prompts against images, audio and video in your terminal using LLM
SMRTR summary
LLM, a command-line tool for interacting with large language models, supports multi-modal inputs (images, audio, and video) as of version 0.17. Users can pass these media types to models such as GPT-4o and Gemini for tasks like image description, audio transcription, and video analysis, at low cost (e.g., less than $0.01 for a 7-minute audio transcription).
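As a rough illustration of what this looks like in practice, the sketch below builds the argument list for a single `llm` call with attached media. It assumes the `-a` (attachment) and `-m` (model) options of the `llm` command line; the helper name `build_llm_command`, the file name `photo.jpg`, and the model choice are hypothetical and not taken from the article.

```python
import shlex


def build_llm_command(prompt, attachments, model=None):
    """Return the argv list for one `llm` invocation with attachments.

    Assumes the `-m` (model) and `-a` (attachment) options of the llm CLI.
    """
    command = ["llm", prompt]
    if model is not None:
        command += ["-m", model]
    for attachment in attachments:
        # Each attachment (a file path or a URL) gets its own -a flag.
        command += ["-a", attachment]
    return command


if __name__ == "__main__":
    # Hypothetical file name and model, for illustration only.
    command = build_llm_command("Describe this image", ["photo.jpg"], model="gpt-4o")
    # Print a shell-quoted form that can be pasted into a terminal.
    print(shlex.join(command))
```

The returned list can also be handed to `subprocess.run` if `llm` is installed and an API key is configured; the sketch only prints the command so that it runs without either.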
SMRTR provides this summary for quick context. The original article belongs to Daily.dev.