Instella: New Open 3B Language Models
SMRTR summary
AMD has introduced Instella, a family of fully open 3-billion-parameter language models trained on AMD Instinct MI300X GPUs. The models outperform existing fully open models of comparable size and are competitive with leading open-weight models such as Llama-3.2-3B and Qwen-2.5-3B. The models were trained on up to 4.15 trillion tokens, and the release includes pre-trained and instruction-tuned versions along with all model weights, configurations, datasets, and code to support collaborative AI research.
SMRTR provides this summary for quick context. The original article belongs to Hacker News.