Aya 23, released in 8-billion- and 35-billion-parameter versions, supports 23 languages and offers advanced text and code generation capabilities.
Summary: Cohere’s new open-weights large model, Aya 23, supports 23 languages and offers advanced text generation, code generation, and summarization capabilities.
(AIM)—Cohere, a renowned provider of open-weights large models, has released its latest generation model, Aya 23. The model comes in two versions, with 8 billion and 35 billion parameters respectively, and supports 23 languages, including Arabic, Chinese (Simplified and Traditional), Czech, Dutch, English, French, German, Greek, and Hebrew, among others. Aya 23 is capable of generating text, code, summaries, and more. Notably, Cohere has released Aya 23’s weights under a CC-BY-NC license subject to the C4AI Acceptable Use Policy, which permits research use but not commercial use.
Key Features and Capabilities
Aya 23 is designed to cater to a wide range of applications with its comprehensive language support and robust performance metrics:
- Multilingual Support: Aya 23 supports 23 languages, making it highly versatile for global applications.
- Model Parameters: Available in two versions with 8 billion and 35 billion parameters.
- Open Weights: The model’s weights are openly accessible on Hugging Face, promoting transparency and collaboration; a loading sketch follows this list.
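For readers who want to try the model, here is a minimal loading-and-generation sketch using the Hugging Face transformers library (a recent version with Cohere model support is required). The repository ID CohereForAI/aya-23-8B and the chat-template usage are assumptions based on the hosting at the time of writing; check the model card for current details.

```python
# A minimal sketch, assuming the model is hosted on Hugging Face under the
# repository ID "CohereForAI/aya-23-8B" (check the model card for current
# details; the license terms must be accepted before downloading).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CohereForAI/aya-23-8B"  # assumed repo ID; a 35B variant also exists

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to reduce memory use
    device_map="auto",          # place layers on available devices automatically
)

# Aya 23 is instruction-tuned, so format the prompt with the chat template.
messages = [{"role": "user", "content": "Translate to German: The weather is nice today."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=100, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```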
Pre-training and Architecture
Aya 23 builds on Cohere’s Command series of models, which are pre-trained on a diverse mixture of texts in 23 languages. The 35B version is further fine-tuned from the Cohere Command R model. Key technical features include:
- Transformer Architecture: Utilizes a standard decoder-only Transformer architecture.
- Parallel Attention and FFN Layers: Enhances computational efficiency.
- SwiGLU Activation and Bias-free Design: Improves model performance.
- RoPE (Rotary Positional Embeddings): Encodes token positions as rotations applied to query and key vectors (a sketch follows this list).
- BPE Tokenizer and Grouped Query Attention (GQA): Optimizes tokenization and attention mechanisms.
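To make the RoPE item above concrete, here is a self-contained sketch of the technique in its standard textbook formulation. This is an illustration of the general method, not Cohere’s exact implementation, and the tensor shapes are chosen for readability.

```python
# A textbook sketch of Rotary Positional Embeddings (RoPE), not Cohere's
# exact implementation. Each consecutive pair of features is treated as a
# 2-D vector and rotated by an angle that grows with token position.
import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply RoPE to x of shape (seq_len, dim); dim must be even."""
    seq_len, dim = x.shape
    # Per-pair frequencies: theta_i = base^(-2i/dim), i = 0 .. dim/2 - 1
    inv_freq = base ** (-torch.arange(0, dim, 2, dtype=torch.float32) / dim)
    # Rotation angle for every (position, pair) combination
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * inv_freq[None, :]
    cos, sin = angles.cos(), angles.sin()
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    rotated = torch.empty_like(x)
    rotated[:, 0::2] = x_even * cos - x_odd * sin
    rotated[:, 1::2] = x_even * sin + x_odd * cos
    return rotated

# Queries and keys are rotated before the attention dot product, which makes
# the resulting scores depend on relative rather than absolute positions.
q = apply_rope(torch.randn(16, 64))
k = apply_rope(torch.randn(16, 64))
scores = q @ k.T  # (16, 16) attention scores
```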
Performance and Evaluation
Aya 23 has demonstrated outstanding performance across a variety of tasks:
- Discriminative Tasks: Excels in zero-shot evaluations on tasks such as XWinograd, XCOPA, and XStoryCloze (the typical scoring recipe is sketched after this list).
- Multilingual Evaluation: Outperforms other models in multilingual MMLU evaluations across 14 languages.
- Mathematical Reasoning: Surpasses baseline models in multilingual mathematical reasoning on the MGSM benchmark.
- Generative Tasks: Achieves superior results in machine translation and summarization compared to other models with similar parameters.
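For context on how zero-shot scores on discriminative benchmarks are typically obtained: the common recipe is to compute the log-likelihood the model assigns to each candidate continuation and pick the highest. The sketch below illustrates that general recipe with a made-up COPA-style example in English (XCOPA is the multilingual version); it is not Cohere’s evaluation harness, and the repository ID is an assumption as before.

```python
# The usual zero-shot recipe for discriminative benchmarks such as XCOPA or
# XStoryCloze: score each candidate continuation by its total log-likelihood
# under the model and pick the highest. Generic sketch, not Cohere's harness;
# the premise and choices below are invented for illustration.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CohereForAI/aya-23-8B"  # assumed repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

@torch.no_grad()
def continuation_logprob(context: str, continuation: str) -> float:
    """Sum of log-probs the model assigns to `continuation` given `context`.

    Tokenizing context and full text separately is a simplification: BPE
    merges across the boundary can shift token counts slightly.
    """
    ctx_len = tokenizer(context, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(context + continuation, return_tensors="pt").input_ids.to(model.device)
    logits = model(full_ids).logits
    log_probs = F.log_softmax(logits[0, :-1].float(), dim=-1)
    # The token at position t is predicted by the logits at position t - 1.
    return sum(
        log_probs[pos, full_ids[0, pos + 1]].item()
        for pos in range(ctx_len - 1, full_ids.shape[1] - 1)
    )

premise = "The man broke his toe. What was the cause? "
choices = ["He got a hole in his sock.", "He dropped a hammer on his foot."]
print(max(choices, key=lambda c: continuation_logprob(premise, c)))
```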
Aya 23 marks a significant advancement in multilingual AI capabilities, offering robust performance across a range of discriminative and generative tasks. Its open-weights release and comprehensive language support make it a valuable tool for researchers and developers worldwide.