Curated by THEOUTPOST
On Wed, 21 Aug, 8:01 AM UTC
3 Sources
[1]
Microsoft's Phi-3.5 series unveils triple threat
Microsoft is stepping up its game in the AI world with the new Phi-3.5 series, offering three cutting-edge models designed for different tasks. These models aren't just powerful; they're also versatile, making it easier for developers to tackle everything from basic coding to complex problem-solving and even visual tasks. Whether you're working with limited resources or need advanced artificial intelligence capabilities, the Phi-3.5 models have something to offer. Here is a quick glimpse of them.

Microsoft's latest release, the Phi-3.5 series, introduces three advanced AI models: Phi-3.5-mini-instruct, Phi-3.5-MoE-instruct, and Phi-3.5-vision-instruct. Each model is crafted to address specific needs, from basic reasoning to advanced multimodal tasks. All three models are available under the MIT license, which allows developers to use, modify, and distribute them with minimal restrictions. This open-source approach supports widespread adoption and fosters innovation across applications and research domains.

The Microsoft Phi-3.5 Mini Instruct model is designed to perform exceptionally well in environments with limited computational resources. With 3.8 billion parameters, it is tailored for tasks that require strong reasoning capabilities but do not demand extensive computational power; it was trained on 3.4 trillion tokens using 512 H100-80G GPUs over 10 days. Its efficient design delivers robust performance while remaining mindful of resource constraints, making it suitable for deployment in scenarios where computational resources are limited but high performance is still required.

The Microsoft Phi-3.5 MoE (Mixture of Experts) model represents a sophisticated approach to AI architecture, combining multiple specialized models into one. Different "experts" are activated depending on the task, optimizing performance across various domains; the model was trained on 4.9 trillion tokens with 512 H100-80G GPUs over 23 days. The MoE architecture enhances scalability and efficiency by activating only the subset of parameters relevant to a given task, enabling the model to handle a wide range of applications while maintaining high performance across different languages and subjects (a toy sketch of this routing idea follows below).

The Microsoft Phi-3.5 Vision Instruct model is designed to handle both text and image data, making it a powerful tool for multimodal AI tasks. It integrates advanced image processing with textual understanding, supporting a variety of complex visual and textual analysis tasks, and was trained on 500 billion tokens using 256 A100-80G GPUs over 6 days. Its ability to process and integrate both text and images makes it highly versatile for applications requiring detailed visual analysis, particularly for tasks involving diverse data types and formats. The Phi-3.5 Vision Instruct model is also accessible through Azure AI Studio.
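To make the expert-routing idea concrete, here is a toy sketch in PyTorch. This is not Microsoft's implementation: the layer sizes are arbitrary, and the 16-expert, top-2 configuration borrows figures reported for Phi-3.5 MoE elsewhere in this roundup. The point it illustrates is that a router scores all experts per token, but only the chosen few actually run, so only a fraction of the total parameters is active for any given input.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyTop2MoE(nn.Module):
    """Toy mixture-of-experts layer: 16 small MLP "experts", top-2 routing.

    Illustrative only; dimensions and routing details are assumptions,
    not Microsoft's Phi-3.5-MoE implementation.
    """
    def __init__(self, d_model=64, d_ff=256, n_experts=16, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (n_tokens, d_model)
        probs = F.softmax(self.router(x), dim=-1)          # routing distribution
        weights, idx = probs.topk(self.top_k, dim=-1)      # keep the two best experts
        weights = weights / weights.sum(-1, keepdim=True)  # renormalize their weights
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                   # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

moe = ToyTop2MoE()
tokens = torch.randn(8, 64)   # 8 token embeddings
print(moe(tokens).shape)      # torch.Size([8, 64])
```

Only 2 of the 16 expert MLPs execute per token, which is the same reason Phi-3.5 MoE can carry roughly 42 billion parameters while engaging far fewer on each inference.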
[2]
Microsoft releases powerful new Phi-3.5 models, beating Google, OpenAI and more
Microsoft isn't resting its AI success on the laurels of its partnership with OpenAI. Far from it: the company, often known as Redmond for its headquarters location in Washington state, today came out swinging with the release of three new models in its evolving Phi series of language and multimodal AI.

The three new Phi-3.5 models are the 3.82 billion parameter Phi-3.5-mini-instruct, the 41.9 billion parameter Phi-3.5-MoE-instruct, and the 4.15 billion parameter Phi-3.5-vision-instruct, designed for basic/fast reasoning, more powerful reasoning, and vision (image and video analysis) tasks, respectively. All three models are available for developers to download, use, and fine-tune on Hugging Face under a Microsoft-branded MIT License that allows commercial usage and modification without restrictions.

Amazingly, all three models also boast near state-of-the-art performance across a number of third-party benchmark tests, even beating other AI providers including Google's Gemini 1.5 Flash, Meta's Llama 3.1, and even OpenAI's GPT-4o in some cases. That performance, combined with the permissive open license, has people praising Microsoft on the social network X. Let's briefly review each of the new models, based on their release notes posted to Hugging Face.

Phi-3.5 Mini Instruct: Optimized for Compute-Constrained Environments

The Phi-3.5 Mini Instruct model is a lightweight AI model with 3.8 billion parameters, engineered for instruction adherence and supporting a 128k token context length. It is ideal for scenarios that demand strong reasoning capabilities in memory- or compute-constrained environments, including tasks like code generation, mathematical problem solving, and logic-based reasoning. Despite its compact size, the model demonstrates competitive performance in multilingual and multi-turn conversational tasks, reflecting significant improvements over its predecessors. It boasts near-state-of-the-art performance on a number of benchmarks and overtakes other similarly sized models (Llama-3.1-8B-instruct and Mistral-7B-instruct) on the RepoQA benchmark, which measures "long context code understanding." A hedged sketch of multi-turn chat with this model appears after the MoE overview below.

Phi-3.5 MoE: Microsoft's 'Mixture of Experts'

The Phi-3.5 MoE (Mixture of Experts) model appears to be the first in this model class from the firm: it combines multiple specialized sub-models into one, each focused on different tasks. The architecture has 42 billion parameters in total and supports a 128k token context length, providing scalable AI performance for demanding applications; however, only 6.6 billion parameters are active during inference, according to the Hugging Face documentation. Designed to excel in various reasoning tasks, Phi-3.5 MoE offers strong performance in code, math, and multilingual language understanding, often outperforming larger models on specific benchmarks, including, again, RepoQA. It also impressively beats GPT-4o mini on 5-shot MMLU (Massive Multitask Language Understanding) across subjects spanning STEM, the humanities, and the social sciences, at varying levels of expertise. The MoE model's unique architecture allows it to maintain efficiency while handling complex AI tasks across multiple languages.
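As a rough illustration of the Mini Instruct usage described above, here is a minimal sketch of multi-turn chat inference through Hugging Face transformers. The model id matches the Hugging Face release; the conversation content, generation settings, and hardware flags are illustrative assumptions, not Microsoft's published example.

```python
# Hedged sketch: multi-turn chat with Phi-3.5-mini-instruct via transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3.5-mini-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto", trust_remote_code=True
)

# A multi-turn exchange; the chat template inserts the model's role markers.
messages = [
    {"role": "user", "content": "Write a Python one-liner that reverses a string."},
    {"role": "assistant", "content": "s[::-1]"},
    {"role": "user", "content": "Now make it reverse each word in place."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=120, do_sample=False)
# Strip the prompt tokens and decode only the model's reply.
reply = tokenizer.decode(output_ids[0, input_ids.shape[-1]:], skip_special_tokens=True)
print(reply)
```

The same flow applies to longer histories; the 128k context window is what lets conversations or code files of this size fit in a single prompt.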
Phi-3.5 Vision Instruct: Advanced Multimodal Reasoning

Completing the trio is the Phi-3.5 Vision Instruct model, which integrates both text and image processing capabilities. This multimodal model is particularly suited for tasks such as general image understanding, optical character recognition, chart and table comprehension, and video summarization. Like the other models in the Phi-3.5 series, Vision Instruct supports a 128k token context length, enabling it to manage complex, multi-frame visual tasks. Microsoft highlights that this model was trained with a combination of synthetic and filtered publicly available datasets, focusing on high-quality, reasoning-dense data.

Training the new Phi trio

The Phi-3.5 Mini Instruct model was trained on 3.4 trillion tokens using 512 H100-80G GPUs over 10 days, while the Vision Instruct model was trained on 500 billion tokens using 256 A100-80G GPUs over 6 days. The Phi-3.5 MoE model, which features a mixture-of-experts architecture, was trained on 4.9 trillion tokens with 512 H100-80G GPUs over 23 days.

Open-source under MIT License

All three Phi-3.5 models are available under the MIT license, reflecting Microsoft's commitment to supporting the open-source community. This license allows developers to freely use, modify, merge, publish, distribute, sublicense, or sell copies of the software. The license also includes a disclaimer that the software is provided "as is," without warranties of any kind. Microsoft and other copyright holders are not liable for any claims, damages, or other liabilities that may arise from the software's use.

Microsoft's release of the Phi-3.5 series represents a significant step forward in the development of multilingual and multimodal AI. By offering these models under an open-source license, Microsoft empowers developers to integrate cutting-edge AI capabilities into their applications, fostering innovation across both commercial and research domains.
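To make the multimodal usage above concrete, here is a hedged sketch of image-plus-text inference in the common transformers AutoProcessor pattern. The model id is the real Hugging Face repo, but the image-placeholder convention, prompt format, and generation settings are assumptions that should be checked against the official model card.

```python
# Hypothetical sketch of multimodal inference with Phi-3.5-vision-instruct.
# Verify the prompt/placeholder details against the official model card.
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Phi-3.5-vision-instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", trust_remote_code=True
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# One user turn containing an image reference plus a question about it.
messages = [{"role": "user", "content": "<|image_1|>\nSummarize the table in this image."}]
prompt = processor.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

image = Image.open("report_table.png")  # any local chart/table screenshot
inputs = processor(prompt, [image], return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = output_ids[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(new_tokens, skip_special_tokens=True)[0])
```

Multi-frame tasks like video summarization follow the same shape: several images are passed to the processor and referenced by successive placeholders in the prompt.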
[3]
Microsoft Launches New Phi-3.5 Models, Outperforms Google Gemini 1.5 Flash, Meta's Llama 3.1, and OpenAI's GPT-4o
Microsoft's new Phi-3.5 lineup spans three models. The Phi-3.5-mini-instruct, with 3.82 billion parameters, handles basic and fast reasoning. The Phi-3.5-MoE-instruct, with 41.9 billion parameters, handles more advanced reasoning. The Phi-3.5-vision-instruct, with 4.15 billion parameters, is designed for vision tasks like image and video analysis.

Phi-3.5-MoE-instruct is a 42-billion-parameter open-source model that demonstrates significant improvements in reasoning capabilities, outperforming larger models such as Llama 3.1 8B and Gemma 2 9B across various benchmarks. Despite its competitive performance, Phi-3.5-MoE falls slightly behind GPT-4o-mini but surpasses Gemini 1.5 Flash in benchmarks. The model features 16 experts, with two activated during generation, so 6.6 billion parameters are engaged in each inference. It supports multilingual applications, although the specific languages covered remain unclear, and extends its context length to 128,000 tokens. The model was trained over 23 days using 512 H100-80G GPUs on a total training dataset of 4.9 trillion tokens. Its development included supervised fine-tuning, proximal policy optimisation, and direct preference optimisation to ensure precise instruction adherence and robust safety measures. The model is intended for use in memory- and compute-constrained environments and latency-sensitive scenarios; key use cases include general-purpose AI systems, applications requiring strong reasoning in code, mathematics, and logic, and use as a foundational component for generative AI-powered features. The model's tokenizer supports a vocabulary size of up to 32,064 tokens, with placeholders for downstream fine-tuning. Microsoft provided a sample code snippet for local inference, demonstrating its application in generating responses to user prompts (a hedged sketch of this pattern appears below).

With 3.8 billion parameters, the Phi-3.5-mini model is lightweight yet powerful, outperforming larger models such as Llama 3.1 8B and Mistral 7B. It supports a 128K token context length, significantly more than its main competitors, which typically support only up to 8K. Microsoft's Phi-3.5-mini is positioned as a competitive option in long-context tasks such as document summarisation and information retrieval, outperforming several larger models like Llama-3.1-8B-instruct and Mistral-Nemo-12B-instruct-2407 on various benchmarks. The model is intended for commercial and research use, particularly in memory- and compute-constrained environments, latency-bound scenarios, and applications requiring strong reasoning in code, math, and logic. It was trained over 10 days using 512 H100-80G GPUs, processing 3.4 trillion tokens and leveraging a combination of synthetic data and filtered publicly available websites to enhance the model's reasoning capabilities and overall performance.

Phi-3.5 Vision is a 4.2-billion-parameter model that excels in multi-frame image understanding and reasoning. It has shown improved performance in benchmarks like MMMU, MMBench, and TextVQA, demonstrating its capability in visual tasks, and even outperforms OpenAI's GPT-4o on several benchmarks. The model integrates an image encoder, connector, projector, and the Phi-3 Mini language model. It supports both text and image inputs and is optimised for prompts using a chat format, with a context length of 128K tokens. The model was trained over 6 days using 256 A100-80G GPUs, processing 500 billion tokens that include both vision and text data.
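The article mentions Microsoft's sample snippet for local inference but does not reproduce it. A minimal sketch in the standard transformers pipeline style might look like the following; the prompt and generation parameters are illustrative assumptions, not Microsoft's published example.

```python
# Hedged sketch of local chat inference with Phi-3.5-MoE-instruct via the
# standard transformers text-generation pipeline; settings are illustrative.
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="microsoft/Phi-3.5-MoE-instruct",
    torch_dtype="auto",
    device_map="auto",       # requires accelerate; spreads the model across devices
    trust_remote_code=True,
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Solve step by step: if 3x + 7 = 22, what is x?"},
]

# Recent transformers versions apply the model's chat template automatically
# when a list of role/content messages is passed to the pipeline.
result = pipe(messages, max_new_tokens=200, do_sample=False)
print(result[0]["generated_text"][-1]["content"])  # the assistant's reply
```

The 42B-parameter MoE model is a heavy local download; swapping the model id for microsoft/Phi-3.5-mini-instruct gives the same flow on far smaller hardware.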
The Phi-3.5 models are now available on the AI platform Hugging Face under an MIT license, making them accessible for a wide range of applications. This release aligns with Microsoft's commitment to providing open-source AI tools that are both efficient and versatile.
Microsoft has released new Phi-3.5 models in Mini, MoE, and Vision instruct variants. These models demonstrate superior performance compared to offerings from Google, Meta, and OpenAI across various benchmarks.
Microsoft has made a significant leap in the artificial intelligence arena with the release of its new Phi-3.5 models. These models, which include Mini, MoE, and Vision instruct variants, have demonstrated remarkable capabilities, outperforming competitors such as Google, Meta, and OpenAI in various benchmarks [1].
The Phi-3.5 models have shown exceptional results in multiple evaluations. Notably, the Phi-3.5 Vision model has surpassed Google's Gemini 1.5 Pro Vision in the challenging MME Bench test, achieving an impressive score of 67.7% compared to Gemini's 66.1% [2]. This benchmark assesses a model's ability to understand and interpret visual information, highlighting Microsoft's advancements in multimodal AI capabilities.
Microsoft's new lineup includes specialized models for different tasks. The Phi-3.5 Mini Instruct model has demonstrated superior performance in instruction-following tasks, while the MoE (Mixture of Experts) variant offers a balance of efficiency and capability [1]. This diverse range of models allows for more tailored applications across various AI use cases.
The release of the Phi-3.5 models positions Microsoft at the forefront of AI development. These models have not only outperformed Google's offerings but have also shown advantages over Meta's Llama 3.1 and OpenAI's GPT-4o in certain benchmarks [3]. This achievement underscores Microsoft's commitment to pushing the boundaries of AI technology.
The introduction of these powerful models has significant implications for the AI community. Researchers and developers now have access to more advanced tools for tasks ranging from natural language processing to computer vision. The superior performance of Phi-3.5 models could accelerate progress in various AI-driven fields, including robotics, autonomous systems, and human-computer interaction [2].
Microsoft has made these models available to the public, fostering collaboration and innovation within the AI community. This open approach allows researchers and developers to build upon and refine these models, potentially leading to even more advanced AI systems in the future [3].
As the AI landscape continues to evolve rapidly, Microsoft's Phi-3.5 models represent a significant milestone in the ongoing race for more capable and efficient artificial intelligence systems. The tech industry and research community will be watching closely to see how these models are applied and what new innovations they might inspire.