Curated by THEOUTPOST
On Thu, 12 Sept, 12:06 AM UTC
7 Sources
[1]
Mistral unveils Pixtral 12B: A multimodal AI model
On 11 September, Mistral AI announced its latest advanced AI model, capable of processing both images and text. Pixtral 12B, the company's first model of its kind, employs about 12 billion parameters and performs vision encoding, enabling it to interpret images alongside text. It is built on Mistral's previous text-only model, Nemo 12B, with the addition of a 400-million-parameter vision adapter. In a post on X, Sophia Yang, Head of Developer Relations at Mistral AI, shared that the model can be downloaded via a torrent link, as well as from GitHub and Hugging Face, and used under an Apache 2.0 license without restrictions. It will be available on Le Chat and La Plateforme soon. What is Mistral AI? Founded in France in 2023 by former employees of Meta and Google -- Arthur Mensch, Guillaume Lample and Timothée Lacroix -- Mistral AI aims to make generative AI more fun and accessible. Last year, the company closed a seed-stage financing round of over €105 million led by Lightspeed, a US-based VC firm. Earlier this year, it held its first European hackathon, in Paris, and provided GPUs for participants. Mistral AI follows an open-source approach, releasing models under open licenses for free use and modification. It focuses on creating efficient, accessible models trained on diverse datasets -- text, code, and images -- making them more versatile than those trained on a single data type. Though only a year old, it competes with the likes of Anthropic PBC's Claude family, OpenAI's GPT-4o and Google LLC's Gemini, among others.
[2]
Mistral unveils Pixtral 12B, a multimodal AI model that can process both text and images - SiliconANGLE
Mistral AI, a Paris-based artificial intelligence startup, today unveiled its latest advanced AI model capable of processing both images and text. The new model, called Pixtral 12B, employs around 12 billion parameters and is the first of the company's models capable of vision encoding, making it possible for it to "see" images alongside text. The new model is based on Mistral's Nemo 12B, a previously released text-only model, with the addition of a 400-million-parameter vision adapter. The adapter allows users to supply images through URLs or encode them via base64 within the input text. Many other large language models, such as Anthropic PBC's Claude family, OpenAI's GPT-4o and Google LLC's Gemini, have also added multimodal capabilities that let users input images. The addition of image reasoning capabilities to Pixtral 12B should similarly allow it to answer questions about images, generate captions, count objects and more. The company released the parameters and code via a torrent link, on GitHub and on the AI distribution platform Hugging Face, and has encouraged developers to start downloading and using the model. Now that it is available for download, developers will be able to fine-tune and train it for their own purposes. Mistral offers some of its models open-source under the Apache 2.0 license without restrictions; for others, it offers a license that is free for development and research uses but requires payment for commercial applications. The company has not clarified which license Pixtral 12B will fall under. Sophia Yang, head of Mistral developer relations, said in a post on X that the model will soon be available for testing on Mistral's chatbot and API platforms, Le Chat and La Plateforme.
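The URL-or-base64 input style described above can be sketched in a few lines of Python. The message schema here (a `content` list with `image_url` entries, in the style common to multimodal chat APIs) is an illustrative assumption, not Mistral's confirmed request format:

```python
import base64

def image_content(source: str) -> dict:
    """Build an image entry for a multimodal chat message.

    `source` is either an http(s) URL (passed through as-is), or a
    path to a local file that gets inlined as a base64 data URI --
    the two input styles the vision adapter reportedly accepts.
    """
    if source.startswith(("http://", "https://")):
        return {"type": "image_url", "image_url": source}
    with open(source, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("ascii")
    return {"type": "image_url", "image_url": f"data:image/jpeg;base64,{b64}"}

# A user turn mixing a text prompt with an image reference.
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this image."},
        image_content("https://example.com/photo.jpg"),
    ],
}
```

The field names (`image_url`, `content`) are hypothetical placeholders; the point is only the two transport options -- a plain URL versus base64 inlined into the text payload.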
[3]
Mistral releases Pixtral, its first multimodal model | TechCrunch
French AI startup Mistral has released its first model that can process images as well as text. Called Pixtral 12B, the 12-billion-parameter model is roughly 24GB in size. (Parameters roughly correspond to a model's problem-solving skills, and models with more parameters generally perform better than those with fewer.) Available on GitHub as well as the AI and machine learning development platform Hugging Face, the model can be downloaded, fine-tuned and used under Mistral's standard license, which requires a paid license for any commercial applications but not for research and academic ones. Built on Mistral's text model Nemo 12B, Pixtral 12B can answer questions about an arbitrary number of images of arbitrary size, given either image URLs or images encoded using the binary-to-text encoding scheme base64. Like other multimodal models (e.g. Anthropic's Claude family, GPT-4o and so on), Pixtral 12B should -- at least in theory -- be able to perform tasks like captioning images and counting the objects in a photo. This writer wasn't able to take Pixtral 12B for a spin, unfortunately -- there weren't any working web demos as of publication time. In a post on X, Sophia Yang, head of Mistral developer relations, said that Pixtral 12B will be available for testing on Mistral's chatbot and API-serving platforms, Le Chat and La Plateforme, "soon." It is unclear which image data Mistral might have used to develop Pixtral 12B. Most generative AI models, including Mistral's other models, are trained on vast quantities of public -- and often copyrighted -- data from around the web. Some model vendors argue that fair use entitles them to scrape any public data. Many copyright holders disagree -- and have filed lawsuits against the larger vendors, including OpenAI and Midjourney, in an attempt to put a stop to the practice.
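The quoted figures are easy to sanity-check: at 16-bit precision each parameter occupies two bytes, so ~12 billion parameters works out to roughly the reported 24GB download. A back-of-envelope sketch, assuming bf16/fp16 weight storage:

```python
# Back-of-envelope check of the reported ~24GB download size,
# assuming weights are stored in 16-bit (bf16/fp16) precision.
params = 12e9                 # ~12 billion parameters
bytes_per_param = 2           # 2 bytes per 16-bit weight
size_gb = params * bytes_per_param / 1e9
print(f"~{size_gb:.0f} GB")   # ~24 GB, matching the reported file size
# The 400M-parameter vision adapter would add roughly another 0.8 GB.
```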
The release of Pixtral 12B comes shortly after Mistral closed a $645 million funding round led by General Catalyst that valued the company at $6 billion. Just over a year old, Mistral is seen by many in the AI community as Europe's answer to OpenAI; its strategy thus far has involved releasing free "open" models, charging for managed versions of those models and providing consulting services to corporate customers.
[4]
Mistral Releases Text and Image Model Pixtral 12B
Mistral released Pixtral 12B, the French startup's first artificial intelligence model capable of taking in images as well as text, on Wednesday. The release follows similar multimodal models from rivals Anthropic, Google and OpenAI. Meta is also working on vision capabilities for its open-source Llama models. It is not yet known how Pixtral compares to these other models on vision and
[5]
French startup Mistral unveils Pixtral 12B, its first multimodal AI model
French AI startup Mistral has dropped its first multimodal model, Pixtral 12B, capable of processing both images and text. The 12-billion-parameter model, built on Mistral's existing text-based model Nemo 12B, is designed for tasks like captioning images, identifying objects, and answering image-related queries. Weighing in at 24GB, the model is available for free under the Apache 2.0 license, meaning anyone can use, modify, or commercialize it without restrictions. Developers can download it from GitHub and Hugging Face, but functional web demos aren't live yet. According to Mistral's head of developer relations, Pixtral 12B will soon be integrated into the company's chatbot, Le Chat, and its API platform, La Plateforme. Multimodal models like Pixtral 12B could be the next frontier for generative AI, following in the footsteps of tools like OpenAI's GPT-4 and Anthropic's Claude. However, questions loom over the data sources used to train these models. As noted by TechCrunch, Mistral, like many AI firms, likely trained Pixtral 12B on vast quantities of publicly available web data -- a practice that has sparked lawsuits from copyright holders challenging the "fair use" argument often made by tech companies. The release follows Mistral raising $645 million in funding, pushing its valuation to $6 billion. With Microsoft among its backers, Mistral is positioning itself as Europe's response to OpenAI.
[6]
Mistral's New AI Model Can Understand Images And Run Locally
Mistral AI, the company behind the open-source Mistral, Mathstral, and Codestral language models, has just introduced its first multimodal AI model. The new Pixtral 12B can process images, supplied as files or URLs, alongside text. Sophia Yang, the head of developer relations at Mistral AI, first announced the new model on Twitter. The GitHub repo and Mistral AI's page on Hugging Face have already been updated with the new Pixtral model. Pixtral 12B draws from Nemo 12B (another free language model from Mistral) but builds on it with added image-processing capabilities. The "12B" in the name refers to the model's 12 billion parameters. For comparison, GPT-4 is reported to have more than a trillion parameters, so Pixtral is a relatively small model. And while Pixtral is technically multimodal, it's not quite on par with ChatGPT or Anthropic's Claude, which also understand voice prompts and documents. You can chat about images with Pixtral and get useful answers, such as captions or identifications of what's in an image. You can feed it single or multiple image files or image URLs with prompts like "what's this plant?" or "create a caption for this image." Right now, you can download the Pixtral model for free via a torrent magnet link. It's a 24GB file that you can run locally on supported hardware. Mistral AI provides it under an Apache 2.0 license, which means it's free for personal and commercial purposes, and developers can modify it in any way. Mistral didn't disclose details of the model's training dataset. Mistral AI plans to offer Pixtral 12B as an official API on its "La Plateforme" stack, and it will also appear in the "Le Chat" chatbot soon, presumably as a free demo with a button for uploading images. Mistral charges a fee to access its APIs, so you'll likely need a subscription to get Pixtral API keys. Source: Twitter
[7]
Pixtral 12B is here: Mistral releases its first-ever multimodal AI model
Mistral AI is finally venturing into the multimodal arena. Today, the French AI startup taking on the likes of OpenAI and Anthropic released Pixtral 12B, its first-ever multimodal model with both language and vision processing capabilities baked in. While the model is not available on the public web at present, its weights and code can be downloaded from Hugging Face or GitHub to test on individual instances. The startup once again bucked the typical release trend for AI models by first dropping a torrent link to download the files for the new model. However, Sophia Yang, the head of developer relations at the company, noted in an X post that the company will soon make the model available through its web chatbot, allowing developers to take it for a spin. It will also come to Mistral's La Plateforme, which provides API endpoints for using the company's models. What does Pixtral 12B bring to the table? While official details of the new model, including the data it was trained on, remain under wraps, the core idea appears to be that Pixtral 12B will let users analyze images in combination with text prompts. So, ideally, one would be able to upload an image or provide a link to one and ask questions about the subjects in the file. The move is a first for Mistral, but it is important to note that multiple other models, including those from competitors like OpenAI and Anthropic, already have image-processing capabilities. When an X user asked Yang what makes the 12-billion-parameter Pixtral model unique, she said it natively supports an arbitrary number of images of arbitrary sizes. As shared by initial testers on X, the 24GB model's architecture appears to have 40 layers, a hidden dimension size of 14,336 and 32 attention heads for extensive computational processing.
On the vision front, it has a dedicated vision encoder with support for 1024×1024 image resolution and 24 hidden layers for advanced image processing. This, however, could change when the company makes the model available via API. Mistral is going all in to take on leading AI labs. With the launch of Pixtral 12B, Mistral will further democratize access to visual applications such as content and data analysis. Yes, the exact performance of the open model remains to be seen, but the work certainly builds on the aggressive approach the company has been taking in the AI domain. Since its launch last year, Mistral has not only built a strong pipeline of models taking on leading AI labs like OpenAI, but has also partnered with industry giants such as Microsoft, AWS and Snowflake to expand the reach of its technology. Just a few months ago, it raised $640 million at a $6 billion valuation and followed up with the launch of Mistral Large 2, a GPT-4-class model with advanced multilingual capabilities and improved performance across reasoning, code generation and mathematics.
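The architecture figures shared by testers can be roughly reconciled with the advertised 12-billion-parameter count. In the sketch below, the layer count (40), feed-forward width (14,336) and head count (32) come from the article; every other value (model width, head size, grouped-query attention with 8 KV heads, vocabulary size) is an assumption borrowed from typical configurations of the Nemo 12B family that Pixtral is said to build on:

```python
# Rough transformer parameter count from the architecture figures quoted above.
layers, n_heads, d_ff = 40, 32, 14_336   # quoted by initial testers
d_model = 5_120                          # assumed model width (not stated)
head_dim = 128                           # assumed attention head size
n_kv_heads = 8                           # assumed grouped-query attention
vocab = 131_072                          # assumed tokenizer vocabulary

q_proj = o_proj = d_model * n_heads * head_dim      # query / output projections
k_proj = v_proj = d_model * n_kv_heads * head_dim   # shared KV heads (GQA)
attn = q_proj + k_proj + v_proj + o_proj
mlp = 3 * d_model * d_ff                 # gated (SwiGLU-style) feed-forward
embeddings = 2 * vocab * d_model         # input + output embedding matrices

total = layers * (attn + mlp) + embeddings
print(f"~{total / 1e9:.1f}B parameters")  # prints ~12.2B parameters
```

Under these assumptions the estimate lands close to the advertised 12B for the language model alone, with the 400M-parameter vision encoder counted separately.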
Mistral AI, a prominent player in the AI industry, has introduced Pixtral-12B, a cutting-edge multimodal AI model capable of processing both text and images. This release marks a significant advancement in AI technology and positions Mistral as a strong competitor in the field.
Mistral AI, a rising star in the artificial intelligence landscape, has unveiled its latest innovation: Pixtral-12B. This groundbreaking multimodal AI model represents a significant milestone for the company, as it can process both text and images with remarkable efficiency [1].
Pixtral-12B is built on a 12-billion-parameter architecture, positioning it as a formidable player in the AI arena. The model demonstrates impressive capabilities in understanding and generating content based on both textual and visual inputs [2]. This advancement allows for more nuanced and context-aware interactions, potentially revolutionizing various applications across industries.
In a move that aligns with Mistral's commitment to open innovation, Pixtral-12B has been released under the Apache 2.0 license. This decision makes the model freely available for both research and commercial use, fostering a collaborative environment for further development and implementation [3].
The release of Pixtral-12B positions Mistral AI as a serious contender in the multimodal AI space, challenging established players like OpenAI and Anthropic. This move is particularly significant given the growing demand for AI models that can seamlessly integrate different types of data [4].
Pixtral-12B's ability to process both text and images opens up a wide range of potential applications. From enhancing content creation and analysis to improving visual search capabilities, the model's versatility makes it a valuable tool across various sectors [5].
As with any advanced AI technology, the release of Pixtral-12B raises important questions about ethical use and potential limitations. Mistral AI has emphasized the need for responsible development and deployment of such powerful models, acknowledging the ongoing challenges in ensuring fairness and mitigating biases in AI systems.
The AI community has responded with enthusiasm to Pixtral-12B's release. Experts highlight the model's potential to accelerate innovation in fields such as computer vision, natural language processing, and human-computer interaction. However, some caution that thorough testing and evaluation will be crucial to fully understand the model's capabilities and limitations.