Pixtral 12B 24.09

Free

Open Source

LLM

horizontal

30 views

Multimodal AI for image-text tasks with variable image support and 128K context

https://mistral.ai/news/pixtral-12b/

Published 2025/03/21

AgentHunter

Featured AI Agent

Visit Website

Agent Details

Pixtral-12B-2409 is a 12-billion-parameter multimodal model by Mistral AI, combining a 12B-parameter text decoder with a 400M-parameter vision encoder. It processes interleaved text and images natively, supporting variable image sizes and a 128K-token context window for long-form document analysis or multi-image workflows. The model excels in tasks like chart understanding, OCR, and multilingual reasoning, outperforming similar-sized open models (e.g., Qwen2-VL 7B, LLaVA-OV 7B) and even larger models like Llama-3.2 90B in benchmarks like MMMU (52.5%) and MathVista (58.0%)

Key Features

128K Context Window: Handles long documents or multi-image inputs.
Variable Image Support: Processes images at native resolution and aspect ratio via a vision encoder.
Multilingual & Code Capabilities: Supports 80+ coding languages and nuanced multilingual understanding.
Open Source: Apache 2.0 license for free modification and deployment.
High Accuracy: Outperforms Claude 3 Haiku and Gemini-1.5 Flash 8B in multimodal benchmarks.
Vision-to-Code: Generates HTML/CSS from sketches or diagrams

Use Cases

Image Captioning & OCR: Generate descriptions or extract text from images/documents.
Data Analysis: Convert charts to Markdown tables or interactive dashboards.
Document QA: Answer questions from technical manuals or financial reports.
Academic Research: Summarize papers or analyze scientific diagrams.
Automation: Integrate with workflows for invoice processing or customer support

Video

Featured AI Agents

xAIcreator

AI-powered Twitter marketing tool for tracking trends, rewriting viral content, and optimizing posting schedules.

Freemium

1

PoseUp.ai

PoseUp.ai is an AI-powered photo enhancement tool that transforms ordinary photos into professional-quality images.

Freemium

0

KOLFind

KOLFind is an AI-driven platform that helps brands discover and connect with nano and micro influencers across TikTok, Instagram, and YouTube to drive effective influencer marketing campaigns.

Freemium

10