Salesforce Tackles 'Jagged Intelligence' with New AI Benchmarks for Enterprise Reliability

Salesforce introduces new AI benchmarks and models to address the inconsistency in AI performance for enterprise applications, aiming to develop more reliable and capable AI agents for business environments.

Salesforce Addresses 'Jagged Intelligence' in AI

Salesforce has launched a series of AI benchmarks and models aimed at tackling 'jagged intelligence' in artificial intelligence systems. The term refers to the gap between an AI model's raw intelligence and its ability to perform consistently in unpredictable, real-world enterprise environments [1][2].

New Benchmarks for AI Reliability

To address this issue, Salesforce has introduced several new benchmarks:

  1. SIMPLE Benchmark: A public dataset featuring 225 straightforward reasoning questions that are easy for humans but challenging for AI. This benchmark aims to quantify the 'jaggedness' of AI models and improve their real-world performance [1].

  2. ContextualJudgeBench: This benchmark evaluates AI-enabled judges rather than the models themselves, focusing on the reliability of AI systems that assess other models [1].

  3. CRMArena: A framework designed to evaluate how AI agents perform in customer relationship management (CRM) tasks, such as summarizing sales emails and making commerce recommendations [1][2].
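As a rough illustration of what scoring a model on a set of easy questions can look like, the sketch below computes two simple numbers: accuracy against reference answers, and run-to-run consistency across repeated attempts. This is not Salesforce's metric or the SIMPLE dataset format; the function names, the exact-match comparison, and the sample answers are all assumptions made for the example.

```python
def _norm(text: str) -> str:
    """Normalize an answer for a simple exact-match comparison."""
    return text.strip().lower()


def accuracy(predictions: list[str], gold: list[str]) -> float:
    """Fraction of questions answered correctly (exact match after normalization)."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(_norm(p) == _norm(g) for p, g in zip(predictions, gold))
    return correct / len(gold)


def consistency(runs: list[list[str]]) -> float:
    """Fraction of questions on which every repeated run gave the same answer."""
    if not runs or not runs[0]:
        return 0.0
    agree = sum(
        len({_norm(a) for a in answers}) == 1  # all runs agree on this question
        for answers in zip(*runs)
    )
    return agree / len(runs[0])


# Hypothetical data: three easy questions, two runs of the same model.
gold = ["4", "tuesday", "no"]
runs = [
    ["4", "Tuesday", "yes"],
    ["4", "monday", "no"],
]

first_run_accuracy = accuracy(runs[0], gold)
run_consistency = consistency(runs)
```

A model that scores well on hard tasks but shows low accuracy or low consistency on questions like these would be showing the kind of unevenness the article describes.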

Enterprise General Intelligence (EGI)

Salesforce is pushing towards what it calls "Enterprise General Intelligence" (EGI), which focuses on developing AI specifically for business complexity. This approach aims to create purpose-built AI agents optimized for both capability and consistency in business environments [2].

New AI Models and Embeddings

Salesforce has also introduced new AI models and embeddings to enhance enterprise AI capabilities:

  1. SFR-Embedding: An embedding model for deeper contextual understanding that leads the Massive Text Embedding Benchmark (MTEB) across 56 datasets [2].

  2. SFR-Embedding-Code: A specialized version for developers, enabling high-quality code search and streamlining development [2].

  3. xLAM V2 (Large Action Model): A family of models designed to predict actions rather than just generate text, starting at just 1 billion parameters [2].
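Embedding-based code search generally works by turning the query and each code snippet into vectors and ranking snippets by similarity to the query. The sketch below shows that ranking step with cosine similarity. It does not call SFR-Embedding-Code or any Salesforce API; the three-dimensional vectors and snippet names are made-up placeholders for real model output.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def rank(query_vec: list[float], corpus: dict[str, list[float]]) -> list[str]:
    """Return snippet names ordered from most to least similar to the query."""
    return sorted(
        corpus,
        key=lambda name: cosine(query_vec, corpus[name]),
        reverse=True,
    )


# Hypothetical embeddings: in practice these would come from an embedding model.
corpus = {
    "parse_json": [0.9, 0.1, 0.0],
    "send_email": [0.0, 0.2, 0.9],
    "load_yaml": [0.7, 0.3, 0.1],
}
query = [1.0, 0.0, 0.0]  # stands in for a query such as "read a config file"

ranked = rank(query, corpus)
```

The quality of such a search depends almost entirely on the embedding model; the ranking logic itself stays this simple.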

Implications for Business AI Applications

These developments have significant implications for businesses looking to implement AI:

  1. Improved Consistency: By addressing 'jagged intelligence', Salesforce aims to create AI systems that perform more reliably in unpredictable business environments [1][2].

  2. Enhanced Trust: Better benchmarks and more consistent performance could lead to higher trust from business leaders in implementing AI systems [1].

  3. Tailored Solutions: The focus on EGI and CRM-specific benchmarks suggests a move towards AI solutions tailored for specific business needs [1][2].

  4. Efficient Models: Smaller, action-focused AI models like xLAM V2 may outperform larger language models for specific business tasks, offering more efficient solutions [2].

As AI continues to evolve, Salesforce's research lays the groundwork for more reliable, efficient, and business-focused AI agents. This could change how enterprises use AI in their operations, with potential gains in productivity and decision-making.

TheOutpost.ai
© 2025 Triveous Technologies Private Limited