On Fri, 10 Jan, 8:04 AM UTC
2 Sources
[1]
Diffbot boosts LLM accuracy by tapping into its vast Knowledge Graph of up-to-date information - SiliconANGLE
Knowledge graph startup Diffbot Technologies Corp., which maintains one of the largest online knowledge indexes, is looking to tackle the problem of hallucinations in artificial intelligence chatbots by ensuring the accuracy of their responses. The company has just launched a fine-tuned version of Meta Platforms Inc.'s Llama 3.3, saying that its responses are enhanced using a new technique called graph retrieval-augmented generation.

Diffbot's large language model is unlike typical AI models, which are trained on vast databases. Instead, it's trained on a small amount of data and taught how to search for information within the company's vast Knowledge Graph, which contains more than 1 trillion interconnected facts and is constantly updated.

Diffbot's Knowledge Graph has been crawling the public internet for the last eight years, categorizing web pages into different groups, such as people, companies, articles and products. It extracts the most recent information from these sites using natural language processing and computer vision to keep its database up to date. That database is updated every four to five days with "millions of new data points," and it's what's being used to fuel Diffbot's latest AI model to ensure its responses are grounded in the most up-to-date and accurate information.

That's different from most other LLMs, which rely on static information encoded into their training data. According to Diffbot, this makes its AI model much more accurate than others. If it's asked about a recent news event, for example, it will search the Knowledge Graph for the most recent updates, extract the most relevant data, and cite the sources of that information to the user. So it's not only more accurate, but also more transparent than other chatbots.

Diffbot founder and Chief Executive Mike Tung told VentureBeat that he believes the AI industry will shift toward a standard in which most general-purpose reasoning bots are distilled to around one billion parameters, rather than the multibillion-parameter LLMs being developed today. He argues that it's unsustainable to try to integrate all of the latest knowledge within AI models. Rather, it's better to teach the models to use the tools necessary to search for external knowledge.

The startup hopes to finally solve the problem of so-called "hallucinations," which occur when AI models cannot find the answer to a user's question and, instead of saying they don't know, fabricate their responses. This tendency makes it risky to deploy AI, and Diffbot believes the solution is to ground AI systems in "verifiable facts," rather than trying to cram as much knowledge as possible into them. Tung provided an example of a user wanting to know the latest weather forecast in their area. "Instead of generating an answer based on outdated training data, our model queries a live weather service and provides a response grounded in real-time information," he explained.

Diffbot says benchmarks show that its method is far more reliable. It achieved an 81% score on the FreshQA benchmark, which is designed to test AI models on real-time factual knowledge, beating both Gemini and ChatGPT. In addition, it achieved a 70.36% score on MMLU-Pro, which tests AI models for their academic knowledge.
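The retrieval loop the article describes (search the Knowledge Graph, extract the most relevant facts, and cite their sources in the answer) can be sketched in a few lines of Python. The function names, the prompt format, and the canned lookup below are illustrative assumptions, not Diffbot's published API.

```python
# Minimal sketch of graph retrieval-augmented generation (GraphRAG) as described
# in the article: retrieve current facts from a knowledge graph, ground the
# prompt in them, and have the model cite its sources. The function names, the
# lookup, and the prompt format are illustrative assumptions only.
from dataclasses import dataclass

@dataclass
class Fact:
    statement: str   # e.g. "Company X announced product Y on 2025-01-08"
    source_url: str  # provenance kept alongside every fact

def search_knowledge_graph(query: str, limit: int = 5) -> list[Fact]:
    """Hypothetical lookup against a constantly refreshed knowledge graph."""
    # A real system would call the graph's query endpoint; a canned result
    # keeps this sketch self-contained.
    return [Fact("Example fact relevant to the query.", "https://example.com/article")]

def build_grounded_prompt(question: str, facts: list[Fact]) -> str:
    """Put retrieved facts (with sources) in front of the question."""
    context = "\n".join(
        f"[{i + 1}] {f.statement} (source: {f.source_url})" for i, f in enumerate(facts)
    )
    return (
        "Answer using only the numbered facts below and cite them by number.\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )

def answer(question: str, llm) -> str:
    facts = search_knowledge_graph(question)         # retrieval step
    prompt = build_grounded_prompt(question, facts)  # grounding step
    return llm(prompt)                               # generation step, with citations

# usage: answer("What did Diffbot release this week?", llm=some_small_model)
```

The key design point is that freshness and provenance live in the retrieval step, so the language model itself can stay relatively small.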
The best thing about Diffbot's model is that it's being made open source, so companies will be able to download it, run it on their own machines and fine-tune it for their needs. For instance, they'll be able to customize it to search their own databases, as well as Diffbot's Knowledge Graph. "You can run it locally on your machine," Tung said, adding that this makes it superior from a privacy perspective. "There's no way you can run Google Gemini without sending your data over to Google and shipping it outside of your premises."

Diffbot hopes that its LLM will be used by enterprises for workloads that require exceptional accuracy and full accountability, and it has made some inroads there, providing data services to Duck Duck Go Inc., Cisco Systems Inc. and Snap Inc. Its model can be downloaded via GitHub now, and there is a public demo available at diffy.chat. Companies that want to deploy it on their own hardware can choose the smallest 8-billion-parameter version, which can run on a single Nvidia A100 graphics processing unit. The biggest, 70-billion-parameter model requires two H100 GPUs.
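The article notes the 8-billion-parameter version can be downloaded and run on a single Nvidia A100. As a rough illustration, here is a minimal local-inference sketch using the Hugging Face transformers library; the repository ID is a placeholder, since the article only says the weights are available via GitHub.

```python
# Minimal sketch of running an open-weight, Llama-3.3-derived checkpoint locally,
# assuming the Hugging Face transformers library. "diffbot/placeholder-8b" is a
# placeholder repository ID, not the real model identifier.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "diffbot/placeholder-8b"  # hypothetical; substitute the actual checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # 8B weights in bf16 fit on a single A100 80GB
    device_map="auto",           # place layers on the available GPU(s)
)

prompt = "What changed in the Knowledge Graph this week?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Because generation happens entirely on local hardware, no prompt or document data leaves the premises, which is the privacy advantage Tung highlights.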
[2]
Diffbot's AI model doesn't guess -- it knows, thanks to a trillion-fact knowledge graph
Diffbot, a small Silicon Valley company best known for maintaining one of the world's largest indexes of web knowledge, announced today the release of a new AI model that promises to address one of the biggest challenges in the field: factual accuracy.

The new model, a fine-tuned version of Meta's Llama 3.3, is the first open-source implementation of a system known as Graph Retrieval-Augmented Generation, or GraphRAG. Unlike conventional AI models, which rely solely on vast amounts of preloaded training data, Diffbot's LLM draws on real-time information from the company's Knowledge Graph, a constantly updated database containing more than a trillion interconnected facts.

"We have a thesis that eventually general purpose reasoning will get distilled down into about 1 billion parameters," said Mike Tung, Diffbot's founder and CEO, in an interview with VentureBeat. "You don't actually want the knowledge in the model. You want the model to be good at just using tools so that it can query knowledge externally."

How it works

Diffbot's Knowledge Graph is a sprawling, automated database that has been crawling the public web since 2016. It categorizes web pages into entities such as people, companies, products, and articles, extracting structured information using a combination of computer vision and natural language processing. Every four to five days, the Knowledge Graph is refreshed with millions of new facts, ensuring it remains up to date.

Diffbot's AI model leverages this resource by querying the graph in real time to retrieve information, rather than relying on static knowledge encoded in its training data. For example, when asked about a recent news event, the model can search the web for the latest updates, extract relevant facts, and cite the original sources. This process is designed to make the system more accurate and transparent than traditional LLMs. "Imagine asking an AI about the weather," Tung said. "Instead of generating an answer based on outdated training data, our model queries a live weather service and provides a response grounded in real-time information."

How Diffbot's Knowledge Graph beats traditional AI at finding facts

In benchmark tests, Diffbot's approach appears to be paying off. The company reports its model achieves an 81% accuracy score on FreshQA, a Google-created benchmark for testing real-time factual knowledge, surpassing both ChatGPT and Gemini. It also scored 70.36% on MMLU-Pro, a more difficult version of a standard test of academic knowledge.

Perhaps most significantly, Diffbot is making its model fully open source, allowing companies to run it on their own hardware and customize it for their needs. This addresses growing concerns about data privacy and vendor lock-in with major AI providers. "You can run it locally on your machine," Tung noted. "There's no way you can run Google Gemini without sending your data over to Google and shipping it outside of your premises."

Open source AI could transform how enterprises handle sensitive data

The release comes at a pivotal moment in AI development. Recent months have seen mounting criticism of large language models' tendency to "hallucinate" or generate false information, even as companies continue to scale up model sizes.
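The Knowledge Graph described here stores typed entities (people, companies, products, articles) whose facts carry provenance and are refreshed every four to five days. A minimal sketch of what one such record might look like follows; the field names and schema are assumptions for illustration, not Diffbot's actual data model.

```python
# Illustrative sketch of the kind of record a web-scale knowledge graph might
# store: a typed entity, a structured fact about it, the page it was extracted
# from, and when it was last refreshed. Field names and types are assumptions
# for illustration only.
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class EntityFact:
    entity_type: str        # "Person", "Organization", "Product", "Article", ...
    entity_name: str
    attribute: str          # e.g. "founder", "headquarters", "price"
    value: str
    source_url: str         # provenance: the page the fact was extracted from
    extracted_at: datetime  # when the crawler last confirmed this fact

    def is_stale(self, max_age_days: int = 5) -> bool:
        """Flag facts older than one refresh cycle (the graph refreshes every 4-5 days)."""
        return datetime.now() - self.extracted_at > timedelta(days=max_age_days)

fact = EntityFact(
    entity_type="Organization",
    entity_name="Diffbot Technologies Corp.",
    attribute="founder",
    value="Mike Tung",
    source_url="https://example.com/profile",  # placeholder provenance
    extracted_at=datetime(2025, 1, 8),
)
print(fact.is_stale())
```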
Diffbot's approach suggests an alternative path forward -- one focused on grounding AI systems in verifiable facts rather than attempting to encode all human knowledge in neural networks. "Not everyone's going after just bigger and bigger models," Tung said. "You can have a model that has more capability than a big model with kind of a non-intuitive approach like ours."

Industry experts note that Diffbot's knowledge graph-based approach could be particularly valuable for enterprise applications where accuracy and auditability are crucial. The company already provides data services to major firms including Cisco, DuckDuckGo, and Snapchat.

The model is available immediately through an open source release on GitHub and can be tested through a public demo at diffy.chat. For organizations wanting to deploy it internally, Diffbot says the smaller 8 billion parameter version can run on a single Nvidia A100 GPU, while the full 70 billion parameter version requires two H100 GPUs.

Looking ahead, Tung believes the future of AI lies not in ever-larger models, but in better ways of organizing and accessing human knowledge: "Facts get stale. A lot of these facts will be moved out into explicit places where you can actually modify the knowledge and where you can have data provenance."

As the AI industry grapples with challenges around factual accuracy and transparency, Diffbot's release offers a compelling alternative to the dominant bigger-is-better paradigm. Whether it succeeds in shifting the field's direction remains to be seen, but it has certainly demonstrated that when it comes to AI, size isn't everything.
Diffbot launches a fine-tuned version of Meta's Llama 3.3, using Graph Retrieval-Augmented Generation to enhance AI responses with up-to-date information from its vast Knowledge Graph.
Diffbot Technologies Corp., a Silicon Valley-based knowledge graph startup, has unveiled a groundbreaking AI model aimed at tackling the persistent challenge of hallucinations in artificial intelligence chatbots. The company has launched a fine-tuned version of Meta's Llama 3.3, enhanced with a new technique called Graph Retrieval-Augmented Generation (GraphRAG) [1].
At the heart of Diffbot's innovation is its vast Knowledge Graph, which contains over a trillion interconnected facts and is updated every four to five days. Unlike traditional AI models that rely on static training data, Diffbot's model is designed to search and retrieve information from this constantly updated database [2].
Mike Tung, Diffbot's founder and CEO, explains the company's approach: "We have a thesis that eventually general purpose reasoning will get distilled down into about 1 billion parameters. You don't actually want the knowledge in the model. You want the model to be good at just using tools so that it can query knowledge externally" [2].
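Tung's thesis, quoted above, is a small model that answers by calling external tools rather than memorizing facts in its weights. The sketch below shows that tool-use loop in miniature; the tool registry, routing rule, and stubbed tools are illustrative assumptions and not part of Diffbot's release.

```python
# Minimal sketch of the "small model + external tools" pattern: the model's job
# is to decide which tool to call and compose the result, not to memorize the
# facts itself. The tool registry, routing rule, and stubbed tools are
# illustrative assumptions only.
from typing import Callable

def weather_tool(location: str) -> str:
    # Stub standing in for a live weather service query.
    return f"Forecast for {location}: 12C, light rain (retrieved just now)."

def knowledge_graph_tool(query: str) -> str:
    # Stub standing in for a real-time knowledge graph lookup.
    return f"Top fact matching '{query}', with its source URL attached."

TOOLS: dict[str, Callable[[str], str]] = {
    "weather": weather_tool,
    "knowledge_graph": knowledge_graph_tool,
}

def route(question: str) -> str:
    """Toy router: a real system would let the model itself pick the tool."""
    return "weather" if "weather" in question.lower() else "knowledge_graph"

def answer_with_tools(question: str) -> str:
    tool_name = route(question)
    evidence = TOOLS[tool_name](question)  # fresh, external evidence
    return f"(grounded in {tool_name}) {evidence}"

print(answer_with_tools("What's the weather in Palo Alto today?"))
```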
Diffbot's model has demonstrated remarkable performance in benchmark tests. It achieved an 81% accuracy score on FreshQA, a Google-created benchmark for testing real-time factual knowledge, surpassing both ChatGPT and Gemini. Additionally, it scored 70.36% on MMLU-Pro, a more challenging version of a standard test of academic knowledge [1][2].
In a significant move, Diffbot is making its model fully open-source. This allows companies to run the model on their own hardware and customize it for their specific needs, addressing growing concerns about data privacy and vendor lock-in with major AI providers [2].
The model is available in different sizes, with the smallest 8 billion parameter version capable of running on a single Nvidia A100 GPU, while the full 70 billion parameter version requires two H100 GPUs [1].
Diffbot's approach suggests an alternative path forward in AI development, focusing on grounding AI systems in verifiable facts rather than attempting to encode all human knowledge in neural networks. This method could be particularly valuable for enterprise applications where accuracy and auditability are crucial [2].
As the AI industry continues to grapple with challenges around factual accuracy and transparency, Diffbot's release offers a compelling alternative to the dominant bigger-is-better paradigm, demonstrating that when it comes to AI, size isn't everything [2].