What is Google DeepMind’s SAFE system and how accurate is it?
Major News from Google's DeepMind and Gemini, Grok 1.5, Amazon and Anthropic

Google DeepMind’s AI Fact-Checker Outperforms Humans

Google’s DeepMind research unit has unveiled an AI system that, by its researchers’ own measurements, evaluates the accuracy of information better than human annotators. It is called SAFE, short for Search-Augmented Factuality Evaluator. This AI fact-checker breaks generated text down into individual facts and uses Google Search results to determine the accuracy of each claim. Compared head-to-head with human annotators, SAFE’s assessments matched human ratings 72% of the time. In a sample of 100 cases where SAFE and humans disagreed, SAFE was judged correct 76% of the time. Some experts question what “superhuman” really means in this context, since the comparison was against crowdsourced annotators rather than professional fact-checkers, but SAFE still has the potential to change how fact-checking is done. One clear advantage is cost: using SAFE is about 20 times cheaper than using human fact-checkers. As the volume of text generated by language models continues to grow, an economical and scalable way to verify claims will become increasingly important.
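The two steps described above, splitting text into individual facts and then rating each one against evidence, can be sketched in a few lines of Python. This is a minimal illustration and not DeepMind's implementation: the names `split_into_facts`, `evaluate`, `supported_fraction`, and `KNOWN_TRUE` are all hypothetical, the sentence-based splitter is a crude stand-in for the model-driven step, and the search lookup is replaced by a small in-memory set so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class FactVerdict:
    fact: str
    supported: bool


def split_into_facts(text: str) -> List[str]:
    """Hypothetical stand-in for the fact-splitting step: one fact per sentence."""
    return [s.strip() for s in text.split(".") if s.strip()]


def evaluate(text: str, is_supported: Callable[[str], bool]) -> List[FactVerdict]:
    """Rate each individual fact with a caller-supplied checker.

    In SAFE the check is driven by Google Search results; here `is_supported`
    is a placeholder (an assumption) so the sketch needs no network access.
    """
    return [FactVerdict(f, is_supported(f)) for f in split_into_facts(text)]


def supported_fraction(verdicts: List[FactVerdict]) -> float:
    """Share of facts rated as supported (0.0 when there are no facts)."""
    if not verdicts:
        return 0.0
    return sum(v.supported for v in verdicts) / len(verdicts)


# Toy "evidence" standing in for search results.
KNOWN_TRUE = {"Paris is the capital of France"}

verdicts = evaluate(
    "Paris is the capital of France. The Moon is made of cheese.",
    lambda fact: fact in KNOWN_TRUE,
)
for v in verdicts:
    print(f"{'supported' if v.supported else 'not supported'}: {v.fact}")
```

Passing the checker in as a function keeps the per-fact loop separate from the evidence source, so the toy set could in principle be swapped for a real search-backed check without changing the rest of the sketch.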

Google Gemini AI Coming to Android Tablets, Coexisting with Google Assistant (For Now)

Google’s new generative AI model, Gemini, is making its way to more devices, and it looks like it might be able to coexist with Google Assistant, at least for now. Currently available on Android phones, Gemini AI is expected to eventually replace Google Assistant, the virtual assistant used for voice commands. When Gemini is installed on a phone, users have to choose between it and Google Assistant. But a recent discovery in the Google Search app’s code suggests that things might be different for tablets. The code refers to using Gemini AI on a “tablet,” along with several features, and it appears that the Google app itself will host Gemini AI on tablets, rather than a standalone app as on phones. Because this is still a beta version of the Google Search app, Google could change its mind and not roll out these features. For now, though, it looks like Android tablet users might get the best of both worlds with Gemini AI and Google Assistant.

Elon Musk Unveils Grok-1.5: Closing In on GPT-4 Performance

Elon Musk’s xAI has announced Grok-1.5, a major upgrade to its large language model, mere weeks after open-sourcing Grok-1. Set for release next week, Grok-1.5 brings enhanced reasoning and problem-solving capabilities, closing in on the performance of models from industry leaders such as OpenAI’s GPT-4 and Anthropic’s Claude 3. While it still falls slightly behind GPT-4 and Claude 3 on the MMLU benchmark, xAI expects to continue these improvements with Grok-2, which Musk says should exceed current AI on all metrics. Grok-1.5 will initially be available to early testers and existing users of the Grok chatbot on the X platform, with a phased rollout to a wider set of users over time. Musk has also announced that accounts with a certain number of verified subscriber followers will get Premium and Premium+ subscription benefits, including Grok, for free.

Amazon Invests Record $2.75B in Anthropic, Doubling Down on AI

Amazon has announced a $2.75 billion investment in Anthropic, the company behind the Claude 3 family of large language models. This brings Amazon’s total investment in the OpenAI rival to $4 billion, the largest venture investment in the e-commerce and cloud computing giant’s history. Anthropic has been making waves lately with the release of Claude 3, which the company says outperforms OpenAI’s GPT-4 on a range of benchmarks. The investment is a clear sign that Amazon sees a major upside in convincing customers to use its cloud services, build AI apps with its Bedrock platform, and do so using cutting-edge models like Claude 3. As the AI race heats up, this record investment could help cement Amazon’s position as a major player in artificial intelligence.

Frequently asked questions

SAFE (Search-Augmented Factuality Evaluator) is Google DeepMind’s new AI fact-checking system that evaluates information accuracy by breaking down text into individual facts and verifying them against Google Search results. The system has demonstrated impressive accuracy, matching human ratings 72% of the time and proving correct 76% of the time when disagreeing with human annotators. It’s also approximately 20 times more cost-effective than human fact-checkers.
Google Gemini AI is currently available on Android phones and is expanding to tablets. On phones, users must choose between Gemini and Google Assistant, but code discoveries suggest that tablets might allow both to coexist. The Google Search app will likely host Gemini AI on tablets rather than requiring a standalone app, offering users access to both Gemini’s advanced AI capabilities and Google Assistant’s traditional features.
Grok-1.5, announced by Elon Musk’s xAI, features enhanced reasoning and problem-solving capabilities compared to Grok-1. While it still performs slightly below GPT-4 and Claude 3 on the MMLU benchmark, it represents a significant improvement in AI performance. The update will be available to early testers and X platform users with Grok chatbot access, with broader rollout planned over time.
Amazon has invested a total of $4 billion in Anthropic, with the latest investment being $2.75 billion. This represents Amazon’s largest venture investment ever and demonstrates its commitment to advancing AI technology. The investment aims to strengthen Amazon’s cloud services and Bedrock platform while drawing on Anthropic’s Claude 3 models, which Anthropic positions as among the most capable available.
SAFE’s efficiency comes from its ability to automatically process and verify information using Google Search results at scale, making it 20 times more cost-effective than human fact-checkers. The system can quickly analyze large volumes of content and break down complex claims into verifiable facts, making it particularly valuable for handling the increasing amount of AI-generated content.
While Google Gemini is expected to eventually replace Google Assistant on phones, the transition appears to be more flexible for tablet users. Current Google Assistant users won’t be immediately forced to switch, as the company is taking a phased approach to the rollout. Tablet users may even have the option to use both services simultaneously, allowing them to benefit from both traditional Assistant features and Gemini’s advanced AI capabilities.
Grok-1.5 positions itself as a competitor to leading AI models like GPT-4 and Claude 3, though it currently performs slightly below them on standard benchmarks. Its key differentiator is its integration with X (formerly Twitter) platform data and real-time information access. The model focuses on providing more conversational and personality-driven responses while maintaining strong reasoning capabilities.
Gor Gasparyan

Optimizing digital experiences for growth-stage & enterprise brands through research-driven design, automation, and AI