Cover photo: Major news from Google I/O 2024, featuring the Web filter, LearnLM, Gemini 1.5 Flash, PaliGemma, and ChatGPT.

Google Search Adds “Web” Filter for Traditional Text Links

Google announced a new “Web” filter at Google I/O. This filter allows users to specifically view traditional text-based links, separating them from AI-generated overviews and information panels. The “Web” filter will be readily accessible on mobile devices, while desktop visibility will depend on search relevance. The global rollout is expected within days. This addition acknowledges user preferences for classic blue links, especially when seeking longer documents or using devices with limited internet access. It also signals a potential shift in SEO strategies, as AI-driven search results may prioritize diverse content formats over traditional website links.
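For readers who want the text-only view immediately, the "Web" filter is also reachable directly by URL: the widely reported (though not officially documented) `udm=14` query parameter requests the web-links-only results page. A minimal sketch, assuming that parameter keeps its current meaning:

```python
from urllib.parse import urlencode


def web_only_search_url(query: str) -> str:
    """Build a Google Search URL that requests the 'Web' (text-links-only) view.

    udm=14 is the widely reported parameter behind the Web filter; treat it
    as an observed convention rather than a documented, stable API.
    """
    return "https://www.google.com/search?" + urlencode({"q": query, "udm": 14})


print(web_only_search_url("gemini 1.5 flash"))
# https://www.google.com/search?q=gemini+1.5+flash&udm=14
```

Bookmarking such a URL (or setting it as a custom search engine in a browser) effectively makes the Web filter the default view.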

Google Unveils LearnLM, AI for Education at Google I/O

Google introduced LearnLM, a new family of generative AI models specifically designed for education. LearnLM, a collaboration between DeepMind and Google Research, builds upon Google’s Gemini models to provide conversational tutoring across various subjects. LearnLM is already integrated into several Google products, including YouTube, Gemini apps, Google Search, and Google Classroom.

Google is piloting LearnLM in Google Classroom to assist teachers with lesson planning by suggesting ideas, content, and activities tailored to specific student needs. On Android, LearnLM powers Circle to Search, a feature that helps solve math and physics problems, including those with symbolic formulas and diagrams. YouTube users on Android in the U.S. can leverage LearnLM to ask clarifying questions and take quizzes based on academic videos.

In the coming months, Google plans to expand LearnLM’s capabilities within its Gemini apps, allowing users to create custom chatbots that act as subject-matter experts, providing study guidance and personalized practice activities. Google also intends to partner with educational organizations like Columbia Teachers College, Arizona State University, NYU Tisch, and Khan Academy to explore LearnLM’s potential beyond Google’s own products. While promising, LearnLM faces challenges common to generative AI, such as maintaining an encouraging tone, accurately assessing student responses, and avoiding factual inaccuracies.

Google Introduces Gemini 1.5 Flash, a Fast Multimodal AI Model

Google announced the release of Gemini 1.5 Flash, a new multimodal AI model designed for speed and efficiency on high-frequency tasks. The model offers a one-million-token context window and is available in public preview through the Gemini API within Google AI Studio. Additionally, Google revealed that Gemini 1.5 Pro, launched in February, will receive an expanded context window of two million tokens; developers can join a waitlist for access to this update. Google emphasizes the distinct strengths of each model: Gemini 1.5 Flash prioritizes speed for quick tasks, while Gemini 1.5 Pro excels at complex, multi-step reasoning. This range of models gives developers tailored AI options based on their specific needs. The announcement comes on the heels of OpenAI’s unveiling of GPT-4o, underscoring the intensifying competition in the AI landscape. Both Gemini 1.5 models are available in public preview in over 200 countries and territories.
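For developers, public-preview access goes through the Gemini API's REST `generateContent` endpoint. The sketch below builds the request URL and JSON body without sending anything; the endpoint path and payload shape follow Google's published API docs, but verify model names and versions against current documentation before relying on them:

```python
import json

# v1beta is the API version used during the public preview.
API_BASE = "https://generativelanguage.googleapis.com/v1beta"


def build_generate_request(model: str, prompt: str) -> tuple[str, str]:
    """Return (url, json_body) for a generateContent call to the Gemini API.

    Actually sending the request requires an API key from Google AI Studio,
    passed as a ?key= query parameter or an x-goog-api-key header.
    """
    url = f"{API_BASE}/models/{model}:generateContent"
    body = json.dumps({"contents": [{"parts": [{"text": prompt}]}]})
    return url, body


url, body = build_generate_request("gemini-1.5-flash", "Summarize Google I/O 2024.")
print(url)
```

The same request shape works for `gemini-1.5-pro`; only the model segment of the URL changes, which is how developers switch between the speed-focused and reasoning-focused models.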

Google Introduces PaliGemma, a Vision-Language Model for Resource-Constrained Devices

Google unveiled PaliGemma, a new vision-language multimodal model within its Gemma family of lightweight open models. PaliGemma excels at image captioning, visual question answering, and image retrieval, making it well suited for developers who want to integrate these capabilities into their applications. As a small language model (SLM), PaliGemma runs efficiently on devices with limited resources, such as smartphones, IoT devices, and personal computers, which makes it particularly suitable for applications where low latency and offline functionality are crucial. Its potential applications are vast, ranging from content generation and enhanced search capabilities to assisting the visually impaired, and its compatibility with a variety of devices opens doors for innovative use cases in wearables, robotics, and beyond. The announcement coincides with the release of Google’s largest Gemma model yet, at 27 billion parameters, further expanding the Gemma family’s capabilities.

ChatGPT Integrates Google Drive and Microsoft OneDrive

OpenAI announced a significant update to ChatGPT, enabling paying subscribers to import files directly from Google Drive and Microsoft OneDrive. The feature, available with both the new GPT-4o model and older models, streamlines document workflows within the chatbot. Users can import various file types, including spreadsheets, presentations, and documents, by clicking the paperclip icon in the ChatGPT interface. ChatGPT now offers an interactive interface for viewing and editing spreadsheets in full-screen mode, leveraging the underlying AI model for real-time updates, and users can download edited documents directly from ChatGPT. OpenAI emphasizes user privacy, stating that data from ChatGPT Team and Enterprise customers is not used for training, and that ChatGPT Plus users can opt out of data usage for training through their Data Controls. This update significantly enhances ChatGPT’s usefulness for users who work with documents, saving time and providing greater flexibility.