Google Unveils Imagen 2 with Text-to-Live Images Feature, Raising Concerns
Google has announced Imagen 2, an enhanced image-generation tool, on its Vertex AI developer platform. The launch follows controversy over Google’s previous image generator, built into its AI-powered chatbot Gemini, which injected gender and racial diversity into prompts and produced offensive inaccuracies. The most significant addition to Imagen 2 is “text-to-live images,” a feature that creates short, four-second videos from text prompts. Google is pitching the feature to marketers and creatives, for example to generate GIFs for ads. For now, however, live images are low resolution, at 360 by 640 pixels, though Google promises improvements. The company emphasizes its safety filters and bias mitigations, but it remains unclear how well live images can compete with other video-generation tools on the market. Google has also not disclosed details about the data used to train Imagen 2. That raises the prospect of IP-related lawsuits, especially since there is no opt-out tool or compensation for creators whose work may have been used in training.
Google Introduces $10 AI Add-Ons for Workspace, Following Microsoft’s Lead
In a move to monetize AI, Google has announced two new add-on packages for its Google Workspace productivity suite, each priced at $10 per user per month. This follows Microsoft’s decision last year to charge $30 per user per month on top of an Office 365 subscription for its Copilot feature. The first add-on, AI meetings and messaging, takes notes, summarizes meetings, and translates content into 69 languages. Aparna Pappu, VP & GM at Google Workspace, highlighted that this total includes 52 newly added languages, such as Filipino and Korean. The second add-on, AI security, helps admins protect Workspace content by classifying and protecting files with sensitive characteristics, safeguarding private information, and applying data loss prevention controls tailored to each organization’s requirements. The $10 per user price may seem steep, but it is in line with what third-party services charge for similar features. Customers can also mix and match license types, applying the advanced features only where they are most useful. Both add-ons are available to Workspace subscribers now.
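The mix-and-match licensing option is easiest to see with a quick cost estimate. The sketch below is a minimal Python example that assumes only the $10 per user per month list price quoted above; the seat counts and the 500-person company are hypothetical, and real bills may include discounts, taxes, or base Workspace fees that are not modeled here.

```python
# Hypothetical cost sketch for the Google Workspace AI add-ons.
# Assumption: each add-on is a flat $10 per user per month (list price from
# the announcement); discounts, taxes, and base Workspace fees are ignored.

ADDON_PRICE_USD = 10  # per user, per month, per add-on


def monthly_addon_cost(seats_by_addon: dict) -> int:
    """Monthly spend when each add-on is licensed only to some users.

    seats_by_addon maps an add-on name to the number of users assigned it.
    """
    return sum(seats * ADDON_PRICE_USD for seats in seats_by_addon.values())


def annual_cost(monthly_usd: int) -> int:
    """Annualize a monthly figure (12 billing months)."""
    return monthly_usd * 12


# Hypothetical 500-person company that mixes and matches licenses.
mixed = {"AI meetings and messaging": 120, "AI security": 40}
mixed_monthly = monthly_addon_cost(mixed)

# The same company buying both add-ons for every employee, for comparison.
everyone = {name: 500 for name in mixed}
everyone_monthly = monthly_addon_cost(everyone)

print(mixed_monthly, annual_cost(mixed_monthly))        # 1600 19200
print(everyone_monthly, annual_cost(everyone_monthly))  # 10000 120000
```

Licensing each add-on only to the teams that need it keeps the bill proportional to actual use, which is the practical benefit of being able to mix license types.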
Google Cloud Introduces Vertex AI Agent Builder for Simplified AI Agent Creation
In another launch, Google Cloud unveiled Vertex AI Agent Builder, a tool designed to simplify the creation of AI agents. Unlike traditional chatbots, these agents can take actions based on a conversation and interact with back-end transactional systems to automate processes. Google Cloud CEO Thomas Kurian emphasized how quickly and easily users can build and deploy production-ready, generative AI-powered conversational agents with Vertex AI Agent Builder. A key feature of the tool is “grounding,” which ties the agent’s answers to reliable sources such as Google Search or enterprise data sources. The new capabilities are available now, support multiple languages, and offer regional API endpoints in the U.S. and EU. As interest in AI agents continues to grow, Google Cloud aims to position itself as the platform that makes building them simplest for businesses.
Google’s Gemini 1.5 Pro Enters Public Preview on Vertex AI with Impressive Context Window
Gemini 1.5 Pro, Google’s most capable generative AI model, is now available in public preview on Vertex AI, the company’s enterprise-focused AI development platform. Introduced in February, Gemini 1.5 Pro has a context window of between 128,000 and 1 million tokens; at the upper end, that is equivalent to roughly 700,000 words or 30,000 lines of code. At 1 million tokens, the window is far larger than those of competitors such as Anthropic’s Claude 3 (200,000 tokens) and OpenAI’s GPT-4 Turbo (128,000 tokens). This capacity lets the model analyze entire code libraries, reason across lengthy documents, and sustain long chatbot conversations. Because it is multilingual and multimodal, Gemini 1.5 Pro can understand images, video, and now audio streams, so it can analyze and compare content across media formats and languages. Google acknowledges that processing a million tokens takes time: searches in its demos took between 20 seconds and a minute to complete. The company says it is working on optimizing the model to reduce latency.
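The word and line-of-code equivalents above imply rough conversion ratios: about 0.7 words per token, and about 33 tokens per line of code. The following Python sketch uses those ratios, as an assumption, to estimate whether a text fits in a 128,000- or 1-million-token window. The tier labels and helper names are hypothetical, and actual token counts depend on the model’s tokenizer, so treat the results as ballpark figures only.

```python
from typing import Optional

# Ratios implied by the article (assumptions; real counts are tokenizer-dependent):
#   1,000,000 tokens ~ 700,000 words
#   1,000,000 tokens ~ 30,000 lines of code
WINDOW_TOKENS = 1_000_000
WINDOW_WORDS = 700_000
WINDOW_CODE_LINES = 30_000

# Hypothetical labels for the two context-window sizes mentioned in the article.
CONTEXT_WINDOWS = [("128K", 128_000), ("1M", 1_000_000)]


def _ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def estimate_text_tokens(word_count: int) -> int:
    """Estimate tokens for prose of word_count words."""
    return _ceil_div(word_count * WINDOW_TOKENS, WINDOW_WORDS)


def estimate_code_tokens(line_count: int) -> int:
    """Estimate tokens for line_count lines of source code."""
    return _ceil_div(line_count * WINDOW_TOKENS, WINDOW_CODE_LINES)


def smallest_fitting_window(token_count: int) -> Optional[str]:
    """Return the smallest window label that holds token_count tokens, or None."""
    for label, limit in CONTEXT_WINDOWS:
        if token_count <= limit:
            return label
    return None


print(estimate_text_tokens(100_000))                           # 142858
print(smallest_fitting_window(estimate_text_tokens(50_000)))   # 128K
print(smallest_fitting_window(estimate_text_tokens(100_000)))  # 1M
print(smallest_fitting_window(estimate_code_tokens(40_000)))   # None
```

Integer ceiling division keeps boundary cases exact: 700,000 words maps to exactly 1,000,000 tokens rather than being pushed over the limit by floating-point error.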