Microsoft Unveils Phi-Silica, a Specialized AI Model for Copilot+ PCs
At Build, its annual developer conference, Microsoft announced Phi-Silica, a new small language model (SLM) designed specifically for the NPUs (Neural Processing Units) in its upcoming Copilot+ PCs. The 3.3-billion-parameter model will be embedded in all Copilot+ PCs, which become available in June. Phi-Silica prioritizes efficiency: Microsoft cites first-token processing at 650 tokens per second while drawing only about 1.5 watts of power, which lets the model run locally on the NPU without burdening the PC’s CPU or GPU. Microsoft highlights Phi-Silica as a significant step in bringing advanced AI capabilities directly to Windows devices, and says the model will let developers build applications that use local AI processing for enhanced productivity and accessibility. Phi-Silica joins Microsoft’s growing family of Phi-3 models, each tailored for specific use cases and hardware capabilities, and underscores the company’s push to advance AI accessibility and performance across its product ecosystem.
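Phi-Silica itself ships inside Windows rather than as a downloadable model, but to give a feel for what local inference with a small Phi-family model looks like, here is a minimal sketch using the openly released Phi-3-mini via Hugging Face transformers. The model ID, prompt, and generation settings are stand-in assumptions, not Phi-Silica or its on-device API.

```python
# Minimal local-inference sketch. Phi-Silica runs on the Copilot+ NPU inside Windows;
# the openly released Phi-3-mini is used here purely as a stand-in (requires a recent
# transformers release with native Phi-3 support).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"   # stand-in model, not Phi-Silica
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

messages = [{"role": "user", "content": "Rewrite this sentence more concisely: "
             "The meeting has been moved to a later time on Thursday afternoon."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

outputs = model.generate(inputs, max_new_tokens=64, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```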
Microsoft’s Azure AI Studio Launches with GPT-4o Support
Microsoft also announced the general availability of Azure AI Studio, its platform for building and deploying generative AI applications. The platform now includes access to OpenAI’s GPT-4o model, letting developers incorporate advanced text, image, and audio processing into their apps, along with several other new features. Azure AI Studio also adds support for the Azure Developer CLI and the AI Toolkit for Visual Studio Code, streamlining the development process. Microsoft is expanding its Models-as-a-Service (MaaS) offering with new models from providers such as Nixtla and Core42; the MaaS program lets developers access and fine-tune models on a pay-as-you-go basis. Furthermore, Azure AI Studio now features an AI toolchain for data integration, prompt orchestration, and system evaluation. Developers can also use prompt flow controls to manage multimodal workflows and gain insight into their AI applications through tracing, debugging, and monitoring capabilities.
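To illustrate what the GPT-4o integration looks like from code, here is a hedged sketch of calling an Azure OpenAI GPT-4o deployment with mixed text-and-image input using the openai Python package. The endpoint, API version, deployment name, and image URL are placeholders, not values from the announcement.

```python
import os
from openai import AzureOpenAI

# Placeholder endpoint, key, and API version for your Azure OpenAI resource.
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-15-preview",
)

response = client.chat.completions.create(
    model="gpt-4o",  # name of your GPT-4o deployment (assumption)
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": [
            {"type": "text", "text": "What product is shown in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
        ]},
    ],
)
print(response.choices[0].message.content)
```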
Microsoft Debuts Team Copilot for Enhanced Collaboration
Microsoft also unveiled Team Copilot at the conference, a new suite of AI-powered tools designed to enhance teamwork and streamline project management within organizations. Team Copilot integrates with Microsoft Teams, Loop, and Planner, offering features like automated meeting agendas, collaborative note-taking, intelligent chat summaries, and task management assistance. Within Teams meetings, Team Copilot can manage agendas, take notes that participants can co-author, and provide summaries of key discussion points. In Loop and Planner, it helps create and assign tasks, track deadlines, and provide project status updates. Microsoft plans to release Team Copilot in preview later this year for users with a Copilot for Microsoft 365 license, which is priced at $30 per user per month.
GitHub Copilot Extends Reach with Third-Party App Integrations
GitHub has introduced Copilot Extensions, allowing developers to connect their favorite tools and services directly to their coding environment. Through GitHub Copilot Chat, developers can now interact with supported apps, triggering actions, retrieving information, and streamlining workflows. Initial integrations include popular services like Azure, DataStax, Docker, Microsoft Teams, MongoDB, and Stripe, enabling tasks such as deploying code to Azure or managing cloud resources without leaving GitHub. The move positions GitHub as a central hub for software development, minimizing context switching and integrating AI assistance with external tools. GitHub Copilot Extensions are currently in private preview, with more supported apps planned for the future.
Meta Unveils Chameleon, a New Multimodal AI Model
Meta has introduced Chameleon, a new family of AI models designed for seamless integration of text and images. Unlike existing models that combine separate components for different modalities, Chameleon uses an “early-fusion” architecture, enabling it to learn from and generate interleaved sequences of images and text. Meta’s research shows that Chameleon achieves state-of-the-art performance on tasks like image captioning and visual question answering while remaining competitive on text-only tasks. The model’s unified token space allows for efficient processing and generation of mixed-modal content, and Chameleon demonstrates impressive capabilities in understanding and responding to complex prompts that involve both visual and textual information. In human evaluations, raters preferred Chameleon’s mixed-modal responses to those of competing multimodal models. While Meta has not yet released the models, their potential impact on AI research and applications is significant: Chameleon’s early-fusion approach could inspire further advances in multimodal AI, particularly as more modalities, such as code and robotics control, are incorporated.
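Meta has not released Chameleon’s code or weights, so the following is only a conceptual sketch of the early-fusion idea under assumed sizes: image content is quantized into discrete tokens that share a single vocabulary, embedding table, and causal transformer with text tokens, rather than being handled by a separate vision encoder.

```python
# Conceptual early-fusion sketch (not Meta's implementation): text tokens and
# quantized image tokens share one vocabulary and one transformer backbone.
import torch
import torch.nn as nn

TEXT_VOCAB = 32_000        # assumed text vocabulary size
IMAGE_VOCAB = 8_192        # assumed image codebook size (e.g. from a VQ image tokenizer)
VOCAB = TEXT_VOCAB + IMAGE_VOCAB

class EarlyFusionLM(nn.Module):
    def __init__(self, d_model=512, n_layers=4, n_heads=8):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, d_model)           # one table for both modalities
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.lm_head = nn.Linear(d_model, VOCAB)            # predicts text or image tokens

    def forward(self, tokens):
        x = self.embed(tokens)
        # Causal mask so each position only attends to earlier tokens.
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        h = self.backbone(x, mask=mask)
        return self.lm_head(h)

# Interleave text tokens with image tokens (image ids are offset into the shared vocabulary).
text = torch.randint(0, TEXT_VOCAB, (1, 16))
image = torch.randint(0, IMAGE_VOCAB, (1, 32)) + TEXT_VOCAB
sequence = torch.cat([text, image, text], dim=1)            # e.g. caption, image, question

logits = EarlyFusionLM()(sequence)                          # next-token logits over both modalities
print(logits.shape)                                         # torch.Size([1, 64, 40192])
```

Because the output head covers the combined vocabulary, the same autoregressive loop can emit a text token or an image token at any step, which is what lets an early-fusion model generate interleaved image-and-text sequences.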