At Google I/O 2024, its annual developer conference held on May 14, 2024, Google shared how it was "building more helpful products and features with AI—including improvements across Search, Workspace, Photos, Android and more."
A main focus was the continuing evolution of Gemini, the company's core AI model. In Google's words, "with multimodality, long context and agents, it brings us closer to our ultimate goal: making AI helpful for everyone." The model is designed to "reason across text, images, video and code."
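As an illustration of that multimodal claim, the sketch below shows how a text prompt and an image can be combined in a single request through the publicly available google-generativeai Python SDK. It is a minimal example only: the API key, image file and prompt are placeholders, and the model name simply assumes the Gemini 1.5 family discussed at I/O.

```python
# Minimal multimodal sketch using the google-generativeai SDK
# (pip install google-generativeai). The API key, image file and
# prompt are placeholders for illustration only.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")

model = genai.GenerativeModel("gemini-1.5-flash")

# A single request can mix modalities: here, an image plus a text instruction.
chart = Image.open("sales_chart.png")
response = model.generate_content(
    [chart, "Describe the trend in this chart and suggest one follow-up analysis."]
)
print(response.text)
```

In practice, "reasoning across" modalities amounts to the same generate_content call accepting text, images, or (via file uploads) audio and video as parts of one prompt.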
With Google products serving an estimated two billion users now using Gemini, this advancing AI formed the basis for many of the new developments announced at the I/O conference.
What were the major announcements?
The core AI models: Gemini 1.5 Pro has been improved and a new 1.5 Flash model (a lighter model optimized for tasks where low latency and cost matter) introduced. Both bring a series of "quality improvements across key use cases, such as translation, coding, reasoning" so they can process a wider range of more complex tasks. For instance, Gemini 1.5 Pro "can make sense of multiple large documents, up to 1,500 pages total, or summarize 100 emails. Soon it will be able to handle an hour of video content or codebases with more than 30,000 lines." Gemini 1.5 Pro will be available to Gemini Advanced subscribers in more than 150 countries and in over 35 languages.
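For developers, the long-context and latency trade-off described above surfaces mainly as a choice of model name and how large an input is handed over. The hedged sketch below uses the google-generativeai SDK's File API to pass a long document to 1.5 Flash for summarization; the file name, prompt and API key are illustrative, not taken from Google's announcement.

```python
# Long-context sketch: summarizing a large document with Gemini 1.5 Flash.
# File name, prompt and API key are illustrative placeholders.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# 1.5 Flash targets lower latency and cost; swap in "gemini-1.5-pro"
# when the larger model's quality headroom matters more.
model = genai.GenerativeModel("gemini-1.5-flash")

# The File API is the practical route for inputs far too large to inline,
# such as a multi-hundred-page report.
report = genai.upload_file(path="annual_report.pdf")

response = model.generate_content(
    [report, "Summarize the key findings of this report in five bullet points."]
)
print(response.text)
```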
Generative media: 'Image FX' and 'Music FX' have received updates, while a new 'Video FX' tool is being introduced, powered by the 'Veo' model. Veo generates high-quality video that "closely represents a user's creative vision — accurately capturing a prompt's tone" and its details.
Learning & Education: 'LearnLM' is an advanced model fine-tuned for "learning, and grounded in educational research to make teaching and learning experiences more active, personal and engaging." It will be available across Google's products, including Search, the Gemini chat, Android and YouTube. For instance, a new pilot program in Google Classroom works directly with educators to "simplify and improve the process of lesson planning — a critical, but time-consuming component of teaching."
Photos: With 'Ask Photos', users can search their photo libraries in a "natural way" by asking direct, descriptive questions for the system to process. "Google Photos can show you what you need, saving you from all that scrolling." The AI can also pull together a sequence of images related to a given topic or subject.
Workspace: The new features in the Gmail app will be rolled out over the next couple of months. First, Gemini will be able to summarize emails by analyzing entire threads. Next, Gemini in Gmail will offer Contextual Smart Reply suggestions, which "you can edit or simply send as-is." There is also a new capability that lets users work across more languages, "with automatic language detection and real-time translated captions in more than 60 languages to help people around the world connect."
Search Engine: The 'AI Overviews' functionality has been expanded, bringing "together Gemini's advanced capabilities — including multi-step reasoning, planning and multimodality" with Google's "best-in-class Search systems". An AI-organized results page aims to give greater clarity to searches, with "options to simplify the language or break it down in more detail." Rolling out soon, it is expected to reach over a billion people by the end of the year.
Android: Google has incorporated Gemini models directly into Android as the new AI assistant, "which processes text, images, audio, and speech to unlock new experiences while keeping information private." These experiences are designed to reflect the advanced multimodal capabilities of the Gemini models.
More tangible outputs in the future?
All these I/O 2024 developments are built on a "longer context window, new data analysis capabilities, connections to additional Google apps and more customizable options", aimed at a more intelligent and 'natural' experience. However, many of the announced capabilities have yet to roll out across Google's customer base, or their key features are limited to Gemini Advanced subscribers.
Therefore, the coming months will be the true test of the effectiveness and success of these advancements. This period will also reveal how the evolving Gemini models stand up against the competing technologies being rolled out elsewhere in the AI sphere.