Google Launches Gemini 2.0 with Autonomous Tool Linking
Google is embracing “agentic experiences” in the rollout of Gemini 2.0, its new flagship family of generative AI models expected to compete with OpenAI’s ChatGPT and o1, GitHub Copilot, and Amazon Nova.
The tech giant released the first model, Gemini 2.0 Flash, on Dec. 11 for global developers through the Gemini API in Google AI Studio and Vertex AI. Consumers can expect Gemini 2.0 to impact Google Search and AI Overviews, with limited testing beginning next week. A public rollout is set for early 2025.
Through Gemini 2.0, developers can access multimodal input and text output, while early access partners can test text-to-speech and native image generation. The Gemini app will be updated with Gemini 2.0 Flash “soon,” Google said in a press release.
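As a rough illustration of that developer access, the sketch below uses the google-generativeai Python SDK to send a multimodal (text plus image) request and read back a text response. The model identifier gemini-2.0-flash-exp, the API key placeholder, and the sample image file are assumptions made for illustration, not details confirmed in Google’s announcement.

```python
# Minimal sketch: a text + image request to Gemini 2.0 Flash via the Gemini API.
import google.generativeai as genai
import PIL.Image

genai.configure(api_key="YOUR_API_KEY")  # key generated in Google AI Studio

# "gemini-2.0-flash-exp" is an assumed experimental model ID for illustration.
model = genai.GenerativeModel("gemini-2.0-flash-exp")

image = PIL.Image.open("receipt.png")  # hypothetical local image
response = model.generate_content(
    ["Summarize the line items in this receipt.", image]
)
print(response.text)
```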
General availability, and additional model sizes such as the base model Gemini 2.0, are expected to follow in January.
What is Gemini 2.0?
Gemini 2.0 is a multimodal generative AI model running on Google’s Trillium hardware. It is designed to make online tasks easier and more intuitive by assisting with summarizing information, performing web searches, and even interacting with tools or apps more naturally.
Google noted that Gemini 2.0 Flash is twice as fast as its predecessor, Gemini 1.5 Pro, which it also surpasses on AI performance benchmarks such as MMLU-Pro and LiveCodeBench.
“If Gemini 1.0 was about organizing and understanding information, Gemini 2.0 is about making it much more useful,” Google CEO Sundar Pichai said in a statement.
What sets Gemini 2.0 apart is its agentic capabilities. Pichai described these capabilities as enabling the model to “understand more about the world around you, think multiple steps ahead, and take action on your behalf, with your supervision.”
Google further emphasized that Gemini 2.0 distinguishes itself through:
- Multimodal processing.
- The ability to understand long books or wide swaths of the web.
- Function calling.
- “Native tool use.”
- “Complex instruction following and planning.”
Native tool use allows the AI to incorporate tools like Google Search and code execution to perform autonomous actions. In practical terms, that sometimes looks like Google’s Project Astra — an Android app now in testing that uses the phone’s camera and Gemini’s reasoning to answer questions about the world in real time. Project Astra can analyze up to 10 minutes of video at a time.
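To make function calling and native tool use concrete, here is a minimal sketch using the google-generativeai Python SDK’s automatic function calling, in which the model decides when to invoke a registered Python function and the SDK feeds the result back into the conversation. The get_package_status helper, the tracking ID, and the gemini-2.0-flash-exp model identifier are hypothetical placeholders, not part of Google’s announcement.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

def get_package_status(tracking_id: str) -> dict:
    """Hypothetical helper: look up shipping status for a tracking ID."""
    # A real app would call an internal service here; stubbed for illustration.
    return {"tracking_id": tracking_id, "status": "in transit"}

# Register the function as a tool; the model decides when to call it.
model = genai.GenerativeModel(
    model_name="gemini-2.0-flash-exp",  # assumed experimental model ID
    tools=[get_package_status],
)

# The SDK executes the tool call automatically and returns the final text answer.
chat = model.start_chat(enable_automatic_function_calling=True)
response = chat.send_message("Where is my package? The tracking ID is ZX-1042.")
print(response.text)
```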
Google also announced additional projects and prototypes
Project Mariner
Another proof of concept is Project Mariner, an experimental Chrome extension showcasing Google’s effort to enable Gemini to read browser screens. Users can ask it to summarize web pages or make a purchase.
“It’s still early, but Project Mariner shows it’s becoming technically possible to navigate within a browser, even though it’s not always accurate and slow to complete tasks today, which will improve rapidly over time,” Demis Hassabis, CEO of Google DeepMind, and Koray Kavukcuoglu, CTO of Google DeepMind, wrote in the press release.
SEE: Google revealed specialized image and video generation AI models in early December, too.
Deep Research
Deep Research, available with a Gemini Advanced subscription, is an experimental model connected to the web. It is designed to create research plans and outlines for graduate students, scientists, or entrepreneurs. The tool searches the web for the topic of your choice, presents a research plan for you to approve or adjust, and then analyzes the existing body of work.
Jules developer assistant
Google also announced a new developer tool called Jules, a coding assistant powered by Gemini 2.0 Flash. Jules sits within GitHub and can write code, fix bugs, and create and execute multi-step plans. Jules is available to a limited pool of testers today, with expanded availability expected in early 2025.
Google is preparing for cyber threats
Google also noted that it is aware Project Mariner, in particular, might be a rich hunting ground for prompt injection attacks. The company said it is building guardrails against phishing and fraud attempts in which attackers sneak AI instructions into emails, websites, or documents.