Developers are already using multiple large language model (LLM) and other generative AI-based tools in the creation of automation tools. And soon, the tools will be able to use each other.
A new development in AI “swarms” serves as a wake up call for everyone involved in cybersecurity, automation and, in fact, IT generally: OpenAI’s Swarm.
What is OpenAI Swarm?
OpenAI launched an experimental framework last month called Swarm. It’s a “lightweight” system for the development of agentic AI swarms, which are networks of autonomous AI agents able to work together to handle complex tasks without human intervention, according to OpenAI.
(I wrote about agentic AI, but not swarming agents, in July.)
Swarm is not a product. It’s an experimental tool for coordinating or orchestrating networks of AI agents. The framework is open-source under the MIT license (which allows Python developers to use, modify, and distribute the software with minimal restrictions), and available on GitHub.
In the GitHub readme section, OpenAI says:
“Swarm is currently an experimental sample framework intended to explore ergonomic interfaces for multi-agent systems. It is not intended to be used in production, and therefore has no official support. (This also means we will not be reviewing PRs or issues!)
The primary goal of Swarm is to showcase the handoff & routines patterns explored in the Orchestrating Agents: Handoffs & Routines cookbook. It is not meant as a standalone library and is primarily for educational purposes.”
Swarm is not totally unique. Other existing systems can be used for the orchestration of multiple agents, which approaches the functioning of agentic AI swarms. Though not explicitly designed for swarming, they can be used for making AI agents interact with each other to varying degrees. These include: Microsoft AutoGen, CrewAI, LangChain, LangGraph, MetaGPT, AutoGPT, and Haystack.
While Swarm might be designed for simplicity and relative ease of use, all these other tools are more robust, reliable, supported and ready for prime-time.
OpenAI apparently launched Swarm to explore methods for improving agent collaboration through “routines” and “handoffs.” In this case, “routines” are predefined sets of instructions that guide agents through tasks or workflows. They serve as recipes for agents to follow, which adds control and predictability to multi-agent systems. “Handoffs” enable one agent to delegate a job to another based on the current context. For example, if the agent requires something specific that can be better handled by an agent specializing in that task, it can delegate it. That “handoff” provides the history of the task to the new agent, so it has context under which to proceed.
One characteristic of Swarm is that it’s stateless, so agents don’t remember anything from previous interactions. That simplifying element also limits the tool to simpler tasks. (Developers can, however, build solutions that do enable memory between agent interactions.)
While Swarm isn’t intended for actual production (and OpenAI won’t maintain it going forward), the fact that it’s dabbling in the concept is one indication that agent swarms could eventually become commonplace.
It also points to a trend in which agent swarm technology becomes increasingly usable and, for lack of a better term, democratized.
One way to look at agentic AI swarming technology is that it’s the next powerful phase in the evolution of generative AI (genAI). In fact, Swarm is built on OpenAI’s Chat Completions API, which uses LLMs like GPT-4.
The API is designed to facilitate interactive “conversations” with AI models. It allows developers to create chatbots, interactive agents, and other applications that can engage in natural language conversations.
Today, developers are creating what you might call one-off AI tools that do one specific task. Agentic AI would enable developers to create a large number of such tools that specialize in different specific tasks, and then enable each tool to dragoon any others into service if the agent decides the task would be better handled by the other kind of tool. These tool types could include:
- 1. RAG (Retrieval-Augmented Generation): Enhancing text generation with relevant retrieved information. Basically, these agents would be tasked to “Google it” and return to the task at hand with that found information.
- 2. NL2SQL: Converting natural language queries into SQL commands.
- 3. Text Generation: Creating various forms of written content.
- 4. Code Generation: Producing code based on natural language descriptions.
- 5. Data Analysis: Processing and interpreting large datasets.
- 6. Image Generation: Creating images from text prompts.
- 7. Speech Synthesis: Converting text to spoken audio.
- 8. Language Translation: Translating between different languages.
- 9. Summarization: Condensing long-form content into concise summaries.
- 10. Dialogue Management: Handling multi-turn conversations in chatbots.
Instead of the user making choices, opening new tools and essentially serving as the guide and glue for complex AI-based tasks, the agents would do all this autonomously.
Easy-to-use swarms of AI agents — what could go wrong?
It’s clear that agentic AI swarms could seriously boost enterprise productivity, offloading chores from people, enabling them to focus on higher-level responsibilities.
The risks are also clear. Take security, for example.
At present, as far as we know, no nation-state or state-sponsored hackers are using agentic AI swarms. But that day is surely coming.
Hostile nation states are using LLMs in general, and even ChatGPT in particular, for malicious rreconnaissance and research, scripting and coding, social-engineering and phishing content, language translation, and detection evasion.
At present, people working for these nation states are doing individual hacking, and using LLMs as part of their knowledge toolset, manually prompt-engineering chatbots, then using the returned results in their breach attempts.
In an agentic AI swarm future, state-sponsored hackers will be able to create individual specialist AI agents to do each of these tasks, and enable the agents to call into play the other agents as needed. By removing the “bottleneck” of a human operator, malicious hacking can take place on a massive scale at blistering speed.
It’s reasonable to assume at this early stage that the most effective defense against agentic AI swarm attacks will be agentic AI swarm defenses.
Another area of concern is the risk of overcomplexity. Agentic AI, including agentic AI swarming technology, operates autonomously to pursue goals. It can be “creative,” or, more accurately, unpredictable in how it achieves goals given to it by the developers who create it and the users who deploy it. Because it’s autonomous, people might not know what it’s doing or how it’s doing it. And it’s possible to lose track of what agent swarms are doing, or even that they’re still operating.
Individual employees might automate their own work using agentic AI swarms they monitor close — agents that could continue running after the workers leaves the company (or gets hit by a bus).
Pessimistic (or realistic) prognosticators fear agentic AI swarms might even accelerate job losses because they’ll be so capable of operating like people do.
As with other new, powerful developments in AI technology, agentic AI swarms are packed with promise and peril.
What’s important to know about OpenAI’s Swarm is that it represents a move to simplify and democratize swarming agents. That probably means near-future exponential growth in the number of swarming agents in operation, and a rise in the expectation that tech pros will be using agentic AI agents for all manner of automation.
The agents are coming. I recommend you learn all about them before they get here.
Source link