AWS’ new Automated Reasoning checks promise to prevent models from producing factual errors and hallucinations, though experts have told ITPro they won’t be an all-encompassing fix for the problem.
At AWS re:Invent 2024, the hyperscaler unveiled the tool as a safeguard within Amazon Bedrock Guardrails that will mathematically validate the accuracy of responses generated by large language models (LLMs).
Using logic-based, algorithmic verification and reasoning, Automated Reasoning checks aim to ensure that an AI model’s output aligns with known facts rather than resting on fabricated or inconsistent data.
“When you implement one of these automated reasoning checks, what happens is, Bedrock can actually check the factual statements made by models are accurate,” AWS CEO Matt Garman said on stage as he unveiled the tool.
“This is all based on sound mathematical verifications, and it’ll show you exactly how it reached that conclusion,” he added.
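In practice, the idea Garman describes amounts to translating policy statements into formal logic and having a solver check whether a model’s claim is consistent with them. The sketch below illustrates that general principle only, and is not AWS’s implementation; it uses the open source Z3 solver (pip install z3-solver) and a hypothetical HR leave rule of the kind AWS cites as a target use case.

```python
# Minimal sketch of logic-based verification against a hypothetical
# HR policy; an illustration of the principle, not the Automated
# Reasoning checks feature itself.
from z3 import Int, Solver, Implies, And, unsat

tenure_years = Int("tenure_years")  # employee tenure in years
leave_days = Int("leave_days")      # annual leave entitlement in days

# Hypothetical policy rule: staff with at least one year of tenure
# are entitled to exactly 25 days of annual leave.
policy = Implies(tenure_years >= 1, leave_days == 25)

# Claim extracted from a chatbot reply: "with two years of tenure
# you get 20 days of leave".
claim = And(tenure_years == 2, leave_days == 20)

solver = Solver()
solver.add(policy, claim)

# If the policy and the claim cannot both hold, the reply contradicts the rules.
if solver.check() == unsat:
    print("Claim contradicts the policy - flag the response")
else:
    print("Claim is consistent with the policy")
```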
Garman said AWS has been using automated reasoning internally for some time. For example, the hyperscaler applies it to customers’ identity and access management (IAM) policies, as well as within its S3 services.
Automated Reasoning checks are not going to solve the problem of hallucinations, though, according to Peter van der Putten, head of the AI lab at Pegasystems and assistant professor of AI at Leiden University.
“The Automated Reasoning checks that are launched in preview by AWS are an interesting and original new feature. That said, they should not be seen as some generic silver bullet against all forms of hallucinations,” van der Putten told ITPro.
“I see it more as an extra guardrail that could be useful as an additional check in the context of specific use cases where rule-based policies and guidelines are in play,” he added.
The limitations of automated reasoning
AWS suggested airline inquiries and HR policies as specific use cases in which automated reasoning would be most useful, and van der Putten went on to explain why the approach thrives only in those kinds of narrow circumstances.
“For a specific use case you can ingest documents that contain policies, rules, and guidelines and the system then generates rules that can be run against generative AI output such as chatbot replies to check for validity,” van der Putten said.
“This can be very useful for narrow domains where these rules are known but is not meant to be a catch-all for generative AI in general,” he added.
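To make that workflow concrete, the sketch below shows the general guardrail pattern van der Putten describes: rules derived from a policy document are run against structured claims extracted from a chatbot reply before it reaches the user. The rule names, fields, and airline refund policy here are all hypothetical, and the claim-extraction step is assumed to happen upstream.

```python
# Hedged sketch of a rule-based guardrail layer; the rules, field names,
# and refund policy are hypothetical stand-ins for rules a system might
# generate from ingested policy documents.
from typing import Callable

Rule = Callable[[dict], bool]

# Rules that might be derived from an airline's refund policy document.
rules: dict[str, Rule] = {
    # Refunds may only be offered within 24 hours of booking.
    "refund_window": lambda c: c.get("refund_offered_hours", 0) <= 24,
    # Basic fares are never eligible for cash refunds.
    "no_cash_for_basic": lambda c: not (
        c.get("fare_class") == "basic" and c.get("refund_type") == "cash"
    ),
}

def check_reply(claims: dict) -> list[str]:
    """Return the names of any policy rules the extracted claims violate."""
    return [name for name, rule in rules.items() if not rule(claims)]

# Structured claims extracted (by an upstream step) from a chatbot reply.
reply_claims = {"refund_offered_hours": 48, "fare_class": "basic", "refund_type": "voucher"}

violations = check_reply(reply_claims)
if violations:
    print("Blocked reply; violated rules:", violations)
else:
    print("Reply passes the policy checks")
```

The appeal of a separate layer like this is that it can be audited independently of the model itself, which is the kind of additional control van der Putten refers to.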
More generic or more complex AI systems, which cannot be captured by a single catch-all policy or set of rules, will not be held to account as effectively by automated reasoning, as they are by definition less predictable.
“Obviously, one may wonder why these rules are not made available to the core generative AI system in the first place, for instance through a retrieval augmented generation approach,” van der Putten said.
“In certain instances that might not be feasible given lack of access or other reasons, or there is appetite to implement a separate layer of control or audit,” he added.
The announcement is also not entirely novel, according to Leonid Feinberg, co-founder and CEO of Verax AI. While it may sound groundbreaking, he told ITPro that similar capabilities have already been offered by Guardrails AI and Nvidia NeMo.
“Many existing LLM products already utilize this approach, however, as we’re occasionally seeing in the news, the impact is limited,” Feinberg said.
“The non-determinism and lack of predictability of LLMs is still a problem too complex to be successfully mitigated with such simple tools and is still pretty much unsolved as a whole,” he added.