Only a third of organizations employ adequate testing practices in AI application development, according to new research, prompting calls for increased red teaming to reduce risks.
Analysis from Applause found 70% of developers are currently developing AI applications and features, with over half (55%) highlighting chatbots and customer support tools as their primary focus at present.
Yet despite an acceleration in AI application development, a concerning number of organizations are overlooking quality assurance (QA) efforts during the software development lifecycle.
The study warned this trend is having an adverse impact on both quality and long-term return on investment (ROI).
“The results of our annual AI survey underscore the need to raise the bar on how we test and roll out new generative AI models and applications,” said Chris Sheehan, EVP of high tech & AI at Applause.
AI application development needs a human touch
A key talking point of the Applause study centered around human involvement in the development lifecycle. With developers ramping up the use of generative AI tools in their daily workflows, the need for a ‘human touch’ has become critical to identify and remediate a range of issues.
These include issues such as inaccuracy, bias, and toxicity, the study noted.
Researchers found the top QA-related activities that involve human testing include prompt and response grading (61%), accessibility testing (54%), and UX testing (57%).
Applause added that humans are also crucial in training industry-specific or ‘niche’ models, particularly with the rise of agentic AI applications that interact directly with end-users.
Notably, the study found that only one-third (33%) of organizations currently employ red team testing in application development processes. Red teaming refers to adversarial testing practices – commonly used in cybersecurity – to identify potential weak points in platforms or applications.
Researchers called for a heightened focus on red teaming in AI application development, noting that this could play a key role in highlighting the aforementioned issues such as model bias or inaccuracy.
Application flaws persist
The study from Applause found that customer-related issues are becoming a frequent problem for enterprises. Nearly two-thirds of customers using generative Ai in 2025 reported encountering some sort of issue.
Over a third (35%) encountered biased responses, hallucinations (32%), and offensive responses (17%).
Hallucinations have been a persistent problem in AI development for some time now.
While the situation has improved markedly since the early days of the generative AI boom, the issue is still causing a degree of uncertainty among enterprise IT leaders.
In a study by KPMG in August 2024, six-in-ten tech leaders specifically highlighted hallucinations as a key concern with adopting or building generative AI tools and applications.
Sheehan noted that positive changes are being made by development teams, however. Many enterprises surveyed by the firm are “already ahead of the curve” and are integrating AI testing measures into the development lifecycle at an earlier stage.
This includes more robust model training methods which employ “diverse, high quality” datasets. Some enterprises are also warming to red teaming practices, he added.
“While every generative AI use case requires a custom approach to quality, human intelligence can be applied to many parts of the development process including model data, model evaluation and comprehensive testing in the real world.
“As AI seeps into every part of our existence, we need to ensure these solutions provide the exceptional experiences users demand while mitigating the risks that are inherent to the technology.”
MORE FROM ITPRO
Source link