
Building a Self-Healing Web Tester with AI Agents and Combinatorial Logic

2026/01/07 12:14
5 min read

54% of software defects in production are caused by human error during testing.

If you are a QA Engineer or a Full Stack Developer, you know the pain of Web UI Testing. You spend days writing Selenium or Playwright scripts, targeting specific div IDs and XPath selectors. Then, a frontend developer changes a CSS class, and your entire test suite turns red.

Traditional RPA (Robotic Process Automation) is brittle. It breaks when the UI changes. It’s strictly rule-based.

In this engineering guide, based on research from Fujitsu’s Social Infrastructure Division, we are going to build a "Next-Gen" Testing Pipeline. We will move away from brittle scripts and move toward Autonomous AI Agents.

We will combine Combinatorial Parameter Generation (to ensure we test every edge case) with AI Agents (using tools like browser-use) that "see" the website like a human, making your tests immune to UI changes.

The Architecture: The Agentic Test Loop

We are building a system that doesn't just "click coordinates"; it understands intent.

The Pipeline:

  1. Source Analysis: Extract parameters from the code/specifications.

  2. Combinatorial Engine: Generate the minimum set of test cases to cover all logic paths.

  3. The Agent: An LLM-driven browser controller that executes the test.

  4. The Judge: An AI validator that checks if the output matches the expectation.
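
The four stages above can be sketched as a single loop. Everything in this sketch is a stand-in: the function names and stub logic are illustrative, not from the research — the real engine, agent, and judge come in the phases below.

```python
from itertools import product

# Stubbed version of the four-stage loop, so the control flow can be
# shown end-to-end without a browser or an LLM.
def generate_cases(parameters):
    # Stage 2 (stub): full cartesian product; a real engine would
    # emit a reduced pairwise set instead.
    return list(product(*parameters))

def run_agent(case):
    # Stage 3 (stub): pretend the browser agent executed the case
    # and returned the observed UI state.
    theme, notify, role = case
    return {"toast": "Success", "theme": theme}

def judge(case, observed):
    # Stage 4 (stub): compare the observation to the expectation.
    return observed.get("toast") == "Success"

def run_pipeline(parameters):
    return [(case, judge(case, run_agent(case)))
            for case in generate_cases(parameters)]

results = run_pipeline([["Dark", "Light"],
                        ["Email", "SMS", "Push"],
                        ["Admin", "User"]])
print(len(results))  # 12 cases, each with a pass/fail verdict
```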

Phase 1: The Combinatorial Engine (Smart Pattern Generation)

A common mistake in testing is testing everything (too slow) or testing randomly (misses bugs). The research suggests analyzing source code to generate an Exhaustive Parameter Table.

We need to cover the "All-Pairs" (pairwise) combinations of settings to catch interaction bugs.

**The Logic:** If you have 3 settings:

  • Theme: [Dark, Light]
  • Notifications: [Email, SMS, Push]
  • Role: [Admin, User]

Testing every combination = 2×3×2 = 12 tests. Pairwise testing can reduce this to ~6 tests while catching 90%+ of defects.

**Python Implementation:** We can use the `allpairspy` library to generate this matrix automatically.

```python
from allpairspy import AllPairs

# Parameters extracted from the Web UI source code
parameters = [
    ["Dark", "Light"],
    ["Email", "SMS", "Push"],
    ["Admin", "User"],
]

print("PAIRWISE TEST CASES:")
for i, pairs in enumerate(AllPairs(parameters)):
    print(f"Case {i}: Theme={pairs[0]}, Notify={pairs[1]}, Role={pairs[2]}")

# Output:
# Case 0: Theme=Dark, Notify=Email, Role=Admin
# Case 1: Theme=Light, Notify=SMS, Role=Admin
# ... (optimized list)
```
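
To see why ~6 cases suffice, here is a stdlib-only check that every value pair is covered. The 6-case set below is hand-picked for illustration (not the output of a specific tool), but any valid pairwise generator produces a set passing the same check.

```python
from itertools import combinations, product

parameters = [["Dark", "Light"], ["Email", "SMS", "Push"], ["Admin", "User"]]

# A hand-picked 6-case pairwise set (half of the 12 exhaustive cases).
cases = [
    ("Dark", "Email", "Admin"),
    ("Light", "SMS", "Admin"),
    ("Dark", "Push", "User"),
    ("Light", "Email", "User"),
    ("Dark", "SMS", "User"),
    ("Light", "Push", "Admin"),
]

def all_pairs_covered(cases, parameters):
    # For every pair of parameter columns, every pair of values must
    # appear together in at least one test case.
    for (i, col_a), (j, col_b) in combinations(enumerate(parameters), 2):
        for va, vb in product(col_a, col_b):
            if not any(c[i] == va and c[j] == vb for c in cases):
                return False
    return True

print(all_pairs_covered(cases, parameters))  # True
```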

Phase 2: The AI Agent (Without Selenium)

This is the game-changer. Instead of writing driver.find_element(By.ID, "submit-btn").click(), we give an AI agent a high-level instruction.

The research highlights the use of "Browser Use," an emerging class of AI agents that control headless browsers.

Why this works:

  • If the "Submit" button's underlying HTML changes (a new id, class, or tag), Selenium fails.
  • The AI Agent sees a visual element labeled "Submit" and clicks it, regardless of the underlying HTML.

The Implementation

We will use Python with the browser-use library (which pairs a langchain LLM with a playwright-driven browser) to build an agent that accepts the parameters from Phase 1.

```python
from langchain.chat_models import ChatOpenAI
from browser_use import Agent
import asyncio

async def run_ai_test(theme, notify, role):
    # 1. Construct the natural-language instruction
    instruction = f"""
    Go to 'http://localhost:3000/settings'.
    Log in as a '{role}'.
    Change the Theme to '{theme}'.
    Set Notifications to '{notify}'.
    Click 'Save'.
    Verify that the 'Success' toast message appears.
    """

    # 2. Initialize the agent with a vision-capable model
    agent = Agent(
        task=instruction,
        llm=ChatOpenAI(model="gpt-4-vision-preview"),
    )

    # 3. Execute
    history = await agent.run()

    # 4. Return result
    return history.is_successful()

# Run a test case from Phase 1
asyncio.run(run_ai_test("Dark", "SMS", "Admin"))
```
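
Wiring Phase 1 into Phase 2 is then a simple driver loop. The sketch below stubs out the agent call (the stub name and return value are placeholders) so the wiring can run without a live browser or API key:

```python
import asyncio

# Stand-in for the real run_ai_test: a stub so the suite wiring can
# run without a live browser or API key.
async def fake_run_ai_test(theme, notify, role):
    await asyncio.sleep(0)  # yield control, as the real agent would
    return True             # pretend the 'Success' toast was verified

async def run_suite(cases):
    # Execute every Phase-1 case concurrently and collect verdicts.
    verdicts = await asyncio.gather(
        *(fake_run_ai_test(*case) for case in cases)
    )
    return dict(zip(cases, verdicts))

cases = [("Dark", "SMS", "Admin"), ("Light", "Email", "User")]
outcome = asyncio.run(run_suite(cases))
print(outcome)
```

Swapping the stub for the real `run_ai_test` turns this into the overnight batch runner discussed below.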

Phase 3: The "Past Failure" Feedback Loop (RAG)

The paper notes that 54% of defects are human error—often repeating past mistakes. To fix this, we inject "Past Failure Knowledge" into the Agent.

We create a lightweight RAG (Retrieval-Augmented Generation) system. Before generating the test plan, the system checks a vector database of previous bug reports.

The Workflow:

  1. Ingest: Index old Jira tickets/Bug reports into a Vector DB.
  2. Retrieve: When testing the "Settings Page," retrieve bugs related to "Settings."
  3. Inject: Add a constraint to the Agent's prompt.
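
A minimal stand-in for the Retrieve step can be sketched with keyword matching instead of a real vector database (illustrative only; the report texts below are invented):

```python
# Past bug reports that would normally live in a vector DB.
bug_reports = [
    "Bug #402: Saving settings fails when username contains emoji.",
    "Bug #317: Checkout totals wrong for bulk orders.",
]

def retrieve(page_name, reports):
    # A vector DB would rank by embedding similarity; a substring
    # match is enough to show the injection flow.
    return [r for r in reports if page_name.lower() in r.lower()]

hits = retrieve("settings", bug_reports)
print(hits)  # only Bug #402 mentions settings
```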

Modified Prompt Logic:

```python
# Retrieved Context: "Bug #402: Saving settings fails when username contains emoji."
enhanced_instruction = f"""
{base_instruction}
IMPORTANT: Based on past failure #402, please also test changing the
username to 'User😊' before saving to ensure the app does not crash.
"""
```

The ROI: Why Switch?

The research indicates massive efficiency gains from this approach.

  1. Night/Weekend Testing: Unlike humans, AI Agents don't need sleep. You can run 10,000 permutations overnight.
  2. Cost Reduction: The study projects a 0.5 man-month reduction per project cycle.
  3. Zero Maintenance: When the UI changes, you don't rewrite scripts. The AI adapts.

Security & Ethics Warning

While powerful, AI Agents executing web actions carry risks:

  • Data Leakage: Be careful sending proprietary specs or PII to public LLMs (OpenAI/Anthropic). Use Azure OpenAI or local models (Llama 3) for enterprise data.
  • Runaway Agents: Always implement a "Human-in-the-Loop" or a hard timeout to prevent the agent from clicking "Delete Database" if it gets confused.
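
A hard timeout is the simplest of these guardrails. This sketch wraps any agent coroutine with `asyncio.wait_for` (the names and the 300-second default are illustrative, not a real browser-use API):

```python
import asyncio

# Wrap any agent coroutine with a hard timeout, guarding against a
# runaway agent that never converges.
async def run_with_timeout(agent_coro, seconds=300):
    try:
        return await asyncio.wait_for(agent_coro, timeout=seconds)
    except asyncio.TimeoutError:
        return None  # abort and escalate to a human reviewer

async def slow_agent():
    await asyncio.sleep(10)  # simulates an agent stuck in a loop
    return "done"

result = asyncio.run(run_with_timeout(slow_agent(), seconds=0.05))
print(result)  # None (timed out)
```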

Conclusion

The days of writing brittle XPath selectors are numbered. By combining Combinatorial Logic (to determine what to test) with AI Agents (to determine how to test), we can build a testing pipeline that heals itself.

**Your Next Step:** Don't rewrite your Selenium suite yet. Start by picking one flaky test flow. Replace it with a browser-use agent and see if it survives the next UI update.

