How AI Agents Are Replacing Outdated Testing Tools in Microsoft’s Business Central

If you run a business on Microsoft Dynamics 365 Business Central, 2026 has brought something that was not on most IT teams’ radar: autonomous AI agents that operate inside the platform itself. These are not chatbots or smart suggestions. They are software agents that process invoices, create sales orders, and execute financial workflows automatically, without waiting for a human to click a button.
What rarely gets discussed alongside this is the testing problem it creates. How do you verify that an AI agent is making correct decisions? And why are the tools most teams have used to test Business Central so poorly suited to answering that question?
What Microsoft Just Changed in Business Central
Earlier this year, Microsoft rolled out its 2026 Wave 1 release across Dynamics 365. For Business Central, the Dynamics 365 ERP product designed for small and medium-sized businesses, the update was significant. The headline additions included:
- A Payables Agent that reads incoming invoices, matches them to purchase orders, and prepares them for payment approval, all without human involvement in the matching process itself
- A Sales Order Agent that can create and update sales orders from natural language instructions, including emails
- An Agent Designer tool that lets organizations build their own custom agents for any internal BC workflow, without needing traditional software development skills
These features represent a genuine shift in how Business Central operates. For years, BC has been a system that records what people do. With Wave 1 2026, it has become, in Microsoft’s own words, a system that “actively supports decision-making and increasingly automates work.” Transactions are happening in the background. Decisions are being made by algorithms.
For IT and finance teams, this raises an obvious question: how do you test that?
Why the Old Testing Approach Has Hit a Wall
Most organizations testing Business Central have relied on some version of the same approach for years: record the steps a human takes through the software, then replay those steps automatically to check that everything still works after an update. In the wider Dynamics 365 ecosystem, Microsoft built this capability into RSAT (Regression Suite Automation Tool) for the Finance and Operations apps, Business Central has its own page scripting tool for recording and replaying user interactions, and third-party platforms adopted similar record-and-replay approaches.
In late 2025, Microsoft announced that RSAT is now feature complete. That is Microsoft’s way of saying the tool will not receive new capabilities. It still works in the environments it was designed for, but it is not going to evolve. And it was never designed for Business Central’s new AI agents.
The fundamental issue is architectural. RSAT and tools like it record what a human does on screen. They replay button clicks and field entries, then check that the interface responds as expected. An AI agent does not interact with the interface the way a human does. When the BC Payables Agent processes an invoice, it does not click through a form; it reads data, matches records, and writes results directly to the financial system. There are no buttons to record. There is a decision to validate.
Record-and-replay testing asks: “Did the screen behave as expected?”
AI agent testing asks: “Did the software make the right financial decision?”
These are different questions, and they need different tools.
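To make the difference concrete, here is a minimal sketch that applies both questions to the same agent result. The `AgentResult` record, its field names, the status text, and the account numbers are all hypothetical simplifications, not Business Central's actual schema; the point is only that a screen-level check can pass while the underlying decision is wrong.

```python
from dataclasses import dataclass


# Hypothetical, simplified view of what an agent leaves behind after
# processing one invoice. Field names are illustrative assumptions,
# not Business Central's real table or API schema.
@dataclass
class AgentResult:
    ui_status: str   # what a screen-level (record-and-replay) check observes
    gl_account: str  # the financial decision the agent actually made


def screen_check(result: AgentResult) -> bool:
    """Record-and-replay style: did the screen reach the expected state?"""
    return result.ui_status == "Ready for approval"


def decision_check(result: AgentResult, expected_gl_account: str) -> bool:
    """Outcome style: did the agent make the right financial decision?"""
    return result.gl_account == expected_gl_account


# Hypothetical scenario: the agent coded an invoice to account 6100
# when the correct account was 6300.
result = AgentResult(ui_status="Ready for approval", gl_account="6100")
print(screen_check(result))            # True: the screen looks fine
print(decision_check(result, "6300"))  # False: the decision is wrong
```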
What AI Agent Testing Actually Looks Like
Testing whether an AI agent is making correct decisions requires validating the outcome of each decision, specifically the data it creates in the financial system. For the BC Payables Agent, that means checking things like:
- Did the agent match the invoice to the correct purchase order?
- Did it assign the correct account code in the general ledger?
- Did the payment terms applied match what was agreed with the vendor?
- Did the transaction land in the right accounting period?
None of these can be verified by looking at a screen. They require something that can access the database after the agent has acted and assert that the specific financial values are correct. This is a fundamentally different approach to software quality assurance, and it reflects the direction in which enterprise software testing is moving, not just in Business Central but across ERP platforms generally.
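The four checks above can be sketched as assertions run against stored data after the agent has acted. This is an illustration under stated assumptions: an in-memory SQLite table stands in for the financial database, the `posted_invoice` table, its columns, the invoice and PO numbers, and the `check_agent_outcome` helper are all hypothetical, and accounting periods are assumed to be calendar months. A real test would read posted records through whatever supported interface your Business Central environment exposes.

```python
import sqlite3
from datetime import date

# Stand-in for the financial database. Table and column names are
# hypothetical, not Business Central's real schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE posted_invoice (
        invoice_no TEXT PRIMARY KEY,
        po_no TEXT,
        gl_account TEXT,
        payment_terms TEXT,
        posting_date TEXT
    )"""
)
# Simulate the row the agent wrote after processing invoice INV-1001.
conn.execute(
    "INSERT INTO posted_invoice VALUES (?, ?, ?, ?, ?)",
    ("INV-1001", "PO-2040", "6300", "NET30", "2026-03-31"),
)


def check_agent_outcome(conn, invoice_no, expected):
    """Return a list of failed checks for one processed invoice (empty = pass)."""
    row = conn.execute(
        "SELECT po_no, gl_account, payment_terms, posting_date "
        "FROM posted_invoice WHERE invoice_no = ?",
        (invoice_no,),
    ).fetchone()
    if row is None:
        return ["invoice not found"]
    po_no, gl_account, terms, posting_date = row
    failures = []
    # 1. Was the invoice matched to the correct purchase order?
    if po_no != expected["po_no"]:
        failures.append(f"matched {po_no}, expected {expected['po_no']}")
    # 2. Was the correct G/L account assigned?
    if gl_account != expected["gl_account"]:
        failures.append(f"G/L account {gl_account}, expected {expected['gl_account']}")
    # 3. Do the payment terms match what was agreed with the vendor?
    if terms != expected["payment_terms"]:
        failures.append(f"terms {terms}, expected {expected['payment_terms']}")
    # 4. Did it land in the right period? (Assumes calendar-month periods.)
    posted = date.fromisoformat(posting_date)
    if (posted.year, posted.month) != expected["period"]:
        failures.append(f"posted in {posted:%Y-%m}, expected period {expected['period']}")
    return failures


expected = {"po_no": "PO-2040", "gl_account": "6300",
            "payment_terms": "NET30", "period": (2026, 3)}
print(check_agent_outcome(conn, "INV-1001", expected))  # []
```

Returning a list of failures rather than stopping at the first one means a single run reports every wrong value on the invoice, which is usually what a finance reviewer wants to see.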
The good news is that, once you have the right platform, this approach is not significantly more complicated to set up than traditional testing. The key difference is that instead of recording clicks, you define what the correct outcome should look like (which account code, which dimension, which period), and the testing platform checks whether the software produced exactly that.
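One way to picture "defining the correct outcome" is as plain data: a list of test cases, each naming an input invoice and the values its posted result must contain. The sketch below is a hypothetical illustration, not any particular platform's format. The field names (`gl_account`, `dimension`, `period`), the example values, and `fetch_actual` (a stand-in for however you read the posted result) are all assumptions.

```python
from typing import Any

# Hypothetical expected-outcome spec. Field names and values are
# illustrative, not a real Business Central schema.
EXPECTED_OUTCOMES = [
    {
        "invoice_no": "INV-1001",
        "expect": {"gl_account": "6300", "dimension": "DEPT=SALES", "period": "2026-03"},
    },
    {
        "invoice_no": "INV-1002",
        "expect": {"gl_account": "6410", "dimension": "DEPT=ADMIN", "period": "2026-03"},
    },
]


def diff_outcome(expected: dict[str, Any], actual: dict[str, Any]) -> dict[str, tuple]:
    """Map each mismatched field to (expected, actual); empty means the case passed."""
    return {
        field: (want, actual.get(field))
        for field, want in expected.items()
        if actual.get(field) != want
    }


def run_cases(cases, fetch_actual):
    """Compare every case against what was actually posted.

    fetch_actual is a hypothetical callable returning the posted values for
    an invoice number (or None if nothing was posted).
    """
    report = {}
    for case in cases:
        actual = fetch_actual(case["invoice_no"]) or {}
        mismatches = diff_outcome(case["expect"], actual)
        if mismatches:
            report[case["invoice_no"]] = mismatches
    return report


# Example: the agent posted INV-1002 with the wrong dimension.
posted = {
    "INV-1001": {"gl_account": "6300", "dimension": "DEPT=SALES", "period": "2026-03"},
    "INV-1002": {"gl_account": "6410", "dimension": "DEPT=SALES", "period": "2026-03"},
}
print(run_cases(EXPECTED_OUTCOMES, posted.get))
# {'INV-1002': {'dimension': ('DEPT=ADMIN', 'DEPT=SALES')}}
```

Keeping the expectations as data rather than code means a finance user can review or extend the cases without touching the comparison logic.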
Why This Matters Beyond the Technical Detail
For businesses running Business Central, the practical stakes are clear. The Payables Agent will process invoices at volume. If it is misconfigured or if an update from Microsoft changes how it codes certain transactions, the error will replicate across every invoice it touches, quickly and silently, until someone spots it in a bank reconciliation or an audit.
Manual reviews catch individual errors. Automated outcome testing catches systematic ones: the kind that happen when a software update changes a default setting or when a new vendor category does not map correctly to the chart of accounts. At the transaction volumes most BC implementations handle, the difference between catching an error on the third invoice and catching it on the three hundredth is material.
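A simple way to tell systematic errors from one-off mistakes is to check a whole batch of agent output and group the failures by vendor category. The sketch below is one possible heuristic, not a standard method: the category-to-account mapping, the account numbers, the invoice records, and the `min_sample` threshold are hypothetical. A category where every checked invoice is miscoded, across at least a few invoices, points to a mapping or configuration problem rather than a single bad decision.

```python
from collections import defaultdict

# Hypothetical mapping from vendor category to the G/L account it should post to.
EXPECTED_ACCOUNT_BY_CATEGORY = {"SUPPLIES": "6300", "FREIGHT": "6500", "IT": "6700"}


def find_systematic_errors(invoices, expected_map, min_sample=3):
    """Group miscoded invoices by vendor category.

    A category is flagged as systematic when every checked invoice in it is
    wrong and at least min_sample invoices were checked.
    """
    checked = defaultdict(int)
    failed = defaultdict(list)
    for inv in invoices:
        category = inv["vendor_category"]
        checked[category] += 1
        # An unmapped category yields None here, so it is always flagged.
        if inv["gl_account"] != expected_map.get(category):
            failed[category].append(inv["invoice_no"])
    return {
        category: {
            "failed": len(numbers),
            "checked": checked[category],
            "systematic": len(numbers) == checked[category]
            and checked[category] >= min_sample,
        }
        for category, numbers in failed.items()
    }


# Hypothetical batch: every IT invoice is coded to the supplies account,
# and one freight invoice has a one-off error.
invoices = [
    {"invoice_no": f"INV-{i:04d}", "vendor_category": cat, "gl_account": account}
    for i, (cat, account) in enumerate(
        [("SUPPLIES", "6300")] * 5 + [("IT", "6300")] * 3 + [("FREIGHT", "6510")]
    )
]
print(find_systematic_errors(invoices, EXPECTED_ACCOUNT_BY_CATEGORY))
# {'IT': {'failed': 3, 'checked': 3, 'systematic': True},
#  'FREIGHT': {'failed': 1, 'checked': 1, 'systematic': False}}
```

Run after each update or configuration change, a check like this surfaces the IT-category mapping error after three invoices instead of leaving it for a reconciliation to find.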
This is also why the conversation about AI agents in enterprise software cannot stay focused only on what they do for users. The equally important question, one that is only just beginning to be addressed in the industry, is how you verify that they are doing it correctly.
Where the Technology Is Heading
The shift from record-and-replay testing to AI agent outcome validation is still early. Most organizations running Business Central have not yet started building test coverage for the Wave 1 2026 agent features, and Wave 2, which arrives in October, will expand those capabilities further.
Platforms built specifically for this challenge, rather than adapted from older testing approaches, are already available. Sofy.ai, for example, has built AI test agents for Dynamics 365, including dedicated support for Business Central workflows. Their approach focuses on validating what the software produces in the database after an AI agent acts, not what appeared on the screen. For businesses evaluating their options, their Dynamics 365 test agents overview gives a practical picture of what this kind of testing looks like across Finance, Supply Chain, Sales, and Business Central.
RSAT’s feature-complete announcement was not a crisis for D365 teams. It was a signal. The era of recording what users click is giving way to the era of validating what AI agents decide. The tools are catching up; the question is whether the testing strategies of individual businesses are catching up alongside them.




