A Head of Tax sat through three vendor demos last month. Every demo started with the word "AI". Every demo ended with a chat window. By the third one she had stopped taking notes. The product was not changing. Only the label was.
This is happening across the category. The tax software market has decided 2026 is the year of AI. Every release note mentions it. Every conference keynote leads with it. Most of what is being sold is not new. It is a chatbot bolted onto an existing screen. The same software, with a sidebar.
The buyer has no way to tell the difference. There is no shared language for what "AI in tax" actually does. So the chatbot vendors and the agentic platform vendors are pitching from the same brochure, and the buyer is left to guess.
This article gives the buyer a vocabulary. Four levels. One question. A way to walk out of a vendor demo knowing exactly what you were just shown.
Why a maturity model
The self-driving industry got here first. The SAE J3016 standard defines six levels of driving automation, from Level 0 (no automation) to Level 5 (full autonomy). Nobody in the industry calls a lane-keep assist "self-driving" any more, because the levels exist. A buyer can ask "what level?" and the answer is auditable.
Tax needs the same. The capabilities are real, the categories are real, but without a shared label nobody can talk about them honestly. So here is the label.
Level 1. Chatbot.
The AI answers questions from a corpus of tax documents. PDFs. ATO guidance. The tax act. Internal procedures. It returns text.
This is RAG. Retrieval-augmented generation. A search engine with a friendlier interface. The model does not touch your tax data. It does not call any actions. It can only answer.
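To make the Level 1 shape concrete, here is a minimal sketch in Python. It uses a toy in-memory corpus and naive word-overlap scoring in place of real embeddings and a real language model. The passages and function names are hypothetical. What matters is the signature: a question goes in, text comes out, and nothing is read from or written to a tax ledger.

```python
import re

# Hypothetical corpus. A real Level 1 tool would index PDFs, ATO guidance,
# the tax act and internal procedures, usually with vector embeddings.
CORPUS = {
    "procedure-14": "Fixed asset additions are reviewed monthly by the preparer.",
    "guidance-cgt": "Small business CGT concessions have eligibility thresholds.",
    "procedure-22": "Deferred tax workpapers are rolled forward at year end.",
}


def tokens(text: str) -> set[str]:
    """Lower-case word set, used for a naive overlap score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the ids of the k passages sharing the most words with the question."""
    q = tokens(question)
    return sorted(CORPUS, key=lambda doc_id: len(q & tokens(CORPUS[doc_id])), reverse=True)[:k]


def answer(question: str) -> str:
    """Level 1: retrieve, then return text. No ledger read, no action called."""
    ids = retrieve(question)
    context = " ".join(CORPUS[i] for i in ids)
    # A real system would hand `context` to a language model to phrase the reply.
    return f"{context} [sources: {', '.join(ids)}]"


print(answer("What are the thresholds for small business CGT concessions?"))
```

However good the retrieval gets, the output is still text for a human to read and act on.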
This is useful. A junior preparer can ask "What is the threshold for the small business CGT concessions in FY26?" and get an answer faster than a Google search would give it. A new joiner can find the relevant internal procedure faster than by searching the shared drive.
But this is not the work. It is the research that precedes the work. The treadmill of repetitive preparation work sits downstream of Level 1. Most "AI for tax" announcements this year are Level 1.
Level 2. Single-action automation.
A specific, bounded task gets automated by a model. One input. One output. On rails.
Examples. Classify expenditure as capital versus revenue. Extract the invoice date from a PDF and write it to a field. Identify which fixed assets fit a particular depreciation pool. Real tasks. Real time saved.
The human still drives. The Level 2 model sits inside a screen, runs when called, returns a result, and waits for the next click. It does not know what came before or what comes next. A tool inside a workflow, not the operator of it.
A lot of "AI enabled" tax software is Level 2. The classifier is real. The extraction is real. But be honest about what you bought. One narrow task at a time, human in the seat.
Level 3. Workflow assistant.
The AI helps inside one workflow. It takes a step that used to be manual and proposes what a human should approve.
Example. The team is preparing the annual income tax return. The Level 3 assistant looks at the temporary differences in the deferred tax workpaper and drafts the journal entries that flow from them. The human reviews, edits, approves. The journal posts.
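Here is a sketch of the Level 3 pattern. It assumes the workpaper gives each temporary difference's movement for the year, and it assumes a 30% company tax rate. Deferred tax on a movement is the movement multiplied by the rate. A taxable difference moves the deferred tax liability. A deductible one moves the deferred tax asset. Account names, class names and figures are illustrative. The sketch ignores items recognised in equity, rate changes and opening-balance true-ups. The design point is the `status` field: the assistant composes, the human approves.

```python
from dataclasses import dataclass
from decimal import Decimal

TAX_RATE = Decimal("0.30")  # assumption: the 30% general company rate
EXPENSE = "Income tax expense (deferred)"


@dataclass(frozen=True)
class TempDifference:
    item: str
    movement: Decimal  # change in the temporary difference for the year
    kind: str          # "taxable" moves the DTL, "deductible" moves the DTA


@dataclass
class DraftJournal:
    debit: str
    credit: str
    amount: Decimal
    memo: str
    status: str = "proposed"  # nothing posts until a human approves


def draft_journals(diffs: list[TempDifference], rate: Decimal = TAX_RATE) -> list[DraftJournal]:
    """Compose draft journals from workpaper movements. Posts nothing."""
    drafts = []
    for d in diffs:
        if d.kind == "taxable":
            debit, credit = EXPENSE, "Deferred tax liability"
        elif d.kind == "deductible":
            debit, credit = "Deferred tax asset", EXPENSE
        else:
            raise ValueError(f"unknown kind: {d.kind}")
        amount = (d.movement * rate).quantize(Decimal("0.01"))
        if amount < 0:  # a decrease reverses the direction of the entry
            debit, credit, amount = credit, debit, -amount
        drafts.append(DraftJournal(debit, credit, amount, d.item))
    return drafts


def approve(draft: DraftJournal, reviewer: str) -> DraftJournal:
    """The human step. Only an approved draft would go on to post."""
    draft.status = f"approved by {reviewer}"
    return draft


drafts = draft_journals([
    TempDifference("Accelerated tax depreciation", Decimal("100000"), "taxable"),
    TempDifference("Employee leave provision", Decimal("40000"), "deductible"),
])
for j in drafts:
    print(j)
```

Everything here stays inside one workpaper. Nothing in the sketch reads the ledger, checks thresholds or escalates. That is the Level 3 limit described next.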
Level 3 is where most genuine AI productivity in tax sits today. The model composes multiple field-level operations into a single proposed outcome. Inside one workflow, it is doing real work.
The limit is that Level 3 cannot reach across workflows. It cannot read the tax ledger. It cannot escalate when a variance breaches a threshold. It cannot leave a record that satisfies an ATO auditor. It does the writing. It does not own the consequences.
Level 4. Agentic Tax Operations.
The AI is a first-class user of the platform. It can do everything a human user can do, except make the judgement calls.
This is the bar. At Level 4 the agent:
- Reads the structured tax ledger directly, not a document corpus.
- Calls multiple actions in sequence, not one at a time.
- Drafts, proposes, composes, escalates, and routes for sign-off across the whole workflow.
- Leaves a field-level audit trail. Every action provable, attributable, reviewable.
- Stops at the judgement calls and hands them to the human with the supporting evidence.
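Here is a minimal sketch of the Level 4 action layer in the list above. It assumes a single registry that both human users and the agent call through. The class, action names, account codes and mappings are hypothetical. Each executed action writes an attributable audit entry with its inputs and output. An action flagged `human_only` stops the agent with an escalation instead of executing.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Callable


class Escalation(Exception):
    """Raised when the agent reaches a judgement call reserved for a human."""


@dataclass(frozen=True)
class AuditEntry:
    actor: str
    action: str
    inputs: dict
    output: Any
    at: str


class ActionLayer:
    """Hypothetical platform action layer shared by human users and the agent."""

    def __init__(self) -> None:
        self.actions: dict[str, tuple[Callable[..., Any], bool]] = {}
        self.audit: list[AuditEntry] = []

    def register(self, name: str, fn: Callable[..., Any], human_only: bool = False) -> None:
        self.actions[name] = (fn, human_only)

    def call(self, actor: str, name: str, inputs: dict, by_agent: bool) -> Any:
        fn, human_only = self.actions[name]
        stamp = datetime.now(timezone.utc).isoformat()
        if human_only and by_agent:
            # Judgement call: record the hand-off with its inputs, then stop.
            self.audit.append(AuditEntry(actor, f"escalate:{name}", dict(inputs), None, stamp))
            raise Escalation(f"{name} needs a human decision")
        output = fn(**inputs)
        # Every executed action is attributable: who, what, which inputs, what result.
        self.audit.append(AuditEntry(actor, name, dict(inputs), output, stamp))
        return output


mappings: dict[str, str] = {}


def set_tax_mapping(account: str, category: str) -> str:
    """Illustrative action: save a tax mapping for one account."""
    mappings[account] = category
    return category


layer = ActionLayer()
layer.register("set_tax_mapping", set_tax_mapping)
layer.register("sign_off_tax_position", lambda position: f"{position} signed", human_only=True)

# The agent chains actions across the workflow...
for account, category in [("6100 Subscriptions", "deductible"), ("6200 Entertainment", "non-deductible")]:
    layer.call("agent", "set_tax_mapping", {"account": account, "category": category}, by_agent=True)

# ...and stops at the judgement call.
try:
    layer.call("agent", "sign_off_tax_position", {"position": "FY26 return"}, by_agent=True)
except Escalation as e:
    print("Escalated:", e)

for entry in layer.audit:
    print(entry.actor, entry.action, entry.inputs)
```

Every call, human or agent, goes through the same `call` method. That is what makes the audit trail part of the structure: no path to an action skips the log.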
Two examples. Different seats, same architecture.
A preparer says to the agent: "Set tax rules and mapping for all new accounts. Then prepare and load our fixed asset workpaper." The agent reads the new accounts, applies the mapping rules, saves them, and writes an audit log entry for each one. It reads the fixed asset register, composes the workpaper from the platform's templates, and loads it for review. A morning of clicking through screens, done in minutes. Every action provable.
A Head of Tax says to the agent: "I have a last-minute meeting with the CFO. Prepare an audit and risk committee paper in our branding. Show effective tax rate, cash tax, and deferred tax balances. Highlight the key decisions made by the team and the balances driven off the GL." The agent pulls the data, composes the paper in the branded template, surfaces the team's decisions from the audit log, and traces each balance, inline, back to its origin in the GL. The Head of Tax walks into the meeting with a board-grade paper that would normally take a junior preparer the rest of the day to assemble.
This is not magic. It is architecture. The platform exposes its actions through a protocol an AI can call (in TaxTime's case, MCP, the Model Context Protocol). The data sits in a structured tax ledger an AI can read. The GL is already inside the tax ledger, so there is no GL-to-tax reconciliation to chase. The audit trail is structural, not bolted on. Judgement calls are fenced off as human-only.
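Here is what "a protocol an AI can call" looks like on the wire. MCP messages are JSON-RPC 2.0. A client invokes a tool with the `tools/call` method, passing the tool's `name` and its `arguments`. Tools advertise their inputs as JSON Schema in the `tools/list` result. The sketch below builds such a request in Python. The tool name, description and schema are hypothetical, not TaxTime's actual tools. The required-field check is a simplified stand-in for full JSON Schema validation.

```python
import json

# Hypothetical tool definition, shaped like an entry in an MCP "tools/list" result.
LOAD_WORKPAPER_TOOL = {
    "name": "load_fixed_asset_workpaper",
    "description": "Compose the fixed asset workpaper from the register and load it for review.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "entity": {"type": "string"},
            "year": {"type": "string"},
        },
        "required": ["entity", "year"],
    },
}


def tools_call_request(request_id: int, tool: dict, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 'tools/call' request for the given tool."""
    # Simplified check: only verifies required keys, not types or full schema rules.
    missing = [k for k in tool["inputSchema"].get("required", []) if k not in arguments]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool["name"], "arguments": arguments},
    })


print(tools_call_request(1, LOAD_WORKPAPER_TOOL, {"entity": "Example Pty Ltd", "year": "FY26"}))
```

The platform, not the model, decides which actions appear as tools. That is why the exposure of actions is an architectural property of the platform.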
Almost no tax platform in the world is built for Level 4. The few platforms built for it stand alone in the category.
Level 5? No.
A logical reader gets here and asks the obvious question. What about Level 5? What about a fully autonomous tax function where the agent makes the judgement calls, files the return, and signs off itself?
The answer is no. Not "not yet". No.
Tax has irreducible judgement. A transfer pricing position. A residency determination. An anti-avoidance application. These are judgements. The ATO under Justified Trust requires a human on the line, and the standard is rising. The board signs off the tax position. The Head of Tax owns it to the audit committee. Human accountabilities, not workflow steps that happen to involve a person.
Anyone selling Level 5 is selling a liability. The buyer carries the legal and reputational risk of an unsupervised agent making judgement calls. The vendor is claiming something outside the guardrails of the category.
We name Level 5 to exclude it. Tax Operations is built to make the human's judgement calls easier, not to remove them. The agent does the treadmill work. The human keeps the calls. The Head of Tax stays the Value Protector.
How to ask a vendor what level they really are
Three questions cut through the marketing.
These three are the vendor-side mirror of last week's five-question agent-ready test. That test asked whether your function is ready to use an agent: tax ledger, exposed actions, audit trail, judgement-step separation, reproducibility. The three below ask whether the product the vendor is selling is agent-ready in its own right. They are the first three of the five, restated from the other end of the table. Pair them.
Question one. What data does the AI read? If the answer is "a knowledge base of tax documents" or "our help articles" or "the ATO website", it is Level 1. If the answer is "your data, in our platform", keep going. (Mirrors buyer-side question one: do you have a tax ledger?)
Question two. What actions can the AI take? If the answer is "it answers your question" or "it summarises the document", it is Level 1. If the answer is "it classifies" or "it extracts", it is Level 2. If the answer is "it drafts a journal for review", it is Level 3. If the answer is "it can call any action a user can call, and stops at the judgement calls", it is Level 4. (Mirrors buyer-side question two: does your platform expose actions?)
Question three. What is the audit trail? If the answer is "we log the questions you asked", it is Level 1 dressed up. If the answer is "every action the agent takes is logged at field level, with the user who approved it, the input it received, and the output it produced", it is Level 4. (Mirrors buyer-side question three: is there an audit trail at the field level?)
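The three questions work as a decision procedure. The sketch below reduces each vendor answer to a simple value. The string labels for question two are hypothetical shorthand for the answers quoted above. Treating an unaudited "any action" platform as Level 3 is this sketch's assumption; the article only says such a vendor is not at Level 4.

```python
from enum import IntEnum


class Level(IntEnum):
    CHATBOT = 1
    SINGLE_ACTION = 2
    WORKFLOW_ASSISTANT = 3
    AGENTIC_TAX_OPERATIONS = 4


def vendor_level(reads_your_data: bool, actions: str, field_level_audit: bool) -> Level:
    """Score a vendor from its answers to the three questions.

    actions is one of: "answer", "classify_or_extract", "draft_for_review",
    "any_user_action" (any action a user can call, stopping at judgement calls).
    """
    # Question one: a document corpus, help articles or the ATO website means Level 1.
    if not reads_your_data or actions == "answer":
        return Level.CHATBOT
    # Question two: what the AI can actually do with your data.
    if actions == "classify_or_extract":
        return Level.SINGLE_ACTION
    if actions == "draft_for_review":
        return Level.WORKFLOW_ASSISTANT
    if actions == "any_user_action":
        # Question three: without a field-level audit trail it is not Level 4.
        # Falling back to Level 3 is an assumption of this sketch.
        return Level.AGENTIC_TAX_OPERATIONS if field_level_audit else Level.WORKFLOW_ASSISTANT
    raise ValueError(f"unrecognised answer to question two: {actions!r}")


print(vendor_level(reads_your_data=True, actions="any_user_action", field_level_audit=False).name)
```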
A vendor that cannot answer these three questions without slides is not at Level 4. A vendor that can is the conversation worth having.
Why this matters now
Two things are happening at once. The AI category is exploding. The audit environment is tightening. Justified Trust is not relaxing. The ATO does not soften its expectations because a model pressed the button.
The Head of Tax who buys a Level 1 chatbot and tells the audit committee "we have deployed AI" is exposed. The committee assumes the AI does the work. It doesn't. The gap between the slide and the reality is where the liability lives.
The Head of Tax who buys a Level 4 platform has bought something different. The agent does the treadmill work. The human keeps the judgement calls. The audit committee gets a credible story. The work gets done. Nobody is exposed.
In the next eighteen months the category will sort itself. Buyers who can name the levels will save themselves from buying a chatbot at the price of an agent. Buyers who cannot will pay for both.
This article is the language. Save it. Use it. Ask the question.
Off the treadmill. On to impact.
Andrew Danckert is the founder of TaxTime. This article extends "Why MCP is the tax operations breakthrough no one is talking about" (May 2026).
