Opinion · 15 May 2026 · 10 min read

When to trust an AI tax agent - a CA-firm risk framework

The hardest question in modern practice management isn't how to use AI agents; it's where the line falls between agent and human. A six-factor framework for deciding what to delegate.

CA Vikram Shah
Head of Tax Research

Every CA practice partner I speak to shares the same unease. The agents work. They reconcile faster than humans. They draft notices that are 80% there in one pass. They catch Section 43B mismatches a junior would have missed. And yet signing the return is still terrifying.

That unease is justified. Here is the six-factor framework we use internally to decide whether an agent can run a task autonomously, with light review, or only with full sign-off.

Factor 1: Reversibility

Can the outcome be undone within a reasonable window?

  • Reversible (low risk): Drafting an internal email, suggesting a reconciliation match, classifying a notice by type. Agent runs free.
  • Reversible at cost: Filing a return that can be revised. Agent prepares, human approves.
  • Irreversible: DSC-signing a final return submission, communicating with a tax officer in person. Human only.

Factor 2: Statutory exposure

What is the maximum penalty if the agent gets it wrong?

  • Under ₹10,000 or no penalty: Agent runs free. Most agent work falls here.
  • ₹10,000-₹1 lakh: Agent + senior review.
  • Above ₹1 lakh or prosecution risk: Partner-level review mandatory.
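
The bands above can be written as a small lookup. This is a minimal sketch, not ThynkTax code: the names `Review` and `review_for_exposure` are hypothetical, and the treatment of the exact boundaries (₹10,000 and ₹1 lakh both fall in the senior-review band) is one reading of the ranges as written.

```python
from enum import Enum


class Review(Enum):
    AGENT_ONLY = "agent runs free"
    SENIOR = "agent + senior review"
    PARTNER = "partner-level review"


ONE_LAKH = 100_000  # 1 lakh rupees


def review_for_exposure(max_penalty_inr: int, prosecution_risk: bool = False) -> Review:
    """Map the worst-case statutory exposure of a task to a review level."""
    # Prosecution risk escalates to partner review regardless of the amount.
    if prosecution_risk or max_penalty_inr > ONE_LAKH:
        return Review.PARTNER
    # Assumption: the middle band is inclusive at both ends.
    if max_penalty_inr >= 10_000:
        return Review.SENIOR
    return Review.AGENT_ONLY
```

The input is the maximum penalty, not the expected one, which matches the question Factor 2 asks.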

Factor 3: Ambiguity in the rule

Is the rule definite (the Section 80C deduction ceiling of ₹1,50,000) or interpretive (whether a payment is fees for technical services (FTS) or business profits under a treaty)?

Agents handle definite rules at human-or-better accuracy. They are unreliable on interpretive rules where reasonable practitioners disagree. Use agents for the definite. Use humans for the interpretive.

Factor 4: Volume vs uniqueness

  • High volume, repetitive: GSTR-2B reconciliation, TDS entry validation, DIR-3 KYC for 50 directors. Agent-native.
  • Low volume, unique: A novel transfer-pricing position for a single client. Human-led with agent-assisted research.

A useful test: if you would assign it to a different junior every week without re-training, an agent can do it. If you assign it to your most experienced senior because the last junior got it wrong, an agent cannot.

Factor 5: Counter-party visibility

Will the agent's output be seen externally?

  • Internal-only: Internal task notes, draft reconciliation reports. Agent-final.
  • Client-facing: Client emails, calculation summaries. Agent-draft + human polish.
  • Government-facing: Filed return, notice response, tribunal submission. Human-signed.
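
Factors 1 and 5 each produce a three-level answer, so one way to combine them is to let the stricter answer win. The sketch below assumes that "strictest wins" rule, which the framework does not state explicitly. The category strings and the names `Gate` and `required_gate` are hypothetical. It also folds "human only" and "human-signed" into one top level for simplicity.

```python
from enum import IntEnum


class Gate(IntEnum):
    AGENT_FINAL = 0    # agent output stands without review
    HUMAN_REVIEW = 1   # agent prepares or drafts, human approves or polishes
    HUMAN_SIGNED = 2   # a human performs or signs the final step


# Factor 1: reversibility (hypothetical category keys).
REVERSIBILITY_GATE = {
    "reversible": Gate.AGENT_FINAL,
    "reversible_at_cost": Gate.HUMAN_REVIEW,
    "irreversible": Gate.HUMAN_SIGNED,
}

# Factor 5: counter-party visibility (hypothetical category keys).
VISIBILITY_GATE = {
    "internal": Gate.AGENT_FINAL,
    "client_facing": Gate.HUMAN_REVIEW,
    "government_facing": Gate.HUMAN_SIGNED,
}


def required_gate(reversibility: str, visibility: str) -> Gate:
    """Return the stricter of the two gates (assumed combination rule)."""
    return max(REVERSIBILITY_GATE[reversibility], VISIBILITY_GATE[visibility])
```

Under this rule a reversible but government-facing task still ends at a human signature, because the stricter factor decides.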

Factor 6: Audit trail and accountability

Can you prove what the agent did and why, to a future auditor or to ICAI?

ThynkTax requires every Tier-2/Tier-3 agent to emit a reasoning trace - the LLM's chain-of-thought, the documents it consulted, the rules it applied, the rejected alternatives. Without a reasoning trace, the agent should not be touching anything billable.
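
To make the trace requirement concrete, here is a minimal sketch of what such a record could look like. The field names, the `ReasoningTrace` class, and the `may_touch_billable` check are illustrative assumptions, not the ThynkTax schema. The completeness rule (reasoning, documents, and rules must be present, while rejected alternatives may be empty) is also an assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional, Tuple


@dataclass(frozen=True)
class ReasoningTrace:
    """Hypothetical record of what an agent did and why."""
    agent: str
    task_id: str
    reasoning: str                                  # the model's stated reasoning
    documents_consulted: Tuple[str, ...]            # e.g. file names or document IDs
    rules_applied: Tuple[str, ...]                  # e.g. "Section 43B"
    rejected_alternatives: Tuple[str, ...] = ()     # may legitimately be empty
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def is_complete(self) -> bool:
        # Assumption: a usable trace needs reasoning, sources, and rules.
        return bool(
            self.reasoning.strip()
            and self.documents_consulted
            and self.rules_applied
        )


def may_touch_billable(trace: Optional[ReasoningTrace]) -> bool:
    """No trace, or an incomplete one, means no billable work."""
    return trace is not None and trace.is_complete()
```

Freezing the dataclass keeps a trace from being edited after the fact, which is the property an auditor would care about.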

The four-tier matrix

Putting it all together:

| Tier | Description | Examples | Approval gate |
|------|-------------|----------|---------------|
| 0 | Suggestion-only | Reconciliation matches, classification, naming, summarisation | None - user accepts/rejects inline |
| 1 | Autonomous | Validation runs, threshold monitoring, deadline alerts, cross-domain reconciliation | None; exceptions are flagged for human review |
| 2 | Supervised | Return preparation, notice drafting, vendor email composition | Human reviews before send/save |
| 3 | Approval-gated | Filing submission to GSTN / IT portal / MCA / TRACES | Human explicitly approves; partner sign-off for large clients |
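
The approval-gate column of the matrix can be sketched as a function from tier to required approvals. The names `Tier` and `approvals_needed`, and the approval labels, are hypothetical. What counts as a "large client" is left to the caller because the matrix does not define it.

```python
from enum import IntEnum
from typing import List


class Tier(IntEnum):
    SUGGESTION = 0       # user accepts or rejects inline
    AUTONOMOUS = 1       # runs unattended, exceptions flagged
    SUPERVISED = 2       # human reviews before send/save
    APPROVAL_GATED = 3   # human explicitly approves


def approvals_needed(tier: Tier, large_client: bool = False) -> List[str]:
    """Return the approvals required before an agent's output takes effect."""
    if tier in (Tier.SUGGESTION, Tier.AUTONOMOUS):
        return []
    if tier == Tier.SUPERVISED:
        return ["human_review"]
    # Tier 3: explicit approval, plus partner sign-off for large clients.
    approvals = ["human_approval"]
    if large_client:
        approvals.append("partner_sign_off")
    return approvals
```

For example, `approvals_needed(Tier.APPROVAL_GATED, large_client=True)` returns both the human approval and the partner sign-off.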

The discipline that makes this work

Three rules we enforce:

  1. No agent runs at Tier 1 unless it has shipped 1,000 successful executions at Tier 2 first. Promotion to autonomous is earned.
  2. Every agent has a kill switch. When an agent's accuracy drops below 99% on validation samples, it auto-demotes to Tier 2.
  3. The Filing Agent is permanently Tier 3. Even when it's been faultless for a year. There is no configuration that allows the Filing Agent to fire without a human approval, ever.

Rule 3 is non-negotiable because filing is irreversible (Factor 1), creates statutory exposure (Factor 2), and is government-facing (Factor 5). A single bad filing can torch the firm's audit standing.
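
The three rules translate into a small state machine. The sketch below is illustrative only: `AgentRecord`, `try_promote`, and `apply_kill_switch` are hypothetical names, and resetting the run counter after a demotion is an assumption the rules do not spell out.

```python
from dataclasses import dataclass

PROMOTION_RUNS = 1_000   # successful Tier 2 executions needed for Tier 1
ACCURACY_FLOOR_PCT = 99  # below this on validation samples, demote


@dataclass
class AgentRecord:
    name: str
    tier: int
    successful_tier2_runs: int = 0
    permanently_tier3: bool = False  # True for the Filing Agent


def try_promote(agent: AgentRecord) -> bool:
    """Rule 1 and Rule 3: promote Tier 2 -> Tier 1 only when earned."""
    if agent.permanently_tier3 or agent.tier != 2:
        return False
    if agent.successful_tier2_runs < PROMOTION_RUNS:
        return False
    agent.tier = 1
    return True


def apply_kill_switch(agent: AgentRecord, correct: int, sampled: int) -> bool:
    """Rule 2: demote a Tier 1 agent whose sampled accuracy is below 99%."""
    if agent.tier != 1 or sampled == 0:
        return False
    # Integer comparison avoids floating-point edge cases at exactly 99%.
    if correct * 100 < ACCURACY_FLOOR_PCT * sampled:
        agent.tier = 2
        agent.successful_tier2_runs = 0  # assumption: promotion must be re-earned
        return True
    return False
```

Because `try_promote` checks `permanently_tier3` first, no run count can move the Filing Agent out of Tier 3.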

How ThynkTax implements this

  • Every agent declares its default tier in its DeerFlow manifest; tenants can downgrade but not upgrade.
  • Tier 2/3 agents pause at a structured HITL gate and surface a single approve / reject / edit screen.
  • Filing Agent is hard-coded Tier 3; the platform refuses configuration that would auto-fire it.
  • The Agent Execution Log is immutable, signed, and exportable for ICAI peer-review.
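
The first and third points can be sketched as a single validation step. This is not the DeerFlow manifest format or the platform's real API: `effective_tier`, `TierConfigError`, and the `"filing-agent"` identifier are hypothetical. The sketch reads "downgrade" as moving to a higher tier number (a stricter gate), consistent with "auto-demotes to Tier 2" above, though the article does not define the term.

```python
from typing import Optional

FILING_AGENT = "filing-agent"  # hypothetical identifier


class TierConfigError(ValueError):
    """Raised when a tenant override would loosen an agent's approval gate."""


def effective_tier(agent_name: str, default_tier: int,
                   tenant_override: Optional[int] = None) -> int:
    """Resolve the tier an agent runs at for a given tenant."""
    # The Filing Agent is fixed at Tier 3; refuse anything else.
    if agent_name == FILING_AGENT:
        if tenant_override is not None and tenant_override != 3:
            raise TierConfigError("Filing Agent is permanently Tier 3")
        return 3
    if tenant_override is None:
        return default_tier
    if not 0 <= tenant_override <= 3:
        raise TierConfigError(f"unknown tier: {tenant_override}")
    # Assumption: a lower tier number means a looser gate, i.e. an upgrade.
    if tenant_override < default_tier:
        raise TierConfigError("tenants may downgrade but not upgrade")
    return tenant_override
```

Raising an error, not silently clamping the value, means a misconfigured tenant finds out at configuration time, not at filing time.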

The framework above isn't a marketing position; it's the operating discipline that makes AI in a CA practice survivable. Run it strictly. Then watch productivity climb.

  • Reviewed by CA Vikram Shah, Head of Tax Research
Tags: agents, risk, opinion, practice-management