How to Brief a Custom AI Tool Build So It Ships What You Actually Need

Most builds go wrong before the first line of code. The brief arrives as a paragraph describing a desired outcome, with no input format, no output format, and no definition of done. The agency nods through discovery and charges you to figure it out later. What you actually need is a document that states the job, the inputs, the output format, and the acceptance criteria. Without it, there is no finish line, and the agency knows it.

Why Most AI Briefs Fail Before the Build Starts

The most common mistake is scoping by capability instead of by job. A client says: “I want it to handle customer service, draft quotes, and flag overdue invoices.” That is three separate tools described in one sentence. Three months and £14,000 later, none of the three things work reliably, because no single model was trained, prompted, and tested against a clear, singular task.

The RAND Corporation found that 73% of failed AI projects had no agreed definition of success before the project started. That is not a technology problem. It is a brief problem.

The one-sentence test

Before any scoping meeting, write the tool’s job in one sentence: subject, verb, object. “The tool reads inbound lead emails and outputs a CRM entry with company name, job title, and deal stage.” That is a buildable brief. “The tool improves our sales process” is not.

If you cannot write the sentence in under 20 words, the project is not ready to scope.
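The length half of this test is simple enough to automate. The sketch below is only an illustration: the function name and the sample variable are hypothetical, and a word count cannot tell whether the sentence names a subject, verb, and object. “The tool improves our sales process” would pass the length check and still fail the test.

```python
def job_sentence_ready(sentence: str, max_words: int = 20) -> bool:
    """Return True if the sentence is non-empty and under max_words words.

    This checks length only. Whether the sentence states a single,
    testable job still needs a human read.
    """
    words = sentence.split()
    return 0 < len(words) < max_words


buildable = (
    "The tool reads inbound lead emails and outputs a CRM entry "
    "with company name, job title, and deal stage."
)
print(job_sentence_ready(buildable))
```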

The Four Things a Brief Must Contain

A brief that gets built correctly contains four elements. Everything else (timelines, tech stack, integrations) is secondary to these.

1. The job, in one sentence

State what the tool does, not what problem it solves. “Solves our document chaos” is a problem statement. “Extracts vendor name, invoice number, amount, and due date from uploaded PDF invoices and posts them to Xero” is a job statement. The difference determines whether a developer can write a test for it.

2. The exact inputs

Define what data the tool receives, in what format, from what source. “It gets the invoices” is not a definition. “It receives PDF files uploaded via a web form, maximum 5MB, one invoice per file, in English” is. Input definitions prevent the most common production failure: the automation works perfectly with the demo data and breaks the moment a supplier sends a scanned JPG instead of a native PDF.

For each input, document: the data type, the format, the source, and any known edge cases. If your CRM exports contacts as a CSV with inconsistent phone number formatting, that is not the developer’s problem to discover in week three; it is a fact the brief must capture.
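An input definition this precise translates almost directly into a validation function. Here is a minimal sketch for the invoice example, under stated assumptions: the Upload type and function name are hypothetical, and “5MB” is read as 5 × 1024 × 1024 bytes. It checks size and file type only. “One invoice per file” and “in English” would need further checks, and a scanned page saved inside a PDF would still pass.

```python
from dataclasses import dataclass

# Assumption: "maximum 5MB" in the brief means 5 * 1024 * 1024 bytes.
MAX_BYTES = 5 * 1024 * 1024


@dataclass
class Upload:
    """Hypothetical container for one file received from the web form."""
    filename: str
    content: bytes


def validate_invoice_upload(upload: Upload) -> list[str]:
    """Return a list of problems; an empty list means the upload fits the brief."""
    problems = []
    if len(upload.content) > MAX_BYTES:
        problems.append("file exceeds 5MB")
    # PDF files start with the bytes "%PDF-". A JPG does not, even if it
    # has been renamed to end in .pdf.
    if not upload.content.startswith(b"%PDF-"):
        problems.append("not a PDF file")
    return problems
```

A scanned JPG from a supplier is rejected at the door with a clear reason, instead of failing somewhere deeper in the automation.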

3. The exact output

Define what the tool returns and how correctness is measured. “It drafts a reply” tells a developer nothing. “It returns a plain-text email draft of 80–120 words, addressed by first name, referencing the original enquiry subject, with a proposed meeting time in the recipient’s timezone” is a testable output.

Every output needs a success criterion. Without one, the question “is this working?” has no answer, and the agency will always say yes.
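A testable output is one you can write a check for. The sketch below turns the email-draft example into a function that lists which criteria a draft fails. The function name is hypothetical, and two readings are assumptions: “addressed by first name” is taken to mean the name appears in the first line, and “referencing the enquiry subject” to mean the subject text appears in the draft. The meeting-time criterion is left out because it needs timezone data the example does not specify.

```python
def check_reply_draft(draft: str, first_name: str, enquiry_subject: str) -> list[str]:
    """Return the criteria the draft fails; an empty list means it passes."""
    failures = []

    word_count = len(draft.split())
    if not 80 <= word_count <= 120:
        failures.append(f"length is {word_count} words, expected 80-120")

    # Assumption: the greeting is the first line of the draft.
    stripped = draft.strip()
    first_line = stripped.splitlines()[0] if stripped else ""
    if first_name.lower() not in first_line.lower():
        failures.append("not addressed by first name")

    # Assumption: a reference means the subject text appears verbatim.
    if enquiry_subject.lower() not in draft.lower():
        failures.append("does not reference the enquiry subject")

    return failures
```

Run against a batch of real drafts, a check like this gives both sides the same answer to “is this working?”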

4. The one human checkpoint

Identify the single point where a human reviews or approves before action is taken. This is not optional; it is what separates an AI tool that operates with accountability from one that acts without it. It also defines the blast radius if something goes wrong. An AI that drafts a reply for human review before sending has a bounded failure mode. An AI that sends autonomously does not.
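In code, a checkpoint is a gate that the action cannot pass without a named reviewer. The sketch below shows the idea for the draft-then-send case. All names are hypothetical, and a plain list stands in for whatever actually sends the email.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Draft:
    """Hypothetical AI-written reply waiting for review."""
    recipient: str
    body: str
    approved_by: Optional[str] = None  # name of the human reviewer, once approved


def approve(draft: Draft, reviewer: str) -> None:
    """Record the named person who reviewed the draft."""
    draft.approved_by = reviewer


def send(draft: Draft, outbox: list) -> None:
    """Queue the draft for sending, but only after a human has approved it."""
    if draft.approved_by is None:
        raise PermissionError("draft has not passed the human checkpoint")
    outbox.append(draft)
```

Because send refuses unapproved drafts, a bad draft can only waste a reviewer's time. It cannot reach a customer.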

What Happens When You Skip Each Element

This is not theoretical. These are the failure patterns that appear in production builds where one element was missing from the brief.

No job sentence: The scope grows throughout the build. Features are added on a “while we’re in there” basis. Delivery takes twice as long. The final tool does six things poorly instead of one thing reliably.

No input definition: The tool works in the demo, which used clean, curated data prepared by the agency. In production, real data arrives in inconsistent formats. The automation errors silently or returns garbage. The agency cites “out-of-scope data formats” and quotes a change request.

No output definition: There is no agreed standard to QA against. The agency ships something that technically produces output. The client says it is wrong. The agency says it is right. Nobody can resolve the dispute because there was never a spec.

No human checkpoint: A failure in the model (a hallucinated figure, a misclassified category) propagates into live systems before anyone notices. The cost is not just fixing the AI. It is cleaning the downstream data it corrupted.

How to Map Dependencies Before the Build Starts

A brief without a dependency map is a brief that will generate change requests. Every custom AI tool depends on upstream inputs it does not control. Before signing any build contract, document every dependency, and decide who owns each one.

Run through these five questions for any proposed tool:

What data sources does it read from? List every system, file format, API, and export. For each, ask: who controls it, how often does its format change, and what happens to the tool if it changes?
What human decisions does the current workflow include? AI replaces defined, repeatable steps, not judgment calls that vary by person or context. Every undocumented human decision in the existing workflow is a gap the tool will expose.
What third-party systems does it connect to? List every API, database, or SaaS platform the tool writes to or reads from. For each, confirm: API rate limits, authentication method, and what the fallback is if the connection drops.
What triggers the automation? A file upload, a webhook, a scheduled cron job, a human clicking a button? The trigger defines the tool’s operating context. A tool triggered by a human action has different failure modes than one that runs on a schedule.
Who owns each dependency? For every item on the list above, assign a named person or team. “The IT department” is not an owner. A dependency without a named owner is a liability that will surface at the worst possible time.

An agency that cannot produce this dependency map as a client deliverable, before the build starts, is an agency that will charge you for fixing the things they did not document.
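The dependency map itself can be a small structured record rather than a paragraph in a proposal. The sketch below is one possible shape, not a standard: the field names, the list of vague owner labels, and the sample entries are all illustrative assumptions.

```python
from dataclasses import dataclass

# Labels that do not name anyone in particular. Illustrative, not exhaustive.
VAGUE_OWNERS = {"", "tbc", "tbd", "the it department"}


@dataclass
class Dependency:
    name: str      # e.g. "CRM CSV export"
    kind: str      # "data source", "third-party system", or "trigger"
    owner: str     # should be a named person or team
    fallback: str  # what happens if it changes or the connection drops


def unowned(dependencies: list[Dependency]) -> list[str]:
    """Return the names of dependencies that lack a specific owner."""
    return [d.name for d in dependencies if d.owner.strip().lower() in VAGUE_OWNERS]


# Hypothetical entries, for illustration only.
deps = [
    Dependency("Gmail API", "data source", "ops manager", "enquiries wait in the inbox"),
    Dependency("Shared drive export", "data source", "The IT department", "none agreed"),
]
print(unowned(deps))  # ['Shared drive export']
```

Anything the check returns is a question to settle before the contract is signed, not after.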

What Good AI Scoping Looks Like in Practice

A logistics company needed a tool to classify inbound freight enquiries and route each one to the correct operations team. There are three teams, each handling a different cargo type. The brief they submitted to us contained the following:

Job sentence: “The tool reads inbound enquiry emails and returns a routing decision: bulk, refrigerated, or hazardous, with a confidence score.”
Input: Plain-text email body and subject line, sent from a monitored Gmail inbox via API.
Output: A JSON object with three fields: category (string), confidence (float 0–1), and reason (string, max 50 words).
Human checkpoint: Any classification with confidence below 0.75 is held in a review queue before routing.
Dependencies: Gmail API (owned: ops manager), internal routing rules document (owned: head of ops, reviewed quarterly), Slack routing webhook (owned: IT).

That brief took two hours to write. The build took three weeks. It has been in production for seven months without a single out-of-scope support request.

Compare that to the brief we most commonly receive: a paragraph describing a desired outcome, with no input format, no output format, and no success criterion. Those projects generate four rounds of revision and a final deliverable that satisfies neither party.
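Part of what makes the logistics brief buildable is that its output and checkpoint can be written down as checks on day one. The sketch below is an illustration of that, not the code from the actual build: the function names are hypothetical, and “held in a review queue” is reduced to returning the string "review".

```python
import json

CATEGORIES = {"bulk", "refrigerated", "hazardous"}
REVIEW_THRESHOLD = 0.75  # from the brief's human checkpoint


def parse_decision(raw: str) -> dict:
    """Parse the tool's JSON output and reject anything outside the agreed format."""
    decision = json.loads(raw)
    if not isinstance(decision, dict):
        raise ValueError("output must be a JSON object")
    if set(decision) != {"category", "confidence", "reason"}:
        raise ValueError("expected exactly category, confidence, and reason")

    category = decision["category"]
    if not isinstance(category, str) or category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")

    confidence = decision["confidence"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("confidence must be a number")
    if not 0 <= confidence <= 1:
        raise ValueError("confidence must be between 0 and 1")

    reason = decision["reason"]
    if not isinstance(reason, str) or len(reason.split()) > 50:
        raise ValueError("reason must be a string of at most 50 words")
    return decision


def next_step(decision: dict) -> str:
    """Return 'review' for decisions below the threshold, otherwise 'route'."""
    return "review" if decision["confidence"] < REVIEW_THRESHOLD else "route"
```

A decision with confidence 0.6 goes to review, and one at exactly 0.75 is routed, because the brief holds only classifications below the threshold.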

What to Ask Before You Sign Anything

Before committing to a custom AI build with any agency, including ours, ask these questions and require written answers:

Can you show me the job definition in one sentence?
What is the exact input schema?
What is the exact output format and success criterion?
What is the dependency map and who owns each item?
What is the human review checkpoint?
How will we QA the output before go-live?
What is out of scope, explicitly?

An agency that deflects these questions with “we’ll figure it out in discovery” is an agency that wants flexibility to charge for the things that were never defined. Discovery should answer these questions, not defer them.

See how we scope and build this at designodin.com/ai.

Frequently Asked Questions

What is a custom AI tool brief and what should it contain?

A custom AI tool brief is a document that defines the tool’s job in one sentence, the exact inputs it receives, the exact output it returns with a measurable success criterion, and the human checkpoint before any automated action. Without all four elements, there is no way to test whether the build is finished, or correct.

How detailed does the input definition need to be?

Precise enough that a developer could write a validation function for it. That means: data type (string, number, PDF, CSV), format (specific fields, column names, character limits), source (which system sends it), and known edge cases (e.g., some suppliers send scanned JPGs instead of native PDFs). Vague inputs produce vague builds.

What is a human checkpoint in an AI automation and why is it required?

A human checkpoint is a named step where a person reviews the AI’s output before it takes an action such as sending an email, updating a record, or routing a request. It defines the failure boundary. If the AI produces a wrong output, the checkpoint prevents that error from propagating downstream. It also creates an accountability structure: someone is responsible for what gets actioned, not just what the AI suggested.

How do I know if my scope is too broad?

If you cannot write the tool’s job in one sentence with a subject, verb, and object, or if the sentence needs “and” to join two separate jobs, the scope is too broad. A tool that “handles customer service and drafts quotes” is two tools. Build them sequentially, not simultaneously.

What happens if the job definition changes mid-build?

It is not a problem as long as the change is documented as a scope revision and repriced accordingly. The problem is when scope changes are treated as corrections rather than additions, which only happens when the original brief was vague enough to support competing interpretations. A precise brief reduces the ambiguity that lets scope creep in.

Can I write a brief myself without a technical background?

Yes. The job sentence, the output format, and the human checkpoint require no technical knowledge; they require knowing your own workflow. The input definition benefits from a conversation with whoever manages the data sources, but you do not need to understand APIs to document what format your CRM exports data in. A good agency will help you complete the technical sections during discovery; the business logic sections are yours to provide.

How long should it take to write a proper brief?

Two to four hours for a focused tool with a single job. If it takes longer, the workflow is probably more complex than a single tool can handle, and that is important to know before the build starts, not after.

A well-scoped brief costs nothing to write and cuts out the most common failure modes in a custom AI build. We will not quote a build until the brief meets all four criteria above. If you want to talk through what this looks like for your operation, start a conversation.