Build and AI-Score a Target Account List in Clay

The stack

How the tools connect

Sources
Apollo.io: Company and contact source data feeding the list

Engine
Clay: Sources, enriches, and runs the AI scoring column in one table
Claude: Drafts and critiques the scoring rubric before you trust it

Destination
HubSpot: Receives the scored, tiered accounts as records

What a run costs

Clay credits: ~3-5 per account enriched
A 400-row list: roughly 1,200-2,000 credits
AI scoring: pennies per row on a mid-tier model
Your time: about half a day to calibrate, minutes per re-run after
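The credit figures above are simple multiplication, and it can help to run the same arithmetic for your own list size before sourcing anything. A minimal sketch, assuming the 3-5 credits per account quoted above; the function name is made up for illustration and the numbers are this playbook's estimate, not a Clay price list.

```python
def estimate_credits(rows: int, low_per_row: int = 3, high_per_row: int = 5) -> tuple[int, int]:
    """Return a (low, high) estimate of enrichment credits for a list."""
    return rows * low_per_row, rows * high_per_row


low, high = estimate_credits(400)
print(f"400 rows: {low:,}-{high:,} credits")

# The same estimate for a full 4,000-row universe, scored too early.
print(estimate_credits(4000))
```

The second call is the useful one: it shows what an uncalibrated full run would cost in credits before you know whether the rubric works.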

The problem

Most target account lists are a lie the team tells itself. You filter Apollo for 'SaaS, 200-2,000 employees, US' and call the resulting 4,000 rows your ICP. But a headcount-and-industry filter cannot see the things that actually disqualify an account: the wrong tech stack, no relevant team to sell to, a recent acquisition that froze all budgets, or that they are already a happy customer of your biggest competitor. Reps then spend weeks working accounts that were never qualified, and everyone wonders why connect rates are flat.

Real qualification also doesn't scale by hand. A sharp RevOps person can eyeball thirty accounts and rank them with good judgment, reading the homepage, checking the team page, noticing the funding timing. They cannot do that for three thousand. So teams retreat to crude filters and the nuance, which is exactly where the signal lives, gets thrown away. The list ends up treating a perfect-fit account and a marginal one identically, which guarantees you spread effort evenly across good and bad instead of concentrating it on the best.

Clay collapses the whole pipeline into one table: source the accounts, enrich them with signals a model can actually reason over (tech stack, department headcount, growth, the company's own description), then run an AI column that applies a written rubric to every row and returns a score, a one-line reason, and the single biggest risk. Because the rubric is written down, the scoring is consistent across thousands of rows and auditable when a rep pushes back. And when your win data later tells you what actually predicts a closed deal, you edit the rubric text and re-run; you are not at the mercy of a black box.

The score is worthless if you cannot see why. A number with no reason is a number reps will ignore the first time it is wrong. Build the rubric to always explain itself, and calibrate it against accounts you already know the answer to, or you have built a very expensive random-number generator.

How it works

The workflow, end to end

01 Write the rubric (Claude): ICP as scorable lines, drawn from real win/loss
02 Source accounts (Apollo): pull a few hundred rows into a Clay table
03 Enrich signals (Clay): only the fields the rubric references
04 AI score (Clay + Claude): score, tier, reason, biggest risk as JSON
05 Calibrate (Known accounts): drop in winners and losers, tune the weights
06 Sync tiers (HubSpot): push A and B with the score and reason

Write the ICP as an explicit scoring rubric in plain language, anchored in your actual win/loss data
Source a raw account list into a Clay table from Apollo, a CSV, or Clay's built-in company finder
Enrich each account with only the firmographic and signal fields your rubric references
Run a Clay AI column that applies the rubric and returns score, tier, reason, and biggest risk as JSON
Calibrate against known good and bad accounts, tune the rubric, then score the full table
Push tier A and B accounts to the CRM with the score and reason in custom properties

The playbook

Write the ICP as a scoring rubric a stranger could apply

Before you open Clay, write your ICP as a rubric so explicit that a new hire could apply it identically to yours. List every attribute that matters, mark each as either a hard disqualifier or a weighted plus, and define what 'great' versus 'okay' looks like with numbers. The single biggest mistake here is writing the rubric from a wishlist of dream customers instead of from reality. Pull it from your last twenty closed-won and twenty closed-lost deals and ask what was actually true of the winners.

Make every line scorable. 'Fast-growing' is useless because no one can apply it consistently. 'Headcount grew 20% or more in the last 12 months' is a thing the data can answer. 'Has a real ops team' becomes 'has 5 or more people with Operations in their title.' If you cannot imagine the enrichment data that would answer a criterion, the criterion is too vague to keep.

Have Claude pressure-test the rubric before you trust it. Paste your draft plus a short description of three recent wins and three recent losses and ask it to predict the scores; where its prediction disagrees with reality, the fault is probably in your rubric weighting, not in Claude.

ICP rubric draft (this becomes the AI prompt later)

Score 0-100.
HARD DISQUALIFIERS (score 0 if ANY are true):
- Fewer than {{MIN_EMPLOYEES, e.g. 50}} employees
- Operates only in {{EXCLUDED_INDUSTRY, e.g. pure government}}
- No {{REQUIRED_TEAM, e.g. Operations}} function detectable

OTHERWISE, weight as follows:
- Industry fit (0-25): best = {{TARGET_INDUSTRIES}}; partial = adjacent
- Size fit (0-20): best = {{HEADCOUNT_RANGE, e.g. 200-2,000}}
- Tech stack signal (0-20): +full if they use {{COMPLEMENTARY_TOOLS}}; 0 if they use {{COMPETITOR_TOOL}}
- Growth signal (0-15): +full if headcount up {{X}}%+ in 12mo OR funding in last 18mo
- Relevant team exists (0-20): +full if {{DEPARTMENT}} team has {{SIZE}}+ people
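One way to sanity-check a rubric like this before pasting it into a prompt is to write it down as data, confirm the weights total 100, and confirm the hard disqualifiers fire only on real values and never on blanks. A minimal sketch, assuming example thresholds (a 50-employee floor, a required team); every name here is hypothetical, and in the actual workflow the model applies the rubric text, not this code.

```python
from dataclasses import dataclass
from typing import Optional

# Maximum points per weighted criterion, copied from the draft rubric.
WEIGHTS = {
    "industry_fit": 25,
    "size_fit": 20,
    "tech_stack": 20,
    "growth": 15,
    "relevant_team": 20,
}
assert sum(WEIGHTS.values()) == 100, "weights must total 100"

MIN_EMPLOYEES = 50  # assumed example value for {{MIN_EMPLOYEES}}


@dataclass
class Account:
    employees: Optional[int]            # None means the enrichment came back blank
    has_required_team: Optional[bool]   # None means "could not tell"
    industry_excluded: bool = False


def matched_disqualifier(acct: Account) -> Optional[str]:
    """Return the name of the first hard disqualifier that fires, else None."""
    if acct.employees is not None and acct.employees < MIN_EMPLOYEES:
        return "below_min_employees"
    if acct.industry_excluded:
        return "excluded_industry"
    if acct.has_required_team is False:  # a blank (None) does not disqualify
        return "no_required_team"
    return None


print(matched_disqualifier(Account(employees=8, has_required_team=False)))
print(matched_disqualifier(Account(employees=None, has_required_team=None)))
```

Returning the disqualifier's name rather than a bare true/false keeps the reason on record, which is what a rep will ask for when a score comes back at zero.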

💡 Tip: Write a one-line 'why we lose' note next to each disqualifier. When a rep argues with a low score later, you want the institutional reason on record, not a re-litigation of the whole ICP.

Source the raw account list into a Clay table

In Clay, create a workbook and add a new table. You have three sourcing routes. First, import a CSV of companies you already have (top-right Import, then map the company-name and domain columns). Second, use Clay's built-in 'Find Companies' source, which is powered by underlying data providers, and set your firmographic filters there. Third, connect your Apollo account under integrations and pull a saved Apollo company search directly into the table.

Start with a few hundred rows, not the full universe. You are validating the workflow and calibrating the rubric first; scoring 4,000 rows before you know the rubric works just burns enrichment credits on garbage. Once the rubric is calibrated you scale the source.

Keep the source filters loose on the soft attributes (industry-adjacent, broad size band) and tight only on the hard disqualifiers you are confident about. You want the AI rubric to do the nuanced ranking. If you pre-filter aggressively at the source, you will never see scores for the marginal accounts that sometimes turn out to be your best deals.

💡 Tip: Make sure every row has a clean company domain, not just a name. Domain is the join key almost every Clay enrichment relies on; a name-only row will fail to enrich and silently score low for the wrong reason.
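If the source list arrives as a CSV with messy website values, the domain column can be cleaned before import. A minimal sketch using only the Python standard library; the function name is hypothetical and the rules (strip the scheme, path, and a leading "www.") are one reasonable assumption about what "clean" means, not a Clay requirement.

```python
from urllib.parse import urlparse


def clean_domain(raw: str) -> str:
    """Reduce a URL or messy domain string to a bare lowercase domain.

    Returns an empty string when nothing usable is found, so name-only
    rows are easy to spot and fix before enrichment.
    """
    value = raw.strip().lower()
    if not value:
        return ""
    # urlparse only fills in the host when a scheme is present, so add one.
    if "://" not in value:
        value = "http://" + value
    host = urlparse(value).hostname or ""
    if host.startswith("www."):
        host = host[4:]
    # A usable domain has at least one dot and no spaces (e.g. "acme.com").
    return host if "." in host and " " not in host else ""


for raw in ["https://www.NorthbeamLogistics.com/about", "meridianfreight.com/", "Acme Inc"]:
    print(repr(raw), "->", repr(clean_domain(raw)))
```

Rows that come back as an empty string are the name-only rows the tip warns about; resolve those by hand or with a lookup before spending credits on them.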

Enrich each account with only the signals your rubric uses

Add Clay enrichment columns so each row carries exactly the facts the rubric needs and nothing more. Click the '+' to add a column, choose 'Enrich Data,' and pick the enrichment: employee count, employee count for a specific department (Clay can pull headcount for a named function), industry and sub-industry, technologies used, most recent funding, and a company description. Each enrichment is its own column that calls a data provider and costs credits, so add only what a rubric line actually references.

Add one column that fetches the company's homepage or about-page text. This is the highest-impact enrichment for AI scoring: giving the model the company's own words about itself improves accuracy far more than firmographics alone, because positioning text reveals what they sell, who to, and how they think, which a NAICS code never will.

Use Clay's waterfall pattern for anything that fails often, like department headcount or tech stack: stack two or three providers in priority order so a miss from the first falls through to the second. Expect meaningful coverage gaps regardless; on a cold list, do not be surprised if a fifth to a third of rows are missing tech-stack data, and write the rubric so a missing signal scores neutral rather than disqualifying.
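The waterfall logic itself is small: try each provider in priority order and keep the first non-blank answer. A minimal sketch of that idea; the two provider functions and their return values are stand-ins invented for illustration, and Clay configures this in its own UI rather than through code like this.

```python
from typing import Callable, Optional, Sequence

# A provider is any callable that takes a domain and returns a value or None.
Provider = Callable[[str], Optional[str]]


def waterfall(domain: str, providers: Sequence[Provider]) -> Optional[str]:
    """Try providers in priority order; return the first non-blank result."""
    for provider in providers:
        result = provider(domain)
        if result:          # None and "" both count as a miss
            return result
    return None             # every provider missed: leave the field blank


# Stand-in providers with made-up data; real ones would call a data vendor.
def provider_a(domain: str) -> Optional[str]:
    return {"northbeamlogistics.com": "ToolX, ToolY"}.get(domain)


def provider_b(domain: str) -> Optional[str]:
    return {"meridianfreight.com": "ToolZ"}.get(domain)


print(waterfall("meridianfreight.com", [provider_a, provider_b]))
print(waterfall("tinycraft.design", [provider_a, provider_b]))
```

The last case matters most: when every provider misses, the field stays blank, and the rubric has to treat that blank as neutral.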

Employee count, total
Employee count for the specific department you sell to
Industry and sub-industry
Technologies / tech stack (waterfall two providers)
Most recent funding or a headcount-growth signal
Homepage or about-page text (the single most valuable input)

💡 Tip: Run enrichment on 20 rows first and read the columns. If 'department headcount' is blank on rows you know have big teams, the enrichment is failing and your scores will be wrong for a data reason, not a rubric reason. Fix the data before you blame the rubric.
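Reading 20 rows by eye works, but a quick blank-rate count per column makes coverage gaps hard to miss. A minimal sketch, assuming the sample has been exported as a list of dictionaries; the field names and sample values are hypothetical.

```python
def is_blank(value) -> bool:
    """True for None and empty or whitespace-only strings; 0 counts as data."""
    return value is None or str(value).strip() == ""


def blank_rates(rows: list[dict], fields: list[str]) -> dict[str, float]:
    """Share of rows (0.0 to 1.0) where each field came back blank."""
    if not rows:
        return {field: 0.0 for field in fields}
    return {
        field: sum(is_blank(row.get(field)) for row in rows) / len(rows)
        for field in fields
    }


# Made-up sample standing in for an exported 20-row test run.
sample = [
    {"employees": 450, "tech_stack": "ToolX", "dept_headcount": 32},
    {"employees": 620, "tech_stack": "", "dept_headcount": None},
    {"employees": 8, "tech_stack": None, "dept_headcount": 0},
    {"employees": 120, "tech_stack": "ToolY", "dept_headcount": 5},
]

for field, rate in blank_rates(sample, ["employees", "tech_stack", "dept_headcount"]).items():
    print(f"{field}: {rate:.0%} blank")
```

A column whose blank rate sits well above the fifth-to-a-third range mentioned earlier is a candidate for a waterfall before any scoring runs.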

Run the AI scoring column

Add a new column, choose the AI / 'Use AI' option, and select a capable model (Clay lets you pick, including Claude models). In the prompt editor, reference your enrichment columns inline by typing '/' to insert a column token, and paste your rubric from step one. Demand strictly structured JSON output: a numeric score, a tier, a one-line reason, and the single biggest risk. Structured output is what lets you sort, filter, and report cleanly downstream; free-text scores are unsortable.

Run it on 20 rows first using Clay's option to run a column on selected rows only. Then read the reasons, not just the scores. If the model is being generous, handing out 70s to mediocre accounts, tighten the rubric wording (often the size and team-existence weights are too loose). If it is missing an obvious disqualifier, your disqualifier line is ambiguous. Iterate on the prompt, not on the data.

Only after the 20-row sample looks right do you run the full table. Set the column to run on all rows; for large tables this processes in the background and you can watch it fill in.

Clay AI scoring column prompt

You are scoring an account against our ICP rubric. Use ONLY the data provided below. If a field is blank, treat that signal as neutral (do not penalize for missing data unless the rubric says a hard disqualifier requires it).

RUBRIC:
{{PASTE_YOUR_RUBRIC_FROM_STEP_1}}

ACCOUNT DATA:
Name: /Company Name
Employees: /Employee Count
{{Department}} headcount: /Dept Headcount
Industry: /Industry
Tech stack: /Technologies
Funding/growth: /Funding
About (their own words): /Homepage Text

Return JSON only, no preamble:
{"score": <0-100 integer>, "tier": "A|B|C|Disqualified", "reason": "<one sentence citing the specific facts that drove the score>", "biggest_risk": "<one phrase: the thing most likely to make this a bad deal>"}

Tier mapping: A = 80+, B = 60-79, C = 40-59, Disqualified = below 40 OR any hard disqualifier triggered.
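If the scored column is exported for reporting, it is worth checking that every response actually honors this contract. A minimal sketch of such a check, assuming each response arrives as a raw JSON string; the function names are hypothetical and the thresholds are copied from the tier mapping above.

```python
import json

TIERS = {"A", "B", "C", "Disqualified"}
REQUIRED_KEYS = {"score", "tier", "reason", "biggest_risk"}


def tier_for(score: int) -> str:
    """Tier implied by the score alone, per the mapping in the prompt."""
    if score >= 80:
        return "A"
    if score >= 60:
        return "B"
    if score >= 40:
        return "C"
    return "Disqualified"


def parse_score(raw: str) -> dict:
    """Parse one model response; raise ValueError if it breaks the contract."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("response must be a JSON object")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    score = data["score"]
    if isinstance(score, bool) or not isinstance(score, int) or not 0 <= score <= 100:
        raise ValueError(f"score must be an integer 0-100, got {score!r}")
    if data["tier"] not in TIERS:
        raise ValueError(f"unknown tier: {data['tier']!r}")
    # A hard disqualifier can override the score, so only that tier may differ.
    if data["tier"] != "Disqualified" and data["tier"] != tier_for(score):
        raise ValueError(f"tier {data['tier']!r} does not match score {score}")
    return data


row = parse_score('{"score": 71, "tier": "B", "reason": "example", "biggest_risk": "example"}')
print(row["score"], row["tier"])
```

Rows that raise are the ones to re-run or read by hand; a tier that disagrees with its own score usually means the prompt wording around the mapping is ambiguous.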

💡 Tip: Add a separate, cheap formula or AI column that just outputs the matched disqualifier name when one fires. When a rep asks why a juicy-looking logo scored zero, you can point at 'already uses CompetitorX' instead of shrugging.

Calibrate against accounts where you already know the answer

This is the step teams skip and the one that decides whether the sales team trusts the list. Drop five of your best existing customers and five known bad-fit accounts you have lost or churned into the table and let the rubric score them. Your best customers should land in tier A. The bad fits should be C or Disqualified. If a beloved customer scores a 45, your rubric is wrong, not the AI, so go fix the weights and re-run.

Calibration also catches data problems masquerading as scoring problems. If a known-great account scores low, check its enrichment row first: often the tech-stack or headcount enrichment simply failed and the model scored on missing data. Fixing that one row's data is a different action than rewriting the rubric, and conflating the two is how people give up on the whole approach.

Iterate until the known accounts land where they should. It usually takes two or three rubric edits. Once they do, you have a scoring system with earned trust, which is the only kind reps will actually use.
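The calibration check can be made mechanical: list every known account whose tier contradicts what you already know, and keep editing until the list is empty. A minimal sketch, assuming scored rows have been exported with a domain, score, and tier; the function name and the example domains are hypothetical.

```python
def calibration_misses(scored: list[dict], known: dict[str, str]) -> list[str]:
    """List accounts whose tier contradicts what you already know.

    `known` maps a domain to "good" or "bad". Good accounts should land in
    tier A; bad ones should land in C or Disqualified.
    """
    expected = {"good": {"A"}, "bad": {"C", "Disqualified"}}
    misses = []
    for row in scored:
        label = known.get(row["domain"])
        if label is None:
            continue  # not part of the calibration set
        if row["tier"] not in expected[label]:
            misses.append(
                f"{row['domain']}: known {label}, scored {row['score']} ({row['tier']})"
            )
    return misses


# Made-up calibration set: two customers, one churned account, one unknown.
scored = [
    {"domain": "bestcustomer.example", "score": 88, "tier": "A"},
    {"domain": "lovedcustomer.example", "score": 45, "tier": "C"},
    {"domain": "churned.example", "score": 30, "tier": "Disqualified"},
    {"domain": "newprospect.example", "score": 70, "tier": "B"},
]
known = {
    "bestcustomer.example": "good",
    "lovedcustomer.example": "good",
    "churned.example": "bad",
}

for miss in calibration_misses(scored, known):
    print(miss)
```

For each miss, check the enrichment row first and the rubric second, in that order, for the reasons given above.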

Push tier A and B accounts to the CRM

Filter the table to tier A and B (use Clay's filter on the tier column). Add a column that writes to your CRM: Clay has native HubSpot and Salesforce integrations under 'Add Action.' Map the account fields plus the score, tier, reason, and biggest_risk into custom properties on the company record. Reps can then see why each account ranked where it did, with the score and the reason sitting right on the record, and RevOps can report coverage by tier.

Write the rubric version number into a CRM field too, for example 'icp_rubric_v3.' When you tune the rubric and re-score, you can tell old scores from new ones and avoid the confusion of comparing accounts graded on different curves.
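The mapping described here amounts to a small payload per synced row, with the rubric version stamped on every record. A minimal sketch of that shape; the property names (icp_score and so on) are placeholders invented for illustration, so substitute whatever custom properties exist in your CRM.

```python
from typing import Optional

RUBRIC_VERSION = "icp_rubric_v3"   # bump whenever the rubric text changes
SYNC_TIERS = {"A", "B"}            # C and Disqualified stay in the Clay table


def crm_properties(row: dict) -> Optional[dict]:
    """Build the custom-property payload for one scored row.

    Returns None for rows that should not be synced as active targets.
    """
    if row["tier"] not in SYNC_TIERS:
        return None
    return {
        "icp_score": row["score"],
        "icp_tier": row["tier"],
        "icp_reason": row["reason"],
        "icp_biggest_risk": row["biggest_risk"],
        "icp_rubric_version": RUBRIC_VERSION,
    }


row = {"score": 71, "tier": "B", "reason": "example reason", "biggest_risk": "example risk"}
print(crm_properties(row))
```

Keeping the version as a single constant means a re-score after a rubric edit changes every synced record's version in one place.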

Set the Clay table to refresh on a schedule (table settings, then auto-refresh) so new accounts matching your source filters get enriched and scored automatically and flow into the CRM over time. The list becomes a living system rather than a one-time export that is stale in a month.

💡 Tip: Do not sync tier C and Disqualified accounts to the CRM as active targets, but do keep them in the Clay table. They are your calibration set and your record of what you deliberately chose not to work, which matters in territory disputes.

Inside the prompt

The scoring prompt is short, but every line is there for a reason. Here is what each one is doing and why.

Why each line is in the scoring prompt

Use ONLY this data ("Use ONLY the data provided below"): Stops the model inventing facts about the account. No data, no guess.
Blank = neutral ("If a field is blank, treat that signal as neutral"): A failed enrichment should not disqualify a good account. This one line prevents most false zeros.
Paste the rubric (RUBRIC: {{your rubric}}): The scoring logic lives in text you control, not a black box. Edit it and re-run when win data teaches you something.
JSON only ({score, tier, reason, biggest_risk}): Structured output is what makes the column sortable and reportable. Free text is unusable downstream.
Force a reason ("cite the specific facts that drove the score"): A score with no reason gets ignored the first time it is wrong. This makes every score defend itself.

What you get

Each account row gets a structured score, tier, reason, and risk that syncs to the CRM as custom properties.

Example output

Company: Northbeam Logistics (northbeamlogistics.com)
{
  "score": 84,
  "tier": "A",
  "reason": "450-person logistics firm with a detectable 32-person operations team, raised a Series B 9 months ago, and homepage emphasizes multi-region expansion, strong fit for our planning product.",
  "biggest_risk": "may already run an incumbent TMS; confirm on first call"
}

Company: TinyCraft Studio (tinycraft.design)
{
  "score": 0,
  "tier": "Disqualified",
  "reason": "8-person design agency, below the 50-employee floor and no detectable operations function.",
  "biggest_risk": "too small to have budget or the problem we solve"
}

Company: Meridian Freight (meridianfreight.com)
{
  "score": 71,
  "tier": "B",
  "reason": "620-person freight company in target industry with a sizable ops team, but no growth or funding signal in the last 18 months suggests flatter budgets.",
  "biggest_risk": "no recent funding, budget cycle may be slow"
}

Anatomy of one scored row

Northbeam Logistics · northbeamlogistics.com

score (84): 0-100 and sortable, so reps work the top of the list first instead of eyeballing it.
tier ("A"): Bucketed from the score (A = 80+). This is the field you filter the CRM view on.
reason (450-person logistics firm, 32-person ops team, recent Series B, multi-region push): The one line that earns rep trust. A score with no reason gets ignored the first time it is wrong.
biggest_risk (may already run an incumbent TMS): Hands the rep their first discovery question before they dial.

What a scored list usually looks like (400 accounts sourced)

A: 8%, work these first
B: 22%, sequence, lighter touch
C: 40%, nurture, do not dial
Disqualified: 30%, kept for calibration

Rough shape of a cold list. The point: a third disqualifies itself and the As are rare, which is the whole reason to score before anyone dials.
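To compare your own scored table against that rough shape, count the tiers and convert to percentages. A minimal sketch; the function name is hypothetical, and the example list simply rebuilds the 400-account illustration from the percentages above (8%, 22%, 40%, 30%), so it is not real data.

```python
from collections import Counter

TIER_ORDER = ["A", "B", "C", "Disqualified"]


def tier_breakdown(tiers: list[str]) -> dict[str, tuple[int, float]]:
    """Map each tier to (count, percent of list), in a fixed display order."""
    counts = Counter(tiers)
    total = len(tiers)
    return {
        tier: (counts[tier], round(100 * counts[tier] / total, 1) if total else 0.0)
        for tier in TIER_ORDER
    }


# 400 accounts split 8% / 22% / 40% / 30%, as in the illustration above.
example = ["A"] * 32 + ["B"] * 88 + ["C"] * 160 + ["Disqualified"] * 120

for tier, (count, pct) in tier_breakdown(example).items():
    print(f"{tier:<13}{count:>4}  {pct}%")
```

On a 400-row list that is only 32 tier A accounts, which is why the reasons attached to them need to be good enough for reps to act on.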

Pitfalls to avoid

⚠️ Rubric from a wishlist, not data: If your ICP is aspirational rather than drawn from actual wins and losses, the AI will faithfully and consistently score against the wrong target. Anchor every weight in what was true of your last forty closed deals, not who you wish would buy.

⚠️ Over-enriching to score one field: Enriching every available data point burns credits fast and adds noise the rubric does not use. Add an enrichment column only when a specific rubric line references it; delete columns the rubric ignores.

⚠️ No calibration step: Without dropping in known-good and known-bad accounts to test against, you cannot tell whether the scores mean anything at all. Always calibrate on accounts where you know the answer before scaling to thousands.

⚠️ Missing data scored as a disqualifier: A blank tech-stack column from a failed enrichment should score neutral, not zero. If you do not handle missing data explicitly in the prompt, you will quietly disqualify good accounts for a data-coverage reason and never notice.

⚠️ Treating the score as a verdict: An AI score is a prioritization aid, not a final judgment. Reps should glance at the reason and the biggest risk before committing weeks of effort, and a strong human signal should always override a mediocre score.

FAQ

Questions people ask

Which model should I pick for the Clay AI column?
Any capable general model works; Clay lets you choose, including Claude. For scoring against a written rubric you want reasoning quality over speed, so a mid-to-high tier model is worth the few cents per row. Use the same model for calibration and the full run so scores stay comparable.

How many accounts should I score before scaling to the full list?
Start with 20, read the reasons (not just the scores), and tune the rubric. Only once a 20-row sample looks right do you run a few hundred, then the full table. Scoring 4,000 rows before the rubric is calibrated just burns enrichment credits on a rubric you are about to change.

What about accounts with missing enrichment data?
Expect gaps; on a cold list a fifth to a third of rows can be missing tech-stack data. Write the prompt so a blank field scores neutral, never as a disqualifier, or you will quietly kill good accounts for a data-coverage reason. Use Clay's waterfall (stack two or three providers) on the fields that fail most.

Won't AI scoring just bake in my existing biases?
It bakes in whatever your rubric says, which is the point and the risk. That is why the rubric is drawn from real win/loss data, not a wishlist, and why you calibrate against known-good and known-bad accounts. When your win data later contradicts the rubric, you edit the text and re-run. The bias is visible and editable, not hidden in a black box.
