The problem
Most target account lists are a lie the team tells itself. You filter Apollo for 'SaaS, 200-2,000 employees, US' and call the resulting 4,000 rows your ICP. But a headcount-and-industry filter cannot see the things that actually disqualify an account: the wrong tech stack, no relevant team to sell to, a recent acquisition that froze all budgets, or that they are already a happy customer of your biggest competitor. Reps then spend weeks working accounts that were never qualified, and everyone wonders why connect rates are flat.
The deeper problem is that real qualification does not scale by hand. A sharp RevOps person can eyeball thirty accounts and rank them with good judgment, reading the homepage, checking the team page, noticing the funding timing. They cannot do that for three thousand. So teams retreat to crude filters and the nuance, which is exactly where the signal lives, gets thrown away. The list ends up treating a perfect-fit account and a marginal one identically, which guarantees you spread effort evenly across good and bad instead of concentrating it on the best.
Clay collapses the whole pipeline into one table: source the accounts, enrich them with signals a model can actually reason over (tech stack, department headcount, growth, the company's own description), then run an AI column that applies a written rubric to every row and returns a score, a one-line reason, and the single biggest risk. Because the rubric is written down, the scoring is consistent across thousands of rows and auditable when a rep pushes back. And when your win data later tells you what actually predicts a closed deal, you edit the rubric text and re-run; you are not at the mercy of a black box.
The opinion that matters here: the score is worthless if you cannot see why. A number with no reason is a number reps will ignore the first time it is wrong. Build the rubric to always explain itself, and calibrate it against accounts you already know the answer to, or you have built a very expensive random-number generator.
How it works
- Write the ICP as an explicit scoring rubric in plain language, anchored in your actual win/loss data
- Source a raw account list into a Clay table from Apollo, a CSV, or Clay's built-in company finder
- Enrich each account with only the firmographic and signal fields your rubric references
- Run a Clay AI column that applies the rubric and returns score, tier, reason, and biggest risk as JSON
- Calibrate against known good and bad accounts, tune the rubric, then score the full table
- Push tier A and B accounts to the CRM with the score and reason in custom properties
The playbook
Write the ICP as a scoring rubric a stranger could apply
Before you open Clay, write your ICP as a rubric so explicit that a new hire could apply it identically to yours. List every attribute that matters, mark each as either a hard disqualifier or a weighted plus, and define what 'great' versus 'okay' looks like with numbers. The single biggest mistake here is writing the rubric from a wishlist of dream customers instead of from reality. Pull it from your last twenty closed-won and twenty closed-lost deals and ask what was actually true of the winners.
Make every line scorable. 'Fast-growing' is useless because no one can apply it consistently. 'Headcount grew 20% or more in the last 12 months' is a thing the data can answer. 'Has a real ops team' becomes 'has 5 or more people with Operations in their title.' If you cannot imagine the enrichment data that would answer a criterion, the criterion is too vague to keep.
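As a rough illustration of what "scorable" means, the two example criteria above can be written as yes/no checks over enrichment data. This is only a sketch: the `Account` fields are hypothetical names, not Clay column names, and the thresholds are the example values from the text.

```python
from dataclasses import dataclass


@dataclass
class Account:
    # Hypothetical field names; map them to whatever your enrichment columns are called.
    headcount_now: int
    headcount_12mo_ago: int
    employee_titles: list[str]


def grew_20_percent(acct: Account) -> bool:
    """'Fast-growing' made scorable: headcount up 20% or more in 12 months."""
    if acct.headcount_12mo_ago <= 0:
        return False  # no baseline, so the criterion cannot be confirmed
    # Integer comparison avoids floating-point edge cases at exactly 20%.
    return acct.headcount_now * 100 >= acct.headcount_12mo_ago * 120


def has_real_ops_team(acct: Account, minimum: int = 5) -> bool:
    """'Has a real ops team' made scorable: 5+ people with Operations in their title."""
    ops_titles = [t for t in acct.employee_titles if "operations" in t.lower()]
    return len(ops_titles) >= minimum
```

If a criterion cannot be reduced to a check like these, that is a sign it is still too vague to keep.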
Have Claude pressure-test the rubric before you trust it. Paste your draft plus a short description of three recent wins and three recent losses and ask it to predict the scores. Where its prediction disagrees with reality, the fault is probably in your rubric weighting, not in Claude.
Score 0-100.
HARD DISQUALIFIERS (score 0 if ANY are true):
- Fewer than {{MIN_EMPLOYEES, e.g. 50}} employees
- Operates only in {{EXCLUDED_INDUSTRY, e.g. pure government}}
- No {{REQUIRED_TEAM, e.g. Operations}} function detectable
OTHERWISE, weight as follows:
- Industry fit (0-25): best = {{TARGET_INDUSTRIES}}; partial = adjacent
- Size fit (0-20): best = {{HEADCOUNT_RANGE, e.g. 200-2,000}}
- Tech stack signal (0-20): +full if they use {{COMPLEMENTARY_TOOLS}}; 0 if they use {{COMPETITOR_TOOL}}
- Growth signal (0-15): +full if headcount up {{X}}%+ in 12mo OR funding in last 18mo
- Relevant team exists (0-20): +full if {{DEPARTMENT}} team has {{SIZE}}+ people
Tip: Write a one-line 'why we lose' note next to each disqualifier. When a rep argues with a low score later, you want the institutional reason on record, not a re-litigation of the whole ICP.
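To sanity-check that the template's weights add up and that disqualifiers short-circuit everything else, here is a minimal sketch of the same structure in code. It is not how Clay evaluates the rubric (the AI column reads the text). The `NEUTRAL` value of half credit for a missing signal and the single employee-floor disqualifier are assumptions made for illustration.

```python
from typing import Optional

# Weights copied from the rubric template; they sum to 100.
WEIGHTS = {"industry": 25, "size": 20, "tech": 20, "growth": 15, "team": 20}
NEUTRAL = 0.5  # assumption: a missing signal earns half its weight


def score_account(
    fits: dict[str, Optional[float]],
    employees: Optional[int],
    min_employees: int = 50,  # example value for {{MIN_EMPLOYEES}}
) -> tuple[int, Optional[str]]:
    """Return (score, matched_disqualifier).

    `fits` maps each WEIGHTS key to a 0.0-1.0 fit, or None when the data is missing.
    """
    # Hard disqualifier: fires only on known data, never on a blank field.
    if employees is not None and employees < min_employees:
        return 0, f"fewer than {min_employees} employees"
    total = 0.0
    for name, weight in WEIGHTS.items():
        fit = fits.get(name)
        total += weight * (NEUTRAL if fit is None else fit)
    return round(total), None
```

Returning the matched disqualifier alongside the score keeps the reason for a zero on record.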
Source the raw account list into a Clay table
In Clay, create a workbook and add a new table. You have three sourcing routes. First, import a CSV of companies you already have (top-right Import, then map the company-name and domain columns). Second, use Clay's built-in 'Find Companies' source, which is powered by underlying data providers, and set your firmographic filters there. Third, connect your Apollo account under integrations and pull a saved Apollo company search directly into the table.
Start with a few hundred rows, not the full universe. You are validating the workflow and calibrating the rubric first; scoring 4,000 rows before you know the rubric works just burns enrichment credits on garbage. Once the rubric is calibrated you scale the source.
Keep the source filters loose on the soft attributes (industry-adjacent, broad size band) and tight only on the hard disqualifiers you are confident about. You want the AI rubric to do the nuanced ranking. If you pre-filter aggressively at the source, you will never see scores for the marginal accounts that sometimes turn out to be your best deals.
Tip: Make sure every row has a clean company domain, not just a name. Domain is the join key almost every Clay enrichment relies on; a name-only row will fail to enrich and silently score low for the wrong reason.
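If you are preparing a CSV before import, a small pre-check can catch name-only rows. The sketch below uses Python's standard `urllib.parse`; the function name `clean_domain` and its rules (strip `www.`, require a dot, reject values with spaces) are illustrative assumptions, not a Clay feature.

```python
from typing import Optional
from urllib.parse import urlparse


def clean_domain(raw: str) -> Optional[str]:
    """Reduce a URL or messy website value to a bare domain, or None if unusable."""
    value = raw.strip().lower()
    if not value or " " in value:
        return None  # blank, or looks like a company name rather than a domain
    # urlparse only fills in the host when a scheme is present, so add one if missing.
    if "://" not in value:
        value = "http://" + value
    try:
        host = urlparse(value).hostname or ""
    except ValueError:
        return None  # malformed URL
    if host.startswith("www."):
        host = host[4:]
    return host if "." in host else None
```

Rows where this returns `None` are worth fixing by hand before any enrichment credits are spent on them.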
Enrich each account with only the signals your rubric uses
Add Clay enrichment columns so each row carries exactly the facts the rubric needs and nothing more. Click the '+' to add a column, choose 'Enrich Data,' and pick the enrichment: employee count, employee count for a specific department (Clay can pull headcount for a named function), industry and sub-industry, technologies used, most recent funding, and a company description. Each enrichment is its own column that calls a data provider and costs credits, so add only what a rubric line actually references.
Add one column that fetches the company's homepage or about-page text. This is the highest-impact enrichment for AI scoring: giving the model the company's own words about itself improves accuracy far more than firmographics alone, because positioning text reveals what they sell, who to, and how they think, which a NAICS code never will.
Use Clay's waterfall pattern for anything that fails often, like department headcount or tech stack: stack two or three providers in priority order so a miss from the first falls through to the second. Expect meaningful coverage gaps regardless; on a cold list, do not be surprised if a fifth to a third of rows are missing tech-stack data, and write the rubric so a missing signal scores neutral rather than disqualifying.
- Employee count, total
- Employee count for the specific department you sell to
- Industry and sub-industry
- Technologies / tech stack (waterfall two providers)
- Most recent funding or a headcount-growth signal
- Homepage or about-page text (the single most valuable input)
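The waterfall idea described above is simple enough to sketch in a few lines. In Clay you configure it in the UI rather than in code; the provider functions here are hypothetical stand-ins that only show the fall-through order.

```python
from typing import Callable, Optional, Sequence

# A provider takes a domain and returns a list of technologies, or None on a miss.
Provider = Callable[[str], Optional[list[str]]]


def waterfall(domain: str, providers: Sequence[Provider]) -> Optional[list[str]]:
    """Try each provider in priority order; return the first non-empty result."""
    for lookup in providers:
        result = lookup(domain)
        if result:  # None or an empty list counts as a miss
            return result
    return None  # every provider missed: leave the cell blank, do not guess


# Hypothetical stand-in providers, for illustration only.
def provider_a(domain: str) -> Optional[list[str]]:
    return None  # simulates a coverage gap


def provider_b(domain: str) -> Optional[list[str]]:
    known = {"northbeamlogistics.com": ["Salesforce", "Snowflake"]}
    return known.get(domain)
```

When every provider misses, the result stays blank, which is the case the rubric should score as neutral.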
Tip: Run enrichment on 20 rows first and read the columns. If 'department headcount' is blank on rows you know have big teams, the enrichment is failing and your scores will be wrong for a data reason, not a rubric reason. Fix the data before you blame the rubric.
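If you export the 20-row sample, a quick fill-rate count makes coverage gaps visible before you read the rows one by one. This is a sketch over plain dictionaries; the column names are hypothetical.

```python
def fill_rates(rows: list[dict], columns: list[str]) -> dict[str, float]:
    """Fraction of rows with a non-blank value for each enrichment column."""
    rates = {}
    for col in columns:
        # None, empty string, and empty list all count as blank.
        filled = sum(1 for row in rows if row.get(col) not in (None, "", []))
        rates[col] = filled / len(rows) if rows else 0.0
    return rates


sample = [
    {"dept_headcount": 32, "technologies": ["Snowflake"]},
    {"dept_headcount": None, "technologies": []},
    {"dept_headcount": 12, "technologies": ["Salesforce"]},
    {"dept_headcount": "", "technologies": ["HubSpot"]},
]
rates = fill_rates(sample, ["dept_headcount", "technologies"])
```

A column with a low fill rate on accounts you know well points to a data problem, not a rubric problem.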
Run the AI scoring column
Add a new column, choose the AI / 'Use AI' option, and select a capable model (Clay lets you pick, including Claude models). In the prompt editor, reference your enrichment columns inline by typing '/' to insert a column token, and paste your rubric from step one. Demand strictly structured JSON output: a numeric score, a tier, a one-line reason, and the single biggest risk. Structured output is what lets you sort, filter, and report cleanly downstream; free-text scores are unsortable.
Run it on 20 rows first using Clay's option to run a column on selected rows only. Then read the reasons, not just the scores. If the model is being generous, handing out 70s to mediocre accounts, tighten the rubric wording (often the size and team-existence weights are too loose). If it is missing an obvious disqualifier, your disqualifier line is ambiguous. Iterate on the prompt, not on the data.
Only after the 20-row sample looks right do you run the full table. Set the column to run on all rows; for large tables this processes in the background and you can watch it fill in.
You are scoring an account against our ICP rubric. Use ONLY the data provided below. If a field is blank, treat that signal as neutral (do not penalize for missing data unless the rubric says a hard disqualifier requires it).
RUBRIC:
{{PASTE_YOUR_RUBRIC_FROM_STEP_1}}
ACCOUNT DATA:
Name: /Company Name
Employees: /Employee Count
{{DEPARTMENT}} headcount: /Dept Headcount
Industry: /Industry
Tech stack: /Technologies
Funding/growth: /Funding
About (their own words): /Homepage Text
Return JSON only, no preamble:
{"score": <0-100 integer>, "tier": "A|B|C|Disqualified", "reason": "<one sentence citing the specific facts that drove the score>", "biggest_risk": "<one phrase: the thing most likely to make this a bad deal>"}
Tier mapping: A = 80+, B = 60-79, C = 40-59, Disqualified = below 40 OR any hard disqualifier triggered.
Tip: Add a separate, cheap formula or AI column that just outputs the matched disqualifier name when one fires. When a rep asks why a juicy-looking logo scored zero, you can point at 'already uses CompetitorX' instead of shrugging.
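Because everything downstream depends on the JSON being well-formed, it can help to validate responses if you export them or post-process them outside Clay. The sketch below checks the four keys and the tier mapping from the prompt; the function names are illustrative, and it assumes the model returned raw JSON with no surrounding text.

```python
import json


def tier_for(score: int) -> str:
    """Tier mapping from the prompt: A = 80+, B = 60-79, C = 40-59, else Disqualified."""
    if score >= 80:
        return "A"
    if score >= 60:
        return "B"
    if score >= 40:
        return "C"
    return "Disqualified"


def parse_score(raw: str) -> dict:
    """Parse one model response and raise ValueError if it breaks the contract."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = {"score", "tier", "reason", "biggest_risk"} - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    score = data["score"]
    if isinstance(score, bool) or not isinstance(score, int) or not 0 <= score <= 100:
        raise ValueError(f"score must be an integer 0-100, got {score!r}")
    # A hard disqualifier can force "Disqualified" at any score, so only the
    # other tiers are required to match the numeric mapping.
    if data["tier"] != "Disqualified" and data["tier"] != tier_for(score):
        raise ValueError(f"tier {data['tier']!r} does not match score {score}")
    return data
```

Rows that fail validation are good candidates for a re-run rather than a manual fix.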
Calibrate against accounts where you already know the answer
This is the step teams skip and the one that decides whether the sales team trusts the list. Drop five of your best existing customers and five known bad-fit accounts you have lost or churned into the table and let the rubric score them. Your best customers should land in tier A. The bad fits should be C or Disqualified. If a beloved customer scores a 45, your rubric is wrong, not the AI, so go fix the weights and re-run.
Calibration also catches data problems masquerading as scoring problems. If a known-great account scores low, check its enrichment row first: often the tech-stack or headcount enrichment simply failed and the model scored on missing data. Fixing that one row's data is a different action than rewriting the rubric, and conflating the two is how people give up on the whole approach.
Iterate until the known accounts land where they should. It usually takes two or three rubric edits. Once they do, you have a scoring system with earned trust, which is the only kind reps will actually use.
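The calibration loop can be sketched as a comparison between the tiers you expect and the tiers the rubric produced, with a first-pass diagnosis that separates data failures from rubric failures. The column names in `REQUIRED_FIELDS` and the example domains are hypothetical.

```python
# Hypothetical names for the enrichment columns the rubric depends on.
REQUIRED_FIELDS = ["employees", "dept_headcount", "technologies"]


def calibration_misses(
    expected: dict[str, set[str]], rows: dict[str, dict]
) -> list[tuple[str, str, str]]:
    """Return (domain, actual_tier, suggested_action) for each known account
    that landed outside its acceptable tiers."""
    misses = []
    for domain, ok_tiers in expected.items():
        row = rows[domain]
        if row["tier"] in ok_tiers:
            continue
        # Blank inputs mean the model scored on missing data: fix the row first.
        blanks = [f for f in REQUIRED_FIELDS if row.get(f) in (None, "", [])]
        action = "check enrichment" if blanks else "revisit rubric"
        misses.append((domain, row["tier"], action))
    return misses


expected = {
    "best-customer.example": {"A"},
    "churned-account.example": {"C", "Disqualified"},
    "great-but-blank.example": {"A"},
}
rows = {
    "best-customer.example": {"tier": "C", "employees": 450, "dept_headcount": 32, "technologies": ["X"]},
    "churned-account.example": {"tier": "Disqualified", "employees": 8, "dept_headcount": 0, "technologies": ["Y"]},
    "great-but-blank.example": {"tier": "C", "employees": 600, "dept_headcount": None, "technologies": []},
}
misses = calibration_misses(expected, rows)
```

An empty result is the signal that the known accounts land where they should.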
Push tier A and B accounts to the CRM
Filter the table to tier A and B (use Clay's filter on the tier column). Add a column that writes to your CRM: Clay has native HubSpot and Salesforce integrations under 'Add Action.' Map the account fields plus the score, tier, reason, and biggest_risk into custom properties on the company record. Now reps see not just that an account was prioritized but precisely why, and RevOps can report coverage by tier.
Write the rubric version number into a CRM field too, for example 'icp_rubric_v3.' When you tune the rubric and re-score, you can tell old scores from new ones and avoid the confusion of comparing accounts graded on different curves.
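The mapping into CRM fields, including the rubric version, might look like the sketch below. The property names (`icp_score`, `icp_tier`, and so on) are hypothetical; use whatever custom properties you create in HubSpot or Salesforce. In Clay the mapping itself is configured in the integration's field mapper rather than in code.

```python
from typing import Optional

RUBRIC_VERSION = "icp_rubric_v3"  # bump whenever the rubric text changes
SYNC_TIERS = {"A", "B"}


def crm_properties(row: dict) -> Optional[dict]:
    """Build the custom-property payload for one account, or None if it should not sync."""
    if row["tier"] not in SYNC_TIERS:
        return None  # tier C and Disqualified stay in the table only
    return {
        "domain": row["domain"],
        "icp_score": row["score"],
        "icp_tier": row["tier"],
        "icp_reason": row["reason"],
        "icp_biggest_risk": row["biggest_risk"],
        "icp_rubric_version": RUBRIC_VERSION,
    }
```

Stamping the version on every record is what lets you filter out scores produced by an older rubric.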
Set the Clay table to refresh on a schedule (table settings, then auto-refresh) so new accounts matching your source filters get enriched and scored automatically and flow into the CRM over time. The list becomes a living system rather than a one-time export that is stale in a month.
Tip: Do not sync tier C and Disqualified accounts to the CRM as active targets, but do keep them in the Clay table. They are your calibration set and your record of what you deliberately chose not to work, which matters in territory disputes.
What you get
Each account row gets a structured score, tier, reason, and risk that syncs to the CRM as custom properties.
Company: Northbeam Logistics (northbeamlogistics.com)
{
"score": 84,
"tier": "A",
"reason": "450-person logistics firm with a detectable 32-person operations team, raised a Series B 9 months ago, and homepage emphasizes multi-region expansion, strong fit for our planning product.",
"biggest_risk": "may already run an incumbent TMS; confirm on first call"
}
Company: TinyCraft Studio (tinycraft.design)
{
"score": 0,
"tier": "Disqualified",
"reason": "8-person design agency, below the 50-employee floor and no detectable operations function.",
"biggest_risk": "too small to have budget or the problem we solve"
}
Company: Meridian Freight (meridianfreight.com)
{
"score": 71,
"tier": "B",
"reason": "620-person freight company in target industry with a sizable ops team, but no growth or funding signal in the last 18 months suggests flatter budgets.",
"biggest_risk": "no recent funding, budget cycle may be slow"
}
Pitfalls to avoid
Rubric from a wishlist, not data: If your ICP is aspirational rather than drawn from actual wins and losses, the AI will faithfully and consistently score against the wrong target. Anchor every weight in what was true of your last forty closed deals, not who you wish would buy.
Over-enriching to score one field: Enriching every available data point burns credits fast and adds noise the rubric does not use. Add an enrichment column only when a specific rubric line references it; delete columns the rubric ignores.
No calibration step: Without dropping in known-good and known-bad accounts to test against, you cannot tell whether the scores mean anything at all. Always calibrate on accounts where you know the answer before scaling to thousands.
Missing data scored as a disqualifier: A blank tech-stack column from a failed enrichment should score neutral, not zero. If you do not handle missing data explicitly in the prompt, you will quietly disqualify good accounts for a data-coverage reason and never notice.
Treating the score as a verdict: An AI score is a prioritization aid, not a final judgment. Reps should glance at the reason and the biggest risk before committing weeks of effort, and a strong human signal should always override a mediocre score.