Every CRM sells 'lead scoring.' Almost nobody uses it because rule-based scoring is garbage. Here's the hybrid LLM + rules approach that actually works.
Why rule-based scoring fails
HubSpot lets you build scoring rules like '+10 for VP title, +5 for company size >100, -20 for Gmail address.' These rules are easy to write and sound reasonable. They're also almost useless in practice.
Here's why: the signals that actually predict purchase intent are buried in the content of what leads write, not the fields they fill in. 'I'm evaluating 3 vendors and we need to decide by end of month' is a HOT lead even if the title is missing. 'Just researching' from a VP of a Fortune 500 is a COLD lead. Rules can't read that difference. LLMs can.
The hybrid architecture
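The split is simple: keep rules for what they're good at (cheap, deterministic disqualifiers on structured fields) and let the LLM score the free text, where the real buying signals live. Below is a minimal sketch of how the two layers compose; the function names, blocked-region set, and blend weights are placeholders to illustrate the pattern, not a prescribed implementation.
# Hybrid pipeline sketch: deterministic rules gate the lead first,
# then an LLM scores the free text. All names and weights are placeholders.

BLOCKED_REGIONS = {"XX"}  # placeholder: regions you don't sell into

def rule_gate(lead: dict) -> bool:
    """Cheap, deterministic disqualifiers: the part rules are actually good at."""
    if lead.get("email", "").endswith("@mailinator.com"):  # throwaway-domain example
        return False
    if lead.get("country") in BLOCKED_REGIONS:
        return False
    return True

def score_lead(lead: dict, llm_score_fn) -> dict:
    if not rule_gate(lead):
        return {"recommended_action": "disqualify", "reasoning": "failed rule gate"}
    scores = llm_score_fn(lead)  # returns the JSON described in the prompt below
    # Blend the three LLM scores into one routing number. These weights are
    # a starting point; tune them against your backtest, not intuition.
    scores["composite"] = round(0.5 * scores["intent"]
                                + 0.3 * scores["fit"]
                                + 0.2 * scores["urgency"])
    return scores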
The prompt that does the heavy lifting
You are scoring inbound leads for [COMPANY]. Analyze the
lead data below and return JSON with three scores:
1. INTENT (0-100): how actively are they looking to buy?
2. FIT (0-100): how well do they match our ICP?
3. URGENCY (0-100): how time-sensitive is their need?
ICP EXAMPLES:
- Best customers: [3 examples of your ideal customers]
- Bad-fit customers: [2 examples of customers who churned]
LEAD DATA:
{lead_object}
BUYING SIGNALS TO LOOK FOR:
- Specific timelines ("need by Q2")
- Budget mentions ("we have $X allocated")
- Competitor evaluation ("comparing you to X")
- Pain language ("our current tool is broken")
OUTPUT JSON:
{
  "intent": 0-100,
  "fit": 0-100,
  "urgency": 0-100,
  "reasoning": "1-sentence explanation",
  "buying_signals": ["list of quotes from the lead"],
  "red_flags": ["list of concerns"],
  "recommended_action": "route_to_sales|nurture_sequence|disqualify"
}
Two things make this prompt work in production: the ICP examples (few-shot learning ties the model to your specific customer patterns) and the quote extraction (forces the model to cite actual lead text instead of hallucinating reasoning).
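To make the JSON contract concrete, here is a hedged sketch of the scoring call, assuming the OpenAI Python SDK (any provider with a JSON-output mode works the same way); the model name and the {lead_object} substitution are placeholders, not requirements.
import json
from openai import OpenAI  # assumes the OpenAI Python SDK; swap in your provider

client = OpenAI()

def llm_score(lead: dict, prompt_template: str) -> dict:
    prompt = prompt_template.replace("{lead_object}", json.dumps(lead))
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder; use whatever model you've validated
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},  # force parseable JSON output
        temperature=0,  # scoring should be as deterministic as possible
    )
    scores = json.loads(resp.choices[0].message.content)
    # Fail loudly if the model drops a field; silent defaults poison routing.
    for key in ("intent", "fit", "urgency", "recommended_action"):
        if key not in scores:
            raise ValueError(f"scorer response missing '{key}': {scores}")
    return scores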
Validation is the step everyone skips
A lead scoring model is only as good as the validation data behind it. Here's the 3-step validation we run for every deployment:
Backtest on 200 historical leads with known outcomes
Pull leads from the last 6 months where you know whether they closed. Run the scoring model on each and compare the predicted score to the actual outcome. The bar to hit: of the leads the model scores HOT, more than 85% should have actually closed.
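A sketch of that backtest, assuming your historical leads live in a CSV with a "closed" column; the HOT cutoff and column names are assumptions to adjust.
import csv

HOT_CUTOFF = 80  # placeholder; pick the cutoff from your own score distribution

def backtest(path: str, score_fn) -> float:
    hot_total = hot_closed = 0
    with open(path, newline="") as f:
        for row in csv.DictReader(f):  # columns: lead fields plus a "closed" flag
            if score_fn(row)["composite"] >= HOT_CUTOFF:
                hot_total += 1
                hot_closed += row["closed"].lower() == "true"
    precision = hot_closed / hot_total if hot_total else 0.0
    # Share of HOT-scored leads that actually closed; target above 85%.
    print(f"HOT precision: {hot_closed}/{hot_total} = {precision:.0%}")
    return precision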
Dual-track for the first month
Keep your old scoring alongside the new model. Every lead gets both scores. Compare weekly. Catch cases where they disagree and manually label which was right.
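One lightweight way to run the dual track, assuming a local CSV as the log (a CRM custom field works just as well); the 30-point disagreement gap is an arbitrary starting threshold.
import csv
from datetime import date

LOG = "dual_track.csv"  # placeholder store for the side-by-side scores

def log_dual_scores(lead_id: str, rule_score: int, llm_composite: int) -> None:
    # Append both scores for every lead so the weekly review has data.
    with open(LOG, "a", newline="") as f:
        csv.writer(f).writerow([date.today(), lead_id, rule_score, llm_composite])

def weekly_disagreements(gap: int = 30) -> list:
    # Rows where the old rules and the LLM disagree by more than `gap`
    # points; these are the leads the team should hand-label each week.
    with open(LOG, newline="") as f:
        return [r for r in csv.reader(f) if abs(int(r[2]) - int(r[3])) > gap]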
Feed disagreements back into the prompt
The gold in month 1 is the edge cases. Every time your sales team says 'this HOT lead was obviously COLD,' add that lead to the prompt as a labeled example. Because the examples ride along as few-shot context, the scoring picks up your nuances fast.
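Mechanically, that feedback loop can be as simple as keeping labeled edge cases in a file and splicing them into the prompt on every call. The sketch below assumes you add an [EDGE_CASES] slot to the template above; the file format is an assumption.
import json

def build_prompt(base_template: str, path: str = "edge_cases.jsonl") -> str:
    # One JSON object per line: {"lead": "...", "label": "HOT|COLD", "why": "..."}
    lines = []
    with open(path) as f:
        for raw in f:
            case = json.loads(raw)
            lines.append(f'- {case["label"]}: {case["lead"]} ({case["why"]})')
    # Splice the hand-labeled disagreements in next to the ICP examples so
    # the model sees every correction as few-shot context on the next call.
    return base_template.replace("[EDGE_CASES]", "\n".join(lines))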
Stop routing bad leads to your best reps.
We build hybrid lead scoring models for sales teams. 88%+ accuracy, 2-3 week build, works with any CRM. Book a 30-min call to map it for your pipeline.