Every CRM sells 'lead scoring.' Almost nobody uses it because rule-based scoring is garbage. Here's the hybrid LLM + rules approach that actually works.
Why rule-based scoring fails
HubSpot lets you build scoring rules like '+10 for VP title, +5 for company size >100, -20 for Gmail address.' These rules are easy to write and sound reasonable. They're also almost useless in practice.
Here's why: the signals that actually predict purchase intent are buried in the content of what leads write, not the fields they fill in. 'I'm evaluating 3 vendors and we need to decide by end of month' is a HOT lead even if the title is missing. 'Just researching' from a VP of a Fortune 500 is a COLD lead. Rules can't read that difference. LLMs can.
The hybrid architecture
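The split is simple: keep rules for what they're good at (cheap, deterministic disqualifiers on structured fields) and let the LLM score the free text, where the real buying signals live. Below is a minimal sketch of how the two layers compose; the function names, blocked-region set, and blend weights are placeholders to illustrate the pattern, not a prescribed implementation.
# Hybrid pipeline sketch: deterministic rules gate the lead first,
# then an LLM scores the free text. All names and weights are placeholders.

BLOCKED_REGIONS = {"XX"}  # placeholder: regions you don't sell into

def rule_gate(lead: dict) -> bool:
    """Cheap, deterministic disqualifiers: the part rules are actually good at."""
    if lead.get("email", "").endswith("@mailinator.com"):  # throwaway-domain example
        return False
    if lead.get("country") in BLOCKED_REGIONS:
        return False
    return True

def score_lead(lead: dict, llm_score_fn) -> dict:
    if not rule_gate(lead):
        return {"recommended_action": "disqualify", "reasoning": "failed rule gate"}
    scores = llm_score_fn(lead)  # returns the JSON described in the prompt below
    # Blend the three LLM scores into one routing number. These weights are
    # a starting point; tune them against your backtest, not intuition.
    scores["composite"] = round(0.5 * scores["intent"]
                                + 0.3 * scores["fit"]
                                + 0.2 * scores["urgency"])
    return scores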
The prompt that does the heavy lifting
You are scoring inbound leads for [COMPANY]. Analyze the
lead data below and return JSON with three scores:
1. INTENT (0-100): how actively are they looking to buy?
2. FIT (0-100): how well do they match our ICP?
3. URGENCY (0-100): how time-sensitive is their need?
ICP EXAMPLES:
- Best customers: [3 examples of your ideal customers]
- Bad-fit customers: [2 examples of customers who churned]
LEAD DATA:
{lead_object}
BUYING SIGNALS TO LOOK FOR:
- Specific timelines ("need by Q2")
- Budget mentions ("we have $X allocated")
- Competitor evaluation ("comparing you to X")
- Pain language ("our current tool is broken")
OUTPUT JSON:
{
  "intent": 0-100,
  "fit": 0-100,
  "urgency": 0-100,
  "reasoning": "1-sentence explanation",
  "buying_signals": ["list of quotes from the lead"],
  "red_flags": ["list of concerns"],
  "recommended_action": "route_to_sales|nurture_sequence|disqualify"
}
Two things make this prompt work in production: the ICP examples (few-shot learning ties the model to your specific customer patterns) and the quote extraction (forces the model to cite actual lead text instead of hallucinating reasoning).
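To make the JSON contract concrete, here is a hedged sketch of the scoring call, assuming the OpenAI Python SDK (any provider with a JSON-output mode works the same way); the model name and the {lead_object} substitution are placeholders, not requirements.
import json
from openai import OpenAI  # assumes the OpenAI Python SDK; swap in your provider

client = OpenAI()

def llm_score(lead: dict, prompt_template: str) -> dict:
    prompt = prompt_template.replace("{lead_object}", json.dumps(lead))
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder; use whatever model you've validated
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},  # force parseable JSON output
        temperature=0,  # scoring should be as deterministic as possible
    )
    scores = json.loads(resp.choices[0].message.content)
    # Fail loudly if the model drops a field; silent defaults poison routing.
    for key in ("intent", "fit", "urgency", "recommended_action"):
        if key not in scores:
            raise ValueError(f"scorer response missing '{key}': {scores}")
    return scores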
Validation is the step everyone skips
A lead scoring model is only as good as the validation data behind it. Here's the 3-step validation we run for every deployment:
Backtest on 200 historical leads with known outcomes
Pull leads from the last 6 months where you know whether they closed. Run the scoring model on each and compare the predicted score to the actual outcome. The bar to hit: of the leads the model scores HOT, more than 85% should have actually closed.
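A sketch of that backtest, assuming your historical leads live in a CSV with a "closed" column; the HOT cutoff and column names are assumptions to adjust.
import csv

HOT_CUTOFF = 80  # placeholder; pick the cutoff from your own score distribution

def backtest(path: str, score_fn) -> float:
    hot_total = hot_closed = 0
    with open(path, newline="") as f:
        for row in csv.DictReader(f):  # columns: lead fields plus a "closed" flag
            if score_fn(row)["composite"] >= HOT_CUTOFF:
                hot_total += 1
                hot_closed += row["closed"].lower() == "true"
    precision = hot_closed / hot_total if hot_total else 0.0
    # Share of HOT-scored leads that actually closed; target above 85%.
    print(f"HOT precision: {hot_closed}/{hot_total} = {precision:.0%}")
    return precision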
Dual-track for the first month
Keep your old scoring alongside the new model. Every lead gets both scores. Compare weekly. Catch cases where they disagree and manually label which was right.
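One lightweight way to run the dual track, assuming a local CSV as the log (a CRM custom field works just as well); the 30-point disagreement gap is an arbitrary starting threshold.
import csv
from datetime import date

LOG = "dual_track.csv"  # placeholder store for the side-by-side scores

def log_dual_scores(lead_id: str, rule_score: int, llm_composite: int) -> None:
    # Append both scores for every lead so the weekly review has data.
    with open(LOG, "a", newline="") as f:
        csv.writer(f).writerow([date.today(), lead_id, rule_score, llm_composite])

def weekly_disagreements(gap: int = 30) -> list:
    # Rows where the old rules and the LLM disagree by more than `gap`
    # points; these are the leads the team should hand-label each week.
    with open(LOG, newline="") as f:
        return [r for r in csv.reader(f) if abs(int(r[2]) - int(r[3])) > gap]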
Feed disagreements back into the prompt
The gold in month 1 is the edge cases. Every time your sales team says 'this HOT lead was obviously COLD,' add that lead to the prompt as a labeled example. Because the examples ride along as few-shot context, the scoring picks up your nuances fast.
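Mechanically, that feedback loop can be as simple as keeping labeled edge cases in a file and splicing them into the prompt on every call. The sketch below assumes you add an [EDGE_CASES] slot to the template above; the file format is an assumption.
import json

def build_prompt(base_template: str, path: str = "edge_cases.jsonl") -> str:
    # One JSON object per line: {"lead": "...", "label": "HOT|COLD", "why": "..."}
    lines = []
    with open(path) as f:
        for raw in f:
            case = json.loads(raw)
            lines.append(f'- {case["label"]}: {case["lead"]} ({case["why"]})')
    # Splice the hand-labeled disagreements in next to the ICP examples so
    # the model sees every correction as few-shot context on the next call.
    return base_template.replace("[EDGE_CASES]", "\n".join(lines))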
Stop routing bad leads to your best reps.
We build hybrid lead scoring models for sales teams. 88%+ accuracy, 2-3 week build, works with any CRM. Book a 30-min call to map it for your pipeline.