Methodology
Reproducible scoring. Auditable evidence. No vibes.
A GEO score is only useful if you can explain it to your CFO. Every Enso audit returns quantified per-dimension scores backed by the exact prompts, the exact engine responses, and the exact citations the AI relied on. This page documents the full pipeline so your team — and your skeptics — can reason about it.
Four pillars
What separates a real GEO score from a prompt wrapper.
Pillar 01
Dual-engine consensus
Every prompt suite runs against both a GPT-4-class model and Gemini 2.5 Pro. We score each independently, then compute a confidence-weighted consensus. Disagreement isn't averaged away; it's surfaced as a Consistency penalty so you see it.
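Here is the consensus step as a minimal Python sketch. The engine labels, confidence weights, and the 10-point disagreement threshold are illustrative, not our production values:

```python
def consensus(scores: dict[str, float],
              confidences: dict[str, float],
              disagreement_threshold: float = 10.0) -> tuple[float, bool]:
    """Confidence-weighted consensus for one dimension across engines."""
    total = sum(confidences.values())
    weighted = sum(scores[e] * confidences[e] for e in scores) / total
    spread = max(scores.values()) - min(scores.values())
    # Disagreement is not averaged away: the flag feeds the Consistency penalty.
    return weighted, spread > disagreement_threshold

score, disagrees = consensus({"gpt4_class": 78.0, "gemini_2_5_pro": 64.0},
                             {"gpt4_class": 0.90, "gemini_2_5_pro": 0.85})
# score = 71.2, disagrees = True (the 14-point spread exceeds the threshold)
```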
Pillar 02
Live web grounding
We force grounding via Google Search and Brave Search LLM context on every run. Pure-parametric model knowledge would lock us to a stale snapshot of the world. Grounding means the score reflects what an AI buyer would actually see today.
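A minimal sketch of what a grounded run looks like, using Brave's public Search API to build the context block. The prompt template, the `grounded_context` helper, and the `BRAVE_API_KEY` variable name are our illustration, not the production pipeline:

```python
import os
import requests

def grounded_context(query: str, k: int = 5) -> str:
    """Fetch fresh web results and prepend them so the model can't answer
    from parametric memory alone."""
    resp = requests.get(
        "https://api.search.brave.com/res/v1/web/search",
        params={"q": query, "count": k},
        headers={"X-Subscription-Token": os.environ["BRAVE_API_KEY"],
                 "Accept": "application/json"},
        timeout=10,
    )
    resp.raise_for_status()
    results = resp.json()["web"]["results"][:k]
    snippets = [f"[{i + 1}] {r['title']} ({r['url']}): {r['description']}"
                for i, r in enumerate(results)]
    return ("Answer using ONLY the sources below and cite them by number.\n"
            + "\n".join(snippets))
```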
Pillar 03
Category-normalized scoring
A 72 in B2B hardware is not a 72 in CPG. Every dimension is normalized against rough category baselines so a hardware brand isn't penalized against a DTC food brand on Awareness. Gap-vs-norm is plotted alongside raw score on every chart.
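A minimal sketch of gap-vs-norm. The baseline numbers are invented for illustration; only the mechanism matches the rubric:

```python
CATEGORY_BASELINES = {            # illustrative category norms, 0-100
    "b2b_hardware": {"awareness": 41.0},
    "cpg":          {"awareness": 63.0},
}

def gap_vs_norm(raw: float, category: str, dimension: str) -> dict:
    """Report the raw score next to its gap versus the category baseline."""
    baseline = CATEGORY_BASELINES[category][dimension]
    return {"raw": raw, "baseline": baseline, "gap": round(raw - baseline, 1)}

# The same raw 72 reads very differently in two categories:
print(gap_vs_norm(72.0, "b2b_hardware", "awareness"))  # gap +31.0
print(gap_vs_norm(72.0, "cpg", "awareness"))           # gap +9.0
```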
Pillar 04
Cynicism guardrails
Models love hedges. Our system prompt explicitly forbids 'appears', 'seems', 'may', 'trends suggest' — and the post-processing layer downgrades any response that smuggles them in. The output you see is what survived two layers of de-hedging.
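A minimal sketch of the post-processing layer. The phrase list comes straight from the system prompt above; the per-hit penalty is illustrative, and the 15-point cap echoes the Authority rubric below:

```python
import re

HEDGES = [r"\bappears\b", r"\bseems\b", r"\bmay\b", r"\btrends suggest\b"]

def hedge_penalty(response: str, per_hit: float = 3.0, cap: float = 15.0) -> float:
    """Count hedge phrases that survived the system prompt and downgrade."""
    hits = sum(len(re.findall(p, response, flags=re.IGNORECASE)) for p in HEDGES)
    return min(hits * per_hit, cap)

print(hedge_penalty("The brand appears strong and may be gaining share."))  # 6.0
```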
Per-dimension rubric
How each of the five scores is computed.
Weights sum to 1.00 for the Overall consensus. Per-dimension scores are reported on the same 0-100 scale used across the dashboard.
Awareness
Formula: % of unbranded category prompts where the brand surfaces in the first generated response
12 unbranded prompt variants per category (e.g. 'best AI inference startups for transformer workloads'). Score is the inclusion rate across both engines.
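A minimal sketch of the inclusion-rate computation. The substring brand match is a simplification; assume the production matcher handles aliases and stylings:

```python
def awareness_score(first_responses: dict[str, list[str]], brand: str) -> float:
    """first_responses maps engine -> first generated response per prompt variant."""
    hits = total = 0
    for responses in first_responses.values():
        for text in responses:
            total += 1
            hits += brand.lower() in text.lower()  # naive match, for illustration
    return 100.0 * hits / total  # inclusion rate across both engines, 0-100
```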
Authority
Formula: Weighted blend of citation density, primary-source share, and absence-of-hedging
Citation density = grounded sources cited per 100 tokens. Primary-source share = % of citations from the brand's own domain or peer-reviewed venues. Hedging penalty subtracts up to 15 pts.
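A minimal sketch of the blend. Only the 15-point hedging cap is fixed by the rubric; the component weights and the density saturation point are illustrative:

```python
def authority_score(citation_density: float,       # grounded citations per 100 tokens
                    primary_source_share: float,   # 0-1
                    hedge_penalty_pts: float) -> float:
    density = min(citation_density / 2.0, 1.0) * 100.0  # saturates at 2 per 100 tokens (illustrative)
    primary = primary_source_share * 100.0
    blended = 0.5 * density + 0.5 * primary             # illustrative component weights
    return max(0.0, blended - min(hedge_penalty_pts, 15.0))  # rubric caps the penalty at 15 pts
```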
Sentiment
Formula: Polarity score on brand-descriptive language, normalized to category baseline
Each engine response is segmented and polarity-scored with a domain-tuned classifier. Category-baselined: a tech brand isn't penalized against a luxury brand for clinical language.
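A minimal sketch of the baseline-normalization step. The polarity input comes from the domain-tuned classifier, which is out of scope here; the scaling constants are illustrative:

```python
def sentiment_score(mean_polarity: float,   # -1..1, from the classifier
                    category_mean: float,   # category baseline polarity
                    category_std: float) -> float:
    z = (mean_polarity - category_mean) / category_std  # distance from the norm
    return max(0.0, min(100.0, 50.0 + 15.0 * z))        # category norm maps to 50 (illustrative)
```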
Consistency
Formula: Cross-engine agreement on category, positioning, key claims, and competitor set
Pairwise Jaccard similarity on extracted entity sets between engines, plus claim-level NLI agreement. Disputed claims are flagged in the report and lower the score linearly.
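A minimal sketch of the agreement computation. The NLI step that labels a claim as disputed is out of scope; the linear penalty matches the rubric:

```python
def consistency_score(entities_a: set[str], entities_b: set[str],
                      disputed_claims: int, total_claims: int) -> float:
    jaccard = len(entities_a & entities_b) / len(entities_a | entities_b)
    dispute_rate = disputed_claims / max(total_claims, 1)
    return 100.0 * jaccard * (1.0 - dispute_rate)  # disputes lower the score linearly

print(consistency_score({"NVIDIA", "Groq", "Enso"},
                        {"NVIDIA", "Enso", "Cerebras"},
                        disputed_claims=1, total_claims=8))  # 43.75
```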
Defensibility
Formula: Composite of competitor-pressure, supply-chain, and platform-lock-in signals
The scorer searches the prompt suite for risk language ('depends on', 'requires', 'fragile to', 'commoditizing'), surfaces named threat actors, and weights each hit by the source quality of its grounding.
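A minimal sketch of the heuristic. The phrase list matches the rubric; the source-quality weighting scheme and the response schema are illustrative:

```python
RISK_PHRASES = ["depends on", "requires", "fragile to", "commoditizing"]

def defensibility_signal(responses: list[dict]) -> float:
    """Each response dict: {'text': str, 'source_quality': float in 0-1}."""
    weighted_hits = 0.0
    for r in responses:
        text = r["text"].lower()
        hits = sum(text.count(p) for p in RISK_PHRASES)
        weighted_hits += hits * r["source_quality"]  # weight by grounding quality
    return weighted_hits  # higher = more pressure; mapped to 0-100 downstream
```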
Consensus formula
overall = Σ(per_dim_score × per_dim_weight) / Σ(per_dim_weight)

confidence = 1 − (cross_engine_disagreement_rate × 0.6 + missing_grounding_rate × 0.4)

Confidence is reported alongside every Overall score, so a 72 with 0.91 confidence reads differently from a 72 with 0.58 confidence.
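The same two formulas as runnable Python. The dimension weights here are illustrative; the rubric only fixes that they sum to 1.00:

```python
WEIGHTS = {"awareness": 0.25, "authority": 0.25, "sentiment": 0.15,
           "consistency": 0.20, "defensibility": 0.15}  # illustrative, sum = 1.00

def overall(scores: dict[str, float]) -> float:
    return sum(scores[d] * w for d, w in WEIGHTS.items()) / sum(WEIGHTS.values())

def confidence(disagreement_rate: float, missing_grounding_rate: float) -> float:
    return 1.0 - (disagreement_rate * 0.6 + missing_grounding_rate * 0.4)
```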
Sources & limits
What we measure. What we don’t. What’s next.
Engines we query
- OpenAI GPT-4 class (Chat Completions API)
- Google Gemini 2.5 Pro with grounded web search
- Brave Search LLM Context as third grounding signal
Engines on roadmap
- Anthropic Claude — Q3 2026
- Perplexity Sonar — Q3 2026
- DeepSeek and Mistral — Q4 2026
Limitations we are honest about
- Per-engine rate limits cap real-time refresh to about 1 audit per minute per brand
- Sentiment classifiers are tuned to English; non-English categories ship with a 0.8 confidence cap
- Defensibility uses heuristics over the prompt suite — it is directional, not predictive
Reproducibility
- Every audit stores prompts, engine responses, and grounding sources
- Re-runs within the same hour are deduplicated
- Methodology changes are versioned and called out in the changelog
Have a methodology question? Ask a founder directly.
See the methodology applied to your brand.
Run a free audit and inspect every score with its underlying prompts and citations.