Through August and September 2025 we ran a structured audit on 60 Indian B2B websites — 18 SaaS companies, 14 services firms, 12 manufacturing/industrial sites, 9 BFSI/fintech, 7 logistics. For each, we ran the same 35-query probe across Perplexity, ChatGPT (web search), and Google AI Overviews.
22 of 60 sites earned at least one AI-engine citation; 8 of 60 earned citations on more than 25% of probe queries. This post covers the four patterns those eight sites share, and the one finding that surprised us most.
- 60: Indian B2B sites audited (Aug-Sep 2025)
- 22 / 60: earned at least one AI citation
- 8 / 60: cited on >25% of probe queries
- 35: probe queries per site, across 3 surfaces
## The answer in 60 words
The 8 cited Indian B2B sites share four patterns: (1) at least one pillar page over 2,500 words with original numbers and a named case study, (2) ≥5 question-format H2s per page, (3) entity-graph schema with sameAs to Wikidata or LinkedIn, (4) consistent month-over-month publishing for at least 6 months. The surprise: backlink count did not predict citation. Substance + structure did.
## Why we ran this audit
Indian B2B SEO advice in 2025 is mostly cargo-cult — "add llms.txt", "use FAQPage", "publish more." We wanted to know what the cited sites actually do. The 60-site sample is not statistically representative of the whole Indian B2B web, but it covers the vertical clusters our agency works in: SaaS, services, manufacturing, BFSI, logistics. The probe set was 35 queries chosen to mirror buyer-journey searches: 12 informational ("what is X"), 13 comparison ("X vs Y"), 10 vendor-shortlist ("best X for [Indian context]").
We cross-checked our findings against the Princeton GEO research, which found that adding citations, statistics, and quotations is associated with a 30-40% lift in AI citation, and against Growth Marshal's Sonar ranking-factor breakdown from mid-2025. Our findings are consistent with both, with one Indian-specific wrinkle: the cited sites lean heavily toward content that explains what changed in Indian regulation (GST, RBI, DPDP), and those queries deliver an outsized share of citations.
## Pattern 1 — A 2,500+ word pillar page with original numbers
All 8 cited sites have at least one pillar page over 2,500 words on a topic central to their business. The 30 sites that earned zero citations have a median post length of 780 words. This is not a length fetish; it is a substance threshold. AI engines extract numbered lists, comparisons, and quoted statistics. Below ~1,500 words, there usually isn't enough of any of those.
The original-numbers piece matters more than the length. Six of 8 cited sites publish at least one piece per quarter with first-party data — a survey, a benchmark, an audit summary, internal usage stats. The two cited sites without first-party data make up for it with extremely well-cited curation (15+ external sources per pillar, each linked).
The four patterns at a glance:

1. 2,500+ word pillar pages. Median pillar length on cited sites: 3,100 words. Median on uncited sites: 780. The threshold is roughly 1,500 words; the sweet spot is 2,500-3,500.
2. 5+ question-format H2s. Headers like "How much does X cost in INR?" beat "The cost of X." AI engines extract H2s as candidate Q-nodes during indexing.
3. Entity-graph schema. All 8 cited sites had Organization schema with sameAs to Wikidata, LinkedIn, or Crunchbase. Of the 38 uncited sites, only 4 had any sameAs at all.
4. 6+ months of monthly publishing. All 8 cited sites published at least one new piece per month for the last 6 months. Cadence beats one-off campaigns; freshness compounds.
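The four thresholds above can be rolled into the 0-4 page score used in the self-audit steps. Below is a minimal Python sketch, assuming you have already collected word counts, H2 counts, schema status, and publishing history by hand or from a crawl; the `PageSignals` fields and function names are our own illustration, not a tool we ship.

```python
from dataclasses import dataclass

# Thresholds taken from the four patterns; treat them as audit heuristics, not laws.
MIN_PILLAR_WORDS = 2500
MIN_QUESTION_H2S = 5
MIN_MONTHS_OF_CADENCE = 6


@dataclass
class PageSignals:
    """Hypothetical per-page inputs for the four-pattern score."""
    word_count: int
    question_h2s: int
    has_sameas: bool        # Organization schema with a populated sameAs array
    months_of_cadence: int  # site-level: consecutive months with at least one new post


def pattern_score(page: PageSignals) -> int:
    """Return a 0-4 score: one point per pattern the page (or its site) meets."""
    checks = [
        page.word_count >= MIN_PILLAR_WORDS,
        page.question_h2s >= MIN_QUESTION_H2S,
        page.has_sameas,
        page.months_of_cadence >= MIN_MONTHS_OF_CADENCE,
    ]
    return sum(checks)  # True counts as 1, False as 0


# A 3,100-word pillar with 6 question H2s and sameAs, but only 4 months of cadence.
example = PageSignals(word_count=3100, question_h2s=6, has_sameas=True, months_of_cadence=4)
print(pattern_score(example))  # 3
```

Cadence is a property of the site rather than the page, so the same value is passed for every page you score on that domain.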
## Pattern 2 — Question-format H2s (the cheapest fix)
Of the 8 cited sites, 7 use H2s formatted as questions on most posts ("How does X work?" / "When should I use Y?"). Of the 30 uncited sites, only 4 do. This is the cheapest pattern to copy: rewrite your existing H2s as questions and you align with how AI engines index candidate Q-nodes.
The Princeton paper documented a +18% citation lift just from rewriting headers as questions. We saw a similar effect on 3 client sites we A/B tested in July 2025 (control: declarative headers; treatment: question headers): the treatment pages showed a ~22% citation lift after 21 days.
The pattern is mechanical:
| Declarative header (loses) | Question header (wins) |
|---|---|
| The Cost of n8n Self-Hosted | How much does n8n cost to self-host on Hetzner per month? |
| Migration Strategies | What is the safest MySQL-to-PostgreSQL migration path? |
| Common Mistakes | What are the most common GST 2.0 invoicing mistakes? |
| Pricing Tiers | Which Razorpay pricing tier suits a ₹2 Cr SaaS in India? |
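Because the rewrite is mechanical, the audit side can be scripted too. This is a rough Python sketch using only the standard library's `html.parser`; it treats any H2 whose text ends in "?" as question-format, which is our own heuristic and will miss headers like "What is X? (2025)".

```python
from html.parser import HTMLParser


class H2Collector(HTMLParser):
    """Collect the text of every <h2> element in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.headers: list[str] = []
        self._in_h2 = False
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_h2 = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "h2" and self._in_h2:
            self.headers.append("".join(self._buffer).strip())
            self._in_h2 = False

    def handle_data(self, data):
        # Text inside nested tags (e.g. <span>) within the H2 is kept as well.
        if self._in_h2:
            self._buffer.append(data)


def audit_h2s(html: str, minimum: int = 5) -> dict:
    """Split H2s into question-format and declarative, and check the 5+ threshold."""
    parser = H2Collector()
    parser.feed(html)
    parser.close()
    questions = [h for h in parser.headers if h.endswith("?")]
    declarative = [h for h in parser.headers if not h.endswith("?")]
    return {
        "questions": questions,
        "declarative": declarative,
        "meets_threshold": len(questions) >= minimum,
    }


sample = (
    "<h2>Pricing Tiers</h2>"
    "<h2>Which Razorpay pricing tier suits a ₹2 Cr SaaS in India?</h2>"
)
report = audit_h2s(sample)
print(report["declarative"])      # ['Pricing Tiers']
print(report["meets_threshold"])  # False
```

The `declarative` list doubles as a to-do list of headers to rewrite.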
## Pattern 3 — Entity-graph schema with sameAs
All 8 cited sites have Organization schema in the site root with a populated sameAs array, typically pointing at LinkedIn, X, and (in 5 of 8 cases) Wikidata. None of the uncited sites had a Wikidata sameAs; only 4 had any sameAs at all (most pointing at a single social profile).
The mechanism: AI engines disambiguate entities ("which Cleartax?", "which Razorpay?") using sameAs as a confident signal. Without it, the engine has to guess from context, and small Indian B2B brands, which often share name fragments with larger global entities, lose that disambiguation.
We covered the full schema stack in our prior post on the 5-layer schema stack; the audit confirms that Layer 1 (Organization with sameAs) is the cheapest single fix Indian B2B sites can ship. Estimated time: 60 minutes.
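For reference, here is a minimal sketch of Layer 1 generated from Python so it can sit in a build step. The `@context`, `@type`, `name`, `url`, `logo`, and `sameAs` keys are standard schema.org Organization vocabulary; every value below (company name, domain, LinkedIn slug, and the Wikidata QID) is a hypothetical placeholder to replace with your own.

```python
import json

# Hypothetical company details: replace every value with your own.
organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Analytics Pvt Ltd",
    "url": "https://www.example.com",
    "logo": "https://www.example.com/logo.png",
    "sameAs": [
        "https://www.wikidata.org/wiki/Q000000",  # placeholder QID, not a real entity
        "https://www.linkedin.com/company/example-analytics",
        "https://x.com/example",
    ],
}


def organization_jsonld_tag(data: dict) -> str:
    """Serialise the schema as a JSON-LD <script> block for the site root."""
    body = json.dumps(data, indent=2, ensure_ascii=False)
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(organization_jsonld_tag(organization))
```

If you have no Wikidata entity yet, drop that entry and keep the other profiles; the FAQ below covers that case.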
## Pattern 4 — Consistent monthly publishing for 6+ months
All 8 cited sites published at least one new piece per month for the last 6 months. The 30 uncited sites had a median of 0.4 posts per month over the same window; most were dormant or sporadic. AI engines bias toward fresh content: Frase's GEO research notes that content left without updates for more than 14 days sees its AI citation frequency decline by ~23%.
The cadence finding that surprised us: the winning tempo was not "publish daily." Two of the 8 cited sites published exactly 4 posts a month, a weekly tempo. Three published 8-12 posts a month. None published more than 16. Across the wider sample, the data went noisy beyond ~16 posts a month; we suspect editorial quality drops past that threshold.
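Checking the 6-month pattern on your own site only needs a list of publish dates. A small Python sketch, assuming you can export those dates from your CMS; the function names are ours. It counts calendar months, so pass the last day of your most recent complete month as `today` to avoid flagging a month that is still in progress.

```python
from datetime import date


def months_back(today: date, n: int) -> list[tuple[int, int]]:
    """Return (year, month) for the n calendar months ending with today's month."""
    months = []
    year, month = today.year, today.month
    for _ in range(n):
        months.append((year, month))
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return months


def cadence_report(publish_dates: list[date], today: date, window: int = 6) -> dict:
    """Count posts per calendar month and flag any month in the window with zero posts."""
    window_months = months_back(today, window)
    counts = {m: 0 for m in window_months}
    for d in publish_dates:
        key = (d.year, d.month)
        if key in counts:
            counts[key] += 1
    gaps = [m for m in window_months if counts[m] == 0]
    return {
        "posts_per_month": counts,
        "average": sum(counts.values()) / window,
        "gap_months": gaps,
        "meets_pattern": not gaps,
    }


# One post a month, April to September 2025.
posts = [date(2025, m, 15) for m in range(4, 10)]
print(cadence_report(posts, today=date(2025, 9, 30))["meets_pattern"])  # True
```

A single empty month breaks the pattern, which matches the "at least one new piece per month" wording rather than a monthly average.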
## The pattern that surprised us — backlinks did not predict citation
We pulled the Ahrefs DR (Domain Rating) for all 60 sites. Median DR for the 8 cited sites: 38. Median DR for the 30 uncited sites: 41.
The cited sites had slightly lower backlink authority than the uncited group.
This contradicts the dominant SEO narrative that backlinks are the moat. For traditional Google search, they probably still are. For AI engines in 2025, structure + substance + freshness appear to outweigh raw backlink count. The 8 cited sites included two relatively young domains (DR 24 and DR 28) that out-cited five DR-60+ sites in the sample.
The honest read: backlinks are not irrelevant. They probably help with the initial crawl and indexing rate. But they are not predictive of which page wins the AI citation.
Aleyda Solis has been making this argument publicly since early 2025; our 60-site audit is one more data point in support.
If your AI-search strategy is "build more backlinks", our data suggests reallocating. Spend the same hours on a 2,500-word pillar with original numbers and entity-graph schema. The 30-day citation lift is bigger.
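Our comparison above uses medians. If you want a single number for "does DR predict citations?" on your own sample, Spearman's rank correlation between each site's DR and its cited-query count is one standard option. We did not compute a rho in this audit, so treat this stdlib-only Python version as a sketch for your own data (`scipy.stats.spearmanr` gives the same statistic if you already use SciPy).

```python
def average_ranks(values: list[float]) -> list[float]:
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("one input is constant; rho is undefined")
    return cov / (sx * sy)
```

Feed it one DR value and one cited-query count per site. A rho near zero or below it would be consistent with our median comparison; a strongly positive rho would not. Ranks rather than raw values keep a few DR-60+ outliers from dominating the result.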
## The 8 cited sites, by vertical
We are not naming the specific sites (the audit was internal), but the vertical breakdown of the 8 cited:
| Vertical | # cited / # audited | Common pattern |
|---|---|---|
| SaaS | 4 / 18 | API documentation cited as the answer for "how does X work" queries |
| BFSI / fintech | 2 / 9 | Regulation explainers (RBI, DPDP, GST) cited disproportionately |
| Services | 1 / 14 | Detailed case studies with specific INR cost numbers |
| Manufacturing | 1 / 12 | Technical product spec pages with comparison tables |
| Logistics | 0 / 7 | None of the 7 had pillar content over 1,500 words |
The vertical pattern: BFSI ties SaaS for the best per-capita rate (2/9 and 4/18, both about 22%); for BFSI, regulation queries are the citation magnet. Among verticals that earned any citation, services firms do worst per-capita (1/14, about 7%) because most lack first-party data; they syndicate generic "10 tips for X" content. Logistics went 0 for 7 because none of those sites invested in long-form content; the entire vertical is sitting on an open opportunity.
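Per-capita rankings on samples this small are easy to misread after rounding, so it is worth computing the rates exactly from the table. A short Python sketch using the standard `fractions` module; the counts are the ones in the vertical table, and the function name is ours.

```python
from fractions import Fraction

# Cited / audited counts from the vertical table.
VERTICALS = {
    "SaaS": (4, 18),
    "BFSI / fintech": (2, 9),
    "Services": (1, 14),
    "Manufacturing": (1, 12),
    "Logistics": (0, 7),
}


def citation_rates(table: dict[str, tuple[int, int]]) -> list[tuple[str, int, int, Fraction]]:
    """Return (vertical, cited, audited, exact rate) rows, highest rate first."""
    rows = [
        (name, cited, audited, Fraction(cited, audited))
        for name, (cited, audited) in table.items()
    ]
    # sorted() is stable, so tied verticals keep their table order.
    return sorted(rows, key=lambda row: row[3], reverse=True)


for name, cited, audited, rate in citation_rates(VERTICALS):
    print(f"{name:<16} {cited}/{audited} = {float(rate):.1%}")
```

On the table's counts, SaaS (4/18) and BFSI (2/9) reduce to the same fraction, about 22%, with manufacturing (1/12) just ahead of services (1/14).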
## The 5-step audit you can run on your own site (today)
1. Define a 20-query probe set. Pick 8 informational, 7 comparison, and 5 vendor-shortlist queries that a buyer in your category would actually run. Save them in a Google Sheet.
2. Run each query in Perplexity, ChatGPT (web), and a Google AI Overview. Manually note which sites are cited and mark whether your domain appears. This is your baseline.
3. Audit your top-10 traffic pages against the four patterns: pillar length, question-format H2s, sameAs Wikidata, and monthly publishing cadence. Score each page 0-4. The pages scoring 3+ are your AI-citation candidates.
4. Pick one pillar page to upgrade. If the page is under 2,500 words, expand it. If H2s are declarative, rewrite them as questions. If schema is missing, add the 5-layer stack. Prioritise the page with the most existing organic traffic.
5. Re-run the probe set after 21 days. The lift in citations from baseline is your ROI. We see typical lifts of 0 → 4-9 cited queries on the probe set within 21 days for upgraded pillar pages.
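If the Google Sheet gets unwieldy, steps 1, 2, and 5 fit in a few lines of Python. This is a hypothetical structure of our own, not an export format from any of the three engines: a query counts as cited if your domain appeared on at least one surface, and lift is the change in that count between the baseline and the day-21 re-run. The surface labels are arbitrary names we chose.

```python
from dataclasses import dataclass, field

SURFACES = ("perplexity", "chatgpt_web", "google_aio")  # our own labels for the 3 surfaces


@dataclass
class ProbeQuery:
    text: str
    intent: str  # "informational", "comparison", or "vendor-shortlist"
    cited_on: set[str] = field(default_factory=set)  # surfaces that cited your domain


def record(query: ProbeQuery, surface: str) -> None:
    """Mark that `surface` cited your domain for this query."""
    if surface not in SURFACES:
        raise ValueError(f"unknown surface: {surface}")
    query.cited_on.add(surface)


def check_mix(queries: list[ProbeQuery]) -> dict[str, int]:
    """Count queries per intent so you can confirm the 8 / 7 / 5 split."""
    mix: dict[str, int] = {}
    for q in queries:
        mix[q.intent] = mix.get(q.intent, 0) + 1
    return mix


def cited_query_count(queries: list[ProbeQuery]) -> int:
    """A query counts once if your domain was cited on at least one surface."""
    return sum(1 for q in queries if q.cited_on)


def lift(baseline: list[ProbeQuery], rerun: list[ProbeQuery]) -> int:
    """Change in cited queries between the baseline run and the day-21 re-run."""
    return cited_query_count(rerun) - cited_query_count(baseline)
```

Keep the baseline and re-run as separate lists of the same queries so the day-21 numbers never overwrite the baseline.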
## When the patterns don't apply
Three cases where the four patterns above are not the right fix.
If your business is hyper-local (a single-city services firm), AI citations matter less than local pack — invest in Google Business Profile and review velocity instead.
If your category is regulated (legal, medical, financial advisory), AI engines bias toward authoritative sources by default; add E-E-A-T signals and credentialing schema before chasing citation count.
If your audience is technical and uses Stack Overflow / GitHub for answers, your competitive surface is GitHub Discussions and dev forums — pillar pages on your domain may never compete.
## Common mistakes (each one we found in the audit)
Symptom: site has long content but no citations. Cause: declarative H2s and missing schema. Fix: rewrite headers as questions, add 5-layer schema stack.
Symptom: site cited on informational queries but not vendor-shortlist. Cause: no comparison tables, no specific INR pricing. Fix: add a "X vs Y in 2025" pillar with comparison table and clear pricing.
Symptom: cited once a year, never repeatedly. Cause: no publishing cadence. Fix: commit to one post a month for 6 months, even if short.
Symptom: pages cited but bounce rate >85% from the AI citation traffic. Cause: the AI extracted the answer; the user did not need to click further. Fix: add a "want this implemented?" CTA box near the answer paragraph.
## A real example — a Hyderabad SaaS site
One of the 8 cited sites in our audit was a Hyderabad SaaS company in the document automation space. DR was 31, which is modest. They had three pillar pages, each 3,200-4,100 words, each with original benchmark data collected from their own production systems. Schema: Organization + sameAs Wikidata + sameAs LinkedIn + per-page Article schema. Publishing cadence: 6 posts a month, every month, for 14 months.
Their citation rate on our probe: 12 of 35 queries returned at least one citation pointing at their domain. By comparison, a competitor in the same category with DR 64 and slick design but only 6 short blog posts in the last year: 0 of 35 cited. The pattern was unambiguous.
We asked the Hyderabad team's marketing lead what their goal had been. The answer: "we did not optimise for AI. We just wrote what our engineering team wanted to read." That is the closest thing to a meta-pattern in our 60-site audit. The cited sites are written for an actual reader. Most of the uncited sites are written for an algorithm — and the algorithm has moved on.
## A counter-finding worth flagging
Of the 8 cited sites, 3 had llms.txt files. The other 5 did not. In the broader 60-site sample, 11 sites had llms.txt; 3 of those 11 were in the cited group. The hit rate (3/11 = 27%) was roughly double the overall rate (8/60 = 13%), but the sample is too small to claim a real correlation.
Independent statistical analysis from late 2025 finds no correlation between llms.txt and citation frequency. We will revisit this in a longitudinal study after Day 60 of llms.txt adoption.
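With counts this small, Fisher's exact test is the usual way to ask whether a 27% vs 13% gap could plausibly be chance. Here is a stdlib-only Python sketch (`scipy.stats.fisher_exact` computes the same two-sided test if SciPy is available). The 2x2 table comes straight from the counts above: 3 of 11 llms.txt sites cited, and 5 of the remaining 49 sites cited.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows: has llms.txt / no llms.txt. Columns: cited / not cited.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, row1)

    def prob(k: int) -> float:
        # Hypergeometric probability of k cited sites among the row1 llms.txt sites.
        return comb(col1, k) * comb(n - col1, row1 - k) / total

    p_observed = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    p = sum(p_k for p_k in (prob(k) for k in range(lo, hi + 1))
            if p_k <= p_observed * (1 + 1e-7))
    return min(1.0, p)


# 3 of 11 llms.txt sites cited; 5 of the other 49 sites cited.
p = fisher_exact_two_sided(3, 8, 5, 44)
print(f"p = {p:.3f}")
```

On these counts the p-value comes out around 0.15, well above the conventional 0.05 cut-off, which agrees with the caution above.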
## Pre-publish checklist
- At least one pillar page on the site over 2,500 words on a core business topic
- 5+ question-format H2s per pillar (rewrite declarative headers)
- Organization schema with sameAs Wikidata + sameAs LinkedIn at minimum
- Monthly publishing cadence committed to for at least 6 months
- One first-party data point (survey, benchmark, internal usage stat) in any quarterly pillar
- 20-query probe set defined and baseline citation count recorded
- FAQPage schema on every pillar (5-7 visible Q-nodes)
- One r/SEO thread or community link per pillar (signals participation in the conversation)
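The FAQPage item on this checklist can be generated from the questions already visible on the page. A minimal Python sketch: `FAQPage`, `mainEntity`, `Question`, `acceptedAnswer`, and `Answer` are standard schema.org vocabulary, while the function name and the hard 5-7 check (mirroring the checklist) are our own choices.

```python
import json


def faq_page_jsonld(pairs: list[tuple[str, str]]) -> dict:
    """Build schema.org FAQPage markup from (question, answer) pairs visible on the page."""
    if not 5 <= len(pairs) <= 7:
        raise ValueError("checklist target is 5-7 visible Q-nodes per pillar")
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


def faq_script_tag(pairs: list[tuple[str, str]]) -> str:
    """Wrap the markup in a JSON-LD <script> tag for the page template."""
    body = json.dumps(faq_page_jsonld(pairs), indent=2, ensure_ascii=False)
    return f'<script type="application/ld+json">\n{body}\n</script>'
```

Pass the same question/answer text that appears in the page's FAQ section; markup that does not match visible content defeats the purpose of the Q-nodes.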
## FAQ
### Why did backlinks not predict citation in your audit?
Because AI engines compose the citation by parsing the page content directly, not the link graph. Backlinks help discovery and trust; they do not directly determine which page wins an AI-generated snippet. Substance + structure win.
### How representative are the 60 Indian B2B sites?
Not statistically representative of the whole Indian B2B web. It is a convenience sample weighted toward verticals where our agency works. Treat the patterns as hypotheses to test on your own probe set, not laws.
### What probe set should I use for my own audit?
20 queries minimum: 8 informational ("what is X"), 7 comparison ("X vs Y"), 5 vendor-shortlist ("best X for [Indian context]"). Save them in a sheet and re-run quarterly.
### Is a 2,500-word pillar always worth the cost?
Only if you have substance. A 2,500-word page padded with filler underperforms a tight 1,400-word page with original numbers. The pattern is "depth proportional to substance" — not "longer is always better."
### Does adding sameAs to Wikidata require anything from Wikidata's side?
You need a Wikidata entity for your company first. If you do not have one, you can either create one (Wikidata is editable) or skip the Wikidata sameAs and use LinkedIn + Crunchbase + your own social profiles. Three sameAs URLs of any reputable type beat none.
### How long does the audit take to run?
For your own site: 4-6 hours including the 20-query probe in 3 surfaces and a manual page audit. For a full 60-site agency-grade audit: roughly 2 weeks of dedicated work.
### What is the single biggest fix for an Indian B2B site that earns zero AI citations?
Rewrite your top-3 pillar pages with question-format H2s, expand each to 2,500+ words, add the 5-layer schema stack. In our test cases the lift from zero to 5+ cited queries on a 20-query probe takes 21 days.
Want this audit run on your domain?
We run the same 60-site audit methodology on your domain and three competitors — 35-query probe set, the four-pattern scorecard, and a prioritised 90-day fix list. Typical engagement: 8 working days. Suitable for Indian B2B sites with at least 10 existing blog posts. Free initial audit; fixed-price implementation if you want us to ship the fixes.
For a deeper read on how citation dynamics differ between Perplexity and Google AI Overviews, see our earlier post on Perplexity content patterns from 400 Indian SMB citations. Our founder Vivek Singh writes on the same beat with a more first-person founder lens. The audit work is led by our SEO services team, with project management by Hrishikesh (see his team page). We documented a similar audit pattern shipping for ChipmakerHub's redesign. Email contact@softechinfra.com with your domain.