Search Engine Optimization Guide for Bot Platforms (2026)
This is the complete SEO playbook BotWave uses to stay discoverable across the modern search landscape: traditional engines, the Bing umbrella, and the new AI answer engines. Everything below is what we actually run in production; nothing is theoretical.
If you are scaling a bot platform (WhatsApp, Telegram, or any high-volume programmatic site), follow these steps in order. The earlier steps unblock the later ones.
The 2026 search landscape, what you actually need to cover
There are three umbrellas. Cover one engine in each and you cover ~98% of search traffic in the English-speaking web:
- The Google umbrella: Google Search, Google Discover, Startpage. Optimise once for Googlebot.
- The Bing umbrella: Bing, Yahoo, DuckDuckGo, Ecosia, Swisscows, AOL. All draw mainly from the Microsoft index. One IndexNow ping covers all of them.
- AI answer engines: Perplexity, ChatGPT Search, Brave Leo, You.com, Claude, Meta AI. No webmaster console; coverage is automatic via robots.txt allow-listing + structured data.
See our per-engine pages at /search-engines for individual reference cards.
Step 1, Verify Google Search Console (and only Google Search Console)
The only engine that requires a manual sign-in is Google. Everything else is automatic via the protocols below.
- Add both botwave.online and www.botwave.online as separate properties.
- Verify ownership via DNS TXT (more durable than an HTML file).
- Submit https://www.botwave.online/sitemap.xml once. GSC auto-discovers chunks.
- Open the Coverage report weekly. The most common issues:
  - "Discovered - currently not indexed": Google found the URL in your sitemap but has not crawled it. Fix: stronger internal-link mesh + unique per-slug content.
  - "Duplicate without user-selected canonical": Two pages have the same content. Fix: explicit alternates.canonical on each route.
  - "Page with redirect": Not an error if intended (e.g. http:// → https://).
  - "Not found (404)": Legacy URL. Fix: add a permanent redirect in the async redirects() function of next.config.js.
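As a minimal sketch of the 404 fix, the legacy-to-current mapping can live in one lookup table that next.config.js turns into redirect rules. The paths and the `buildRedirects` helper below are hypothetical placeholders, not BotWave's actual code. Note that Next.js answers `permanent: true` with a 308 status, which search engines treat the same way as a 301.

```typescript
// Shape of one entry returned by the redirects() function in next.config.js.
interface RedirectRule {
  source: string;
  destination: string;
  permanent: boolean;
}

// Hypothetical legacy paths; replace with the 404s listed in your Coverage report.
const legacyPaths: Record<string, string> = {
  "/old-pricing": "/pricing",
  "/blog/old-slug": "/blog/new-slug",
};

// Turn the lookup table into redirect rules.
function buildRedirects(map: Record<string, string>): RedirectRule[] {
  return Object.keys(map).map((source) => ({
    source,
    destination: map[source],
    permanent: true, // Next.js sends a 308 permanent redirect
  }));
}

// In next.config.js this would be wired roughly as:
//   async redirects() { return buildRedirects(legacyPaths); }
```

Keeping the table separate from the config makes it easy to append a row each time a new legacy URL shows up in GSC.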
Step 2, Implement IndexNow (covers Bing + Yandex + DDG + Yahoo + Ecosia + AOL)
IndexNow is a free, no-login protocol Microsoft and Yandex co-developed. One HTTP POST notifies every participating engine.
How BotWave wires it:
- Generate a 32-char hex key. Save as a constant (e.g. b7e9c1a2f4d8e0b1a3c5d7e9f1b3a5c7).
- Host that key at /<KEY>.txt on your domain so IndexNow can verify ownership (public/<KEY>.txt).
- Build a small helper that POSTs to https://api.indexnow.org/IndexNow with the JSON body { host, key, keyLocation, urlList }.
- Wire two endpoints:
  - POST /api/indexnow for single-URL pings (after publishing a new blog post).
  - POST /api/indexnow/submit-all for full-catalog sweeps (daily cron).
- Optionally ship a one-shot CLI script (scripts/indexnow-bulk-submit.mjs) for catch-up.
Limits: 10,000 URLs per request. We chunk at 500 to be conservative.
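The helper and the chunking rule can be sketched as below. This is an illustration under stated assumptions, not BotWave's actual helper: the function names are hypothetical, and the key file is assumed to sit at the site root over HTTPS. Only the payload construction is shown as live code; the network call appears as a comment so the sketch runs without side effects.

```typescript
// Endpoint and body fields are the ones named in the steps above.
const INDEXNOW_ENDPOINT = "https://api.indexnow.org/IndexNow";

interface IndexNowPayload {
  host: string;
  key: string;
  keyLocation: string;
  urlList: string[];
}

// Split a list into fixed-size batches.
function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new Error("size must be at least 1");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Build one JSON body per batch of URLs.
function buildIndexNowPayloads(
  host: string,
  key: string,
  urls: string[],
  batchSize: number = 500 // conservative; the protocol allows 10,000
): IndexNowPayload[] {
  return chunk(urls, batchSize).map((urlList) => ({
    host,
    key,
    keyLocation: "https://" + host + "/" + key + ".txt", // assumed location
    urlList,
  }));
}

// Sending one payload (Node 18+ ships a global fetch):
//   await fetch(INDEXNOW_ENDPOINT, {
//     method: "POST",
//     headers: { "Content-Type": "application/json; charset=utf-8" },
//     body: JSON.stringify(payload),
//   });
```

A single-URL ping is the same call with a one-element `urls` array, so both endpoints can share this builder.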
Step 3, Allow-list every crawler you want to be visible to
Your robots.txt is the single most leveraged file for AI engines. They check it before deciding whether to ingest your pages.
Allow-list the named UAs explicitly (positive signal, even though * already permits them):
# A crawler obeys only the most specific group that names it, so the Disallow rules are repeated in every group.
User-agent: Googlebot
Allow: /
Disallow: /api/
Disallow: /dashboard/
User-agent: Bingbot
Allow: /
Disallow: /api/
Disallow: /dashboard/
User-agent: PerplexityBot
Allow: /
Disallow: /api/
Disallow: /dashboard/
User-agent: OAI-SearchBot
Allow: /
Disallow: /api/
Disallow: /dashboard/
User-agent: GPTBot
Allow: /
Disallow: /api/
Disallow: /dashboard/
User-agent: ClaudeBot
Allow: /
Disallow: /api/
Disallow: /dashboard/
User-agent: Google-Extended
Allow: /
Disallow: /api/
Disallow: /dashboard/
# ... and so on. See app/robots.ts for the full list.
Engines we allow-list at BotWave: Googlebot, Googlebot-Image, Googlebot-News, Bingbot, Slurp, DuckDuckBot, Baiduspider, YandexBot, Applebot, GPTBot, OAI-SearchBot, ChatGPT-User, Google-Extended, ClaudeBot, Claude-Web, anthropic-ai, PerplexityBot, Perplexity-User, CCBot, Bytespider, Amazonbot, cohere-ai, Diffbot, FacebookBot, Meta-ExternalAgent, ImagesiftBot, Omgilibot, YouBot.
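One way to keep a long allow-list maintainable is to generate the groups from an array. The sketch below builds an object shaped like what a Next.js app/robots.ts file returns; the local interfaces are stand-ins modelled on `MetadataRoute.Robots`, and `buildRobots` plus the shortened agent list are illustrative assumptions rather than BotWave's file. Private paths are repeated in every named group because a crawler that matches a named group does not fall back to the `*` group.

```typescript
// Local stand-ins for the object app/robots.ts returns in Next.js
// (defined here so the sketch is self-contained).
interface RobotsRule {
  userAgent: string;
  allow: string;
  disallow: string[];
}
interface RobotsFile {
  rules: RobotsRule[];
  sitemap: string;
}

// A shortened, illustrative subset of the allow-list.
const ALLOWED_AGENTS = ["Googlebot", "Bingbot", "PerplexityBot", "OAI-SearchBot", "GPTBot", "ClaudeBot"];
const PRIVATE_PATHS = ["/api/", "/dashboard/"];

function buildRobots(agents: string[], privatePaths: string[], sitemap: string): RobotsFile {
  return {
    // One group per named crawler, each carrying its own copy of the private paths.
    rules: agents.map((userAgent) => ({
      userAgent,
      allow: "/",
      disallow: privatePaths.slice(),
    })),
    sitemap,
  };
}

const robots = buildRobots(ALLOWED_AGENTS, PRIVATE_PATHS, "https://www.botwave.online/sitemap.xml");
```

Adding a newly announced crawler then means appending one string to `ALLOWED_AGENTS`.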
Step 4, Chunk your sitemap (essential when you have 20k+ URLs)
The sitemap protocol caps at 50,000 URLs / 50MB per chunk. We use 2,000 URLs per chunk because:
- Smaller chunks fetch faster (helps every engine, not just Google).
- GSC processes each chunk independently, so one slow chunk doesn't block the rest.
- Append-only URL ordering keeps chunk membership stable across deploys.
Implementation: generateSitemaps() in app/sitemap.ts returns [{ id: 0 }, { id: 1 }, ...] and the default export reads the chunk ID and slices the master URL list. The landingPages array is treated as append-only so URLs never move chunks.
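The chunking logic can be sketched in plain functions, independent of the framework. The names `sitemapIds`, `urlsForChunk`, and `makeLandingPages` are hypothetical, and the generated URLs are placeholders; in the real file the first function corresponds to generateSitemaps() and the second to the slice performed by the default export.

```typescript
const URLS_PER_CHUNK = 2000;

// Placeholder master list; in the text this is the append-only landingPages array.
function makeLandingPages(count: number): string[] {
  const pages: string[] = [];
  for (let i = 0; i < count; i++) {
    pages.push("https://www.botwave.online/page-" + i);
  }
  return pages;
}

// Mirrors generateSitemaps(): one { id } per chunk.
function sitemapIds(totalUrls: number, perChunk: number = URLS_PER_CHUNK): { id: number }[] {
  const ids: { id: number }[] = [];
  const chunks = Math.ceil(totalUrls / perChunk);
  for (let id = 0; id < chunks; id++) ids.push({ id });
  return ids;
}

// Mirrors the default export: slice the master list for one chunk ID.
function urlsForChunk(all: string[], id: number, perChunk: number = URLS_PER_CHUNK): string[] {
  return all.slice(id * perChunk, (id + 1) * perChunk);
}
```

Because new URLs are only ever appended, earlier chunks keep exactly the same members; only the last chunk grows until a new one is created.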
Step 5, Emit structured data (JSON-LD) everywhere
Structured data helps every engine parse what a page is about, and AI engines appear to lean on it especially heavily. Per page type:
| Page type | Schemas to emit |
|---|---|
| Homepage | Organization, WebSite with SearchAction, SoftwareApplication |
| /pricing | Product + Offer per tier |
| Blog posts | Article, BreadcrumbList, FAQPage |
| /how-to/* | HowTo, BreadcrumbList, FAQPage |
| /fix/* | FAQPage, BreadcrumbList |
| /compare/* | FAQPage, BreadcrumbList |
| /use-cases/* | Article, FAQPage, BreadcrumbList |
| /privacy, /terms | WebPage, FAQPage, BreadcrumbList |
Run every page through the Schema.org validator and Google's Rich Results Test before shipping.
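As an illustration of one row in the table, here is a minimal sketch that builds FAQPage JSON-LD from a list of question/answer pairs. The helper names are hypothetical; the property names (`mainEntity`, `acceptedAnswer`, and so on) are the Schema.org ones.

```typescript
interface Faq {
  question: string;
  answer: string;
}

// Build a Schema.org FAQPage object from plain Q&A pairs.
function faqPageJsonLd(faqs: Faq[]): object {
  return {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    mainEntity: faqs.map((f) => ({
      "@type": "Question",
      name: f.question,
      acceptedAnswer: { "@type": "Answer", text: f.answer },
    })),
  };
}

// Serialise for a <script type="application/ld+json"> tag.
// Escaping "<" keeps a stray "</script>" inside an answer from closing the tag early.
function toScriptBody(data: object): string {
  return JSON.stringify(data).replace(/</g, "\\u003c");
}
```

The other schema types in the table follow the same pattern: a typed input, a builder that returns a plain object, and one shared serialiser.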
Step 6, Defeat "Discovered - currently not indexed"
If GSC says it found your URLs but won't crawl them, the issue is almost always one of:
- Thin content: programmatic pages with templated bodies. Fix: emit unique per-slug content (we did this for /how-to, /fix, /compare, /use-cases; see the commit history of lib/howto/content.ts).
- No inbound links: the URL only appears in the sitemap, not in any other page's HTML. Fix: link to it from your homepage, footer, or a "related content" section.
- Slow page load: failing Core Web Vitals. Fix: server-render, drop unused JS, optimise images.
For BotWave's 98 unindexed blog URLs in GSC, the fixes were: (a) home-page Guides & Tutorials section linking to every post, (b) per-post FAQPage schema with substantive Q&A, (c) cross-linking between blog/how-to/compare on shared keywords.
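The "no inbound links" case is easy to check mechanically. The sketch below assumes you already have a link graph (page URL to the URLs its HTML links to) from a crawl or build step; how that graph is produced is outside this example, and `findOrphans` is a hypothetical helper.

```typescript
// linkGraph maps a page URL to the URLs that page links to.
// Returns the sitemap URLs that no other page links to.
function findOrphans(sitemapUrls: string[], linkGraph: Record<string, string[]>): string[] {
  const linked: Record<string, boolean> = Object.create(null);
  Object.keys(linkGraph).forEach((page) => {
    linkGraph[page].forEach((target) => {
      if (target !== page) linked[target] = true; // self-links don't count
    });
  });
  return sitemapUrls.filter((url) => !linked[url]);
}
```

Any URL this returns is a candidate for a homepage, footer, or related-content link.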
Step 7, Optimise for AI answer engines
The new wave of search is conversational. AI engines (Perplexity, ChatGPT Search, Brave Leo, You.com, Claude, Meta AI) synthesise live web data into direct answers with citations. To be cited:
- Allow the UAs (step 3 above).
- Emit FAQPage + Article schema so the engine can extract Q&A pairs.
- Provide llms.txt and llms-full.txt, AI-engine-friendly content dumps at the root of your site. See /llms.txt and /llms-full.txt for BotWave's implementations.
- Write content that answers questions: AI engines tend to cite pages with clean Q&A structure or step-by-step content. Marketing pages rarely get cited.
- Keep canonical signals clean: duplicate pages without a clear canonical leave engines guessing which version is authoritative.
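A small generator keeps llms.txt in sync with your content. The sketch below follows the layout of the community llms.txt proposal (an H1 title, a blockquote summary, then H2 sections of markdown links); it is an assumption that BotWave's own file uses the same layout, and the function name and sample URL in the usage note are hypothetical.

```typescript
interface LlmsLink {
  title: string;
  url: string;
  note?: string;
}
interface LlmsSection {
  heading: string;
  links: LlmsLink[];
}

// Render an llms.txt body: title, summary, then one link list per section.
function buildLlmsTxt(siteName: string, summary: string, sections: LlmsSection[]): string {
  const lines: string[] = ["# " + siteName, "", "> " + summary];
  sections.forEach((section) => {
    lines.push("", "## " + section.heading, "");
    section.links.forEach((link) => {
      const note = link.note ? ": " + link.note : "";
      lines.push("- [" + link.title + "](" + link.url + ")" + note);
    });
  });
  return lines.join("\n") + "\n";
}
```

For example, calling it with one "Guides" section containing a link to a hypothetical https://www.botwave.online/blog/seo post yields a title line, a summary line, and a single bullet under "## Guides".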
Step 8, Cover Apple's ecosystem (Spotlight, Siri, Safari)
Applebot powers Spotlight, Siri Suggestions, and Safari smart search. It crawls automatically; your only job is to:
- Allow Applebot and Applebot-Extended in robots.txt.
- Use clean semantic HTML (<article>, <h1>-<h3>).
- Provide complete OpenGraph metadata on every page.
There is no Apple webmaster console, so robots.txt is the only signal you control.
Step 9, Set up monitoring (it's not "done" after launch)
- GSC Coverage: weekly check for new "Discovered - not indexed" entries.
- GSC Sitemaps: confirm all chunks return 200 and are listed.
- Bing Webmaster Tools: sign in once, submit sitemap, check Site Explorer monthly.
- IndexNow status: log the response code of every batch. 200/202 = accepted; 403 = key mismatch (re-check that /<KEY>.txt is reachable); 422 = rejected URL list (for example, URLs that don't belong to the host).
- Manual spot checks: search site:botwave.online on Google, Bing, DDG, Brave Search, Yandex, Ecosia weekly. Indexed count should track sitemap URL count within ~15%.
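Two of these checks reduce to a few lines of code. The sketch below maps IndexNow response codes to the meanings listed above and tests whether an indexed count is within the ~15% band; both function names are hypothetical, and any code not listed above is simply reported as unexpected.

```typescript
// Translate an IndexNow HTTP status into a log-friendly message.
function describeIndexNowStatus(code: number): string {
  switch (code) {
    case 200:
    case 202:
      return "accepted";
    case 403:
      return "key mismatch: check that /<KEY>.txt is reachable";
    case 422:
      return "URL list rejected";
    default:
      return "unexpected status " + code + ": inspect the response";
  }
}

// True when the indexed count is within `tolerance` (15% by default) of the sitemap count.
function indexedCountOnTrack(indexed: number, sitemapTotal: number, tolerance: number = 0.15): boolean {
  if (sitemapTotal <= 0) return false;
  return Math.abs(indexed - sitemapTotal) / sitemapTotal <= tolerance;
}
```

Logging the message next to the batch size makes it easy to spot a key problem (every batch failing) versus a single bad batch.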
Practical schedule we follow at BotWave
| Cadence | Action |
|---|---|
| Per deploy | Run scripts/indexnow-bulk-submit.mjs (or hit /api/indexnow/submit-all) |
| Per content publish | POST single URL to /api/indexnow |
| Daily | GSC Coverage report scan for new errors |
| Weekly | site: spot-checks on Google, Bing, DDG, Brave, Yandex |
| Monthly | Bing Webmaster Tools review |
| Quarterly | Audit robots.txt against new AI engines (e.g. add new UAs as they emerge) |
Common SEO mistakes that hurt bot platforms specifically
- Indexing the dashboard. Auth-required pages should be in Disallow: so they don't pollute the index. We block /api/, /dashboard/, /admin/.
- Indexing the QR-pairing page with a query string. Dynamic QR URLs change every load. Block the pattern in robots.txt or use <meta name="robots" content="noindex"> on the route (pick one: a crawler that is blocked by robots.txt never fetches the page, so it never sees the noindex tag).
- Duplicate pages per region. If you build /whatsapp-bot-nigeria, /whatsapp-bot-lagos, /whatsapp-bot-abuja with the same body, Google will pick one canonical and ignore the rest. Differentiate the body or merge.
- Forgetting OpenGraph on programmatic pages. When Meta AI / WhatsApp / Telegram unfurl your URL, they read og:image and og:description. We auto-generate OG images via /api/og?title= from the page title.
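For the OpenGraph point, a minimal sketch of building the image URL and the tags for a programmatic page. Only the /api/og?title= route comes from the text above; the helper names and the exact tag set are assumptions for illustration.

```typescript
// Build the OG image URL for a page title. encodeURIComponent keeps spaces,
// "&" and "?" in titles from breaking the query string.
function ogImageUrl(origin: string, title: string): string {
  return origin + "/api/og?title=" + encodeURIComponent(title);
}

// Open Graph tags for the <head> of a page; values are escaped for HTML attributes.
function ogMetaTags(origin: string, title: string, description: string): string[] {
  const esc = (s: string) =>
    s.replace(/&/g, "&amp;").replace(/"/g, "&quot;").replace(/</g, "&lt;");
  return [
    '<meta property="og:title" content="' + esc(title) + '">',
    '<meta property="og:description" content="' + esc(description) + '">',
    '<meta property="og:image" content="' + esc(ogImageUrl(origin, title)) + '">',
  ];
}
```

Deriving the image URL from the title means every programmatic page gets an og:image without any per-page work.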
Useful resources
- IndexNow protocol: indexnow.org/documentation
- Google Search Console: search.google.com/search-console
- Bing Webmaster Tools: bing.com/webmasters
- Yandex Webmaster: webmaster.yandex.com
- Schema.org validator: validator.schema.org
- BotWave per-engine reference: /search-engines
Apply these nine steps in order, and your bot platform will be discoverable across Google, the Bing umbrella, and every AI answer engine in 2026.