Make sure your example links point to pages with actual content. Pages may have been removed and now return 404 errors, which often happens with old bookmarks.

We cannot currently crawl PDF or DOC files, so only links to plain HTML pages can be submitted as examples.
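Before submitting examples, you may want to check them in bulk. The sketch below is a hypothetical helper, not part of the crawler itself. It assumes a page is usable when it returns HTTP 200 with an HTML `Content-Type`, and treats 404s, PDF/DOC responses and unreachable URLs as unusable. The function names `is_usable_example` and `check_example_url` are illustrative only.

```python
from urllib.request import Request, urlopen

# MIME types accepted as "plain HTML" (an assumption for this sketch).
HTML_TYPES = ("text/html", "application/xhtml+xml")


def is_usable_example(status: int, content_type: str) -> bool:
    """Return True if a response looks like a live plain HTML page."""
    # Anything other than 200 (e.g. 404 for a removed page) is rejected.
    if status != 200:
        return False
    # Drop parameters such as "; charset=utf-8" and compare the bare MIME type,
    # so PDF ("application/pdf") and DOC ("application/msword") are rejected.
    mime = content_type.split(";", 1)[0].strip().lower()
    return mime in HTML_TYPES


def check_example_url(url: str, timeout: float = 10.0) -> bool:
    """Fetch a URL and report whether it is usable as an example page."""
    try:
        request = Request(url, headers={"User-Agent": "example-link-checker"})
        with urlopen(request, timeout=timeout) as response:
            content_type = response.headers.get("Content-Type", "")
            return is_usable_example(response.status, content_type)
    except ValueError:
        # Malformed URL (e.g. missing "http://").
        return False
    except OSError:
        # Covers HTTPError (404 and other 4xx/5xx), URLError, DNS failures
        # and timeouts, since all of these derive from OSError.
        return False
```

For example, `check_example_url("https://example.com/old-bookmark")` returns `False` if that page now answers with a 404.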


If the detected on-topic pages are not on the topic you are looking for, provide more examples. If you are targeting specific entities or locations, you should first read more about machine learning.

If the problem persists, adjust your crawl settings as follows:


  1. Providing more example pages usually helps. A few examples are enough for very specific, unambiguous topics; 30 examples are enough for many topics; 50 should do the job for most topics; and around 200 are needed for loosely defined topics. Most public crawl setups with full access have about 50-200 examples, even when the topics are well defined. Public topics with an overview only were generated mostly from Quick SEO examples and may be ambiguous (the same term describing different things, e.g. "Trial").
  2. If this does not yield satisfactory results, edit the forbidden sites, forbidden atext patterns, and forbidden href patterns.
  3. As a last resort, specify content phrases. This can make a crawl very inefficient if the steps above have not been done properly! This matters especially for geolocation targeting: forbid the few most common top-level domains, countries, and cities (those that appear at the top of the 'Most relevant domains' and 'Keyword ideas' tools) in the forbidden sites, forbidden href patterns, and forbidden atext patterns settings. This simple "brute force" approach can produce MUCH better crawl results.
  4. Once a crawl has been running for a while, you will start seeing the most relevant domains. This is the best place to detect spam sites that occasionally beat our algorithms and reach the top. Click the 'pages from domain' link next to the domain stats of a suspicious site; if the pages look spammy, click 'REMOVE DOMAIN' in the 'DOMAIN LINKS' submenu.
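To illustrate how the three "forbidden" settings in steps 2 and 3 could act on a candidate link, here is a minimal sketch. It is an assumption, not the crawler's actual matching logic: sites are matched as a host or any of its subdomains (so forbidding `de` blocks every `.de` host), while href and atext patterns are matched as case-insensitive substrings. `LinkFilters` and `is_link_allowed` are hypothetical names.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class LinkFilters:
    # Hypothetical stand-ins for the "forbidden sites", "forbidden href
    # patterns" and "forbidden atext patterns" settings.
    forbidden_sites: list[str] = field(default_factory=list)
    forbidden_href_patterns: list[str] = field(default_factory=list)
    forbidden_atext_patterns: list[str] = field(default_factory=list)


def site_is_forbidden(host: str, site: str) -> bool:
    """Match a host against a forbidden site or top-level domain, including subdomains."""
    host, site = host.lower(), site.lower().lstrip(".")
    return host == site or host.endswith("." + site)


def is_link_allowed(href: str, anchor_text: str, filters: LinkFilters) -> bool:
    """Return False if the link hits any forbidden site, href pattern or anchor-text pattern."""
    host = urlparse(href).hostname or ""
    if any(site_is_forbidden(host, s) for s in filters.forbidden_sites):
        return False
    href_lower = href.lower()
    if any(p.lower() in href_lower for p in filters.forbidden_href_patterns):
        return False
    # "atext" is taken here to mean the link's anchor text.
    text_lower = anchor_text.lower()
    return not any(p.lower() in text_lower for p in filters.forbidden_atext_patterns)
```

With `LinkFilters(forbidden_sites=["de"], forbidden_atext_patterns=["munich"])`, a link to `https://www.example.de/` or one labelled "Hotels in Munich" is rejected, which mirrors the "brute force" geolocation advice in step 3.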

A crawl may stop before onTopicPages is reached if its settings are detected to be anomalous (too low a percentage of detected on-topic pages).
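The stopping rule can be pictured as a simple ratio check. This is only a sketch: the real threshold and the point at which the check starts are not documented here, so `min_ratio=0.05` and `min_sample=1000` are hypothetical values, and `on_topic_target` plays the role of onTopicPages.

```python
def crawl_status(on_topic: int, crawled: int, on_topic_target: int,
                 min_ratio: float = 0.05, min_sample: int = 1000) -> str:
    """Classify crawl progress as 'done', 'anomalous' or 'running'.

    min_ratio and min_sample are hypothetical thresholds for illustration.
    """
    # The normal end condition: enough on-topic pages have been found.
    if on_topic >= on_topic_target:
        return "done"
    # Judge the on-topic percentage only once the sample is large enough;
    # a persistently low ratio suggests the crawl settings are off.
    if crawled >= min_sample and on_topic / crawled < min_ratio:
        return "anomalous"
    return "running"
```

Under these assumed values, 20 on-topic pages out of 2,000 crawled (1%) would stop the crawl as anomalous, while 300 out of 2,000 (15%) would let it continue.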

If you find the above process too complicated or time-consuming, you can have an expert set up your crawl.

