Help

CHECK THE NUMBERS

If onTopicPages < processedPages / 10 at the beginning of a crawl, while the seeds are being processed (and assuming the seeds are on topic), something is wrong.

If you specified content phrases, a lower ratio is acceptable; in that case something is wrong only if onTopicPages < processedPages / 100.

In that case, STOP the crawl, DELETE RESULTS, and START the crawl again.
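
The rule of thumb above can be sketched as a small check. This is only an illustration: the function name and parameters are hypothetical and not part of the crawler, and the only values taken from this page are the 1/10 and 1/100 thresholds.

```python
def crawl_looks_healthy(on_topic_pages: int,
                        processed_pages: int,
                        uses_content_phrases: bool = False) -> bool:
    """Return False when the on-topic ratio is below the rule-of-thumb threshold.

    Hypothetical helper: the threshold is 1/10 by default and 1/100
    when content phrases are specified, as described on this page.
    """
    if processed_pages <= 0:
        # Nothing has been processed yet, so there is nothing to judge.
        return True
    divisor = 100 if uses_content_phrases else 10
    # Integer form of "onTopicPages < processedPages / divisor",
    # written as a multiplication to avoid float division.
    return on_topic_pages * divisor >= processed_pages


# 5 on-topic pages out of 100 processed, no content phrases: too low.
print(crawl_looks_healthy(5, 100))        # False
# The same numbers with content phrases specified: still acceptable.
print(crawl_looks_healthy(5, 100, True))  # True
```

If the check returns False while the seeds are still being processed, that is the situation in which to stop the crawl and start over.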

CHECK LATEST PAGES

If the detected on-topic pages are not on the topic you are looking for, you need to provide more examples. If you are targeting specific entities or locations, you need to read more about machine learning.

If the above situation repeats, fix your crawl settings.

RECIPE FOR GOOD CRAWL SETUP

  1. Providing more example pages usually helps. A few examples are enough for very specific, unambiguous topics; 30 examples are enough for many topics; 50 should do the job for most topics; and 200 may be needed for topics that are not clearly defined (most public crawl setups have ~50-200 examples, even when the topics are well defined).
  2. If the above change does not yield satisfactory results, edit the forbidden sites, forbidden atext patterns, and forbidden href patterns.
  3. As a last resort, specify content phrases. This can make a crawl very inefficient if the steps above are not done properly! Especially if you want geolocation targeting, you need to forbid a few of the most common top-level domains, countries, and cities (those that appear at the top of the 'Most relevant domains' and 'Keyword ideas' tools) in the forbidden sites, forbidden href patterns, and forbidden atext patterns settings! This simple 'brute force' solution can produce MUCH better crawl results.
  4. Once a crawl has been running for a while, you will start seeing the most relevant domains. This is the best place to detect any spam sites that may beat our algorithms and get to the top, which happens occasionally. Click on the 'pages from domain' link next to the domain stats of a suspicious site; then, if the pages look spammy, click on 'REMOVE DOMAIN' in the 'DOMAIN LINKS' submenu.
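
To make the three kinds of forbidden settings from steps 2 and 3 more concrete, here is a minimal sketch of how a link could be tested against them. It is not the crawler's actual implementation: the function name is hypothetical, and the matching rules (host suffix matching for forbidden sites, case-insensitive substring matching for href and atext patterns) are assumptions made for illustration only.

```python
from urllib.parse import urlparse


def is_link_allowed(href, anchor_text,
                    forbidden_sites,
                    forbidden_href_patterns,
                    forbidden_atext_patterns):
    """Hypothetical filter: return False if any forbidden setting matches the link."""
    host = (urlparse(href).hostname or "").lower()

    # Assumption: a forbidden site blocks the host itself, any subdomain,
    # or a whole top-level domain (for example "de" blocks "shop.example.de").
    for site in forbidden_sites:
        site = site.lower().lstrip(".")
        if site and (host == site or host.endswith("." + site)):
            return False

    # Assumption: href patterns are plain substrings of the link URL.
    href_lower = href.lower()
    if any(p and p.lower() in href_lower for p in forbidden_href_patterns):
        return False

    # Assumption: atext patterns are plain substrings of the link's anchor text.
    text_lower = anchor_text.lower()
    if any(p and p.lower() in text_lower for p in forbidden_atext_patterns):
        return False

    return True


# Example: a crawl that should stay away from German sites and Berlin pages.
print(is_link_allowed("http://shop.example.de/page", "Buy now",
                      ["de"], ["/berlin"], ["berlin"]))
```

In this sketch a single match in any of the three lists is enough to reject a link, which is why forbidding a few very common domains, countries, and cities can change the results so much.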

A crawl may stop before the onTopicPages limit is reached if the crawl settings are detected to be anomalous (too low a percentage of detected on-topic pages).

If you find the above process too complicated or time-consuming, you can have an expert set up your crawl.

© 2016 semanticjuice.com