Download
You can download the data from your crawls and use it in your own data analysis.
| Table | Fields | Description |
| --- | --- | --- |
| domains | domain, link count, visit count, page count, relevance, popularity | Distinct domains found in parsed URLs. Our algorithms are tuned to rapidly discover the most relevant and popular domains in the niche. Our combined metric, which depends on many signals, ranks domains close to their final positions very early in the crawl. These positions change as crawl statistics are updated, but the ranking is quite robust for well-connected topics, regardless of the initial seeds provided. |
| urls | id, url, urlhash, score, pagerank, referring domains, external backlinks, lang, filtered, near_duplicate, wrong_lang, not_on_topic, error, on_topic, last_visit | All likely relevant URLs parsed on visited pages. After being visited, they are further marked as on_topic, not_on_topic, error, etc. The better the crawl settings, the lower the percentage of false positives, and the more useful the results of link analysis tools. URLs marked as filtered are those that were blocked by forbidden sites, forbidden href patterns, or forbidden atext patterns. |
| graph | parentid, childid, atext, nofollow, semantic flow™ | Graph edges of the crawled web, where each parentid and childid refers to an id in the urls table. Numerous experiments show that the nofollow attribute is obsolete, but we provide it because some users ask for it. |
| topic pages | url, urlhash, relevance, hubness, title, microdata, lang, links, rel links, new rel links, external links, external domains, referring domains, external backlinks, main image url, last_change | Detected on-topic pages from visited URLs. For seed-only crawls, hubness is a useful measure of the proportion of new relevant links on the page since the last crawl. |
| topic pages content | url, content, plaintext, summary | We try to extract the main content of each page in both HTML and plain-text format. The full HTML of on-topic pages can also be provided if you need more than the core content of web pages. We also provide an experimental summary for English-language pages, or the most important sentences for pages in other languages. |
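
As an illustration of how the graph table relates to the urls table, the sketch below resolves each edge's parentid and childid to the URLs they refer to. It assumes the tables are available as CSV text with a header row whose column names match the field names above; the export format is not specified here, so treat the sample data, `load_rows`, and `resolve_edges` as hypothetical names made up for this example.

```python
import csv
import io

# Hypothetical sample data. A real export's file format, column set,
# and column order may differ from what is shown here.
URLS_CSV = """id,url,on_topic
1,http://example.com/,1
2,http://example.com/a,1
3,http://example.org/b,0
"""

GRAPH_CSV = """parentid,childid,atext,nofollow
1,2,Page A,0
1,3,Page B,1
2,3,see also,0
"""


def load_rows(text):
    """Parse CSV text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def resolve_edges(url_rows, edge_rows):
    """Replace parentid/childid with the URLs they refer to in the urls table."""
    url_by_id = {row["id"]: row["url"] for row in url_rows}
    resolved = []
    for edge in edge_rows:
        parent = url_by_id.get(edge["parentid"])
        child = url_by_id.get(edge["childid"])
        if parent is None or child is None:
            continue  # skip edges whose endpoints are missing from the urls data
        resolved.append((parent, child, edge["atext"]))
    return resolved


edges = resolve_edges(load_rows(URLS_CSV), load_rows(GRAPH_CSV))
for parent, child, atext in edges:
    print(f"{parent} -> {child} [{atext}]")
```

The ids are compared as strings because that is what `csv.DictReader` yields; if your download uses a different format or numeric ids, adjust the lookup accordingly.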