Download

You can download the data from your crawls and use it for your custom data analysis.

Table: domains
Fields: domain, link count, visit count, page count, relevance, popularity
Description: Distinct domains found in parsed URLs. Our algorithms are tuned to rapidly discover the most relevant and popular domains in the niche. Our combined metrics, which depend on many signals, place domains close to their correct positions very early in the crawl. These positions change as crawl statistics are updated, but the ranking is quite robust for well-connected topics, regardless of the initial seeds provided.

Table: urls
Fields: id, url, urlhash, score, pagerank, referring domains, external backlinks, lang, filtered, near_duplicate, wrong_lang, not_on_topic, error, on_topic, last_visit
Description: All likely relevant URLs parsed on visited pages. After being visited, they are further marked as on_topic, not_on_topic, error, etc. The better the crawl settings, the lower the percentage of false positives and the more useful the results of link analysis tools. Filtered URLs are those blocked by forbidden sites, forbidden href patterns, or forbidden atext patterns.

Table: graph
Fields: parentid, childid, atext, nofollow, semantic flow™
Description: Graph edges of the crawled web, where each parentid or childid refers to an id in the urls table. Numerous experiments show that the nofollow attribute is obsolete, but we provide it because some users ask for it.

Table: topic pages
Fields: url, urlhash, relevance, hubness, title, microdata, lang, links, rel links, new rel links, external links, external domains, referring domains, external backlinks, last_change
Description: Detected on-topic pages among visited URLs. For seed-only crawls, hubness is a useful measure of the proportion of new relevant links on the page since the last crawl.

Table: topic pages content
Fields: url, content
Description: The core content extracted from on-topic pages. To get the full HTML, use the URL provided.
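
As an illustration of how the graph and urls tables relate, the following minimal sketch resolves each edge's parentid and childid to the corresponding URL. It assumes the tables were exported as CSV with header columns named after the fields listed above; the actual download format and column names may differ, and the sample rows and the helper names load_rows and resolve_edges are hypothetical.

```python
import csv
import io

# Hypothetical sample exports. The CSV format and the exact column names
# are assumptions made for illustration; adjust them to the real download.
URLS_CSV = """id,url
1,http://example.com/
2,http://example.org/a
3,http://example.net/b
"""

GRAPH_CSV = """parentid,childid,atext,nofollow
1,2,read more,0
1,3,related topic,1
3,1,home,0
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def resolve_edges(url_rows, graph_rows):
    """Replace parentid/childid with the URLs they point to in the urls table."""
    url_by_id = {row["id"]: row["url"] for row in url_rows}
    edges = []
    for edge in graph_rows:
        parent = url_by_id.get(edge["parentid"])
        child = url_by_id.get(edge["childid"])
        if parent is None or child is None:
            continue  # skip edges whose endpoints are missing from urls
        edges.append((parent, child, edge["atext"]))
    return edges


edges = resolve_edges(load_rows(URLS_CSV), load_rows(GRAPH_CSV))
for parent, child, atext in edges:
    print(f"{parent} -> {child} [{atext}]")
```

For a real export, the same function could be fed rows read from files with csv.DictReader instead of the inline sample strings.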

© 2016 semanticjuice.com