Human markup vs SemanticJuice content extraction algorithm:
(hundreds of different websites)
https://www.semanticjuice.com/rd/data-robert/cleaneval-final/raw/599.html
Examples provided by Tomaž Kovačič.
Semantic Juice.