This article was originally published on 1st May 2009
Almost all businesses with a large online presence and many
listings are targets of systematic data theft – commonly
known as scraping, web harvesting or web data extraction. During
these attacks, scrapers systematically steal large amounts of
information from the company's website, in clear breach of its
terms and conditions, and use it, for example, to boost a competing
business.
Data scraping, or data harvesting, is the theft of property from
websites, and it has been going on for as long as companies have been
publishing data and images on the web. Today it is done on an
industrial scale, as some entities believe it is easier to steal data
than to create it. There are dozens of commercially available packages
that offer the tools and anonymity needed to scrape.
A system called ASSASSIN, developed by Sentor, can detect and block
scraping and data theft around the clock, in real time. The
ASSASSIN anti-scraping system is an expert system that unobtrusively
analyses website traffic and requests. By analysing usage and
traffic patterns, it scores each request and concludes whether it
was made by a human or a web robot. It can detect scrapers that use
anonymous proxy services or large numbers of open proxy servers to
avoid detection. The system retains all forensic data, which gives
the site owner the choice to block scrapers, warn them off or
prosecute the perpetrators. A white paper on the Yell.com ASSASSIN
case study is available at http://www.sentor.se/en/ASSASSIN_case_study.pdf
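The article does not describe how ASSASSIN actually scores requests, so the following is only a minimal, hypothetical sketch of the general idea of scoring traffic per client to separate humans from robots. It assumes two simple signals, request volume within a sliding time window and the average gap between requests, with made-up thresholds (WINDOW_SECONDS, MAX_HUMAN_REQUESTS, MIN_HUMAN_INTERVAL) and hypothetical names (RequestScorer, classify). It is not Sentor's method, and a per-client counter like this is exactly what scrapers evade by spreading requests across many proxies.

```python
from collections import defaultdict, deque

# Hypothetical thresholds for illustration only; not taken from ASSASSIN.
WINDOW_SECONDS = 60.0     # length of the sliding window, in seconds
MAX_HUMAN_REQUESTS = 30   # more requests than this per window looks automated
MIN_HUMAN_INTERVAL = 1.0  # a mean gap (seconds) below this looks automated


class RequestScorer:
    """Toy scorer: a higher score means more robot-like traffic from one client."""

    def __init__(self):
        # Client identifier (e.g. an IP address) -> timestamps of recent requests.
        self.history = defaultdict(deque)

    def record(self, client, timestamp):
        """Record one request and return the client's current score (0 to 2)."""
        times = self.history[client]
        times.append(timestamp)
        # Discard requests that have fallen out of the sliding window.
        while times and timestamp - times[0] > WINDOW_SECONDS:
            times.popleft()

        score = 0
        # Signal 1: too many requests inside the window.
        if len(times) > MAX_HUMAN_REQUESTS:
            score += 1
        # Signal 2: requests arrive faster, on average, than a human would browse.
        if len(times) > 1:
            mean_gap = (times[-1] - times[0]) / (len(times) - 1)
            if mean_gap < MIN_HUMAN_INTERVAL:
                score += 1
        return score


def classify(score, threshold=2):
    """Label a client as 'robot' once its score reaches the threshold."""
    return "robot" if score >= threshold else "human"


if __name__ == "__main__":
    scorer = RequestScorer()
    # A client fetching a page every 0.2 s for about 10 s (50 requests).
    for i in range(50):
        bot_score = scorer.record("203.0.113.7", i * 0.2)
    print(classify(bot_score))  # prints "robot"

    # A client reading one page roughly every 10 s.
    for t in (0, 10, 20, 30, 40):
        human_score = scorer.record("198.51.100.23", t)
    print(classify(human_score))  # prints "human"
```

A real system would combine many more signals (navigation order, headers, proxy reputation) and weight them, rather than adding two fixed rules as this sketch does.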
The content of this article is intended to provide a general guide to the subject matter. Specialist advice should be sought about your specific circumstances.