The Commercial Landscape of Web Scraping

"Web scraping," also known as "web data extraction" or "web harvesting," is the process of extracting data from websites using automated software, known as "bots" or "spiders." According to Distil Networks' Economics of Web Scraping Report, web scraping is prevalent, generating up to 46% of all web traffic. Approximately 38% of web scrapers use this technology to obtain content, primarily targeting websites in real estate, digital publishing, travel, online directories, e-commerce, marketplaces, and classifieds.

The value of web scraping is based on an informal quid pro quo between website owners, web scrapers, and web users. Aggregation websites, such as hotel booking and ticket-selling websites, offer their users the ability to leverage the disparate resources available on the internet. By employing web scraping techniques, aggregation websites extract information from various websites (including government websites) and consolidate that information in a single place for their patrons' ease of use. This collection of information drives traffic to the aggregator, potentially increasing its advertising revenue, brand recognition, and user-generated fees. In exchange for the data, aggregation websites often send traffic to the scraped website itself, thereby increasing that website's audience and potential revenue.

Despite benefits such as increased traffic and revenue, some website owners find web scraping ultimately harmful to their carefully crafted internet presence. Web scrapers may infringe a website owner's copyrights or trademarks, which can spur legal challenges and damage the website's brand. Web scraping can also slow down a website owner's servers and increase webpage load times, negatively impacting user experience and the website's revenue stream. Consequently, it is common for website owners to prohibit scraping in their terms of service and to sue web scrapers for, among other claims, violations of the federal Computer Fraud and Abuse Act (CFAA) and analogous state laws. In response, web scrapers typically counterclaim, alleging violations of antitrust and unfair competition laws.

Computer Fraud and Abuse Claims

To address the growing problem of computer hacking, in 1984 Congress passed the Computer Fraud and Abuse Act, creating criminal and civil liability for a party who accesses a computer without authorization or in a manner exceeding their authorization. To prevail on a civil CFAA claim, a plaintiff must demonstrate that a defendant "intentionally accesse[d] a computer without authorization or exceed[ed] authorized access, and thereby obtain[ed] . . . information from any protected computer;" or that the defendant "knowingly cause[d] the transmission of a program . . . and . . . cause[d] damage without authorization to a protected computer." 18 U.S.C. §§ 1030(a)(2)(C), 1030(a)(5)(A) (2008). To proceed on a civil claim under the CFAA, a plaintiff must also allege, as a threshold matter, that the defendant's unauthorized access caused at least $5,000 in loss or damage during a one-year period. 18 U.S.C. § 1030(c)(4)(A)(i)(I) (2008). While courts have typically applied the CFAA in a manner that broadly protects a website's publicly available data against third-party web scrapers, they have also articulated various standards for determining whether a web scraper accessed a website without authorization or exceeded authorized access in violation of the CFAA.

In some jurisdictions, merely breaching a website's terms of use can potentially expose a web scraper to liability under the CFAA. See EF Cultural Travel BV v. Zefer Corp., 318 F.3d 58, 62 (1st Cir. 2003) ("[a] lack of authorization could be established by an explicit statement on the website restricting access," giving rise to a CFAA violation); see also EarthCam, Inc. v. OxBlue Corp., 703 Fed. Appx. 803, 808 (11th Cir. 2017) (suggesting that "a person exceeds authorized access if he or she uses the access in a way that contravenes any policy or term of use governing the computer in question."); see also CollegeSource, Inc. v. AcademyOne, Inc., 597 Fed. Appx. 116, 130 (3d Cir. 2015) (suggesting that defendants can be liable under the CFAA if they "breach any technological barrier or contractual term of use."). This is illustrated in Southwest Airlines Co. v. Farechase, Inc., where the District Court for the Northern District of Texas denied a web scraper's motion to dismiss CFAA claims, reasoning that the scraper accessed Southwest's website without authorization because the terms of use prohibiting the use of web scraping technology were accessible from all pages on the website. See Southwest Airlines Co. v. Farechase, Inc., 318 F. Supp. 2d 435, 439–440 (N.D. Tex. Mar. 19, 2004). However, a website owner who simply maintains terms of use on their website will not necessarily find recourse under the CFAA. Certain jurisdictions may require that the terms of use be reasonably accessible to users in order to provide meaningful protection for websites. In Cvent, Inc. v. Eventbrite, Inc., the United States District Court for the Eastern District of Virginia granted a web scraper's motion to dismiss the website owner's CFAA claim because, among other reasons, the terms of use prohibiting third-party access were buried at the bottom of the first page, in extremely fine print, and situated among many other links. See Cvent, Inc. v. Eventbrite, Inc., 739 F. Supp. 2d 927, 932–933 (E.D. Va. Sept. 15, 2010).

In other jurisdictions, courts have taken the position that a violation of a website's terms of use, without more, cannot establish liability under the CFAA. In Craigslist, Inc. v. 3Taps, Inc., for example, the District Court for the Northern District of California denied 3Taps's motion to dismiss Craigslist's CFAA claim. See Craigslist Inc. v. 3Taps Inc., 942 F. Supp. 2d 962 (N.D. Cal. 2013). The court found that because 3Taps continued to use Craigslist's website after Craigslist revoked 3Taps's access through cease-and-desist letters and IP blocking techniques, 3Taps's conduct constituted unauthorized access under the CFAA. Id. at 969–970. Similarly, in Facebook, Inc. v. Power Ventures, Inc., the U.S. Court of Appeals for the Ninth Circuit, in affirming the lower court decision, held that Power Ventures violated the CFAA by continuing to access Facebook's computers after being presented with a cease-and-desist letter and IP blocking measures. Facebook, Inc. v. Power Ventures, Inc. et al., 844 F.3d 1058 (9th Cir. 2016). By contrast, in Ticketmaster L.L.C. v. Prestige Entertainment, Inc. et al., the District Court for the Central District of California ruled in favor of a web scraper, granting Prestige's motion to dismiss Ticketmaster's claims under the CFAA on the reasoning that a cease-and-desist letter, without more, was insufficient to revoke Prestige's authorization to purchase large quantities of tickets with bots. See Ticketmaster L.L.C. v. Prestige Entertainment, Inc. et al., No. 17-cv-07232, 2018 WL 654410 (C.D. Cal. Jan. 31, 2018). None of these cases, however, directly addressed whether the CFAA reaches the collection of information that is generally available to the public on a website.

In 2017, however, the District Court for the Northern District of California, in a ruling favorable to web scrapers, addressed the applicability of the CFAA to the scraping of publicly available information, thereby adding further uncertainty to website owners' ability to seek recourse under the CFAA against web scrapers. In hiQ Labs, Inc. v. LinkedIn Corp., the court granted hiQ's motion for a preliminary injunction prohibiting LinkedIn from using electronic blocking techniques to prevent hiQ from scraping information from public LinkedIn profiles. See hiQ Labs, Inc. v. LinkedIn Corp., 273 F. Supp. 3d 1099 (N.D. Cal. 2017). The court ruled that the injunction favoring hiQ was proper because, among other considerations, hiQ "raised serious questions as to applicability of the CFAA to its [web scraping of LinkedIn's public profile information]," namely that the CFAA was not enacted to prevent access to publicly viewable data not protected by an authentication gateway. Id. at 1113–1114. The court reasoned that its ruling was true to the legislative intent of the CFAA, stating that the application of the CFAA to publicly available website content "would have sweeping consequences well beyond anything Congress could have contemplated; it would 'expand its scope well beyond computer hacking.'" Id. at 1110. In distinguishing hiQ's web scraping activities from those at issue in the Facebook case, the court noted that the Facebook defendants scraped private data protected by authorization techniques (e.g., password protection or a paywall), whereas hiQ accessed and scraped only public data that was left unprotected. Id. at 1109. The court therefore ruled that LinkedIn could not rely on the CFAA to limit hiQ's access to LinkedIn's public profiles or any other content open to the public, because hiQ did not "access [LinkedIn's servers] 'without authorization', even in the face of technical countermeasures, when the data it accesses is otherwise open to the public." Id. at 1113. LinkedIn has since appealed the ruling to the U.S. Court of Appeals for the Ninth Circuit.

If the Ninth Circuit affirms the decision in hiQ Labs v. LinkedIn Corp., website owners may be precluded from bringing claims under the CFAA against web scrapers that mine publicly available data on the internet. Until courts resolve these legal issues, website owners should instead consider relying on more stringent authorization standards and defensive technology to hamstring web scraping activities.

Antitrust and Unfair Competition Laws

Proponents of web scraping often express concern that decisions permitting website owners to prohibit and seek remedies for web scraping under the CFAA are anticompetitive. For example, in response to the Craigslist v. 3Taps decision, Professor Eric Goldman of the Santa Clara University School of Law stated that the "ruling is not only bad for consumers, but is bad for Internet Law--in the sense that Craigslist is creating legal precedent that other websites can use in the future for anticompetitive/anti-consumer purposes." See Eric Goldman, Craigslist Anti-Consumer Lawsuit Threatens to Break Internet Law, FORBES (May 23, 2013, 11:50 AM). Yet earlier courts generally rejected web scrapers' antitrust and unfair competition claims. Departing from its predecessors, the hiQ court issued a ruling favorable to web scrapers and their proponents on this issue. HiQ argued that LinkedIn blocked its access to member data in order to monetize the data for itself with a competing product, constituting "unfair" competition under California's Unfair Competition Law ("UCL"), Cal. Bus. & Prof. Code § 17200 et seq. The court agreed with hiQ's contention that LinkedIn's conduct "violate[d] the spirit of the antitrust laws" (and was therefore anticompetitive) in two ways: first, LinkedIn was leveraging its dominance in online professional networking to gain an anticompetitive advantage over hiQ in the data analytics market; second, LinkedIn's conduct violated the "essential facilities" doctrine by precluding access to its member data, which is the lifeblood of hiQ's business. hiQ Labs Inc., 273 F. Supp. 3d at 1117. Moreover, the district court in hiQ was not persuaded by LinkedIn's argument that it acted primarily out of concern for member privacy and not to secure exclusive control over the data collected from its members. Id. at 1118. The court found that LinkedIn's practice of making user data available to other third parties undermined this argument. Id. In so holding, hiQ illuminates another path by which web scrapers may potentially challenge website owners' data access restrictions. This issue is also before the Ninth Circuit.

Takeaways

Recent decisions could signal a shift in web scrapers' potential liability under the CFAA. Providing notice and implementing IP address blocking against web scrapers may no longer be effective ways for a website owner to restrict web scrapers' access to publicly available information. Moreover, claims brought against web scrapers under the CFAA may open website owners to liability under unfair competition laws. This risk may be particularly acute for websites collecting and maintaining troves of data on a large user base. Web scraping companies should, however, tread cautiously, following the potential roadmap that recent case law offers for shielding themselves from CFAA liability. While these decisions are pending on appeal, they inject uncertainty into the current legal landscape and leave the current symbiosis between website owners and web scrapers in limbo.
