Big Data: Legal Challenges (Full Report)

Analysis of Big Data is characterised by the use of real-time information and very large sets of information from disparate sources. Much of the relevant data is unstructured or only semi-structured, and will often lack originality, and even meaning, without the work of the data analyst to extract insights. On the one hand we have raw data with little form and little meaning; on the other, immense value when that data is combined with other data sources and advanced techniques of evaluation. It is entirely possible that evaluation will also produce a new structured data set which contains the real insights and which, although highly valuable, may be simply expressed.

The dichotomy between ideas or information itself and the form of expression of those ideas has always been a theme of copyright law. That distinction comes to the fore in the context of data and databases. Originality in "sweat of the brow" jurisdictions (including New Zealand) has typically been measured by the labour and expense involved in collecting the information recorded in a work. Labour was more significant than creative or intellectual effort. With data, the risk is that it will often be inexpensive and easy to collect vast quantities of valuable data, putting the availability of copyright protection in doubt.

In Australia, the High Court has clearly signalled a departure from the "sweat of the brow" approach, emphasising the need for originality in the arrangement of the relevant data and treating the preparatory work in creating such data as irrelevant for copyright purposes. Combined with the focus on authorship by a human author shown by the Full Federal Court in Phone Directories, this means that copyright in Australia is no longer particularly relevant for corporate big data assets. The collection of big data in raw form (often unstructured or semi-structured) is likely to lack effort of a literary nature. The data also often lacks any meaning absent some effort from the data analyst, and the "author" is usually a machine or computer programmed to collect information automatically. Big Data is seldom the province of significant human effort and there will rarely be a particular person who could be said to be an author for copyright purposes.

Similarly, in the United States, copyright will assist in protecting only the most carefully selected data sets, given the requirement to show at least the "modicum of creativity necessary to transform mere selection into copyrightable expression" (Feist).

The sui generis database right of the EU perhaps provides a greater prospect of protection for big data. It affords protection to databases in which there has been "a substantial investment in either the obtaining, verification or presentation of the contents". There is therefore no requirement with respect to originality of the database. The cases which have primarily exercised the European Courts in relation to the database right have focussed on the distinction between investment in the creation of the data and investment in its collection. Organisations which have effectively collected data at the same time as it was created have not fared well in obtaining database protection.

However, this limitation is of less relevance to Big Data, which will rarely be created in any sense by those collecting it. It may well be that the European database right is much better adapted as a mechanism to protect large volumes of Big Data, particularly where the effort is focused on obtaining or verifying data.

Protection for confidential information is also likely to remain one of the more valuable remedies for organisations seeking to stop access to proprietary collections of data. The remedy will be of most use where the information is not publicly known, or at least where the nature of the collation or the insights available in relation to the data set are not known. Published facts, on the other hand, will lack the necessary quality of confidence without some further and confidential synergistic effect, or confidentiality arising from the mere fact of their inclusion in the database. A mere non-selective list of publicly available information is unlikely to receive protection under this head, even if its collation involves some time and effort.

Where the issue is not the exclusion of others from Big Data assets, but rather careful governance of the basis on which those assets can be used, contract is king, the most heavily used mechanism being the licence agreement. The real deficiency of the licence agreement is that the rights are not enforceable against a non-party to the contract who gains unauthorised access to the data. Provided the contracting party is required to keep the data confidential, any unauthorised disclosure to a third party may give rise to rights in confidentiality as discussed above.

However, publicly accessible data perhaps poses the most significant problems for data owners, who must show that contracts are binding on web spiders and bots which cannot read and are not required to indicate agreement to website terms and conditions before accessing data. The information "scraping" cases illustrate the creativity required of lawyers to find remedies for the protection of publicly available information such as airline ticket information and eBay auction listings.

Privacy compliance is an increasingly important issue for all big data assets, as attempts to anonymise data are viewed with scepticism. Compliance obligations frame all parts of the data lifecycle from notifications at the point of collection, appropriate measures to secure data, through to limitation of use only for disclosed purposes. These requirements pose particular challenges for the big data collector who cannot foresee the full range of future uses for the data. Particular care is needed in licensing data that contains personally identifiable information and before outsourcing storage or processing to an offshore cloud. Compliance in the privacy space will also dictate the time and manner of deletion of data when requested by an individual or where the purpose for the collection has been fulfilled. Mandatory data breach laws, penalties and codes of appropriate conduct around behavioural targeting are all issues on the horizon for the organisation managing its data assets.

In conclusion, the promise and potential of big data need to be matched by a considered approach to collection, storage, licensing and use.

Without a well-thought-through data strategy, remedies for misuse may be hard to find.

Traditional copyright protection is unlikely to assist, and contract and confidential information remedies are likely to be far more significant.

If competitors benefit from observing publicly available data, remedies may be particularly difficult to find. As a result, the battle against scrapers of data will often be more a battle of technologies (such as IP address blocking) than a successful assertion of legal remedies.

The content of this article is intended to provide a general guide to the subject matter. Specialist advice should be sought about your specific circumstances.