pseudonymization | Notes from the Trenches

Before reflexively rejecting a vendor/provider’s aggregate data clause, determine whether pushing back is really necessary.

More than ever before, data is the driver of business. Companies are inundated with new data on a daily basis, which creates a number of business challenges. One of the more prominent challenges of late has been how best to protect data within a company’s infrastructure from inadvertent and improper access and disclosure. Another important challenge is how best to “mine” data sets through data analytics, the quantitative and qualitative techniques businesses use to analyze data in order to develop business insights, conclusions, strategies, and market trend data in order to provide guidance on operational and strategic business decisions. “Aggregate data” is key to data analytics; companies take existing data, anonymize it by removing any personal or other information that can be used to identify the source of the data, and aggregate it with other anonymized data to create a new set of data on which data analytics can be performed.

The strength of the conclusions and insights learned through data analytics is directly proportional to the amount of source data used. Aggregate data comes from two primary sources: (1) internal data sets within the company’s possession or control, such as transactional data, customer data, server data, etc.; and (2) external data setssuch as free online databases of government data (e.g., US Census data) and data available from data brokers who have compiled aggregate data sets for purchase and use by businesses.

To ensure businesses have the right to use customer data in their possession for data analytics purposes, SaaS, cloud, software, and other technology agreements often contain an aggregate data clause. This clause gives a vendor/provider the right to compile, collect, and use aggregate data from customer information for the vendor/provider’s own business purposes. Many vendors/providers work hard to craft an aggregate data clause that fairly and adequately protects their data sources. Before reflexively rejecting a vendor/provider’s aggregate data clause, consider the analysis and questions in this article to determine whether pushing back is really necessary to protect your company’s interests.

The vendor/provider’s perspective

Customers often push back on aggregate data clauses for a variety of reasons, such as “it’s our policy not to give this right,” “why should you benefit from our data?” and “how can you guarantee someone won’t be able to figure out it’s us?” On the other side, a vendor or provider may argue that the aggregate data clause is a “table stakes” provision in their agreement. Under this argument, analytical data is used to generate macro-level insights which benefit both the vendor/provider and its customers, and as long as it is used in a way that does not identify a specific customer or client there is no potential harm to the customer in allowing its use for data analytics. Additionally, many vendors argue that the systems used to anonymize and aggregate data do not allow for exceptions on a per-customer basis. Additionally, vendors/providers often share insights and other conclusions drawn from data analytics with their customers and clients, e.g., through client alerts, newsletters, conferences, etc., and therefore clients benefit from allowing their data to be used in the vendor/provider’s data analytics efforts. Data analytics are often a critical part of a vendor/provider’s business plans and operations, and access to client data for analytics purposes is baked into the cost of using the service.

Is the aggregate data clause well-drafted and balanced?

Many vendors/providers take the time to craft an aggregate data clause that is fair and does not overreach. As long as the vendor/provider has protected the customer’s rights and interests in the underlying customer data, the use of a customer’s data for analytics purposes may be perfectly acceptable as a part of the overall contractual bargain between the parties. A well-drafted clause usually contains the following core provisions:

Grant of rights – A right for the vendor/provider to compile, collect, copy, modify, publish and use anonymous and aggregate data generated from or based on customer’s data and/or customer’s use of its services, for analytical and other business purposes. This is the heart of the clause. This clause gives the vendor/provider the right to combine aggregate data from multiple internal and external data sources (other customers, public data, etc.).
Protection of source data – A commitment that the customer will not be identified as the source of the aggregate data. While this is really restating that the data will be “anonymous,” some customers may want a more express commitment that the aggregate data can’t be traced back to them. I’ll talk more about this later in this article.
Scope of usage right – Language making clear either that the vendor/provider will own the aggregate data it generates (giving it the right to use it beyond the end of the customer agreement), or that its aggregate data rights take precedence over obligations with respect to the return or destruction of customer data. The common vendor/provider reason for this is that aggregate data, which cannot be used to identify the customer, is separate and distinct from customer data which remains the property (and usually the Confidential Information) of the customer under the customer agreement. Additionally, the vendor/provider often has no way to later identify and remove the aggregate data given that it has been anonymized.

Things to watch for

When reviewing an aggregate data clause, keep the following in mind:

Protection of the company’s identity. While language ensuring that a customer is not identified as the source of aggregate data works for many customers, it may not be sufficient for all. Saying a customer is not identified as the source of aggregate data (i.e., the vendor/provider will not disclose its data sources) is not the same as saying that the customer is not identifiable as the source. Consider a customer with significant market share in a given industry, or which is one of the largest customers of a vendor/provider. While the vendor/provider may not disclose its data sources (so the customer is not identified), third parties may still be able to deduce the source of the data if one company’s data forms the majority of the data set. Customers that are significant market players, or which are/may be one of a vendor’s larger clients, may want to ensure the aggregate data clause ensures the customer is not identified or identifiable as the source of the data, which puts the onus on the vendor/provider to ensure the customer’s identity is neither disclosed nor able to be deduced.

Ownership of aggregate data vs. underlying data. As long as the customer is comfortable that aggregate data generated from customer data or system usage cannot be used to identify or re-identify the customer, a customer may not have an issue with a vendor/provider treating aggregate data as separate and distinct from the customer’s data. Vendors/providers view their aggregate data set as their proprietary information and key to their data analytics efforts. However, a well-drafted aggregate data clause should not give the vendor/provider any rights to the underlying data other than to use it to generate aggregate data and data analytics.

Scope of aggregate data usage rights. There are two ways customer data can be used for analytics purposes – (1) to generate anonymized, aggregate data which is then used for data analytics purposes; or (2) to run data analytics on customer data, aggregate the results with analytics on other customer data, and ensure the resulting insights and conclusions are anonymized. Customers may be more comfortable with (1) than (2), but as long as the vendor/provider is complying with its confidentiality and security obligations under the vendor/provider agreement both data analytics approaches may be acceptable. With respect to (2), customers may want to ask whether the vendor/provider uses a third party for data analytics purposes, and if so determine whether they want to ensure the third-party provider is contractually obligated to maintain the confidentiality and security of customer data and if the vendor/provider will accept responsibility for any failure by the third party to maintain such confidentiality and security.

Use of Aggregate Data.Some customers may be uncomfortable with the idea that their data may be used indirectly through data analytics to provide a benefit to their competitors. It’s important to remember that data analytics is at a base level a community-based approach – if the whole community (e.g., all customers) allows its data be used for analytics, the insights and conclusions drawn will benefit the entire community. If this is a concern, talk to your vendor/provider about it to see how they plan to use information learned through analytics on aggregate data.

Duration of aggregate data clause usage rights. Almost every vendor/provider agreement requires that the rights to use and process customer data ends when the agreement terminates or expires. However, vendors/providers want their rights to use aggregate data to survive the termination or expiration of the agreement. A customer’s instinct may be to push back on the duration of aggregate data usage rights, arguing that the right to use aggregate data generated from the customer data should be coterminous with the customer agreement. However, if the data has truly been anonymized and aggregated, there is likely no way for a vendor/provider to reverse engineer which aggregate data came from which customer’s data. This is why many vendors/providers cannot agree to language requiring them to cease using aggregate data generated from a customer’s source data at the end of the customer relationship. One approach customers can consider is to ask vendors/providers when they consider aggregate data to be “stale” and at what point they cease using aged aggregate data, and whether they can agree to state that contractually.

Positioning an objection to the aggregate data clause. As noted earlier, the right to use data for analytics purposes is considered to be a cost of using a vendor/provider’s software or service and a “table stakes” provision for the vendor/provider, and the ability to use data for analytics purposes is already baked into the cost of the software or service. Some customers may feel this is not sufficient consideration for the right to use their data for analytics purposes. If that is the case, customers may want to consider whether to leverage an objection to the aggregate data clause as a “red herring” to obtain other concessions in the agreement (e.g., a price discount, a “give” on another contract term, or an additional service or add-on provided at no additional charge).

The GDPR view on use of aggregate data

The European Union’s new General Data Protection Regulation (GDPR), which becomes effective on May 25, 2018, makes a significant change to the ability to use personal data of EU data subjects for analytics purposes. Under the GDPR, a blanket consent for data processing purposes is no longer permitted – consent to use data must be specific and unambiguous. Unfortunately, this directly conflicts with data analytics, as the ways a data set will be analyzed may not be fully known at the time consent is obtained, and there is no right to “grandfather in” existing aggregate data sets. Simply saying the data will be used for analytics purposes is not specific enough.

Fortunately, the GDPR provides a mechanism for the continued use of aggregate data for analytics purposes without the need to obtain prior data subject consent – Pseudonymization and Data Protection by Default. Pseudonymization and data protection principles should be applied at the earliest possible point following acquisition of the data, and vendors/providers must affirmatively take data protection steps to make use of personal data

Pseudonymization – Pseudonymization is a method to separate data from the ability to link that data to an individual. This is a step beyond standard tokenization using static, or persistent, identifiers which can be used to re-link the data with the data source.
Data Protection by Default – This is a very stringent implementation of the “privacy by design” concept. Data protection should be enabled by default (e.g., an option in an app to share data with a third party should default to off).

Data analytics is an important part of every company’s “big data” strategy. Well-crafted aggregate data clauses give vendors and providers the ability to leverage as much data as possible for analytics purposes while protecting their customers. While there are reasons to push back on aggregate data clauses, they should not result in a negotiation impasse. Work with your vendors and providers to come up with language that works for both parties.

Eric Lambert has spent most of his legal career working in-house as a proactive problem-solver and business partner. He is a corporate generalist who specializes in transactional agreements, technology/software/e-commerce, privacy, marketing and practical risk management. Any opinions in this post are his own. This post does not constitute, nor should it be construed as, legal advice. He is a technophile and Internet evangelist/enthusiast. In his spare time Eric dabbles in voice-over work and implementing and integrating connected home technologies.

Notes from the Trenches

Articles, thoughts, tips, and tricks from an veteran in-house counsel

Tag Archives: pseudonymization