Exploiting Data Network Effect Securely through Data Clean Rooms

May 29, 2023

By: Sumukh Guruprasad, Associate Principal-Data Engineering

In a mall, every shop contributes to the overall footfall, and each additional shop makes the mall more valuable to the others. This is called the network effect. Similarly, in the data context, every entity that generates data adds value to the network. To exploit the data network effects in an industry, we must:

Upload data to the cloud.
Make this data available to others for analysis.
Ensure data privacy and the protection of personal information.

Today, most companies are adopting public clouds to leverage this effect.

Snowflake Data Clean Room

Snowflake has created a Cloud Data Platform for data commerce. This platform gives users access to data, whether within their own account or shared from other accounts, through:

Role-based access control
Row-level security
Column data masking
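The three mechanisms above can be illustrated with a toy Python model. This is a minimal sketch under stated assumptions, not Snowflake's API: in Snowflake itself these controls are declared in SQL as grants, row access policies, and masking policies. The role names, the `viewing_history` table, the `region` column, and the phone-masking rule are hypothetical.

```python
# Toy model only: mirrors role-based access, row-level security and column
# masking in plain Python. Roles, tables, regions and masks are hypothetical.


def mask_phone(value: str) -> str:
    """Show only the last two digits; replace the rest with '*'."""
    return "*" * max(len(value) - 2, 0) + value[-2:]


ROLE_GRANTS = {  # role-based access control: role -> tables it may read
    "ANALYST": {"viewing_history"},
    "PARTNER": {"viewing_history"},
}

ROW_POLICY = {  # row-level security: role -> regions whose rows it may see
    "ANALYST": {"US", "EU"},
    "PARTNER": {"US"},
}

MASKING_POLICIES = {  # column masking: role -> {column: masking function}
    "PARTNER": {"phone": mask_phone},
}


def read_table(role: str, table: str, rows: list[dict]) -> list[dict]:
    """Return only the rows and column values of `table` that `role` may see."""
    if table not in ROLE_GRANTS.get(role, set()):
        raise PermissionError(f"role {role} has no grant on {table}")
    regions = ROW_POLICY.get(role, set())
    masks = MASKING_POLICIES.get(role, {})
    result = []
    for row in rows:
        if row["region"] not in regions:
            continue  # row hidden by the row-level policy
        visible = dict(row)  # copy so the source rows are never modified
        for col, mask in masks.items():
            if col in visible:
                visible[col] = mask(visible[col])
        result.append(visible)
    return result
```

For example, given one row in region US and one in EU, a `PARTNER` sees only the US row with its phone masked (`5551234567` becomes `********67`), while an `ANALYST` sees both rows unmasked. Routing every read through one function means no caller can skip a check.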

Snowflake can replicate this data across regions and cloud providers, so users access read replicas within their own region and public cloud, which reduces network latency. Because the data does not move outside an organization’s boundaries, it is easier to manage and govern.

Enterprises spend four to five dollars on services for every dollar spent on Snowflake, and the bulk of that goes to human resources. Let us set ourselves the goal of doing better on cost. Snowflake is priced pay-per-use: if you do not use it, you do not pay.

A Snowflake service partner can help businesses minimize usage and identify the use cases that bring them the most value. Let me elaborate with an example from advertising. Every business needs to advertise in some medium, and this is an era of mass personalization. However, protecting customer privacy and complying with regulations are just as important.

Unleashing data network effects in advertising

Customers transact on the internet with multiple parties, and they want those parties to know them so that their experience is personal. However, customers may object if details of their transactions are given to third parties (i.e., parties not involved in the transaction).

Traditionally, customers were identified by placing cookies in the browser. But with greater emphasis on customer privacy, Google has announced that it will phase out third-party cookies in the Chrome browser. Regulations are also becoming stricter about how personal information must be handled. Data clean rooms are emerging as one of the most popular privacy-enhancing technologies for data sharing and collaboration.

Let’s say I am a Disney customer. I am most likely to engage with a particular advert, and Disney determines this association from my prior usage. I don’t want third parties to know what I do on the Disney application. Whether I watch only adult content or only cartoons is none of anyone’s business.

That said, it is Disney’s business to maximize revenue from adverts. In this example, Disney has data about every customer’s:

Favorite show
Maximum association with the available adverts

Can Disney share this data in near real-time without showing it?

What do we mean by sharing data without showing it? We limit or control the questions others can ask. If the questions are restricted and the raw data is encrypted, the data is as good as hidden.

Let us say the unencrypted data looks like the table below. My record is one row; imagine a million rows for other Disney customers. Let’s share this dataset.

Name	Favorite Show	Max Association with Ad
Sumukh (Me)	Baywatch	Nike Shoes

The dataset owner (Disney) can restrict the questions (queries) that others ask. Suppose the question is: how many customers have

Baywatch as their favorite show, AND
the maximum association with the Nike Shoes advert?

Disney can then do one of two things:

Allow the question as it is.
Allow the question but attach conditions, such as revealing the answer only if the count is more than fifty. This helps avoid re-identification of individuals.
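Restricting questions and suppressing small answers can be sketched as follows. This assumes, hypothetically, that each allowed question is a named count template with a fixed set of filter columns; `answer`, `ALLOWED_QUESTIONS`, and the column names `favorite_show` and `top_ad` are illustrative, not part of any product API.

```python
# Hypothetical sketch: question names, parameters and the threshold are
# illustrative assumptions, not part of Snowflake's clean room product.

MIN_COUNT = 50  # reveal a count only if it is more than fifty

ALLOWED_QUESTIONS = {
    # question name -> the exact filter columns a requester must supply
    "customers_by_show_and_ad": ("favorite_show", "top_ad"),
}


def answer(question: str, params: dict, rows: list[dict]):
    """Answer an allowed count question, or return None if the count is too small."""
    if question not in ALLOWED_QUESTIONS:
        raise PermissionError(f"question {question!r} is not allowed")
    columns = ALLOWED_QUESTIONS[question]
    if set(params) != set(columns):
        raise ValueError(f"parameters must be exactly {columns}")
    # Count rows matching every supplied filter value.
    count = sum(
        all(row[col] == params[col] for col in columns) for row in rows
    )
    # Suppress small results so that individuals cannot be singled out.
    return count if count > MIN_COUNT else None
```

With 60 matching rows, asking `customers_by_show_and_ad` for Baywatch and Nike Shoes returns 60; with only 10 matching rows it returns `None`, and any question outside the allow-list raises `PermissionError`.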

Say a third-party advertiser asks the allowed question and the answer is 1,000 customers (i.e., those who watch Baywatch and have an affinity for Nike’s shoe ad). Then:

The advertiser may be willing to pay a premium to show ads to these 1,000 customers.
It may define success as a customer visiting the store within five days.

Disney will play the ad for 1,000 customers like me. If I visit the brand’s store three days later, the brand will know that I visited, but it won’t know whether I saw the ad.

Disney’s dataset of ads shown will contain my name along with a thousand others. The brand’s dataset of store visitors will contain my name along with, say, 2,000 others. We ask for the count of names that appear in both datasets, and the clean room allows that question. We learn how many people saw the ad and then visited the store, but not who they were.
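The overlap question can be sketched as a function that returns only the size of the intersection of the two parties' lists, never the matching names. This is a simplified illustration under stated assumptions: the salt, the name normalization, and the threshold of fifty are hypothetical, and salted hashing alone is weak protection because names are easy to guess, which is why a clean room runs the join inside the governed environment and releases only the aggregate.

```python
import hashlib

# Hypothetical sketch: names, salt handling and the threshold are assumptions.
MIN_COUNT = 50  # suppress overlaps too small to be revealed safely


def pseudonymize(name: str, salt: str) -> str:
    """Normalize a name and replace it with a salted SHA-256 digest."""
    normalized = name.strip().lower()
    return hashlib.sha256((salt + normalized).encode("utf-8")).hexdigest()


def overlap_count(ads_shown, store_visitors, salt, min_count=MIN_COUNT):
    """Count people present in both lists; never return who they are."""
    shown = {pseudonymize(n, salt) for n in ads_shown}
    visited = {pseudonymize(n, salt) for n in store_visitors}
    count = len(shown & visited)
    return count if count > min_count else None
```

For instance, if Disney showed the ad to `customer0` through `customer999` and the brand recorded visits from `customer900` through `customer2999`, `overlap_count` returns 100, the number who saw the ad and came, without exposing which 100.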

Advertisers will not know whether I watched Baywatch or saw the Nike ad on Disney’s platform, and Disney will not know whether I visited a Nike store. Yet advertisers can run a targeted campaign on a segment and measure its effectiveness. This solution reveals no personal information and still produces joint insights from the data. Let’s examine how to implement it.

Implementing the data clean room solutions

Forbes says that every company is a software company. What does this mean for service partners of Snowflake like us? We create software to implement a specific solution at scale, and we provide services as software. You read that right: service as software, not software as a service. For the clean room, this comprises the following:

An application to configure Snowflake accounts, in the case above as a distributed clean room.
A self-service, business-friendly user interface for setting up rules: which columns to show or hide, which columns to aggregate (count, sum, mean, etc.), and which column to use as the common key between one dataset and another. In our example, we cannot show the column “Name”, but we can reveal the column “Favorite show” and the aggregate count of “Name”, and we can join the two datasets on “Name”. These rules translate into query templates in Snowflake. The approach scales because rules can be changed or added without going to the IT department.

Alerts and messages, sent by SMS or email, about shared datasets and their linked rules.
Stored procedures that validate query requests against the rules twice: once before the request is sent and once when it is received.
Snowflake’s data masking, to mask data when necessary; for example, showing only two digits of a phone number and masking the rest.
An audit dashboard for showing which queries ran, who ran them, and when.
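The rule set, its translation into a query template, and the validation step might look like the following sketch. The `Rules` class, `validate_request`, `render_query`, the request format, and the table names are hypothetical; a production version would live in Snowflake stored procedures rather than client-side Python.

```python
from dataclasses import dataclass

# Hypothetical rule model: Rules, validate_request and render_query are
# illustrative names, not an LTIMindtree or Snowflake API.


@dataclass(frozen=True)
class Rules:
    visible: frozenset     # columns that may be shown (grouped on) in results
    aggregates: frozenset  # allowed (function, column) pairs, e.g. ("COUNT", "name")
    join_key: str          # the only column allowed as the common key
    min_count: int = 50    # hide groups with this many people or fewer


def validate_request(rules: Rules, request: dict) -> list[str]:
    """Return rule violations; an empty list means the request may run."""
    errors = []
    for col in request["group_by"]:
        if col not in rules.visible:
            errors.append(f"column {col!r} may not be shown")
    if not request["aggregates"]:
        errors.append("at least one aggregate is required")
    for fn, col in request["aggregates"]:
        if (fn, col) not in rules.aggregates:
            errors.append(f"{fn}({col}) is not an allowed aggregate")
    if request["join_on"] != rules.join_key:
        errors.append(f"joining on {request['join_on']!r} is not allowed")
    return errors


def render_query(rules: Rules, request: dict, provider: str, consumer: str) -> str:
    """Fill a fixed SQL template from a request that passes validation."""
    if validate_request(rules, request):
        raise PermissionError("request violates the clean room rules")
    key = rules.join_key
    select = [f"p.{c}" for c in request["group_by"]]
    select += [f"{fn}(DISTINCT p.{col}) AS {fn.lower()}_{col}"
               for fn, col in request["aggregates"]]
    sql = (f"SELECT {', '.join(select)} FROM {provider} p "
           f"JOIN {consumer} c ON p.{key} = c.{key}")
    if request["group_by"]:
        sql += " GROUP BY " + ", ".join(f"p.{c}" for c in request["group_by"])
    # Small-group suppression, as in the "more than fifty" condition.
    sql += f" HAVING COUNT(DISTINCT p.{key}) > {rules.min_count}"
    return sql
```

With rules that show only `favorite_show`, allow only `COUNT` of `name`, and join on `name`, a grouped count request renders as `SELECT p.favorite_show, COUNT(DISTINCT p.name) AS count_name FROM disney_ads p JOIN brand_visits c ON p.name = c.name GROUP BY p.favorite_show HAVING COUNT(DISTINCT p.name) > 50`, while a request to show `name` is rejected. Each party can call `validate_request` independently, once before sending and once on receipt, mirroring the double check described in the list. Column names reach the SQL string only after they have been checked against the rules.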

LTIMindtree has built a Streamlit application that implements the above solution at scale and can, in the future, run natively on Snowflake.

LTIMindtree’s clean room solution

If you do not have data in Snowflake and want to upload it from your existing data warehouse to Snowflake, LTIMindtree has a solution for that as well.

Sumukh Guruprasad

Associate Principal-Data Engineering

Sumukh Guruprasad is a Business Technologist with 16+ years of experience. His specialty is exploiting technology capabilities to build business solutions. He holds an MBA from the Asian Institute of Management and is a Snowflake SnowPro certified advanced architect. Most recently, he was a core member of a leading bank's team that set up a Center of Excellence for Snowflake solutions to be consumed by different business units worldwide.
