As the Facebook / Cambridge Analytica scandal rolls into a broader examination of Facebook and its data practices in general, a critical question remains unanswered: “Just how much does Facebook know about you?”
Over two days of appearances before the US Senate and House, Facebook CEO Mark Zuckerberg provided some answers and a lot of promises to follow up.
The one point he repeated was that Facebook has allowed you to download your information for quite a while now. In fact, there have been a few articles recently on this data download and its contents.
One thing is for sure: this data download does not contain all of the information that Facebook knows about you. Only “your information”. That might seem like a picky distinction. It’s not.
Critical Definition
“Your information” consists of things you’ve uploaded and posted to the social network.
“Information about you” consists of the things that Facebook knows about you from your activities on the network and what it has inferred from that behaviour and other data sets.
This distinction came up in a conversation I recently had with Robyn Bresnahan on CBC Radio’s Ottawa Morning.
When you choose to upload content or post on Facebook, you also get the option to select the audience. Most users stick with the current set of defaults: Public, Friends, Friends Except…
But this feature is far more powerful than it first seems. You can use the little-known Friend List feature to set up subsets of your friends that see different views of your profile and that can be targeted with new posts.
The challenge — and you’ll see this challenge a lot on the platform — is that the feature is somewhat hidden away.
Regardless of whether you’ve found this or other privacy features, once your data is on the platform, Facebook will analyze it. All of the user-facing privacy features control what other users on the network can see…not what Facebook itself sees.
So if you upload those cute photos of your kids and share them only with your family, Facebook still knows that you have kids and whom you trust with details of their lives.
Those are more attributes for the profile the network is continually building about you. The more accurate the profile, the more valuable it is to Facebook.
Data Download
Back to Mr Zuckerberg’s appearance and his repeated claim that you can download your information from the network any time you want.
True. It’s straightforward to do.
It takes four simple steps:
- Click the arrow in the top right of Facebook.com
- Select Settings
- Click “Download a copy of your Facebook data” (under General Account Settings)
- Click “Start My Archive”
You’ll get an email in about 10 minutes with a custom link to your data download.
So what’s in that download?
Once you open up the .zip file, navigate to the index.htm file, and open it up.
You’ll be presented with the main interface that Facebook provides for the data. It’s a barebones web interface that helps navigate the included files:
html/
  ads.htm
  apps.htm
  contact_info.htm
  events.htm
  friends.htm
  messages.htm
  photos.htm
  pokes.htm
  security.htm
  timeline.htm
  videos.htm
index.htm
messages/
  *.html
  files/
    *.*
  gifs/
    *.gif
  photos/
    *.jpg
  stickers/
    *.png
  videos/
    *.mp4
photos/
  1.html
  1/
    *.jpg
  ...
videos/
  *.mp4
The setup is pretty self-explanatory, but since the photos are such low resolution, you’re better off just exploring the data through the web interface provided.
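If you would rather get a quick inventory before clicking around, a few lines of Python can count what is in the extracted archive. This is a minimal sketch: the folder name `facebook-archive` and the function name `count_archive_files` are placeholders of mine, not anything Facebook provides.

```python
from collections import Counter
from pathlib import Path


def count_archive_files(archive_root):
    """Count files in an extracted archive, keyed by (top-level entry, extension)."""
    root = Path(archive_root)
    counts = Counter()
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        relative = path.relative_to(root)
        # Files sitting directly in the root (e.g. index.htm) are grouped under ".".
        top_level = relative.parts[0] if len(relative.parts) > 1 else "."
        counts[(top_level, path.suffix.lower())] += 1
    return counts


if __name__ == "__main__":
    # "facebook-archive" is a placeholder for wherever you unzipped the download.
    inventory = count_archive_files("facebook-archive")
    for (folder, extension), total in sorted(inventory.items()):
        print(f"{folder:12} {extension:6} {total}")
```

The counts give a rough sense of how much of the archive is media you uploaded versus the handful of .htm pages that describe you.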
In addition to a copy of all of the content you’ve uploaded (though in a low-resolution version), you’ll get a hint of what Facebook knows about you.
Data Footprint
Determining what information is out there about you is difficult. One reason for that is the different storage policies for different types of data that Facebook stores.
From a technical and business perspective, this is reasonable. Some types of data are no longer useful after a specific period. Other types of data can be expensive to process and store.
As you browse your data download, you’ll notice that some data is barebones (e.g., “Ad topics”), some is time-limited (e.g., “Session Updates”), and some is comprehensive (e.g., “Messages”).
Using the web interface, you start to get an impression of just how many data points you’ve provided Facebook and a hint of what Facebook has extracted about your behaviour.
But there’s a lot more hidden under the covers in this download.
Downloaded Insights
I’ve published a simple script that will pull out some of the behavioural insights from your data download.
Run this tool against your data download (don’t worry, everything stays local) and it will write some .csv files to the output directory.
Running this against my download generated the following summary:
{
  "photos": {
    "earliest": "2014-04-26-14-40-00",
    "latest": "2018-03-20-08-14-00",
    "with_location": 70,
    "with_no_location": 291,
    "total_photos": 361
  },
  "locations": {
    "total": 54,
    "unique": 54,
    "earliest": "2017-01-31-15-57-00",
    "latest": "2018-03-20-20-36-00"
  },
  "sessions": {
    "total": 950,
    "earliest": "2018-02-01-14-37-00",
    "latest": "2018-03-24-13-14-00"
  },
  "timeline": {
    "total": 463,
    "earliest": "2007-04-11-17-32-00",
    "latest": "2018-03-22-11-57-00"
  },
  "messenger_conversations": {
    "total": 358,
    "earliest": "2007-04-11-17-25-00",
    "latest": "2018-03-21-17-07-00"
  },
  "videos": {
    "total": 80,
    "earliest": "2012-08-27-12-27-00",
    "latest": "2018-01-03-08-24-00"
  },
  "profile_created": "2007-04-10-16-40-00",
  "ads": {
    "ad_topics": 129,
    "advertisers_with_your_info": 129
  }
}
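Each block in that summary boils down to a count plus the earliest and latest timestamp for one kind of record. The sketch below shows one way such a block could be computed. It is not the published script; the function names and the sample events are made up for illustration, and only the timestamp format is taken from the summary.

```python
from datetime import datetime

# Matches the format used in the summary, e.g. "2018-03-20-08-14-00".
TIMESTAMP_FORMAT = "%Y-%m-%d-%H-%M-%S"


def parse_timestamp(text):
    """Turn a summary-style timestamp string back into a datetime."""
    return datetime.strptime(text, TIMESTAMP_FORMAT)


def summarize(timestamps):
    """Return the total, earliest and latest for a list of datetimes."""
    if not timestamps:
        return {"total": 0, "earliest": None, "latest": None}
    return {
        "total": len(timestamps),
        "earliest": min(timestamps).strftime(TIMESTAMP_FORMAT),
        "latest": max(timestamps).strftime(TIMESTAMP_FORMAT),
    }


if __name__ == "__main__":
    # Made-up sample events, not values from a real download.
    sample = [
        parse_timestamp("2018-01-05-09-30-00"),
        parse_timestamp("2016-07-19-22-15-00"),
        parse_timestamp("2017-11-02-12-00-00"),
    ]
    print(summarize(sample))
```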
I’m a light Facebook user and have my personal profile restricted to family and friends. I use a Page (facebook.com/marknca) for my professional presence on the network.
But Facebook still knows a startling amount about me.
Locations
Hidden away in the security information is a listing of recent IPs from which Facebook has seen activity on my account. Based on these IPs, they’ve inferred a location.
Geolocation based on IP addresses is traditionally inaccurate.
Addresses usually correlate with the location of the organization or entity that was assigned the address. That’s not necessarily where the address is in use.
Think about your ISP. They have been assigned numerous IP address blocks that they then use for various customers. These would all report back as being “located” at the ISP.
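To make that concrete, here is a small sketch of a registry-style lookup using Python’s standard `ipaddress` module. The registry contents are hypothetical: the address ranges are reserved documentation networks, and the office locations are invented.

```python
import ipaddress

# Hypothetical registry: address blocks mapped to the registrant's recorded location.
# The ranges are reserved documentation networks, not real allocations.
REGISTERED_BLOCKS = {
    ipaddress.ip_network("203.0.113.0/24"): "Example ISP head office, Ottawa",
    ipaddress.ip_network("198.51.100.0/24"): "Example ISP head office, Toronto",
}


def registered_location(address):
    """Return the registrant's location for an address, or None if unknown."""
    ip = ipaddress.ip_address(address)
    for network, location in REGISTERED_BLOCKS.items():
        if ip in network:
            return location
    return None


if __name__ == "__main__":
    # Two customers who could be hundreds of kilometres apart.
    print(registered_location("203.0.113.10"))
    print(registered_location("203.0.113.200"))
```

Both addresses fall inside the same block, so a lookup like this reports the same head office for each customer, regardless of where they actually are.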
But that’s not the case with the Facebook data.
The geographic locations contained in the data download are far more accurate than expected. In my data set, the locations were almost all accurate within a few blocks of wherever I was around the world.
This implies that Facebook is augmenting this data with additional insights. Given users’ ability to “check in” to places and their frequent posting of photos that contain geographic coordinates, it’s not a leap to assume that these data points are correlated to augment geolocation by IP address.
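As a purely illustrative sketch of that kind of correlation, suppose each check-in or geotagged photo is paired with the IP address in use at the time. Averaging the coordinates seen for each IP already yields a far tighter location than a registry lookup. Nothing in the download shows how Facebook actually does this; the function name and the sample observations are assumptions of mine.

```python
from collections import defaultdict
from statistics import mean


def refine_ip_locations(observations):
    """Average the coordinates observed for each IP address.

    observations: iterable of (ip, latitude, longitude) tuples, e.g. one per
    check-in or geotagged photo uploaded from that address.
    """
    by_ip = defaultdict(list)
    for ip, latitude, longitude in observations:
        by_ip[ip].append((latitude, longitude))
    # A plain average is good enough for points a few blocks apart;
    # it is not meant for widely separated coordinates.
    return {
        ip: (
            mean(lat for lat, _ in points),
            mean(lon for _, lon in points),
        )
        for ip, points in by_ip.items()
    }


if __name__ == "__main__":
    # Invented observations using a documentation address range.
    sample = [
        ("203.0.113.10", 45.0, -75.0),
        ("203.0.113.10", 46.0, -76.0),
        ("198.51.100.7", 43.5, -79.5),
    ]
    print(refine_ip_locations(sample))
```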
Sessions
In addition to inferring the location from IPs, Facebook keeps a moving window of session login and (more importantly) update information.
Most users rarely log in to Facebook as an event. Their sessions persist for weeks if not months. Facebook tracks both logins and updates. Each time you load the mobile app, refresh or interact with the website, or interact with a website using Facebook code, your session is updated.
Each update includes:
- a timestamp
- the IP address the request came from
- the browser string for the request
- the session cookie used
This gives Facebook an accurate look at your usage habits.
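To see how little it takes to turn those four fields into a usage pattern, here is a minimal sketch that models one update as a record and counts activity by hour of day. The class and field names are my own, not the labels used in the download, and the sample values are invented.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SessionUpdate:
    """One session update, with the four fields listed above."""
    timestamp: datetime
    ip_address: str
    user_agent: str
    cookie: str


def activity_by_hour(updates):
    """Count session updates per hour of day (0-23)."""
    return Counter(update.timestamp.hour for update in updates)


if __name__ == "__main__":
    # Invented sample updates.
    sample = [
        SessionUpdate(datetime(2018, 2, 1, 7, 5), "203.0.113.10", "ExampleBrowser/1.0", "abc"),
        SessionUpdate(datetime(2018, 2, 1, 7, 40), "203.0.113.10", "ExampleBrowser/1.0", "abc"),
        SessionUpdate(datetime(2018, 2, 2, 22, 15), "198.51.100.7", "ExampleApp/2.0", "abc"),
    ]
    for hour, total in sorted(activity_by_hour(sample).items()):
        print(f"{hour:02d}:00  {total}")
```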
This heat map displays two months of my usage, February and March 2018.
The X axis displays unique IP addresses. The Y axis is the timeline (one segment for each day). The darker the blue, the more activity recorded that day.
Immediately, you can tell when I was travelling and when I was home for an extended period. This information shows Facebook more about my behaviour (I travel regularly) and when the best time to show me various ads or content would be.
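The grid behind a heat map like this is just a count of updates per day and IP address. The sketch below builds that grid and also counts how many distinct days each address appears on, which is roughly how home (many days) separates from travel (a day or two). The function names and sample data are assumptions for illustration, not the code that produced my chart.

```python
from collections import Counter, defaultdict
from datetime import datetime


def heatmap_counts(updates):
    """Count updates per (day, ip) cell.

    updates: iterable of (timestamp, ip) pairs.
    """
    return Counter((timestamp.date(), ip) for timestamp, ip in updates)


def days_seen_per_ip(counts):
    """Return how many distinct days each IP address appears on."""
    days = defaultdict(set)
    for day, ip in counts:
        days[ip].add(day)
    return {ip: len(seen) for ip, seen in days.items()}


if __name__ == "__main__":
    # Invented sample: one "home" address and one seen only while away.
    sample = [
        (datetime(2018, 2, 1, 8, 0), "203.0.113.10"),
        (datetime(2018, 2, 1, 21, 0), "203.0.113.10"),
        (datetime(2018, 2, 2, 9, 0), "203.0.113.10"),
        (datetime(2018, 2, 3, 13, 0), "198.51.100.7"),
    ]
    grid = heatmap_counts(sample)
    print(days_seen_per_ip(grid))
```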
Profiling
Facebook currently has 2.1 billion active users around the world. It’s safe to say that while some portion of those users understand that they are being profiled and tracked, the vast majority are in the dark as to the extent of what Facebook knows about them.
Stepping back, it’s obvious that Facebook would want to build profiles as accurate as possible. Facebook’s business is selling access to these users based on their profile.
The businesses that buy advertising on the platform are Facebook’s customers.
Facebook will take any action that improves their business and satisfies their customers. The line they are required to walk is to ensure that the user base — who generate the raw material for Facebook’s product — are kept content and active on the platform.
This week’s appearance in front of the US Senate and House adds a small twist to that balancing act. Facebook must now be hyper-aware of the political impact of its actions, in addition to how its users view those actions.
Regardless, Facebook is, and will be for the foreseeable future, the dominant player in social media. Rightly or wrongly, social media networks are ad driven.
The true cost of that? Our privacy. Whether we consent or not.