In the context of privacy, we often talk about metadata as a weak link. While data can be easy to hide and encrypt, metadata is often far more difficult to conceal.
Metadata is data about data. For example, if this article is the data, its metadata will include information about its word count, what language it is written in, when the article was first published, and whether it is linked to an image. As you request this article from our server, the metadata of this transfer will include the time of the request and your IP address (or the IP address of your VPN service).
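The following minimal Python sketch shows the server side of such a request. It is a WSGI application that records each request's metadata before serving the article. The app, its in-memory ACCESS_LOG list, and the fields it keeps are illustrative assumptions, not a description of any particular server. REMOTE_ADDR is the standard CGI/WSGI variable that a server typically fills in with the connecting IP address. If you connect through a VPN, that is your VPN's address.

```python
from datetime import datetime, timezone

# Hypothetical in-memory access log; real servers write similar lines to disk.
ACCESS_LOG = []

def app(environ, start_response):
    """Minimal WSGI app: log request metadata, then return the article body."""
    ACCESS_LOG.append({
        "time": datetime.now(timezone.utc).isoformat(),  # when you asked
        "ip": environ.get("REMOTE_ADDR"),                # who asked (or your VPN)
        "path": environ.get("PATH_INFO", "/"),           # what you asked for
        "user_agent": environ.get("HTTP_USER_AGENT"),    # browser, as reported
    })
    body = b"article text"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

To try it locally, pass app to make_server from the standard library's wsgiref.simple_server module and open the page in a browser. Every visit adds a log entry, even though the server never reads any content from you.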
Metadata is very useful because it lets software sort, search, and process files without reading their full contents, which makes especially large files easier to manage. But it can also be a threat to your privacy: in many cases, metadata includes all the information necessary to identify you and to guess the nature of a relationship. For example, if metadata reveals that you received a call from a police station, followed by an unanswered call to your son’s mobile phone, followed by a call to a lawyer, much of the story is already revealed, even if nobody has listened to the contents of any of the conversations.
Types of metadata
There are two main types of metadata.
Structural metadata
Structural metadata is information relating to how data is stored.
Often structural metadata can be observed and calculated just from looking at the file. Let’s imagine the file we are looking at as a book: The structural metadata would include whether the cover is hard or soft, the shape of the book, and its weight and dimensions.
The structural metadata of a phone conversation would include the length and time of the conversation.
In the case of a digital image, the metadata would include the size and the file type.
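Structural metadata can be derived from the file itself, so a few lines of Python are enough to recover some of it. This sketch assumes the whole file has already been read into memory. It recognizes only a handful of formats by their leading signature bytes, and the SIGNATURES table is deliberately incomplete.

```python
# Leading "magic number" bytes that identify a few common file formats.
SIGNATURES = {
    b"\xff\xd8\xff": "JPEG image",
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"%PDF-": "PDF document",
    b"PK\x03\x04": "ZIP container (also used by .docx and .xlsx)",
}

def structural_metadata(data: bytes) -> dict:
    """Derive size and file type from the raw bytes alone, without any tags."""
    kind = next((name for sig, name in SIGNATURES.items() if data.startswith(sig)),
                "unknown")
    return {"size_bytes": len(data), "type": kind}
```

For a file on disk, read it in binary mode first, for example with open("photo.jpg", "rb"), where the file name is just a placeholder.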
Descriptive metadata
Descriptive metadata is additional information that helps humans and computers learn the contents of books and files.
A book’s descriptive metadata will include the title, author, printing date, edition, and possibly even a short summary on the back cover. The ISBN is also part of this descriptive metadata.
A phone conversation would also have a lot of descriptive metadata attached to it, such as who made the call, who the call was to, and where the call was made from.
For an image, the descriptive metadata can be extremely detailed. It can include the manufacturer of the camera, any editing software used, lens aperture, exposure time, orientation, color space, brightness, the owner of the camera, and even the GPS location where the image was taken.
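In JPEG photos, much of this information is stored in an EXIF block. The rough sketch below assumes a baseline JPEG with a standard EXIF APP1 segment. It walks the file's segments and lists the tags in the first EXIF directory (IFD0). The tag IDs in TAG_NAMES come from the EXIF/TIFF specification. A GPSInfoIFDPointer entry means the file carries a GPS directory. Real-world files vary a lot, so in practice a maintained library is the safer choice.

```python
import struct

# Tag IDs from the EXIF/TIFF specification for a few revealing IFD0 fields.
TAG_NAMES = {
    0x010F: "Make", 0x0110: "Model", 0x0131: "Software", 0x013B: "Artist",
    0x8769: "ExifIFDPointer", 0x8825: "GPSInfoIFDPointer",
}

def exif_ifd0_tags(data: bytes) -> list:
    """Return the names of the tags in a JPEG's first EXIF directory (IFD0)."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file (missing SOI marker)")
    pos = 2
    while pos + 4 <= len(data) and data[pos] == 0xFF:
        marker = data[pos + 1]
        if marker in (0xD9, 0xDA):  # end of image or start of scan: no EXIF
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])  # includes itself
        payload = data[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            tiff = payload[6:]  # EXIF offsets are relative to this TIFF header
            order = "<" if tiff[:2] == b"II" else ">"  # II = little-endian
            (ifd0,) = struct.unpack(order + "I", tiff[4:8])
            (count,) = struct.unpack(order + "H", tiff[ifd0:ifd0 + 2])
            names = []
            for i in range(count):  # each directory entry is 12 bytes long
                entry = ifd0 + 2 + 12 * i
                (tag,) = struct.unpack(order + "H", tiff[entry:entry + 2])
                names.append(TAG_NAMES.get(tag, hex(tag)))
            return names
        pos += 2 + length
    return []
```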
Examples of metadata
Metadata can be used to organize all kinds of digital information. And it really is used in a massive variety of ways. Here are some examples of how metadata is used by services you probably interact with every day.
Email: Every email you send and receive includes the sender’s and recipient’s names and email addresses, the time it was sent, the IP address it was sent from, and other message-specific data like the subject line. This metadata is used to deliver the message to the right place and then to organize and display it correctly.
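The sketch below uses Python's standard email package to pull this metadata out of a raw message. The message, addresses, and IP address are made up and use domains and addresses reserved for documentation. All of these fields sit in the headers. With common PGP-style encryption, typically only the body is encrypted, so headers like these usually remain readable to every server that handles the message.

```python
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime

# A hypothetical message; addresses use reserved example domains and IPs.
RAW = """\
Received: from mail.example.org (mail.example.org [203.0.113.7])
From: Alice Example <alice@example.org>
To: Bob Example <bob@example.com>
Date: Tue, 05 Mar 2024 09:15:00 +0000
Subject: Dinner on Friday?
Content-Type: text/plain

[encrypted body: unreadable without the key]
"""

def email_metadata(raw: str) -> dict:
    """Extract header metadata that stays visible even if the body is encrypted."""
    msg = message_from_string(raw)
    return {
        "sender": parseaddr(msg["From"])[1],
        "recipient": parseaddr(msg["To"])[1],
        "sent_at": parsedate_to_datetime(msg["Date"]).isoformat(),
        "subject": msg["Subject"],
        "relays": msg.get_all("Received", []),  # one entry per mail server hop
    }
```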
Phones: Telephone networks use metadata to connect phone calls and to log call data for billing and other purposes. This metadata might include the caller’s number, the time and duration of the call, and even the approximate location of the people talking.
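Call logs like these are enough to reconstruct the police station, son, and lawyer story from earlier in this article. The sketch below assumes a simplified, hypothetical record format and a phone book that maps numbers to names. Real carrier call records contain more fields.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class CallRecord:
    caller: str
    callee: str
    start: datetime
    duration_s: int  # 0 means the call was not answered

def call_timeline(records, subscriber, directory):
    """Rebuild one subscriber's calls in time order from metadata alone."""
    lines = []
    for r in sorted(records, key=lambda rec: rec.start):
        if subscriber not in (r.caller, r.callee):
            continue  # not this subscriber's call
        outgoing = r.caller == subscriber
        other = r.callee if outgoing else r.caller
        status = f"{r.duration_s}s" if r.duration_s else "unanswered"
        lines.append(f"{r.start:%H:%M} call {'to' if outgoing else 'from'} "
                     f"{directory.get(other, other)} ({status})")
    return lines

# Hypothetical log and phone book (555-0100 to 555-0199 are reserved for fiction).
DIRECTORY = {"555-0199": "police station", "555-0142": "son", "555-0173": "lawyer"}
RECORDS = [
    CallRecord("555-0100", "555-0173", datetime(2024, 3, 5, 21, 9), 600),
    CallRecord("555-0199", "555-0100", datetime(2024, 3, 5, 21, 2), 240),
    CallRecord("555-0111", "555-0122", datetime(2024, 3, 5, 21, 5), 30),
    CallRecord("555-0100", "555-0142", datetime(2024, 3, 5, 21, 7), 0),
]
```

For example, call_timeline(RECORDS, "555-0100", DIRECTORY) produces three lines: the call from the police station, the unanswered call to the son, and the call to the lawyer. No conversation content is needed.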
Social networking: Have you noticed how third-party apps you sign up for via Facebook or Twitter always request access to your basic information, friends list, and more? What they’re doing is accessing the metadata stored by your social networking account to identify you. Your Facebook likes and interests can also be considered personal metadata about you, which the service uses to target ads and page suggestions that might interest you.
Web pages: Metadata is pretty much what makes the internet searchable. Typical web page metadata includes the page title, a description, the date published, keywords, and much, much more. This metadata is used by search engines to catalog the web so you can search it easily.
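The sketch below, built on Python's standard html.parser module, shows the kind of page metadata a crawler reads. It collects the title and any meta tags that have a name attribute (such as description or keywords) or a property attribute. Open Graph tags like article:published_time use the property convention. The sample page is hypothetical.

```python
from html.parser import HTMLParser

class PageMetadataParser(HTMLParser):
    """Collect the <title> text plus <meta name|property=... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            key = attrs.get("name") or attrs.get("property")
            if key and attrs.get("content") is not None:
                self.meta[key] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:  # title text may arrive in several chunks
            self.meta["title"] = self.meta.get("title", "") + data

def page_metadata(html: str) -> dict:
    parser = PageMetadataParser()
    parser.feed(html)
    parser.close()
    return parser.meta

SAMPLE = """<html><head>
<title>What is metadata?</title>
<meta name="description" content="How metadata can reveal who you are.">
<meta name="keywords" content="privacy, metadata">
<meta property="article:published_time" content="2020-01-01T00:00:00Z">
</head><body><p>Article text</p></body></html>"""
```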
Digital media libraries: If you have an account with iTunes, Netflix, or other entertainment providers, it’s metadata that keeps all your music and movies organized and nicely displayed. Typical MP3 metadata includes the artist’s name, song title, album name, year of release, and more.
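MP3 metadata is usually stored in an ID3 tag. The older ID3v1 format is simple enough to read by hand. It is a fixed 128-byte block at the very end of the file that starts with the letters TAG. The sketch below assumes that layout. Most current files also carry, or only carry, the more flexible ID3v2 tag at the start of the file, and this code does not parse it.

```python
def read_id3v1(data: bytes):
    """Parse an ID3v1 tag (the last 128 bytes of an MP3), or return None."""
    tag = data[-128:]
    if len(tag) < 128 or tag[:3] != b"TAG":
        return None

    def text(field: bytes) -> str:
        # Fixed-width Latin-1 fields, padded with NUL bytes or spaces.
        return field.split(b"\x00", 1)[0].decode("latin-1").strip()

    return {
        "title": text(tag[3:33]),
        "artist": text(tag[33:63]),
        "album": text(tag[63:93]),
        "year": text(tag[93:97]),
        "comment": text(tag[97:127]),
        "genre_id": tag[127],  # index into the standard ID3v1 genre list
    }
```

For a real file, pass in the bytes read in binary mode, for example from open("song.mp3", "rb"), where the file name is a placeholder.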
How to remove or reduce metadata
Reducing your trail of metadata can be difficult. Generally, the more a service knows about you, the more metadata is created with your every move.
Use software, not online services
When using online services, you are generating metadata that the service can use to learn more about you for profit. Instead of using web-based tools and services, you can switch to open-source software for tasks ranging from document and spreadsheet processing to image editing and storing Bitcoin.
Remove metadata from files
Many files, such as images and documents, contain metadata embedded by the program that created them, and this metadata can reveal additional information about you. You can remove such metadata with tools such as these:
- Mac OS X: ImageOptim
- Windows: Microsoft Office Document Inspector
- Linux: Metadata Anonymisation Toolkit
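The tools above are the practical choice. The following Python sketch shows what removing metadata means at the byte level. Under the assumption of a baseline JPEG, it copies the file but drops the APP1 segments (EXIF and XMP), APP13 (Photoshop/IPTC), and COM (comment). The compressed image data is left untouched. This is not how any of the listed tools is implemented.

```python
import struct

# Segments that commonly carry identifying metadata (the byte after 0xFF):
# APP1 = EXIF and XMP, APP13 = Photoshop/IPTC, COM = free-text comment.
STRIP_MARKERS = {0xE1, 0xED, 0xFE}

def strip_jpeg_metadata(data: bytes) -> bytes:
    """Copy a JPEG, dropping APP1, APP13 and COM segments; pixels are untouched."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file (missing SOI marker)")
    out = bytearray(data[:2])
    pos = 2
    while pos + 1 < len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a segment marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:  # padding byte before a marker: skip it
            pos += 1
            continue
        if marker in (0xDA, 0xD9):  # start of scan or end of image:
            out += data[pos:]       # copy the compressed image data as-is
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])  # includes itself
        if marker not in STRIP_MARKERS:
            out += data[pos:pos + 2 + length]
        pos += 2 + length
    return bytes(out)
```

The orientation flag lives in EXIF, so a stripped photo may appear rotated in some viewers. The color profile, stored in APP2, is deliberately kept.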
Create noise
The most advanced and effective way to make metadata worthless is to create noise: producing additional data so that the resulting metadata is inaccurate. If your computer sends out various encrypted requests for web pages every second, it will be hard to deduce which sites you were actually reading and frequenting. However, doing this properly, with a high degree of randomness, is difficult; it might still be possible to separate your real activity from the automated requests.
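The sketch below illustrates only the randomness point. It plans decoy page requests whose gaps are drawn from an exponential distribution, which makes them a Poisson process. A fixed interval would be trivial to filter out. The function, its parameters, and the decoy URLs are hypothetical, and the code sends nothing. A real noise tool would also need decoys whose content and behavior look like genuine browsing.

```python
import random

def decoy_schedule(decoy_urls, duration_s, mean_gap_s, rng=None):
    """Plan (time, url) decoy requests over duration_s seconds.

    Gaps are drawn from an exponential distribution (a Poisson process), so
    there is no fixed interval an observer could use to filter decoys out.
    This only plans the requests; it does not send anything.
    """
    rng = rng or random.Random()
    plan = []
    t = rng.expovariate(1.0 / mean_gap_s)
    while t < duration_s:
        plan.append((t, rng.choice(decoy_urls)))
        t += rng.expovariate(1.0 / mean_gap_s)
    return plan
```

Passing a seeded random.Random makes a plan reproducible for testing. In real use you would leave it unseeded.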
Beware of metadata
When revealing information about yourself, be aware that this data can often be used to identify you. Even when the contents of your communications are encrypted, there might still be enough information left visible to find out more about who you are and what you are up to.
Comments
Hi Lexie
I found your Metadata article very interesting and you showed us how to remove metadata from files, but what I would like to know is what are the implications of removing it? Will it make it difficult to upload or download files/photos to/from a website or send in an email? Are there any negative implications or is it safe to always remove it?
Thanks, and keep up the good work! (And enjoy the pasta:-))
Would much appreciate a bit more detail about ImageOptim. What exactly does it do, how should it be configured, what are the pitfalls of using it, etc.? Like any other piece of software, it has its tips and tricks, and it’s very difficult to keep up with the many idiosyncrasies of the various tools you offer for consideration. A deep-dive tutorial (admittedly perhaps beyond your intention or the time available to produce such material) would be helpful. Just taking the word of this blogger that it should be used is asking a bit much.
Can you elaborate more on using software, not online services?
For example, which software replaces which online services?