4.1.1. Data in a Social Media Post#
Let’s look at all the data that we can see when we look at a tweet on what was Twitter:
In this screenshot of Twitter, we can see the following information:
The account that posted it:
User handle is @dog_rates
User name is WeRateDogs®
User profile picture is a circular photo of a white dog
This user has a blue checkmark
The date of the tweet: Feb 10, 2020
The text of the tweet: “This is Woods. He’s here to help with the dishes. Specifically, the pre-rinse, where he licks every item he can. 12/10”
The photos in the tweet: Three photos of a puppy on a dishwasher
The number of replies: 1,533
The number of retweets: 26.2K
The number of likes: 197.8K
Data and Metadata#
One way we can categorize the data in this tweet is to separate it into data and metadata, like this:
Metadata is information about some data. So we often think about a dataset as consisting of the main pieces of data (whatever those are in a specific situation), and whatever other information we have about that data (metadata).
For example:
If we think of a tweet’s contents (text and photos) as the main data of a tweet, then additional information such as the user, time, and responses would be considered metadata.
If we download information about a set of tweets (text, user, time, etc.) to analyze later, we might consider that set of information as the main data, and our metadata might be information about our download process, such as when we collected the tweet information, which search term we used to find it, etc.
Now that we’ve looked some at the data in a tweet, let’s look next at how different pieces of this information are saved.