5.4.2. Demo: Dictionaries#
We’ve talked about lists, but the other way of organizing data that we need for working with social media data is the dictionary.
As we mentioned in chapter 4, dictionaries allow us to combine pieces of information by naming them (sort of like variables).
So for example, the information about a user might have the following pieces of data:
Username
Twitter handle
Profile picture
Follows
Python has two ways of storing this kind of named information: dictionaries (dict) and objects.
Dictionaries (dict)#
We can create dictionaries in Python by storing values under keys inside of curly braces ({ }), like this:
user_1 = {
    "username": "kylethayer",
    "twitter_handle": "@kylemthayer",
    "profile_picture": "kylethayer.jpg",
    "follows": ["@SusanNotess", "@UW", "@UW_iSchool", "@ajlunited"]
}
In the code above, inside of the curly braces is a set of lines. Each line has a string (the key, or name of the value), followed by a colon (:), followed by the value to be saved for that key. At the end of every line but the last is a comma (,), which indicates that another key and value will come next.
Now that we’ve saved some values for some keys in the dictionary stored in user_1, we can look up those values by using square brackets ([ ]) with the key name inside, like this:
user_1_username = user_1["username"]
display(user_1_username)
'kylethayer'
user_1_handle = user_1["twitter_handle"]
display(user_1_handle)
'@kylemthayer'
user_1_picture = user_1["profile_picture"]
display(user_1_picture)
'kylethayer.jpg'
user_1_follows = user_1["follows"]
display(user_1_follows)
['@SusanNotess', '@UW', '@UW_iSchool', '@ajlunited']
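Since the value saved under "follows" is itself a list, the loops from earlier work on it directly. Below is a minimal sketch that reuses the user_1 dictionary from above. The names follow_count and followed_handle are introduced here only for illustration, and the sketch uses print instead of display so that it also runs outside of a notebook.

```python
# Same dictionary as in the example above
user_1 = {
    "username": "kylethayer",
    "twitter_handle": "@kylemthayer",
    "profile_picture": "kylethayer.jpg",
    "follows": ["@SusanNotess", "@UW", "@UW_iSchool", "@ajlunited"]
}

# Look up the list saved under the "follows" key
user_1_follows = user_1["follows"]

# len() tells us how many items are in that list
follow_count = len(user_1_follows)
print(user_1["username"] + " follows " + str(follow_count) + " accounts:")

# Loop through the list, one handle at a time
for followed_handle in user_1_follows:
    print("  " + followed_handle)
```

The lookup user_1["follows"] gives back an ordinary list, so anything we can do with a list (loop through it, count it, index into it) also works here.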
Objects#
The other way of saving information that works similarly in Python is through an object. We won’t be creating any in this book, but we will have to get data from some.
The main difference for our purposes is that while in dictionaries we use square brackets and put the key name in quotes as a string (e.g., user_1["profile_picture"]), in an object we use a period (.) and don’t put the key name (called a field) in quotes (e.g., user_1.profile_picture).
We have already seen code that used this period to get something from an object a few times, specifically getting functions from them, like:
client.send_post(...
normal_message.upper()
When we go through data from Bluesky, sometimes we will need to use . to get parts of the information out of objects, and sometimes we will need to use [" "] to get information out of dictionaries.
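To make the difference concrete, here is a small side-by-side sketch. You will not need to create objects yourself in this book, so the object below is only a stand-in built with SimpleNamespace from Python's standard types module; the real objects we get from Bluesky are defined by the atproto library. The variable names user_dict and user_object are made up for this example.

```python
from types import SimpleNamespace

# A dictionary: look up values with square brackets and a quoted key
user_dict = {"username": "kylethayer", "profile_picture": "kylethayer.jpg"}
picture_from_dict = user_dict["profile_picture"]

# An object: look up values with a period and an unquoted field name.
# SimpleNamespace is only a quick way to build a small object for this demo.
user_object = SimpleNamespace(username="kylethayer", profile_picture="kylethayer.jpg")
picture_from_object = user_object.profile_picture

# Both lookups find the same saved value
print(picture_from_dict)
print(picture_from_object)
```

If you mix up the two styles (for example, using a period on a dictionary), Python will report an error, which is usually a hint to switch to the other style.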
Looping through lists of dictionaries#
Now that we’ve seen loops, lists, and dictionaries, we can go to Bluesky, load a feed and look through multiple posts.
But first, we need our helper functions for converting Bluesky feed weblink urls and atproto uris:
helper function for atproto links#
NOTE: You don’t need to worry about the details of how this works; it is just here to make the later code easier to use.
import re  # load a "regular expression" library for helping to parse text
from atproto import IdResolver  # load the atproto IdResolver library to get official ATProto user IDs

# function to convert a feed from a weblink url to the special atproto "at" URI
def getATFeedLinkFromURL(url):
    # Get the user handle and feed id from the weblink url
    match = re.search(r'https://bsky.app/profile/([^/]+)/feed/([^/]+)', url)
    if not match:
        raise ValueError("Invalid Bluesky feed URL format.")
    user_handle, feed_id = match.groups()

    # Get the official atproto user ID (did) from the handle
    resolver = IdResolver()
    did = resolver.handle.resolve(user_handle)
    if not did:
        raise ValueError(f'Could not resolve DID for handle "{user_handle}".')

    # Construct the at:// URI for the feed
    feed_uri = f"at://{did}/app.bsky.feed.generator/{feed_id}"
    return feed_uri

# function to convert a post's special atproto "at" URI to a weblink url
def getWebLinkFromPost(post):
    # Get the user id and post id from the post's "at" URI
    match = re.search(r'at://([^/]+)/app.bsky.feed.post/([^/]+)', post.uri)
    if not match:
        raise ValueError("Invalid Bluesky atproto post URI format.")
    user_id, post_id = match.groups()

    # Construct the weblink url for the post
    post_url = f"https://bsky.app/profile/{user_id}/post/{post_id}"
    return post_url
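If you are curious what the re.search lines in the helper functions above are doing, the sketch below runs just the text-matching step on its own, with no connection to Bluesky. It uses the same two patterns and the feed weblink that this demo loads; the "at" URI is a made-up example in the same shape as the fake data, not a real post.

```python
import re

# The feed weblink used in this demo
feed_url = "https://bsky.app/profile/shouldhaveanimal.bsky.social/feed/aaab56iiatpdo"

# Each ([^/]+) captures a run of characters that are not "/"
feed_match = re.search(r'https://bsky.app/profile/([^/]+)/feed/([^/]+)', feed_url)
user_handle, feed_id = feed_match.groups()
print(user_handle)
print(feed_id)

# A made-up "at" URI for a post (not a real post)
fake_post_uri = "at://did:plc:fake_user/app.bsky.feed.post/fake_post_id"

# The same idea in reverse: pull out the user id and post id,
# then put them into the shape of a weblink url
post_match = re.search(r'at://([^/]+)/app.bsky.feed.post/([^/]+)', fake_post_uri)
user_id, post_id = post_match.groups()
web_link = f"https://bsky.app/profile/{user_id}/post/{post_id}"
print(web_link)
```

The real helper functions do one more thing this sketch leaves out: getATFeedLinkFromURL asks an IdResolver to turn the handle into an official user ID (did), which requires an internet connection.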
Now we can continue logging in to Bluesky and look through multiple posts.
load atproto library#
# Load some code called "Client" from the "atproto" library that will help us work with Bluesky
from atproto import Client
(optional) make a fake Bluesky connection with the fake_atproto library#
For testing purposes, we’ve added this line of code, which loads a fake version of atproto, so it won’t actually connect to Bluesky. If you want to try to actually connect to Bluesky, don’t run this line of code.
%run ../../fake_apis/fake_atproto.ipynb
login to Bluesky#
# Login to Bluesky
# TODO: put your account name and password below
client = Client(base_url="https://bsky.social")
client.login("your_account_name.bsky.social", "m#5@_fake_bsky_password_$%Ds")
find a list of posts from a feed#
We can now load a feed and find a list of posts.
Note: If you run this on real Bluesky, we can’t guarantee anything about how offensive the posts you find might be.
feedUrl = "https://bsky.app/profile/shouldhaveanimal.bsky.social/feed/aaab56iiatpdo"
atFeedLink = getATFeedLinkFromURL(feedUrl)
post_info_list = client.app.bsky.feed.get_feed({'feed': atFeedLink}).feed
Loop through the list of posts#
The variable post_info_list now has a list of Bluesky post info. So we can use a for loop to go through each post, and then use . to access info from each post (other pieces of information would need [" "] to access).
For each of the posts, we will use print to display information about the post:
for post_info in post_info_list:
    print("Info for submission with cid: " + str(post_info.post.cid))
    print(" author handle: " + str(post_info.post.author.handle))
    print(" text: " + str(post_info.post.record.text))
    print(" created at: " + str(post_info.post.record.created_at))
    print(" number of likes: " + str(post_info.post.like_count))
    print(" number of replies: " + str(post_info.post.reply_count))
    print(" number of reposts: " + str(post_info.post.repost_count))
    print(" url: " + str(getWebLinkFromPost(post_info.post)))
    print()
Info for submission with cid: 904tjwdf093j
author handle: fake_user.bsky.social
text: Look at my cute dog!
created at: 2014-01-01
number of likes: 23
number of replies: 7
number of reposts: 5
url: https://bsky.app/profile/did:plc:fake_user/post/fake_post_id
Info for submission with cid: 704tjwdf093j
author handle: pretend_user.bsky.social
text: I like lizards
created at: 2014-02-01
number of likes: 23
number of replies: 7
number of reposts: 5
url: https://bsky.app/profile/did:plc:pretend_user/post/fake_post_id
Info for submission with cid: 534tjwdf093j
author handle: imaginary_user.bsky.social
text: Look at my cute cat!
created at: 2014-03-01
number of likes: 23
number of replies: 7
number of reposts: 5
url: https://bsky.app/profile/did:plc:imaginary_user/post/fake_post_id
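The loop above reads from objects, so it uses periods. The same loop pattern works when each item in the list is a dictionary; only the lookups change to square brackets with quoted keys. The sketch below is a hypothetical example, not data from atproto: the list post_dict_list and its keys ("handle", "text", "like_count") are made up, with values copied from the fake posts above.

```python
# Hypothetical example data: a list where each item is a dictionary
post_dict_list = [
    {"handle": "fake_user.bsky.social", "text": "Look at my cute dog!", "like_count": 23},
    {"handle": "pretend_user.bsky.social", "text": "I like lizards", "like_count": 23},
    {"handle": "imaginary_user.bsky.social", "text": "Look at my cute cat!", "like_count": 23},
]

# Keep a running total of likes as we go through the list
total_likes = 0

for post_dict in post_dict_list:
    # Square brackets and quoted keys, because each item is a dictionary
    print("author handle: " + post_dict["handle"])
    print("  text: " + post_dict["text"])
    print("  number of likes: " + str(post_dict["like_count"]))
    print()
    total_likes = total_likes + post_dict["like_count"]

print("total likes: " + str(total_likes))
```

Because the number of likes is saved as a number rather than a string, we can add it to a running total, but we need str() to combine it with text when printing.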