4.5.3. Demo: Data from a Bluesky Post#
Let’s see what the data from a Bluesky post actually looks like!
Helper function for atproto links#
Before we begin, though, we’ll need two helper functions: one to turn a web link to a Bluesky post into the format expected by the atproto library, and one to turn an atproto-format link back into a web link.
NOTE: You don’t need to worry about the details of how this works; it is just here to make the later code easier to use.
import re  # load a "regular expression" library for helping to parse text

# function to convert a post's weblink url to its special atproto "at" URI
# (this uses the "client" that we create in the login step further below)
def get_at_post_link_from_url(url):
    # Extract username and post ID from the URL
    match = re.search(r'https://bsky\.app/profile/([^/]+)/post/([^/]+)', url)
    if not match:
        raise ValueError("Invalid Bluesky post URL format.")
    user_handle, post_id = match.groups()
    # Look up the author's profile to get their permanent id (did)
    author_profile = client.app.bsky.actor.get_profile({'actor': user_handle})
    # Construct the at:// URI
    post_uri = f"at://{author_profile.did}/app.bsky.feed.post/{post_id}"
    return post_uri

# function to convert a post's special atproto "at" URI to a weblink url
def get_weblink_from_post(post):
    # Get the user id and post id from the post's "at" URI
    match = re.search(r'at://([^/]+)/app\.bsky\.feed\.post/([^/]+)', post.uri)
    if not match:
        raise ValueError("Invalid Bluesky atproto post URL format.")
    user_id, post_id = match.groups()
    # Construct the weblink url
    post_url = f"https://bsky.app/profile/{user_id}/post/{post_id}"
    return post_url
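If you are curious how the pattern matching in these helpers works, here is a small sketch you can run without logging in. The names `split_post_url` and `weblink_from_at_uri` and the made-up "at" URI are our own illustrations (they are not part of the atproto library); the web link is the same one used later in this demo.

```python
import re

# Hypothetical helper for illustration: split a Bluesky web link into its two parts
def split_post_url(url):
    match = re.search(r'https://bsky\.app/profile/([^/]+)/post/([^/]+)', url)
    if not match:
        raise ValueError("Invalid Bluesky post URL format.")
    # groups() returns the two pieces captured by the parentheses in the pattern
    return match.groups()

# Hypothetical helper for illustration: rebuild a web link from an "at" URI
def weblink_from_at_uri(at_uri):
    match = re.search(r'at://([^/]+)/app\.bsky\.feed\.post/([^/]+)', at_uri)
    if not match:
        raise ValueError("Invalid Bluesky atproto post URL format.")
    user_id, post_id = match.groups()
    return f"https://bsky.app/profile/{user_id}/post/{post_id}"

user_handle, post_id = split_post_url(
    "https://bsky.app/profile/realgdt.bsky.social/post/3lihunicmds2y"
)
print("The user handle is: " + user_handle)
print("The post id is: " + post_id)

# A made-up "at" URI, just to show the reverse direction
example_at_uri = "at://did:plc:fake_user/app.bsky.feed.post/fake_post_id"
print("The weblink is: " + weblink_from_at_uri(example_at_uri))
```

Each pair of parentheses in the pattern captures one piece of the link, and `[^/]+` means "one or more characters that are not a slash."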
Log into atproto (or fake atproto)#
Now we need to do our normal Bluesky login steps (and the optional fake atproto step).
Load atproto library#
# Load some code called "Client" from the "atproto" library that will help us work with Bluesky
from atproto import Client
(Optional) Step 1b: Make a fake Bluesky connection with the fake_atproto library#
For testing purposes, we’ve added this line of code, which loads a fake version of atproto, so it won’t actually connect to Bluesky. If you want to try to actually connect to Bluesky, don’t run this line of code.
%run ../../fake_apis/fake_atproto.ipynb
Step 2: Login to Bluesky#
To use this on your real Bluesky account, copy your Bluesky account name and password into the code below, replacing our fake Bluesky name and password.
# Login to Bluesky
# TODO: put your account name and password below
client = Client(base_url="https://bsky.social")
client.login("your_account_name.bsky.social", "m#5@_fake_bsky_password_$%Ds")
Find Bluesky data#
Below we have the code to find data about a specific post on Bluesky. You can get these web link urls for posts by pressing the “share” button on Bluesky.
Don’t worry if you don’t understand this part yet. We are just doing this so we can get to the point of seeing what Bluesky post data looks like.
# The weblink url of a specific post
post_url = "https://bsky.app/profile/realgdt.bsky.social/post/3lihunicmds2y"
# Get the special at version of the weblink
post_at_link = get_at_post_link_from_url(post_url)
# load data for the post we linked (the function lets us load multiple posts)
post_results = client.get_posts([post_at_link])
# get the first result (since we know we only asked about one post)
post = post_results.posts[0]
Look at data in Bluesky post#
Now we will look at some of the data that came back!
Again, don’t worry too much about the code; we want to look at the data and the data types.
post text:#
display("The data type of the post text is: " + type(post.record.text).__name__)
display("The post text is: " + post.record.text)
'The data type of the post text is: str'
'The post text is: This is a fake fact about movie costuming and I find it so interesting!'
As you can see above, the text of a post is a string (str) data type.
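Because the post text is a string, the usual string tools work on it. Here is a minimal sketch that copies the fake post text from the output above into a variable of our own (`post_text`), so it can run without a Bluesky connection:

```python
# The fake post text from the output above, stored in our own variable
post_text = "This is a fake fact about movie costuming and I find it so interesting!"

# Count how many characters are in the text
text_length = len(post_text)

# Make an all-uppercase copy of the text
shouted_text = post_text.upper()

# Check whether the word "movie" appears anywhere in the text
mentions_movies = "movie" in post_text

print("Number of characters: " + str(text_length))
print("Uppercase version: " + shouted_text)
print("Mentions movies? " + str(mentions_movies))
```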
post content id (cid)#
display("The data type of the post content id is: " + type(post.cid).__name__)
display("The post content id is: " + str(post.cid))
'The data type of the post content id is: str'
'The post content id is: 904tjwdf093j'
The post content id is a piece of text (str) that looks like random letters and numbers. This is how the post is referred to inside Bluesky’s computers. So if someone is commenting on a post, Bluesky just uses the content id of the post they were commenting on to see where to display the comment.
post created at#
display("The data type of the post created_at is: " + type(post.record.created_at).__name__)
display("The created_at is: " + str(post.record.created_at))
'The data type of the post created_at is: str'
'The created_at is: 2014-01-01'
The created-at time for the post is a string (str), and the time it describes is in the Coordinated Universal Time (UTC) time zone.
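Since the created-at time arrives as a string, you have to convert it before you can do date calculations with it. Below is a rough sketch using Python’s built-in `datetime` library. The helper name `parse_created_at` and the second sample timestamp are our own assumptions for illustration; we are also assuming that a string with no time zone in it should be treated as UTC.

```python
from datetime import datetime, timezone

# Hypothetical helper for illustration: turn a created_at string into a datetime
def parse_created_at(created_at):
    # Older versions of Python don't accept a trailing "Z" (which means UTC),
    # so we swap it for the equivalent "+00:00"
    if created_at.endswith("Z"):
        created_at = created_at[:-1] + "+00:00"
    parsed = datetime.fromisoformat(created_at)
    # Assumption: if the string had no time zone in it, treat it as UTC
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed

# The fake date from the output above
fake_date = parse_created_at("2014-01-01")
# A made-up timestamp written with a time and a "Z" on the end
made_up_date = parse_created_at("2024-03-05T12:30:00.000Z")

print("The data type is now: " + type(fake_date).__name__)
print("The year of the fake post is: " + str(fake_date.year))
print("The hour of the made-up post is: " + str(made_up_date.hour))
```

Once the value is a `datetime`, you can compare two posts to see which is older, or pull out pieces like the year and hour.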
number of likes#
display("The data type of the number of likes is: " + type(post.like_count).__name__)
display("The number of likes is: " + str(post.like_count))
'The data type of the number of likes is: int'
'The number of likes is: 25'
The number of likes is a whole number (int).
number of replies#
display("The data type of the number of replies is: " + type(post.reply_count).__name__)
display("The number of replies is: " + str(post.reply_count))
'The data type of the number of replies is: int'
'The number of replies is: 2'
The number of replies is a whole number (int). Note: You can also get a data structure of all the comments, but we will look at that later.
number of reposts#
display("The data type of the number of reposts is: " + type(post.repost_count).__name__)
display("The number of reposts is: " + str(post.repost_count))
'The data type of the number of reposts is: int'
'The number of reposts is: 13'
The number of reposts is a whole number (int).
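Because the like, reply, and repost counts are all whole numbers, we can do math with them directly. Here is a small sketch that copies the fake numbers from the outputs above into variables of our own, so it runs without a Bluesky connection:

```python
# The fake counts from the outputs above, stored in our own variables
like_count = 25
reply_count = 2
repost_count = 13

# Add the three numbers together
total_interactions = like_count + reply_count + repost_count

# Compare two of the numbers (this gives a True/False answer)
more_likes_than_reposts = like_count > repost_count

# We need str() to turn the numbers into text before joining them to other text
print("Total interactions: " + str(total_interactions))
print("More likes than reposts? " + str(more_likes_than_reposts))
```

This is also why the code above wraps each count in `str(...)` before displaying it: Python will not join a string and an int with `+` unless the int is converted to text first.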
Link to post#
We can use one of our helper functions to get a website url link to the post.
web_link = get_weblink_from_post(post)
display("The data type of the post url weblink is: " + type(web_link).__name__)
display("The post url weblink is: " + str(web_link))
'The data type of the post url weblink is: str'
'The post url weblink is: https://bsky.app/profile/did:plc:fake_user.bsky.social/post/fake_post_id'
The post url is a string (str).
Still more!#
In addition to the data we looked at above, there are even more options for Bluesky posts. The documentation for them seems a bit incomplete, but you can see more info about posts here: