4.5.3. Demo: Data from a Bluesky Post#
Let’s see what the data actually looks like from a Bluesky Post!
Helper function for atproto links#
Before we begin, though, we’ll need two helper functions: one to turn a web link to a Bluesky feed into the "at" URI format that the atproto library expects, and one to turn a post’s "at" URI back into a web link.
NOTE: You don’t need to worry about the details of how this works; it is just here to make the later code easier to use.
import re # load a "regular expression" library for helping to parse text
from atproto import IdResolver # Load the atproto IdResolver library to get official ATProto user IDs

# function to convert a feed from a weblink url to the special atproto "at" URI
def getATFeedLinkFromURL(url):
    # Get the user handle and feed id from the weblink url
    match = re.search(r'https://bsky\.app/profile/([^/]+)/feed/([^/]+)', url)
    if not match:
        raise ValueError("Invalid Bluesky feed URL format.")
    user_handle, feed_id = match.groups()
    # Get the official atproto user ID (did) from the handle
    resolver = IdResolver()
    did = resolver.handle.resolve(user_handle)
    if not did:
        raise ValueError(f'Could not resolve DID for handle "{user_handle}".')
    # Construct the at:// URI
    feed_uri = f"at://{did}/app.bsky.feed.generator/{feed_id}"
    return feed_uri

# function to convert a post's special atproto "at" URI to a weblink url
def getWebLinkFromPost(post):
    # Get the user id and post id from the post's "at" URI
    match = re.search(r'at://([^/]+)/app\.bsky\.feed\.post/([^/]+)', post.uri)
    if not match:
        raise ValueError("Invalid Bluesky atproto post URI format.")
    user_id, post_id = match.groups()
    web_link = f"https://bsky.app/profile/{user_id}/post/{post_id}"
    return web_link
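To see what the regular expressions in these helpers pull out, here is a minimal offline sketch that skips the network lookup of the user ID. The function names `split_feed_url` and `web_link_from_at_uri` are hypothetical and used only for illustration, and `did:plc:fake_user` is a made-up user ID.

```python
import re

# Hypothetical helper: pull the user handle and feed id out of a feed weblink
def split_feed_url(url):
    match = re.search(r'https://bsky\.app/profile/([^/]+)/feed/([^/]+)', url)
    if not match:
        raise ValueError("Invalid Bluesky feed URL format.")
    return match.groups()

# Hypothetical helper: build a weblink from a post's "at" URI given as plain text
def web_link_from_at_uri(at_uri):
    match = re.search(r'at://([^/]+)/app\.bsky\.feed\.post/([^/]+)', at_uri)
    if not match:
        raise ValueError("Invalid Bluesky atproto post URI format.")
    user_id, post_id = match.groups()
    return f"https://bsky.app/profile/{user_id}/post/{post_id}"

handle, feed_id = split_feed_url(
    "https://bsky.app/profile/shouldhaveanimal.bsky.social/feed/aaab56iiatpdo"
)
print(handle)
print(feed_id)

# A made-up "at" URI, used only to show the conversion
print(web_link_from_at_uri("at://did:plc:fake_user/app.bsky.feed.post/fake_post_id"))
```

Each pair of parentheses in the pattern captures one piece of the link, and `[^/]+` means "one or more characters that are not a slash", so each piece stops at the next `/`.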
Log into atproto (or fake atproto)#
Now we need to do our normal Bluesky login steps (and the optional fake atproto step).
Load atproto library#
# Load some code called "Client" from the "atproto" library that will help us work with Bluesky
from atproto import Client
(Optional) Step 1b: Make a fake Bluesky connection with the fake_atproto library#
For testing purposes, we’ve added this line of code, which loads a fake version of atproto, so it won’t actually connect to Bluesky. If you want to actually connect to Bluesky, don’t run this line of code.
%run ../../fake_apis/fake_atproto.ipynb
Step 2: Login to Bluesky#
To use this on your real Bluesky account, copy your Bluesky account name and password into the code below, replacing our fake Bluesky name and password.
# Login to Bluesky
# TODO: put your account name and password below
client = Client(base_url="https://bsky.social")
client.login("your_account_name.bsky.social", "m#5@_fake_bsky_password_$%Ds")
Find Bluesky data#
Below is the code to find a recent Bluesky post from the feed Animal by the user shouldhaveanimal.bsky.social.
Don’t worry if you don’t understand this part yet. We are just doing this so we can get to the point of seeing what Bluesky post data looks like.
Note: If you run this on real Bluesky, we can’t guarantee anything about how offensive what you might find is.
feedUrl = "https://bsky.app/profile/shouldhaveanimal.bsky.social/feed/aaab56iiatpdo"
atFeedLink = getATFeedLinkFromURL(feedUrl)
feed = client.app.bsky.feed.get_feed({'feed': atFeedLink}).feed
recent_post = feed[0].post
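The code above assumes the feed returned at least one item, since `feed[0]` fails on an empty list. Here is a minimal sketch of a more cautious version, using made-up stand-in objects rather than real atproto results. The name `get_first_post` and the stand-in data are hypothetical.

```python
from types import SimpleNamespace

# Made-up stand-in for the list of feed items; each item holds a .post
fake_feed = [
    SimpleNamespace(post=SimpleNamespace(record=SimpleNamespace(text="Look at my cute dog!"))),
    SimpleNamespace(post=SimpleNamespace(record=SimpleNamespace(text="Another made-up post"))),
]

# Hypothetical helper: return the first post, or None if the feed is empty
def get_first_post(feed_items):
    if len(feed_items) == 0:
        return None
    return feed_items[0].post

first = get_first_post(fake_feed)
print(first.record.text)

# An empty feed gives back None instead of raising an error
print(get_first_post([]))
```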
Look at data in Bluesky post#
Now we will look at some of the data that came back!
Again, don’t worry too much about the code, we want to look at the data and data types.
post text:#
display("The data type of the post text is: " + type(recent_post.record.text).__name__)
display("The post text is: " + recent_post.record.text)
'The data type of the post text is: str'
'The post text is: Look at my cute dog!'
As you can see above, the text of a post is a string (str) data type.
post content id (cid)#
display("The data type of the post content id is: " + type(recent_post.cid).__name__)
display("The post content id is: " + str(recent_post.cid))
'The data type of the post content id is: str'
'The post content id is: 904tjwdf093j'
The post content id is a piece of text (str) that looks like random letters and numbers. This is how the post is referred to inside Bluesky’s computers. So if someone is commenting on a post, Bluesky just uses the content id of the post they were commenting on to see where to display the comment.
post created at#
display("The data type of the post created_at is: " + type(recent_post.record.created_at).__name__)
display("The created_at is: " + str(recent_post.record.created_at))
'The data type of the post created_at is: str'
'The created_at is: 2014-01-01'
The created-at time for the post is a string (str), which is in the Coordinated Universal Time (UTC) zone.
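Because the created-at value arrives as a string, it has to be converted before you can do date calculations with it. Below is a minimal sketch using Python's built-in `datetime` library. The function name `parse_created_at` is hypothetical, the second timestamp is made up, and treating a value with no time zone as UTC is an assumption.

```python
from datetime import datetime, timezone

# Hypothetical helper: turn a created_at string into a datetime object
def parse_created_at(created_at):
    # Older Python versions don't accept a trailing "Z" (meaning UTC),
    # so swap it for the equivalent "+00:00" offset before parsing
    parsed = datetime.fromisoformat(created_at.replace("Z", "+00:00"))
    # Assumption: if the string has no time zone, treat it as UTC
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed

example = parse_created_at("2014-01-01")
print(type(example).__name__)
print(example.year)

# A made-up timestamp in the longer date-and-time form
longer_example = parse_created_at("2024-05-01T12:34:56.789Z")
print(longer_example.hour)
```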
number of likes#
display("The data type of the number of likes is: " + type(recent_post.like_count).__name__)
display("The number of likes is: " + str(recent_post.like_count))
'The data type of the number of likes is: int'
'The number of likes is: 23'
The number of likes is a whole number (int).
number of replies#
display("The data type of the number of replies is: " + type(recent_post.reply_count).__name__)
display("The number of replies is: " + str(recent_post.reply_count))
'The data type of the number of replies is: int'
'The number of replies is: 7'
The number of replies is a whole number (int). Note: You can also get a data structure of all the comments, but we will look at that later.
number of reposts#
display("The data type of the number of reposts is: " + type(recent_post.repost_count).__name__)
display("The number of reposts is: " + str(recent_post.repost_count))
'The data type of the number of reposts is: int'
'The number of reposts is: 5'
The number of reposts is a whole number (int).
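Because the like, reply, and repost counts are whole numbers, they can be added and compared directly. Here is a minimal sketch using a made-up stand-in object (`fake_post`, a hypothetical name) filled with the counts from the example output above, rather than a real atproto post.

```python
from types import SimpleNamespace

# Made-up stand-in for recent_post, using the counts from the example output
fake_post = SimpleNamespace(like_count=23, reply_count=7, repost_count=5)

# Whole numbers can be added together
total_interactions = fake_post.like_count + fake_post.reply_count + fake_post.repost_count
print("The total number of interactions is: " + str(total_interactions))

# Whole numbers can also be compared, which gives a True/False (bool) value
more_likes_than_replies = fake_post.like_count > fake_post.reply_count
print("More likes than replies? " + str(more_likes_than_replies))
```

Note that the numbers still have to be wrapped in `str(...)` before being joined to text, just as in the `display` lines above.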
Link to post#
We can use one of our helper functions to get a web link (url) to the post.
webLink = getWebLinkFromPost(recent_post)
display("The data type of the post url weblink is: " + type(webLink).__name__)
display("The post url weblink is: " + str(webLink))
'The data type of the post url weblink is: str'
'The post url weblink is: https://bsky.app/profile/did:plc:fake_user/post/fake_post_id'
The post url is a string (str).
Still more!#
In addition to the data we looked at above, there are even more options for Bluesky posts. The documentation for them seems a bit incomplete, but you can see more info about posts here: