A5: Best Comments#
In this assignment you will modify a recursive function that prints the replies on a Bluesky post. Your goal is to show only the best replies. It is up to you to decide which rules determine the best comments.
Helper functions#
We’ll need a few helper functions before we get started.
helper function to display text in an indented box#
(You don’t need to worry about how this works. This is the function that displays posts in indented boxes.)
from IPython.display import HTML, Image, display
import html

def display_indented(text, left_margin=0, color="white"):
    display(
        HTML(
            "<pre style='border:solid 1px;padding:3px;margin-top:3px;margin-bottom:3px;margin-left:"+ str(left_margin) + "px;background-color:"+color+"'>" +
            html.escape(text) +
            "</pre>"
        )
    )
helper function for atproto links#
NOTE: You don’t need to worry about the details of how this works; it is just here to make the later code easier to use.
To keep things as simple as possible, we’re providing a helper function that turns the URL of a Bluesky post (easy to get) into a URI that the Bluesky API understands (not as easy to get).
We’ll also provide a helper function to get the author of a post (you can use this in your should_display() function!).
import re  # load a "regular expression" library for helping to parse text
from atproto import IdResolver  # load the atproto IdResolver library to get official ATProto user IDs

def get_at_post_link_from_url(url):
    # Extract username and post ID from the URL
    match = re.search(r'https://bsky\.app/profile/([^/]+)/post/([^/]+)', url)
    if not match:
        raise ValueError("Invalid Bluesky post URL format.")
    user_handle, post_id = match.groups()
    # Construct the at:// URI
    post_uri = f"at://{user_handle}/app.bsky.feed.post/{post_id}"
    return post_uri
def get_author_profle_from_post(post):
    # Note: this uses the global `client` that is created in the login step below
    author_did = post.author.did
    author_profile = client.app.bsky.actor.get_profile({'actor': author_did})
    return author_profile
# function to convert a feed from a weblink url to the special atproto "at" URI
def get_at_feed_link_from_url(url):
    # Get the user handle and feed id from the weblink url
    match = re.search(r'https://bsky\.app/profile/([^/]+)/feed/([^/]+)', url)
    if not match:
        raise ValueError("Invalid Bluesky feed URL format.")
    user_handle, feed_id = match.groups()
    # Get the official atproto user ID (did) from the handle
    resolver = IdResolver()
    did = resolver.handle.resolve(user_handle)
    if not did:
        raise ValueError(f'Could not resolve DID for handle "{user_handle}".')
    # Construct the at:// URI and return it
    feed_uri = f"at://{did}/app.bsky.feed.generator/{feed_id}"
    return feed_uri
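To see what the post helper produces, the sketch below repeats the same regular-expression step on the example URL used later in this notebook. It only uses Python's built-in re module and never contacts Bluesky. The name url_to_at_uri is a hypothetical stand-in for get_at_post_link_from_url, used here so the example can run on its own.

```python
import re

def url_to_at_uri(url):
    # Pull the handle and the post ID out of the web link
    match = re.search(r'https://bsky\.app/profile/([^/]+)/post/([^/]+)', url)
    if not match:
        raise ValueError("Invalid Bluesky post URL format.")
    user_handle, post_id = match.groups()
    # Rebuild the two pieces as an at:// URI
    return f"at://{user_handle}/app.bsky.feed.post/{post_id}"

example_uri = url_to_at_uri("https://bsky.app/profile/realgdt.bsky.social/post/3lihunicmds2y")
print(example_uri)
```

A URL that does not match the expected pattern (for example, a link to a profile instead of a post) raises a ValueError rather than returning a malformed URI.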
Bluesky Setup#
Now we can continue logging in to Bluesky and look through multiple posts.
load atproto library#
# Load some code called "Client" from the "atproto" library that will help us work with Bluesky
from atproto import Client
(optional) make a fake Bluesky connection with the fake_atproto library#
For testing purposes, we’ve added this line of code, which loads a fake version of atproto so it won’t actually connect to Bluesky. If you want to actually connect to Bluesky, don’t run this line of code.
%run ../../../../fake_apis/fake_atproto.ipynb
login to Bluesky#
# Login to Bluesky
# TODO: Put your bluesky account info in the bluesky_keys.py file
%run bluesky_keys.py
client = Client(base_url="https://bsky.social")
client.login(handle, password)
Code to print a post with all comments and replies#
We are providing these functions that recursively print a post and all of its replies. Whether a post is actually displayed depends on whether a should_display function returns True or False. (Note: the should_display function is defined later in this notebook. If should_display comes back False for a post, the post won’t be displayed, nor will any replies to it.)
print_post_thread is a function that takes a Bluesky post weblink (URL) (instructions on where to get one are below), downloads the thread that follows that post, and then uses the print_post_and_replies function to print out that post and the replies to that post.
def print_post_thread(postUrl, show_hidden=False):
    at_post_link = get_at_post_link_from_url(postUrl)
    # Fetch the post details
    post_data = client.get_post_thread(at_post_link)
    print_post_and_replies(post_data.thread, show_hidden=show_hidden)
The print_post_and_replies function takes a given post and recursively prints that post as well as all replies to that post (which will also print all the replies to those replies, etc.).
def print_post_and_replies(postInfo, num_indents=0, show_hidden=False):
    # make sure this post isn't blocked (since we can't read blocked posts)
    if not (hasattr(postInfo,'blocked') and postInfo.blocked):
        post = postInfo.post
        replies = postInfo.replies
        # If replies is None, make it an empty array (so the for loop later doesn't crash)
        if not replies:
            replies = []
        display_text = (
            post.record.text + "\n" +
            "-- " + str(post.author.display_name) + " (" + str(post.author.handle) + ")\n" +
            " (likes: " + str(post.like_count) +
            ", replies: " + str(post.reply_count) +
            ", reposts: " + str(post.repost_count) +
            ", quotes: " + str(post.quote_count) +
            ") - "
        )
        if should_display(post):
            display_indented(display_text, num_indents*20)
            for reply in replies:
                print_post_and_replies(reply, num_indents = num_indents + 1, show_hidden=show_hidden)
        elif(show_hidden):
            display_indented(display_text, num_indents*20, color='LightCoral')
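The recursion above can be hard to picture, so here is a minimal sketch of the same pattern on a made-up thread. It is not the notebook's code: the thread is a plain dictionary rather than a Bluesky object, and the names collect_thread_lines, toy_thread, and has_and are hypothetical. It shows how a hidden post also hides its replies, because the function only moves on to the replies after the parent post is kept.

```python
def collect_thread_lines(node, keep, num_indents=0):
    """Return the indented text of every kept post in a toy thread."""
    lines = []
    if keep(node):
        lines.append("  " * num_indents + node["text"])
        # Only look at the replies when the parent post was kept
        for reply in node["replies"]:
            lines.extend(collect_thread_lines(reply, keep, num_indents + 1))
    return lines

# A made-up thread: each post is a dict with its text and a list of replies
toy_thread = {
    "text": "cats and dogs",
    "replies": [
        {"text": "no thanks", "replies": [
            {"text": "this and that", "replies": []},
        ]},
        {"text": "salt and pepper", "replies": []},
    ],
}

def has_and(node):
    # Same idea as the demonstration rule: keep text containing "and"
    return "and" in node["text"]

thread_lines = collect_thread_lines(toy_thread, has_and)
for line in thread_lines:
    print(line)
```

Here "this and that" would pass the rule by itself, but it is never reached because its parent, "no thanks", was hidden.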
TODO: Create Your Content Moderation Algorithm#
Your job is to invent and implement your own rule inside the should_display function for which posts count as the “best posts” and therefore should be displayed. The rule can be complicated or simple; it just can’t be the same as the current rule. You can aim to hide only a few posts that you judge to be bad, to show only a few posts that you judge to be the very best, or some combination of those.
When you are making your rule, you may want to use different comparison operators (like == for equals, > for greater than, etc.) and different logical operators (like "and" when both things must be true, "or" when at least one thing must be true, etc.). See the list of Python operators at w3schools.
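As a small illustration of these operators, the sketch below combines a few comparisons with "and", "or", and "not". The values are made up and are not connected to any real post.

```python
like_count = 3                   # made-up number of likes
text = "Great point, thanks!"    # made-up comment text

is_popular = like_count > 10                  # greater than: False here
is_exact = text == "Great point, thanks!"     # equals: True here
is_short = len(text) < 40                     # less than: True here

popular_and_short = is_popular and is_short   # both must be True
popular_or_short = is_popular or is_short     # at least one must be True
not_popular = not is_popular                  # flips True/False

print(popular_and_short, popular_or_short, not_popular)
```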
Some things you can use when you are deciding whether to display a post or not:
The text of the comment: post.record.text
The likes of the comment: post.like_count
The number of replies: post.reply_count
The number of reposts: post.repost_count
The number of quotes: post.quote_count
author display name: post.author.display_name
author handle: post.author.handle
You can also look up more about the author by uncommenting the optional author_profile lookup line (author_profile = get_author_profle_from_post(post)). Then you can get:
author bio/description: author_profile.description
author’s number of followers: author_profile.followers_count
number of people the author follows: author_profile.follows_count
author’s number of posts: author_profile.posts_count
You can use any other information you can figure out about the post as well, such as the sentiment analysis that was demoed previously.
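To show how these fields can be read inside a rule, here is a rough sketch of one possible shape for a rule. It is only an illustration: the thresholds are arbitrary, the name example_rule is hypothetical, and the posts are stand-in objects built with SimpleNamespace that carry only the fields the rule reads (they are not real Bluesky data).

```python
from types import SimpleNamespace

def example_rule(post):
    # Keep comments that got some engagement and are not extremely short
    has_engagement = post.like_count >= 2 or post.reply_count >= 1
    long_enough = len(post.record.text) >= 15
    return has_engagement and long_enough

# Stand-in posts with only the fields the rule reads
post_a = SimpleNamespace(
    record=SimpleNamespace(text="I think the costume choice was deliberate."),
    like_count=4,
    reply_count=0,
)
post_b = SimpleNamespace(
    record=SimpleNamespace(text="lol"),
    like_count=9,
    reply_count=2,
)

print(example_rule(post_a), example_rule(post_b))
```

The second post has more likes and replies than the first, but it is still hidden because its text is too short, which is the kind of trade-off worth checking when you test your own rule.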
def should_display(post):
    #TODO: Make your own rule
    # optional code below: Get the full author profile (uncomment to use)
    # author_profile = get_author_profle_from_post(post)
    # for a demonstration, we only display comments with the lower case letters "and"
    # Note: that the way we are checking here, a comment that has the word "sand" would show up
    # since "and" appears in "sand"
    has_letters_and = "and" in post.record.text
    if(has_letters_and):
        return True
    else:
        return False
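The note in the demonstration rule points out that "sand" also matches. If you wanted to match only the whole word, one option (a sketch, not part of the assignment code) is a regular expression with word boundaries. The function name contains_word_and is hypothetical, and like the demonstration rule it only matches lower-case "and".

```python
import re

def contains_word_and(text):
    # \b marks a word boundary, so "sand" and "android" do not count
    return re.search(r"\band\b", text) is not None

whole_word = contains_word_and("salt and pepper")
inside_word = contains_word_and("a sandy beach")
print(whole_word, inside_word)
```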
Getting urls to test#
In order to use our function, we need to grab the url of a Bluesky post to test it with. Once you find the post, find the ‘Copy Link to Post’ option to get a web url for the post.
It should be something like: https://bsky.app/profile/realgdt.bsky.social/post/3lihunicmds2y
Then paste the entire URL in as the string for the first argument to the print_post_thread function, as in the example below. Try it out!
print_post_thread("https://bsky.app/profile/realgdt.bsky.social/post/3lihunicmds2y", False)
This is a fake fact about movie costuming and I find it so interesting! -- Imaginary User (imaginary_user.bsky.social) (likes: 25, replies: 2, reposts: 13, quotes: 7) -
I saw a completely unrelated movie once and I liked it! -- Pretend User (pretend_user.bsky.social) (likes: 1, replies: 1, reposts: 0, quotes: 0) -
If we also want to see which comments are being skipped, we can use the optional show_hidden argument of print_post_thread by setting show_hidden = True, and the comments that are being skipped will show up with a reddish background.
print_post_thread("https://bsky.app/profile/realgdt.bsky.social/post/3lihunicmds2y", True)
This is a fake fact about movie costuming and I find it so interesting! -- Imaginary User (imaginary_user.bsky.social) (likes: 25, replies: 2, reposts: 13, quotes: 7) -
Wow! That is a cool fake fact! -- Fake User (fake_user.bsky.social) (likes: 6, replies: 2, reposts: 5, quotes: 1) -
I saw a completely unrelated movie once and I liked it! -- Pretend User (pretend_user.bsky.social) (likes: 1, replies: 1, reposts: 0, quotes: 0) -
I don't see how that's relevant -- Fake User (fake_user.bsky.social) (likes: 2, replies: 0, reposts: 1, quotes: 1) -
Test it out with 3 Bluesky threads!#
Now, after you’ve modified the should_display function, try testing out your algorithm on three new posts (make sure they have replies!), answering follow-up questions after each one.
In the sections below, replace the ?????s with a Bluesky URL, and run the code. Then answer the questions about how that went.
At the very end will be more reflection questions.
TODO: Print bluesky thread 1#
print_post_thread('https://bsky.app/profile/realgdt.bsky.social/post/3lihunicmds2y', show_hidden = True)
This is a fake fact about movie costuming and I find it so interesting! -- Imaginary User (imaginary_user.bsky.social) (likes: 25, replies: 2, reposts: 13, quotes: 7) -
Wow! That is a cool fake fact! -- Fake User (fake_user.bsky.social) (likes: 6, replies: 2, reposts: 5, quotes: 1) -
I saw a completely unrelated movie once and I liked it! -- Pretend User (pretend_user.bsky.social) (likes: 1, replies: 1, reposts: 0, quotes: 0) -
I don't see how that's relevant -- Fake User (fake_user.bsky.social) (likes: 2, replies: 0, reposts: 1, quotes: 1) -
TODO: Bluesky thread 1 follow-up questions#
Write an answer in response to each of these questions (you can edit this text by double clicking it):
Look through the output of print_post_thread() based on your modified should_display function.
Did your function tend to keep most posts or tend to hide most posts?
TODO: Your answer here
Do you see any pattern to the contents of the posts you showed versus hid (e.g., did it actually select better quality or more interesting posts)?
TODO: Your answer here
TODO: Print bluesky thread 2#
print_post_thread('https://bsky.app/profile/realgdt.bsky.social/post/3lihunicmds2y', show_hidden = True)
This is a fake fact about movie costuming and I find it so interesting! -- Imaginary User (imaginary_user.bsky.social) (likes: 25, replies: 2, reposts: 13, quotes: 7) -
Wow! That is a cool fake fact! -- Fake User (fake_user.bsky.social) (likes: 6, replies: 2, reposts: 5, quotes: 1) -
I saw a completely unrelated movie once and I liked it! -- Pretend User (pretend_user.bsky.social) (likes: 1, replies: 1, reposts: 0, quotes: 0) -
I don't see how that's relevant -- Fake User (fake_user.bsky.social) (likes: 2, replies: 0, reposts: 1, quotes: 1) -
TODO: Bluesky thread 2 follow-up questions#
Write an answer in response to each of these questions (you can edit this text by double clicking it):
Look through the output of print_post_thread() based on your modified should_display function.
Did your function tend to keep most posts or tend to hide most posts?
TODO: Your answer here
Do you see any pattern to the contents of the posts you showed versus hid (e.g., did it actually select better quality or more interesting posts)?
TODO: Your answer here
TODO: Print bluesky thread 3#
print_post_thread('https://bsky.app/profile/realgdt.bsky.social/post/3lihunicmds2y', show_hidden = True)
This is a fake fact about movie costuming and I find it so interesting! -- Imaginary User (imaginary_user.bsky.social) (likes: 25, replies: 2, reposts: 13, quotes: 7) -
Wow! That is a cool fake fact! -- Fake User (fake_user.bsky.social) (likes: 6, replies: 2, reposts: 5, quotes: 1) -
I saw a completely unrelated movie once and I liked it! -- Pretend User (pretend_user.bsky.social) (likes: 1, replies: 1, reposts: 0, quotes: 0) -
I don't see how that's relevant -- Fake User (fake_user.bsky.social) (likes: 2, replies: 0, reposts: 1, quotes: 1) -
TODO: Bluesky thread 3 follow-up questions#
Write an answer in response to each of these questions (you can edit this text by double clicking it):
Look through the output of print_post_thread() based on your modified should_display function.
Did your function tend to keep most posts or tend to hide most posts?
TODO: Your answer here
Do you see any pattern to the contents of the posts you showed versus hid (e.g., did it actually select better quality or more interesting posts)?
TODO: Your answer here
TODO: Final Reflection questions#
Write an answer in response to each of these questions:
Why did you choose the rules you did for selecting the best comments?
TODO: Your answer here
What was most challenging about coming up with your rules?
TODO: Your answer here
What additional information or rules do you wish you could have used?
TODO: Your answer here
If someone or some group wanted to make sure their comments were shown by your function, what would they do? How hard would this be?
TODO: Your answer here
If someone or some group wanted to make sure someone else’s comments were NOT shown by your function, what would they do (if anything)? How hard would this be?
TODO: Your answer here
If Bluesky adopted this rule as a universal rule for which posts to display, what do you think would happen? (e.g., Would people change posting strategies? Would posting look different than currently? Would it get overwhelmed with spam?)
TODO: Your answer here