A5: Best Comments

In this assignment, you will modify a recursive function that prints the replies to a Bluesky post. Your goal is to show only the best replies. It is up to you to decide what rules determine which comments are the best.

Helper functions

We’ll need a few helper functions before we get started.

helper function to display text in an indented box

(You don’t need to worry about how this works. This function displays posts in indented boxes.)

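from IPython.display import HTML, Image, display
import html

def display_indented(text, left_margin=0, color="white"):
    # Show the text inside a bordered box, shifted right by left_margin pixels
    display(
        HTML(
            "<pre style='border:solid 1px;padding:3px;margin-top:3px;margin-bottom:3px;margin-left:" + str(left_margin) + "px;background-color:" + color + "'>" +
            html.escape(text) +  # escape the text so any <, > or & in a post is shown literally, not treated as HTML
            "</pre>"
        )
    )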
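
If you are curious what display_indented actually sends to the notebook, here is a sketch that builds the same HTML string in a hypothetical helper, indented_box_html, and returns it instead of displaying it (so it runs outside Jupyter). It shows how the left margin produces the indent and why html.escape is used: characters like < and & in a post are shown as text rather than interpreted as HTML.

```python
import html

def indented_box_html(text, left_margin=0, color="white"):
    # Hypothetical helper: builds the same <pre> markup that display_indented shows,
    # but returns it as a string instead of rendering it in the notebook
    return (
        "<pre style='border:solid 1px;padding:3px;margin-top:3px;margin-bottom:3px;"
        "margin-left:" + str(left_margin) + "px;background-color:" + color + "'>"
        + html.escape(text)
        + "</pre>"
    )

# A reply two levels deep is shifted 2 * 20 = 40 pixels to the right
box = indented_box_html("I <3 this & that", left_margin=2 * 20)
print(box)
```

The printed string contains margin-left:40px and the escaped text I &lt;3 this &amp; that.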

helper function for atproto links

NOTE: You don’t need to worry about the details of how this works; it is only here to make the later code easier to use.

To keep things as simple as possible, we’re providing a helper function that turns the URL of a Bluesky post (easy to get) into an at:// URI that the Bluesky API understands (not as easy to get), plus a similar helper for Bluesky feed URLs.

We’ll also provide a helper function to get the profile of a post’s author (you can use this in your should_display() function!)

import re  # load a "regular expression" library to help parse text
from atproto import IdResolver  # load the atproto IdResolver to look up official ATProto user IDs (DIDs)

def get_at_post_link_from_url(url):
    # Extract username and post ID from the URL
    match = re.search(r'https://bsky.app/profile/([^/]+)/post/([^/]+)', url)
    if not match:
        raise ValueError("Invalid Bluesky post URL format.")
    user_handle, post_id = match.groups()

    # Construct the at:// URI
    post_uri = f"at://{user_handle}/app.bsky.feed.post/{post_id}"

    return post_uri


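def get_author_profle_from_post(post):
    # Look up the full profile of the post's author.
    # Uses the global `client`, so only call this after logging in to Bluesky.
    author_did = post.author.did
    author_profile = client.app.bsky.actor.get_profile({'actor': author_did})
    return author_profile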

# function to convert a feed weblink url to the special atproto "at://" URI
def get_at_feed_link_from_url(url):

    # Get the user handle and feed id from the weblink url
    match = re.search(r'https://bsky.app/profile/([^/]+)/feed/([^/]+)', url)
    if not match:
        raise ValueError("Invalid Bluesky feed URL format.")
    user_handle, feed_id = match.groups()

    # Get the official atproto user ID (did) from the handle
    resolver = IdResolver()
    did = resolver.handle.resolve(user_handle)
    if not did:
        raise ValueError(f'Could not resolve DID for handle "{user_handle}".')

    # Construct the at:// URI for the feed and return it
    feed_uri = f"at://{did}/app.bsky.feed.generator/{feed_id}"
    return feed_uri
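
Here is a small standalone sketch of the conversion that get_at_post_link_from_url performs, using the example post URL that appears later in this notebook. post_url_to_at_uri is a hypothetical copy of that logic, written so the example runs on its own without atproto installed.

```python
import re

def post_url_to_at_uri(url):
    # Hypothetical standalone copy of get_at_post_link_from_url:
    # capture the handle and the post id from the web URL...
    match = re.search(r'https://bsky.app/profile/([^/]+)/post/([^/]+)', url)
    if not match:
        raise ValueError("Invalid Bluesky post URL format.")
    user_handle, post_id = match.groups()
    # ...then slot them into the at:// URI format the API expects
    return f"at://{user_handle}/app.bsky.feed.post/{post_id}"

print(post_url_to_at_uri("https://bsky.app/profile/realgdt.bsky.social/post/3lihunicmds2y"))
# prints: at://realgdt.bsky.social/app.bsky.feed.post/3lihunicmds2y

try:
    post_url_to_at_uri("https://example.com/not-a-post")
except ValueError as err:
    print("Rejected:", err)  # prints: Rejected: Invalid Bluesky post URL format.
```

The web URL and the at:// URI carry the same two pieces of information (who posted it and the post id); only the format differs.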

Bluesky Setup

Now we can continue by logging in to Bluesky so that we can look through posts.

load atproto library

# Load some code called "Client" from the "atproto" library that will help us work with Bluesky
from atproto import Client

(optional) make a fake Bluesky connection with the fake_atproto library

For testing purposes, we’ve added this line of code, which loads a fake version of atproto so that it won’t actually connect to Bluesky. If you want to connect to the real Bluesky, don’t run this line of code.

%run ../../../../fake_apis/fake_atproto.ipynb

Fake atproto (bsky.app) is replacing the atproto.blue library. Fake atproto doesn't need real passwords, and prevents you from accessing real Bluesky

log in to Bluesky

# Login to Bluesky
# TODO: Put your bluesky account info in the bluesky_keys.py file
%run bluesky_keys.py

client = Client(base_url="https://bsky.social")
client.login(handle, password)

Fake atproto is pretending to set up a client connection to: https://bsky.social

Fake atproto is pretending log into your account:
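
The login cell above expects bluesky_keys.py to define two variables, handle and password, since it calls client.login(handle, password). A minimal sketch of what that file might contain is below; the values are placeholders, and your course may provide its own version of the file. Bluesky also lets you create an app password in your account settings, which is a safer thing to keep in a file than your main password.

```python
# bluesky_keys.py (sketch with placeholder values; replace them with your own)
# The login cell only reads these two names.
handle = "your_username.bsky.social"  # placeholder: your Bluesky handle
password = "xxxx-xxxx-xxxx-xxxx"      # placeholder: ideally an app password, not your main password
```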

Code to print a post with all comments and replies

We are providing these functions, which recursively print a post and all of its replies. Whether each post is actually displayed depends on whether the should_display function returns True or False for it. (Note: should_display is defined later in this notebook. If should_display returns False for a post, that post won’t be displayed, and neither will any of its replies.)

The print_post_thread function takes a Bluesky post weblink (URL) (instructions on where to get one are below), downloads the thread that follows that post, and then uses the print_post_and_replies function to print out that post and its replies.

def print_post_thread(postUrl, show_hidden=False):

    at_post_link = get_at_post_link_from_url(postUrl)
    
    # Fetch the post details
    post_data = client.get_post_thread(at_post_link)
    
    print_post_and_replies(post_data.thread, show_hidden=show_hidden)

The print_post_and_replies function takes a given post and recursively prints that post as well as all replies to that post (which in turn prints all the replies to those replies, and so on).

def print_post_and_replies(postInfo, num_indents=0, show_hidden=False):
    
    # make sure this post isn't blocked or missing (e.g., a deleted reply), since we can't read those posts
    if not (hasattr(postInfo,'blocked') and postInfo.blocked) and hasattr(postInfo, 'post'):
        
        post = postInfo.post
        replies = postInfo.replies

        # If replies is None, make it an empty list (so the for loop later doesn't crash)
        if not replies:
            replies = []
    
        display_text = (
            post.record.text + "\n" +
            "-- " + str(post.author.display_name) + " (" + str(post.author.handle) + ")\n" + 
            " (likes: " + str(post.like_count) + 
            ", replies: " + str(post.reply_count) +
            ", reposts: " + str(post.repost_count) +
            ", quotes: " + str(post.quote_count) +
            ") - " 
        )

        if should_display(post):
            display_indented(display_text, num_indents*20)
            for reply in replies:
                print_post_and_replies(reply, num_indents = num_indents + 1, show_hidden=show_hidden)
                
        elif(show_hidden):
            display_indented(display_text, num_indents*20, color='LightCoral')
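
To see the pruning behavior without connecting to Bluesky, here is a simplified sketch of the same recursion. It uses hypothetical stand-in objects built with SimpleNamespace (only the attributes the recursion reads), a hypothetical rule at_least_five_likes, and collects indented text lines instead of drawing HTML boxes.

```python
from types import SimpleNamespace

def make_thread(text, likes, replies=()):
    # Hypothetical stand-in for one thread node: only the attributes the recursion reads
    post = SimpleNamespace(record=SimpleNamespace(text=text), like_count=likes)
    return SimpleNamespace(post=post, replies=list(replies))

def collect_visible(node, should_display, num_indents=0, show_hidden=False, lines=None):
    # Same shape as print_post_and_replies, but it records indented text lines
    # instead of drawing HTML boxes
    if lines is None:
        lines = []
    post = node.post
    if should_display(post):
        lines.append("  " * num_indents + post.record.text)
        for reply in node.replies or []:
            collect_visible(reply, should_display, num_indents + 1, show_hidden, lines)
    elif show_hidden:
        # A hidden post is marked, but its replies are never visited
        lines.append("  " * num_indents + "[hidden] " + post.record.text)
    return lines

def at_least_five_likes(post):
    # Hypothetical rule used only for this demo
    return post.like_count >= 5

thread = make_thread("Root post", 25, [
    make_thread("Popular reply", 6, [make_thread("Reply to popular reply", 0)]),
    make_thread("Unpopular reply", 1, [make_thread("Great reply to unpopular reply", 50)]),
])

for line in collect_visible(thread, at_least_five_likes, show_hidden=True):
    print(line)
```

Even though "Great reply to unpopular reply" has 50 likes, it never appears, because its parent was hidden. Hiding a post hides the whole conversation under it, which is worth keeping in mind when you design your rule.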

TODO: Create Your Content Moderation Algorithm

Your job is to invent and implement your own rule inside the should_display function for which posts count as the “best posts” and should therefore be displayed. The rule can be simple or complicated; it just can’t be the same as the current rule. You can focus on hiding only a few posts that you judge to be bad, on showing only the few posts you judge to be the very best, or on a combination of the two.

When you are making your rule, you may want to use different comparison operators (like == for equals, > for greater than, etc.) and different logical operators (like and, meaning both conditions must be true, and or, meaning at least one condition must be true). For a list of Python operators, see w3schools.
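
Here is a quick sketch of these operators in action, using made-up values that stand in for one post’s text, likes, and replies (the variable names are hypothetical, not part of the notebook):

```python
# Made-up values standing in for one post's data
text = "Wow! That is a cool fake fact!"
likes = 6
replies = 2

print(likes == 6)                    # == (equal to): True
print(likes > 10)                    # >  (greater than): False
print(likes >= 5 and replies > 0)    # and (both must be True): True
print(likes > 10 or "cool" in text)  # or (at least one must be True): True
print(not ("spam" in text.lower()))  # not (flips True/False): True
```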

Some things you can use when deciding whether to display a post:

The text of the comment: post.record.text
The number of likes: post.like_count
The number of replies: post.reply_count
The number of reposts: post.repost_count
The number of quotes: post.quote_count
The author’s display name: post.author.display_name
The author’s handle: post.author.handle

You can also look up more about the author by uncommenting the optional author_profile lookup line in should_display (author_profile = get_author_profle_from_post(post)). This makes an extra request to Bluesky for every post, so it may slow things down on large threads. Then you can use:

The author’s bio/description: author_profile.description
The author’s number of followers: author_profile.followers_count
The number of accounts the author follows: author_profile.follows_count
The author’s number of posts: author_profile.posts_count

You can use any other information you can figure out about the post as well, such as the sentiment analysis that was demoed previously.
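
While you design your rule, it can help to try it on a few made-up posts before running it on a real thread. Below is a sketch under that assumption: make_fake_post is a hypothetical helper that builds a stand-in object with the same attribute names listed above (it does not talk to Bluesky and has no author profile), and example_should_display is just one illustrative rule, not a suggested answer.

```python
from types import SimpleNamespace

def make_fake_post(text, likes=0, replies=0, reposts=0, quotes=0,
                   display_name="Test User", handle="test_user.bsky.social"):
    # Hypothetical stand-in with the same attribute names as a real post
    return SimpleNamespace(
        record=SimpleNamespace(text=text),
        like_count=likes, reply_count=replies,
        repost_count=reposts, quote_count=quotes,
        author=SimpleNamespace(display_name=display_name, handle=handle),
    )

def example_should_display(post):
    # One possible rule, for illustration only: keep posts that have some
    # engagement and are more than a few characters long
    has_engagement = post.like_count >= 3 or post.reply_count >= 2
    is_substantive = len(post.record.text) >= 15
    return has_engagement and is_substantive

samples = [
    make_fake_post("Wow! That is a cool fake fact!", likes=6, replies=2, reposts=5, quotes=1),
    make_fake_post("I don't see how that's relevant", likes=2, replies=0, reposts=1, quotes=1),
    make_fake_post("lol", likes=40),
]
for post in samples:
    print(example_should_display(post), "|", post.record.text)
```

Only the first sample passes: the second has too little engagement, and the third is popular but too short. Trying edge cases like these is a quick way to check that a rule does what you intend.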

def should_display(post):
    #TODO: Make your own rule

    # optional code below: Get the full author profile (uncomment to use)
    # author_profile = get_author_profle_from_post(post)
    
    # for a demonstration, we only display comments containing the lowercase letters "and"
    # Note: with this check, a comment containing the word "sand" would also be displayed,
    #       since "and" appears inside "sand"
    has_letters_and = "and" in post.record.text
    
    if(has_letters_and):
        return True
    else:
        return False
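
As the comments in should_display point out, "and" in post.record.text is a substring check, so "sand" also counts. If you want to match whole words instead, one option is a regular-expression word boundary (\b). Here is a sketch with a hypothetical helper, contains_word, that you could call from your own should_display (the notebook already imports re in the helper section).

```python
import re

def contains_word(text, word):
    # \b marks a word boundary, so "and" matches in "Cats and dogs" but not inside "sand"
    return re.search(r"\b" + re.escape(word) + r"\b", text) is not None

print(contains_word("Fun in the sand", "and"))  # False
print(contains_word("Cats and dogs", "and"))    # True
print("and" in "Fun in the sand")               # True (the substring check used above)
```

This check is case-sensitive; passing re.IGNORECASE as a third argument to re.search would also match "And" at the start of a sentence.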

Getting URLs to test

To use our function, we need the URL of a Bluesky post to test it with. Once you find a post, open its post options menu (the three dots) and select ‘Copy Link to Post’ to get a web URL for the post.

[Image: A Bluesky post with the three-dot “Open post options menu” opened and the “Copy Link to Post” option selected.]

It should be something like: https://bsky.app/profile/realgdt.bsky.social/post/3lihunicmds2y

Then paste the entire URL in as the string for the first argument to the print_post_thread function, as in the example below. Try it out!

print_post_thread("https://bsky.app/profile/realgdt.bsky.social/post/3lihunicmds2y", False)

This is a fake fact about movie costuming and I find it so interesting!
-- Imaginary User (imaginary_user.bsky.social)
 (likes: 25, replies: 2, reposts: 13, quotes: 7) -

I saw a completely unrelated movie once and I liked it!
-- Pretend User (pretend_user.bsky.social)
 (likes: 1, replies: 1, reposts: 0, quotes: 0) -

If we also want to see which comments are being skipped, we can set the optional show_hidden argument of print_post_thread to True (it is passed along to print_post_and_replies). The skipped comments will then show up with a reddish background. Replies to a skipped comment are still not shown.

print_post_thread("https://bsky.app/profile/realgdt.bsky.social/post/3lihunicmds2y", True)

This is a fake fact about movie costuming and I find it so interesting!
-- Imaginary User (imaginary_user.bsky.social)
 (likes: 25, replies: 2, reposts: 13, quotes: 7) -

Wow! That is a cool fake fact!
-- Fake User (fake_user.bsky.social)
 (likes: 6, replies: 2, reposts: 5, quotes: 1) -

I saw a completely unrelated movie once and I liked it!
-- Pretend User (pretend_user.bsky.social)
 (likes: 1, replies: 1, reposts: 0, quotes: 0) -

I don't see how that's relevant
-- Fake User (fake_user.bsky.social)
 (likes: 2, replies: 0, reposts: 1, quotes: 1) -

Test it out with 3 Bluesky threads!

Now, after you’ve modified should_display, try your algorithm on three new posts (make sure they have replies!), and answer the follow-up questions after each one.

In the sections below, replace the URL in each print_post_thread call with the URL of a Bluesky post you chose, and run the code. Then answer the questions about how that went.

At the very end, there are more reflection questions.

TODO: Print Bluesky thread 1

print_post_thread('https://bsky.app/profile/realgdt.bsky.social/post/3lihunicmds2y', show_hidden = True)

This is a fake fact about movie costuming and I find it so interesting!
-- Imaginary User (imaginary_user.bsky.social)
 (likes: 25, replies: 2, reposts: 13, quotes: 7) -

Wow! That is a cool fake fact!
-- Fake User (fake_user.bsky.social)
 (likes: 6, replies: 2, reposts: 5, quotes: 1) -

I saw a completely unrelated movie once and I liked it!
-- Pretend User (pretend_user.bsky.social)
 (likes: 1, replies: 1, reposts: 0, quotes: 0) -

I don't see how that's relevant
-- Fake User (fake_user.bsky.social)
 (likes: 2, replies: 0, reposts: 1, quotes: 1) -

TODO: Bluesky thread 1 follow-up questions

Write an answer to each of these questions (you can edit this text by double-clicking it):

Look through the output of print_post_thread() based on your modified should_display function.

Did your function tend to keep most posts or tend to hide most posts?

TODO: Your answer here

Do you see any pattern to the contents of the posts you showed versus hid (e.g., did it actually select better quality or more interesting posts)?

TODO: Your answer here

TODO: Print Bluesky thread 2

print_post_thread('https://bsky.app/profile/realgdt.bsky.social/post/3lihunicmds2y', show_hidden = True)

This is a fake fact about movie costuming and I find it so interesting!
-- Imaginary User (imaginary_user.bsky.social)
 (likes: 25, replies: 2, reposts: 13, quotes: 7) -

Wow! That is a cool fake fact!
-- Fake User (fake_user.bsky.social)
 (likes: 6, replies: 2, reposts: 5, quotes: 1) -

I saw a completely unrelated movie once and I liked it!
-- Pretend User (pretend_user.bsky.social)
 (likes: 1, replies: 1, reposts: 0, quotes: 0) -

I don't see how that's relevant
-- Fake User (fake_user.bsky.social)
 (likes: 2, replies: 0, reposts: 1, quotes: 1) -

TODO: Bluesky thread 2 follow-up questions

Write an answer to each of these questions (you can edit this text by double-clicking it):

Look through the output of print_post_thread() based on your modified should_display function.

Did your function tend to keep most posts or tend to hide most posts?

TODO: Your answer here

Do you see any pattern to the contents of the posts you showed versus hid (e.g., did it actually select better quality or more interesting posts)?

TODO: Your answer here

TODO: Print Bluesky thread 3

print_post_thread('https://bsky.app/profile/realgdt.bsky.social/post/3lihunicmds2y', show_hidden = True)

This is a fake fact about movie costuming and I find it so interesting!
-- Imaginary User (imaginary_user.bsky.social)
 (likes: 25, replies: 2, reposts: 13, quotes: 7) -

Wow! That is a cool fake fact!
-- Fake User (fake_user.bsky.social)
 (likes: 6, replies: 2, reposts: 5, quotes: 1) -

I saw a completely unrelated movie once and I liked it!
-- Pretend User (pretend_user.bsky.social)
 (likes: 1, replies: 1, reposts: 0, quotes: 0) -

I don't see how that's relevant
-- Fake User (fake_user.bsky.social)
 (likes: 2, replies: 0, reposts: 1, quotes: 1) -

TODO: Bluesky thread 3 follow-up questions

Write an answer to each of these questions (you can edit this text by double-clicking it):

Look through the output of print_post_thread() based on your modified should_display function.

Did your function tend to keep most posts or tend to hide most posts?

TODO: Your answer here

Do you see any pattern to the contents of the posts you showed versus hid (e.g., did it actually select better quality or more interesting posts)?

TODO: Your answer here

TODO: Final Reflection questions

Write an answer to each of these questions:

Why did you choose the rules you did for selecting the best comments?

TODO: Your answer here

What was most challenging about coming up with your rules?

TODO: Your answer here

What additional information or rules do you wish you could have used?

TODO: Your answer here

If someone or some group wanted to make sure their comments were shown by your function, what would they do? How hard would this be?

TODO: Your answer here

If someone or some group wanted to make sure someone else’s comments were NOT shown by your function, what would they do (if anything)? How hard would this be?

TODO: Your answer here

If Bluesky adopted this rule as a universal rule for which posts to display, what do you think would happen? (e.g., Would people change posting strategies? Would posting look different than currently? Would it get overwhelmed with spam?)

TODO: Your answer here
