A5: Best Comments#


In this assignment you will modify a recursive function that prints the comments and replies on a reddit post. Your goal is to show only the best comments and replies. It is up to you to decide which rules determine the best comments.

Reddit Praw Setup#

# make sure praw library is installed
import praw

(optional) make a fake praw connection with the fake_praw library

For testing purposes, we’ve added this line of code, which loads a fake version of praw, so it won’t actually connect to reddit. If you want to connect to the real reddit, don’t run this line of code.

%run ../../../../fake_apis/fake_praw.ipynb
Fake praw is replacing the praw library. Fake praw doesn't need real passwords, and prevents you from accessing real reddit
%run reddit_keys.py
# Give the praw code your reddit account info so
# it can perform reddit actions
reddit = praw.Reddit(
    username=username, password=password,
    client_id=client_id, client_secret=client_secret,
    user_agent="a custom python script for user /" + str(username)
)
Fake praw is pretending to collect account info to use on reddit

Helper function to display text in an indented box#

(You don’t need to worry about how this works)

from IPython.display import HTML, Image, display
import html
def display_indented(text, left_margin=0, color="white"):
    display(
        HTML(
            "<pre style='border:solid 1px;padding:3px;margin-left:"+str(left_margin)+"px;background-color:"+color+"'>" + 
            html.escape(text) + 
            "</pre>"
        )
    )

Code to print a post with all comments and replies#

We are providing these functions, which recursively print all comments and replies but rely on whether a should_display function returns True or False to decide if each comment is actually displayed. (Note: if should_display returns False for a comment, that comment won’t be displayed, nor will any replies to it.)

The print_post_and_replies function takes a postId (instructions on where to find one come later), prints information about that post, and then uses the print_comment_and_replies function to print all comments and replies.

def print_post_and_replies(postId, show_hidden=False):
    submission = reddit.submission(postId)
    
    print("Comments and replies for post from /"+ submission.subreddit.display_name + ":" )
    display(HTML('<a href="'+"https://www.reddit.com" + submission.permalink +'">'+submission.title+'</a>'))
    
    submission.comment_sort = "old"
    submission.comments.replace_more()
    comments = submission.comments
    
    for comment in comments:
        print_comment_and_replies(comment, show_hidden=show_hidden)

The print_comment_and_replies function recursively prints a comment, all replies to that comment, all replies to those replies, and so on.

def print_comment_and_replies(comment, num_indents=0, show_hidden=False):
    
    replies = comment.replies

    display_text = (
        comment.body + "\n" +
        "-- " + str(comment.author) + 
        " (score " + str(comment.score) + ")"
    )
    
    if(should_display(comment)):# check if we should display this comment
        display_indented(display_text, num_indents*20)

        #print replies (and the replies of those, etc.)
        for reply in replies:
            print_comment_and_replies(reply, num_indents = num_indents + 1, show_hidden=show_hidden)
    
    elif(show_hidden): # If we still want to see which comments are hidden, color them LightCoral
        display_indented(display_text, num_indents*20, color='LightCoral')
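
To see the recursion pattern on its own, here is a minimal sketch that runs without praw or reddit. It assumes a made-up thread stored as nested dictionaries, and the names visible_comments, contains_the, and thread are hypothetical, not part of the assignment code. It shows the same behavior described above: when a comment is hidden, its replies are never visited, even if they would pass the rule themselves.

```python
def visible_comments(comment, keep, depth=0):
    """Collect (depth, text) for each shown comment, skipping hidden subtrees."""
    if not keep(comment):
        return []  # hidden: neither this comment nor its replies are visited
    shown = [(depth, comment["body"])]
    for reply in comment["replies"]:
        # Each reply is handled by the same function, one level deeper
        shown += visible_comments(reply, keep, depth + 1)
    return shown

# Made-up thread: one top-level comment with nested replies
thread = {
    "body": "the top comment",
    "replies": [
        {"body": "hidden reply", "replies": [
            {"body": "the reply to a hidden reply", "replies": []},
        ]},
        {"body": "the second reply", "replies": []},
    ],
}

def contains_the(comment):
    # Same demonstration rule as should_display: keep comments containing "the"
    return "the" in comment["body"]

result = visible_comments(thread, contains_the)
for depth, text in result:
    print("    " * depth + text)
```

Only "the top comment" and "the second reply" are collected. The reply under "hidden reply" contains "the" but is skipped because its parent was hidden.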

TODO: Create Your Content Moderation Algorithm#

Your job is to invent and implement your own rule inside the should_display function for which comments count as the “best comments” and therefore should be displayed. The rule can be complicated or simple; it just can’t be the same as the current rule. You can aim to hide only a few comments that you judge to be bad, to show only a few comments that you judge to be the very best, or some combination of the two.

When you are making your rule you may want to use different comparison operators (like == for equals, > for greater than, etc.) and different logical operators (like and for both things must be true, or for at least one thing must be true, etc.). See a list of operators here: https://www.w3schools.com/python/python_operators.asp
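
As a small illustration of those operators, the sketch below combines comparisons with and, or, and not. The variables score and body are made-up stand-ins for comment.score and comment.body, not data from a real post, and the thresholds are arbitrary.

```python
# Made-up stand-ins for comment.score and comment.body (assumptions for illustration)
score = 12
body = "This is a detailed explanation of the scene."

# Comparison operators each produce True or False
is_popular = score > 10            # greater than
is_long = len(body) >= 40          # greater than or equal to
is_deleted = body == "[deleted]"   # equals

# Logical operators combine those True/False values
popular_and_long = is_popular and is_long    # both must be True
popular_or_long = is_popular or is_long      # at least one must be True
worth_showing = popular_and_long and not is_deleted

print(is_popular, is_long, is_deleted, worth_showing)
```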

Some things you can use when you are deciding whether to display a comment or not:

  • The text of the comment: comment.body

  • The score of the comment: comment.score

  • If the comment is “distinguished”: comment.distinguished

  • If the comment was edited: comment.edited

  • If the comment was made by the author of the original post: comment.is_submitter

  • The number of replies: len(comment.replies)

You can see more by looking at the official documentation for lists of attributes of a comment

You can also look at attributes of the author, though you’ll first need an extra if comment.author check to make sure the author exists:

  • author name: comment.author.name

  • author comment karma: comment.author.comment_karma

  • author link karma: comment.author.link_karma

You can see more by looking at lists of attributes of a redditor

  • You can use any other information you can figure out about the comment as well, such as the sentiment analysis that was demoed previously.
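
To show how several of these attributes might be combined, here is one possible rule, offered only as a sketch and not as the expected answer. The function name example_should_display, the thresholds, and the SimpleNamespace stand-in comments are all assumptions made for illustration; the stand-ins only mimic the attributes listed above so the rule can be tried without connecting to reddit.

```python
from types import SimpleNamespace

def example_should_display(comment):
    """One possible rule (an illustration only, not the required answer)."""
    # Deleted accounts have no author, so check before reading author attributes
    if comment.author:
        author_karma = comment.author.comment_karma
    else:
        author_karma = 0

    has_good_score = comment.score >= 5       # arbitrary threshold
    trusted_author = author_karma >= 1000     # arbitrary threshold
    is_from_original_poster = comment.is_submitter

    # Show the comment if at least one of the conditions holds
    return has_good_score or trusted_author or is_from_original_poster

# Hypothetical stand-in comments for trying the rule without reddit
high_score = SimpleNamespace(
    score=6, is_submitter=False,
    author=SimpleNamespace(name="FakeAuthor", comment_karma=10),
)
no_author = SimpleNamespace(score=1, is_submitter=False, author=None)

print(example_should_display(high_score))
print(example_should_display(no_author))
```

The first stand-in passes because its score is at least 5. The second has a low score and no author, so it is hidden, and the author check keeps the function from failing on a missing author.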

def should_display(comment):
    #TODO: Make your own rule
    
    # for a demonstration, we only display comments with the lower case letters "the" 
    # Note: that the way we are checking here, a comment that has the word "there" would show up
    #       since "the" appears in "there"
    has_letters_the = "the" in comment.body
    
    if(has_letters_the):
        return True
    else:
        return False

Finding post IDs and testing our code#

In order to test it out, we need to find the id of a reddit post that has comments on it. Once you have a reddit post open in your browser, copy the URL from the address bar and look for the short string of random-looking letters and numbers after https://www.reddit.com/r/[subredditname]/comments/, which is the id.

For example, in this post, the id is ‘fuulky’:

[Screenshot of reddit with a post up. The website url is "https://www.reddit.com//r/MovieDetails/comments/fuulky/in_little_women_2019_laurie_and_jo_swap_articles/". There is a circle drawn around the letters "fuulky" which appears after "comments/"]
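
If you would rather not pick the id out by eye, the lookup can be sketched in code. The helper extract_post_id below is hypothetical (it is not part of praw or of this assignment), and it assumes the URL has the /comments/[id]/ shape described above.

```python
import re

def extract_post_id(url):
    """Return the post id that follows 'comments/' in a reddit post URL, or None."""
    match = re.search(r"/comments/([A-Za-z0-9]+)", url)
    if match:
        return match.group(1)
    return None

url = "https://www.reddit.com/r/MovieDetails/comments/fuulky/in_little_women_2019_laurie_and_jo_swap_articles/"
print(extract_post_id(url))
```

For the example URL this prints fuulky, which can then be passed to print_post_and_replies.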

Now we can test it out by calling print_post_and_replies with the string 'fuulky' as the argument, and see which comments are displayed.

As you work on your changes to the should_display function, you can test it out on different threads by looking up more ids, like: 'vfs5oh' and 'lzvvwp'.

print_post_and_replies('fuulky')
Comments and replies for post from /MovieDetails(Fake):

If we also want to see what comments are being skipped, we can use an optional argument for print_post_and_replies by setting show_hidden = True, and the comments that are being skipped will show up with a reddish background.

print_post_and_replies('fuulky', show_hidden = True)
Comments and replies for post from /MovieDetails(Fake):
Wow! That is a cool fake fact!
-- FakeAuthor (score 6)
I saw a completely unrelated movie once!
-- PretendAuthor (score 1)

TODO! Test it with 3 reddit threads#

Now that you’ve modified should_display, test your algorithm on three new posts, answering the follow-up questions after each one.

In the sections below, replace the ?????s with a reddit post id, and run the code. Then answer the questions about how that went.

At the very end will be more reflection questions.

TODO: Print reddit thread 1#

print_post_and_replies('?????', show_hidden = True)
Comments and replies for post from /MovieDetails(Fake):
Wow! That is a cool fake fact!
-- FakeAuthor (score 6)
I saw a completely unrelated movie once!
-- PretendAuthor (score 1)

TODO: Reddit thread 1 follow-up questions#

Write an answer in response to each of these questions (you can edit this text by double clicking it):

Look through the output of print_post_and_replies() based on your modified should_display function.

Did your function tend to keep most posts or tend to hide most posts?

TODO: Your answer here

Do you see any pattern to the contents of the posts you showed versus hid (e.g., did it actually select better quality or more interesting posts)?

TODO: Your answer here

TODO: Print reddit thread 2#

print_post_and_replies('?????', show_hidden = True)
Comments and replies for post from /MovieDetails(Fake):
Wow! That is a cool fake fact!
-- FakeAuthor (score 6)
I saw a completely unrelated movie once!
-- PretendAuthor (score 1)

TODO: Reddit thread 2 follow-up questions#

Write an answer in response to each of these questions (you can edit this text by double clicking it):

Look through the output of print_post_and_replies() based on your modified should_display function.

Did your function tend to keep most posts or tend to hide most posts?

TODO: Your answer here

Do you see any pattern to the contents of the posts you showed versus hid (e.g., did it actually select better quality or more interesting posts)?

TODO: Your answer here

TODO: Print reddit thread 3#

print_post_and_replies('?????', show_hidden = True)
Comments and replies for post from /MovieDetails(Fake):
Wow! That is a cool fake fact!
-- FakeAuthor (score 6)
I saw a completely unrelated movie once!
-- PretendAuthor (score 1)

TODO: Reddit thread 3 follow-up questions#

Write an answer in response to each of these questions (you can edit this text by double clicking it):

Look through the output of print_post_and_replies() based on your modified should_display function.

Did your function tend to keep most posts or tend to hide most posts?

TODO: Your answer here

Do you see any pattern to the contents of the posts you showed versus hid (e.g., did it actually select better quality or more interesting posts)?

TODO: Your answer here

TODO: Final Reflection questions#

Write an answer in response to each of these questions:

Explain why you chose the rules you did for selecting the best comments.

TODO: Your answer here

What was most challenging about coming up with your rules?

TODO: Your answer here

What additional information or rules do you wish you could have used?

TODO: Your answer here

If someone or some group wanted to make sure their comments were shown by your function, what would they do? How hard would this be?

TODO: Your answer here

If someone or some group wanted to make sure someone else’s comments were NOT shown by your function, what would they do (if anything)? How hard would this be?

TODO: Your answer here

If Reddit adopted this rule as a universal rule for which comments to display, what do you think would happen? (e.g., would people change commenting strategies? would comments look different than currently? would it get overwhelmed with spam?)

TODO: Your answer here