A5: Best Comments#
In this assignment you will modify a recursive function that prints the comments and replies on a Reddit post. Your goal is to show only the best comments and replies. It is up to you to decide what rules determine which comments are the best.
Reddit Praw Setup#
# make sure praw library is installed
import praw
(optional) make a fake praw connection with the fake_praw library
For testing purposes, we’ve added this line of code, which loads a fake version of praw, so it won’t actually connect to Reddit. If you want to actually connect to Reddit, don’t run this line of code.
%run ../../../../fake_apis/fake_praw.ipynb
%run reddit_keys.py
# Give the praw code your reddit account info so
# it can perform reddit actions
reddit = praw.Reddit(
    username=username, password=password,
    client_id=client_id, client_secret=client_secret,
    user_agent="a custom python script for user /" + str(username)
)
Helper function to display text in an indented box#
(You don’t need to worry about how this works)
from IPython.display import HTML, Image, display
import html
def display_indented(text, left_margin=0, color="white"):
    display(
        HTML(
            "<pre style='border:solid 1px;padding:3px;margin-left:"+str(left_margin)+"px;background-color:"+color+"'>" +
            html.escape(text) +
            "</pre>"
        )
    )
Code to print a post with all comments and replies#
We are providing these functions, which recursively print all comments and replies but depend on whether a should_display function returns True or False to decide if a comment is actually displayed. (Note: if should_display returns False for a comment, that comment won’t be displayed, nor will any replies to it.)
The print_post_and_replies function takes a postId (instructions on where to get one later), prints information about that post, and then uses the print_comment_and_replies function to print out all comments and replies.
def print_post_and_replies(postId, show_hidden=False):
    submission = reddit.submission(postId)
    print("Comments and replies for post from /"+ submission.subreddit.display_name + ":" )
    display(HTML('<a href="'+"https://www.reddit.com/" + submission.permalink +'">'+submission.title+'</a>'))
    submission.comment_sort = "old"
    submission.comments.replace_more()
    comments = submission.comments
    for comment in comments:
        print_comment_and_replies(comment, show_hidden=show_hidden)
The print_comment_and_replies function recursively prints a comment, all replies to that comment, all replies to those replies, and so on.
def print_comment_and_replies(comment, num_indents=0, show_hidden=False):
    replies = comment.replies
    display_text = (
        comment.body + "\n" +
        "-- " + str(comment.author) +
        " (score " + str(comment.score) + ")"
    )
    if(should_display(comment)): # check if we should display this comment
        display_indented(display_text, num_indents*20)
        # print replies (and the replies of those, etc.)
        for reply in replies:
            print_comment_and_replies(reply, num_indents = num_indents + 1, show_hidden=show_hidden)
    elif(show_hidden): # If we want to still see which posts we are hiding, color them LightCoral so we can see they are hidden
        display_indented(display_text, num_indents*20, color='LightCoral')
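To see the effect of the recursion without connecting to Reddit, here is a small sketch that walks a made-up comment tree in the same way. The names make_comment and collect_displayed, and the rule passed in as keep, are invented for this illustration and are not part of praw or of the assignment code. Notice that once a comment is hidden, its replies are never visited, even if they would have passed the rule.

```python
from types import SimpleNamespace

def make_comment(body, score, replies=()):
    """Build a stand-in comment object (hypothetical; not a real praw comment)."""
    return SimpleNamespace(body=body, score=score, replies=list(replies))

def collect_displayed(comment, keep, depth=0):
    """Return (depth, body) pairs for every comment that would be shown."""
    if not keep(comment):
        # Hidden comment: stop here, so none of its replies are visited.
        return []
    shown = [(depth, comment.body)]
    for reply in comment.replies:
        shown.extend(collect_displayed(reply, keep, depth + 1))
    return shown

# A tiny made-up thread: one top-level comment with two replies.
thread = make_comment("top", 5, [
    make_comment("low-score reply", 0, [make_comment("good reply to hidden", 9)]),
    make_comment("good reply", 3),
])

# Example rule (an assumption for this demo): keep comments with score above 1.
shown = collect_displayed(thread, keep=lambda c: c.score > 1)
for depth, body in shown:
    print("    " * depth + body)
```

The reply with score 9 does not appear, because its parent was hidden first.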
TODO: Create Your Content Moderation Algorithm#
Your job is to invent and implement your own rule inside the should_display function for what comments count as the “best comments” and therefore should be displayed. The rule can be complicated or simple; it just can’t be the same as the current rule. You can aim to hide only a few comments that you judge are bad, to show only a few comments you judge are the very best, or a combination of those.
When you are making your rule you may want to use different comparison operators (like == for equals, > for greater than, etc.) and different logical operators (like and for both things must be true, or for at least one thing must be true, etc.). See a list of operators here: https://www.w3schools.com/python/python_operators.asp
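As a rough illustration of combining these operators, the sketch below builds one rule out of three smaller True/False checks. The function name should_display_example, the cutoff of 3, and the "[deleted]" placeholder text are assumptions made up for this demo; the comments are stand-in objects, not real praw comments.

```python
from types import SimpleNamespace

def should_display_example(comment):
    """Example rule only: combine comparisons with `and` / `or`."""
    not_deleted = comment.body != "[deleted]"   # != means "not equal to"
    well_scored = comment.score >= 3            # >= means "at least"
    from_poster = comment.is_submitter          # already True or False
    # Show the comment if it is not deleted AND
    # (it has a good score OR it was written by the original poster).
    return not_deleted and (well_scored or from_poster)

# Stand-in comments to try the rule on (made-up data).
c1 = SimpleNamespace(body="Great point about the ending", score=5, is_submitter=False)
c2 = SimpleNamespace(body="[deleted]", score=10, is_submitter=False)
c3 = SimpleNamespace(body="thanks", score=1, is_submitter=True)

for c in (c1, c2, c3):
    print(c.body, "->", should_display_example(c))
```

The parentheses matter here: without them, Python evaluates and before or, which would change the meaning of the rule.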
Some things you can use when you are deciding whether to display a comment or not:
The text of the comment: comment.body
The score of the comment: comment.score
If the comment is “distinguished”: comment.distinguished
If the comment was edited: comment.edited
If the comment was made by the author of the original post: comment.is_submitter
The number of replies: len(comment.replies)
You can see more by looking at the official documentation for lists of attributes of a comment
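Here is one possible way to turn a few of the attributes listed above into a rule. This is only a sketch: the function name should_display_by_attributes and the "at least 2 replies" cutoff are made up for illustration, and the code assumes comment.distinguished is empty (None) for ordinary comments, so check the documentation before relying on that.

```python
from types import SimpleNamespace

def should_display_by_attributes(comment):
    """Example rule only: use distinguished, is_submitter, replies and score."""
    # Always show distinguished comments and comments by the original poster.
    if comment.distinguished or comment.is_submitter:
        return True
    # Otherwise, show comments that started a discussion and are not downvoted.
    has_discussion = len(comment.replies) >= 2
    return has_discussion and comment.score >= 1

# Stand-in comments (made-up data; replies are plain lists here).
mod_note = SimpleNamespace(distinguished="moderator", is_submitter=False,
                           replies=[], score=1)
busy = SimpleNamespace(distinguished=None, is_submitter=False,
                       replies=["reply 1", "reply 2"], score=4)
quiet = SimpleNamespace(distinguished=None, is_submitter=False,
                        replies=[], score=50)

print(should_display_by_attributes(mod_note))
print(should_display_by_attributes(busy))
print(should_display_by_attributes(quiet))
```

Note that this example hides the comment with score 50 because it has no replies, which may or may not be what you want from your own rule.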
You can also look at attributes of the author such as (though you’ll have to do an extra if comment.author check to make sure the author exists):
author name: comment.author.name
author comment karma: comment.author.comment_karma
author link karma: comment.author.link_karma
You can see more by looking at lists of attributes of a redditor
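The extra author check mentioned above might look something like the sketch below. The function name should_display_by_author and the karma cutoff of 100 are assumptions invented for this example. The key point is that the code tests comment.author before touching comment.author.comment_karma, so a comment with no author does not cause an error.

```python
from types import SimpleNamespace

def should_display_by_author(comment):
    """Example rule only: show comments from authors with enough total karma."""
    # If the author is missing (for example, a deleted account), there is
    # nothing to look up, so this example rule simply hides the comment.
    if not comment.author:
        return False
    total_karma = comment.author.comment_karma + comment.author.link_karma
    return total_karma >= 100

# Stand-in comments (made-up data).
no_author = SimpleNamespace(author=None)
veteran = SimpleNamespace(
    author=SimpleNamespace(name="FakeAuthor", comment_karma=80, link_karma=30))
newcomer = SimpleNamespace(
    author=SimpleNamespace(name="PretendAuthor", comment_karma=10, link_karma=5))

print(should_display_by_author(no_author))
print(should_display_by_author(veteran))
print(should_display_by_author(newcomer))
```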
You can use any other information you can figure out about the comment as well, such as the sentiment analysis that was demoed previously.
def should_display(comment):
    #TODO: Make your own rule
    # for a demonstration, we only display comments with the lower case letters "the"
    # Note: that the way we are checking here, a comment that has the word "there" would show up
    # since "the" appears in "there"
    has_letters_the = "the" in comment.body
    if(has_letters_the):
        return True
    else:
        return False
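The demonstration rule matches "there" because it only looks for the letters "the" anywhere in the text. If you wanted to match the whole word only, one option is Python's built-in re module, as in the sketch below. The function name has_word_the is made up for this example, and making the match ignore upper and lower case is an extra choice here, not something the demonstration rule does.

```python
import re

def has_word_the(text):
    """Return True only if "the" appears as a separate word in text."""
    # \b marks a word boundary, so "there" and "other" do not match.
    return re.search(r"\bthe\b", text, flags=re.IGNORECASE) is not None

print(has_word_the("I saw the movie"))
print(has_word_the("there it is"))
print("the" in "there it is")  # the simpler substring check used above
```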
Finding post IDs and testing our code#
To test it out, we need the id of a Reddit post that has comments on it. Once you have a Reddit post open in your browser, copy the URL (web address) and look for the short string of letters and numbers that comes right after https://www.reddit.com/r/[subredditname]/comments/, which is the id.
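If you would rather not pick the id out of the URL by hand, a short helper like the one below can do it. This is a sketch, not part of the assignment code: the function name extract_post_id is made up, the title part of the example URL is a placeholder, and the pattern assumes the id consists only of letters and digits.

```python
import re

def extract_post_id(url):
    """Return the post id that follows /comments/ in a Reddit URL, or None."""
    match = re.search(r"/comments/([A-Za-z0-9]+)", url)
    if match:
        return match.group(1)
    return None

# The "placeholder_title" part stands in for whatever title text the real URL has.
example_url = "https://www.reddit.com/r/MovieDetails/comments/fuulky/placeholder_title/"
print(extract_post_id(example_url))
```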
For example, in this post, the id is ‘fuulky’:
Now we can test it out by calling print_post_and_replies with the string 'fuulky' as the argument, and see what comments are displayed.
As you work on your changes to the should_display function, you can test it out on different threads by looking up more ids, like 'vfs5oh' and 'lzvvwp'.
print_post_and_replies('fuulky')
Comments and replies for post from /MovieDetails(Fake):
If we also want to see what comments are being skipped, we can use an optional argument for print_post_and_replies by setting show_hidden = True, and the comments that are being skipped will show up with a reddish background.
print_post_and_replies('fuulky', show_hidden = True)
Comments and replies for post from /MovieDetails(Fake):
Wow! That is a cool fake fact! -- FakeAuthor (score 6)
I saw a completely unrelated movie once! -- PretendAuthor (score 1)
TODO: Test it with 3 Reddit threads#
Now, after you’ve modified the should_display function, try testing out your algorithm on three new posts, answering follow-up questions after each one.
In the sections below, replace each ????? with a Reddit post id, and run the code. Then answer the questions about how that went.
At the very end will be more reflection questions.
TODO: Print reddit thread 1#
print_post_and_replies('?????', show_hidden = True)
Comments and replies for post from /MovieDetails(Fake):
Wow! That is a cool fake fact! -- FakeAuthor (score 6)
I saw a completely unrelated movie once! -- PretendAuthor (score 1)
TODO: Reddit thread 1 follow-up questions#
Write an answer in response to each of these questions (you can edit this text by double clicking it):
Look through the output of print_post_and_replies() based on your modified should_display function.
Did your function tend to keep most posts or tend to hide most posts?
TODO: Your answer here
Do you see any pattern to the contents of the posts you showed versus hid (e.g., did it actually select better quality or more interesting posts)?
TODO: Your answer here
TODO: Print reddit thread 2#
print_post_and_replies('?????', show_hidden = True)
Comments and replies for post from /MovieDetails(Fake):
Wow! That is a cool fake fact! -- FakeAuthor (score 6)
I saw a completely unrelated movie once! -- PretendAuthor (score 1)
TODO: Reddit thread 2 follow-up questions#
Write an answer in response to each of these questions (you can edit this text by double clicking it):
Look through the output of print_post_and_replies() based on your modified should_display function.
Did your function tend to keep most posts or tend to hide most posts?
TODO: Your answer here
Do you see any pattern to the contents of the posts you showed versus hid (e.g., did it actually select better quality or more interesting posts)?
TODO: Your answer here
TODO: Print reddit thread 3#
print_post_and_replies('?????', show_hidden = True)
Comments and replies for post from /MovieDetails(Fake):
Wow! That is a cool fake fact! -- FakeAuthor (score 6)
I saw a completely unrelated movie once! -- PretendAuthor (score 1)
TODO: Reddit thread 3 follow-up questions#
Write an answer in response to each of these questions (you can edit this text by double clicking it):
Look through the output of print_post_and_replies() based on your modified should_display function.
Did your function tend to keep most posts or tend to hide most posts?
TODO: Your answer here
Do you see any pattern to the contents of the posts you showed versus hid (e.g., did it actually select better quality or more interesting posts)?
TODO: Your answer here
TODO: Final Reflection questions#
Write an answer in response to each of these questions:
Explain why you chose the rules you did for selecting the best comments.
TODO: Your answer here
What was most challenging about coming up with your rules?
TODO: Your answer here
What additional information or rules do you wish you could have used?
TODO: Your answer here
If someone or some group wanted to make sure their comments were shown by your function, what would they do? How hard would this be?
TODO: Your answer here
If someone or some group wanted to make sure someone else’s comments were NOT shown by your function, what would they do (if anything)? How hard would this be?
TODO: Your answer here
If Reddit adopted this rule as a universal rule for which comments to display, what do you think would happen? (e.g., would people change commenting strategies? would comments look different than currently? would it get overwhelmed with spam?)
TODO: Your answer here