{ "cells": [ { "cell_type": "markdown", "id": "0e509938-900e-4dad-a17a-cf0a55758e24", "metadata": {}, "source": [ "# A5: Best Comments\n", "\n", "In this assignment you will be modifying a recursive function that prints replies on a Bluesky post. Your goal will be to only show the best replies. It will be up to you to decide what rules you use to decide which comments are the best comments." ] }, { "cell_type": "markdown", "id": "123456789-930485093240532940945-0324095320945904325", "metadata": { "tags": [] }, "source": [" _Choose Social Media Platform: __Bluesky__ | Reddit | Discord | No Coding_ "] }, { "cell_type": "markdown", "id": "c9ecdab8-7659-4bfd-9861-09aa865be64d", "metadata": {}, "source": [ "## Helper functions\n", "We'll need a few helper functions before we get started" ] }, { "cell_type": "markdown", "id": "ba48704b-63b5-427a-9910-744c310812e0", "metadata": {}, "source": [ "### helper function to display text in an indented box\n", "(You don't need to worry about how this works. This is that function that helps display posts in indented boxes)" ] }, { "cell_type": "code", "execution_count": null, "id": "8abc6574-375c-43ed-bf6d-1698cdd580ac", "metadata": {}, "outputs": [], "source": [ "from IPython.display import HTML, Image, display\n", "import html\n", "def display_indented(text, left_margin=0, color=\"white\"):\n", " display(\n", " HTML(\n", " \"
\" + \n", " html.escape(text) + \n", " \"\"\n", " )\n", " )" ] }, { "cell_type": "markdown", "id": "c98370f7-b5d0-4f59-890a-d549b13a4348", "metadata": {}, "source": [ "### helper function for atproto links\n", "_NOTE: You don't need to worry about the details of how this works, it just is here to make the code later easier to use._\n", "\n", "In order to make this as simple as possible, we're providing a helper function to turn the url for a bluesky post (easy to get) into a uri that the bluesky API understands (not as easy to get). You also don't need to worry about how this works!\n", "\n", "We'll also a provide a helper function to get the author of a post (you can use this in your should_display() function!)" ] }, { "cell_type": "code", "execution_count": null, "id": "b0daacc2-ab92-443d-8361-a261568b293d", "metadata": {}, "outputs": [], "source": [ "import re #load a \"regular expression\" library for helping to parse text\n", "from atproto import IdResolver # Load the atproto IdResolver library to get offical ATProto user IDs\n", "\n", "def get_at_post_link_from_url(url):\n", " # Initialize and log in with the client\n", "\n", " # Extract username and post ID from the URL\n", " match = re.search(r'https://bsky.app/profile/([^/]+)/post/([^/]+)', url)\n", " if not match:\n", " raise ValueError(\"Invalid Bluesky post URL format.\")\n", " user_handle, post_id = match.groups()\n", "\n", " # Construct the at:// URI\n", " post_uri = f\"at://{user_handle}/app.bsky.feed.post/{post_id}\"\n", "\n", " return post_uri\n", "\n", "\n", "def get_author_profle_from_post(post):\n", " author_did = post.author.did\n", " author_profile = client.app.bsky.actor.get_profile({'actor': author_did})\n", " return author_profile\n", "\n", "# function to convert a feed from a weblink url to the special atproto \"at\" URI\n", "def get_at_feed_link_from_url(url):\n", " \n", " # Get the user did and feed id from the weblink url\n", " match = re.search(r'https://bsky.app/profile/([^/]+)/feed/([^/]+)', url)\n", " if not match:\n", " raise ValueError(\"Invalid Bluesky feed URL format.\")\n", " user_handle, feed_id = match.groups()\n", "\n", " # Get the official atproto user ID (did) from the handle\n", " resolver = IdResolver()\n", " did = resolver.handle.resolve(user_handle)\n", " if not did:\n", " raise ValueError(f'Could not resolve DID for handle \"{user_handle}\".')\n", "\n", " # Construct the at:// URI\n", " post_uri = f\"at://{did}/app.bsky.feed.generator/{feed_id}\"\n" ] }, { "cell_type": "markdown", "id": "d989991e-83d0-40e9-aabe-a5d96c71aa7c", "metadata": {}, "source": [ "## Bluesky Setup\n", "Now we can continue logging in to Bluesky and look through multiple posts.\n", "### load atproto library" ] }, { "cell_type": "code", "execution_count": null, "id": "f922c5c5-6f1e-4745-91d6-01b2f2cb95f0", "metadata": {}, "outputs": [], "source": [ "# Load some code called \"Client\" from the \"atproto\" library that will help us work with Bluesky\n", "from atproto import Client" ] }, { "cell_type": "markdown", "id": "5bf8f681-6f4d-439c-9ac4-64b3864b1363", "metadata": {}, "source": [ "### (optional) make a fake Bluesky connection with the fake_atproto library\n", "For testing purposes, we\"ve added this line of code, which loads a fake version of atproto, so it wont actually connect to Bluesky. __If you want to try to actually connect to Bluesky, don't run this line of code.__" ] }, { "cell_type": "code", "execution_count": null, "id": "cc14364a-52e0-47f1-a9f9-d950c623f238", "metadata": {}, "outputs": [], "source": [ "%run ../../../../fake_apis/fake_atproto.ipynb" ] }, { "cell_type": "markdown", "id": "533fc7ee-e0c7-46c9-871f-5a50a002b500", "metadata": {}, "source": [ "### login to Bluesky" ] }, { "cell_type": "code", "execution_count": null, "id": "0e002ab7-5d33-4cac-b414-7a6241d83592", "metadata": {}, "outputs": [], "source": [ "# Login to Bluesky\n", "# TODO: Put your bluesky account info in the bluesky_keys.py file\n", "%run bluesky_keys.py\n", "\n", "client = Client(base_url=\"https://bsky.social\")\n", "client.login(handle, password)" ] }, { "cell_type": "markdown", "id": "3475f679-bbf7-462d-9358-2c5f0b6f329d", "metadata": {}, "source": [ "## Code to print a post with all comments and replies\n", "We are providing these function that recursively prints a post and all replies, but depends on whether a `should_display` function returns True or False to decide if it actually displays a post. (Note: the `should_display` function is defined later in this notebook. If a `should_display` comes back false for a post, the post wont be displayed, nor will any replies to it)\n", "\n", "The `print_post_thread` is a function that takes a Bluesky Post weblink (url) (instructions on where to get one below), downloads the thread that follows that post, and then uses the `print_post_and_replies` function to print out that post and the replies to that post. " ] }, { "cell_type": "code", "execution_count": null, "id": "a19093ab-8f99-49ce-80c8-d2b1b360ddbd", "metadata": {}, "outputs": [], "source": [ "def print_post_thread(postUrl, show_hidden=False):\n", "\n", " at_post_link = get_at_post_link_from_url(postUrl)\n", " \n", " # Fetch the post details\n", " post_data = client.get_post_thread(at_post_link)\n", " \n", " print_post_and_replies(post_data.thread, show_hidden=show_hidden)" ] }, { "cell_type": "markdown", "id": "4a8f8283-b8f1-40ae-9259-995cddacfa98", "metadata": {}, "source": [ "The `print_post_and_replies` function takes a given post and recursively prints that post as well as all replies to that post (which will also print all the replies to those replies, etc.)" ] }, { "cell_type": "code", "execution_count": null, "id": "b58734bc-cdbc-43a5-aabe-2740b1f7e273", "metadata": {}, "outputs": [], "source": [ "def print_post_and_replies(postInfo, num_indents=0, show_hidden=False):\n", " \n", " # make sure this post isn't blocked (since we can't read blocked posts)\n", " if not (hasattr(postInfo,'blocked') and postInfo.blocked):\n", " \n", " post = postInfo.post\n", " replies = postInfo.replies\n", "\n", " # If replies is None, make it an empty array (so the for loop later doesn't crash)\n", " if not replies:\n", " replies = []\n", " \n", " display_text = (\n", " post.record.text + \"\\n\" +\n", " \"-- \" + str(post.author.display_name) + \" (\" + str(post.author.handle) + \")\\n\" + \n", " \" (likes: \" + str(post.like_count) + \n", " \", replies: \" + str(post.reply_count) +\n", " \", reposts: \" + str(post.repost_count) +\n", " \", quotes: \" + str(post.quote_count) +\n", " \") - \" \n", " )\n", "\n", " if should_display(post):\n", " display_indented(display_text, num_indents*20)\n", " for reply in replies:\n", " print_post_and_replies(reply, num_indents = num_indents + 1, show_hidden=show_hidden)\n", " \n", " elif(show_hidden):\n", " display_indented(display_text, num_indents*20, color='LightCoral')" ] }, { "cell_type": "markdown", "id": "455f10a7-5458-4402-b394-523885d196d2", "metadata": {}, "source": [ "## TODO: Create Your Content Moderation Algorithm\n", "Your job is to invent and implement your own rule inside the `should_display` function for what post count as the \"best posts\" and therefore should be displayed. The rule can be complicated or simple, it just can't be the same as the current rule. You can aim for focusing on only hiding a few posts that you judge are bad, or for only showing a few posts you judge are the very best, or a combination of those.\n", "\n", "When you are making your rule you may want to use different comparison operators (like == for equals, > for greater than, etc.) and different logical operators (like `and` for both things must be true, `or` for at least one thing must be true, etc.). See a [list of python operators at w3schools](https://www.w3schools.com/python/python_operators.asp)\n", "\n", "Some things you can use when you are deciding whether to display a post or not:\n", "\n", "* The text of the commnet: post.record.text\n", "* The likes of the comment: post.like_count\n", "* The number of replies: post.reply_count\n", "* The number of reposts: post.repost_count\n", "* The number of quotes: post.quote_count\n", "* author display name: post.author.display_name\n", "* author handle: post.author.handle\n", "\n", "You can also look up more about the author by uncommenting the optional author_profile lookup line (`author_profile = get_author_profle_from_post(post)`). Then you can get:\n", "* author bio/description: author.description\n", "* author's number of followers: author.followers_count\n", "* author's number of people they folllow: author.follows_count\n", "* author's number of posts: author.posts_count\n", "\n", "* You can use any other information you can figure out about the post as well, such as the sentiment analysis that was demoed previously.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "5e61bf5a-0490-4d54-91de-9faced06a15e", "metadata": {}, "outputs": [], "source": [ "def should_display(post):\n", " #TODO: Make your own rule\n", "\n", " # optional code below: Get the full author profile (uncomment to use)\n", " # author_profile = get_author_profle_from_post(post)\n", " \n", " # for a demonstration, we only display comments with the lower case letters \"and\" \n", " # Note: that the way we are checking here, a comment that has the word \"sand\" would show up\n", " # since \"and\" appears in \"sand\"\n", " has_letters_and = \"and\" in post.record.text\n", " \n", " if(has_letters_and):\n", " return True\n", " else:\n", " return False" ] }, { "cell_type": "markdown", "id": "4a2dd7bd-a21e-4fe6-804d-3c595ee2dbea", "metadata": {}, "source": [ "## Getting urls to test\n", "In order to use our function, we need to grab the url of a Bluesky post to test it with. Once you find the post, find the 'Copy Link to Post' option to get a web url for the post.\n", "\n", "\n", "\n", "It should be something like: [https://bsky.app/profile/realgdt.bsky.social/post/3lihunicmds2y](https://bsky.app/profile/realgdt.bsky.social/post/3lihunicmds2y)\n", "\n", "Then paste the entire URL in as the string for the first argument to the `print_post_thread` function, as in the example below. Try it out!" ] }, { "cell_type": "code", "execution_count": null, "id": "30843c09-003b-470d-b11a-5b054d803322", "metadata": {}, "outputs": [], "source": [ "print_post_thread(\"https://bsky.app/profile/realgdt.bsky.social/post/3lihunicmds2y\", False)\n" ] }, { "cell_type": "markdown", "id": "b69e1e3e-02eb-4ef7-b462-13f1c4192417", "metadata": {}, "source": [ "If we also want to see what comments are being skipped, we can use an optional argument for `print_post_and_replies` by setting `show_hidden = True`, and the comments that are being skipped will show up with a reddish background." ] }, { "cell_type": "code", "execution_count": null, "id": "c39a56f4-f5f9-4826-a7ad-0c2d1a174b4c", "metadata": {}, "outputs": [], "source": [ "print_post_thread(\"https://bsky.app/profile/realgdt.bsky.social/post/3lihunicmds2y\", True)" ] }, { "cell_type": "markdown", "id": "7424e1f6-4752-4fa7-8241-ec10c01ca1ca", "metadata": {}, "source": [ "## Test it out with 3 Bluesky threads!" ] }, { "cell_type": "markdown", "id": "90f4d382-ac77-4acc-a2ef-540ec4bfbf32", "metadata": {}, "source": [ "Now, after you've modified the `should_display`, try testing out your algorithm on three new posts (make sure they have replies!), answering follow up questions after each one.\n", "\n", "In the sections below, replace the `?????`s with a bluesky url, and run the code. Then answer the questions about how that went.\n", "\n", "At the very end will be more reflection questions." ] }, { "cell_type": "markdown", "id": "15a78495-9de0-4fc6-b2b1-c0de6f813dca", "metadata": {}, "source": [ "### TODO: Print bluesky thread 1" ] }, { "cell_type": "code", "execution_count": null, "id": "6f126b47-6ed4-4e4f-9e38-c23f9da678dd", "metadata": {}, "outputs": [], "source": [ "print_post_thread('https://bsky.app/profile/realgdt.bsky.social/post/3lihunicmds2y', show_hidden = True)\n" ] }, { "cell_type": "markdown", "id": "3ae2306c-c14a-4418-8984-2d4f7519f264", "metadata": {}, "source": [ "### TODO: Bluesky thread 1 follow-up questions\n", "Write an answer in response to each of these questions (you can edit this text by double clicking it):\n", "\n", "Look through the output of `print_post_thread()` based on your modified `should_display` function.\n", "\n", "Did your function tend to keep most posts or tend to hide most posts?" ] }, { "cell_type": "markdown", "id": "a59aa9e4-dd2c-496a-80d0-0f7ae21a3539", "metadata": {}, "source": [ "TODO: Your answer here" ] }, { "cell_type": "markdown", "id": "b598eb88-6d82-4756-a315-57ef5c1ed4a2", "metadata": {}, "source": [ "Do you see any pattern to the contents of the posts you showed versus hid (e.g., did it actually select better quality or more interesting posts)?" ] }, { "cell_type": "markdown", "id": "daaf3757-2354-4b17-9540-c404a503e0bd", "metadata": {}, "source": [ "TODO: Your answer here" ] }, { "cell_type": "markdown", "id": "185f23e4-635e-4bb6-9caa-2ccb6697ed50", "metadata": {}, "source": [ "### TODO: Print bluesky thread 2" ] }, { "cell_type": "code", "execution_count": null, "id": "a10106dd-b9e2-48b6-b19c-eaa33a1e8004", "metadata": {}, "outputs": [], "source": [ "print_post_thread('https://bsky.app/profile/realgdt.bsky.social/post/3lihunicmds2y', show_hidden = True)\n" ] }, { "cell_type": "markdown", "id": "834d9462-4036-43df-be88-6d077e7c3433", "metadata": {}, "source": [ "### TODO: Bluesky thread 2 follow-up questions\n", "Write an answer in response to each of these questions (you can edit this text by double clicking it):\n", "\n", "Look through the output of `print_post_thread()` based on your modified `should_display` function.\n", "\n", "Did your function tend to keep most posts or tend to hide most posts?" ] }, { "cell_type": "markdown", "id": "54e0cf7b-f98f-42e2-bc6c-abb8646b3202", "metadata": {}, "source": [ "TODO: Your answer here" ] }, { "cell_type": "markdown", "id": "52e7b3bc-ad21-484b-ae6f-a03e1fd5ae90", "metadata": {}, "source": [ "Do you see any pattern to the contents of the posts you showed versus hid (e.g., did it actually select better quality or more interesting posts)?" ] }, { "cell_type": "markdown", "id": "5bc20f07-1438-45db-8f4c-68e21da3cb91", "metadata": {}, "source": [ "TODO: Your answer here" ] }, { "cell_type": "markdown", "id": "c4418a5a-1226-4530-a75a-210c1fb829f1", "metadata": {}, "source": [ "### TODO: Print bluesky thread 3" ] }, { "cell_type": "code", "execution_count": null, "id": "6a38ab4f-02cd-4699-9dfb-8167a5c1837a", "metadata": {}, "outputs": [], "source": [ "print_post_thread('https://bsky.app/profile/realgdt.bsky.social/post/3lihunicmds2y', show_hidden = True)\n" ] }, { "cell_type": "markdown", "id": "2fea24ca-348b-449b-b93c-400a6152d01d", "metadata": {}, "source": [ "### TODO: Bluesky thread 3 follow-up questions\n", "Write an answer in response to each of these questions (you can edit this text by double clicking it):\n", "\n", "Look through the output of `print_post_thread()` based on your modified `should_display` function.\n", "\n", "Did your function tend to keep most posts or tend to hide most posts?" ] }, { "cell_type": "markdown", "id": "26f9b99f-df06-4c69-8fd6-e24363fc0eb0", "metadata": {}, "source": [ "TODO: Your answer here" ] }, { "cell_type": "markdown", "id": "f3caac56-f032-4cb0-a735-52fed55f498f", "metadata": {}, "source": [ "Do you see any pattern to the contents of the posts you showed versus hid (e.g., did it actually select better quality or more interesting posts)?" ] }, { "cell_type": "markdown", "id": "1ce2b3b5-8451-44ed-846b-4e12c444aa60", "metadata": {}, "source": [ "TODO: Your answer here" ] }, { "cell_type": "markdown", "id": "d4682f88-59bb-4126-bf00-5b0221efee54", "metadata": {}, "source": [ "## TODO: Final Reflection questions\n", "Write an answer in response in response to each of these questions:\n", "\n", "Explain why you chose the rules you did for selecting the best comments?" ] }, { "cell_type": "markdown", "id": "37e91274-332d-4eb1-b365-5d2c312bb6e1", "metadata": {}, "source": [ "TODO: Your answer here" ] }, { "cell_type": "markdown", "id": "bcf28910-7289-4305-bff5-be2dbcd5cbc1", "metadata": {}, "source": [ "What was most challenging about coming up with your rules?" ] }, { "cell_type": "markdown", "id": "647b7aa1-29ee-4780-b972-95be904a79be", "metadata": {}, "source": [ "TODO: Your answer here" ] }, { "cell_type": "markdown", "id": "a4991f4a-d0ce-448e-ae8c-2f0afdb66d6a", "metadata": {}, "source": [ "What additional information or rules do you wish you could have used?" ] }, { "cell_type": "markdown", "id": "29ce86f6-dd00-4bad-b166-b5374ed151d5", "metadata": {}, "source": [ "TODO: Your answer here" ] }, { "cell_type": "markdown", "id": "d12a1f50-7d6c-4c8a-a939-cae7bf9c3fb6", "metadata": {}, "source": [ "If someone or some group wanted to make sure their comments were shown by your function, what would they do? How hard would this be?" ] }, { "cell_type": "markdown", "id": "fa6a53d9-6ba8-4e6b-9f13-6dbcc0365dc7", "metadata": {}, "source": [ "TODO: Your answer here" ] }, { "cell_type": "markdown", "id": "9c8c3e2a-6740-4293-abc8-39f1aa8d01a2", "metadata": {}, "source": [ "If someone or some group wanted to make sure someone else's comments were NOT shown by your function, what would they do (if anything)? How hard would this be?" ] }, { "cell_type": "markdown", "id": "37610a35-6670-4524-b479-10e70c2aaf7b", "metadata": {}, "source": [ "TODO: Your answer here" ] }, { "cell_type": "markdown", "id": "6d1fa4de-89d7-4b20-89e8-9637e4e29759", "metadata": {}, "source": [ "If Bluesky adopted this rule as a universal rule for which posts to display, what do you think would happen? (e.g., Would people change posting strategies? Would posting look different than currently? Would it get overwhelmed with spam?)\n" ] }, { "cell_type": "markdown", "id": "ccc77ee5-b4ca-4929-811c-98bb9b3c8f4a", "metadata": {}, "source": [ "TODO: Your answer here" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.5" } }, "nbformat": 4, "nbformat_minor": 5 }