9.4.3. Demo: Track Use of Sentiment Analysis Code#
In this code demo, we will take the sentiment analysis code we used in the last chapter (Data Mining), and we will turn it into a function which will make it easier to use.
After turning it into a function, though, we will add code to that function to track how it is used. We could theoretically take the information we are tracking and send the results to some other account.
This sort of tracking can be part of program telemetry, which can be useful in figuring out where software is broken or where it is most or least useful. But it can also violate the privacy of anyone using our function who doesn’t know we are tracking its use, or it can be used maliciously to steal user information.
Discord Setup#
# Load some code called "discord" that will help us work with Discord
import discord
# Load another library that helps the bot work in Jupyter Notebook
import nest_asyncio
nest_asyncio.apply()
(optional) make a fake Discord connection with the fake_discord library
%run ../../fake_apis/fake_discord.ipynb
# Set up your Discord connection
# TODO: put the discord token for your bot below
discord_token = "m#5@_fake_discord_token_$%Ds"
# set up Discord client with permission to read message content
intents = discord.Intents.default()
intents.message_content = True
load sentiment analysis library and make analyzer#
import nltk
nltk.download(["vader_lexicon"])
from nltk.sentiment import SentimentIntensityAnalyzer
sia = SentimentIntensityAnalyzer()
[nltk_data] Downloading package vader_lexicon to
[nltk_data] C:\Users\kmthayer\AppData\Roaming\nltk_data...
[nltk_data] Package vader_lexicon is already up-to-date!
original code to loop through posts, finding average sentiment#
This is the code from chapter 8 that loops through the recent posts in the specified channel and calculates their average sentiment.
# set up discord connection
client = discord.Client(intents=intents)
# TODO: put the discord channel id number below for the channel you want to use
channel_id = 123456789
# Provide instructions for what your discord bot should do once it has logged in
@client.event
async def on_ready():
    global recent_posts # Save the recent_posts variable outside our running bot
    # Load the discord channel you want to read from
    channel = client.get_channel(channel_id)
    # Get the 10 most recent posts in the channel history
    post_history = channel.history(limit=10)
    # special code to turn the post_history from discord into a python list
    recent_posts = [post async for post in post_history]
    # Tell your bot to stop running
    await client.close()
# Now that we've defined how the bot should work, start running your bot
client.run(discord_token)
num_posts = 0
total_sentiment = 0
for post in recent_posts:
    # calculate sentiment
    post_sentiment = sia.polarity_scores(post.content)["compound"]
    num_posts += 1
    total_sentiment += post_sentiment
    print("Sentiment: " + str(post_sentiment))
    print(" post content: " + post.content)
    print()
average_sentiment = total_sentiment / num_posts
print("Average sentiment was " + str(average_sentiment))
Sentiment: 0.784
post content: Breaking news: A lovely cat took a nice long nap today!
Sentiment: 0.0
post content: Breaking news: Someone said a really mean thing on the internet today!
Sentiment: 0.7088
post content: Breaking news: Some grandparents made some yummy cookies for all the kids to share!
Sentiment: -0.6114
post content: Breaking news: All the horrors of the universe revealed at last!
Average sentiment was 0.22034999999999996
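As a quick check on the output above, the average can be reproduced by hand from the four printed scores. This is a small sketch that uses only the numbers shown; the variable name `scores` is ours and is not part of the original code. The long run of 9s in the printed average comes from ordinary floating-point rounding, not from a problem with the sentiment analysis.

```python
# The four compound sentiment scores printed above
scores = [0.784, 0.0, 0.7088, -0.6114]

# Add them up the same way the loop does, then divide by the count
total_sentiment = 0
for score in scores:
    total_sentiment += score

average_sentiment = total_sentiment / len(scores)

# Rounding hides the tiny floating-point error in the last digits
print(round(average_sentiment, 4))
```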
Make a function using the code above for finding the average sentiment#
We now make a function out of the code above by doing the following:
Add a def line at the start to make a function called find_average_sentiment
Indent all the old code so that it becomes the contents of the function find_average_sentiment
Make the function take two arguments:
    channel_id, which takes the place of “123456789”, so the person calling the function can choose which channel to search
    display_progress, which defaults to False. This decides whether or not the print statements are run when the function is run, so we can see the progress if we want, or just get the answer by default
At the end of the function, return the average_sentiment as the result
def find_average_sentiment(channel_id, display_progress = False):
    # set up discord connection
    client = discord.Client(intents=intents)
    # Provide instructions for what your discord bot should do once it has logged in
    @client.event
    async def on_ready():
        global recent_posts # Save the recent_posts variable outside our running bot
        # Load the discord channel you want to read from
        channel = client.get_channel(channel_id)
        # Get the 10 most recent posts in the channel history
        post_history = channel.history(limit=10)
        # special code to turn the post_history from discord into a python list
        recent_posts = [post async for post in post_history]
        # Tell your bot to stop running
        await client.close()
    # Now that we've defined how the bot should work, start running your bot
    client.run(discord_token)
    num_posts = 0
    total_sentiment = 0
    for post in recent_posts:
        # calculate sentiment
        post_sentiment = sia.polarity_scores(post.content)["compound"]
        num_posts += 1
        total_sentiment += post_sentiment
        if display_progress:
            print("Sentiment: " + str(post_sentiment))
            print(" post content: " + post.content)
            print()
    average_sentiment = total_sentiment / num_posts
    if display_progress:
        print("Average sentiment was " + str(average_sentiment))
    return average_sentiment
Now let’s try using the function
find_average_sentiment(channel_id = 123456789)
0.22034999999999996
find_average_sentiment(channel_id = 987654321, display_progress=True)
Sentiment: 0.5093
post content: Look at my cute dog!
Sentiment: 0.0
post content: A baby lizard!
Sentiment: 0.6239
post content: The cutest bird ever!
Average sentiment was 0.3777333333333333
0.3777333333333333
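Because find_average_sentiment needs a live Discord connection, it can be awkward to experiment with. The sketch below separates out just the averaging step so it can be tried on a plain list of strings. The names `average_score`, `toy_score`, and `example_posts` are hypothetical and are not part of the chapter's code; `toy_score` is a stand-in for `sia.polarity_scores(text)["compound"]` and is far cruder than the real analyzer. Unlike the function above, this sketch also returns 0 for an empty list rather than dividing by zero.

```python
def toy_score(text):
    # Hypothetical stand-in for a real sentiment analyzer:
    # +1 for each "positive" word, -1 for each "negative" word
    positive_words = {"cute", "lovely", "nice", "yummy"}
    negative_words = {"mean", "horrors"}
    score = 0
    for word in text.lower().split():
        word = word.strip("!.,?:")
        if word in positive_words:
            score += 1
        elif word in negative_words:
            score -= 1
    return score


def average_score(texts, score_function, display_progress=False):
    # Same structure as find_average_sentiment, minus the Discord parts
    num_posts = 0
    total_sentiment = 0
    for text in texts:
        post_sentiment = score_function(text)
        num_posts += 1
        total_sentiment += post_sentiment
        if display_progress:
            print("Sentiment: " + str(post_sentiment))
            print(" post content: " + text)
            print()

    # Avoid dividing by zero when there are no posts
    if num_posts == 0:
        return 0
    return total_sentiment / num_posts


example_posts = ["Look at my cute dog!", "A baby lizard!", "So mean!"]
print(average_score(example_posts, toy_score))
```

Passing the scoring function in as an argument is one possible design choice; it lets the same loop be tried with a toy scorer here and with a real analyzer elsewhere.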
Modify the function so it tracks use#
Now we make another version of the same function, but with a small difference:
We make a list variable called sentiment_searches which exists outside the function.
At the start of the function we add the ID of the channel being searched to that list. This way, as the function gets used, we’ll keep a history of its use in the sentiment_searches list.
# Make a list to save which channel was used each time `find_average_sentiment` is run
sentiment_searches = []
def find_average_sentiment(channel_id, display_progress = False):
    # Add the ID of the channel being searched to the sentiment_searches list
    sentiment_searches.append(channel_id)
    # set up discord connection
    client = discord.Client(intents=intents)
    # Provide instructions for what your discord bot should do once it has logged in
    @client.event
    async def on_ready():
        global recent_posts # Save the recent_posts variable outside our running bot
        # Load the discord channel you want to read from
        channel = client.get_channel(channel_id)
        # Get the 10 most recent posts in the channel history
        post_history = channel.history(limit=10)
        # special code to turn the post_history from discord into a python list
        recent_posts = [post async for post in post_history]
        # Tell your bot to stop running
        await client.close()
    # Now that we've defined how the bot should work, start running your bot
    client.run(discord_token)
    num_posts = 0
    total_sentiment = 0
    for post in recent_posts:
        # calculate sentiment
        post_sentiment = sia.polarity_scores(post.content)["compound"]
        num_posts += 1
        total_sentiment += post_sentiment
        if display_progress:
            print("Sentiment: " + str(post_sentiment))
            print(" post content: " + post.content)
            print()
    average_sentiment = total_sentiment / num_posts
    if display_progress:
        print("Average sentiment was " + str(average_sentiment))
    return average_sentiment
Now let’s run this version of the function
find_average_sentiment(channel_id = 123456789)
0.22034999999999996
find_average_sentiment(channel_id = 987654321)
0.3777333333333333
It looks like it works like normal, but our calls to the function have been tracked!
display(sentiment_searches)
[123456789, 987654321]
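The tracking above required editing the body of the function. As a sketch of how the same idea could be applied to any function without touching its body, the example below uses a Python decorator. The names `track_calls`, `call_log`, and `double` are hypothetical and exist only for illustration. Each call is recorded with a timestamp and its arguments, and the caller sees no difference in the return value.

```python
import functools
import time

# Hypothetical log that lives outside the tracked functions
call_log = []


def track_calls(function):
    # Wrap a function so every call is recorded before it runs
    @functools.wraps(function)
    def wrapper(*args, **kwargs):
        call_log.append({
            "function": function.__name__,
            "args": args,
            "kwargs": kwargs,
            "time": time.time(),
        })
        return function(*args, **kwargs)
    return wrapper


@track_calls
def double(number):
    # An ordinary function; nothing in its body hints at tracking
    return number * 2


print(double(4))
print(double(number=10))
print([(entry["function"], entry["args"], entry["kwargs"]) for entry in call_log])
```

A record of what was called, with what, and when is the sort of data telemetry might collect, which is why the same technique can help with debugging or enable surveillance depending on who receives the log.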
Now, if we were being malicious, we would hide this code in some other code library that we would try to convince you to use, so that you wouldn’t notice it. And instead of just saving those searches or posts to a variable, we would send them to ourselves, perhaps by putting code into our social media code library that logs into a different account and privately messages that info to us.
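Hidden tracking like this can sometimes leave traces. As a rough sketch (not a reliable security check), Python lets us list the global names and attribute names that a function's compiled code refers to, through its `__code__.co_names` attribute. The functions `suspicious_average` and `plain_average` and the list `hidden_log` below are hypothetical stand-ins for a library function that quietly records its input and one that does not.

```python
# Hypothetical list that a library might use to record its callers' data
hidden_log = []


def suspicious_average(numbers):
    # Quietly record the input before doing the advertised work
    hidden_log.append(list(numbers))
    return sum(numbers) / len(numbers)


def plain_average(numbers):
    # Does only the advertised work
    return sum(numbers) / len(numbers)


# co_names lists the global and attribute names each function refers to
print(suspicious_average.__code__.co_names)
print(plain_average.__code__.co_names)
```

The first tuple includes hidden_log and append, while the second does not. Someone determined to hide tracking can obscure names like these, so this kind of inspection can raise suspicion but cannot show that a library is safe.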
How can we trust code libraries?#
If people can make code libraries track us and violate our privacy, how can we trust them? We could try looking at the source code for the discord.py library to make sure the library we are using isn’t doing anything bad, but no programmer can be expected to read through all the libraries they use. There is unfortunately no simple answer to this.
In fact, there are cases where people have messed with code libraries:
The United States National Security Agency “paid massive computer security firm RSA $10 million to promote a flawed encryption system so that the surveillance organization could wiggle its way around security.”
Does US national security outweigh global computer security?
Shortly after the Russian invasion of Ukraine in 2022, someone modified a popular NodeJS code library so that it would automatically destroy files if it was run on a computer in Russia or Belarus.
Does opposing a military invasion justify sabotaging a code library?
And those are just the intentional problems with code libraries. All sorts of code libraries and computer programs are full of security flaws, which are regularly discovered and fixed (though who knows how much the flaws were exploited first).