9.4.3. Demo: Track Use of Sentiment Analysis Code#


In this code demo, we will take the sentiment analysis code we used in the last chapter (Data Mining), and we will turn it into a function which will make it easier to use.

After turning it into a function, though, we will add code to that function to track how it is used. We could theoretically take the information we are tracking and send the results to some other account.

This sort of tracking can be part of program telemetry, which can be useful in figuring out where software is broken or where it is most or least useful. But it can also violate the privacy of anyone using our function who doesn’t know we are tracking its use, or be used maliciously to steal user information.

Discord Setup#

# Load some code called "discord" that will help us work with Discord
import discord

# Load another library that helps the bot work in Jupyter Notebook
import nest_asyncio
nest_asyncio.apply()

(optional) make a fake Discord connection with the fake_discord library

%run ../../fake_apis/fake_discord.ipynb
Fake discord is replacing the discord.py library. Fake discord doesn't need real passwords, and prevents you from accessing real discord
# Set up your Discord connection
# TODO: put the discord token for your bot below
discord_token = "m#5@_fake_discord_token_$%Ds"

# set up Discord client with permission to read message content
intents = discord.Intents.default()
intents.message_content = True 

load sentiment analysis library and make analyzer#

import nltk
nltk.download(["vader_lexicon"])
from nltk.sentiment import SentimentIntensityAnalyzer
sia = SentimentIntensityAnalyzer()
[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\kmthayer\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!

original code to loop through posts, finding average sentiment#

This is the code from chapter 8 that loops through the recent posts in the specified channel and calculates their average sentiment.

# set up discord connection
client = discord.Client(intents=intents)

# TODO: put the discord channel id number below for the channel you want to use
channel_id = 123456789

# Provide instructions for what your discord bot should do once it has logged in
@client.event
async def on_ready():
    global recent_posts # Save the recent_posts variable outside our running bot
    
    # Load the discord channel you want to read posts from
    channel = client.get_channel(channel_id)

    # Get the 10 latest posts in the channel history
    post_history = channel.history(limit=10)
    
    #special code to turn the post_history from discord into a python list
    recent_posts = [post async for post in post_history]

    # Tell your bot to stop running
    await client.close()
    
# Now that we've defined how the bot should work, start running your bot
client.run(discord_token)

num_posts = 0
total_sentiment = 0

for post in recent_posts:
    
    #calculate sentiment
    post_sentiment = sia.polarity_scores(post.content)["compound"]
    num_posts += 1
    total_sentiment += post_sentiment

    print("Sentiment: " + str(post_sentiment))
    print("   post content: " + post.content)
    print()


average_sentiment = total_sentiment / num_posts
print("Average sentiment was " + str(average_sentiment))
Fake discord is pretending to set up a client connection
Fake discord bot is fake logging in and starting to run
Fake discord bot is shutting down
Sentiment: 0.784
   post content: Breaking news: A lovely cat took a nice long nap today!

Sentiment: 0.0
   post content: Breaking news: Someone said a really mean thing on the internet today!

Sentiment: 0.7088
   post content: Breaking news: Some grandparents made some yummy cookies for all the kids to share!

Sentiment: -0.6114
   post content: Breaking news: All the horrors of the universe revealed at last!

Average sentiment was 0.22034999999999996

Make a function using the code above for finding the average sentiment#

We now turn the code above into a function by doing the following:

  • Add a def line at the start to make a function called find_average_sentiment

  • Indent all the old code so that it becomes the contents of the function find_average_sentiment

  • Make the function take two arguments:

    • channel_id, which takes the place of “123456789”, so the person calling the function can choose which channel to search

    • display_progress which defaults to False. This decides whether or not the print statements are run when the function is run, so we can see the progress if we want, or just get the answer by default

  • At the end of the function, return the average_sentiment as the result

def find_average_sentiment(channel_id, display_progress = False):
    # set up discord connection
    client = discord.Client(intents=intents)

    # Provide instructions for what your discord bot should do once it has logged in
    @client.event
    async def on_ready():
        global recent_posts # Save the recent_posts variable outside our running bot

        # Load the discord channel you want to read posts from
        channel = client.get_channel(channel_id)

        # Get the 10 latest posts in the channel history
        post_history = channel.history(limit=10)

        #special code to turn the post_history from discord into a python list
        recent_posts = [post async for post in post_history]

        # Tell your bot to stop running
        await client.close()

    # Now that we've defined how the bot should work, start running your bot
    client.run(discord_token)

    num_posts = 0
    total_sentiment = 0

    for post in recent_posts:

        #calculate sentiment
        post_sentiment = sia.polarity_scores(post.content)["compound"]
        num_posts += 1
        total_sentiment += post_sentiment
        
        if(display_progress):
            print("Sentiment: " + str(post_sentiment))
            print("   post content: " + post.content)
            print()


    average_sentiment = total_sentiment / num_posts
    if(display_progress):
        print("Average sentiment was " + str(average_sentiment))
    
    return average_sentiment

Now let’s try using the function

find_average_sentiment(channel_id = 123456789)
Fake discord is pretending to set up a client connection
Fake discord bot is fake logging in and starting to run
Fake discord bot is shutting down
0.22034999999999996
find_average_sentiment(channel_id = 987654321, display_progress=True)
Fake discord is pretending to set up a client connection
Fake discord bot is fake logging in and starting to run
Fake discord bot is shutting down
Sentiment: 0.5093
   post content: Look at my cute dog!

Sentiment: 0.0
   post content: A baby lizard!

Sentiment: 0.6239
   post content: The cutest bird ever!

Average sentiment was 0.3777333333333333
0.3777333333333333
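
One edge case worth noting: if a channel has no posts, num_posts stays at 0 and the final division raises a ZeroDivisionError. Below is a minimal sketch of one way to guard against that. The helper name average_or_none is hypothetical (it is not part of the chapter's code), and it works on a plain list of scores so it can be tried without a Discord connection.

```python
def average_or_none(scores):
    """Return the mean of scores, or None if the list is empty.

    Hypothetical helper for illustration; not part of the chapter's code.
    """
    if len(scores) == 0:
        # Nothing to average, so avoid dividing by zero
        return None
    return sum(scores) / len(scores)


# The three compound scores printed for channel 987654321 above
example_scores = [0.5093, 0.0, 0.6239]

print(average_or_none(example_scores))  # roughly 0.3777
print(average_or_none([]))              # no posts, so no average
```

The same check could be placed inside find_average_sentiment just before the division, returning None (or some other agreed value) when no posts were found.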

Modify the function so it tracks use#

Now we make another version of the same function, but with a small difference:

  • We make a list variable called sentiment_searches which exists outside the function.

  • At the start of the function we add the channel id being searched to that list. This way, as the function gets used, we’ll keep a history of its use in the sentiment_searches list

# Make a list to save which channel was searched each time `find_average_sentiment` is run
sentiment_searches = []

def find_average_sentiment(channel_id, display_progress = False):
    
    # Add the id of the channel being searched to the sentiment_searches list
    sentiment_searches.append(channel_id)
    
    # set up discord connection
    client = discord.Client(intents=intents)

    # Provide instructions for what your discord bot should do once it has logged in
    @client.event
    async def on_ready():
        global recent_posts # Save the recent_posts variable outside our running bot

        # Load the discord channel you want to read posts from
        channel = client.get_channel(channel_id)

        # Get the 10 latest posts in the channel history
        post_history = channel.history(limit=10)

        #special code to turn the post_history from discord into a python list
        recent_posts = [post async for post in post_history]

        # Tell your bot to stop running
        await client.close()

    # Now that we've defined how the bot should work, start running your bot
    client.run(discord_token)

    num_posts = 0
    total_sentiment = 0

    for post in recent_posts:

        #calculate sentiment
        post_sentiment = sia.polarity_scores(post.content)["compound"]
        num_posts += 1
        total_sentiment += post_sentiment
        
        if(display_progress):
            print("Sentiment: " + str(post_sentiment))
            print("   post content: " + post.content)
            print()


    average_sentiment = total_sentiment / num_posts
    if(display_progress):
        print("Average sentiment was " + str(average_sentiment))
    
    return average_sentiment

Now let’s run this version of the function

find_average_sentiment(channel_id = 123456789)
Fake discord is pretending to set up a client connection
Fake discord bot is fake logging in and starting to run
Fake discord bot is shutting down
0.22034999999999996
find_average_sentiment(channel_id = 987654321)
Fake discord is pretending to set up a client connection
Fake discord bot is fake logging in and starting to run
Fake discord bot is shutting down
0.3777333333333333

It looks like it works like normal, but our calls to the function have been tracked!

display(sentiment_searches)
[123456789, 987654321]
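
A usage history like this is what makes telemetry useful (or invasive): once it exists, it can be summarized to see which channels are searched most often. Here is a small sketch of that idea. The list example_searches is made-up data in the same shape as sentiment_searches, so the counts are for illustration only.

```python
from collections import Counter

# Hypothetical usage history, in the same shape as sentiment_searches
example_searches = [123456789, 987654321, 123456789, 123456789]

# Count how many times each channel id appears in the history
search_counts = Counter(example_searches)

# Go through the channels from most searched to least searched
for searched_channel, times_searched in search_counts.most_common():
    print(str(searched_channel) + " was searched " + str(times_searched) + " time(s)")
```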

Now, if we were being malicious, we would hide this code inside some other code library that we would try to convince you to use, so that you wouldn’t notice it. And instead of just saving those searches or posts to a variable, we would send them to ourselves, perhaps by putting code into our social media code library that logs into a different account and private messages that info to us.
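
To see why such tracking is easy to miss, here is a minimal sketch of how tracking could be attached to any function without changing how that function looks to the person calling it. It uses a Python decorator. All the names here (hidden_call_log, track_calls, count_words) are hypothetical and chosen for illustration, and the sketch only saves to a list; it does not send anything anywhere.

```python
import functools

# Hypothetical log that a library author could keep without telling anyone
hidden_call_log = []

def track_calls(original_function):
    """Wrap a function so that every call is recorded before it runs."""
    @functools.wraps(original_function)
    def wrapper(*args, **kwargs):
        # Save the function name and the arguments it was called with
        hidden_call_log.append({
            "function": original_function.__name__,
            "args": args,
            "kwargs": kwargs,
        })
        # Then run the original function as if nothing else had happened
        return original_function(*args, **kwargs)
    return wrapper

@track_calls
def count_words(text):
    """Stand-in for some useful library function."""
    return len(text.split())

word_count = count_words("Look at my cute dog!")
print(word_count)
print(hidden_call_log)
```

From the caller's point of view, count_words still takes the same argument and returns the same answer, and because of functools.wraps it even keeps its original name and docstring. The only sign of the tracking is in the library's own source code.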

How can we trust code libraries?#

If people can make code libraries track us and violate our privacy, how can we trust them? We could try looking at the source code for the discord.py library to make sure the library we are using isn’t doing anything bad, but no programmer can be expected to read through all the libraries they use. There is unfortunately no simple answer to this.
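
Reading source code is at least possible from inside Python. The sketch below uses the standard library's inspect.getsource on statistics.mean, which was picked only as a convenient example. The same call generally works on functions from other libraries you have installed, as long as they are written in Python (it does not work on functions implemented in C).

```python
import inspect
import statistics

# Fetch the source code of one function from Python's standard library
mean_source = inspect.getsource(statistics.mean)

# Show just the first few lines of that source code
source_lines = mean_source.splitlines()
for line in source_lines[:5]:
    print(line)

print("... (" + str(len(source_lines)) + " lines in total)")
```

Even a single small function can run to many lines, which gives a sense of why reading every library in full is not practical.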

In fact, there are cases where people have tampered with code libraries.

And those are just the intentional problems with code libraries. All sorts of code libraries and computer programs are full of security flaws, which are regularly discovered and fixed (though who knows how much the flaws were exploited first).