How to build a Twitter bot in 5 minutes (and tweet the entire Boris script)

The weather is bad, covid restrictions are still there and you are starting to get tired of endlessly switching from baking to binge-watching your favourite TV show on Netflix, aren’t you?

Last weekend I was in exactly this mood. So, inspired by the Fleabag script bot, I decided to seize the opportunity to learn how to build Twitter bots.

I did one for La Grande Bellezza (The Great Beauty) and one for Boris, two inexhaustible sources of Italian memes and life lessons (two categories apparently uncorrelated but actually hyper-interdependent).

If you’re also eager to learn how to build a bot, you’re in the right place: in this post I will show you how to easily set up a Twitter bot in Python that tweets a quote from your favourite movie every hour.

1. Go grab the script

First things first, we need the file from which our bot will randomly choose a sentence to tweet. One approach is to retrieve the movie script and use it directly, from sites hosting freely available scripts such as The Internet Movie Script Database (imsdb). But since only a small portion of movies is available on imsdb, here we rely instead on movie subtitles, freely downloadable from OpenSubtitles.

Just select the language you prefer, type the movie/TV show title in the search bar and download the subtitles in the srt format. In this guide I will use the English subs of the multi-award-winning movie Parasite.

To extract the clean text from the .srt file we could either do it with manual parsing, filtering out the timings and the HTML tags, or with the help of the pysrt package. This is the code I used to manually parse the subtitles into a .txt file:

import re

def parse_file(fname):
    # Read all the subtitle lines, closing the file automatically
    with open(fname, "r") as file:
        lines = file.readlines()
    script_parsed = ''
    for line in lines:
        # Skip cue counters, timing lines and empty lines
        if re.search(r'^[0-9]+$', line) is None and re.search(r'^[0-9]{2}:[0-9]{2}:[0-9]{2}', line) is None and re.search(r'^$', line) is None:
            line = re.sub(r'<[^<]+?>', '', line)  # drop HTML tags such as <i>
            line = re.sub(r'\{.*?\}', '', line)   # drop each {...} styling block
            script_parsed += ' ' + line.rstrip('\n')
    # Remove the leading space added before the first line
    return script_parsed.lstrip()

script = parse_file("./2020_Parasite.srt")
with open("2020_Parasite_script.txt", "w") as output_file:
    output_file.write(script)
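
To make the filtering step easier to follow, here is a minimal sketch that applies the same idea to a tiny in-memory sample instead of a file. The sample cues and the function name srt_to_text are made up for illustration; the sketch assumes the usual .srt layout of a counter line, a timing line, one or more text lines and a blank separator.

```python
import re

# Made-up three-cue sample in the .srt layout (not taken from a real subtitle file)
SAMPLE_SRT = """1
00:00:01,000 --> 00:00:02,500
<i>Good morning.</i>

2
00:00:03,000 --> 00:00:05,000
Did you sleep well?

3
00:00:06,000 --> 00:00:08,000
{\\an8}Breakfast is
on the table.
"""

def srt_to_text(srt_string):
    """Return the spoken text of an .srt string as a single line."""
    texts = []
    # Cues are separated by one or more blank lines
    for block in re.split(r"\n\s*\n", srt_string.strip()):
        lines = block.splitlines()
        # lines[0] is the counter, lines[1] the timing, the rest is text
        text = " ".join(lines[2:])
        text = re.sub(r"<[^>]+>", "", text)    # HTML tags such as <i>
        text = re.sub(r"\{[^}]*\}", "", text)  # styling blocks such as {\an8}
        if text.strip():
            texts.append(text.strip())
    return " ".join(texts)

print(srt_to_text(SAMPLE_SRT))
```

Splitting by cue rather than by line is just one possible design: it keeps multi-line cues together, at the cost of assuming that every cue starts with a counter and a timing line.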

Now that we have in 2020_Parasite_script.txt the clean movie script we can build the actual bot.

2. Set up your Twitter bot

Create the Twitter account to get started. Then head over to developer.twitter.com to apply for a developer account. Select Hobbyist, then tick Making a bot and fill in all the required fields. Once you have completed this step and verified your developer account, you can create an app from the dashboard.

Now, from Keys and Tokens, you can generate both the Consumer Keys and the Authentication Tokens that you will need to link your bot script with the Twitter app: note them down and protect them carefully. Also, be sure to enable both the Read and Write permissions from App Settings → App permissions.

At this stage, everything is ready to build the bot. First of all, let’s import all the Python modules and load the script file of the movie. To make things easier, we leverage the nltk package for Natural Language Processing to smoothly split the corpus into sentences. We also rely on the tweepy package, a simple wrapper of the Twitter API. You can easily install both via pip.

import tweepy
import random
import nltk
nltk.download('punkt')
from nltk import tokenize

corpus = open('2020_Parasite_script.txt', 'r').read()
sentences = tokenize.sent_tokenize(corpus)

Next, we pick a random quote (short enough to fit in a tweet) from this set of sentences:

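n_sentences = len(sentences)
length = 260  # start out of range so the loop runs at least once
# Redraw until the quote is between 5 and 250 characters long
while length > 250 or length < 5:
    quote = sentences[random.randint(0, n_sentences - 1)].strip()
    quote = " ".join(quote.split())  # collapse repeated whitespace
    length = len(quote)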

print(quote)
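
One caveat of redrawing in a loop is that it never ends if no sentence fits the limits. As a hedged alternative, the sketch below filters the sentences first and then draws once. The function name pick_quote and its parameters are hypothetical; the 5 and 250 character bounds are the ones used above.

```python
import random

def pick_quote(sentences, min_len=5, max_len=250, rng=random):
    """Return a random sentence whose cleaned length is within the limits."""
    # Collapse repeated whitespace, as done in the loop above
    cleaned = [" ".join(s.split()) for s in sentences]
    # Keep only the sentences that fit in the allowed range
    candidates = [s for s in cleaned if min_len <= len(s) <= max_len]
    if not candidates:
        raise ValueError("no sentence fits the length limits")
    return rng.choice(candidates)

# Made-up sample sentences: only the second one is between 5 and 250 characters
samples = ["Hi.", "This   one  fits   nicely.", "x" * 300]
print(pick_quote(samples))
```

With the sample list only one sentence qualifies, so the result is deterministic here; with a real script every qualifying sentence is equally likely to be drawn.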

Finally, we use the previously generated keys to authenticate and tweet the selected quote.

CONSUMER_KEY = "MYCONSUMERKEY" # insert here your keys
CONSUMER_SECRET = "MYCONSUMERSECRET"
ACCESS_TOKEN_KEY = "MYTOKENKEY"
ACCESS_TOKEN_SECRET = "MYTOKENSECRET"

auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN_KEY, ACCESS_TOKEN_SECRET)
api = tweepy.API(auth)
api.update_status(quote)

Now we are really (almost) done.

Substitute your consumer and access token keys (and ideally hide them in a .env file) and put everything in a tweet_parasite.py script. We added a print so we can see an example of what we get when we run it from the terminal:

$ python tweet_parasite.py
You can just drop me off at Hyehwa station.
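
On hiding the keys: one common option is to read them from environment variables instead of writing them in the script. The sketch below is only an illustration of that idea. The helper load_credentials is hypothetical, and it assumes the variables are named like the constants used above.

```python
import os

# Assumed variable names, mirroring the constants used in the bot script
REQUIRED = ("CONSUMER_KEY", "CONSUMER_SECRET",
            "ACCESS_TOKEN_KEY", "ACCESS_TOKEN_SECRET")

def load_credentials(env=os.environ):
    """Return the four keys as a dict, or fail loudly if any is missing."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED}

# Demo with a fake mapping; in the bot you would call load_credentials()
fake_env = {name: "dummy-" + name.lower() for name in REQUIRED}
creds = load_credentials(fake_env)
print(sorted(creds))
```

The returned values can then be passed to tweepy.OAuthHandler and set_access_token in place of the hard-coded strings. Locally, the variables can come from a .env file loaded with a package such as python-dotenv; on Heroku they can be set as config vars, for example with heroku config:set.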

3. Go PRO and deploy the bot on Heroku

Perfect! The bot is working: we checked that it tweets the selected quote correctly, but we want it to do so automatically and continuously. That’s where Heroku comes into play. Sign up for a free account and register a new app to run your bot. Follow the instructions to log in locally with your Heroku credentials. Then, in the folder containing the Python bot tweet_parasite.py and the movie script 2020_Parasite_script.txt, add a requirements.txt file with the nltk and tweepy versions you installed (you can check them via pip list). In my case this is:

nltk==3.5
tweepy==3.10.0

Now you can commit your project and push it to Heroku.

$ git add .
$ git commit -m "initialize"
$ heroku create
$ git push heroku master

You can finally add the free Heroku Scheduler add-on to run your app at specific intervals. From the scheduler dashboard you can add jobs that will run the bot every 10 minutes, every hour, or at specific hours every day.

I hope this was helpful! :)