Perhaps you’ve run across Twitter bots like @thesefutures or @thinkpiecebot and wondered how to do that yourself. It’s surprisingly simple and you should give it a try! For the less technically inclined, there are tools that handle the heavy lifting and let you create a bot with minimal code. But if, Reader dearest, you desire something more than that, you’ve come to the right place. Today we’re going to get our hands dirty with Python and the Twitter API, and code our own Twitter bot from scratch. Later, we’ll even go over deploying the bot to Heroku and letting it loose to act autonomously in the cloud. If this piques your curiosity, then I’d love to show you how simple it can really be, with a pinch of Python magic.
Introductory matters
Before we dive in, let’s go over the tools we’ll be using. Of course, you’ll need Python 3, git, and your favorite Python IDE (I’m partial to PyCharm). To interface with Twitter, we’ll be working with the excellent Twitter API wrapper Tweepy. To generate our tweets, we’ll also work with Markovify, a Markov chain library. Today, we’ll get you set up with Tweepy & Markovify, write the TweetBot class, and get your new bot up & running. Roll up your sleeves, we’ve got some code to write!
Setup: Getting your Twitter credentials & setting up your libraries
To start, you’ll need to get some Twitter credentials so that you can access the API. Unless you want your bot to post to your personal Twitter, I recommend creating a new Twitter account for your bot and going through this process with the dedicated account. First, you’ll need to register a new Twitter app for your bot. I should mention here that you’ll need to associate a unique phone number with the account in order to create an app. If you don’t have a spare phone number lying around, you can either make use of a service like Google Voice or Burner, or register the app on your primary Twitter account and go through the lengthy process of transferring the API key to another account, as outlined on Molly White’s blog. Now would also be a good time to go over the automation rules and best practices. In any case, once you’ve got an account registered, let’s proceed:
Getting your credentials
Go to apps.twitter.com and click on the “Create New App” button. You’ll need to fill out a few fields for your bot’s name, description, and website, and to agree to the Twitter Developer Agreement, and then you’ll have a new app! Now, go to the Permissions tab and ensure that your app has read and write permissions. Finally, go to the Keys and Access Tokens tab, and create your access tokens. We’ll need to store these for your bot.
It’s important that you keep these keys private, because anyone with access to them can access the API in your name and freely post to your account. To prevent this from happening, we’re going to ensure that your keys do not get checked in to version control. Make a directory for your project, run git init as you usually would, then create a file named .gitignore with these contents:
twitter_credentials.py
Now, let’s create twitter_credentials.py and store your access keys in it:
consumer_key = ""
consumer_secret = ""
access_token = ""
access_token_secret = ""
Copy and paste the keys and access tokens from Twitter into the appropriate variables in twitter_credentials.py, and you’ll be ready to move on.
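If you’d like a quick sanity check that no key was left blank, something like the sketch below could help. The helper name missing_credentials and the REQUIRED_KEYS tuple are my own inventions for illustration; neither Tweepy nor Twitter provides them. It works on a plain dictionary so you can try it without touching your real keys.

```python
# Hypothetical helper, not part of Tweepy: reports which keys are unset.
REQUIRED_KEYS = (
    "consumer_key",
    "consumer_secret",
    "access_token",
    "access_token_secret",
)


def missing_credentials(settings):
    """Return the names of required keys that are absent or blank."""
    return [
        name for name in REQUIRED_KEYS
        if not (settings.get(name) or "").strip()
    ]


# Example with made-up values: one key is blank, one is missing entirely.
example = {
    "consumer_key": "abc",
    "consumer_secret": "",
    "access_token": "def",
}
print(missing_credentials(example))
```

With the real file, you could import twitter_credentials and pass vars(twitter_credentials) to the helper; an empty list means every key has a value.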
Installing Tweepy & Markovify
This should be the easiest part of the whole process!
pip install tweepy
pip install markovify
That’s it! Now we’re ready to code.
Tweeting with Tweepy
OK, let’s dive into the code. Make yourself a tweetbot.py, and import Tweepy and your Twitter keys:
import tweepy
from twitter_credentials import consumer_key, consumer_secret, access_token, access_token_secret
Now we need to get authorized with Twitter. Tweepy makes this super simple:
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
Let’s test it out!
api.update_status("Hello, world! Just testing Python twitter automation with tweepy")
Head over to your bot’s Twitter profile, and you should see a tweet!
If this worked out for you, you’re all set to move on to our next challenge: procedural text generation.
Generating gibberish with Markovify
Now that our bot can post to Twitter, we need to teach it to speak. We’re going to do this with Markov chains. To horribly oversimplify, a Markov chain does this: given the current state, it picks the next state at random, weighted by how often each candidate followed that state in the training data. We’re going to train a Markov model on a corpus of natural-language text, allowing us to procedurally generate semi-coherent gibberish in the language of your choice. (If you’re algorithmically minded, this is actually really easy to implement by hand, but we’re going to stick with a tried-and-true library for this bot.)
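For the curious, here is roughly what a by-hand version could look like. This is a minimal sketch of a word-level chain that only looks one word back, not how Markovify itself is implemented, and the function names build_chain and generate are made up for this example.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words that followed it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current_word, next_word in zip(words, words[1:]):
        # Duplicates are kept on purpose: a word that follows often
        # appears often in the list, so it gets drawn more often.
        chain[current_word].append(next_word)
    return dict(chain)


def generate(chain, start, max_words=20, rng=random):
    """Walk the chain from `start`, choosing each next word at random."""
    word = start
    output = [word]
    # Stop at the length cap or when the current word has no followers.
    while len(output) < max_words and word in chain:
        word = rng.choice(chain[word])
        output.append(word)
    return " ".join(output)


sample = "the cat sat on the mat and the cat ran off"
chain = build_chain(sample)
print(generate(chain, "the"))
```

In the sample text, "the" is followed by "cat" twice and "mat" once, so the walk picks "cat" about two times out of three. A real library adds sentence boundaries and longer states on top of this idea, which is why we’ll lean on Markovify for the bot.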
Getting a corpus
Before generating Markov chains, we’ll need a corpus. This doesn’t need to be anything special; a flat text file is all you’ll need. The larger the corpus, the better your results will be, so something novel-length would be ideal. If you’re having trouble finding a corpus, you could grab some public domain literature from Project Gutenberg, or use my favorite corpus, the entire text of Charlotte Brontë’s Jane Eyre. Whatever corpus you choose, just download it and save it as corpus.txt in your project directory.
It’s Markov time!
In your tweetbot.py you’ll need to import Markovify:
import markovify
Now we’ll load up the corpus and build a model with it.
with open("corpus.txt") as corpus_file:
    corpus = corpus_file.read()
model = markovify.Text(corpus)
Time to take our new Markov model for a spin! The make_sentence() method will generate a sentence from your model:
In[1]: model.make_sentence()
Out[1]: 'I remembered the answer of the house, and once attempted chastisement; but as mere strangers, they had to deceive a fine old hall, rather neglected of late occurrence.'
It worked! But if we want our bot to post on Twitter, we’ll need to find a way to keep the length of the generated sentences under 140 characters. Conveniently, Markovify provides the make_short_sentence() method to do exactly that. Let’s give it a try!
In[2]: model.make_short_sentence(140)
Out[2]: 'Refuse to be happy at his features, beautiful in their still severity; at his eyes, bright and dark conjectures.'
Feel free to play around with the parameters until you like your model’s results. Once you’re all set, we can move on to building our bot.
Sketch up a TweetBot class
Let’s pause a moment and figure out what we want our bot to do before we start writing it. If we want our bot to run autonomously, we’ll need it to do at least these things:
- Authenticate with Twitter
- Load a text corpus
- Make a Markov model from that corpus
- Make sentences from the model
- Tweet those sentences
- Automate itself to tweet every X seconds
Let’s try to break this down into some variables and methods. Clearly, we’ll need to give our bot a corpus to load and a delay in seconds between tweets. We’ll need to store a Markov model in order to generate tweets. We’ve also seen from noodling with Tweepy that we’ll need an api object. Of these, the api and the model are the only ones we’ll need to use repeatedly, so they’ll be instance fields; the other two need merely be method arguments. As for methods: we’ll only need to authenticate, load our corpus, and make our model once, so it makes sense to put these in the constructor. However, if we pull out the corpus and modeling into a helper method, we’ll also be able to change our corpus after the bot is initialized. Making sentences and tweeting them can go in the same method, since we’ll always be doing both together. Lastly, automating can be its own thing. If we mock up those class methods, it’ll look like this:
class TweetBot:
    def __init__(self, corpus):
        # load corpus & build model
        # initialize Twitter authorization with Tweepy
        pass

    def load_corpus(self, corpus):
        # open our corpus & run it through Markovify
        pass

    def tweet(self):
        # generate Markov tweet & send it
        pass

    def automate(self, delay):
        # automatically tweet every delay seconds
        pass
Now that we know what our bot class will look like, let’s start filling in the code.
Putting it all together
__init__
If we want to initialize our Markov model in the constructor, we’ll need to pass it the path to our corpus. We can then call the load_corpus method we’ll later write. As for the authentication, it’ll work just the same as in our earlier Tweepy exploration, except that we’ll need to store the api object in a field.
    def __init__(self, corpus):
        self.load_corpus(corpus)
        # initialize Twitter authorization with Tweepy
        auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
        auth.set_access_token(access_token, access_token_secret)
        self.api = tweepy.API(auth)
load_corpus
Just like in our Markovify example, this method will take the path to our corpus file, load its contents into a variable, and train a Markov model with it. We’ll then store the model as a field.
    def load_corpus(self, corpus):
        with open(corpus) as corpus_file:
            corpus_lines = corpus_file.read()
        self.model = markovify.Text(corpus_lines)
tweet
We’ve already seen how to generate a tweet from our model and how to update our status, so we can just put them together. However, we don’t want our bot to crash if a tweet fails for some reason (say, an improperly-formed tweet, or hitting a rate limit). So, let’s wrap it in a try-except block to catch any Tweepy errors that get thrown.
    def tweet(self):
        message = self.model.make_short_sentence(140)
        if message is None:
            # Markovify returns None when it can't build a short enough sentence
            return
        try:
            self.api.update_status(message)
        except tweepy.TweepError as error:
            print(error.reason)
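One wrinkle worth knowing about: make_short_sentence() can return None when Markovify fails to build a sentence within its attempt budget, which is more likely with a small corpus or a tight length limit. A small retry wrapper is one way to cope. The sketch below is an assumption-laden illustration: generate_tweet is a made-up helper name, and StubModel is a stand-in I wrote so the example runs without a corpus; a real markovify.Text model can be passed in its place.

```python
def generate_tweet(model, max_chars=140, attempts=5):
    """Return a sentence of at most max_chars, or None if every attempt fails."""
    for _ in range(attempts):
        message = model.make_short_sentence(max_chars)
        if message:
            return message
    return None


class StubModel:
    """Stand-in for a markovify.Text model: fails twice, then succeeds."""

    def __init__(self):
        self.calls = 0

    def make_short_sentence(self, max_chars):
        self.calls += 1
        return None if self.calls < 3 else "Reader, I tweeted it."


print(generate_tweet(StubModel()))  # prints "Reader, I tweeted it."
```

Inside the bot, this would mean calling generate_tweet(self.model) and skipping the post when it comes back None.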
automate
Now we just need to get our bot to post every delay seconds until we stop it. The easiest way to do this is Python’s sleep function. Let’s import it:
from time import sleep
Now the rest is trivial.
    def automate(self, delay):
        while True:
            self.tweet()
            sleep(delay)
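An endless loop is awkward to try out, since the only way to stop it is Ctrl-C. If you want to experiment first, a variation along these lines may be handy. It is only a sketch: run_every, max_runs, and sleep_fn are names I’ve made up, and the bot in this post doesn’t need any of them.

```python
from time import sleep


def run_every(action, delay, max_runs=None, sleep_fn=sleep):
    """Call action(), wait delay seconds, repeat. Returns the number of runs."""
    runs = 0
    try:
        # max_runs=None means "loop forever", just like `while True`.
        while max_runs is None or runs < max_runs:
            action()
            runs += 1
            sleep_fn(delay)
    except KeyboardInterrupt:
        # Ctrl-C lands here, so the loop ends without a traceback.
        pass
    return runs


# Dry run: record three fake tweets with no waiting in between.
posted = []
count = run_every(lambda: posted.append("tweet"), delay=0, max_runs=3)
print(count, posted)  # prints: 3 ['tweet', 'tweet', 'tweet']
```

With the bot, the equivalent call would be run_every(bot.tweet, 3600), which behaves like the automate method but exits quietly on Ctrl-C.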
Tying it all together
Now that we have all the pieces, let’s put them together, then write ourselves a main method to automatically tweet every hour.
import tweepy
import markovify
from time import sleep
from twitter_credentials import consumer_key, consumer_secret, access_token, access_token_secret


class TweetBot:
    def __init__(self, corpus):
        self.load_corpus(corpus)
        # initialize Twitter authorization with Tweepy
        auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
        auth.set_access_token(access_token, access_token_secret)
        self.api = tweepy.API(auth)

    def load_corpus(self, corpus):
        # read the corpus file and train a Markov model on its text
        with open(corpus) as corpus_file:
            corpus_lines = corpus_file.read()
        self.model = markovify.Text(corpus_lines)

    def tweet(self):
        message = self.model.make_short_sentence(140)
        if message is None:
            # Markovify returns None when it can't build a short enough sentence
            return
        try:
            self.api.update_status(message)
        except tweepy.TweepError as error:
            print(error.reason)

    def automate(self, delay):
        # tweet, wait delay seconds, repeat until the process is stopped
        while True:
            self.tweet()
            sleep(delay)


def main():
    bot = TweetBot("corpus.txt")
    bot.automate(3600)


if __name__ == "__main__":
    main()
Now, we can run our bot from the command line:
python tweetbot.py
Check your bot’s Twitter feed, and you should see a new tweet. Congrats! You’ve learned how to write Twitter bots. Now that you have a working bot, why not tinker around with it a bit? When you’re satisfied, make sure to commit all your changes to git, as you’ll need git for next time.
Next time on Build You a TweetBot
In the next installment, we’ll go over polishing up your app’s interface with argparse, using environment variables with dotenv, and migrating our bot to the cloud with Heroku. If you want to see this code in action, you can check out my bot @MechaBronte. Lastly, if you want to see where we’ll eventually be going with this, all my code is up on GitHub.
Happy botsmithing!