Creating a Bot that Quotes Pop Culture in Context

Published: Wed 04 October 2017

In Random.

So I’m interested in creating a bot that could quote movie quotes back to your in a context that makes sense. I don’t know how your conversations with your buddies go, but 70% of our everyday conversation is quoting things in new and different contexts, so I was interested to see if I could collect enough data to do this with some common quotes.

The first part of any good machine learning project is to collect data. I read that OpenAI was using Reddit comments to learn language, so I figured I’d use Reddit as a source. Live data seemed to be the most interesting way to pull data out.

The problem is that I’ve vastly overestimated (apparently) how much people quote famous movie quotes. I left the program running overnight and instead of having 100’s of matches, I only had one. So I’ll have to figure out a different way to get data out.

From a technology standpoint, this was a relatively easy problem to solve. I’m using PRAW, the Python Reddit API wrapper to get comments out. I wanted some flexibility for spelling, so I ended up using the Levenshtein algorithim from the python-Levenshtien package. There’s probably a better algo to do this, see this stackoverflow post about that issue, but Levenshtien was good enough for a proof of concept.

The plan was to grab all of the parent comment’s text if there was a match that was good close to any of the my movie quotes. But like I said, there’s not enough data being returned from grabbing live comments to make this feasible.

I ran this in tmux on a digital ocean instance I have running for 12 hours. You could also have this write to a file as well. I turned off the similarity matching and have just been watching the raw text come in while I write this post.

Man, that ‘Remind me!’ bot is popular.

import praw
from Levenshtein import ratio

config = {'client_id': ''
          'client_secret': '',
          'password': '',
          'username': ''}

# quotes is a list of string
quotes = []
USER_AGENT = 'Movie Quote Bot by /u/beohoff'
reddit = praw.Reddit(client_id=config['client_id'],
                     client_secret=config['client_secret'],
                     password=config['password'],
                     username=config['username'],
                     user_agent=USER_AGENT)

for quote in quotes:
    quote_len = len(quote)
    if quote_len > greatest_length:
        greatest_length = quote_len
    if quote_len < least_length:
        least_length = quote_len


for comment in reddit.subreddit('AskReddit+movies+funny+pics').stream.comments():
    text = comment.body.lower()
    len_text = len(text)
    if len_text + 7 > greatest_length or len_text - 4 < least_length:
        continue

    greatest = 0
    best_quote = ''

    for quote in quotes:
        value = ratio(text, quote)
        if value > .75:
            print(text, quote)

Guess it’s back to the drawing board for how to get enough data to create a bot that can respond in context with movie quotes.

Comments !

Subscribe to the mailing list

* indicates required

links

social