Homework 1: Regular Expressions and Chatbot

Due Thursday February 22, 11:59PM EST

Links: Gradescope | Starter Code + Data | Late Day Request Form (requests accepted up to 24 hours before the deadline)
For this homework assignment, you will practice using regular expressions and write a simple chatbot. The assignment is split into two main sections: regex practice and MiddChat.
Learning Goals:
Once you complete this assignment, you should:
  • Be able to write simple regular expressions to match a variety of patterns
  • Be comfortable using regular expressions to write a very simple chatbot
  • Reflect on properties of your chatbot that could be improved (perhaps with more complex algorithms)
This assignment is connected to the following overall learning goals of the course:
  • Be familiar with NLP methods in three key areas: text classification, text generation, and language understanding
Submit these files: golf.py, chatbot.py, report.md
Leaderboard: There is a leaderboard for this assignment, where you are scored on the length of your regular expressions. However, it is only for fun and does not count toward your grade (I expect a many-way tie at the top is possible). Because it is only for fun (and there is no hidden test set), there is no limit on leaderboard submissions.
Credits: This assignment is adapted from the Discover NLP course materials authored by Julie Medero, Xanda Schofield, and Richard Wicentowski (Lab 1).

RegEx Practice

Implementation Details

Begin the assignment by practicing regular expressions with RegEx Golf.

You must complete the following (autograded):

  • Warmup
  • Anchors
  • It never ends
  • Ranges
  • Backrefs

Resources

Starter Code

The starter code for this part of the assignment (golf.py) includes space for you to write in your regular expressions. You can copy them directly from the RegEx Golf website to the corresponding strings in the Python file.
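Because many patterns contain backslashes, it is safest to keep each one in a raw string (r"..."). The short sketch below uses hypothetical variable names, not the ones in golf.py. It shows what goes wrong otherwise: in an ordinary string literal, \b becomes a backspace character instead of a word boundary.

```python
import re

# Hypothetical variable names; the actual names in golf.py may differ.
# A raw string passes the backslash through to the regex engine unchanged.
raw_pattern = r"\bcat\b"
# In an ordinary string literal, "\b" is the backspace character (\x08).
plain_pattern = "\bcat\b"

print(len(raw_pattern))    # 7 characters: \ b c a t \ b
print(len(plain_pattern))  # 5 characters: \x08 c a t \x08

print(bool(re.search(raw_pattern, "the cat sat")))    # True: word boundaries found
print(bool(re.search(plain_pattern, "the cat sat")))  # False: looks for literal backspaces
```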

Allowed External Resources

Since these puzzles are online, there are (undoubtedly) solutions posted somewhere. I trust you not to look for those solutions, but to submit the best solutions you can come up with on your own. You may use the Python re library documentation, as well as the textbook.

Deliverables

In the file golf.py, fill in the empty strings with your best (that is, shortest) regular expression for each puzzle you solved. A satisfactory solution must not sidestep the challenge of writing rules that generalize properties of the words: an enumeration such as r"^(word1|word2|word3...)$" would be ridiculously long and a little silly. That said, if your expression is several times longer than those on the leaderboard, challenge yourself to find a shorter one!
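If you want to check an expression locally before pasting it into golf.py, a small helper like the one below may be useful. It is a sketch under two assumptions: that RegEx Golf counts a word as matched if the pattern matches anywhere in it (modeled here with re.search), and that Python's re behaves like the site's JavaScript engine for the features you use. check_golf is a hypothetical name, and the word lists are made up for illustration, not taken from any puzzle.

```python
import re


def check_golf(pattern, match_words, avoid_words):
    """Hypothetical helper. Returns (missed, false_hits): words the pattern
    should match but does not, and words it should avoid but matches.
    A word counts as matched if re.search finds the pattern anywhere in it."""
    regex = re.compile(pattern)
    missed = [w for w in match_words if not regex.search(w)]
    false_hits = [w for w in avoid_words if regex.search(w)]
    return missed, false_hits


# Made-up word lists for illustration only; NOT from any RegEx Golf puzzle.
match_words = ["hello", "help", "helm"]
avoid_words = ["hall", "hull", "shell"]

print(check_golf(r"^hel", match_words, avoid_words))  # ([], [])
print(check_golf(r"hel", match_words, avoid_words))   # ([], ['shell'])
print(len(r"^hel"))                                   # 4 characters in the pattern
```

The second call shows why anchors matter: without ^, the pattern also matches inside "shell".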


MiddChat

For this part of the assignment, you will make an ELIZA-like chatbot that you can talk to in a web interface like ChatGPT (but perhaps a bit less smart).

Implementation Details

You must modify the make_reply method in chatbot.py to respond to messages from users, the examples method to give a few examples of how to interact with your chatbot, and the get_name method to return the name of your chatbot. Your make_reply method must make use of regular expressions using the Python re library. In particular, you should use each of the following at least once:

  • Regular expression groups ((...)). Note: these should be used to capture content that your chatbot processes later; don’t just put parentheses around a regular expression and then ignore the group.
  • Character classes ([...], \d, \D, \w, \W, \s, \S, etc.) or the related escape \b (strictly a word-boundary anchor rather than a class)
  • Regular expression quantifiers (+, *, ? and/or {...})
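As an illustration of the three required features, here is a minimal sketch that uses all of them in one pattern. reply_to_feeling is a hypothetical standalone function, not the starter code's make_reply, and the pattern is just one of many reasonable choices.

```python
import re

# Groups:      (\w+) captures the word after "I am" / "I'm" so the reply can reuse it;
#              the (?:...) groups are non-capturing and only handle alternation/optionality.
# Classes:     \w and \s are character classes; \b is a word-boundary anchor.
# Quantifiers: + means "one or more" and ? means "optional".
FEELING = re.compile(r"\bi(?:'m|\s+am)\s+(?:feeling\s+)?(\w+)", re.IGNORECASE)


def reply_to_feeling(message):
    """Hypothetical helper: echo back the feeling in 'I am X' or 'I'm feeling X'."""
    match = FEELING.search(message)
    if match:
        feeling = match.group(1)  # use the captured text rather than ignoring it
        return f"Why do you feel {feeling.lower()}?"
    return "Tell me more."


print(reply_to_feeling("I am tired today"))                # Why do you feel tired?
print(reply_to_feeling("Honestly, I'm feeling anxious."))  # Why do you feel anxious?
print(reply_to_feeling("What's up?"))                      # Tell me more.
```

The captured group feeds directly into the reply. This is the kind of use the note about groups asks for.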

You can choose the domain and personality of your bot. It can be a Rogerian psychologist like ELIZA, or you can choose a different domain. If you choose a different domain, make sure it’s one where you can reasonably expect repetition, so regular expressions will be effective.
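ELIZA's Rogerian replies work by reflecting the user's own words back, swapping first- and second-person words (for example, "my" becomes "your"). The sketch below shows one way to do that with re.sub. The reflection table, the "I think ..." trigger, and the function names are all hypothetical choices for illustration.

```python
import re

# Hypothetical reflection table in the spirit of ELIZA's pronoun swaps.
REFLECTIONS = {
    "i": "you",
    "me": "you",
    "my": "your",
    "am": "are",
    "you": "I",
    "your": "my",
}

# \b on both sides restricts matches to whole words, so "my" inside "myth" is left alone.
REFLECT = re.compile(
    r"\b(" + "|".join(map(re.escape, REFLECTIONS)) + r")\b", re.IGNORECASE
)

# Group 1 captures everything after "I think", minus trailing punctuation.
I_THINK = re.compile(r"\bi\s+think\s+(.+?)[.!?]*$", re.IGNORECASE)


def reflect(fragment):
    # A single pass over the string, so "my" -> "your" is never flipped back to "my".
    return REFLECT.sub(lambda m: REFLECTIONS[m.group(1).lower()], fragment)


def rogerian_reply(message):
    match = I_THINK.search(message)
    if match:
        return f"Why do you think {reflect(match.group(1))}?"
    return "Please go on."


print(rogerian_reply("I think my sister ignores me."))
# Why do you think your sister ignores you?
```

A table this small has obvious gaps: reflect("you are wrong") gives "I are wrong". This is exactly the kind of behavior worth discussing in the Analysis section of your report.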

Feel free to add features to your bot, such as instance variables that store information from previous messages.
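For example, a bot can remember the user's name across messages. The class below is a hypothetical stand-in. It uses a reply method for brevity, whereas your chatbot must keep the starter code's make_reply, examples, and get_name methods with their original arguments. The instance variable is set up in __init__, which you are allowed to add.

```python
import re


class RememberingBot:
    """Hypothetical example class (not the starter code's class)."""

    # Group 1 captures the word that follows "my name is".
    NAME = re.compile(r"\bmy\s+name\s+is\s+(\w+)", re.IGNORECASE)

    def __init__(self):
        # Instance variable that persists between messages; None until a name is seen.
        self.user_name = None

    def reply(self, message):
        match = self.NAME.search(message)
        if match:
            self.user_name = match.group(1).capitalize()
            return f"Nice to meet you, {self.user_name}!"
        if self.user_name:
            return f"Go on, {self.user_name}."
        return "What's your name?"


bot = RememberingBot()
print(bot.reply("hi"))               # What's your name?
print(bot.reply("My name is ada."))  # Nice to meet you, Ada!
print(bot.reply("I like regex"))     # Go on, Ada.
```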

Resources

Starter Code

The starter code for this part of the assignment (chatbot.py) includes a class with the three methods that you must implement. You may add additional methods if you want (including an __init__ method), but you must keep the method names and arguments as-is. The starter code also includes a basic main method that will allow you to interact with your chatbot.

Finally, the starter code includes a report.md file for your writeup (see the Report section below).

MiddChat on HuggingFace Spaces

When you submit your code on Gradescope, your bot will be automatically deployed to the MiddChat app if it passes the basic automated tests. You'll be able to chat with your chatbot as well as your classmates' chatbots (you'll need to do this for your report).

Deliverables

Code

On Gradescope, you should submit your completed chatbot.py file with your implementation.

Report

You should also submit a Markdown report (report.md) that describes your bot. The report should have the following sections:

  • Regex Description: Clearly describe how you use each of the required regular expression features in your make_reply method.
  • Analysis: Describe at least one interaction with your bot that worked well, and at least one interaction that worked poorly (or not as one might expect). Include a screenshot or transcript of these conversations in your writeup. Explain not just what happened, but why it happened, and why that was good or bad.
  • Another Chatbot: Describe at least one interaction that you had with a classmate’s chatbot. Did they implement any features that you would have liked to implement in your chatbot if you had extra time? Which features of regular expressions might they have used to achieve that functionality (regular expression groups, character classes, quantifiers, anchors, etc.)?

The starter code has a Markdown file that you can use to write your analysis. If you don't have experience with Markdown, this is a good resource to get started. You can use this site to render/preview your Markdown file (you can also render it in VS Code).