CS 457 Final Project

Deliverables and Deadlines

Each deliverable links to the expectations for that deliverable.

  • Proposal Due April 11, 11:59PM EST
  • Project Check-In 1 Due April 25, 11:59PM EST
  • Project Check-In 2 Due May 2, 11:59PM EST
  • Presentation Week 12 during class
  • Draft of Report and Code Due May 9, 11:59PM EST
    • Submitting a draft is not required, but anything that is not submitted by this date will receive a grade of S or NC with no opportunity for revision.
  • Final Report and Code Due May 21, 11:59PM EST

Scope

To synthesize your knowledge, you will complete an NLP research project during the second half of the semester. Your project is expected to have some novel component, but your contribution can be small relative to what would be expected in a published research paper.

Your project should not exactly replicate the model in a single research paper using the same dataset and evaluation method. For example, a project that takes the IMDB dataset, trains a BERT model to predict sentiment, and evaluates using F1 score is not sufficient. Instead, I would ask you to build on this idea by:

  • Evaluating multiple approaches
  • Evaluating on additional datasets and carefully analyzing the types of mistakes that models make
  • Designing your own evaluation method that goes beyond correctness on a standard test set (think of the CheckList paper that was assigned as supplemental reading early in the semester)
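As a rough illustration of the last option, a CheckList-style invariance test checks that a label-preserving change to the input (here, swapping one person's name for another) does not change the model's prediction. The `toy_sentiment` classifier, the `invariance_test` helper, the templates, and the name list below are all hypothetical stand-ins for your own model and test cases; this is a sketch of the idea, not the CheckList library's API.

```python
from typing import Callable, List, Tuple


def toy_sentiment(text: str) -> str:
    """Hypothetical stand-in for a trained model: a tiny keyword classifier."""
    positive = {"great", "wonderful", "loved"}
    negative = {"terrible", "boring", "hated"}
    words = set(text.lower().replace(".", "").split())
    score = len(words & positive) - len(words & negative)
    return "positive" if score >= 0 else "negative"


def invariance_test(
    predict: Callable[[str], str],
    templates: List[str],
    fillers: List[str],
) -> List[Tuple[str, str]]:
    """Fill each template with every filler; return pairs whose predictions differ.

    A model that passes should give the same label no matter which filler
    (e.g., which person's name) is substituted into the template.
    """
    failures = []
    for template in templates:
        reference_text = template.format(fillers[0])
        reference_label = predict(reference_text)
        for filler in fillers[1:]:
            text = template.format(filler)
            if predict(text) != reference_label:
                failures.append((reference_text, text))
    return failures


# Hypothetical test cases; a real project would use many more templates.
templates = [
    "{} loved the movie.",
    "{} thought the plot was boring.",
]
names = ["Maria", "John", "Aisha", "Wei"]

failures = invariance_test(toy_sentiment, templates, names)
print(f"{len(failures)} invariance failures")
```

In a real project, `predict` would wrap your trained model, and the failure rate across many templates is the kind of result that could be reported and analyzed alongside standard test-set accuracy.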

Most students will write up a final report that contains the following sections:

  • Literature Review
  • Data
  • Methods
  • Results
  • Ethical Considerations

If you have a project idea that does not fit these guidelines, come speak with me before the deadline for the proposal.

Groups

You will work in groups of 2-3 for the project. Expectations are similar for pairs and groups of 3.

Students are expected to contribute to all components of the project. Splitting the work so that one person works on the report and presentation and another works on the code is unacceptable.

Proposal

The project proposal is due on April 11th. It does not “count” towards your grade, but your proposal must be accepted in order to continue with your project.

Your proposal should be submitted as a PDF or markdown file, and should include:

  • The names of your group members
  • The problem that you intend to address, answering the following three questions:
    • How is this problem related to the class?
    • What is interesting about this problem?
    • What is the novel component of your project?
  • The data that you will use for your project
    • Be especially careful when choosing data: if you plan to use data that is mentioned in a paper, make sure you can download it freely without having to contact the authors or obtain access to an API
  • A high level description of how you will evaluate your results
  • A tentative week-by-week timeline describing how you intend to complete your project before May 21st
    • This should include your intended deliverable for project check-in 2
      • For projects that compare a new model to a baseline, this should include an implementation of your baseline model and drafts of the data and methods sections of your report. The draft sections will touch on the soundness and reproducibility aspects of the project.
      • For other projects, you should propose an equivalent deliverable, which will include both code and drafts of section(s) of your report.

Grading

Your project will be graded S/R/NC based on six components. Some of these components should be demonstrated in multiple deliverables.

Literature Review

  • Associated Deliverables: your first project check-in and your final report
  • In your literature review, you will discuss academic papers that are relevant for your project.
  • There might be multiple reasons why a paper is relevant, including the data, method, and evaluation metric. Your literature review should synthesize the components of each paper that are relevant to your work.
  • Looking for papers:
    • A good place to start looking for related work is the ACL Anthology. Google Scholar and Semantic Scholar are also good resources, but they are broader in scope and might include unpublished work that may not be of high quality.
    • Once you’ve found a relevant paper, you will want to look at papers that it cites (which is easy to do by scrolling down to the references)
    • You’ll also want to look at papers that cite the paper you’re reading. Google Scholar has a “Cited by” link that allows you to do just that
    • Keep notes as you read papers. You’ll probably read many papers only at the level of the “first pass” that we talked about early in the semester in order to determine whether or not they are relevant.
  • Requirements:
    • Groups of 2: at least 300 words and at least 6 citations
    • Groups of 3: at least 450 words and at least 9 citations

Ethical Considerations

  • Associated Deliverables: your draft report (optional) and your final report
  • In your final report, you should discuss ethical considerations associated with the task/system itself and/or the data that you use. The following bullet points discuss some things that you might consider, but they are not exhaustive:
    • Task/System
      • Potential use cases of your system for nefarious purposes
      • Interpretability of the system (can you tell why it makes the predictions it does? does it matter?)
      • Algorithmic fairness
      • Computational resources used, in light of the Green AI movement
    • Data
      • Informed consent (or lack thereof)
      • Copyright
      • Annotation quality
      • Annotator pay
  • It is possible that you believe there are no ethical considerations associated with your work. If you are in this situation, you may write 200 words defending that position.

Reproducibility

  • Associated Deliverables: your second project check-in and your final report/code
  • A key component of any research project is whether or not others are able to reproduce your work. A primary goal of your report is to document what you did so that others are able to reproduce it.
  • In your final report, you should include:
    • Information about your data
      • Where it came from
      • How many examples there are in the dataset
      • The distribution of labels
      • How you split your data into train/test/development sets
    • Information about your models
      • How features are extracted
      • The types of models used and the hyperparameters
  • Your code should include thorough documentation. At a bare minimum, this includes a README.md file that describes how to run your code to reproduce the main results in your paper. For instance, if you report 56% accuracy from your baseline model and 80% accuracy from your new and improved model, you should describe the series of scripts or commands that you would run to arrive at both of those results.
  • As a sanity check, imagine a CS 457 student in 2025 who wants to (a) replicate this project and (b) build on it, and who has access to only your report or only your code. Would they reasonably be able to get the same results that you do (perhaps with some small fluctuation due to factors like different random seeds)?
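As one possible way to make the data-splitting step reproducible, the sketch below shuffles examples with a fixed random seed, splits them 80/10/10 into train/development/test sets, and reports the size and label distribution of each split, which is information the report should include. The split proportions, the seed value, the toy data, and the `make_splits` and `label_distribution` helpers are illustrative assumptions, not requirements; a stratified split (for example, scikit-learn's `train_test_split` with its `stratify` parameter) is a common alternative.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple

Example = Tuple[str, str]  # (text, label)


def make_splits(
    examples: List[Example],
    seed: int = 457,
    train_frac: float = 0.8,
    dev_frac: float = 0.1,
) -> Dict[str, List[Example]]:
    """Shuffle with a fixed seed and split into train/dev/test.

    Using a dedicated random.Random(seed) instance means the same input list
    always produces the same splits, independent of other random calls.
    """
    shuffled = list(examples)  # copy so the caller's list is not modified
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_dev = int(len(shuffled) * dev_frac)
    return {
        "train": shuffled[:n_train],
        "dev": shuffled[n_train:n_train + n_dev],
        "test": shuffled[n_train + n_dev:],
    }


def label_distribution(examples: List[Example]) -> Dict[str, int]:
    """Count how many examples carry each label."""
    return dict(Counter(label for _, label in examples))


# Hypothetical toy data standing in for a real dataset.
data = [(f"review {i}", "pos" if i % 3 else "neg") for i in range(100)]
splits = make_splits(data)
for name, split in splits.items():
    print(name, len(split), label_distribution(split))
```

Reporting the seed and split sizes in the Data section (or saving the split files alongside your code) lets a future student recover exactly the same train/dev/test sets.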

Soundness

  • Associated Deliverables: your second project check-in and your final report/code
  • Your project should be technically sound, have a meaningful comparison to a baseline method, and use sound experimental design (e.g., do not train on your test data!)
  • A project that focuses on modeling should provide:
    • At least one simpler baseline model that your model is compared to
    • One or more reasonable evaluation metrics to compare your model to other approaches
      • This might be a standard metric like F1 score or BLEU, or you might need to perform some type of human evaluation (for example, for a chatbot)
  • A project that does not focus on modeling should provide additional justification (in the form of citations) for the approach that is used
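To illustrate what a comparison against a simpler baseline might look like, the sketch below computes macro-averaged F1 by hand for a majority-class baseline and for a set of model predictions. The labels and the `model_preds` list are made-up placeholders, and the `macro_f1` and `majority_baseline` helpers are hypothetical; in practice a library implementation such as scikit-learn's `f1_score(y_true, y_pred, average="macro")` would normally be used instead of the hand-written function.

```python
from collections import Counter
from typing import List


def macro_f1(gold: List[str], pred: List[str]) -> float:
    """Unweighted mean of per-class F1 over the classes that appear in gold."""
    scores = []
    for cls in sorted(set(gold)):
        tp = sum(1 for g, p in zip(gold, pred) if g == cls and p == cls)
        fp = sum(1 for g, p in zip(gold, pred) if g != cls and p == cls)
        fn = sum(1 for g, p in zip(gold, pred) if g == cls and p != cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


def majority_baseline(train_labels: List[str], n: int) -> List[str]:
    """Predict the most frequent training label for every test example."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return [most_common] * n


# Made-up labels for illustration only.
train_labels = ["pos", "pos", "pos", "neg", "neg"]
gold = ["pos", "neg", "pos", "neg", "pos", "neg"]
model_preds = ["pos", "neg", "pos", "pos", "pos", "neg"]

baseline_preds = majority_baseline(train_labels, len(gold))
print(f"baseline macro F1: {macro_f1(gold, baseline_preds):.3f}")  # 0.333
print(f"model macro F1:    {macro_f1(gold, model_preds):.3f}")     # 0.829
```

A majority-class baseline is a deliberately weak floor: on imbalanced data it can reach high accuracy while ignoring the minority class entirely, which macro F1 exposes.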

Written Report

  • Associated Deliverables: your draft report (optional) and your final report.
  • Many sections of your written report will be graded with respect to other components: literature review, ethical considerations, reproducibility, and soundness
  • I expect most students’ reports to contain the following sections. If your work does not fit this structure, you are expected to discuss it with me before submitting your proposal.
    • Literature Review
    • Data
    • Methods
    • Results
    • Ethical Considerations
  • Additional Requirements
    • Formatting: your report should be written in LaTeX using the ACL conference template. If you haven’t used LaTeX before, don’t leave this until the last minute! Please see the getting started with Overleaf page for more information and a link to the template.
    • Figures: your report should contain at least one figure. This might be a diagram of your system, a learning curve from training your model, a screenshot of an interaction with your system, or a plot of your results.
    • Tables: your report should contain at least one table. This might be a summary of statistics about your data, a result table, or a table that enumerates and categorizes errors that the system makes.
    • Discussion: your report should contain a brief discussion of your results (this may be part of the results section). You should discuss whether or not your results confirmed your initial hypothesis, how the results fit into the existing literature, and how you might build on these results.

In-Class Presentation

  • Associated Deliverables: presentation during the last week of class
    • I will randomly assign groups to present on Tuesday or Thursday. I encourage you to attend the presentations of the other section!
  • There are two options for the presentation; choose one!
  • Option 1: Poster
    • See the Project Posters resource for details on how to get started and the expected format.
    • Your poster will be a condensed version of your final report, with more visuals and less text.
    • I recommend that you create your poster using Google Slides or PowerPoint.
    • You can print your poster at the Armstrong Library.
  • Option 2: Live Demo
    • See the Project Demos with Gradio resource for details on how to get started.
    • You have the option to create a demo of your system, which will be hosted as a Hugging Face space within the CS 457 Hugging Face organization.
    • I strongly recommend using a Gradio interface for your demo, as you won’t get any extra credit for custom web design. It should be fairly simple to plug your existing project code and saved model into Gradio.
    • Your demo should feature the main component of your project. For instance, if you built a chatbot, you should be able to chat with the chatbot. If you built a text classifier, users should be able to enter some text and get a label.
    • In addition to the code for the demo, you should provide a README file that describes how to interact with your demo and any limitations it may have. For instance, if it only works with single words written using ASCII characters, state that. Your demo should not crash when it is tested on reasonable input.
  • For either option, you should prepare a 2-4 minute verbal overview of your project