This lab is adapted from Phil Chodrow, who adapted it from an activity created by Evan Peck (Bucknell University).

Activity 0

Lab Partnership Planning

Before starting today’s lab assignment, exchange contact information with your partner and find at least three hours in your schedule during which you can meet with your partner to finish the lab. If you are genuinely unable to find a time, please come speak with me!

Reminder: you and your partner are a team! You should not move on to the next activity until you are both comfortable with the previous one.

Introduction

In a recent lab, we studied the potential role of algorithms in prioritizing the needs of some people over others. Here, we’ll experiment with this same idea in another, very real context that may impact your future prospects for jobs and other opportunities.

In this lab, you’ll practice:

  • Reading Files
  • Dictionary Operations
  • For-loops
  • Processing large quantities of data

Whose Application Gets Read?

Imagine that you are consulting for Prestigious University (PU), a very famous university that receives tens of thousands of applications every year. Due to recent budget cuts, there aren’t enough staff to read every single application. PU has hired you to design a filtering algorithm that finds a subset of applicants whose applications should be read more carefully. PU’s theory is that if an applicant has, say, consistently very poor grades, then their application may not even be worth reading. In other words, your algorithm should screen out those applicants.1

Applicant Grade Data

Data on each applicant’s grades is supplied as a file or as a dictionary.2 The data contains:

  • Overall GPA in the applicant’s first, second, third, and fourth years of high school.
  • GPA in each of the following groups of courses in the applicant’s third and fourth years:
    • Literature
    • Science
    • Math
    • Social studies
    • Electives

A complete set of data in an application file might look like this:

FIRST_YEAR     3.4
SECOND_YEAR    3.3
THIRD_YEAR     2.7
FOURTH_YEAR    3.6
LITERATURE     4.1
SCIENCE        3.5
MATH           2.8
SOCIAL_STUDIES 3.2
ELECTIVES      3.0

Note that each row pairs one label with one GPA.

As a dictionary, the data in the file would be represented like this:3

EXAMPLE = {
    "FIRST_YEAR": 3.4, 
    "SECOND_YEAR": 3.3,
    "THIRD_YEAR": 2.7, 
    "FOURTH_YEAR": 3.6, 
    "LITERATURE": 4.1, 
    "SCIENCE": 3.5, 
    "MATH": 2.8, 
    "SOCIAL_STUDIES": 3.2, 
    "ELECTIVES": 3.0
}

For example, we can access the applicant’s third-year GPA like this:

EXAMPLE["THIRD_YEAR"]
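Because the data is an ordinary Python dictionary, the usual dictionary operations apply to it. The following is a small sketch that reuses the EXAMPLE values from above; the names year_keys, year_gpas, and year_average are made up for illustration and are not part of the lab files.

```python
# Same values as the EXAMPLE dictionary above.
EXAMPLE = {
    "FIRST_YEAR": 3.4, "SECOND_YEAR": 3.3, "THIRD_YEAR": 2.7,
    "FOURTH_YEAR": 3.6, "LITERATURE": 4.1, "SCIENCE": 3.5,
    "MATH": 2.8, "SOCIAL_STUDIES": 3.2, "ELECTIVES": 3.0,
}

# Check whether a key is present in the dictionary.
print("MATH" in EXAMPLE)

# Loop over every label/GPA pair.
for label, gpa in EXAMPLE.items():
    print(label, gpa)

# Collect several values into a list, then average them.
# (year_keys, year_gpas, and year_average are illustrative names.)
year_keys = ["FIRST_YEAR", "SECOND_YEAR", "THIRD_YEAR", "FOURTH_YEAR"]
year_gpas = []
for key in year_keys:
    year_gpas.append(EXAMPLE[key])

year_average = sum(year_gpas) / len(year_gpas)
print(year_average)
```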

Activity 1

Click on lab7.zip to download the files for this lab as a zip file. Extract the files from the zipped folder and open the .py files in Thonny. Open the .txt files in TextEdit (Mac) or Notepad (Windows).

Activity 2

You’ll start by writing code that loads data from an application file and returns a dictionary representing the applicant’s GPAs for each subject and year. The formatting of the file will generally match what you see above, but it might differ slightly. You should not assume that the rows are in any particular order, or that there is any particular number of spaces between the label and the GPA.

Your function should be called load_application. It should take one argument (the filename) and return a dictionary where the keys represent the subject/class year and the values represent the GPA. The keys should be strings with no spaces in them, and the values should be floats. Your function should work for all three example files that you downloaded.

  • HINT: Slide 17 from Thursday has a program that does something very similar, and may be helpful to look at!
  • HINT: You may find the string.split() method to be helpful for this activity!
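To see why split() helps with uneven spacing, here is a minimal sketch on a single made-up line (the variable names are illustrative, and this is not the full load_application function). It only shows how one row can be broken into a label and a number.

```python
# A hypothetical line from an application file, with uneven spacing
# and a trailing newline, as it might look when read from a file.
line = "SOCIAL_STUDIES      3.2\n"

# With no arguments, split() breaks on any run of whitespace and
# ignores leading/trailing whitespace (including the newline).
parts = line.split()
print(parts)  # ['SOCIAL_STUDIES', '3.2']

# Both pieces are still strings; float() converts the GPA to a number.
label = parts[0]
gpa = float(parts[1])
print(label, gpa)
```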

This activity is autograded! Submit on Gradescope and make sure you are passing the Activity 2 tests before moving on.

Analyze Applicants

We’re now going to complete a series of functions that analyze an applicant. Each of these functions returns a Boolean value (True or False) that determines whether the application is recommended for further review by a human reader.

Your long-term goal is to implement the following functions, and check that they work. As examples to help you get started, I’ve implemented the first two functions for you. We’re not going to implement all of them right now, though.

  • analyze_applicant_1(): recommends applicants that have an average of their five subject areas above 3.0.
  • analyze_applicant_2(): recommends applicants that have no GPA score below 2.5 in any of their five subject areas.
  • analyze_applicant_3(): recommends applicants that achieved a GPA above 3.2 in at least three of their four years of secondary school.
  • analyze_applicant_4(): recommends applicants that achieved a higher average GPA in their third and fourth years than in the first and second years.

Activity 3

Copy and paste the following code into lab.py, under the ACTIVITY 3 area of the file. Then, run the file. Talk through the code with your partner to ensure that you understand how it works.

def analyze_applicant_1(data):
    """
    recommends an applicant if the average of their subject scores is larger than 3.0
    """

    # get the subject scores
    subject_scores = [data["LITERATURE"], data["SCIENCE"], data["MATH"], 
                      data["SOCIAL_STUDIES"], data["ELECTIVES"]]

    # compute the average (mean) of the subject scores
    # average is sum of all scores divided by number of scores
    mean = sum(subject_scores) / len(subject_scores)

    # return whether the mean is > 3.0
    return mean > 3.0

print("\nExamples for analyze_applicant_1()")
print("----------------------------------")    
print(analyze_applicant_1(load_application("EXAMPLE_2.txt")))
print(analyze_applicant_1(load_application("EXAMPLE_3.txt")))

This activity is autograded (you will pass if the function has been copied)! Submit on Gradescope and make sure you are passing the Activity 3 tests before moving on.

Activity 4

Now, do the same thing for analyze_applicant_2().

def analyze_applicant_2(data):
    """
    recommends applicants that have no GPA score below 2.5 in any of their five subject areas
    """

    # get the subject scores
    subject_scores = [data["LITERATURE"], data["SCIENCE"], data["MATH"], 
                      data["SOCIAL_STUDIES"], data["ELECTIVES"]]

    # there are no scores below 2.5 if the lowest score is greater than or equal to 2.5
    return min(subject_scores) >= 2.5

print("\nExamples for analyze_applicant_2()")
print("----------------------------------")    
print(analyze_applicant_2(load_application("EXAMPLE_2.txt")))
print(analyze_applicant_2(load_application("EXAMPLE_3.txt")))

Run the file. Do analyze_applicant_1() and analyze_applicant_2() agree or disagree on the two examples? (just discuss with your partner)

This activity is autograded (you will pass if the function has been copied)! Submit on Gradescope and make sure you are passing the Activity 4 tests before moving on.

Create Your Own Recommender

Later in the assignment, we’ll implement analyze_applicant_3() and analyze_applicant_4(). For now, though, we’re going to focus on your creative ideas for making a recommender filter.

Activity 5

In English, not code, share a rule for recommending applicants on the Ed message board post “Lab 7 Analyze Applicant Algorithms”. You are welcome to use ideas from the analyze_applicant_*() functions described above, but please make sure to include your own original spin. What do you think would be the fairest way to recommend applicants for further review?

Suggestion: use a rule that includes information from both the year scores and the subject-area scores.

One person from each group must share on the message board. You do not need to write anything in the Activity 5 portion of the lab file.

Overly simplistic rules that only use one of the GPAs from the dictionary (for this and the next activity) will not receive full credit.

Activity 6

Pick another group’s rule from the message board, and copy it down in the Activity 6 area. The rule must differ from yours!

Implement!!

Activity 7

Implement four functions:

  1. analyze_applicant_3() from above.
    • HINT: there should be some similarities with the count_below function that you wrote for HW 4.
  2. analyze_applicant_4() from above.
  3. our_analyze_applicant(), your idea from Activity 5.
  4. their_analyze_applicant(), the idea of the other group from Activity 6.

Remember:

  • analyze_applicant_3(): recommends applicants that achieved a GPA above 3.2 in at least three of their four years of secondary school.
  • analyze_applicant_4(): recommends applicants that achieved a higher average GPA in their third and fourth years than in the first and second years.
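The "at least three of four" wording in analyze_applicant_3() suggests a counting pattern: loop over some values, and add one to a counter each time a condition holds. Below is a minimal sketch of that pattern using a hypothetical helper called count_above(); it is not the count_below function from HW 4, and it is not a complete solution to this activity.

```python
def count_above(values, threshold):
    """Return how many numbers in values are strictly greater than threshold."""
    count = 0
    for value in values:
        if value > threshold:
            count += 1
    return count

# Made-up scores for illustration only.
scores = [3.4, 3.3, 2.7, 3.6]
print(count_above(scores, 3.2))  # 3
```

Once you have a count, turning it into a True/False recommendation is a single comparison.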

Write your function definitions in the Activity 7 area. You don’t need to demonstrate them on any examples, although you may wish to do so in order to check for errors.

This activity is autograded! Submit on Gradescope and make sure you are passing the Activity 7 tests before moving on.

Test At Scale

If we only had to look at one or two applicants, we wouldn’t need a program for this at all – we’d just do it by hand. But what if we have 100,000 applicants?

I’ve included some functions (which you imported from applications.py) to help you generate and view large numbers of random example apps:

  1. create_app() creates a single random application.
  2. create_apps(num_apps) will create a list containing num_apps applications. NOTE: this is a list of dictionaries.
  3. print_apps(apps, first_n) will print (not return) the first first_n apps in a list.

So, for example:

apps = create_apps(1000)  # create 1000 apps
print_apps(apps, 5)       # print the first 5

Let’s first write a function that will allow us to experiment with the behavior of one of your filtering functions. This is a complicated function – take your time and build it up slowly!

Activity 8

Implement a function called analyze_many_apps() that will show an analysis of a large set of applications. This function should print several messages, but does not need to return anything. Here’s an example of what it will print. Because you will be using random applications, your results might look slightly different from mine.

----SUCCESSFUL APPS----
FIRST_YEAR 3.8, SECOND_YEAR 3.9, THIRD_YEAR 3.0, FOURTH_YEAR 3.7, LITERATURE 2.6, SCIENCE 2.3, MATH 3.4, SOCIAL_STUDIES 3.4, ELECTIVES 3.3
FIRST_YEAR 2.5, SECOND_YEAR 3.1, THIRD_YEAR 2.6, FOURTH_YEAR 3.6, LITERATURE 2.3, SCIENCE 2.6, MATH 3.0, SOCIAL_STUDIES 3.3, ELECTIVES 3.8
FIRST_YEAR 3.4, SECOND_YEAR 3.4, THIRD_YEAR 3.7, FOURTH_YEAR 3.2, LITERATURE 3.8, SCIENCE 2.3, MATH 3.2, SOCIAL_STUDIES 3.2, ELECTIVES 2.9
FIRST_YEAR 2.8, SECOND_YEAR 2.7, THIRD_YEAR 2.3, FOURTH_YEAR 2.9, LITERATURE 3.6, SCIENCE 2.8, MATH 3.4, SOCIAL_STUDIES 3.4, ELECTIVES 2.1
FIRST_YEAR 3.4, SECOND_YEAR 2.9, THIRD_YEAR 3.0, FOURTH_YEAR 3.1, LITERATURE 3.3, SCIENCE 2.8, MATH 3.4, SOCIAL_STUDIES 3.7, ELECTIVES 3.6
---UNSUCCESSFUL APPS---
FIRST_YEAR 2.6, SECOND_YEAR 3.8, THIRD_YEAR 2.5, FOURTH_YEAR 2.5, LITERATURE 2.9, SCIENCE 2.2, MATH 3.2, SOCIAL_STUDIES 3.2, ELECTIVES 2.7
FIRST_YEAR 3.3, SECOND_YEAR 3.0, THIRD_YEAR 2.5, FOURTH_YEAR 2.7, LITERATURE 2.2, SCIENCE 2.9, MATH 3.4, SOCIAL_STUDIES 2.3, ELECTIVES 2.5
FIRST_YEAR 2.2, SECOND_YEAR 2.8, THIRD_YEAR 3.1, FOURTH_YEAR 3.2, LITERATURE 2.8, SCIENCE 3.9, MATH 2.4, SOCIAL_STUDIES 2.1, ELECTIVES 2.2
FIRST_YEAR 3.0, SECOND_YEAR 2.7, THIRD_YEAR 2.6, FOURTH_YEAR 2.8, LITERATURE 2.0, SCIENCE 3.2, MATH 2.7, SOCIAL_STUDIES 2.1, ELECTIVES 2.5
FIRST_YEAR 3.9, SECOND_YEAR 3.0, THIRD_YEAR 3.1, FOURTH_YEAR 2.9, LITERATURE 3.0, SCIENCE 2.3, MATH 3.3, SOCIAL_STUDIES 3.4, ELECTIVES 2.9
-----------------------
Acceptance rate = 0.484

Your function should take a list of applications, which you can assume was generated by create_apps(). It should:

  1. Inside the function, create a list called recommended of apps that were recommended by analyze_applicant_1(). Create another list called not_recommended of apps that were NOT recommended by analyze_applicant_1().
    • HINT: Loop through the elements of apps. Use an if-statement to check whether analyze_applicant_1() recommends each app. If it does, append() the app to the recommended list. Otherwise, append() it to the not_recommended list.
  2. Print ---SUCCESSFUL APPS---
  3. Print the first 5 recommended apps to see a sample of recommendations.
    • HINT: If you’ve constructed your lists correctly, all you need to do is call print_apps(recommended, 5).
  4. Print ---UNSUCCESSFUL APPS---
  5. Print the first 5 not-recommended apps to see a sample of apps that weren’t recommended. Same hint as Step 3.
  6. Print ---------------------
  7. Compute the recommendation rate. This is the fraction of applications that were recommended (the example output above labels it “Acceptance rate”).
    • HINT: If you constructed your lists correctly, the rate is len(recommended)/len(apps).
  8. Print the acceptance rate.
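The steps above follow a common "split a list in two, then compute a rate" pattern. Here is a minimal sketch of that pattern on a made-up list of plain numbers rather than application dictionaries; the names gpas, high, and low are illustrative, and the threshold of 3.0 is just a stand-in for calling a real filtering function.

```python
# Made-up data for illustration: plain numbers, not application dictionaries.
gpas = [3.4, 2.1, 3.9, 2.8, 3.0, 3.6, 2.5, 3.2]

high = []  # items that pass the check
low = []   # items that do not pass

for gpa in gpas:
    if gpa > 3.0:        # stand-in for a real filtering function
        high.append(gpa)
    else:
        low.append(gpa)

print("---HIGH---")
print(high[:2])          # slicing shows only the first 2 items
print("---LOW---")
print(low[:2])

# Every item lands in exactly one list, so this is a fraction between 0 and 1.
rate = len(high) / len(gpas)
print("Rate =", rate)    # Rate = 0.5
```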

Once you’ve implemented this function, run it like this:

apps = create_apps(100000) # 100000 apps
analyze_many_apps(apps)

Place both your function definition and an example of calling the function in the ACTIVITY 8 area of lab.py.

This activity is autograded! Submit on Gradescope and make sure you are passing the Activity 8 tests before moving on.

Activity 9

Now we can use the function you just wrote to help us form an initial judgment about whether analyze_applicant_1() is a good method for filtering applicants.

Record the approximate acceptance rate from your analysis. Then, write down whether you think analyze_applicant_1() is a fair way to filter applicants, and explain why. Do this in the ANALYSIS COMPARISON area of lab.py.

You may want to run analyze_many_apps() several times in order to get a feel for things when composing your answer to this question.

Analysis and Reflection

Now that we are able to analyze a single algorithm for filtering applicants, we can use it to analyze the other analyze_applicant() functions.

Activity 10

Modify your code from Activity 8 so that instead of analyze_applicant_1(), it uses analyze_applicant_2(), analyze_applicant_3(), analyze_applicant_4(), our_analyze_applicant(), and their_analyze_applicant(). Repeat Activity 9 for each of these different analysis methods. For each of these, you should record the acceptance rate, whether or not you think the filter method is fair, and why. Make sure you’ve written at least one full sentence about why each algorithm is fair or unfair.
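Editing the function name by hand for each run works fine. As an optional alternative, Python lets you pass a function as an argument, so one piece of code can try several rules. The sketch below assumes two made-up rules (rule_a and rule_b) and a hypothetical helper recommendation_rate(); none of these names come from the lab files, and the toy rules are far simpler than what Activity 5 asks for.

```python
def rule_a(data):
    """Toy rule: recommend if the math GPA is above 3.0."""
    return data["MATH"] > 3.0

def rule_b(data):
    """Toy rule: recommend if the science GPA is above 3.0."""
    return data["SCIENCE"] > 3.0

def recommendation_rate(apps, rule):
    """Return the fraction of apps for which rule(app) is True."""
    recommended = 0
    for app in apps:
        if rule(app):    # rule is whichever function was passed in
            recommended += 1
    return recommended / len(apps)

# Made-up applications with only two keys, for illustration.
toy_apps = [
    {"MATH": 3.5, "SCIENCE": 2.9},
    {"MATH": 2.8, "SCIENCE": 3.1},
    {"MATH": 3.2, "SCIENCE": 3.4},
    {"MATH": 2.0, "SCIENCE": 3.3},
]

# Note: the function names appear without parentheses, because we are
# passing the functions themselves, not calling them here.
for rule in [rule_a, rule_b]:
    print(rule.__name__, recommendation_rate(toy_apps, rule))
```

If you try this approach, keep a version of analyze_many_apps() that still matches what the Activity 8 tests expect.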

Activity 11

We’ve done a lot of analysis now. Which of these algorithms do you think is the best? Keep in mind that we care both about who gets recommended by the algorithm and about how many applicants it recommends. Remember, recommending more applicants dramatically increases the human labor required for PU to read applications manually. Please write at least 3 full sentences.

Submission

One member of your group should submit the file lab.py in the Lab 7 assignment on Gradescope. Make sure to add your partner to the submission! You don’t need to submit the applications.py file or any of the .txt files.

Partner Feedback

Fill out the partner feedback form by the lab due date for participation points!

  1. I should note that Middlebury Admissions states that they read every single application, and I have no reason to doubt this. I use the example of college admissions only because it’s one that’s perhaps fresh in many of your memories. However, this is a very real scenario in job applications, and it is likely to have an impact on your own hiring someday. 

  2. Of course, the fourth year data is partial. 

  3. The spacing here is purely a stylistic choice to make it more readable. If you print this dictionary in Python, everything would be on one line!