Homework 4: Statistics and Testing

In this homework assignment, you’ll get additional practice working with lists and loops to compute statistics from a dataset of Middlebury temperatures.[1] Additionally, you will practice writing tests for your code. Below is a suggested timeline for completing the assignment, which leaves plenty of time before the due date for debugging if necessary:

Date    Tasks to Complete
10/21   Read through the assignment and complete problem 1, passing problem 1 tests on the autograder
10/22   Complete problems 2 and 3, passing problem 2/3 tests on the autograder
10/23   Complete problem 4, passing problem 4 tests on the autograder
10/24   Complete problem 5, passing problem 5 tests on the autograder
10/25   Complete problem 6, passing problem 6 tests on the autograder
10/27   DEADLINE! Finish debugging and submit your final code. Attend drop-in hours if you’ve had problems with any of the previous parts. Make sure to review your code for code quality.

These are soft deadlines that are not part of your grade, but I encourage you to stick to this timeline if you’ve struggled to complete the homework assignments on time. Being ahead of this timeline is great!

Downloading starter files

You’ll need to download three files for this assignment and place them in the same directory.

  • hw4_code.py: This is where you will write most of the code for this assignment. You need to edit this for parts 2-6.
  • hw4_test.py: This is where you will write tests for the functions that you will implement. You only need to edit this for part 1.
  • data.py: This includes data that you will use when you call your temperature_summary function. You do not need to edit this file!

Submitting

I strongly recommend that you submit to Gradescope at the end of each part to get feedback. You should submit hw4_code.py and hw4_test.py to the autograder. Do not submit data.py.

Part 1: Writing Tests

The first step for this homework is to read through all of the functions and make sure that you understand them well enough to write test cases. You are required to write test cases in hw4_test.py for the four functions that make up parts 2-5 of this assignment. You will use pytest; if you don’t have pytest on your computer, see the guidelines for how to download it in lab 5. Your tests should pass when the function is implemented correctly, and should generally fail if there are logic errors in the function.[2] Your test cases must be distinct from the examples given in this assignment. Finally, you must follow these guidelines in your test cases:

  • test_compute_mean: the input list should have a length of at least 2
  • test_compute_median_even: the input list should have a length of at least 4, and it should have an even number of items in it
  • test_compute_median_odd: the input list should have a length of at least 3, and it should have an odd number of items in it
  • test_matched_min: the input lists should have a length of at least 2
  • test_count_below: the input list should have a length of at least 2, and there should be at least one item in the list below value and at least one item in the list above value (i.e., count_below should return neither 0 nor len(numbers))
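
For reference, a pytest test is an ordinary function whose name starts with test_ and that uses assert to compare the actual result with the expected one. The sketch below uses a hypothetical function, compute_range, that is not part of this assignment. It is defined inline only so that the example runs on its own; your tests in hw4_test.py would instead call the functions from hw4_code.py.

```python
# compute_range is a hypothetical function used only for illustration;
# it is not one of the functions in this assignment.
def compute_range(numbers):
    """Return the difference between the largest and smallest number as a float."""
    return float(max(numbers) - min(numbers))


def test_compute_range():
    # Inputs were chosen so the expected value is easy to work out by hand:
    # the largest value is 9 and the smallest is -2, so the range is 11.0.
    assert compute_range([4, -2, 9]) == 11.0


# pytest finds and runs test_ functions automatically; calling the test
# directly, as here, also works and raises AssertionError on failure.
test_compute_range()
```

Assuming pytest is installed as described in lab 5, running pytest hw4_test.py from a terminal in the directory that contains the file is the usual way to run every test in it.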

Your test cases will all fail because your code isn’t yet implemented. However, you should submit to the autograder to get feedback on your test cases before moving on!

Unfortunately, pytest does not work with the Thonny debugger. If you want to use the debugger when you are developing your code for parts 2-6, you must call your function in hw4_code.py.

Part 2: compute_mean

Write a function compute_mean in hw4_code.py that computes the mean of the numbers in a list. Your function must work with lists that contain at least one number. Your function should always return a float.

Here are some example inputs and outputs:

Function Call                  Returns  Prints
compute_mean([1])              1.0
compute_mean([-5, -3, 8, 3])   0.75

If you need some guidance when implementing this function, look over the examples we have gone through in class. Remember that the mean of a list of numbers is simply the sum of that list divided by the length of the list.

You may not import any functions that compute the mean of a list for this problem. You may use the built-in sum() function. The following code block shows the syntax for the sum() function:

numbers = [1, 2, 3]
numbers_sum = sum(numbers)
print(numbers_sum) # prints 6

If you do not use the sum() function, I recommend using a for loop to solve this problem.
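
If you go the for loop route, the usual pattern is an accumulator variable that starts at 0 and grows by one item per iteration. The sketch below uses a hypothetical helper name, total_of, that is not part of the assignment. It also shows that the / operator produces a float even when both operands are ints.

```python
# total_of is a hypothetical name for illustration; it does the same job as sum().
def total_of(numbers):
    total = 0
    for number in numbers:
        total = total + number  # add each item to the running total
    return total


numbers = [1, 2, 3]
print(total_of(numbers))  # prints 6
print(6 / 3)              # prints 2.0, because / always returns a float
```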

Run your own test; if you pass that test, you should submit to the autograder to check that your function is working properly before moving on to the next part.

Part 3: compute_median

Write a function compute_median in hw4_code.py that computes the median of the numbers in a list. Your function must work with lists that contain at least one number. Your function should always return a float.

Here are some example inputs and outputs:

Function Call                    Returns  Prints
compute_median([1, 2, 3])        2.0
compute_median([-5, -3, 8, 3])   0.0

You may not import any functions that compute the median of a list for this problem. I recommend sorting the list using the sorted function to solve this problem, then using list indexing to compute the median. Remember that when a dataset has an even number of items in it, the median is the average of the two numbers in the middle. You should not need a loop to solve this problem.
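
As a hint for the indexing step, integer division (//) is one way to find the middle of a list. The sketch below uses made-up lists that are already sorted and only locates the middle positions; turning that into a median is left to you.

```python
odd_items = [10, 20, 30, 40, 50]   # already sorted, length 5
middle = len(odd_items) // 2       # 5 // 2 is 2
print(odd_items[middle])           # prints 30

even_items = [10, 20, 30, 40]      # already sorted, length 4
upper = len(even_items) // 2       # 4 // 2 is 2
lower = upper - 1                  # 1
print(even_items[lower], even_items[upper])  # prints 20 30
```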

The following code block shows the syntax for the sorted() function:

numbers = [3, 1, 2]
numbers_sorted = sorted(numbers)
print(numbers_sorted) # prints [1, 2, 3]

Do not use .sort() to complete this problem. We will discuss this more in future classes, but .sort() modifies the original list rather than creating a new one, which will have repercussions for part 6.
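
The difference between the two can be seen in a small experiment: sorted() builds a new list and leaves its argument alone, while .sort() reorders the existing list and returns None. The variable names here are made up for illustration.

```python
readings = [3, 1, 2]
ordered = sorted(readings)
print(ordered)    # prints [1, 2, 3]
print(readings)   # prints [3, 1, 2]  (the original is unchanged)

other = [3, 1, 2]
result = other.sort()
print(result)     # prints None  (.sort() does not return the sorted list)
print(other)      # prints [1, 2, 3]  (the original list itself was reordered)
```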

Run your own tests; if you pass those tests, you should submit to the autograder to check that your function is working properly before moving on to the next part.

Part 4: matched_min

Write a function matched_min in hw4_code.py that takes two lists as input, and returns the value from list1 at the same index as the index of the lowest value from list2. If there are two or more minimum values, you must return the item from list1 at the index corresponding to the first such value.

Here are some example inputs and outputs:

Function Call                                                Returns    Prints
matched_min(["apple", "orange", "banana"], [3, 2, 4])        "orange"
matched_min(["pizza", "quesadilla", "fries"], [3, 10, 3])    "pizza"
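
One possible way to handle ties is to walk through the indices and update the best index only when a strictly smaller value appears, so the first minimum wins. This is a sketch with a hypothetical helper name, index_of_first_min, not a required design. It finds only the index and leaves the lookup in list1 to you.

```python
# index_of_first_min is a hypothetical helper used for illustration only.
def index_of_first_min(values):
    best_index = 0
    for i in range(1, len(values)):
        # Strict < means a later tie does not replace the earlier index.
        if values[i] < values[best_index]:
            best_index = i
    return best_index


print(index_of_first_min([3, 2, 4]))   # prints 1
print(index_of_first_min([3, 10, 3]))  # prints 0
```

The built-in expression values.index(min(values)) gives the same index, since .index() returns the position of the first match. The assignment does not say whether built-ins such as min() are allowed for this part, so check before relying on them.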

Run your own test; if you pass that test, you should submit to the autograder to check that your function is working properly before moving on to the next part.

Part 5: count_below

Write a function count_below in hw4_code.py that takes a list of numbers (numbers) and a numerical value (value) as input, and returns the count of items in numbers that are strictly less than value. Your function should return an int.

Here are some example inputs and outputs:

Function Call                 Returns  Prints
count_below([2, 1, 3], 2)     1
count_below([2, 1, 3], 2.5)   2
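
Counting items that satisfy a condition usually follows a counter pattern: start at 0 and add 1 inside an if. The sketch below applies that pattern to a different, hypothetical problem (count_long_words is not part of the assignment) so that the shape of the loop is visible.

```python
# count_long_words is a hypothetical function used for illustration only.
def count_long_words(words, min_length):
    count = 0
    for word in words:
        if len(word) >= min_length:  # only matching items increase the count
            count = count + 1
    return count


print(count_long_words(["fig", "banana", "kiwi"], 4))  # prints 2
```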

Run your own test; if you pass that test, you should submit to the autograder to check that your function is working properly before moving on to the next part.

Part 6: temperature_summary

Write a function temperature_summary in hw4_code.py that calls the four functions that you wrote for parts 2-5 to provide a nice summary of the temperatures in Middlebury in 2022. It should take three lists as input: year_temps, months, and month_avg_temps. Your summary should look like this:

In Middlebury, the mean temperature was 46.667123287671224 degrees F and the median temperature was 49.1 degrees F. There were 82 days when the temperature was below freezing (32 degrees F). The coldest month was January.

There are three variables that you will import from data.py to compute your summary (the import line is in the file that you downloaded):

Variable         Explanation                                                                                                 Use with functions
year_temps       List of temperatures at 12AM in Middlebury from each day in 2022                                            compute_mean, compute_median, count_below
months           List of months in the year as strings                                                                       matched_min
month_avg_temps  Mean of temperatures for each month (index 0 is January, index 1 is February, etc. - lines up with months)  matched_min

Your summary should be printed and your function should return None. Test your function by calling it in hw4_code.py and checking that it matches the summary above. The autograder will test your function on data from a different year. Submit to the autograder to check that your function is working properly.[3]
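
A function that prints a report and returns None can be written with an f-string and no return statement. The sketch below uses a hypothetical function and made-up values (print_report, the town name, and the numbers are not from the assignment) to show the pattern.

```python
# print_report and its arguments are hypothetical, for illustration only.
def print_report(town, mean_temp, cold_days):
    # An f-string substitutes each {expression} with its value.
    print(f"In {town}, the mean temperature was {mean_temp} degrees F. "
          f"There were {cold_days} days below freezing.")
    # No return statement, so the function returns None.


result = print_report("Exampleville", 51.5, 12)
print(result)  # prints None
```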

Finishing up

That’s it! Make sure that your code follows the code quality guidelines for full credit.

  1. This data comes from here. Your data.py uses weather data from 2022, while the autograder uses weather data from 2020. 

  2. As we have discussed, it’s sometimes possible for your function to return the correct value with some inputs, but not all inputs. For example, a median function that always returns the item at index 0 would be correct for the list [0.0], but not for other lists. 

  3. The autograder will use a different dataset than you use, but as long as your functions are all implemented correctly, you should get full credit!