This lab is adapted from Phil Chodrow, who adapted it from an assignment by Peter-Michael Osera, Samuel Rebelsky, and Nicole Eikmeier at Grinnell College.

Reminder: you and your partner are a team! You should not move forward to one activity until you are both comfortable with the previous activity.

Introduction and Setup

In this lab activity, we’ll build on our skills with strings and functions, and combine them with new skills related to loops and lists. Along the way, we’ll introduce the map-filter-reduce paradigm for handling sets of data.

Activity 0

Download two two files and open them in THonny.

  • lab.py is where you’ll write your code. Follow this link to download the file.
  • data.py contains the data that you’ll use in the final part of this assignment. Follow this link to download the file.

Please save both of these files in the same folder!

The data variable that we imported with this code is a list. In the ACTIVITY 0 area, write code that:

  • Prints the first 10 elements of the data variable.
  • Prints the length of the data variable, optionally with a nice message.

International Standard Book Numbers

Our task this week is to determine the validity of an International Standard Book Number (ISBN). An ISBN is a unique number that identifies a book. You may have seen these numbers inside or on the back of many of the books you read. ISBNs often come with barcodes that make it easy for librarians to scan them.

An example of an ISBN and barcode on the back of a book. This ISBN has 13 digits. To keep things simple today, we are only going to work with the ISBN-10 format, which has only 10 digits. ISBN-10 was the standard until recently, when it was determined that we needed more numbers for more books!

Anatomy of an ISBN

A 10-digit ISBN number has two parts:

  • 9 identifier digits, which are all digits from 0 to 9.
  • 1 check digit. The purpose of the check digit is to confirm the accuracy of the identifier digits. If the check digit does not match the identifier digits, then this may identify a typo in the 9 identifier digits. The check digit can be either a digit between 0 and 9 or the letter X.

For example, consider the ISBN 1438946685. In this ISBN, the first 9 digits 143894668 contain information about the book, while the last digit 5 is the check digit. Another possible ISBN is 486910583X, since the last digit is allowed to be an X.

Activity 1

Write a function called well_formed() to check whether an input string s has the structure of a well-formed ISBN. well_formed(s) should return True if and only if all of the following are true:

  • s is a string with 10 characters.
  • The first 9 characters of s are all digits between 0 and 9.
  • The last character of s is EITHER a digit between 0 and 9 or the character X.

If any of these conditions are not met, well_formed(s) should return False.

For example:

well_formed("1438946686") # True
well_formed("486910583X") # True
well_formed("1438945")    # False (wrong number of digits)
well_formed("1434X88945") # False (only last digit can be X)
well_formed("143418894H") # False (last digit must be integer or X)

Hint: You can check of a character char is a digit by checking whether char in "0123456789". Similarly, you can check whether char is a valid final digit of an ISBN by checking char in "0123456789X".

The Check Digit

The final digit of the ISBN is the check digit. The check digit is calculated according to a specific algorithm, based on the first 9 digits. Its purpose is to guard against possible typos when recording ISBN numbers. If the check digit doesn’t match the first 9 numbers, then it’s very likely that someone made a typo when recording the ISBN. In this case, we say that the ISBN is corrupted. It is likely necessary to check and correct the ISBN.

Here’s how check digits are calculated. Suppose that the first 9 digits of an ISBN are 143894668.

  1. In left-to-right order, multiply the first digit by 10, the second digit by 9, the third digit by 8, and so on. The rightmost-digit is multiplied by 2. We then add up those terms, obtaining a sum that I’ll call s. In our example, we would do:
     s = 10*1 + 9*4 + 8*3 + 7*8 + 6*9 + 5*4 + 4*6 + 3*6 + 2*8 = 258
    
  2. Next, we take this sum (call it s) and compute s modulo 11. Recall that modulo computes the remainder of the division of two numbers. You can compute s modulo 11 using the syntax s % 11. In our example, 258 % 11 = 5
  3. We then subtract this result from 11 to obtain a result d:
     d = 11 - 5 = 6
    
  4. Almost there!
    • If d == 11, then the check_digit is 0.
    • If d == 10, then the check_digit is X`.
    • Otherwise, the check_digit is d.

In our case, d = 6, and so check_digit has value 6. So, the full ISBN would be 1438946686, with 6 being the check_digit.

Activity 2

Write a function called check_digit() whose input is a string containing the first 9 digits of an ISBN. Its output should be the check digit, computed according to the algorithm above. The output should always be a string.

For example:

check_digit("143894668") # returns "6"
check_digit("563872373") # returns "0"
check_digit("203740588") # returns "X"

Check your function in these three cases, and include the checks in your code.

Hints:

  • Convert an integer to a string like this: str(6).
  • Given a string s = "143894668", the integer value of the ith digit of s is int(s[i]).
  • To do the sum in Step 1, you can use the following pattern:
      total = 0
      for i in range(9):
          total = total + int(first_9_digits[i]) * (Y-i)
    
  • Here, Y is an integer that you will need think about and fill in.

Suppose now that we’re given a string like "203740588X". We’d like to figure out whether this string is a correct ISBN, a corrupted ISBN (check digit doesn’t match), or not an ISBN at all.

Activity 3

Write a function called classify_single_input(). This function should accept a single string as input. Its output is one of three strings: "correct", "corrupted", or "not_ISBN".

An input string s is "not_ISBN" if well_formed(s) is False. If well_formed(s) is True, AND if s[9] agrees with check_digit(s[:9]), then s is "correct". Otherwise, s is "corrupted". Check your function on the following test cases:

classify_single_input("1438946686") # "correct"
classify_single_input("2037405886") # "corrupted"
classify_single_input("ISB NUMBER") # "not_ISBN"

Include these checks in your code.

Processing Batches of Data

One of the main powers of computation is to process batches of data. In this part of the lab activity, we’ll practice using the map-filter-reduce paradigm to process data. Put simply:

  • Map operations take lists of data and produce lists of the same length containing new information.
  • Filter operations take lists of data and produce smaller lists, based on some criterion.
  • Reduce operations summarize lists of data into a single value.

Suppose now that you are librarian and you have received a large batch of ISBN numbers in a data file. However, you’re not sure which ones are valid! Some of the data are actually not ISBNs at all, while others are ISBNs that may be corrupted in some way. Here’s an example of your input data:

# This is just a regular list; I've written it vertically to make it easier to see the individual entries. 
# Python will process it normally!
[
    "1438946686",
    "5638723730",
    "203740588X",
    "2037405886",
    "hello",
    "563ZXY3730",
    "ISB NUMBER",
    .
    .
    .
]

Some of these are valid ISBNs, some are corrupted ISBNs with incorrect check digits, and others are not ISBNs at all.

Map Operations

A map operation consists in applying the same function to each entry of a list, resulting in a new list of the same length. Let’s write a map operation to classify each ISBN in a list.

Activity 4

Write a function called classify_many_inputs(). This function should accept a list of strings as input. Its output is a list of strings the same length. The entries of this list should state whether each entry of the input list is "correct", "corrupted", or "not_ISBN".

For example:

input_nums = ["1438946686", "2037405886", "ISB NUMBER"]
classification = classify_many_inputs(input_nums) 
print(classification)

>>> ["correct", "corrupted", "not_ISBN"]

You can test your function on the data variable that we imported at the top of lab.py. Here’s a sample function call and the expected output.

print(classify_many_inputs(data[:10]))

>>> ['corrupted', 'correct', 'not_ISBN', 'correct', 'corrupted', 
     'not_ISBN', 'correct', 'corrupted', 'correct', 'correct']

Perform this check and include it in your code.

Filter Operations

A filter operation takes a list as input and creates a new list by choosing some items from that list according to a criterion. The easiest way to do this is with an if statement. Filter operations almost always result in lists that are smaller than their original inputs.

Activity 5

Write two functions: filter_correct_ISBNs() and filter_corrupted_ISBNs(). Each of these functions takes the same input as classify_many_inputs(): a list of strings.

  • filter_correct_ISBNs() should return a list containing only correct ISBNs (i.e. inputs s such that classify_single_input(s) is "correct").
  • filter_corrupted_ISBNs() should return a list containing only corrupted ISBNs (i.e. inputs s such that classify_single_input(s) is "corrupted").

Here’s a check you can perform to test filter_correct_ISBNs:

print(filter_correct_ISBNs(["1438946686", "2037405886", "ISB NUMBER"]))

>>> ["1438946686"] # note that this is a list, not just a single string!

Here’s a check you can perform to test filter_corrupted_ISBNs:

print(filter_corrupted_ISBNs(["1438946686", "2037405886", "ISB NUMBER"]))

>>> ["2037405886"] # note that this is a list, not just a single string!

Reduce Operations

A reduce operation takes a list as input and produces a single value, such as a number or string. A simple example of a reduce operation is computing the length of a list.

Activity 6

Write a function called summarize_ISBNs(). This function should accept a list of strings as input. It should print a message describing the number of "correct", "corrupted", and "not_ISBN" entries in the input. Here’s the expected output for the data variable that we imported earlier.

summarize_ISBNs(data)
>>> There are 678 correct ISBNs
>>> There are 146 corrupted ISBNs
>>> There are 176 entries which are not valid ISBNs

Include this check in your code file.

Activity 7

Great work! One member from your team should rename the lab files submission.py and submit it on Gradescope, adding your partner’s name to the submission upload. You do not need to submit the data file.

Grading Note

This lab is auto-graded, but you are not given access to the tests to ensure that you get practice testing your own code. Your grade is based on the correctness of your well_formed(), check_digit(), classify_single_input(), classify_many_inputs(), filter_correct_ISBNs(), filter_corrupted_ISBNs(), and summarize_ISBNs() functions. More than 50% of the points are given if your code produces the expected output that is specified in the examples in this lab assignment; the rest of the points will be given based on un-seen test cases, which your code should pass if implemented correctly.

There are a number of visible test cases that are worth 0 points. Please make sure that you are passing all of these tests! They check that your output is the correct type. If your output is not the correct type, the other tests will not run, and your score will therefore be very low. Among other things, these tests make sure that you are using return when you should be and print when you should be.

Activity 8

For participation points, please fill out this survey about your experience working with lab partners in this class by October 3rd at 11:59PM (the same due date as this lab). This will help me to match you with your next partner.