Assignment 5: Basketball Hashing (60 pts)

Chris Tralie

Overview / Logistics / OOP Design

Learning Objectives

  • Use object references to perform efficient lookups yielding constant time operations in a collection
  • Leverage polymorphism to quickly swap out different implementations of abstract data types (LinkedMap and HashMap)
  • Create tests to verify the correctness of a code implementation and to empirically explore its efficiency.

Purpose / Getting Started

The purpose of this assignment is to give you practice designing collections in C++, and to explore how the right implementation of an abstract data type can lead to much faster queries in an unordered collection. In particular, you will create a hash table implementation for the map data structure, and you will compare this to the linked map data structure experimentally to verify that it is superior.

Click here to review notes on hash tables. Then, you can obtain the code for this assignment by typing

git clone https://github.com/ursinus-cs174-f2022/HW5_BasketballHashing.git

You will be editing the files Makefile, person.cpp, player.cpp playerlookup.cpp, hashable.cpp

You will be creating the files hashmap.h, hashmap.cpp, stringtest.cpp, and mapcheck.cpp

Overview of Code Design

In this assignment, you will implement your own HashMap (hashmap.cpp, hashmap.h) data structure and compare it to a LinkedMap (linkedmap.cpp, linkedmap.h) that has been provided for you. The follow design choices have been made for you:

  • Both LinkedMap and HashMap will inherit from an abstract class known as Map (map.h), which has a number of methods you will need to implement for your hash map.
  • The keys of a map are specified as pointers to Hashable (hashable.h, hashable.cpp) objects, which implement a method getHash() to compute a hash code on that object
  • The values of this map are specified as pointers to Cloneable (cloneable.h) objects, which are objects for which a clone() method is provided to make deep copies.
  • Since Hashable objects are also Cloneable, deep copies can be made of both the key and the value, and runtime polymorphism is leveraged so that calling the clone() method on a Cloneable* or Hashable* pointer makes a clone of the correct type. It is up to the implemented map classes to clean up dynamic memory from cloned objects.

Overview of Data

To test LinkedMap and HashMap, you will load in information about NBA players born between 1913-1997 (data obtained at this link), including

  • Name
  • The college/school they attended
  • Their height in centimeters
  • Their weight in kilograms
  • Their birth year
Each player is stored as these successive lines one after the other in the file players.txt. For example, the first 10 lines of the file store information about Curly Armstrong and Cliff Baker as follows:

To manage all of this, you have been provided with a class Player (player.cpp, player.h) that encapsulates all this information in one place. You will need to load in the players in from a file and put them into a LinkedMap. Then, once you have implemented your HashMap structure, you should also load the players into a HashMap and compare the efficiency of get() queries between the two data structures. The tasks below break this process down further.


Programming Tasks

Below are the tasks you should work through in sequence. A suggested timeline is as follows:

  • By Wednesday 4/6, finish up player loading and database queries
  • By Wednesday 4/13, finish up and debug your hash table implementation
  • By Friday 4/15, complete all tasks (the final deadline)

Player Loading (8 Points)

Your first task should be to load in all of the players in the text file into a map by filling in the loadPlayers(Map* m) method in the player.cpp file. To warm up for this, familiarize yourself with the Map abstract class and the LinkedMap class. Then, look at Person.cpp for an example of how to use the map. In particular, it accepts object references for both the key and value.

Hint: You should have a variable that keeps track of what line you're on as you're loading players.txt so that you know what field you're loading. Once you've loaded all 5 pieces of information, you can create a new object and add it to the map, and then move onto the next one.

Database Queries (7 Points)

Once you have loaded in the players, you should fill in the file playerlookup.cpp to lookup a player in the map specified on the command line. For instance,

Should print out and running Should print out

If a player is not in the database, you should output "not found"

HINT: The names of the players are stored as "Firstname Lastname", but you will need two command line arguments for the first name and last name. So you will have to use string concatenation to add them together with a space in order to match what's in the database.

String Hashing (5 pts)

We're now going to move towards making a hash map, but we first need to define how to hash objects of interest. I've provided an abstract class called Hashable which has the abstract method getHash() which objects extending this class will implement. Since we want to hash basketball players by their names, we'll define a hash function for a string:

Your Task: Complete the wrapper class around strings called HashableString by implementing the abstract method getHash() in hashable.cpp. You will be implementing the same hash function that Java uses for it's String type. In particular, here is pseudocode for the algorithm assuming that each character c has been cast to a size_t. Once you are finished this, you should create a file stringtest.cpp to test out the hash codes on a few examples. You should add an entry to your makefile to automatically compile this based on its dependencies. Then, you should use it to output and verify the following hash codes by initializing objects of type HashableString:

Hash Map Implementation (30 Pts)

You are now ready to create a hash table implementation of a map. Create a pair of files hashmap.h and hashmap.cpp, and add entries for them in your makefile. In these files, create a class HashMap that implements the abstract class Map. This should have the exact same public methods as LinkedMap, but it will have different private variables. The main private variable should be an array of HashNode* references, each representing the head of a linked list for a different bin. All references should start off as NULL.

Then, you should implement all of the methods below. You can borrow heavily from the code in linkedmap.cpp, but you will have to have an additional step of computing a hash to look up which bin's head you should start with. Below are the methods you need to implement. Be sure to develop incrementally!!. Be especially sure to compile every few lines of code you write to quickly flush out syntax errors.

  • (4 pts) void put(Hashable* key, Cloneable* value): Put an item into the hash map, using the key's hash function. To keep this simple, you do not need to check if the key already exists, so it is okay to end up with duplicates of the same key.
  • (4 pts) Cloneable* get(Hashable* key): Return a reference to the value associated to this key, or NULL if the key is not in the collection.
  • (4 pts) void remove(Hashable* key): Remove a key/value pair from the hash map, if the key exists in the table.
  • (4 pts) bool containsKey(Hashable* key): Return true if the key is in the hash map or false if it is not.
  • (4 pts) Hashable** getKeyList(size_t* N): Dynamically create an array to hold references to all of the keys in this map in an arbitrary order, and return the length of this array by reference.
  • (4 pts) Create a destructor that properly cleans up dynamic memory. Be sure you also clean up dynamic memory outside of the destructor when necessary. You will lose points if there are any memory leaks!

(6 pts) To put this all together and test it, change the person.cpp file to use a HashMap instead of a LinkedMap and make sure it still works. You should only have to change one line of code to make this swap! Then, do the same thing in playerlookup.cpp and verify that still works as well. This is probably where you will discover a lot of bugs in your code, so be patient!

HashMap / LinkedMap Comparison (10 pts)

You will now compare your HashMap implementation to the LinkedMap implementation both to check the former's correctness, and also to compare their efficiency. First, you should increment the numOps variable in each of your methods in HashMap every time a hash function is computed and every time a node is visited. This will allow you to keep track of how many operations are happening under different queries (refer to how this was done in LinkedMap if you are lost).

Next, you should create an executable file called mapcheck.cpp that performs the following steps:

  1. Create a map m1 of type LinkedMap and a map m2 of type HashMap, and fill them each in with the NBA players.
  2. Reset the operation counts of each map.
  3. Loop through all of the keys in m1 and make sure they're in m2 by calling the containsKey() method. If they're not, print out the ones that are missing to help you debug.
  4. Loop through all of the keys in m2 and make sure they're in m1 by calling the containsKey() method. If they're not, print out the ones that are missing to help you debug.
  5. Report the number of operations in steps 3 and 4, and the average number of operations per player.

Assuming everything is working correctly, if you use 100 bins, for example, you should see the following stats: Try to see what happens when you increase or decrease the number of bins. Does the average number of operations per player change in a way that makes sense?