Sunday, April 3, 2011

The importance of overriding Object's hashCode( ) method

During a recent conversation with some colleagues of mine, I figured out that some of them seem not to know what is the actual importance and utility of overriding the hashCode( ) method from the Object class.
Therefore, I wanted to dedicate a post for this topic.

1. Where is it used and how
The main purpose of the hashing algorithm is to improve performance of object retrieval from a hash collection such as a HashSet, HashMap and so on. Basically, such a collection can be seen as a series of buckets in which, objects will be stored. Each of these buckets is implicitly identified through the hashcode. Here's, for instance, how a HashSet can be seen : 
Fig. 1 - Empty HashSet

Every time an Object will be added to the HashSet, 2 operations will actually be done : 
  • The Object's HashCode ( ) method is executed in order to identify the bucket in which is will be stored
  • Since we are working with a Set, after having identify the appropriate bucket, the Object's equals(Object obj) method will be executed against each objects that are already stored in the same bucket to make sure a same object is not added twice to the Set.
Fig. 2 - Adding an Object to the HashSet

 2. hashCode( ) method's common issues

If the hashCode( ) method is not appropriately implemented or implemented at all : 
  • Several Objects that should be considered identical can be redundantly stored in the HashSet given they're not stored in the same bucket.
  • The contains(Object obj) method of the HashSet can return a wrong result because it was checking the wrong bucket
  • The remove(Object obj) method of the HashSet does not remove anything because, again, it was looking for the object to remove in the wrong bucket

3. Performance issues
Theoretically, an optimal hashCode( ) method would distribute the objects to store in a Hash collection among all the available buckets, so that the number of execution of the equals(Object obj) method is more or less the same regardless the bucket in which an object insertion/comparison/removal will be done

In order to check this by myself, I've written the following classes : 


package main;

public class Car {
 private String brand;
 private String model;
 
 public Car(String brand, String model){
  this.brand = brand;
  this.model = model;
 }
 
 public String getBrand() {
  return brand;
 }
 
 public void setBrand(String brand) {
  this.brand = brand;
 }
 
 public String getModel() {
  return model;
 }
 
 public void setModel(String model) {
  this.model = model;
 }

 @Override
 public boolean equals(Object obj) {  
  if (obj == null) {
   return false;
  }
  
  if (!(obj instanceof Car)) {
   return false;
  }
  
  Car other = (Car) obj;
  if (brand == null) {
   if (other.brand != null) {
    return false;
   }
  } else if (!brand.equalsIgnoreCase(other.brand)) {
   return false;
  }
  
  if (model == null) {
   if (other.model != null) {
    return false;
   }
  } else if (!model.equalsIgnoreCase(other.model)) {
   return false;
  }
  
  return true;
 }
 
 @Override
 public int hashCode() {
  final int prime = 31;
  int result = 1;
  result = prime * result + ((brand == null) ? 0 : brand.hashCode());
  result = prime * result + ((model == null) ? 0 : model.hashCode());
  return result;
 }
}

package main;

import java.util.HashSet;
import java.util.Set;

public class HashMain {
 private static final int CATALOG_SIZE = 10000;
 
 public static void main(String[] args) {
  Set<Car> carCatalog = new HashSet<Car>();
  
  populateCarCatalog(carCatalog);
  emptyCarCatalog(carCatalog);
 }
 
 private static void populateCarCatalog(Set<Car> carCatalog){
  for(int i=1; i <= CATALOG_SIZE; i++){
   Car car = new Car("Brand ".concat(Integer.toString(i)), "Model ".concat(Integer.toString(i)));
   carCatalog.add(car);
  }
 }
 
 private static void emptyCarCatalog(Set<Car> carCatalog){
  long startTime = System.currentTimeMillis();
  
  for(int i=1; i <= CATALOG_SIZE; i++){
   Car car = new Car("Brand ".concat(Integer.toString(i)), "Model ".concat(Integer.toString(i)));
   carCatalog.remove(car);
  }
  
  System.out.println("Catalog emptied in : " + ((System.currentTimeMillis() - startTime) / 1000.0));
 }
} 

I had the Eclipse IDE generate the equals(Object obj) and hashCode( ) methods and the execution took around 0.01 second.

After making the hashCode( ) method constantly return the value 1, the execution took 2.681 sec.

The reason is that at the first removal, we had to make 10000 comparisons because all the Car objects were stored in the same bucket. At the second removal, 9999 comparisons. At the third removal, 9998. And so on ...

4. Conclusion
A good practice would be to always implement the equals(Object obj) and hashCode( ) methods when defining a POJO class, no matter it will be stored in a hash collection or not.  

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.