Sunday, November 24, 2013

Object equality, hashCode() vs equals() in Java

In this post, we'll see what makes two objects equal and when to override the equals() and hashCode() methods. We'll also look at the difference between reference equality and object equality.

Let's first see what the Object class methods equals() and hashCode() say.
equals()
hashCode()

Overriding equals()
This is what the Object class's equals() method looks like.
public boolean equals(Object obj) {
        return (this == obj);
}
It simply uses the == operator to compare.
Comparing two object references using the == operator evaluates to true only when both references refer to the same object, because == simply looks at the bits in the variables, and they're either identical or they're not.

Let's see an example to understand this. 

public class ReferenceDemo {
    public static void main(String[] javalatte) {
        // Two separate objects that happen to hold the same name.
        Dog d1 = new Dog("DogA");
        Dog d2 = new Dog("DogA");
        if (d1.equals(d2)) {
            System.out.println("Dog's are equal");
        } else {
            System.out.println("Dog's are not equal");
        }

        // Two references that point to one and the same object.
        Dog d = new Dog("Tommy");
        Dog tommy1 = d;
        Dog tommy2 = d;
        if (tommy1.equals(tommy2)) {
            System.out.println("Dog's are equal");
        } else {
            System.out.println("Dog's are not equal");
        }
    }
}

// Dog does not override equals(), so it inherits the == comparison from Object.
class Dog {
    private String name;

    Dog(String title) {
        this.name = title;
    }

    public String getTitle() {
        return name;
    }
}
Sample Output
Dog's are not equal
Dog's are equal
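Now it is clear why we got this output: the inherited equals() method compares the bits in the two reference variables, and they're either identical or they're not. d1 and d2 refer to different objects, while tommy1 and tommy2 refer to the same one.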

As you can see, the String class and the wrapper classes override the equals() method (inherited from class Object), so that you can compare two different objects (of the same type) to see if their contents are meaningfully equivalent. If two different String instances both hold the String value "javalatte", as far as you're concerned they are equal. The fact that the value "javalatte" lives in two separate objects doesn't matter.

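The String class's overridden equals() method (as found in older JDK source; newer releases implement it differently, but the result is the same)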
public boolean equals(Object anObject) {
     if (this == anObject) {
         return true;
     }
     if (anObject instanceof String) {
         String anotherString = (String)anObject;
         int n = count;
         if (n == anotherString.count) {
             char v1[] = value;
             char v2[] = anotherString.value;
             int i = offset;
             int j = anotherString.offset;
             while (n-- != 0) {
                 if (v1[i++] != v2[j++])
                     return false;
             }
             return true;
         }
     }
     return false;
 }
To make this clearer, let's look at another example and its output.
public class StringReferenceDemo {
    public static void main(String[] java) {
        String s1 = new String("javalatte");
        String s2 = new String("javalatte");
        if (s1.equals(s2)) {
            System.out.println("String's are equal");
        } else {
            System.out.println("String's are not equal");
        }
    }
}
Sample Output
String's are equal
As you can see, the two "javalatte" strings live in separate objects, but they are equal because the String class overrides the equals() method.

When you really need to know if two references are identical, use ==. But when you need to know if the objects themselves (not the references) are equal, use the equals() method. For each class you write, you must decide if it makes sense to consider two different instances equal. For some classes, you might decide that two objects can never be equal. For example, imagine a class Car that has instance variables for things like make, model, year, and configuration. You certainly don't want your car suddenly to be treated as the very same car as someone else's car that has identical attributes.
Your car is your car and you don't want your neighbor Pardeep driving off in it just because, "hey, it's really the same car; the equals() method said so". So no two cars should ever be considered exactly equal.
If two references refer to one car, then you know that both are talking about one car, not two cars that have the same attributes. So in the case of a Car you might not ever need, or want, to override the equals() method.

What It Means If You Don't Override equals()
If you don't override a class's equals() method, you won't be able to use those objects as a key in a hashtable and you probably won't get accurate Sets, such that there are no conceptual duplicates.
The equals() method in class Object uses only the == operator for comparisons, so unless you override equals(), two objects are considered equal only if the two references refer to the same object.
As you have seen in the ReferenceDemo class, where
Dog d1 = new Dog("DogA");
Dog d2 = new Dog("DogA");
d1 and d2 are not equal.

Let's look at what it means to not be able to use an object as a hashtable key.
Imagine you have a Dog, a very specific dog that you want to put in a HashMap, so that you can search on a particular Dog and retrieve the corresponding Person object that represents the owner. So you add the Dog instance as the key to the HashMap. But now what happens when you want to do a search? You want to say to the HashMap collection, "Here's the Dog, now give me the Person object that goes with this Dog." But now you're in trouble unless you still have a reference to the exact object you used as the key when you added it to the collection. In other words, you can't make an identical Dog object and use it for the search.
The bottom line is this: if you want objects of your class to be used as keys for a hashtable (or as elements in any data structure that uses equivalency for searching for and/or retrieving an object), then you must override equals() so that two different instances can be considered the same.
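The lookup problem described above can be sketched roughly as follows. This is only an illustration: the Person class and the owner name are made up for the example, and Dog has the same shape as in ReferenceDemo, with neither equals() nor hashCode() overridden.

```java
import java.util.HashMap;
import java.util.Map;

// Same shape as the Dog in ReferenceDemo: equals() and hashCode() are NOT overridden.
class Dog {
    private final String name;
    Dog(String name) { this.name = name; }
    String getName() { return name; }
}

// Hypothetical owner class, introduced only for this sketch.
class Person {
    private final String name;
    Person(String name) { this.name = name; }
    String getName() { return name; }
}

class DogOwnerLookupDemo {
    // Returns the owner stored under the given key, or null if no key matches.
    static Person lookup(Map<Dog, Person> owners, Dog key) {
        return owners.get(key);
    }

    public static void main(String[] args) {
        Map<Dog, Person> owners = new HashMap<>();
        Dog original = new Dog("Tommy");
        owners.put(original, new Person("Pardeep"));

        // Found: this is the very same object that was used as the key.
        System.out.println(lookup(owners, original) != null); // true

        // Not found: a different Dog instance with the same name is never equal
        // to the key, because Object.equals() compares references only.
        System.out.println(lookup(owners, new Dog("Tommy")) != null); // false
    }
}
```

The second lookup fails even though both dogs are named "Tommy", which is exactly the "you need the exact object" trouble described above.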

Understanding Hashcodes
Imagine a set of buckets lined up on the floor. Someone hands you a piece of paper with a name on it. You take the name and calculate an integer code from it by using A is 1, B is 2, and so on, and adding the numeric values of all the letters in the name together. A given name will always result in the same code. We don't introduce anything random; we simply have an algorithm that will always run the same way given a specific input, so the output will always be identical for any two identical inputs.

So far so good? Now the way you use that code is to determine which bucket to place the piece of paper into. Now imagine that someone comes up and shows you a name and says, "Please retrieve the piece of paper that matches this name." So you look at the name they show you, and run the same hashcode-generating algorithm. The hashcode tells you in which bucket you should look to find the name.
You might have noticed a little flaw in our system, though. Two different names might result in the same value. For example, the names Amy and May have the same letters, so the hashcode will be identical for both names. That's acceptable, but it does mean that when someone asks you for the Amy piece of paper, you'll still have to search through the target bucket reading each name until you find Amy rather than May. The hashcode tells you only which bucket to go into, but not how to locate the name once you're in that bucket.
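Here is a small sketch of the letter-adding scheme from the analogy. The class and method names (NameHash, letterSum) are invented for this illustration; this is not how String.hashCode() works, just the toy algorithm described above.

```java
import java.util.Locale;

class NameHash {
    // Toy hash from the bucket analogy: A is 1, B is 2, ... Z is 26,
    // summed over all letters of the name. Other characters are ignored.
    static int letterSum(String name) {
        int sum = 0;
        for (char c : name.toUpperCase(Locale.ROOT).toCharArray()) {
            if (c >= 'A' && c <= 'Z') {
                sum += c - 'A' + 1;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(letterSum("Amy"));    // 1 + 13 + 25 = 39
        System.out.println(letterSum("May"));    // 13 + 1 + 25 = 39, the same bucket
        System.out.println("Amy".equals("May")); // false: same code, different names
    }
}
```

Both names land in bucket 39, so the code alone cannot tell them apart; a comparison of the names themselves is still needed inside the bucket.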

How some of the collections use hashcodes
The above distributed-across-the-buckets example is similar to the way hashcodes are used in collections. When you put an object in a collection that uses hashcodes, the collection uses the hashcode of the object to decide in which bucket/slot the object should land. Then when you want to fetch that object, you have to give the collection a reference to an object, which the collection compares to the objects it holds. As long as the object you're searching with has the same hashcode as the stored object, the collection looks in the right bucket, where equals() can then find the match.

But…and this is a Big One, imagine what would happen if, going back to our name example, you showed the bucket-worker a name and they calculated the code based on only half the letters in the name instead of all of them. They'd never find the name in the bucket because they wouldn't be looking in the correct bucket!
Now can you see why, if two objects are considered equal, their hashcodes must also be equal? Otherwise, you'd never be able to find the object, since the default hashCode() method in class Object virtually always comes up with a different number for each object, even if the equals() method is overridden in such a way that two or more objects are considered equal.

What makes two objects equal?
First, we have to ask: what makes two Dog references duplicates? They must be considered equal. Is it simply two references to the very same object, or is it two different objects that both have the same title?
This brings up a key issue: reference equality vs. object equality.

Reference equality

if ( tommy == jakky ) {
// both references refer to the same object on the heap
}

Two references, one object on the heap.
Two references that refer to the same object on the heap are equal. If you call the hashCode() method on both references, you will get the same result. If you don't override the hashCode() method, the default behavior is that each object typically gets its own distinct number.

If you want to know if two references are really referring to the same object, use the == operator, which compares the bits in the variables. If both references point to the same object, the bits will be identical.


Object equality

if (tommy.equals(jakky) && tommy.hashCode() == jakky.hashCode()) {
// both references refer either to a single object, or to two objects that are equal
}

Two references, two objects on the heap, but the objects are considered meaningfully equivalent.
If you want to treat two different Dog objects as equal (for example, if you have decided that two Dogs are the same if they have matching names), you must override both the hashCode() and equals() methods inherited from Object.

As I said above, if you don't override the hashCode() method, the default behavior is to give each object its own, typically distinct, hashcode value. So you must override hashCode() to be sure that two equivalent objects return the same hashcode. But you must also override equals() so that if you call it on either object, passing in the other object, it always returns true.
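One possible way to write such a pair of overrides is sketched below. NamedDog is a hypothetical variant of Dog made up for this example, and the owner string is only illustrative; the sketch assumes that "same name" is what you have decided makes two dogs equal.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical variant of Dog in which two dogs are equal when their names match.
class NamedDog {
    private final String name;

    NamedDog(String name) { this.name = name; }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;                  // same object on the heap
        if (!(obj instanceof NamedDog)) return false;  // null or a different type
        return Objects.equals(name, ((NamedDog) obj).name);
    }

    @Override
    public int hashCode() {
        // Built from the same field equals() uses, so equal dogs share a hashcode.
        return Objects.hashCode(name);
    }
}

class NamedDogDemo {
    public static void main(String[] args) {
        NamedDog tommy = new NamedDog("Tommy");
        NamedDog jakky = new NamedDog("Tommy");

        System.out.println(tommy == jakky);                       // false: two objects
        System.out.println(tommy.equals(jakky));                  // true: same name
        System.out.println(tommy.hashCode() == jakky.hashCode()); // true

        Map<NamedDog, String> owners = new HashMap<>();
        owners.put(tommy, "Pardeep");
        System.out.println(owners.get(jakky)); // Pardeep: an equal key finds the entry
    }
}
```

Deriving hashCode() from exactly the fields that equals() compares is a simple way to keep the two methods consistent with each other.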


How HashSet checks for duplicates : hashCode() and equals()
When you put an object into a HashSet, it uses the object's hashcode value to determine where to put the object in the Set. But it also compares the object's hashcode to the hashcodes of the objects already in the HashSet, and if there is no matching hashcode, the HashSet assumes that this new object is not a duplicate.
In other words, if the hashcodes are different, the HashSet assumes there is no way the objects can be equal.
So you must override hashCode() to make sure that equal objects have the same hashcode value.
But two objects with the same hashCode() might not be equal, so if the HashSet finds a matching hashcode for two objects (one you are inserting and one already in the set), the HashSet will then call one of the objects' equals() methods to see if these hashcode-matched objects really are equal.
And if they are equal, the HashSet knows that the object you are attempting to add is a duplicate of something in the Set, so the add doesn't happen.

Let's see an example of how a HashSet checks for duplicates.


import java.util.HashSet;

public class HashCodeDemo {
    public static void main(String[] args) {
        HashSet<Cat> hm = new HashSet<Cat>();
        Cat c1 = new Cat("abc");
        Cat c2 = new Cat("abcd");
        Cat c3 = new Cat("xyz"); // same length as "abc", so same hashcode as c1
        System.out.println("Adding c1");
        System.out.println(hm.add(c1));
        System.out.println("Adding c2");
        System.out.println(hm.add(c2));
        System.out.println("Adding c3");
        System.out.println(hm.add(c3));
    }
}

class Cat {
    private String name;

    Cat(String title) {
        this.name = title;
    }

    public String getName() {
        return name;
    }

    @Override
    public boolean equals(Object obj) {
        System.out.println("equals is called");
        // Guard against null and non-Cat arguments before casting.
        if (!(obj instanceof Cat)) {
            return false;
        }
        Cat c = (Cat) obj;
        return name.equals(c.getName());
    }

    @Override
    public int hashCode() {
        System.out.println("hashcode is called");
        // Deliberately weak hash: names of equal length collide.
        return name.length();
    }
}
Sample Output
Adding c1
hashcode is called
true
Adding c2
hashcode is called
true
Adding c3
hashcode is called
equals is called
true

It's clear from the output that when the HashSet tried to add c3, it first called hashCode() to get the hashcode value and, on finding that it matched c1's ("abc" and "xyz" both have length 3), it called equals() to check whether the two objects are really equal.
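The example above never actually adds a duplicate, so here is a short sketch of that case as well. NamedCat is a hypothetical class written for this illustration (it is not the Cat class above); it treats two cats with the same name as equal and derives its hashcode from the name.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical class: equal when names match, hashcode derived from the name.
class NamedCat {
    private final String name;

    NamedCat(String name) { this.name = name; }

    @Override
    public boolean equals(Object obj) {
        return obj instanceof NamedCat
                && Objects.equals(name, ((NamedCat) obj).name);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(name);
    }
}

class DuplicateCatDemo {
    public static void main(String[] args) {
        Set<NamedCat> cats = new HashSet<>();
        System.out.println(cats.add(new NamedCat("abc"))); // true: the set was empty
        System.out.println(cats.add(new NamedCat("abc"))); // false: an equal element is already present
        System.out.println(cats.size());                   // 1
    }
}
```

The second add() returns false and the set keeps a single element, because the hashcodes match and equals() confirms that the two objects are equal.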





If you know anyone who has started learning java, why not help them out! Just share this post with them. 
Thanks for studying today!...
