- When and how does HashMap convert the bucket from linked list to Red Black Trees? [duplicate]
- 1 Answer
- Java HashMap and the Red-Black Tree
- Insertion Time
- Deletion Time
- Red-Black Tree Compared With Other Data Structures
- Red-Black Tree vs. AVL Tree
- Red-Black Tree vs. Binary Search Tree
- Summary
- The Secret Improvement of HashMap in Java 8
- Red-black Tree
- Treeify in HashMap
- References
When and how does HashMap convert the bucket from linked list to Red Black Trees? [duplicate]
I was going through Java 8 features and found out that HashMap uses a red-black tree instead of a linked list when the number of entries in a bucket increases. However, doesn't this require the keys to be Comparable, or some ordering of the keys to exist? How does this work? When does this conversion actually happen, and how?
1 Answer
When there are at least 8 entries (TREEIFY_THRESHOLD) in a single bucket and the total number of buckets is at least 64 (MIN_TREEIFY_CAPACITY), that single bucket is transformed into a red-black tree.
There is also the shrinkage you should be aware of (if you want): when you remove entries, the tree is converted back into a linked list (UNTREEIFY_THRESHOLD == 6).
You are correct that keys should ideally be Comparable, but that is not strictly required. It helps if they are (the comparison only matters when keys have the same hashCode), but when they are not, this method is used:
static int tieBreakOrder(Object a, Object b)
So the class name (as a String) is used for comparison, and if that fails too, System.identityHashCode (based on a Marsaglia XOR-shift algorithm) is used to decide left and right.
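The JDK's tieBreakOrder is package-private, but its logic can be sketched as follows; this is a simplified standalone reimplementation mirroring the java.util.HashMap source, not the real method:

```java
public class TieBreak {
    // Decides left vs. right for keys that are not mutually Comparable:
    // first compare class names, then fall back to identity hash codes.
    static int tieBreakOrder(Object a, Object b) {
        int d;
        if (a == null || b == null ||
            (d = a.getClass().getSimpleName()
                  .compareTo(b.getClass().getSimpleName())) == 0)
            d = (System.identityHashCode(a) <= System.identityHashCode(b)
                    ? -1 : 1);
        return d;
    }

    public static void main(String[] args) {
        // Same object: class names are equal, identity hashes are equal -> -1
        System.out.println(tieBreakOrder("x", "x"));
        // Different classes: decided by class-name comparison
        System.out.println(tieBreakOrder(Integer.valueOf(1), "s") < 0);
    }
}
```

Note that the result is never 0, so the tree always gets a total (if arbitrary) ordering for the tie-broken keys.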
To answer your question about when this happens: when resize is called. When a HashMap has to resize, several things happen: the number of buckets doubles (one more bit of the hash decides whether an entry moves to a new bucket or not), and a particular bucket may be transformed into a tree. This process (again, if you really care) is pretty slow; some people say a Java HashMap is "sloooooooow, then fast fast fast; then sloooooo, then fast fast fast" (I still see this as mocking, but there are PauselessHashMap implementations).
This brings up two interesting points. The first is to choose a suitable initial size for your Map (even a rough estimate will do), e.g.:
new HashMap<>(256); // choosing the size
this will avoid some resizes.
The second is why transforming to a tree is important (think of database indexes and why they are B-trees). How many steps would it take to find an entry in a perfect tree with Integer.MAX_VALUE entries (theoretically)? Only up to 32.
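A back-of-the-envelope check of that figure (worstCaseSteps is a hypothetical helper written for this illustration, not a JDK method); depending on whether you count the root level, you get 31 or 32 comparisons:

```java
public class TreeDepth {
    // Worst-case comparisons in a perfectly balanced binary search tree
    // with n nodes: floor(log2(n)) + 1.
    static int worstCaseSteps(long n) {
        return 64 - Long.numberOfLeadingZeros(n);
    }

    public static void main(String[] args) {
        long n = Integer.MAX_VALUE; // 2^31 - 1 entries
        System.out.println("tree lookup: at most " + worstCaseSteps(n)
                + " comparisons; linked list: up to " + n);
    }
}
```

For 2^31 - 1 entries this yields 31 comparisons, versus up to two billion traversal steps in a plain linked list.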
Java HashMap and the Red-Black Tree
We follow a path from the root to the searched node (or to a NIL leaf). At each level, we perform a comparison. The effort for the comparison is constant. The search cost is thus proportional to the tree height. We denote by n the number of tree nodes. In the «Height of a Red-Black Tree» section, we have recognized that the longest path is at most twice as long as the shortest path. It follows that the height of the tree is bounded by O(log n). A formal proof is beyond the scope of this article. You can read the proof on Wikipedia. Thus, the time complexity for finding a node in a red-black tree is: O(log n)
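As a minimal sketch of why the search cost is proportional to the height: a binary search tree lookup performs one comparison per level along a single root-to-leaf path. This uses plain unbalanced nodes for brevity, not the red-black implementation:

```java
public class BstSearch {
    static final class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // One comparison per level: the cost is bounded by the tree height.
    static boolean contains(Node root, int key) {
        for (Node cur = root; cur != null;
             cur = key < cur.key ? cur.left : cur.right)
            if (key == cur.key)
                return true;
        return false; // fell off the tree at a NIL leaf
    }

    public static void main(String[] args) {
        Node root = new Node(4);
        root.left = new Node(2);
        root.right = new Node(6);
        System.out.println(contains(root, 6)); // true
        System.out.println(contains(root, 5)); // false
    }
}
```

Because a red-black tree keeps its height in O(log n), this loop runs at most O(log n) times.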
Insertion Time
When inserting, we first search for the insertion position from the root downward in time O(log n). Then, to check the rules and repair the tree, one or more of the following operations occur – at most once per level:
- Checking the color of the parent node
- Determination of the uncle node and checking its color
- Recoloring one up to three nodes
- Performing one or two rotations
Each of these operations has constant time, O(1), in itself. The total time for checking and repairing the tree is therefore also proportional to its height.
So the time complexity for inserting into a red-black tree is also: O(log n)
Deletion Time
Just as with insertion, we first search for the node to be deleted in time O(log n).
The removal of the node itself is independent of the tree size and therefore takes constant time, O(1).
For checking the rules and repairing the tree, one or more of the following operations occur – at most once per level:
- Checking the color of the deleted node
- Determining the sibling and examining its color
- Checking the colors of the sibling’s children
- Recoloring the parent node
- Recoloring the sibling node and one of its children
- Performing one or two rotations
These operations also all have a constant complexity in themselves. Thus, the total effort for checking and restoring the rules after deleting a node is also proportional to the tree height.
So the time complexity for deleting from a red-black tree is also: O(log n)
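The JDK's TreeMap is backed by a red-black tree, so all three O(log n) operations can be exercised directly:

```java
import java.util.TreeMap;

public class RedBlackOps {
    public static void main(String[] args) {
        TreeMap<Integer, String> map = new TreeMap<>(); // red-black tree
        for (int i = 0; i < 1000; i++)
            map.put(i, "v" + i);              // insertion: O(log n) each
        System.out.println(map.get(500));     // search:    O(log n)
        map.remove(500);                      // deletion:  O(log n)
        System.out.println(map.containsKey(500)); // false
    }
}
```

The TreeMap Javadoc guarantees log(n) cost for containsKey, get, put, and remove, which is exactly the bound derived above.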
Red-Black Tree Compared With Other Data Structures
The following sections describe the differences and the advantages and disadvantages of the red-black tree compared to alternative data structures.
Red-Black Tree vs. AVL Tree
The red-black tree, as well as the AVL tree, are self-balancing binary search trees.
In the red-black tree, the longest path to the root is at most twice as long as the shortest path to the root. On the other hand, in the AVL tree, the depth of no two subtrees differs by more than 1.
In the red-black tree, balance is maintained by the node colors, a set of rules, and by rotating and recoloring nodes. In the AVL tree, the heights of the subtrees are compared, and rotations are performed when necessary.
These differences in the characteristics of the two types of trees lead to the following differences in performance and memory requirements:
- Due to the more even balancing of the AVL tree, search in an AVL tree is usually faster. In terms of magnitude, however, both are in the range O(log n).
- For insertion and deletion, the time complexity in both trees is O(log n). In a direct comparison, however, the red-black tree is faster because it rebalances less frequently.
- Both trees require additional memory: the AVL tree one byte per node for the height of the subtree starting at a node; the red-black tree one bit per node for the color information. This rarely makes a difference in practice since a single bit usually occupies at least one byte.
If you expect many insert/delete operations, then you should use a red-black tree. If, on the other hand, you expect more search operations, then you should choose the AVL tree.
Red-Black Tree vs. Binary Search Tree
The red-black tree is a concrete implementation of a self-balancing binary search tree. So every red-black tree is also a binary search tree.
There are also other types of binary search trees, such as the AVL tree mentioned above – or trivial non-balanced implementations. Thus, not every binary search tree is also a red-black tree.
Summary
This tutorial taught you what a red-black tree is, which rules govern it and how these rules are evaluated and restored if necessary after inserting and deleting nodes. I also introduced you to a Java implementation that is as easy to understand as possible.
The JDK uses red-black trees in TreeMap (here is the source code on GitHub) and in bucket collisions in HashMap (here is the source code).
With this, I conclude the tutorial series on binary trees.
If I could help you better understand binary trees in general, binary search trees, AVL trees, and, in this article, red-black trees, I'm happy about a comment.
The Secret Improvement of HashMap in Java 8
HashMap is one of the most widely used data structures in Java. As we know, HashMap stores data in a number of buckets, and each bucket is a linked list. New elements are added into the bucket at index \((n - 1)\,\&\,hash\) (where n is the current capacity of the HashMap). When the size of a single linked list increases, the performance of retrievals gets worse. Java 8 improves the performance by converting a linked list into a red-black tree once its size exceeds a threshold. I am going to talk about the basics of red-black trees and how Java applies this methodology in HashMap.
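The bucket-selection formula can be tried out directly. Note that indexFor is a hypothetical helper for this illustration; the real HashMap additionally spreads the hash with hash ^ (hash >>> 16) before masking:

```java
public class BucketIndex {
    // With a power-of-two capacity n, (n - 1) & hash masks the hash
    // down to a valid bucket index, equivalent to hash mod n but faster.
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16; // table capacity: always a power of two in HashMap
        int index = indexFor("key".hashCode(), n);
        System.out.println(index); // always in [0, n - 1]
    }
}
```

This is why HashMap capacities are always powers of two: the mask (n - 1) is then all one-bits, so every index in the table is reachable.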
Red-black Tree
A red-black tree is an approximately balanced binary search tree with the following properties:
- Each node is either red or black.
- The root node and all (NIL) leaves are black.
- If a node is red, then both its children are black.
- Every path from a given node to any of its descendant NIL nodes contains the same number of black nodes.
These constraints enforce a critical property of red-black trees: the path from the root to the farthest leaf is no more than twice as long as the path from the root to the nearest leaf. Property 4 guarantees that all paths from the root to the leaves contain the same number of black nodes, so the shortest possible path contains only black nodes. Assume the shortest path contains \(N\) black nodes. By property 3, the longest possible path is obtained by inserting one red node after every black node, so it contains \(2N\) nodes. Counting the NIL leaves, the shortest path has length \(N+1\) and the longest \(2N+1\).
This property guarantees logarithmic time complexity for operations on a red-black tree: the time complexities of search, insertion, and deletion are all \(O(log\,n)\).
Resources about insertion and deletion in red-black trees can be found here.
Treeify in HashMap
There are three static variables in HashMap related to the treeify functionality:
- TREEIFY_THRESHOLD(8): The bin count threshold for using a tree rather than list for a bin. Bins are converted to trees when adding an element to a bin with at least this many nodes.
- UNTREEIFY_THRESHOLD(6): The bin count threshold for untreeifying a (split) bin during a resize operation.
- MIN_TREEIFY_CAPACITY(64): The smallest table capacity for which bins may be treeified. Below it, the table is resized instead if there are too many nodes in a bin.
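How the three constants interact when a bin grows can be sketched as follows; onBinGrowth is a hypothetical helper summarizing the decision made in putVal/treeifyBin, not JDK API, though the constant values match the JDK source:

```java
public class TreeifyDecision {
    static final int TREEIFY_THRESHOLD = 8;
    static final int UNTREEIFY_THRESHOLD = 6;
    static final int MIN_TREEIFY_CAPACITY = 64;

    // What HashMap does when a bin reaches binCount nodes in a table
    // with tableCapacity buckets.
    static String onBinGrowth(int binCount, int tableCapacity) {
        if (binCount < TREEIFY_THRESHOLD)
            return "keep list";                 // short bins stay linked lists
        if (tableCapacity < MIN_TREEIFY_CAPACITY)
            return "resize";                    // grow the table instead
        return "treeify";                       // convert the bin to a tree
    }

    public static void main(String[] args) {
        System.out.println(onBinGrowth(8, 16)); // resize
        System.out.println(onBinGrowth(8, 64)); // treeify
        System.out.println(onBinGrowth(5, 128)); // keep list
    }
}
```

During a later resize, a split bin that shrinks to UNTREEIFY_THRESHOLD nodes or fewer is converted back into a linked list.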
The following method is copied from the source code of HashMap in Java 8; it converts the linked list at the index for a given hash value into a red-black tree. It only converts the list if the number of buckets is at least MIN_TREEIFY_CAPACITY; otherwise it calls the resize method, which doubles the size of the table so that each bucket contains fewer nodes.
final void treeifyBin(Node<K,V>[] tab, int hash) {
    int n, index; Node<K,V> e;
    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
        resize();
    else if ((e = tab[index = (n - 1) & hash]) != null) {
        TreeNode<K,V> hd = null, tl = null;
        do {
            TreeNode<K,V> p = replacementTreeNode(e, null);
            if (tl == null)
                hd = p;
            else {
                p.prev = tl;
                tl.next = p;
            }
            tl = p;
        } while ((e = e.next) != null);
        if ((tab[index] = hd) != null)
            hd.treeify(tab);
    }
}
The following code is the inner class TreeNode in HashMap. It extends LinkedHashMap.Entry, which in turn extends the inner class Node, which makes it easier to untreeify a red-black tree back into a linked list.
static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
    TreeNode<K,V> parent;  // red-black tree links
    TreeNode<K,V> left;
    TreeNode<K,V> right;
    TreeNode<K,V> prev;    // needed to unlink next upon deletion
    boolean red;
    // ... constructor and tree management methods omitted
}
The treeify method loops through all nodes of the given list and uses the red-black tree insertion algorithm to construct a new tree. After the whole tree is formed, the moveRootToFront method moves the root node of the red-black tree to the front of the bucket's linked list.
static <K,V> void moveRootToFront(Node<K,V>[] tab, TreeNode<K,V> root) {
    int n;
    if (root != null && tab != null && (n = tab.length) > 0) {
        int index = (n - 1) & root.hash;
        TreeNode<K,V> first = (TreeNode<K,V>)tab[index];
        if (root != first) {
            Node<K,V> rn;
            tab[index] = root;
            TreeNode<K,V> rp = root.prev;
            if ((rn = root.next) != null)
                ((TreeNode<K,V>)rn).prev = rp;
            if (rp != null)
                rp.next = rn;
            if (first != null)
                first.prev = root;
            root.next = first;
            root.prev = null;
        }
        assert checkInvariants(root);
    }
}
References