Why is std::map implemented as a red-black tree?
Why is std::map implemented as a red-black tree? There are several balanced binary search trees (BSTs) out there. What were the design trade-offs in choosing a red-black tree?
Although all implementations I’ve seen use an RB-tree, note that this is still implementation-dependent.
C++’s map and set are actually an ordered map and an ordered set; they are not implemented using hash functions. Every query takes O(log n) rather than O(1), but the values are always kept sorted. Since C++11 there are also unordered_map and unordered_set, which are implemented using hash tables; they are not sorted, but most queries and operations take O(1) on average.
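To make that difference concrete, here is a minimal sketch (the keys and values are made up for the example): iterating a std::map always yields the keys in sorted order, while std::unordered_map gives no ordering guarantee but faster average lookups.

#include <iostream>
#include <map>
#include <string>
#include <unordered_map>

int main() {
    // std::map keeps its keys sorted; lookups and insertions are O(log n).
    std::map<std::string, int> ordered{{"pear", 3}, {"apple", 1}, {"banana", 2}};
    for (const auto& kv : ordered)
        std::cout << kv.first << " -> " << kv.second << '\n';   // apple, banana, pear

    // std::unordered_map (since C++11) hashes its keys; average O(1), no ordering guarantee.
    std::unordered_map<std::string, int> hashed(ordered.begin(), ordered.end());
    std::cout << "apple: " << hashed.at("apple") << '\n';
    return 0;
}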
I’m surprised that nobody has said anything about iterator invalidation. The STL’s API guarantees that, when you insert or delete an element of a std::map, iterators pointing to other elements are not invalidated. This makes it very difficult, if not outright impossible, to store more than one element per dynamically allocated node while also fulfilling the usual time complexity guarantees. (Queries and updates to a std::map must take at worst logarithmic time.) So, in practice, std::map implementations have to be self-balancing binary trees of some sort.
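A small illustration of that guarantee (the keys and values here are arbitrary): an iterator obtained before a batch of insertions and an unrelated erase still refers to the same element afterwards, something a contiguous, reallocating layout could not promise.

#include <cassert>
#include <map>

int main() {
    std::map<int, char> m{{1, 'a'}, {2, 'b'}};
    auto it = m.find(1);              // iterator to the node holding key 1

    // Inserting many other elements and erasing an unrelated one
    // must leave 'it' valid and pointing at the same node.
    for (int k = 3; k < 1000; ++k)
        m.emplace(k, 'x');
    m.erase(2);

    assert(it->first == 1 && it->second == 'a');
    return 0;
}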
6 Answers
Probably the two most common self-balancing tree algorithms are red-black trees and AVL trees. To rebalance the tree after an insertion or update, both algorithms use rotations, in which nodes of the tree are rotated to restore the balance.
While insert/delete operations are O(log n) in both algorithms, in a red-black tree the rebalancing rotation work is O(1), whereas in an AVL tree it is O(log n). This makes the red-black tree more efficient in this aspect of the rebalancing stage, and it is one of the possible reasons it is more commonly used.
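For readers who have not seen rotations before, here is a minimal sketch of a left rotation on a bare binary-tree node; the parent pointers and the colour/balance bookkeeping that a real red-black or AVL implementation would also update are deliberately omitted.

// Simplified node: a real red-black or AVL node would also carry a colour bit
// or a balance factor, and usually a parent pointer.
struct Node {
    int   key;
    Node* left  = nullptr;
    Node* right = nullptr;
};

// Left rotation around x: x's right child y becomes the root of the subtree,
// x becomes y's left child, and y's old left subtree is re-attached under x.
// The in-order key sequence is unchanged; only the heights of the two sides shift.
Node* rotate_left(Node* x) {
    Node* y  = x->right;
    x->right = y->left;
    y->left  = x;
    return y;            // new subtree root
}

int main() {
    Node a{1}, b{2}, c{3};
    a.right = &b;                    // degenerate right-leaning chain: 1 -> 2 -> 3
    b.right = &c;
    Node* root = rotate_left(&a);    // now 2 is the root with 1 and 3 as children
    return root->key == 2 ? 0 : 1;
}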
Red-Black trees are used in most collection libraries, including the offerings from Java and Microsoft .NET Framework.
You make it sound like red-black trees can do tree modifications in O(1) time, which is not true. Tree modifications are O(log n) for both red-black and AVL trees, which makes it moot whether the balancing part of the modification is O(1) or O(log n), because the main operation is already O(log n). Moreover, the slightly extra work that AVL trees do results in a more tightly balanced tree, which leads to slightly faster lookups. So it is a perfectly valid trade-off and does not make AVL trees inferior to red-black trees.
You have to look beyond the complexity to actual runtime to see a difference — AVL trees generally have a lower total runtime when there are many more lookups than inserts/deletes. RB trees have a lower total runtime when there are many more inserts/deletes. The exact proportion at which the break occurs depends of course on many details of implementation, hardware, and exact usage, but since library authors have to support a wide range of usage patterns, they have to take an educated guess. AVL is also slightly harder to implement, so you might want a proven benefit to use it.
An RB tree isn’t a "default implementation". Each implementer chooses an implementation. As far as we know, they’ve all chosen RB trees, so presumably this is either for performance or for ease of implementation/maintenance. As I said, the breakpoint for performance might not imply that they think there are more inserts/deletes than lookups, just that the ratio between the two is above the level where they think RB probably beats AVL.
@Denis: unfortunately the only way to get numbers is to make a list of std::map implementations, track down the developers, and ask them what criteria they used to make the decision, so this remains speculation.
Missing from all this is the per-node cost of storing the auxiliary information required to make balance decisions. Red-black trees require one bit to represent the colour, while AVL trees require at least two bits (to represent -1, 0 or +1).
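As a rough sketch of what that per-node bookkeeping could look like (these struct layouts are hypothetical, not taken from any particular library; some real-world red-black implementations instead pack the colour bit into spare low bits of the parent pointer):

#include <cstdint>
#include <cstdio>

// Hypothetical red-black node: the colour carries one bit of information.
struct RBNode {
    RBNode* left;
    RBNode* right;
    RBNode* parent;
    int     key;
    bool    red;       // 1 bit needed; often packed into spare pointer bits in practice
};

// Hypothetical AVL node: the balance factor must distinguish -1, 0 and +1, so at least 2 bits.
struct AVLNode {
    AVLNode* left;
    AVLNode* right;
    AVLNode* parent;
    int      key;
    int8_t   balance;  // -1, 0 or +1
};

int main() {
    // With alignment padding the two structs usually come out the same size,
    // which is why the saving only materialises when the bit is squeezed into a pointer.
    std::printf("RBNode: %zu bytes, AVLNode: %zu bytes\n", sizeof(RBNode), sizeof(AVLNode));
    return 0;
}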
It really depends on the usage. An AVL tree usually performs more rebalancing rotations, so if your application doesn’t have too many insertion and deletion operations but leans heavily on searching, an AVL tree is probably a good choice.
std::map uses a red-black tree because it offers a reasonable trade-off between the speed of node insertion/deletion and the speed of searching.
Are you sure about that? I personally think that a red-black tree is equally or more complex, never simpler. The only difference is that in a red-black tree, rebalancing occurs less often than in an AVL tree.
@Eric Theoretically, both R/B trees and AVL trees have O(log n) complexity for insertion and deletion. But one big part of the operation cost is rotation, which differs between these two trees. Please refer to discuss.fogcreek.com/joelonsoftware/… Quote: "balancing an AVL tree can require O(log n) rotations, whilst a red black tree will take at most two rotations to bring it into balance (though it may have to examine O(log n) nodes to decide where the rotations are necessary)." I have edited my comments accordingly.
Thanks a lot for bringing to my attention the maximum of two rotations for insertion into an RB tree. You are right, I didn’t realize that. As you said, re-coloring may happen along O(log n) nodes but costs a lot less than rotation. I think your answer is great; I don’t remember what the previous one was ;-). Thanks.
The previous answers only address tree alternatives, and red-black probably remains only for historical reasons.
Why not a hash table?
Designing a good hash table requires intimate knowledge of the context in which it will be used. Should it use open addressing or linked chaining? What level of load should it accept before resizing? Should it use an expensive hash that avoids collisions, or one that is rough and fast?
Since the STL can’t anticipate which is the best choice for your application, the default needs to be more flexible. Trees "just work" and scale nicely.
(C++11 did add hash tables with unordered_map. You can see from the documentation that it requires setting policies to configure many of these options.)
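For illustration, here is a sketch of a few of those knobs as std::unordered_map exposes them; the hash functor and the particular values are made up for the example, not recommendations:

#include <cstddef>
#include <string>
#include <unordered_map>

// A hypothetical "rough and fast" hash, plugged in as the third template parameter.
struct CheapHash {
    std::size_t operator()(const std::string& s) const noexcept {
        return s.empty() ? 0u : static_cast<std::size_t>(s.front()) * 131u + s.size();
    }
};

int main() {
    std::unordered_map<std::string, int, CheapHash> table;
    table.max_load_factor(0.5f);   // resize earlier: more memory, fewer collisions
    table.reserve(1024);           // pre-size the bucket array for the expected element count
    table["example"] = 42;
    return table.at("example") == 42 ? 0 : 1;
}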
What about other trees?
Red-black trees offer fast lookup and are self-balancing, unlike plain BSTs. Another user has already pointed out their advantages over the self-balancing AVL tree.
Alexander Stepanov (the creator of the STL) said that he would use a B* tree instead of a red-black tree if he wrote std::map again, because it is friendlier to modern memory caches.
One of the biggest changes since then has been the growth of caches. Cache misses are very costly, so locality of reference is much more important now. Node-based data structures, which have low locality of reference, make much less sense. If I were designing STL today, I would have a different set of containers. For example, an in-memory B*-tree is a far better choice than a red-black tree for implementing an associative container. — Alexander Stepanov
Should maps always use trees?
Another possible map implementation would be a sorted vector (maintained with insertion sort) and binary search. This works well for containers which aren’t modified often but are queried frequently. I often do this in C, since qsort and bsearch are built in.
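A minimal C++ sketch of that idea (the answer describes the C version with qsort and bsearch); here the vector is sorted once up front and then queried with std::lower_bound, and all names are illustrative:

#include <algorithm>
#include <string>
#include <utility>
#include <vector>

using Entry = std::pair<std::string, int>;

// Look up a key with binary search; returns a pointer to the value, or nullptr if absent.
int* find_value(std::vector<Entry>& table, const std::string& key) {
    auto it = std::lower_bound(
        table.begin(), table.end(), key,
        [](const Entry& e, const std::string& k) { return e.first < k; });
    return (it != table.end() && it->first == key) ? &it->second : nullptr;
}

int main() {
    // Build once, sort once, query many times: good locality, O(log n) lookups.
    std::vector<Entry> table{{"pear", 3}, {"apple", 1}, {"banana", 2}};
    std::sort(table.begin(), table.end());          // pairs compare by key first
    const int* v = find_value(table, "banana");
    return (v != nullptr && *v == 2) ? 0 : 1;
}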
Do I even need to use map?
Cache considerations mean it rarely makes sense to use std::list or std::deque over std::vector, even in those situations we were taught about in school (such as removing an element from the middle of the list). Applying the same reasoning, using a for loop to linearly search a list is often more efficient and cleaner than building a map for a few lookups.
Of course choosing a readable container is usually more important than performance.
AVL trees have a maximum height of 1.44 log n, while RB trees have a maximum of 2 log n. Inserting an element into an AVL tree may imply a rebalance at one point in the tree; the rebalancing finishes the insertion. After inserting a new leaf, its ancestors have to be updated up to the root, or up to a point where the two subtrees are of equal depth. The probability of having to update k nodes is 1/3^k. Rebalancing is O(1). Removing an element may imply more than one rebalancing (up to half the depth of the tree).
RB trees are B-trees of order 4 represented as binary search trees. A 4-node in the B-tree results in two levels in the equivalent BST. In the worst case, all the nodes of the tree are 2-nodes, with only one chain of 3-nodes down to a leaf. That leaf will be at a distance of 2 log n from the root.
Going down from the root to the insertion point, one has to change 4-nodes into 2-nodes to make sure no insertion will saturate a leaf. Coming back up from the insertion, all these nodes have to be analysed to make sure they correctly represent 4-nodes. This can also be done going down the tree. The global cost is the same. There is no free lunch! Removing an element from the tree is of the same order.
All these trees require nodes to carry extra information on height, weight, colour, etc. Only splay trees are free from such additional info. But most people are afraid of splay trees, because of the randomness of their structure!
Finally, trees can also carry weight information in the nodes, permitting weight balancing. Various schemes can be applied. One should rebalance when a subtree contains more than 3 times the number of elements of the other subtree. Rebalancing is again done through either a single or a double rotation. This means a worst case of 2.4 log n. One can get away with a ratio of 2 instead of 3, a much better ratio, but it may mean leaving a little less than 1% of the subtrees unbalanced here and there. Tricky!
Which type of tree is the best? AVL for sure. They are the simplest to code and have their worst height nearest to log n. For a tree of 1,000,000 elements, an AVL tree will be at most of height 29, an RB tree 40, and a weight-balanced tree 36 or 50, depending on the ratio.
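The AVL and RB figures can be sanity-checked with a quick back-of-the-envelope computation (this ignores the small additive constants in the exact height formulas; the weight-balanced bounds depend on the exact ratio chosen, so they are not reproduced here):

#include <cmath>
#include <cstdio>

int main() {
    const double n  = 1e6;
    const double lg = std::log2(n);                 // about 19.93
    std::printf("AVL max height ~ 1.44 * log2(n) = %.1f\n", 1.44 * lg);  // ~28.7, i.e. 29
    std::printf("RB  max height ~ 2.00 * log2(n) = %.1f\n", 2.00 * lg);  // ~39.9, i.e. 40
    return 0;
}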
There are a lot of other variables: randomness, ratio of adds, deletes, searches, etc.
Explanation of the red-black-tree-based implementation of TreeMap in Java
A Red-Black tree based NavigableMap implementation. The map is sorted according to the natural ordering of its keys, or by a Comparator provided at map creation time, depending on which constructor is used. This implementation provides guaranteed log(n) time cost for the containsKey, get, put and remove operations. Algorithms are adaptations of those in Cormen, Leiserson, and Rivest’s Introduction to Algorithms.
In the source code I found that an inner class Entry is used as the node type:
static final class Entry<K,V> implements Map.Entry<K,V> {
    K key;
    V value;
    Entry<K,V> left = null;
    Entry<K,V> right = null;
    Entry<K,V> parent;
    boolean color = BLACK;
    ...
}
A red–black tree is a type of self-balancing binary search tree, a data structure used in computer science. The self-balancing is provided by painting each node with one of two colors (these are typically called 'red' and 'black', hence the name of the trees) in such a way that the resulting painted tree satisfies certain properties that don't allow it to become significantly unbalanced. When the tree is modified, the new tree is subsequently rearranged and repainted to restore the coloring properties. The properties are designed in such a way that this rearranging and recoloring can be performed efficiently.
- Suppose I already have two keys "C" and "E" in the tree and I then add "D". How will the nodes be arranged (using natural ordering)?
- How is the self-balancing of the tree implemented in the Java source code?
I tried searching for a detailed implementation of TreeMap but was unable to find any article such as the one I found for HashMap.
I have been hanging on this tree since yesterday 🙁 Can someone please help me get down?