When I was a college student and took the course Data Structure and Algorithm, the B-Tree section was not included in the exam requirement. At that time the teacher just said little about it and therefore I knew little about it. It is ridiculous that I totally understand B-Tree until last month. So what is the B-Tree? B-Trees are balanced search trees designed to work well on magnetic disks or other direct-access secondary storage devices. B-Trees are similar to red-black trees. but they are better at minimizing disk I/O operations. Many database systems use B-Trees, or variants of B-Trees, to store information.
B-Trees differ from red-black trees in that B-Tree nodes may have many children. B-Trees are similar to red-black trees in that every n-node B-Tree has height O(lg n), although the height of a B-Tree can be considerably less than that of a red-black tree because its branching factor can be much larger. Therefore, B-Trees can also be used to implement many dynamic-set operations in time O(lg n).
The precise definition of B-Trees is as follows:
A B-tree T is a rooted tree (whose root is m_ root [ T ]) having the following properties:
- Every node x has the following fields:
- n [ x ], the number of keys currently stored in node x ,
- the n [ x ] keys themselves, stored in nondecreasing order, so that key 1 [ x ] ≤ key 2 [ x ] ≤ ··· ≤ key n [ x ] [ x ],
- leaf [ x ], a boolean value that is TRUE if x is a leaf and FALSE if x is an internal node.
- Each internal node x also contains n [ x ]+ 1 pointers c 1 [ x ], c 2 [ x ], …, c n [ x ]+1 [ x ] to its children. Leaf nodes have no children, so their c i fields are undefined.
- The keys key i [ x ] separate the ranges of keys stored in each subtree: if k i is any key stored in the subtree with root c i [ x ], then k 1 ≤ key 1 [ x ] ≤ k 2 ≤ key 2 [ x ] ≤··· ≤ key n [ x ] [ x ] ≤ k n [ x ]+1 .
- All leaves have the same depth, which is the tree’s height h .
- There are lower and upper bounds on the number of keys a node can contain. These bounds can be expressed in terms of a fixed integer t ≥ 2 called the minimum degree of the B-tree:
- Every node other than the root must have at least t – 1 keys. Every internal node other than the root thus has at least t children. If the tree is nonempty, the root must have at least one key.
- Every node can contain at most 2 t – 1 keys. Therefore, an internal node can have at most 2 t children. We say that a node is full if it contains exactly 2 t – 1 keys.
The simplest B-tree occurs when t = 2. Every internal node then has either 2, 3, or 4 children, and we have a 2-3-4 tree . In practice, however, much larger values of t are typically used.
The following is my implement of B-Tree. The template class Element is data type stored inside the B-Tree nodes. I use two STL’s vectors to store the info of the elements and children in each node. One is vector
; another is vector
.
1 //the class definition of the element stored in each node; 2 template<typename T> 3 class Element 4 { 5 public: 6 T data; 7 bool operator < (Element& e) const { return data < e.data;} 8 bool operator <= (Element& e) const { return data <= e.data;} 9 bool operator > (Element& e) const { return data > e.data;} 10 bool operator >= (Element& e) const { return data >= e.data;} 11 bool operator == (Element& e) const { return data == e.data;} 12 Element& operator = (const Element& e) 13 { 14 data=e.data; 15 return *this; 16 } 17 Element(T& d) : data(d){} 18 }; 19 20 //An integer element. 21 typedef Element<int> Elem; 22 23 class BTreeNode 24 { 25 public: 26 vector<Elem> m_data; 27 //m_child[i] is the child whose keys are smaller than that of m_data[i]; 28 //and the size of m_child is always one more than m_data 29 vector<BTreeNode*> m_child; 30 //Here is my B-tree strcture representation 31 // 32 // m_data[0] m_data[1] ... m_data[i] ... 33 // m_child[0]--> / | | <--m_child[i+1] 34 // / | | 35 // 36 BTreeNode* m_parent; // pointer to the parent BTreeNode 37 BTreeNode(BTreeNode* mp=NULL) : m_parent(mp){} 38 }; 39 class BTree 40 { 41 typedef void (*v_func)(Elem &data);//visit function 42 protected: 43 int m_order; //maximum number of the elements stored a node 44 BTreeNode* m_root;//the root of the tree 45 v_func m_visit; 46 //return true if the insertion does not need split the current node 47 bool Normal_insert(Elem &); 48 //When the number of current node's element is overflow, perform split insertion. 49 bool Split_insert(Elem &); 50 //store the result of every search 51 BTreeNode* search_result; 52 public: 53 BTree(v_func f, int n, BTreeNode* r=NULL):m_visit(f),m_order(n),m_root(r) 54 { 55 search_result = NULL; 56 }; 57 ~BTree(); 58 bool Insertion(Elem&);//Insert a element to the B-tree. 59 bool Deletion(Elem&); 60 BTreeNode* Search(Elem&);//Search an Element in B-tree. 61 void travel(BTreeNode*) const; // travel the Btree 62 void print() const { travel(m_root); } // print the elements in a sorted order 63 };
Searching a B-tree is much like searching a binary search tree, except that instead of making a binary, or “two-way,” branching decision at each node, we make a multiway branching decision according to the number of the node’s children. More precisely, at each internal B-tree node x, in my code, I perform a binary search.
1 //Search an Element in B-tree. Return the node pointer which contain the 2 //desired element or return NULL if the element isn't found. 3 BTreeNode* BTree::Search(Elem& t) 4 { 5 BTreeNode* p = m_root; 6 while (p) 7 { 8 //store the current search result. 9 search_result = p; 10 //perform binary search of the node 11 int first=0, last=p->m_data.size()-1, mid = (first+last)/2; 12 while (first<=last) 13 { 14 mid=(first+last)/2; 15 if (p->m_data[mid]==t) { 16 return p; 17 } 18 if (p->m_data[mid]>t) { 19 last=mid-1; 20 } 21 if (p->m_data[mid]<t) { 22 first=mid+1; 23 } 24 } 25 //continue search in the child. 26 if(p->m_data[mid]>t) 27 p = p->m_child[mid]; 28 else 29 p = p->m_child[mid+1]; 30 } 31 return p; 32 }
As with binary search trees, we search for the leaf position at which to insert the new key. With a B-tree, however, we cannot simply create a new leaf node and insert it, as the resulting tree would fail to be a valid B-tree. Instead, we insert the new key into an existing leaf node. Since we cannot insert a key into a leaf node that is full, we introduce an operation that splits a full node y (having 2t – 1 keys(having m_order elements)) around its median key keyt[y] into two nodes having t – 1 keys each. The median key moves up into y’s parent to identify the dividing point between the two new trees. But if y’s parent is also full, it must be split before the new key can be inserted, and thus this need to split full nodes can propagate all the way up the tree.
1 bool BTree::Insertion(Elem& t) 2 { 3 if(Search(t))//The element already exits in the tree 4 return false; 5 if(m_root==NULL)//The tree is empty 6 { 7 m_root = new BTreeNode; 8 m_root->m_data.push_back(t); 9 m_root->m_child.push_back(NULL); 10 m_root->m_child.push_back(NULL); 11 m_root->m_parent = NULL; 12 return true; 13 } 14 else 15 { 16 if(search_result) 17 { 18 //If the element can fit into vector m_data, slide all the elements 19 //greater than the arguement forward one position and insert the newly 20 //vacated slot, then return true. Otherwise performan split. 21 vector<Elem> *p = &search_result->m_data; 22 vector<BTreeNode*> *pc = &search_result->m_child; 23 int i; 24 //locate the right position to insert 25 for(i=0;i<p->size() && (*p)[i]<t;i++); 26 p->insert(p->begin()+i,t);//insert the new Element 27 pc->insert(pc->begin()+i,NULL);//insert the new child. 28 if(p->size() < m_order)//dosen't need to split this node 29 { 30 return true; 31 } 32 else 33 return Split_insert(t); 34 } 35 else 36 return false; 37 } 38 39 } 40 41 42 bool BTree::Split_insert(Elem &t) 43 { 44 BTreeNode* sr = search_result; 45 if(sr) 46 { 47 BTreeNode* sr = search_result; 48 vector<Elem> *p = &sr->m_data; 49 vector<BTreeNode*> *pc = &sr->m_child; 50 //begin to insert and split 51 while(p->size() >= m_order) 52 { 53 int split_point = p->size()/2; 54 BTreeNode* new_node = new BTreeNode(sr->m_parent); 55 //new node recieve rightmost half of current node. And after copying 56 //data to new node, delete the righmost half of the current node 57 int i,total= p->size(); 58 for(i=split_point+1;i<total;i++) 59 { 60 new_node->m_data.push_back((*p)[split_point+1]); 61 new_node->m_child.push_back((*pc)[split_point+1]); 62 //make the new node be the parent of the righmost half's children 63 if((*pc)[split_point+1]) 64 (*pc)[split_point+1]->m_parent = new_node; 65 p->erase(p->begin()+split_point+1); 66 pc->erase(pc->begin()+split_point+1); 67 } 68 //deal with the one more child 69 new_node->m_child.push_back((*pc)[split_point+1]); 70 //make the new node be the parent of the righmost half's children 71 if((*pc)[split_point+1]) 72 (*pc)[split_point+1]->m_parent = new_node; 73 pc->erase(pc->begin()+split_point+1); 74 //delete the split_point 75 Elem upward = (*p)[split_point]; 76 p->erase(p->begin()+split_point); 77 //make the current's parent to be new_node's parent 78 new_node->m_parent = sr->m_parent; 79 if(sr == m_root) //reach the root 80 { 81 //make a new node to be root. 82 BTreeNode* new_node2 = new BTreeNode(NULL); 83 m_root = new_node2; 84 new_node2->m_data.push_back(upward); 85 //the new root's two children are the splited two from current node 86 new_node2->m_child.push_back(sr); 87 new_node2->m_child.push_back(new_node); 88 sr->m_parent = new_node2; 89 new_node->m_parent = new_node2; 90 return true; 91 } 92 93 //add the upward element to the parent of current node 94 p = &sr->m_parent->m_data; 95 pc = &sr->m_parent->m_child; 96 //locate the right position to insert 97 for(i=0;i<p->size() && (*p)[i]<upward;i++); 98 p->insert(p->begin()+i,upward); 99 //adjust the parent's child 100 pc->insert(pc->begin()+i+1,new_node); 101 sr=sr->m_parent; 102 } 103 return true; 104 } 105 return false; 106 }
Here is the the ful list code. I have not tested it seriously, so if you find that there is a bug in it, please comment on. Thanks!
Deletion from a B-tree is analogous to insertion but a little more complicated, I will implement it later if have any spare time.
Hi. I’m no expert, but as I understand it a “2-3” tree (each node has either one or two entries and two or three (or none) children) are also B-Trees.