Jump to content
Nytro

Trees, B-Trees, B+Trees and H-Trees

Recommended Posts

Posted

[h=1]Trees, B-Trees, B+Trees and H-Trees[/h][h=3]Jarret W. Buse[/h]B+Tree

B-Trees are used to help search various data. Some file systems use B+Tree to search directories, extent descriptors, file allocation and even file retrieval. In the future, B+Tree may be used to search through more types of data within the file systems.

First, let's look at a Tree, and we'll use letters to represent files (C, F, L, N, P and Z). A Tree is a tree structure (upside down as most people say), which consists of one root node, children nodes and leaf nodes. Each node contains a key and pointer. The key is the search item, such as a file name. The data pointer for the key points to the actual data. So, looking at Figure 1, let's assume File L is entered first in the Tree.

figure-1-jpg.303

FIGURE 1

The next File added is File C. File C is added under the root node and to the left since the value is less than the root (C<Z). Greater values go to the right, so if File Z is added, it goes to the right of the root as shown in Figure 2.

figure-2-jpg.304

FIGURE 2

We have 3 files left to add, so let's add File F, N and P as shown in Figure 3.

figure-3-jpg.305

FIGURE 3

In this case, the Tree is not balanced; that is, there are more nodes to the right of the root than to the left. In some cases, this may be fine. If you access File L more than the others, the search will occur very quickly. Of course with the number of files on some file systems being over 10,000, this Tree would be quite large.

From Figure 3, Node L is the root node. Node C is a child node (as is Node Z and N). Nodes F and P are Leaf Nodes. Leaf Nodes are the nodes on the end that have no child nodes. Node C is considered a parent of Nodes F. Node L (root) is the parent of Node C and Z.

In an extreme case, the root could be File Z, followed by a child node of File P, then N, L, F and finally C as shown in Figure 4. The layout would produce a very unbalanced tree that most likely would cause lengthy searches.

figure-4-jpg.306

FIGURE 4

Now, B-Tree uses more keys within a node, otherwise referred to as the order. If a B-Tree node contains five pointers, it is a B-Tree of order 5. Let's look at an example of a B-Tree root as shown in Figure 5.

figure-5-jpg.307

FIGURE 5

The dots represent the pointers while the numbers represent the key values (file names, etc.). Notice that for this node, each Key has a partnered Pointer: Key-1 and Pointer-1, Key-2 and Pointer-2, etc. If you look at Figure 5, you notice there is an extra Pointer (Pointer-0). The tree works in a similar way as a regular tree. If the search item we are looking for is less than Key-1, Pointer-0 is followed. If the search item is greater than or equal to Key-1 and less than Key-2, we follow Pointer-1. If the search item is greater than or equal to Key-2 and less than Key-3, we follow Pointer-2.

For example, if we were searching for number 1, we would follow Pointer-0. If we were searching for 12, we would follow Pointer-2. By following these pointers, we are led to another node to perform the same task and so on until we reach a leaf which contains the search value. The search value points to our file we are searching for or the search item is not found and an error message is returned.

Take a look at Figure 6 for the following example.

figure-6-jpg.308

FIGURE 6

Let's say we are searching for 5. We start at the root node and determine if the search value is less than Key-1 (3). Since it is not, we see if the search value is between Key-1 (3) and Key-2 (10), which it is so we follow Pointer-1. At this node, we check if the search value is less than Key-1 (5), it is not. Our search value is equal to Key-1 (5) so we follow Pointer-1 and find two leaf nodes (5 and 6). The first value matches our search, so we use the value in the leaf node to get to the file for which we have been searching.

Now, let's say we are searching for File-18. We start at the root and follow Pointer-2 since our search value is greater than or equal to Key-2 (10). At the next node, we have three key values to check: 10, 15 and 22. We know that 18 is greater than 10 and 15, so we can skip Pointer-0 and Pointer-1, and we follow Pointer-2. At the leaf nodes, we find two leaves, 15 and 20. File-18 does not exist and a message can alert the user that there is no such file.

NOTE: Be aware that these searches typically are extremely fast. You can see how a tree can allow for faster searching than going through a whole sequential file.

Now to move on to the more difficult parts: insertion and deletion. For Figure 6, let's add File-12. The first thing that is done when doing an insertion or deletion is to search for the entry that is being inserted or deleted. This is done for specific reasons:

  • Insertion - the entry must not already exist
  • Deletion - the entry must already exist

If the entry to be inserted exists, or the entry to delete does not exist, a message is generated. In the case of an indexed file listing for a file system, if a file exists that we are copying to the harddisk, we get a query asking to overwrite the file (it was found in the B-Tree). If we want to delete a file that does not exist, we get an error that the file does not exist.

Now, to get to the details of inserting File-12; if you look at Figure 6, we follow Pointer-2 to the next node. File-12 is greater than Key-1 (10) and less than Key-2 (15), so we follow Pointer-1. Now we find two leaf nodes (10 and 14). File-12 should be inserted between the two as shown in Figure 7. File-12 is placed between the leaves since it goes there, and since it falls between Leaf-10 and Leaf-14, no entries need to be made in a node.

figure-7-jpg.309

FIGURE 7

Now, let's look at the possibility of adding File-8. Looking at Figure 7, we can see the File-8 would be searched for, from the root, down Pointer-1. File-8 does not fit in this node anywhere, so another key must be made: Key-3 (8) as shown in Figure 8.

figure-8-jpg.310

FIGURE 8

Now we can try another. Let's add File-30. Following Pointer-2 from the root, we get to a node that has one space left, and File-30 is added there as shown in Figure 9.

figure-9-jpg.311

FIGURE 9

What happens if we want to add File-40? Well, if the B-tree is an order 5, we can only have five pointers per node. By adding File-40, we would create a node with more than five pointers. To accomplish the insertion, we take the node that is full and remove half the keys and pointer pairs. These entries will then be placed in a new node. All associated leaves will be moved as well, as shown in Figure 10.

NOTE: Each node must contain at least 2 keys (the root is the exception).

figure-10-jpg.312

FIGURE 10

The keys (22 and 30) are moved to a new node. The largest leaf value (20) is added to the previous node Key-3 so we now have a new high end for the node. Key-2 will now become File-40 when inserting the new key. The first Key (30) of the new node must be placed in the root and a pointer associated with it. As you can see, File-22 and File-27 are placed in leaf nodes with Pointer-0 pointing to it.

NOTE: When something changes, the effect can "ripple" up to the root node. This is extremely true for large B-Trees which may have fuller nodes.

Looking at Figure 10, if one of the following entries were to be deleted (6, 9, 12, 14, 22, or 27), these could be removed with no further actions. A search would be performed and once the entry was found, the leaf would be deleted.

For example, to delete File-9 from the B-Tree would result in Figure 11.

figure-11-jpg.313

FIGURE 11

Now, let’s look at what happens if we remove File-5. File-5 can easily be removed, but it is also a key. Here the key would be removed as well. Keys (7 and 8) to the right of the key would be moved to the left. Any leaf nodes not being removed (6) would be moved to be with the leaves (3) to the left as well as shown in Figure 12.

figure-12-jpg.314

FIGURE 12

If we removed File-30, the same would happen, but the key in the root would change to the new Key-1 for the node as shown in Figure 13.

figure-13-jpg.315

FIGURE 13

If we also removed File-40, the last node would be removed as well as Key-3 in the root node as shown in Figure 14. The leaves 22 and 27 can be moved to the left node.

figure-14-jpg.302

FIGURE 14

The difference between the B-Tree and B+Tree is that a B+Tree allows for data to be stored in the leaves only, while a B-Tree can store data in the Nodes. B+Trees can also store keys with the same data to allow for redundant data, but B-Trees cannot do this.

Note: Another type of Tree is the H-Tree. The H-Tree is the same as a B+Tree except that the keys are not a file name, directory name or whatever is being searched, but the keys are hashes. A hash is made of the key being placed into the H-Tree.

Sursa: Trees, B-Trees, B+Trees and H-Trees | Linux.org

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.



×
×
  • Create New...