Data Structure Trees Explained

1 In computer science, a tree is a widely-used data structure that emulates a hierarchical tree structure with a set of linked nodes.
Mathematically, it is an ordered directed tree, more specifically an arborescence: an acyclic connected graph where each node has zero or more children nodes and at most one parent node. Furthermore, the children of each node have a specific order. A node is a structure which may contain a value, a condition, or represent a separate data structure (which could be a tree of its own). Each node in a tree has zero or more child nodes, which are below it in the tree (by convention, trees are drawn growing downwards). A node that has a child is called the child's parent node (or ancestor node, or superior). A node has at most one parent. An internal node (also known as an inner node or branch node[1]) is any node of a tree that has child nodes. Similarly, an external node (also known as an outer node, leaf node, or terminal node) is any node that does not have child nodes. The topmost node in a tree is called the root node. Being the topmost node, the root node will not have a parent. It is the node at which operations on the tree commonly begin (although some algorithms begin with the leaf nodes and work up ending at the root). All other nodes can be reached from it by following edges or links. (In the formal definition, each such path is also unique). In diagrams, it is typically drawn at the top. In some trees, such as heaps, the root node has special properties. Every node in a tree can be seen as the root node of the subtree rooted at that node. A free tree is a tree that is not rooted. The height of a node is the length of the longest downward path to a leaf from that node. The height of the root is the height of the tree. The depth of a node is the length of the path to its root (i.e., its root path). This is commonly needed in the manipulation of the various self balancing trees, AVL Trees in particular. Conventionally, the value 1 corresponds to a subtree with no nodes, whereas zero corresponds to a subtree with one node. A subtree of a tree T is a tree consisting of a node in T and all of its descendants in T. (This is different from the formal definition of subtree used in graph theory.[2]) The subtree corresponding to the root node is the entire tree; the subtree corresponding to any other node is called a proper subtree (in analogy to the term proper subset). [edit] Representations There are many different ways to represent trees; common representations represent the nodes as dynamically allocated records with pointers to their children, their parents, or both, or as items in an array, with relationships between them determined by their positions in the array (e.g., binary heap). [edit] Trees and graphs The tree data structure can be generalized to represent directed graphs by removing the constraints that a node may have at most one parent, and that no cycles are allowed. Edges are still abstractly considered as pairs of nodes, however, the terms parent and child are usually replaced by different terminology (for example, source and target). Different implementation strategies exist, for example adjacency lists. In graph theory, a tree is a connected acyclic graph; unless stated otherwise, trees and graphs are undirected. There is no one-to-one correspondence between such trees and trees as data structure. We can take an arbitrary undirected tree, arbitrarily pick one of its vertices as the root, make all its edges directed by making them point away from the root node - producing an arborescence - and assign an
2 order to all the nodes. The result corresponds to a tree data structure. Picking a different root or different ordering produces a different one. [edit] Traversal methods Main article: Tree traversal Stepping through the items of a tree, by means of the connections between parents and children, is called walking the tree, and the action is a walk of the tree. Often, an operation might be performed when a pointer arrives at a particular node. A walk in which each parent node is traversed before its children is called a pre-order walk; a walk in which the children are traversed before their respective parents are traversed is called a post-order walk; a walk in which a node's left subtree, then the node itself, and then finally its right sub tree are traversed is called an in-order traversal. (This last scenario, referring to exactly two subtrees, a left subtree and a right subtree, assumes specifically a binary tree.) A level-order walk effectively performs a breadth-first search search over the entirety of a tree; nodes are traversed level by level, where the root node is visited first, followed by its direct child nodes and their siblings, followed by its grandchild nodes and their siblings, etc., until all nodes in the tree have been traversed. Representations There are many different ways to represent trees; common representations represent the nodes as dynamically allocated records with pointers to their children, their parents, or both, or as items in an array, with relationships between them determined by their positions in the array (e.g., binary heap). What are the applications of trees in data structures? Using binary tree, one can create expression trees. The leaves of the expression tree are operands such as constants, variable names and the other node contains the operator (binary operator). this... B Trees B Trees were described originally as generalizations of binary search trees , where a binary tree is a 2node B-Tree, the 2 standing for two children, with 2-1 = 1 key separating the 2 children. Hence a 3node has 2 values separating 3 children, and a N node has N children separated by N-1 keys. A classical B-Tree can have N-node internal nodes, and empty 2-nodes as leaf nodes, or more conveniently, the children can either be a value or a pointer to the next N-node, so it is a union. The main idea with B-trees is that one starts with a root N-node , which is able to hold N-1 entries, but on the Nth entry, the number of keys for the node is exhausted, and the node can be split into two half sized N/2 sized N nodes, separated by a single key K, which is equal to the right node's leftmost key, so any entry with key K2 equal or greater than K goes in the right node, and anything less than K goes in the left. When the root node is split, a new root node is created with one key, and a left child and a right child. Since there are N children but only N-1 entries, the leftmost child is stored as a separate pointer. If the leftmost pointer splits, then the left half becomes the new leftmost pointer, and the right half and separating key is inserted into the front of the entries. An alternative is the B+ tree which is the most commonly used in database systems, because only values are stored in leaf nodes, whereas internal nodes only store keys and pointers to other nodes, putting a limit on the size of the datum value as the size of a pointer. This often allows internal nodes with more entries able to fit a certain block size, e.g. 4K is a common physical disc block size. Hence , if a B+ tree internal node is aligned to a physical disc block, then the main rate limiting factor
3 of reading a block of a large index from disc because it isn't cached in a memory list of blocks is reduced to one block read. A B+ tree has bigger internal nodes, so is more wider and shorter in theory than an equivalent B tree which must fit all nodes within a given physical block size, hence overall it is a faster index due to greater fan out and less height to reach keys on average. Apparently, this fan out is so important, compression can also be applied to the blocks to increase the number of entries fitting within a given underlying layer's block size (the underlying layer is often a filesystem block). Most database systems use the B+ tree algorithm, including postgresql, mysql, derbydb, firebird, many Xbase index types, etc. Many filesystems also use a B+ tree to manage their block layout (e.g. xfs, NTFS, etc). Transwiki has a java implementation of a B+ Tree which uses traditional arrays as key list and value list. Below is an example of a B Tree with test driver, and a B+ tree with a test driver. The memory / disc management is not included, but a usable hacked example can be found at Hashing Memory Checking Example. This B+ tree implementation was written out of the B Tree , and the difference from the transwiki B+ tree is that it tries to use the semantics of SortedMap and SortedSet already present in the standard Java collections library. Hence , the flat leaf block list of this B+ implementation can't contain blocks that don't contain any data, because the ordering depends on the first key of the entries, so a leaf block needs to be created with its first entry.

Data Structure Trees Explained

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Data Structure Trees Explained

Uploaded by

Copyright:

Available Formats

1 In computer science, a tree is a widely-used data structure that emulates a hierarchical tree structure with a set of linked nodes.

You might also like