-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathd8-trees.tex
236 lines (207 loc) · 8.62 KB
/
d8-trees.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
\section{Trees}
\label{sec:trees}
When we use a list for storing our data in a dynamic structure,
there are two limitations:
\begin{itemize}
\item Finding an element in the list takes a long time, proportional
to the length of the list.
\item Keeping the list's elements sorted also takes a lot of
time: you must find the place where the new element should be, and
then insert it.
\end{itemize}
Both seeking an element and inserting an element in a list require a
time that is proportional to the length of the list. They will take
(in average) twice as long in a list twice as big.
We have seen that \emph{maps} provide a partial solution to the first
problem. By assigning a value with a key, they can access any element
immediately. However, maps have some problems too. Either you store
the keys on an array that you can access by index (i.e.~a hash of the
key) ---which is not dynamic---
or you need a linked list of keys ---which has similar problems: it
may require a lot of time to find a key---.
Generally speaking, maps are used for relatively low numbers of
elements. When the number of elements is very big, and it
is important to have them sorted, there is a better data structure:
the tree.
A tree in computing is similar to a tree in nature because it has a
root and it has branches. Like a linked list, a tree is composed of
elements. Elements where branches start are usually called
\emph{nodes} and elements at the end of branches are usually
\emph{leaves}. In some trees you have data on nodes and leaves, while
in others you have data only on leaves.
Trees are very important in computing. For example, the filesystem on
your hard disk, where you have stored all your files (including this
file you are reading now, your personal folders/directories, etc) has
a tree structure: folders are nodes and files are leaves.
Trees have a similar structure to lists, stacks, and queues; the main
difference is that each element points to two or more elements instead
of one. See this example:
\begin{figure}[hbtp]
\centering
\includegraphics[width=\textwidth]{gfx/list-tree}
\caption{In a singly-linked list, every element is connected to the
next one. In a binary tree, every element is connected to two next
elements. It is usual to draw trees going down (like when we
write in English) rather than going up (as trees in Nature).}
\label{fig:listree}
\end{figure}
\begin{verbatim}
public class IntegerListNode {
int value;
IntegerListNode next;
// ... methods would be here
}
(...)
public class IntegerTreeNode {
int value;
IntegerTreeNode left;
IntegerTreeNode right;
// ... methods would be here
}
\end{verbatim}
Trees where every node links to two other nodes are called
\emph{binary trees}. Binary trees are the most common types of
trees, but we can create trees of any cardinality (see the examples
below). Although we are going to focus on binary
trees in this section, everything we will learn can be applied to
any type of tree.
{\small % If not small, does not fit
\begin{verbatim}
public class IntTernaryTreeNode { public class IntArbitraryTreeNode {
int value; int value;
IntegerTernaryTreeNode left; IntArbitraryTreeNode[] children;
IntegerTernaryTreeNode center; // ... methods would be here
IntegerTernaryTreeNode right; }
// ... methods would be here
}
\end{verbatim}
}
\subsection{Adding elements to a tree}
\label{sec:adding-elements-tree}
Trees have a structure that makes it very easy to keep the data
sorted. This is important because it is faster to find a specific
piece of data when everything in order..
When we want to add a new element to a tree we start at the root. Then
we check whether we want to add it to the right or to the left. Then
we continue the process on that branch. Let's see an example based on
class \verb+IntegerTreeNode+ above.
\begin{verbatim}
public void add(int newNumber) {
if (newNumber > this.value) {
if (right == null) {
right = new IntegerTreeNode(newNumber);
} else {
right.add(newNumber);
}
} else {
if (left == null) {
left = new IntegerTreeNode(newNumber);
} else {
left.add(newNumber);
}
}
}
\end{verbatim}
You can see that the process is symmetrical: we decide to go left or
right based on the new number to add, and then we continue on that
side of the tree. This leaves the tree automatically sorted: ``lowers''
on the left, ``highers'' on the right, at every level of the tree. If
there is nothing in the branch, we add our new element; otherwise, we
compare again.
Note that the first element of the tree (i.e.~the
root) may need to be handled as a special case (as with
lists). Figure~\ref{fig:treecr} shows how a tree is created by adding
some new nodes.
\begin{figure}[hbtp]
\centering
\includegraphics[width=\textwidth]{gfx/tree-creation}
\caption{Creation of a tree based on the class IntegerTreeNode and
its method add(int). The numbers are added unsorted as 6, 9, 5, 3,
11, 12, 8\ldots}
\label{fig:treecr}
\end{figure}
\subsection{Finding elements in a tree}
\label{sec:find-elem-tree}
The good thing about trees is that we do not need to go over the whole
list of elements to find the element we are looking for. We only need
to check at each node whether the node we are looking for is (a) in our
current node, or (b) under it to the left, or (c) under it to the right. If
nodes do not contain data, we just need to check whether the leaf we
are looking for is on the left or on the right. Look at the method
below, which checks whether a number has been added to a tree based on
class \verb+IntegerTreeNode+.
\begin{verbatim}
public boolean contains(int n) {
if (n == this.value) {
return true;
} else if (n > this.value) {
if (right == null) {
return false;
} else {
return right.contains(n);
}
} else {
if (left == null) {
return false;
} else {
return left.contains(n);
}
}
}
\end{verbatim}
There is no need to look at all the elements
(Figure~\ref{fig:fhsdhshgfhjsdg}). After every check, we
discard half of all remaining elements (assuming that the tree is
balanced, i.e.~has approximately the same number of elements under left
branches than under right branches). If the tree contains 1000
numbers, after the first check we have discarded 500, then we discard
250, then 125, etc. In other words, the time needed to find an element
in a tree is proportional to the logarithm (in base 2) of the size of
the tree\footnote{A number $l$ is the logarithm of $x$ in base $a$ if
$a^l = x$.}. Finding an element in a list of 15 elements will take at
most 15 comparisons, but if the list has 255 elements it will take at
most 255 comparisons; by comparison,
finding an element in a tree of 15 elements will
need at most 4 comparisons, but if the tree has 255 elements it will
take at most 8 comparisons. This is a real improvement for big amounts
of data!
\begin{figure}[hbtp]
\centering
\includegraphics[width=\textwidth]{gfx/tree-find}
\caption{Finding elements in a tree. Every time we check an element
to find the one we are looking for, we discard half of the
remaining elements. In this example, finding the element
containing number 8 requires only three comparisons.}
\label{fig:fhsdhshgfhjsdg}
\end{figure}
\subsection{Conclusion}
\label{sec:conclusion}
Tress are a useful data structure to keep data sorted with minimal
effort, in a simple and efficient manner. Having the data sorted at
all times makes it easier to find what we are looking for. On the
other hand, trees are more complex data structures than lists
so if the data does not need to be sorted, a list is preferrable.
\begin{table}[hbtp]
\centering
\begin{tabular}{|l|c|c|}
\hline
& List & Tree \\
\hline
Min. pointers / element & 1 & 2 \\
Auto-Sorted & No & Yes \\
Time needed & Linear & Logarithmic \\
\hline
\end{tabular}
\caption{Lists vs. Trees: lists are not sorted
while trees are always sorted; time to find an item in a list grows
linearly with the list's size, while the time required to find an
element in a tree grows logarithmicly with the size of the tree.
(Lists may be sorted, but it needs special insertion methods or
frequent re-sorting).
}
\label{tab:treelist}
\end{table}
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "main"
%%% End: