COSC 757 - Research Article Summary for:

Efficient frequent pattern mining based on Linear Prefix Tree
Published: 24 October 2013 in Knowledge-Based Systems
Dave Borncamp
This article addresses the problem of making frequent pattern mining algorithms
efficient. Because many algorithms are not efficient in both memory and runtime,
efficiency can become a major concern for large datasets. Since datasets in all fields
continue to grow, this concern will only increase for anyone who wishes to find patterns
in their data. The article proposes a new data structure that melds ideas from the Apriori
and FP-growth algorithms to optimize both memory usage and runtime. It also provides
general algorithms for retrieving data from this new structure.
Many other algorithms and data mining structures rely heavily on pointers and
auxiliary data structures, which makes them effective but inefficient. The proposed
approach, termed the Linear Prefix Tree (LP-tree) because its structure resembles a tree,
stores nodes as data arrays to minimize pointers between nodes and keeps only the minimum
information required for mining. While many other approaches use similar tree structures,
the proposed approach allows linear access to corresponding nodes. This means less memory
is spent storing pointers to other data, and fewer CPU cycles are needed to find the
necessary data.
As the name suggests, the LP-tree has a tree structure consisting of three parts: the
header list, the Linear Prefix Node and the Branch Node List. The header list contains the
item names, number of occurrences and node links. The Linear Prefix Node (LPN) stores
the frequent items of each transaction and a separate header for the node. There can be
multiple internal nodes and headers in the top node, allowing related data to be easily
accessed. The LPN has this structure:
LPN = {[Parent Link], [i1; S; L; b], [i2; S; L; b] . . . ; [in; S; L; b]}
The Branch Node List (BNL) includes information on its child nodes. The BNL is able to
manage all of the branch nodes when they have child nodes. The whole LP-tree takes this
form when there is only one child node:
LP-tree = {Headerlist; BNL; LPN1; LPN2; . . . ; LPNc}
If the tree has more than one level of nodes, its structure can be visualized as in
Figure 1.
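To make the three parts more concrete, here is a minimal Python sketch of an LP-tree-like
structure. It is not the authors' implementation (the paper's code is in C++), and several
details are assumptions: the summary does not define S, L and b, so they are read here as
support count, node link and branch flag. The names Entry, LPN, LPTree and insert are
hypothetical, and the BNL is simplified to a dictionary from a branching slot to its child
LPNs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Pos = Tuple[int, int]  # (index of the LPN in the tree, slot inside that LPN)

@dataclass
class Entry:
    item: str                    # i: item name
    support: int = 0             # S: assumed to be the support count
    link: Optional[Pos] = None   # L: assumed link to another slot with the same item
    branch: bool = False         # b: assumed flag, True when children are in the BNL

@dataclass
class LPN:
    parent: Optional[Pos]        # [Parent Link]; None means the node hangs off the root
    entries: List[Entry] = field(default_factory=list)

@dataclass
class LPTree:
    header: Dict[str, list] = field(default_factory=dict)   # item -> [count, first Pos]
    bnl: Dict[Optional[Pos], List[int]] = field(default_factory=dict)  # slot -> child LPNs
    lpns: List[LPN] = field(default_factory=list)

def insert(tree: LPTree, items: List[str]) -> None:
    """Insert one transaction whose items are already filtered and ordered."""
    for it in items:
        tree.header.setdefault(it, [0, None])[0] += 1
    pos: Optional[Pos] = None    # None stands for the virtual root
    i = 0
    while i < len(items):
        # Case 1: the next array slot of the current LPN continues the prefix.
        if pos is not None:
            n, s = pos
            slots = tree.lpns[n].entries
            if s + 1 < len(slots) and slots[s + 1].item == items[i]:
                slots[s + 1].support += 1
                pos, i = (n, s + 1), i + 1
                continue
        # Case 2: a child LPN listed in the BNL starts with the item.
        for c in tree.bnl.get(pos, []):
            first = tree.lpns[c].entries[0]
            if first.item == items[i]:
                first.support += 1
                pos, i = (c, 0), i + 1
                break
        else:
            # Case 3: no match, so the remaining items go into one new LPN.
            c = len(tree.lpns)
            node = LPN(parent=pos)
            for k, it in enumerate(items[i:]):
                node.entries.append(Entry(it, 1, link=tree.header[it][1]))
                tree.header[it][1] = (c, k)   # thread the node link via the header
            tree.lpns.append(node)
            tree.bnl.setdefault(pos, []).append(c)
            if pos is not None:
                tree.lpns[pos[0]].entries[pos[1]].branch = True
            return

tree = LPTree()
for t in (["a", "b", "c"], ["a", "b", "d"], ["a", "b", "c"], ["e"]):
    insert(tree, t)
print(len(tree.lpns), [e.support for e in tree.lpns[0].entries])
```

In this sketch a whole transaction suffix is stored in one array, so pointers are needed
only where a branch occurs, which is the memory saving the summary describes.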

The algorithm presented for accessing this new data structure first selects the bottom
item from the header and traverses the tree to the appropriate node. Each subsequent
node can be accessed directly if the search is contained within one LPN, thereby
increasing the efficiency of the data retrieval. It then moves to other nodes and
continues looking for patterns. Once the header refers to the root node, the growth
operation stops as the path has already been searched completely.
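The bottom-up walk described above can be sketched as follows. This is an illustration
under assumptions, not the paper's LP-growth code: the tree is hand-built from plain
dictionaries, the header is simplified to a list of positions per item, and the function
names prefix_path and conditional_pattern_base are hypothetical. Only the path-collection
step is shown, not the full recursive growth.

```python
# Each LPN is an array of items with parallel supports plus one parent link.
# A position is (LPN index, slot); parent None means the LPN hangs off the root.
lpns = [
    {"parent": None,   "items": ["a", "b", "c"], "support": [3, 3, 2]},
    {"parent": (0, 1), "items": ["d"],           "support": [1]},
    {"parent": None,   "items": ["b", "d"],      "support": [1, 1]},
]
# Simplified header: every position at which each item occurs.
header = {
    "a": [(0, 0)],
    "b": [(0, 1), (2, 0)],
    "c": [(0, 2)],
    "d": [(1, 0), (2, 1)],
}

def prefix_path(lpns, pos):
    """Return the items above `pos`, ordered from the root downwards."""
    n, s = pos
    path = []
    while True:
        node = lpns[n]
        # Inside one LPN the prefix is a contiguous slice: plain index
        # arithmetic, no pointer is followed.
        path.extend(reversed(node["items"][:s]))
        if node["parent"] is None:
            break                          # reached the root, stop growing
        n, s = node["parent"]              # one pointer hop per LPN boundary
        path.append(lpns[n]["items"][s])   # the branching slot is on the path
    path.reverse()
    return path

def conditional_pattern_base(lpns, header, item):
    """Pair each prefix path of `item` with the support stored at that slot."""
    return [(prefix_path(lpns, (n, s)), lpns[n]["support"][s])
            for n, s in header[item]]

# Start from the bottom item of the header, as the summary describes.
print(conditional_pattern_base(lpns, header, "d"))
```

For item d the walk touches a parent link only once, when leaving the second LPN; every
other step is an array slice.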
The paper then evaluates the performance of the new data structure and algorithm against
other "state-of-the-art" algorithms, with all algorithms written in C++ and run on the
same system. As described earlier, LP-growth uses a linear structure in its trees to
minimize the time needed to access and search nodes. As shown in Figure 2, this reduces
runtime and memory usage, especially as the minimum support threshold becomes lower.
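As a small illustration of why low minimum support thresholds are the demanding case, the
sketch below counts frequent itemsets by brute force on a toy dataset. It is not LP-growth
and does not reproduce the paper's experiments; the transactions and the function name
frequent_itemsets are made up for the example. It only shows that lowering the threshold
increases the number of patterns any miner has to produce.

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Brute force: count every itemset, keep those meeting the threshold."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                counts[combo] += 1
    return {s: c for s, c in counts.items() if c >= min_support}

# Toy transactions, invented for this example.
transactions = [["a", "b", "c"], ["a", "b", "c"], ["a", "b", "d"], ["b", "d"]]

for threshold in (3, 2, 1):
    found = frequent_itemsets(transactions, threshold)
    print(threshold, len(found))
```

On real datasets the growth is far steeper, which is why the cost of each node access
matters more as the threshold drops.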

Figure 1 - LP-tree with more than one level of nodes.

Overall, this new structure and algorithm do show improved runtime and memory
performance compared to other pattern matching algorithms. Because the structure uses
linear arrays and fewer pointers, it is able to significantly improve performance. While
the scope of testing in this paper is somewhat limited, the approach should be applicable
to other types of pattern matching.
This paper could be improved first by having the authors proofread it; there were a few
spelling and grammatical mistakes that detracted from the point the article was trying
to convey. The authors could also expand their test sets to include closed-maximal pattern
matching, and extend the proposed structure and algorithm to cover top-k pattern mining
and graph mining. Taken as a whole, this was an interesting paper, as it proposes something
relatively simple to solve a complex problem. It seems obvious that both memory and
runtime could be optimized by reducing the number of pointers and structures used within
an algorithm. This approach takes that simple idea, implements it and tests it to confirm
this result.

Figure 2 - Runtime and memory test for the Kosarak dataset.

References
Gwangbum Pyun, Unil Yun, Keun Ho Ryu, Efficient frequent pattern mining based on
Linear Prefix tree, Knowledge-Based Systems 55 (2014), pp. 125-139
