Efficient Analytical Modeling of Whole-Program Data Cache Behavior

Efficient and Accurate Analytical Modeling of Whole-Program Data Cache Behavior

Published in IEEE TRANSACTIONS ON COMPUTERS

Authors:
Jingling Xue
Xavier Vera
Program Model
References: A reference is a static read or write in the program.
Memory Accesses: A memory access is the execution of a reference at a particular iteration of the loop nest enclosing the reference.
Affine Expressions: An expression in a program is affine if it has the form $c_1 I_1 + \dots + c_n I_n + b$, where $I_1, \dots, I_n$ are the loop variables of the $n$ enclosing loops (if any) and $c_1, \dots, c_n, b$ are compile-time or runtime constants.

Scope of Program
All calls are non-recursive.
The bounds of all loops are affine.
The IF conditionals guarding array references are affine.
The subscript expressions of array references are affine.

Cache Model
A cache configuration is identified by a triple (C, L, K): cache size C, line size L, and associativity K.
A memory line refers to a cache-line-sized block in memory, while a cache line refers to the actual cache block to which a memory line is mapped.
Reuse: self or group; temporal or spatial.
Element size: E.
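As an illustration of the (C, L, K) model, the sketch below maps a byte address to its memory line and cache set. It assumes sizes in bytes and the usual modulo placement (set = memory line mod number of sets); the class name `CacheConfig` and its method names are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheConfig:
    """Cache configuration (C, L, K); sizes in bytes (an assumption)."""
    C: int  # total cache size
    L: int  # line size
    K: int  # associativity

    @property
    def num_sets(self) -> int:
        # Each set holds K lines of L bytes.
        return self.C // (self.L * self.K)

    def memory_line(self, addr: int) -> int:
        # Index of the cache-line-sized block of memory containing addr.
        return addr // self.L

    def cache_set(self, addr: int) -> int:
        # Assumed modulo placement of memory lines onto cache sets.
        return self.memory_line(addr) % self.num_sets


cfg = CacheConfig(C=32 * 1024, L=32, K=2)
```

With these sample values the cache has 512 sets, so two addresses that are 512 * 32 bytes apart fall into the same set.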
Compilation Model
The base addresses of all arrays are
known statically. If A is an array variable,
the notation @A stands for its base
address.
The sizes of all arrays in all but their last dimension are known statically.
A load/store IR is generated to optimize the allocation of registers to index variables.
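To see why only the sizes of the leading dimensions are needed, here is a minimal sketch of a column-major (Fortran-style) address computation. The function name `element_address` and the sample values are hypothetical; the formula assumes 1-based subscripts and an element size E in bytes.

```python
def element_address(base, subscripts, leading_dims, E):
    """Byte address of A(s1, ..., sm) in column-major order.

    base         -- @A, the address of A(1, ..., 1)
    subscripts   -- 1-based subscripts (s1, ..., sm)
    leading_dims -- sizes of dimensions 1 .. m-1 (the last is not needed)
    E            -- element size in bytes
    """
    assert len(leading_dims) == len(subscripts) - 1
    offset, stride = 0, 1
    for k, s in enumerate(subscripts):
        offset += (s - 1) * stride
        if k < len(leading_dims):
            stride *= leading_dims[k]  # stride of the next dimension
    return base + E * offset
```

The result is affine in the subscripts, so affine subscript expressions yield addresses that are affine in the loop variables.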
Program Analysis
In a K-way set-associative cache with LRU replacement, these two statements are true.
1) A memory access $m_a$ to memory line $l$ is a cold miss if $l$ is accessed for the first time.
2) Let $m_b$ be the most recent previous access (MRPA) also to $l$. Then $m_a$ is a replacement miss if there are K or more distinct memory lines that are accessed between $m_b$ and $m_a$ and are also mapped to the same cache set as $l$, and a hit otherwise.
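The two statements can be applied directly to an address trace. The sketch below is a brute-force illustration, not the paper's analytical method: it assumes byte addresses, modulo set placement, and a hypothetical function name `classify`.

```python
def classify(trace, L, num_sets, K):
    """Classify each byte address in `trace` as 'cold', 'replacement' or 'hit'."""
    last_pos = {}   # memory line -> index of its most recent access (the MRPA)
    lines = []      # memory line of every access seen so far
    result = []
    for pos, addr in enumerate(trace):
        line = addr // L
        if line not in last_pos:
            result.append("cold")          # first access to this memory line
        else:
            # Memory lines touched strictly between the MRPA and this access.
            between = lines[last_pos[line] + 1 : pos]
            conflicts = {m for m in between if m % num_sets == line % num_sets}
            result.append("replacement" if len(conflicts) >= K else "hit")
        last_pos[line] = pos
        lines.append(line)
    return result
```

For example, with L = 32 and two sets, the trace `[0, 64, 0, 32, 0]` touches lines 0, 2, 0, 1, 0; lines 0 and 2 share a set, so the second access to line 0 is a replacement miss in a direct-mapped cache (K = 1) but a hit when K = 2.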

Abstract Inlining
How is CALL foo() inlined?
Base address: @A = A(1, 1, ..., 1)
Loop Nest Normalization
Loop sinking moves all statements into their innermost loops.
Using IFs (with U = L), all loop nests are made n-dimensional.
All loop variables at depth k are normalized to $I_k$.
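As a rough illustration of loop sinking, the sketch below moves a statement S1 from the outer loop body into the inner loop under an IF guard and records the resulting statement trace. The function names and the statements S1 and S2 are hypothetical, and the transformation as written assumes the inner loop runs at least once (m >= 1).

```python
def original(n, m):
    trace = []
    for i in range(1, n + 1):
        trace.append(("S1", i))          # statement outside the inner loop
        for j in range(1, m + 1):
            trace.append(("S2", i, j))
    return trace


def sunk(n, m):
    trace = []
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if j == 1:                   # guard: run S1 only on the first inner iteration
                trace.append(("S1", i))
            trace.append(("S2", i, j))
    return trace
```

Both versions execute the same statements in the same order, but in the second one every statement sits in the innermost loop.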
Memory Access Vectors
Loop Vectors: Each n-dimensional loop nest is identified by the loop vector of its innermost loop, $(l_1, \dots, l_n)$, where $l_k$ means that the kth loop of the nest is the $l_k$th (counted from 1) among all loops enclosed in the (k-1)st loop.
Iteration Vectors: The execution of an n-dimensional nest when $I_1 = i_1, \dots, I_n = i_n$, known as an iteration, is identified by the iteration vector $(i_1, \dots, i_n)$.
Memory Access Vectors: Let R be a reference in the nest $L = (l_1, \dots, l_n)$. The access of R at the iteration $I = (i_1, \dots, i_n)$ of the nest is identified uniquely by the memory access vector $(l_1, i_1, \dots, l_n, i_n)$.
Example
Memory Accesses
Lexicographic order of access vectors
The set M of all memory accesses possesses the (strict) total order <, known as the lexicographic or dictionary order. This order statically specifies the temporal order in which all accesses in the program are executed. In other words, the accesses in M ordered by < give precisely the address trace of the program, obtained at compile time.
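A small sketch of how interleaved vectors recover execution order: loop and iteration vectors are merged into $(l_1, i_1, \dots, l_n, i_n)$ and then sorted. The helper name `access_vector` and the two-nest example program are hypothetical.

```python
def access_vector(loop_vector, iteration_vector):
    """Interleave (l1..ln) and (i1..in) into (l1, i1, ..., ln, in)."""
    assert len(loop_vector) == len(iteration_vector)
    vec = []
    for l, i in zip(loop_vector, iteration_vector):
        vec.extend((l, i))
    return tuple(vec)


# Hypothetical program: an outer loop I1 = 1..2 enclosing two inner loops,
# each running I2 = 1..2. The two nests have loop vectors (1, 1) and (1, 2).
accesses = [
    access_vector((1, l2), (i1, i2))
    for l2 in (1, 2)
    for i1 in (1, 2)
    for i2 in (1, 2)
]
trace_order = sorted(accesses)  # Python tuples compare lexicographically
```

After sorting, all accesses of the first inner loop for a given I1 precede those of the second inner loop for the same I1, which matches the order in which the program would execute them.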

Notation
The reuse analysis is concerned with finding a good approximation of the function $\mathrm{ipred}_R$ for every reference R in the program.

Here a and $\mathrm{ipred}_R(a)$ play exactly the roles of the access $m_a$ and its MRPA $m_b$, respectively.
Constructing $\mathrm{ipred}_{R,R}$
$i_k$ are symbolic constants.
$j_k$ are constraint variables.
Involves PIP (parametric integer programming).
Constructing MRPA
Uniformly Generated References: Let A be an m-D array. Two references $A(H_1 I + c_1)$ and $A(H_2 I + c_2)$ (inside the same or distinct n-dimensional nests) are uniformly generated if $H_1 = H_2$, where $I = (i_1, \dots, i_n)$.
Uncoupled References: A reference $A(H I + c)$, or its matrix H, is uncoupled if each row of H has at most one nonzero component, where $I = (i_1, \dots, i_n)$.
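Both definitions are simple tests on the subscript matrix H. The sketch below represents H as a list of rows; the function names and the sample references are hypothetical.

```python
def is_uniformly_generated(H1, H2):
    """Two references to the same array are uniformly generated if H1 == H2."""
    return H1 == H2


def is_uncoupled(H):
    """Uncoupled: every row of H has at most one nonzero component."""
    return all(sum(1 for x in row if x != 0) <= 1 for row in H)


# A(i1, i2) and A(i1 + 1, i2 - 1) differ only in the constant vector c.
H_a = [[1, 0], [0, 1]]
H_b = [[1, 0], [0, 1]]
# A(i1 + i2, i2): the first row couples i1 and i2.
H_c = [[1, 1], [0, 1]]
```

Here `H_a` and `H_b` are uniformly generated and uncoupled, while `H_c` is neither uniformly generated with `H_a` nor uncoupled.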

More Assumptions
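Only uniformly generated references are handled.
Two memory accesses cannot touch the same memory line when their third indices differ, i.e., C < 3 (here C is the maximum number of array columns spanned by a memory line, not the C in the cache triple (C, L, K)).
No pointers; all call parameters are passed by reference (Fortran 77).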

Underlying Theory
L/E is the number of elements in a memory line.

The following conclusions are a direct
consequence of this definition:

Nonlinear constraint: $\mathrm{ml}_R(a) = \mathrm{ml}_R(b)$ is converted to
$\mathrm{ma}_R(a) - \mathrm{ma}_R(b) = l$, where $|l| < L/E$,
which is a linear constraint.
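A quick numerical check of why the linear form is usable. The sketch assumes that ma is an address measured in elements and that the memory line is obtained by floor division by L/E; both are illustrative assumptions, and the function names and sample sizes are hypothetical.

```python
L, E = 32, 8              # line size and element size in bytes (sample values)
per_line = L // E         # L/E elements per memory line


def ml(ma):
    """Memory line of an element address `ma` (assumed: floor division)."""
    return ma // per_line


def same_line_implies_close(limit=64):
    """Check that ml(x) == ml(y) implies |x - y| < L/E for all x, y < limit."""
    return all(
        abs(x - y) < per_line
        for x in range(limit)
        for y in range(limit)
        if ml(x) == ml(y)
    )
```

Under these assumptions the bound on |l| is necessary but not sufficient on its own: element addresses 3 and 4 are one apart yet lie on different memory lines.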

The authors define C to be the maximum number of array columns spanned by a memory line.

Based on the above notation, we have:

Due to the assumption on the values C can take (C < 3), we have:

From the above equations, we obtain:

Cache Miss Specification
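The access a of R is a cold miss if $\mathrm{MRPA}_R(a) = \mathrm{NULL}$.

If $\mathrm{MRPA}_R(a) \neq \mathrm{NULL}$, the access a of R is a replacement miss if the constraint system selecting the accesses between $\mathrm{MRPA}_R(a)$ and a that map to the same cache set has at least K distinct solutions $m_1, \dots, m_K$ (distinct memory lines), and a hit otherwise.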

Results
Comparison with our approach
Xue et al. calculate the MRPA corresponding to each memory access for the cache miss specification.
We compute the cache miss specification for each memory reference corresponding to each cache line.
Both formulations are based on the concept of memory access vectors, which determine the order of memory accesses.

THANK YOU
