
Moving on to another perspective for looking at problems and solving them, I shall address another major concern with STT-RAMs: the challenge of high write latency. While Shalin's perspective dealt with reducing high-cost writes through a hybrid architecture and an EWT scheme that removes redundant bits at the pre-write stage, I shall deal with a special characteristic of STT-RAMs, non-volatility, to address the write latency problem. In short, by relaxing non-volatility we can mitigate the dynamic energy to some extent and address the write latency problem, bringing STT-RAM closer to the characteristics of SRAM. There are several key ideas here, which I shall build up in a modular way so that each part is reasonably easy to understand.

Let's start by looking at a relative comparison from the paper that presented this idea [3]:

The dotted line represents the normalized optimum performance, and the black line represents the performance outline for SRAM. We find that SRAM still leads in terms of lower dynamic energy spent per read/write operation. Endurance has also reached near-perfect levels with SRAM, thanks to continued research on existing technology.

However, parameters like retention time, on-chip density, and leakage current are the major issues that affect the performance of an SRAM.

The two major issues that plague STT-RAMs are high write energies and write speeds significantly slower than SRAM's.

The first idea is to significantly relax the non-volatility of the device. To do so, we can simply reduce the planar area of the MTJ's free layer, which is essentially the storage element of the STT-RAM cell.

To provide a quick idea:


The retention time of an MTJ characterizes the expected time until a random bit-flip occurs and is determined by the thermal stability of the MTJ. High stability means the cell is unlikely to suffer random bit-flips, but it also makes the cell harder to write, requiring higher currents, more time, or both. Stability is estimated by the thermal factor, Δ, which is calculated from Equation 1 using the free-layer volume (V), the in-plane anisotropy field (Hk), the saturation magnetization (Ms), and the absolute temperature in kelvin (T):

Δ = (Hk · Ms · V) / (2 · kB · T)    (Equation 1)

where kB is the Boltzmann constant.
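As a quick numerical sketch of Equation 1, the following computes the thermal factor and the resulting retention time. All material parameters and the attempt period t0 below are illustrative placeholders, not values from the paper; the point is only that retention time falls off exponentially as the free-layer volume shrinks.

```python
import math

KB_ERG_PER_K = 1.380649e-16  # Boltzmann constant in erg/K (CGS units)
T0_SECONDS = 1e-9            # attempt period, commonly taken as ~1 ns (assumption)

def thermal_factor(hk_oe, ms_emu_cm3, volume_cm3, temp_k):
    """Equation 1: Delta = (Hk * Ms * V) / (2 * kB * T)."""
    return (hk_oe * ms_emu_cm3 * volume_cm3) / (2 * KB_ERG_PER_K * temp_k)

def retention_time(delta):
    """Expected time until a random bit-flip: t = t0 * exp(Delta)."""
    return T0_SECONDS * math.exp(delta)

# Illustrative sweep: shrinking the free-layer volume lowers Delta
# linearly, so retention time drops exponentially.
for scale in (1.0, 0.8, 0.6):
    d = thermal_factor(hk_oe=200, ms_emu_cm3=1000,
                       volume_cm3=scale * 1.7e-17, temp_k=300)
    print(f"volume scale {scale:.1f}: Delta = {d:.1f}, "
          f"retention ~ {retention_time(d):.3g} s")
```

Even a modest volume reduction moves retention from years down toward seconds, which is exactly the trade-off the relaxed-retention designs exploit.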

The setup for the enhancement is as shown:


The simulated results for these enhancements show that while read latency can be scaled to beat even SRAM, it comes at a cost: STT-RAM still cannot match the write latency of SRAM. To improve those results, an observation is leveraged: latency variations of less than one clock cycle do not affect performance, because most caches operate synchronously with the processor's clock. So, to match the read capacity of an SRAM design, only the cycle-based read latency needs to be the same. The idea then describes two procedures that improve write performance while matching, and even exceeding, the read performance of SRAM.
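The cycle-based argument can be sketched as follows; the clock rate and access latencies here are hypothetical, chosen only to show that sub-cycle differences disappear after rounding up to whole cycles:

```python
import math

def latency_cycles(latency_ns, clock_ghz):
    """Round an access latency up to whole clock cycles,
    which is all a synchronously clocked cache can observe."""
    period_ns = 1.0 / clock_ghz
    return math.ceil(latency_ns / period_ns)

# Hypothetical numbers at a 2 GHz clock (0.5 ns period): an STT-RAM
# read of 0.9 ns and an SRAM read of 0.7 ns both cost 2 cycles, so
# the sub-cycle gap between them does not affect performance.
print(latency_cycles(0.9, 2.0), latency_cycles(0.7, 2.0))
```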

So, essentially, the simulation process measures the maximum read and write performance separately, each of which negatively affects the other. A Pareto optimization technique is then used to settle on an optimal reference point [5]. Overall, this procedure achieves one cycle lower read latency than the write-optimized design and reduces the effective write latency by three cycles compared to a native SRAM approach with a similar configuration [3].
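A minimal sketch of the Pareto filtering step, over hypothetical (read, write) cycle-latency design points (the numbers are invented, not simulation results from the paper):

```python
def pareto_frontier(points):
    """Keep the (read_cycles, write_cycles) designs that no other design
    dominates: q dominates p if q is no worse in both and differs from p."""
    frontier = []
    for p in points:
        dominated = any(q[0] <= p[0] and q[1] <= p[1] and q != p
                        for q in points)
        if not dominated:
            frontier.append(p)
    return sorted(frontier)

# Hypothetical design points from sweeping the MTJ sizing:
designs = [(2, 12), (2, 10), (3, 7), (4, 7), (3, 6), (5, 5)]
print(pareto_frontier(designs))  # only the non-dominated trade-offs remain
```

Each surviving point trades read latency against write latency; the designer then picks the one point on the frontier that best matches the workload.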

However, since my focus is on improving write latency and reducing dynamic energy, I shall consider the write-optimized STT-RAM design, given that its read performance is identical.

To introduce the second idea, a brief understanding of the performance pitfalls is required. To keep pace with technology scaling, the simulations were carried out with aggressive die sizing, starting from existing sizes down to the much smaller sizes of future production levels. As the size of the MTJ decreases, the voltage across the MTJ increases, and the required switching current rises exponentially as the size is reduced. This in turn reduces memory retention, since non-volatility falls as the volume shrinks. For the most aggressive sizing, the retention time obtained was 26.5 s, which may not be long enough to retain data in L1 cache systems, whether in a standalone structure [3] or a hybrid structure [2].
Due to this, correctness issues from bit-flips may arise. To counter this, a simple DRAM-style refresh scheme was proposed [3]. However, it has a drawback: it assumes that every refresh detects an error and causes a writeback. This essentially means introducing a delay in the form of a refresh interval, within which every line's data is rewritten before the retention time expires.

So, to overcome this, a dynamic refresh scheme is proposed for both the hybrid and standalone designs. The DRAM-style refresh scheme used in the native design refreshes every block of data without regard to its contents. Additionally, another delay may be introduced, since any read/write operation must be stalled during a refresh operation.

To eliminate unnecessary refreshes, SRAM counters are used to track the lifespan of cache data blocks; a refresh is performed only on cache blocks whose data has reached its full lifespan. A data counter is assigned to each data block in the L1 cache to monitor its retention status. Only 512 4-bit counters are needed for a 32 KB L1 cache with 64-byte data blocks: in other words, a negligible overhead of less than 1 percent [2].
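The overhead figure follows directly from the cited configuration and can be checked with a few lines of arithmetic:

```python
def counter_overhead(cache_bytes, block_bytes, counter_bits):
    """Storage overhead of keeping one retention counter per cache block."""
    num_blocks = cache_bytes // block_bytes
    counter_bytes = num_blocks * counter_bits / 8
    return num_blocks, counter_bytes, counter_bytes / cache_bytes

blocks, cbytes, frac = counter_overhead(32 * 1024, 64, 4)
# 32 KB / 64 B = 512 blocks; 512 x 4 bits = 256 bytes, under 1% of the cache.
print(blocks, cbytes, f"{frac:.2%}")
```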
During a refresh operation, the block's data is read out into a buffer and then saved back to the same cache block. If a read request to that cache block arrives before the refresh finishes, the data is returned directly from the buffer, so there is no impact on the read response time of the cache. If a write request arrives, the refresh operation is terminated immediately and the write is executed; again, no penalty is introduced. In addition, the refresh interval for the data next written into that cache block is shortened.
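The buffer interactions described above can be sketched as follows; the class and method names are ours for illustration, not an implementation from the paper:

```python
class RefreshBuffer:
    """Sketch of the dynamic-refresh behavior: reads that arrive
    mid-refresh are served from the buffer, and a write aborts
    the in-flight refresh so it never stalls the pipeline."""

    def __init__(self):
        self.buffer = None
        self.refreshing = False

    def start_refresh(self, block_data):
        # The block's data is read into the buffer before being written back.
        self.buffer = block_data
        self.refreshing = True

    def read(self, cell_data):
        # A read during a refresh is answered from the buffer: no extra latency.
        return self.buffer if self.refreshing else cell_data

    def write(self, new_data):
        # A write terminates the refresh immediately; the write proceeds as usual.
        self.refreshing = False
        self.buffer = None
        return new_data

buf = RefreshBuffer()
buf.start_refresh(b"block-data")
assert buf.read(b"mid-rewrite-cell") == b"block-data"  # served from buffer
assert buf.write(b"new-data") == b"new-data"           # refresh aborted
```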

References:

[1] Architecture design with STT-RAM: Opportunities and challenges
Ping Chi, Shuangchen Li, Yuanqing Cheng, Yu Lu, Seung H. Kang, Yuan Xie
http://ieeexplore.ieee.org.pitt.idm.oclc.org/document/7427997/

[2] Multi retention level STT-RAM cache designs with a dynamic refresh scheme
Zhenyu Sun, Xiuyuan Bi, Hai (Helen) Li, Weng-Fai Wong, Zhong-Liang Ong, Xiaochun Zhu,
Wenqing Wu https://dl-acm-org.pitt.idm.oclc.org/citation.cfm?id=2155659

[3] Relaxing non-volatility for fast and energy-efficient STT-RAM caches
Clinton W. Smullen, Vidyabhushan Mohan, Anurag Nigam, Sudhanva Gurumurthi, Mircea R. Stan
http://ieeexplore.ieee.org.pitt.idm.oclc.org/document/5749716/

[4] STT-RAM vs. SRAM/eDRAM and Efficiency Analysis between Differing Cache
Configurations
Brandon Dziewior
http://cal.ucf.edu/3801-reports_summer_2016/STT-RAM_efficiency.pdf

[5] Reading on Pareto: https://web.stanford.edu/group/sisl/k12/optimization/MO-unit5-pdfs/5.8Pareto.pdf

[6] Energy Reduction for STT-RAM Using Early Write Termination
Ping Zhou, Bo Zhao, Jun Yang, Youtao Zhang
https://dl-acm-org.pitt.idm.oclc.org/citation.cfm?id=1687448
