The complete range of next generation graphics hardware from ATI features native PCI Express
support. This means the PCI Express interface is integrated into the VPU, rather than being
supported on an external chip (sometimes referred to as a “bridge”). Native PCI Express support is
necessary to expose the full available bandwidth in both the upstream and downstream directions
simultaneously. Software can take advantage of this capability for improved performance in a
number of different applications, such as video editing, video capture, simulation, and games.
However, AGP upstream transfers to system memory are generally required to be cacheable to
achieve acceptable performance levels. This requires a large number of additional transactions to
maintain cache coherence, and limits upstream transfers to using PCI semantics, with a peak
throughput of just 266 MB/sec. Frequent switching between upstream and downstream modes also
introduces additional latency into AGP transfers, further degrading performance.
PCI Express uses two uni-directional links, one for upstream transfers and one for downstream
transfers. For PCI Express x16, each link consists of 16 lanes, providing a peak theoretical
bandwidth of 4 GB/sec in each direction. Thus, PCI Express offers significantly higher bandwidth in
both the upstream and downstream directions than AGP 8X. Furthermore, it supports full duplex
transfers (i.e. both directions simultaneously), while AGP can only support transfers in one direction
at a time. Note that this important capability is only supported in native PCI Express designs (see
Figure 1).
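The 4 GB/sec figure can be reproduced with a short calculation. This is a minimal sketch: the per-lane signalling rate of 2.5 gigatransfers per second and the 8b/10b line encoding are the commonly cited first-generation PCI Express parameters, not values stated in this paper, and the function name is purely illustrative.

```python
# Assumed first-generation PCI Express parameters (not stated in the text).
RAW_RATE_GT_PER_LANE = 2.5      # gigatransfers per second, one bit per transfer
ENCODING_EFFICIENCY = 8 / 10    # 8b/10b line code: 8 data bits per 10 bits sent


def link_bandwidth_mb_per_s(lanes: int) -> float:
    """Peak theoretical data bandwidth of ONE direction of a link, in MB/sec."""
    data_bits_per_s = RAW_RATE_GT_PER_LANE * 1e9 * ENCODING_EFFICIENCY
    bytes_per_s_per_lane = data_bits_per_s / 8
    return lanes * bytes_per_s_per_lane / 1e6


x16 = link_bandwidth_mb_per_s(16)
print(f"x16, each direction: {x16:.0f} MB/sec")
print(f"x16, both directions at once: {2 * x16:.0f} MB/sec")
```

Under these assumptions each lane carries 250 MB/sec per direction, so sixteen lanes give the 4 GB/sec quoted above. Because the two links are independent, the combined figure only applies when traffic flows both ways at once.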
[Figure: PCI Express x16 links, downstream 4 GB/sec and upstream 4 GB/sec]
Video capture applications can use this to record video from a capture port to a file on the hard drive
without monopolizing the bus. Because this involves a steady stream of data transfer in the upstream
direction, it can seriously interfere with downstream transactions on AGP systems, such as those
required for playing games (see Figure 2). PCI Express is capable of gracefully handling both of
these tasks at the same time, making possible true background recording.
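The difference between a bus that carries one direction at a time and a pair of independent links can be illustrated with a toy timing model. This is a rough sketch only. The workload sizes are hypothetical, the AGP 8X downstream peak of about 2133 MB/sec is an assumed figure that does not appear in this paper, and the model ignores protocol overhead and the mode-switching latency mentioned earlier.

```python
def half_duplex_seconds(down_mb, up_mb, down_rate, up_rate):
    """One direction at a time: the two transfer times add up."""
    return down_mb / down_rate + up_mb / up_rate


def full_duplex_seconds(down_mb, up_mb, down_rate, up_rate):
    """Independent links run concurrently: the slower direction sets the time."""
    return max(down_mb / down_rate, up_mb / up_rate)


# Hypothetical workload: game data going downstream, captured video going upstream.
DOWN_MB, UP_MB = 1000.0, 200.0

# Rates in MB/sec. 266 (AGP upstream) and 4000 (PCI Express x16) come from the
# text; 2133 (AGP 8X downstream peak) is an assumption.
agp = half_duplex_seconds(DOWN_MB, UP_MB, down_rate=2133.0, up_rate=266.0)
pcie = full_duplex_seconds(DOWN_MB, UP_MB, down_rate=4000.0, up_rate=4000.0)

print(f"AGP-style bus, one direction at a time: {agp:.2f} s")
print(f"PCI Express-style links, both at once:  {pcie:.2f} s")
```

In this model the upstream capture traffic adds directly to the time the downstream traffic must wait on the shared bus, while on the full-duplex links it is hidden entirely behind the longer downstream transfer.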
take advantage of PCI Express for more optimal performance. Figure 4 illustrates the data
flows in this type of system.
[Figure 4: Data flows among system memory, the CPU, the north bridge, and the VPU (vertex processing engine and pixel processing engine); fast simultaneous upstream/downstream transfers required]
[Figure: Millions of users (0 to 60) by year, 2001 to 2006]
The latest revolution in consumer video technology is the transition to High Definition (HD) video,
which is raising the bar and setting new standards for visual fidelity. The two key characteristics of
HD video that separate it from Standard Definition (SD) video are its much higher resolutions and
wide screen aspect ratios.
The most common HD formats are 720p (1280x720 progressive) and 1080i (1920x1080 interlaced),
both of which use a 16:9 aspect ratio. In contrast, analog SD video uses a 480i (640x480 interlaced)
format, and digital SD supports a progressive version of this resolution, both using a 4:3 aspect ratio.
Progressive format video updates each scan line in sequence, while interlaced video updates the odd
scan lines on one pass, and the even scan lines on the next. With up to 7x the number of pixels and
a 33% wider display than SD, HD video provides crystal clear images (see Figure 5).
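The "up to 7x" and "33% wider" figures follow from the resolutions and aspect ratios quoted above. The short check below uses only those values; the function names are illustrative.

```python
from fractions import Fraction

SD = (640, 480)                                   # SD resolution from the text
HD_FORMATS = {"720p": (1280, 720), "1080i": (1920, 1080)}


def pixel_ratio(hd, sd=SD):
    """Number of pixels in an HD frame relative to an SD frame."""
    return (hd[0] * hd[1]) / (sd[0] * sd[1])


def relative_width(hd_aspect=Fraction(16, 9), sd_aspect=Fraction(4, 3)):
    """Width of a 16:9 picture relative to a 4:3 picture of the same height."""
    return hd_aspect / sd_aspect


for name, size in HD_FORMATS.items():
    print(f"{name}: {pixel_ratio(size):.2f}x the pixels of SD")

extra_width = (relative_width() - 1) * 100
print(f"16:9 is {float(extra_width):.1f}% wider than 4:3 at the same height")
```

The 1920x1080 format works out to 6.75 times the pixels of 640x480, which the text rounds to "up to 7x", and 16:9 divided by 4:3 is exactly 4/3, or one third wider.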
Over 99% of U.S. TV households are currently able to receive digital TV signals. All of the major U.S.
networks are airing most of their top-rated programs in HD, and FCC legislation requires that all
broadcasters provide a digital signal by 2006. These trends have driven consumers to spend record
amounts on new HD digital video equipment. Soon, HD will replace SD just as color television
replaced black & white, and will be a market requirement for all new video devices.
For PCs, the problem is that handling HD content has much higher system requirements than SD
content. The greater amounts of data involved require significantly greater bandwidth than most of
today’s PCs can provide. For example, a 1080p HD video stream requires 12 times as much
bandwidth as a 480i SD video stream, and in fact exceeds the maximum available upstream
bandwidth of the AGP 8X interface (see Figure 6).
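The scale of the problem can be estimated with a back-of-the-envelope calculation of uncompressed stream bandwidth. This is a sketch under stated assumptions: 4 bytes per pixel and 60 frames per second for 1080p are illustrative choices not given in the text, so the results will not necessarily match Figure 6 or the 12x ratio quoted above. Only the 266 MB/sec AGP upstream limit comes from this paper.

```python
AGP_UPSTREAM_LIMIT_MB_S = 266.0   # peak AGP upstream throughput, from the text


def stream_mb_per_s(width, height, frames_per_s, bytes_per_pixel):
    """Bandwidth of an uncompressed video stream in MB/sec (1 MB = 10**6 bytes)."""
    return width * height * bytes_per_pixel * frames_per_s / 1e6


# Assumed parameters: 32-bit pixels and 60 progressive frames per second.
hd_1080p = stream_mb_per_s(1920, 1080, frames_per_s=60, bytes_per_pixel=4)

print(f"1080p stream:       {hd_1080p:.1f} MB/sec")
print(f"AGP upstream limit: {AGP_UPSTREAM_LIMIT_MB_S:.1f} MB/sec")
print("Exceeds AGP upstream limit:", hd_1080p > AGP_UPSTREAM_LIMIT_MB_S)
```

Changing the assumed pixel size or frame rate scales the result linearly, so the conclusion depends on the pixel format and frame rate actually used by the application.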
MPEG-2 or MPEG-4 compression can be used on HD video streams to make the bandwidth
requirements more manageable, but this then requires large amounts of processing power to
perform the necessary encoding and decoding. These requirements place heavy demands on the
CPU, the front side bus (system memory), the graphics processor and the graphics bus interface.
Without a system architecture optimized to move and process HD data, users will not be able to fully
experience the benefits of the HD revolution on their PCs.
[Figure 6: Megabytes per second of video (0 to 600) for analog, 480i, 480p, 720p, 1080i, and 1080p formats, with the AGP upstream bandwidth limit marked]
HD video editing is a particularly resource-intensive task. In a typical usage scenario, an HD video
camera would stream recorded video into a PC via a FireWire or USB interface, where it would be
stored on a hard drive. The CPU would then read the video stream into system memory, decode it,
and pass the uncompressed video data across the graphics bus interface to the VPU, which displays
it on a display device. Using a software application running on the CPU, the user then edits in special
effects. The necessary commands are passed to the VPU, which processes the video stream and
passes the edited version back to system memory. The CPU then compresses this data and stores it
back on the hard drive. The complex set of data transfers required for this process is illustrated in
Figure 7.
[Figure 7: Data transfers for HD video editing among the HD video camera, 1394/USB port, hard drive, south bridge, north bridge, system memory, CPU, and VPU, with simultaneous upstream and downstream transfers]
Other possible usage scenarios involve combining data from multiple streams during the editing
process, or editing while simultaneously capturing video to disk. Tasks like these can involve multiple
simultaneous HD streams going through the paths described above, further exacerbating the
limitations of AGP systems. Fortunately, PCI Express systems will provide the necessary capabilities
to make these types of applications a reality.
The goal for the next generation of video editing applications is real time editing of HD content. “Real
time” is typically considered to be 30-60 frames per second, so all of the necessary processing for each
frame must complete in roughly 33 milliseconds or less. Since AGP does not have enough upstream bandwidth to
support HD video transfers, the CPU is forced to do all of the work and then pass the final output
downstream across the AGP bus for display. In this case, the CPU must simultaneously
decompress, edit, blend, and compress multiple video streams.
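The per-frame time budget follows directly from the target frame rate. The small calculation below makes the relationship explicit; the function name is illustrative.

```python
def frame_budget_ms(frames_per_second: float) -> float:
    """Time available to process one frame, in milliseconds."""
    return 1000.0 / frames_per_second


for fps in (30, 60):
    print(f"{fps} frames per second -> {frame_budget_ms(fps):.1f} ms per frame")
```

All decoding, editing, blending, encoding, and bus transfers for a frame have to fit inside this budget, so any time spent waiting on the graphics bus comes directly out of the time left for processing.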
Even today’s fastest multi-threaded CPUs are not capable of handling these tasks at real time frame
rates. To make this possible, the VPU must help out by handling the editing and blending, as well as
by accelerating the encoding and decoding processes. This requires that multiple HD streams be
passed back and forth between the CPU and the VPU. In other words, it is an ideal situation for PCI
Express systems.
The first Pinnacle Systems demonstration uses several digital HD video streams at 720p resolution.
One can be streamed in from an HD video camera, while the others are streamed from the hard disk.
The streams are compressed in MPEG-2 format, and must first be decoded by the CPU. The
resulting uncompressed frames are written temporarily into system memory, and then streamed into
the VPU where they are combined and have a variety of 3D effects applied. The final output frames
are then simultaneously displayed on the screen and streamed back to the CPU, where they are
encoded and written to the hard drive.
On a system with an Intel 3.0 GHz CPU and ATI’s latest PCI Express graphics hardware, the Pinnacle
application was able to handle up to 4 simultaneous HD video streams while maintaining interactive
frame rates of over 30 frames per second. When run using a similarly equipped AGP 8X system, the
frame rate drops well below real-time levels (see Figure 8). Note the small difference between the
AGP 4X and AGP 8X systems; this clearly shows that PCI Express provides much greater benefits
than a simple increase in AGP bandwidth.
[Figure 8: Pinnacle Demo 1 (four 720p video streams; real-time 3D effects, including read-back to hard drive). Frames per second: 20.3 fps and 21.8 fps on the AGP systems (8% apart), 30.2 fps on the PCI Express system (38% higher); required real-time frame rate: 30 fps. AGP systems: Intel Grantsdale with Prescott 3.0 GHz CPU, DDR400 memory, ATI RADEON 9600 PRO graphics. PCI Express system: Intel i865 with Prescott 3.0 GHz CPU, DDR400 memory, ATI PCI Express graphics (equivalent to RADEON 9600 PRO).]
The second Pinnacle demonstration was designed to isolate graphics bus performance from CPU
performance. This was done by replacing the HD video streams with high resolution bitmaps, which
do not require any decoding or encoding. The result is that the PCI Express system was able to
run at up to double the frame rate of the comparable AGP 4X system in this test (see
Figure 9). Once again, the difference between AGP 4X and AGP 8X is relatively small.
[Figure 9: Pinnacle Demo 2 (four 720p bitmap streams; real-time 3D effects, including read-back to hard drive)]
These applications represent one of the first demonstrations of how PCI Express will change the PC
landscape over the next few years. PCI Express systems will become an integral component of the
digital home of the future.