DELIVERING THE NEXT GENERATION OF VIDEO PERFORMANCE WITH PCI EXPRESS™

Introduction
The year 2004 will mark the most significant update to the PC architecture of the past decade. PCI
Express is a new I/O technology designed to allow computer systems to scale to new levels of
performance. In addition to higher frequencies and bandwidth, it will offer many advantages over
existing AGP/PCI architectures, including simplified PCB routing, advanced power management,
robust error recovery, hot plugging and swapping, virtual channels and traffic classes. These
capabilities will move the PC platform forward by enabling new form factors, innovative designs, and
new classes of applications.

The complete range of next generation graphics hardware from ATI features native PCI Express
support. This means the PCI Express interface is integrated into the VPU, rather than being
supported on an external chip (sometimes referred to as a “bridge”). Native PCI Express support is
necessary to expose the full available bandwidth in both the upstream and downstream directions
simultaneously. Software can take advantage of this capability for improved performance in a
number of different applications, such as video editing, video capture, simulation, and games.

PCI Express and AGP: Key Differences


The AGP bus consists of a single bi-directional link, which must be shared by both upstream and
downstream data transfers. In this context, upstream refers to VPU → system memory transfers, and
downstream refers to system memory → VPU transfers. Peak bandwidth is 2.1 GB/sec, assuming
non-snooped, un-cacheable transfers. These are fine for downstream transfers, since the AGP bus
interfaces of graphics chips are designed to handle them efficiently without worrying about cache
coherency.

However, AGP upstream transfers to system memory are generally required to be cacheable to
achieve acceptable performance levels. This requires a large number of additional transactions to
maintain cache coherence, and limits upstream transfers to using PCI semantics, with a peak
throughput of just 266 MB/sec. Frequent switching between upstream and downstream modes also
introduces additional latency into AGP transfers, further degrading performance.

PCI Express uses two uni-directional links, one for upstream transfers and one for downstream
transfers. For PCI Express x16, each link consists of 16 lanes, providing a peak theoretical
bandwidth of 4 GB/sec in each direction. Thus, PCI Express offers significantly higher bandwidth in
both the upstream and downstream directions than AGP 8X. Furthermore, it supports full duplex
transfers (i.e. both directions simultaneously), while AGP can only support transfers in one direction
at a time. Note that this important capability is only supported in native PCI Express designs (see
Figure 1).
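
The peak figures quoted above can be approximated from basic link parameters. The following is a rough sketch only: the clock rates, bus widths, and 8b/10b encoding overhead are assumptions taken from the general bus specifications rather than from this paper, and the function names are hypothetical.

```python
# Back-of-the-envelope peak bandwidth estimates. The clock rates, bus widths
# and encoding overhead are assumptions based on the published bus
# specifications; they do not come from this paper.

def parallel_bus_mb_per_sec(clock_mhz, bus_width_bits, transfers_per_clock):
    """Peak throughput in MB/sec of a parallel bus such as PCI or AGP."""
    bytes_per_transfer = bus_width_bits / 8
    return clock_mhz * transfers_per_clock * bytes_per_transfer


def pcie_gen1_mb_per_sec(lanes):
    """Peak throughput in MB/sec for one direction of a PCI Express link.

    Assumes first-generation signaling: 2.5 gigatransfers/sec per lane with
    8b/10b encoding, so 8 of every 10 bits on the wire carry data.
    """
    raw_megabits_per_lane = 2500.0
    data_megabits_per_lane = raw_megabits_per_lane * 8 / 10
    return lanes * data_megabits_per_lane / 8


# Assumed: 32-bit bus on a 66.66 MHz clock, 8 transfers per clock for AGP 8X
# and 1 transfer per clock for PCI-style upstream writes.
agp_8x = parallel_bus_mb_per_sec(66.66, 32, 8)
pci_style_upstream = parallel_bus_mb_per_sec(66.66, 32, 1)
pcie_x16 = pcie_gen1_mb_per_sec(16)

print(f"AGP 8X peak (shared link):   {agp_8x:6.0f} MB/sec")
print(f"PCI-semantics upstream peak: {pci_style_upstream:6.0f} MB/sec")
print(f"PCI Express x16 peak:        {pcie_x16:6.0f} MB/sec per direction")
```

Under these assumptions the results land close to the 2.1 GB/sec, 266 MB/sec, and 4 GB/sec figures cited in the text. Real transfers will fall short of these peaks because of protocol overhead.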

Figure 1: Comparison of Bridged and Native PCI Express Implementations
(Diagram labels: Downstream 4 GB/sec, Upstream 4 GB/sec.)

Optimizing Applications for PCI Express


The key to taking advantage of PCI Express performance is to make use of its simultaneous
bi-directional nature.

Video capture applications can use this to record video from a capture port to a file on the hard drive
without monopolizing the bus. Because this involves a steady stream of data transfer in the upstream
direction, it can seriously interfere with downstream transactions on AGP systems, such as those
required for playing games (see Figure 2). PCI Express is capable of gracefully handling both of
these tasks at the same time, making possible true background recording.
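
The effect of sharing one link versus using two can be illustrated with a toy timing model. This is a sketch under simplifying assumptions: it uses the peak bandwidth figures quoted in this paper, ignores latency, protocol overhead, and bus turnaround, and the workload sizes and function names are hypothetical.

```python
# Toy model: time needed to move a given amount of upstream and downstream
# data. Results are best-case estimates only.

def shared_link_seconds(up_mb, down_mb, up_mb_per_sec, down_mb_per_sec):
    """Half duplex: upstream and downstream transfers take turns on one link."""
    return up_mb / up_mb_per_sec + down_mb / down_mb_per_sec


def dual_link_seconds(up_mb, down_mb, link_mb_per_sec):
    """Full duplex: each direction has its own link, so the transfers overlap."""
    return max(up_mb / link_mb_per_sec, down_mb / link_mb_per_sec)


# Hypothetical workload: captured video going upstream while a game sends
# data downstream.
up_mb, down_mb = 100.0, 1000.0

agp_seconds = shared_link_seconds(up_mb, down_mb,
                                  up_mb_per_sec=266.0, down_mb_per_sec=2100.0)
pcie_seconds = dual_link_seconds(up_mb, down_mb, link_mb_per_sec=4000.0)

print(f"AGP 8X (shared link):    {agp_seconds:.3f} s")
print(f"PCI Express x16 (dual):  {pcie_seconds:.3f} s")
```

In this model the upstream capture traffic adds directly to the AGP transfer time, while on PCI Express it is hidden behind the larger downstream transfer.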

take advantage of PCI Express for more optimal performance. Figure 4 illustrates the data
flows in this type of system.

Figure 4: CPU/VPU Parallel Processing Model
(Diagram components: System Memory, CPU, North Bridge, VPU, Vertex Processing Engine, Pixel Processing Engine. Annotation: fast simultaneous upstream/downstream transfers required.)

High Definition Video Editing


Video capture is the ability to record video from various sources onto a PC in a digital
format. Sources can include digital cameras & camcorders, VCRs, DVDs, cable & satellite receivers,
and over-the-air broadcasts. These sources are connected to the PC using a variety of interfaces
such as USB, FireWire, and PCI or AGP capture cards. Once on the PC, digital video can be edited
to add a variety of special effects, including transitions, overlays, picture-in-picture, filtering, lighting
effects, and audio. Market research has shown that there are currently over 30 million users of PC
video editing software in the U.S., and this figure is expected to double within the next 3 years.

Chart: Consumer Video Editing Application Users (millions of users, 2001 to 2006)
Source: Instat: 2003

The latest revolution in consumer video technology is the transition to High Definition (HD) video,
which is raising the bar and setting new standards for visual fidelity. The two key characteristics of
HD video that separate it from Standard Definition (SD) video are its much higher resolutions and
wide screen aspect ratios.

The most common HD formats are 720p (1280x720 progressive) and 1080i (1920x1080 interlaced),
both of which use a 16:9 aspect ratio. In contrast, analog SD video uses a 480i (640x480 interlaced)
format, and digital SD supports a progressive version of this resolution, both using a 4:3 aspect ratio.
Progressive format video updates each scan line in sequence, while interlaced video updates the odd
scan lines on one pass, and the even scan lines on the next. With up to 7x the number of pixels and
a 33% wider display than SD, HD video provides crystal clear images (see Figure 5).
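
The pixel-count and aspect-ratio comparison can be checked with a few lines of arithmetic. This is a minimal sketch using the resolutions given above; the helper names are hypothetical.

```python
from fractions import Fraction

# Frame sizes as stated in the text (width, height).
SD_480 = (640, 480)
HD_720P = (1280, 720)
HD_1080I = (1920, 1080)


def pixel_ratio(fmt, baseline=SD_480):
    """How many times more pixels per frame a format has than the baseline."""
    return (fmt[0] * fmt[1]) / (baseline[0] * baseline[1])


def aspect_ratio(fmt):
    """Width-to-height ratio reduced to lowest terms, e.g. 16/9."""
    return Fraction(fmt[0], fmt[1])


def relative_width(fmt, baseline=SD_480):
    """How much wider the picture is than the baseline at the same height."""
    return aspect_ratio(fmt) / aspect_ratio(baseline)


for name, fmt in (("720p", HD_720P), ("1080i", HD_1080I)):
    print(f"{name}: {pixel_ratio(fmt):.2f}x the pixels of SD, "
          f"aspect {aspect_ratio(fmt)}, "
          f"{float(relative_width(fmt) - 1) * 100:.0f}% wider than 4:3")
```

The 1080-line format has 6.75 times the pixels of a 640x480 frame, which the text rounds to "up to 7x", and a 16:9 picture is one third wider than a 4:3 picture of the same height.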

Figure 5: Image quality comparison between SD and HD video

Over 99% of U.S. TV households are currently able to receive digital TV signals. All of the major U.S.
networks are airing most of their top-rated programs in HD, and FCC legislation requires that all
broadcasters provide a digital signal by 2006. These trends have driven consumers to spend record
amounts on new HD digital video equipment. Soon, HD will replace SD just as color television
replaced black & white, and will be a market requirement for all new video devices.

For PCs, the problem is that handling HD content has much higher system requirements than SD
content. The greater amounts of data involved require significantly greater bandwidth than most of
today’s PCs can provide. For example, a 1080p HD video stream requires 12 times as much
bandwidth as a 480i SD video stream, and in fact exceeds the maximum available upstream
bandwidth of the AGP 8X interface (see Figure 6).
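
A rough estimate of uncompressed stream bandwidth shows why the largest HD formats run into the AGP upstream limit. The sketch below is illustrative only: the 4 bytes per pixel and the frame rates are assumptions not stated in this paper, so the numbers are not meant to reproduce Figure 6 or the 12x ratio exactly.

```python
# Peak AGP upstream figure quoted earlier in this paper, in MB/sec.
AGP_UPSTREAM_LIMIT_MB_PER_SEC = 266.0


def uncompressed_mb_per_sec(width, height, frames_per_sec, bytes_per_pixel=4):
    """Bandwidth of an uncompressed video stream in MB/sec (1 MB = 10**6 bytes).

    bytes_per_pixel defaults to 4 (32-bit pixels), which is an assumption.
    """
    return width * height * frames_per_sec * bytes_per_pixel / 1e6


# (width, height, full frames per second); the frame rates are assumptions.
streams = {
    "480i": (640, 480, 30),
    "720p": (1280, 720, 60),
    "1080i": (1920, 1080, 30),
    "1080p": (1920, 1080, 60),
}

for name, (width, height, fps) in streams.items():
    rate = uncompressed_mb_per_sec(width, height, fps)
    verdict = "exceeds" if rate > AGP_UPSTREAM_LIMIT_MB_PER_SEC else "fits within"
    print(f"{name:>5}: {rate:6.1f} MB/sec, {verdict} the AGP upstream limit")
```

With these assumed parameters only the 1080p stream exceeds the 266 MB/sec upstream limit on its own. Different pixel formats or frame rates would shift the results.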

MPEG-2 or MPEG-4 compression can be used on HD video streams to make the bandwidth
requirements more manageable, but this then requires large amounts of processing power to
perform the necessary encoding and decoding. These requirements place heavy demands on the
CPU, the front side bus (system memory), the graphics processor and the graphics bus interface.
Without a system architecture optimized to move and process HD data, users will not be able to fully
experience the benefits of the HD revolution on their PCs.

Figure 6: Bandwidth Comparison for Different Video Formats
(Chart: megabytes per second of video for the Analog 480i, 480p, 720p, 1080i, and 1080p formats, with the AGP upstream bandwidth limit marked.)

HD video editing is a particularly resource intensive task. In a typical usage scenario, an HD video
camera would stream recorded video into a PC via a FireWire or USB interface, where it would be
stored on a hard drive. The CPU would then read the video stream into system memory, decode it,
and pass the uncompressed video data across the graphics bus interface to the VPU, which displays
it on a display device. Using a software application running on the CPU, the user then edits in special
effects. The necessary commands are passed to the VPU, which processes the video stream and
passes the edited version back to system memory. The CPU then compresses this data and stores it
back on the hard drive. The complex set of data transfers required for this process is illustrated in
Figure 7.
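
The sequence of transfers described above can be written down as a simple list to show which steps cross the graphics bus and in which direction. This is only a sketch of the scenario as described in the text; the data structure and names are hypothetical.

```python
from collections import namedtuple

# bus_direction is "downstream" (system memory to VPU), "upstream" (VPU to
# system memory), or None when the step does not cross the graphics bus.
Step = namedtuple("Step", "description bus_direction")

HD_EDITING_STEPS = [
    Step("Camera streams video over FireWire or USB to the hard drive", None),
    Step("CPU reads the stream into system memory and decodes it", None),
    Step("Uncompressed video is passed to the VPU for display", "downstream"),
    Step("Editing commands are passed to the VPU", "downstream"),
    Step("VPU passes the edited video back to system memory", "upstream"),
    Step("CPU compresses the result and stores it on the hard drive", None),
]


def graphics_bus_directions(steps):
    """Set of graphics bus directions used anywhere in the workflow."""
    return {step.bus_direction for step in steps if step.bus_direction}


for step in HD_EDITING_STEPS:
    print(f"{step.bus_direction or 'no bus crossing':>16}: {step.description}")
print("Directions in use:", sorted(graphics_bus_directions(HD_EDITING_STEPS)))
```

Because the workflow uses both directions, a continuous editing session keeps upstream and downstream traffic in flight at the same time.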

Figure 7: Data Flows for High Definition Video Editing
(Diagram components: HD Video Camera, 1394/USB Port, South Bridge, Hard Drive, North Bridge, System Memory, CPU, VPU. Annotation: simultaneous upstream and downstream transfers.)

Other possible usage scenarios involve combining data from multiple streams during the editing
process, or editing while simultaneously capturing video to disk. Tasks like these can involve multiple
simultaneous HD streams going through the paths described above, further exacerbating the
limitations of AGP systems. Fortunately, PCI Express systems will provide the necessary capabilities
to make these types of applications a reality.

Case Study: Pinnacle Systems


Pinnacle Systems is a developer of advanced digital video editing software packages for
professionals and consumers. Their software supports compositing of real-time 3D effects with
multiple video streams from a variety of sources, as well as DVD authoring tools for the latest
generation of PCs shipping with DVD writing capability. As such, it provides a perfect platform to put
the improvements promised by PCI Express systems to the test.

The goal for the next generation of video editing applications is real time editing of HD content. “Real
time” is typically considered to be 30-60 frames per second, so all of the necessary processing for a
frame must complete in roughly 33 milliseconds or less at 30 frames per second, and in about 17
milliseconds at 60 frames per second. Since AGP does not have enough upstream bandwidth to
support HD video transfers, the CPU is forced to do all of the work and then pass the final output
downstream across the AGP bus for display. In this case, the CPU must simultaneously
decompress, edit, blend, and compress multiple video streams.
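
The per-frame time budget follows directly from the target frame rate. The sketch below shows the arithmetic; the stage timings are hypothetical values chosen for illustration, not measurements from this paper.

```python
def frame_budget_ms(frames_per_sec):
    """Time available to finish all processing for one frame, in milliseconds."""
    return 1000.0 / frames_per_sec


def meets_real_time(stage_times_ms, frames_per_sec):
    """True if the per-frame stages, run back to back, fit within the budget."""
    return sum(stage_times_ms.values()) <= frame_budget_ms(frames_per_sec)


# Hypothetical per-frame stage timings in milliseconds, for illustration only.
cpu_only_stages = {"decompress": 12.0, "edit": 8.0, "blend": 6.0, "compress": 14.0}

for fps in (30, 60):
    print(f"{fps} fps -> {frame_budget_ms(fps):.1f} ms per frame")
print("CPU-only pipeline meets 30 fps:", meets_real_time(cpu_only_stages, 30))
```

Any stage that can be moved off the CPU shortens the back-to-back total, which is the motivation for sharing the work with the VPU.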

Even today’s fastest multi-threaded CPUs are not capable of handling these tasks at real time frame
rates. To make this possible, the VPU must help out by handling the editing and blending, as well as
by accelerating the encoding and decoding processes. This requires that multiple HD streams be
passed back and forth between the CPU and the VPU. In other words, it is an ideal situation for PCI
Express systems.

The first Pinnacle Systems demonstration uses several digital HD video streams at 720p resolution.
One can be streamed in from an HD video camera, while the others are streamed from the hard disk.
The streams are compressed in MPEG-2 format, and must first be decoded by the CPU. The
resulting uncompressed frames are written temporarily into system memory, and then streamed into
the VPU where they are combined and have a variety of 3D effects applied. The final output frames
are then simultaneously displayed on the screen and streamed back to the CPU, where they are
encoded and written to the hard drive.

On a system with an Intel 3.0 GHz CPU and ATI’s latest PCI Express graphics hardware, the Pinnacle
application was able to handle up to 4 simultaneous HD video streams while maintaining interactive
frame rates of over 30 frames per second. When run using a similarly equipped AGP 8X system, the
frame rate drops well below real-time levels (see Figure 8). Note the small difference between the
AGP 4X and AGP 8X systems; this clearly shows that PCI Express provides much greater benefits
than a simple increase in AGP bandwidth.

Pinnacle Demo 1
• Four 720p video streams
• Real-time 3D effects, including read-back to hard drive

System             Frames per second   Increase shown on chart
AGP 4X             20.3
AGP 8X             21.8                8%
PCI Express x16    30.2                38%

The chart also marks the required real-time frame rate.

AGP Systems: Intel Grantsdale with Prescott 3.0 GHz CPU, DDR400 memory, ATI RADEON 9600 PRO graphics
PCI Express System: Intel i865 with Prescott 3.0 GHz CPU, DDR400 memory, ATI PCI Express graphics (equivalent to RADEON 9600 PRO)

Figure 8: Pinnacle Demo #1 Results

The second Pinnacle demonstration was designed to isolate graphics bus performance from CPU
performance. This was done by replacing the HD video streams with high resolution bitmaps, which
do not require any decoding or encoding. The result is that the PCI Express system was able to
run at up to double the frame rate of the comparable AGP 4X system in this test (see
Figure 9). Once again, the difference between AGP 4X and AGP 8X is relatively small.

Pinnacle Demo 2
• Four 720p bitmap streams
• Real-time 3D effects, including read-back to hard drive

System             Frames per second   Increase shown on chart
AGP 4X             24.7
AGP 8X             28.7                16%
PCI Express x16    48.4                69%

The chart also marks the required real-time frame rate.

AGP Systems: Intel Grantsdale with Prescott 3.0 GHz CPU, DDR400 memory, ATI RADEON 9600 PRO graphics
PCI Express System: Intel i865 with Prescott 3.0 GHz CPU, DDR400 memory, ATI PCI Express graphics (equivalent to RADEON 9600 PRO)

Figure 9: Pinnacle Demo #2 Results
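
The relative gains in the second demo can be recomputed from the frame rates reported in Figure 9. This is a small sketch; the assignment of 24.7 fps to AGP 4X and 28.7 fps to AGP 8X is a reading of the chart that is consistent with the text, and the helper name is hypothetical.

```python
def percent_gain(baseline_fps, new_fps):
    """Percentage increase of new_fps over baseline_fps."""
    return (new_fps - baseline_fps) / baseline_fps * 100.0


# Frame rates reported for Pinnacle Demo 2.
demo2_fps = {"AGP 4X": 24.7, "AGP 8X": 28.7, "PCI Express x16": 48.4}

agp8x_over_agp4x = percent_gain(demo2_fps["AGP 4X"], demo2_fps["AGP 8X"])
pcie_over_agp8x = percent_gain(demo2_fps["AGP 8X"], demo2_fps["PCI Express x16"])
pcie_vs_agp4x = demo2_fps["PCI Express x16"] / demo2_fps["AGP 4X"]

print(f"AGP 8X over AGP 4X:          {agp8x_over_agp4x:.0f}%")
print(f"PCI Express x16 over AGP 8X: {pcie_over_agp8x:.0f}%")
print(f"PCI Express x16 vs AGP 4X:   {pcie_vs_agp4x:.2f}x")
```

The computed gains round to the 16% and 69% labels on the chart, and the PCI Express result is just under twice the AGP 4X frame rate.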

These applications represent one of the first demonstrations of how PCI Express will change the PC
landscape over the next few years. PCI Express systems will become an integral component of the
digital home of the future.

Copyright 2004, ATI Technologies Inc. PCI Express is a trademark of PCI-SIG.
