
IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 59, NO. 9, SEPTEMBER 2011

Structured Compressed Sensing: From Theory to Applications

Marco F. Duarte, Member, IEEE, and Yonina C. Eldar, Senior Member, IEEE

Abstract—Compressed sensing (CS) is an emerging field that has attracted considerable research interest over the past few years. Previous review articles in CS limit their scope to standard discrete-to-discrete measurement architectures using matrices of randomized nature and signal models based on standard sparsity. In recent years, CS has worked its way into several new application areas. This, in turn, necessitates a fresh look on many of the basics of CS. The random matrix measurement operator must be replaced by more structured sensing architectures that correspond to the characteristics of feasible acquisition hardware. The standard sparsity prior has to be extended to include a much richer class of signals and to encode broader data models, including continuous-time signals. In our overview, the theme is exploiting signal and measurement structure in compressive sensing. The prime focus is bridging theory and practice; that is, to pinpoint the potential of structured CS strategies to emerge from the math to the hardware. Our summary highlights new directions as well as relations to more traditional CS, with the hope of serving both as a review to practitioners wanting to join this emerging field, and as a reference for researchers that attempts to put some of the existing ideas in perspective of practical applications.

Index Terms—Approximation algorithms, compressed sensing, compression algorithms, data acquisition, data compression, sampling methods.

I. INTRODUCTION AND MOTIVATION

COMPRESSED sensing (CS) is an emerging field that has attracted considerable research interest in the signal processing community. Since its introduction only several years ago [1], [2], thousands of papers have appeared in this area, and hundreds of conferences, workshops, and special sessions have been dedicated to this growing research field.

Due to the vast interest in this topic, there exist several excellent review articles on the basics of CS [3]–[5]. These articles focused on the first CS efforts: the use of standard discrete-to-discrete measurement architectures using matrices of randomized nature, where no structure beyond sparsity is assumed on the signal or in its representation. This basic formulation already required the use of sophisticated mathematical tools and rich theory in order to analyze recovery approaches and provide performance guarantees. It was therefore essential to confine attention to this simplified setting in the early stages of development of the CS framework.

Essentially all analog-to-digital converters to date follow the celebrated Shannon–Nyquist theorem, which requires the sampling rate to be at least twice the bandwidth of the signal. This basic principle underlies the majority of digital signal processing (DSP) applications such as audio, video, radio receivers, radar applications, medical devices and more. The ever growing demand for data, as well as advances in radio frequency (RF) technology, have promoted the use of high-bandwidth signals, for which the rates dictated by the Shannon–Nyquist theorem impose severe challenges both on the acquisition hardware and on the subsequent storage and DSP processors. CS was motivated in part by the desire to sample wideband signals at rates far lower than the Shannon–Nyquist rate, while still maintaining the essential information encoded in the underlying signal. In practice, however, much of the work to date on CS has focused on acquiring finite-dimensional sparse vectors using random measurements. This precludes the important case of continuous-time (i.e., analog) input signals, as well as practical hardware architectures, which inevitably are structured. Achieving the holy grail of compressive ADCs and increased resolution requires a broader framework which can treat more general signal models, including continuous-time signals with various types of structure, as well as practical measurement schemes.

In recent years, the area of CS has branched out to many new fronts and has worked its way into several application areas. This, in turn, necessitates a fresh look on many of the basics of CS. The random matrix measurement operator, fundamental in all early presentations of CS, must be replaced by more structured measurement operators that correspond to the application of interest, such as wireless channels, analog sampling hardware, sensor networks and optical imaging. The standard sparsity prior that has characterized early work in CS has to be extended to include a much richer class of signals: signals that have underlying low-dimensional structure, not necessarily represented by standard sparsity, and signals that can have arbitrary dimensions, not only finite-dimensional vectors.

A significant part of the recent work on CS from the signal processing community can be classified into two major contribution areas. The first group consists of theory and applications related to CS matrices that are not completely random and

Manuscript received September 01, 2010; revised February 19, 2011 and June 22, 2011; accepted July 03, 2011. Date of publication July 14, 2011; date of current version August 10, 2011. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Ivan W. Selesnick. The work of M. F. Duarte was supported in part by NSF Supplemental Funding DMS-0439872 to UCLA-IPAM, P. I. R. Caflisch. The work of Y. C. Eldar was supported in part by the Israel Science Foundation under Grant no. 170/10, and by a Magneton grant from the Israel Ministry of Industry and Trade.

M. F. Duarte is with the Department of Electrical and Computer Engineering, University of Massachusetts, Amherst, MA 01003 USA (e-mail: mduarte@ecs.umass.edu).

Y. C. Eldar is with the Department of Electrical Engineering, Technion–Israel Institute of Technology, Haifa 32000, Israel (e-mail: yonina@ee.technion.ac.il).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TSP.2011.2161982
1053-587X/$26.00 © 2011 IEEE

that often exhibit considerable structure. This largely follows from efforts to model the way the samples are acquired in practice, which leads to sensing matrices that inherit their structure from the real world. The second group includes signal representations that exhibit structure beyond sparsity and broader classes of signals, such as continuous-time signals with infinite-dimensional representations. For many types of signals, such structure allows for a higher degree of signal compression when leveraged on top of sparsity. Additionally, infinite-dimensional signal representations provide an important example of richer structure which clearly cannot be described using standard sparsity. Since reducing the sampling rate in analog signals was one of the driving forces behind CS, building a theory that can accommodate low-dimensional signals in arbitrary Hilbert spaces is clearly an essential part of the CS framework. Both of these categories are motivated by real-world CS involving actual hardware implementations.

In our review, the theme is exploiting signal and measurement structure in CS. The prime focus is bridging theory and practice; that is, to pinpoint the potential of CS strategies to emerge from the math to the hardware by generalizing the underlying theory where needed. We believe that this is an essential ingredient in taking CS to the next step: incorporating this fast growing field into real-world applications. Considerable efforts have been devoted in recent years by many researchers to adapt the theory of CS to better solve real-world signal acquisition challenges. This has also led to parallel low-rate sampling schemes that combine the principles of CS with the rich theory of sampling, such as the finite rate of innovation (FRI) [6]–[8] and Xampling frameworks [9], [10]. There are already dozens of papers dealing with these broad ideas. In this review we have strived to provide a coherent summary, highlighting new directions and relations to more traditional CS. This material can both serve as a review to those wanting to join this emerging field, as well as a reference that attempts to summarize some of the existing results in the framework of practical applications. Our hope is that this presentation will attract the interest of both mathematicians and engineers in the desire to promote the CS premise into practical applications, and encourage further research into this new frontier.

This review paper is organized as follows. Section II provides background motivating the formulation of CS and the layout of the review. A primer on standard CS theory is presented in Section III. This material serves as a basis for the later developments. Section IV reviews alternative constructions for structured CS matrices beyond those generated completely at random. In Section V, we introduce finite-dimensional signal models that exhibit additional signal structure. This leads to the more general union-of-subspaces framework, which will play an important role in the context of structured infinite-dimensional representations as well. Section VI extends the concepts of CS to infinite-dimensional signal models and introduces recent compressive ADCs which have been developed based on the Xampling and FRI frameworks. For each of the matrices and models introduced, we summarize the details of the theoretical and algorithmic frameworks and provide example applications where the structure is applicable.

II. BACKGROUND

We live in a digital world. Telecommunication, entertainment, medical devices, gadgets, business: all revolve around digital media. Miniature sophisticated black-boxes process streams of bits accurately at high speeds. Nowadays, electronic consumers find it natural that a media player shows their favorite movie, or that their surround system synthesizes pure acoustics, as if sitting in the orchestra instead of the living room. The digital world plays a fundamental role in our everyday routine, to such a point that we almost forget that we cannot hear or watch these streams of bits, running behind the scenes.

Analog-to-digital conversion (ADC) lies at the heart of this revolution. ADC devices translate physical information into a stream of numbers, enabling digital processing by sophisticated software algorithms. The ADC task is inherently intricate: its hardware must hold a snapshot of a fast-varying input signal steady while acquiring measurements. Since these measurements are spaced in time, the values between consecutive snapshots are lost. In general, therefore, there is no way to recover the analog signal unless some prior on its structure is incorporated.

After sampling, the numbers or bits retained must be stored and later processed. This requires ample storage devices and sufficient processing power. As technology advances, so does the requirement for ever-increasing amounts of data, imposing unprecedented strains on both the ADC devices and the subsequent DSP and storage media. How then does consumer electronics keep up with these high demands? Fortunately, most of the data we acquire can be discarded without much perceptual loss. This is evident in essentially all compression techniques used to date. However, this paradigm of high-rate sampling followed by compression does not alleviate the large strains on the acquisition device and on the DSP. In his seminal work on CS [1], Donoho posed the ultimate goal of merging compression and sampling: "Why go to so much effort to acquire all the data when most of what we get will be thrown away? Can't we just directly measure the part that won't end up being thrown away?"

A. Shannon–Nyquist Theorem

ADCs provide the interface between an analog signal being recorded and a suitable discrete representation. A common approach in engineering is to assume that the signal $x(t)$ is bandlimited, meaning that its spectral contents are confined to a maximal frequency $f_{\max}$. Bandlimited signals have limited time variation, and can therefore be perfectly reconstructed from equispaced samples taken at a rate of at least $2 f_{\max}$, termed the Nyquist rate. This fundamental result is often attributed in the engineering community to Shannon–Nyquist [11], [12], although it dates back to earlier works by Whittaker [13] and Kotelnikov [14].

Theorem 1 [12]: If a function $x(t)$ contains no frequencies higher than $B$ hertz, then it is completely determined by giving its ordinates at a series of points spaced $1/(2B)$ seconds apart.

A fundamental reason for processing at the Nyquist rate is the clear relation between the spectrum of $x(t)$ and that of its samples $x(nT)$, so that digital operations can be easily substituted for their analog counterparts. Digital filtering is an example

where this relation is successfully exploited. Since the power spectral densities of analog and discrete random processes are associated in a similar manner, estimation and detection of parameters of analog signals can be performed by DSP. In contrast, compression is carried out by a series of algorithmic steps, which, in general, exhibit a nonlinear, complicated relationship between the samples and the stored data.

While this framework has driven the development of signal acquisition devices for the last half century, the increasing complexity of emerging applications dictates increasingly higher sampling rates that cannot always be met using available hardware. Advances in related fields such as wideband communication and RF technology open a considerable gap with ADC devices. Conversion speeds which are twice the signal's maximal frequency component have become more and more difficult to obtain. Consequently, alternatives to high-rate sampling are drawing considerable attention in both academia and industry.

Structured analog signals can often be processed far more efficiently than what is dictated by the Shannon–Nyquist theorem, which does not take any structure into account. For example, many wideband communication signals are comprised of several narrow transmissions modulated at high carrier frequencies. A common practice in engineering is demodulation, in which the input signal is multiplied by the carrier frequency of a band of interest, in order to shift the contents of the narrowband transmission from the high frequencies to the origin. Then, commercial ADC devices at low rates are utilized. Demodulation, however, requires knowing the exact carrier frequency. In this review we focus on structured models in which the exact parameters defining the structure are unknown. In the context of multiband communications, for example, the carrier frequencies may not be known, or may be changing over time. The goal then is to build a compressed sampler which does not depend on the carrier frequencies, but can nonetheless acquire and process such signals at rates below Nyquist.

B. Compressed Sensing and Beyond

A holy grail of CS is to build acquisition devices that exploit signal structure in order to reduce the sampling rate, and subsequent demands on storage and DSP. In such an approach, the actual information contents dictate the sampling rate, rather than the dimensions of the ambient space in which the signal resides. The challenges in achieving this task both theoretically and in terms of hardware design can be reduced substantially when considering finite-dimensional problems in which the signal to be measured can be represented as a discrete finite-length vector. This has spurred a surge of research on various mathematical and algorithmic aspects of sensing sparse signals, which were mainly studied for discrete finite vectors.

At its core, CS is a mathematical framework that studies accurate recovery of a signal represented by a vector of length $N$ from $M < N$ measurements, effectively performing compression during signal acquisition. The measurement paradigm consists of linear projections, or inner products, of the signal vector into a set of carefully chosen projection vectors that act as a multitude of probes on the information contained in the signal. In the first part of this review (Sections III and IV) we survey the fundamentals of CS and show how the ideas can be extended to allow for more elaborate measurement schemes that incorporate structure into the measurement process. When considering real-world acquisition schemes, the choices of possible measurement matrices are dictated by the constraints of the application. Thus, we must deviate from the general randomized constructions and apply structure within the projection vectors that can be easily implemented by the acquisition hardware. Section IV focuses on such alternatives; we survey both existing theory and applications for several classes of structured CS matrices. In certain applications, there exist hardware designs that measure analog signals at a sub-Nyquist rate, obtaining measurements for finite-dimensional signal representations via such structured CS matrices.

In the second part of this review (Sections V and VI), we expand the theory of CS to signal models tailored to express structure beyond standard sparsity. A recent emerging theoretical framework that allows a broader class of signal models to be acquired efficiently is the union of subspaces model [15]–[20]. We introduce this framework and some of its applications in a finite-dimensional context in Section V, which include more general notions of structure and sparsity. Combining the principles and insights from the previous sections, in Section VI we extend the notion of CS to analog signals with infinite-dimensional representations. This new framework, referred to as Xampling [9], [10], relies on more general signal models (union of subspaces and FRI signals), together with guidelines on how to exploit these mathematical structures in order to build sensing devices that can directly acquire analog signals at reduced rates. We then survey several compressive ADCs that result from this broader framework.

III. COMPRESSED SENSING BASICS

CS [1]–[5] offers a framework for simultaneous sensing and compression of finite-dimensional vectors that relies on linear dimensionality reduction. Specifically, in CS we do not acquire the signal $x \in \mathbb{R}^N$ directly, but rather acquire $M < N$ linear measurements $y = A x$ using an $M \times N$ CS matrix $A$. We refer to $y$ as the measurement vector. Ideally, the matrix $A$ is designed to reduce the number of measurements as much as possible while allowing for recovery of a wide class of signals from their measurement vectors. However, the fact that $M < N$ renders the matrix $A$ rank-deficient, meaning that it has a nonempty nullspace; this, in turn, implies that for any particular signal $x_0$, an infinite number of signals $x$ will yield the same measurements $y = A x = A x_0$ for the chosen CS matrix $A$.

The motivation behind the design of the matrix $A$ is, therefore, to allow for distinct signals $x, x'$ within a class of signals of interest to be uniquely identifiable from their measurements $A x$, $A x'$, even though $M < N$. We must therefore make a choice on the class of signals that we aim to recover from CS measurements.

A. Sparsity

Sparsity is the signal structure behind many compression algorithms that employ transform coding, and is the most prevalent signal structure used in CS. Sparsity also has a rich history of applications in signal processing problems in the last century

(particularly in imaging), including denoising, deconvolution, restoration, and inpainting [21]–[23].

To introduce the notion of sparsity, we rely on a signal representation in a given basis $\{\psi_i\}_{i=1}^N$ for $\mathbb{R}^N$. Every signal $x \in \mathbb{R}^N$ is representable in terms of $N$ coefficients $\{\theta_i\}_{i=1}^N$ as $x = \sum_{i=1}^N \psi_i \theta_i$; arranging the $\psi_i$ as columns into the $N \times N$ matrix $\Psi$ and the coefficients $\theta_i$ into the $N \times 1$ coefficient vector $\theta$, we can write succinctly that $x = \Psi \theta$, with $\theta \in \mathbb{R}^N$. Similarly, if we use a frame¹ $\Psi$ containing $N$ unit-norm column vectors of length $n$ with $n < N$ (i.e., $\Psi \in \mathbb{R}^{n \times N}$), then for any vector $x \in \mathbb{R}^n$ there exist infinitely many decompositions $\theta \in \mathbb{R}^N$ such that $x = \Psi \theta$. In a general setting, we refer to $\Psi$ as the sparsifying dictionary [24]. While our exposition is restricted to real-valued signals, the concepts are extendable to complex signals as well [25], [26].

We say that a signal $x$ is $K$-sparse in the basis or frame $\Psi$ if there exists a vector $\theta$ with only $K \ll N$ nonzero entries such that $x = \Psi \theta$. We call the set of indices corresponding to the nonzero entries the support of $\theta$ and denote it by $\mathrm{supp}(\theta)$. We also define the set $\Sigma_K$ that contains all signals $x$ that are $K$-sparse.

A $K$-sparse signal can be efficiently compressed by preserving only the values and locations of its nonzero coefficients, using $O(K \log_2 N)$ bits: coding each of the $K$ nonzero coefficients' locations takes $\log_2 N$ bits, while coding the magnitudes uses a constant amount of bits that depends on the desired precision, and is independent of $N$. This process is known as transform coding, and relies on the existence of a suitable basis or frame $\Psi$ that renders signals of interest sparse or approximately sparse.

For signals that are not exactly sparse, the amount of compression depends on the number of coefficients of $\theta$ that we keep. Consider a signal $x$ whose coefficients $\theta_i$, when sorted in order of decreasing magnitude, decay according to the power law

$|\theta_{\mathcal{I}(i)}| \le C_1 \, i^{-1/r},$   (1)

where $\mathcal{I}$ indexes the sorted coefficients. Thanks to the rapid decay of their coefficients, such signals are well-approximated by $K$-sparse signals. The best $K$-term approximation error for such a signal obeys

$\|x - x_K\|_2 \le C_2 \, K^{-s},$   (2)

with $s = \frac{1}{r} - \frac{1}{2}$ and $C_2$ denoting a constant that does not depend on $K$ [27]. That is, the signal's best approximation error (in an $\ell_2$-norm sense) has a power-law decay with exponent $s$ as $K$ increases. We dub such a signal $s$-compressible. When $\Psi$ is an orthonormal basis, the best sparse approximation of $x$ is obtained by hard thresholding the signal's coefficients, so that only the $K$ coefficients with largest magnitudes are preserved. The situation is decidedly more complicated when $\Psi$ is a general frame, spurring the development of sparse approximation methods, a subject that we will focus on in Section III-C.

When sparsity is used as the signal structure enabling CS, we aim to recover $x$ from $y$ by exploiting its sparsity. In contrast with transform coding, we do not operate on the signal $x$ directly, but rather only have access to the CS measurement vector $y$. Our goal is to push $M$ as close as possible to $K$ in order to perform as much signal compression during acquisition as possible. In the sequel, we will assume that $\Psi$ is taken to be the identity basis so that the signal $x = \theta$ is itself sparse. In certain cases we will explicitly define a different basis or frame $\Psi$ that arises in a specific application of CS.

B. Design of CS Matrices

The main design criterion for the CS matrix $A$ is to enable the unique identification of a signal of interest $x$ from its measurements $y = A x$. Clearly, when we consider the class of $K$-sparse signals $\Sigma_K$, the number of measurements $M \ge K$ for any matrix design, since the identification problem has $K$ unknowns even when the support $\Omega = \mathrm{supp}(x)$ of the signal $x$ is provided. In this case, we simply restrict the matrix $A$ to its columns corresponding to the indices in $\Omega$, denoted by $A_\Omega$, and then use the pseudoinverse to recover the nonzero coefficients of $x$:

$x_\Omega = A_\Omega^\dagger y.$   (3)

Here, $x_\Omega$ is the restriction of the vector $x$ to the set of indices $\Omega$, and $A_\Omega^\dagger = (A_\Omega^T A_\Omega)^{-1} A_\Omega^T$ denotes the pseudoinverse of the matrix $A_\Omega$. The implicit assumption in (3) is that $A_\Omega$ has full column-rank, so that there is a unique solution to the equation $y = A_\Omega x_\Omega$.

We begin by determining properties of $A$ that guarantee that distinct signals $x, x' \in \Sigma_K$, $x \ne x'$, lead to different measurement vectors $A x \ne A x'$. In other words, we want each vector $y \in \mathbb{R}^M$ to be matched to at most one vector $x \in \Sigma_K$ such that $y = A x$. A key relevant property of the matrix in this context is its spark.

Definition 1 [28]: The spark $\mathrm{spark}(A)$ of a given matrix $A$ is the smallest number of columns of $A$ that are linearly dependent.

The spark is related to the Kruskal rank from the tensor product literature; the matrix $A$ has Kruskal rank $\mathrm{spark}(A) - 1$. This definition allows us to pose the following straightforward guarantee.

Theorem 2 [28]: If $\mathrm{spark}(A) > 2K$, then for each measurement vector $y \in \mathbb{R}^M$ there exists at most one signal $x \in \Sigma_K$ such that $y = A x$.

It is easy to see that $\mathrm{spark}(A) \le M + 1$, so that Theorem 2 yields the requirement $M \ge 2K$.

While Theorem 2 guarantees uniqueness of representation for $K$-sparse signals, computing the spark of a general matrix has combinatorial computational complexity, since one must verify that all sets of columns of a certain size are linearly independent. Thus, it is preferable to use properties of $A$ that are easily computable to provide recovery guarantees. The coherence of a matrix is one such property.

Definition 2 [28]–[31]: The coherence $\mu(A)$ of a matrix $A$ is the largest absolute inner product between any two normalized columns $a_i$, $a_j$ of $A$:

$\mu(A) = \max_{1 \le i < j \le N} \frac{|\langle a_i, a_j \rangle|}{\|a_i\|_2 \|a_j\|_2}.$   (4)

¹A matrix $\Psi$ is said to be a frame if there exist constants $A$, $B$ such that $A \|x\|_2^2 \le \|\Psi^T x\|_2^2 \le B \|x\|_2^2$ for all $x$.
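Definitions 1 and 2 are easy to exercise numerically on toy matrices. The following Python sketch (our own illustration; the helper names `spark` and `coherence` and the 3×4 example matrix are not from the paper) computes the spark by the brute-force column search described above, which is tractable only for tiny matrices, alongside the coherence:

```python
import numpy as np
from itertools import combinations

def coherence(A):
    """Definition 2: largest absolute inner product between distinct
    normalized columns of A."""
    An = A / np.linalg.norm(A, axis=0)   # normalize columns
    G = np.abs(An.T @ An)                # Gram matrix of normalized columns
    np.fill_diagonal(G, 0.0)             # ignore self inner products
    return G.max()

def spark(A, tol=1e-10):
    """Definition 1: smallest number of linearly dependent columns.
    Brute force, hence combinatorial cost; only for tiny matrices."""
    M, N = A.shape
    for k in range(1, N + 1):
        for cols in combinations(range(N), k):
            if np.linalg.matrix_rank(A[:, cols], tol=tol) < k:
                return k
    return N + 1  # all columns linearly independent

# A 3x4 example: the last column is the sum of the first two,
# so three columns are linearly dependent.
A = np.array([[1., 0., 0., 1.],
              [0., 1., 0., 1.],
              [0., 0., 1., 0.]])
```

For this example the spark is 3 and the coherence is $1/\sqrt{2}$; by Theorem 2, $\mathrm{spark}(A) = 3 > 2K$ then guarantees uniqueness only for $K = 1$.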

It can be shown that $\sqrt{\frac{N - M}{M(N - 1)}} \le \mu(A) \le 1$; the lower bound is known as the Welch bound [32], [33]. Note that when $N \gg M$, the lower bound is approximately $\mu(A) \ge 1/\sqrt{M}$. One can tie the coherence and spark of a matrix by employing the Gershgorin circle theorem.

Theorem 3 [34]: The eigenvalues of an $N \times N$ matrix $M$ with entries $m_{ij}$, $1 \le i, j \le N$, lie in the union of $N$ discs $d_i(c_i, r_i)$, $1 \le i \le N$, centered at $c_i = m_{ii}$ and with radius $r_i = \sum_{j \ne i} |m_{ij}|$.

Applying this theorem to the Gram matrix of a set of columns of $A$ leads to the following result.

Lemma 1 [28]: For any matrix $A$,

$\mathrm{spark}(A) \ge 1 + \frac{1}{\mu(A)}.$   (5)

By merging Theorem 2 with Lemma 1, we can pose the following condition on $A$ that guarantees uniqueness.

Theorem 4 [28], [30], [31]: If

$K < \frac{1}{2}\left(1 + \frac{1}{\mu(A)}\right),$   (6)

then for each measurement vector $y \in \mathbb{R}^M$ there exists at most one signal $x \in \Sigma_K$ such that $y = A x$.

Theorem 4, together with the Welch bound, provides an upper bound on the level of sparsity $K$ that guarantees uniqueness using coherence: $K = O(\sqrt{M})$.

The prior properties of the CS matrix provide guarantees of uniqueness when the measurement vector $y$ is obtained without error. Hardware considerations introduce two main sources of inaccuracies in the measurements: inaccuracies due to noise at the sensing stage (in the form of additive noise $y = A x + n$), and inaccuracies due to mismatches between the CS matrix used during recovery, $A$, and that implemented during acquisition, $\hat{A}$ (in the form of multiplicative noise [35], [36]). Under these sources of error, it is no longer possible to guarantee uniqueness; however, it is desirable for the measurement process to be tolerant to both types of error. To be more formal, we would like the distance between the measurement vectors for two sparse signals $x, x' \in \Sigma_K$ to be proportional to the distance between the original signal vectors $x$ and $x'$. Such a property allows us to guarantee that, for small enough noise, two sparse vectors that are far apart from each other cannot lead to the same (noisy) measurement vector. This behavior has been formalized into the restricted isometry property (RIP).

Definition 3 [4]: A matrix $A$ has the $(K, \delta)$-restricted isometry property ($(K, \delta)$-RIP) if, for all $x \in \Sigma_K$,

$(1 - \delta) \|x\|_2^2 \le \|A x\|_2^2 \le (1 + \delta) \|x\|_2^2.$   (7)

In words, the $(K, \delta)$-RIP ensures that all submatrices of $A$ of size $M \times K$ are close to an isometry, and therefore distance-preserving. We will show later that this property suffices to prove that the recovery is stable to the presence of additive noise $n$. In certain settings, noise is introduced to the signal $x$ prior to measurement. Recovery is also stable for this case; however, there is a degradation in the distortion of the recovery by a factor of $N/M$ [37]–[40]. Furthermore, the RIP also leads to stability with respect to the multiplicative noise introduced by the CS matrix mismatch [35], [36].

The RIP can be connected to the coherence property by using, once again, the Gershgorin circle theorem (Theorem 3).

Lemma 2 [41]: If $A$ has unit-norm columns and coherence $\mu = \mu(A)$, then $A$ has the $(K, \delta)$-RIP with $\delta = (K - 1)\mu$.

One can also easily connect the RIP with the spark. For each $K$-sparse vector to be uniquely identifiable by its measurements, it suffices for the matrix $A$ to have the $(2K, \delta)$-RIP with $\delta > 0$, as this implies that all sets of $2K$ columns of $A$ are linearly independent, i.e., $\mathrm{spark}(A) > 2K$ (cf. Theorems 2 and 4). We will see later that the RIP enables recovery guarantees that are much stronger than those based on spark and coherence. However, checking whether a CS matrix $A$ satisfies the $(K, \delta)$-RIP has combinatorial computational complexity.

Now that we have defined relevant properties of a CS matrix, we discuss specific matrix constructions that are suitable for CS. An $M \times N$ Vandermonde matrix $V$ constructed from $N$ distinct scalars has $\mathrm{spark}(V) = M + 1$ [27]. Unfortunately, these matrices are poorly conditioned for large values of $N$, rendering the recovery problem numerically unstable. Similarly, there are known matrices of size $M \times M^2$ that achieve the coherence lower bound

$\mu(A) = \frac{1}{\sqrt{M}},$   (8)

such as the Gabor frame generated from the Alltop sequence [42] and more general equiangular tight frames [33]. It is also possible to construct deterministic CS matrices of size $M \times N$ that have the $(K, \delta)$-RIP for $K = O(\sqrt{M} \log M / \log(N/M))$ [43]. These constructions restrict the number of measurements needed to recover a $K$-sparse signal to be $M = O(K^2 \log N)$, which is undesirable for real-world values of $N$ and $K$.

Fortunately, these bottlenecks can be defeated by randomizing the matrix construction. For example, random matrices of size $M \times N$ whose entries are independent and identically distributed (i.i.d.) with continuous distributions have $\mathrm{spark}(A) = M + 1$ with high probability. It can also be shown that when the distribution used has zero mean and finite variance, then in the asymptotic regime (as $M$ and $N$ grow) the coherence converges to $\mu(A) = \sqrt{2 \log N / M}$ [44], [45]. Similarly, random matrices from Gaussian, Rademacher, or more generally a sub-Gaussian distribution² have the $(K, \delta)$-RIP with high probability if

$M = O\!\left(K \log(N/K) / \delta^2\right).$   (9)

Finally, we point out that while the set of RIP-fulfilling matrices provided above might seem limited, emerging numerical results have shown that a variety of classes of matrices are suitable for CS recovery at regimes similar to those of the matrices advocated in this section, including subsampled Fourier and Hadamard transforms [47], [48].

²A Rademacher distribution gives probability 1/2 to the values $\pm 1$. A random variable $X$ is called sub-Gaussian if there exists $c > 0$ such that $\mathbb{E}[e^{X t}] \le e^{c^2 t^2 / 2}$ for all $t \in \mathbb{R}$. Examples include the Gaussian, Bernoulli, and Rademacher random variables, as well as any bounded random variable.

C. CS Recovery Algorithms

We now focus on solving the CS recovery problem: given y and A, find a signal x̂ within the class of interest such that y = Ax̂ exactly or approximately.

When we consider sparse signals, the CS recovery process consists of a search for the sparsest signal x that yields the measurements y. By defining the ℓ0 "norm" of a vector, ‖x‖₀, as the number of nonzero entries in x, the simplest way to pose a recovery algorithm is using the optimization

  x̂ = arg min ‖x‖₀ subject to y = Ax.   (10)

Solving (10) relies on an exhaustive search and is successful for all x ∈ Σ_K when the matrix A has the sparse solution uniqueness property (i.e., for M as small as 2K; see Theorems 2 and 4). However, this algorithm has combinatorial computational complexity, since we must check whether the measurement vector y belongs to the span of each set of K columns of A, K = 1, 2, …, N. Our goal, therefore, is to find computationally feasible algorithms that can successfully recover a sparse vector x from the measurement vector y for the smallest possible number of measurements M.

An alternative to the ℓ0 "norm" used in (10) is the ℓ1 norm, defined as ‖x‖₁ = Σ_{n=1}^{N} |x(n)|. The resulting adaptation of (10), known as basis pursuit (BP) [22], is formally defined as

  x̂ = arg min ‖x‖₁ subject to y = Ax.   (11)

Since the ℓ1 norm is convex, (11) can be seen as a convex relaxation of (10). Thanks to the convexity, this algorithm can be implemented as a linear program, making its computational complexity polynomial in the signal length [49].³

The optimization (11) can be modified to allow for noise in the measurements y = Ax + n; we simply change the constraint on the solution to

  x̂ = arg min ‖x‖₁ subject to ‖y − Ax‖₂ ≤ ε̄,   (12)

where ε̄ ≥ ‖n‖₂ is an appropriately chosen bound on the noise magnitude. This modified optimization is known as basis pursuit with inequality constraints (BPIC) and is a quadratic program with polynomial complexity solvers [49]. The Lagrangian relaxation of this quadratic program is written as

  x̂ = arg min (1/2)‖y − Ax‖₂² + λ‖x‖₁,   (13)

and is known as basis pursuit denoising (BPDN). There exist many efficient solvers to find BP, BPIC, and BPDN solutions; for an overview, see [51].

Oftentimes, a bounded-norm noise model is overly pessimistic, and it may be reasonable instead to assume that the noise is random. For example, additive white Gaussian noise n ~ N(0, σ²I) is a common choice. Approaches designed to address stochastic noise include complexity-based regularization [52] and Bayesian estimation [53]. These methods pose probabilistic or complexity-based priors, respectively, on the set of observable signals. The particular prior is then leveraged together with the noise probability distribution during signal recovery. Optimization-based approaches can also be formulated in this case; one of the most popular techniques is the Dantzig selector [54]:

  x̂ = arg min ‖x‖₁ subject to ‖Aᵀ(y − Ax)‖∞ ≤ λσ,   (14)

where ‖·‖∞ denotes the ℓ∞-norm, which provides the largest-magnitude entry in a vector, and λ is a constant parameter that controls the probability of successful recovery.

An alternative to optimization-based approaches are greedy algorithms for sparse signal recovery. These methods are iterative in nature and select columns of A according to their correlation with the measurements, determined by an appropriate inner product. For example, the matching pursuit and orthogonal matching pursuit (OMP) algorithms [24], [55] proceed by finding the column of A most correlated to the signal residual, which is obtained by subtracting the contribution of a partial estimate of the signal from y. The OMP method is formally defined as Algorithm 1, where thresh(x, K) denotes a thresholding operator on x that sets all but the K entries of x with the largest magnitudes to zero, and x|_Ω denotes the restriction of x to the entries indexed by Ω. The convergence criterion used to find sparse representations consists of checking whether y = Ax̂ exactly or approximately; note that due to its design, the algorithm cannot run for more than M iterations, as A has M rows. Other greedy techniques that have a similar flavor to OMP include CoSaMP [56], detailed as Algorithm 2, and subspace pursuit (SP) [57]. A simpler variant is known as iterative hard thresholding (IHT) [58]: starting from an initial signal estimate x̂₀ = 0, the algorithm iterates a gradient descent step followed by hard thresholding, i.e.,

  x̂_{i+1} = thresh(x̂_i + Aᵀ(y − A x̂_i), K)   (15)

until a convergence criterion is met.

³ A similar set of recovery algorithms, known as total variation minimizers, operate on the gradient of an image, which exhibits sparsity for piecewise smooth images [50].

Algorithm 1: Orthogonal Matching Pursuit

Input: CS matrix A, measurement vector y
Output: Sparse representation x̂
Initialize: x̂₀ = 0, r = y, Ω = ∅, i = 0
while halting criterion false do
  i ← i + 1
  b ← Aᵀ r   {form residual signal estimate}
  Ω ← Ω ∪ supp(thresh(b, 1))   {update support with residual}
  x̂_i|_Ω ← A_Ω† y, x̂_i|_{Ω^C} ← 0   {update signal estimate}
  r ← y − A x̂_i   {update measurement residual}
end while
return x̂ ← x̂_i
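Algorithm 1 can be brought to runnable form in a few lines. The sketch below is our illustration, not the authors' implementation; the function name, problem sizes, and halting tolerance are our own choices, and the support/least-squares bookkeeping follows the comments in the algorithm box:

```python
import numpy as np

def omp(A, y, K, tol=1e-6):
    """Sketch of Algorithm 1 (OMP): greedily grow the support, then
    re-fit the estimate on that support by least squares."""
    M, N = A.shape
    support = []
    x_hat = np.zeros(N)
    r = y.copy()
    for _ in range(min(K, M)):
        idx = int(np.argmax(np.abs(A.T @ r)))  # column most correlated with r
        if idx not in support:
            support.append(idx)                # {update support with residual}
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        x_hat = np.zeros(N)
        x_hat[support] = coef                  # {update signal estimate}
        r = y - A @ x_hat                      # {update measurement residual}
        if np.linalg.norm(r) <= tol:           # halting criterion
            break
    return x_hat

# Usage: recover a 3-sparse vector from 60 noiseless random measurements.
rng = np.random.default_rng(1)
A = rng.standard_normal((60, 128)) / np.sqrt(60)
x = np.zeros(128)
x[[5, 17, 60]] = [1.0, -2.0, 0.5]
y = A @ x
x_hat = omp(A, y, K=3)
```

In the noiseless setting above, once the correct support is identified the least-squares step drives the residual to numerical zero, matching the exact-recovery behavior discussed for OMP.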
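The IHT iteration (15) described above is even simpler to sketch. The block below is our illustration under our own naming conventions; for the step size to need no tuning we use a sensing matrix with orthonormal rows, an assumption of this example rather than a requirement of the method:

```python
import numpy as np

def iht(A, y, K, iters=200):
    """Sketch of the IHT iteration (15): gradient step on ||y - Ax||_2^2
    followed by keeping only the K largest-magnitude entries."""
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        g = x + A.T @ (y - A @ x)          # gradient descent step
        keep = np.argsort(np.abs(g))[-K:]  # indices of K largest magnitudes
        x = np.zeros_like(g)
        x[keep] = g[keep]                  # hard thresholding
    return x

# Usage: a matrix with orthonormal rows (via QR), so the plain gradient
# step in (15) is nonexpansive without extra step-size scaling.
rng = np.random.default_rng(2)
Q, _ = np.linalg.qr(rng.standard_normal((128, 96)))
A = Q.T                                    # 96 x 128, orthonormal rows
x = np.zeros(128)
x[[7, 30, 101]] = [1.0, -2.0, 0.8]
y = A @ x
x_hat = iht(A, y, K=3)
```

For general matrices, normalized variants of IHT adapt the step size at each iteration; the fixed-step form above mirrors (15) as stated.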
Algorithm 2: CoSaMP

Input: CS matrix A, measurement vector y, sparsity K
Output: K-sparse approximation x̂ to true signal x
Initialize: x̂₀ = 0, r = y, i = 0
while halting criterion false do
  i ← i + 1
  e ← Aᵀ r   {form residual signal estimate}
  Ω ← supp(thresh(e, 2K))   {prune residual}
  T ← Ω ∪ supp(x̂_{i−1})   {merge supports}
  b|_T ← A_T† y, b|_{T^C} ← 0   {form signal estimate}
  x̂_i ← thresh(b, K)   {prune signal using model}
  r ← y − A x̂_i   {update measurement residual}
end while
return x̂ ← x̂_i

D. CS Recovery Guarantees

Many of the CS recovery algorithms above come with guarantees on their performance. We group these results according to the matrix metric used to obtain the guarantee.

First, we review results that rely on coherence. As a first example, BP and OMP recover a K-sparse vector from noiseless measurements when the matrix A satisfies (6) [28], [30], [31]. There also exist coherence-based guarantees designed for measurements corrupted with arbitrary noise.

Theorem 5 [59]: Let the signal x ∈ Σ_K and write y = Ax + n. Denote ε = ‖n‖₂. Suppose that K ≤ (1 + 1/μ(A))/4 and ε̄ ≥ ε in (12). Then the output x̂ of (12) has error bounded by

  ‖x − x̂‖₂ ≤ (ε + ε̄) / √(1 − μ(A)(4K − 1)),   (16)

while the output x̂ of the OMP algorithm with halting criterion ‖r‖₂ ≤ ε has error bounded by

  ‖x − x̂‖₂ ≤ ε / √(1 − μ(A)(K − 1)),   (17)

provided that ε ≤ β(1 − μ(A)(2K − 1))/2 for OMP, with β being a positive lower bound on the magnitude of the nonzero entries of x.

Note here that BPIC must be aware of the noise magnitude ε to make ε̄ ≥ ε, while OMP must be aware of the noise magnitude ε to set an appropriate convergence criterion. Additionally, the error in Theorem 5 is proportional to the noise magnitude ε. This is because the only assumption on the noise is its magnitude, so that n might be aligned to maximally harm the estimation process.

In the random noise case, bounds on ‖x − x̂‖₂ can only be stated in high probability, since there is always a small probability that the noise will be very large and completely overpower the signal. For example, under additive white Gaussian noise (AWGN), the guarantees for BPIC in Theorem 5 hold with high probability when the parameter ε̄ = σ√(M + η√(2M)), with η denoting an adjustable parameter to control the probability of ‖n‖₂ being too large [60]. A second example gives a related result for the BPDN algorithm.

Theorem 6 [61]: Let the signal x ∈ Σ_K and write y = Ax + n, where n ~ N(0, σ²I). Suppose that K ≤ 1/(3μ(A)) and consider the BPDN optimization problem (13) with λ = √(8σ²(1 + α) log N) for some α > 0. Then, with high probability (controlled through α), the solution x̂ of (13) is unique, its error is bounded by

  ‖x − x̂‖₂ ≤ C σ √((1 + α) K log N),   (18)

for an absolute constant C, and its support is a subset of the true K-element support of x.

Under AWGN, the value of ε̄ one would need to choose in Theorem 5 is on the order of σ√M, giving a bound much larger than that of Theorem 6, which scales as σ√(K log N). This demonstrates the noise reduction achievable due to the adoption of the random noise model. These guarantees come close to the Cramér–Rao bound, which is given by σ√K [62]. We finish the study of coherence-based guarantees for the AWGN setting with a result for OMP.

Theorem 7 [61]: Let the signal x ∈ Σ_K and write y = Ax + n, where n ~ N(0, σ²I). Suppose that K ≤ 1/(3μ(A)) and

  min_{n ∈ supp(x)} |x(n)| ≥ 2σ√(2(1 + α) log N)   (19)

for some constant α > 0. Then, with high probability (again controlled through α), the output x̂ of OMP after K iterations has error bounded by

  ‖x − x̂‖₂ ≤ C σ √((1 + α) K log N),   (20)

and its support matches the true K-element support of x.

The greedy nature of OMP poses the requirement on the minimum absolute-valued entry of x in order for the support to be correctly detected, in contrast to BPIC and BPDN.

A second class of guarantees is based on the RIP. The following result for OMP provides an interesting viewpoint of greedy algorithms.

Theorem 8 [63]: Let the signal x ∈ Σ_K and write y = Ax. Suppose that A has the (K + 1, δ)-RIP with δ < 1/(3√K). Then OMP can recover a K-sparse signal exactly in K iterations.

Guarantees also exist for noisy measurement settings, albeit with significantly more stringent RIP conditions on the CS matrix.

Theorem 9 [64]: Let the signal x ∈ Σ_K and write y = Ax + n. Suppose that A has the (31K, δ)-RIP with δ ≤ 1/3. Then the output of OMP after 30K iterations has error bounded by

  ‖x − x̂‖₂ ≤ C‖n‖₂.   (21)

The next result extends guarantees from sparse to more general signals measured under noise. We collect a set of independent statements in a single theorem.
Theorem 10 [4], [56]–[58]: Let the signal x and write y = Ax + n. The outputs x̂ of the CoSaMP, SP, IHT, and BPIC algorithms, with A having the (cK, δ)-RIP, obey

  ‖x − x̂‖₂ ≤ C₁‖x − x_K‖₂ + C₂ (1/√K) ‖x − x_K‖₁ + C₃‖n‖₂,   (22)

where x_K = arg min_{x′ ∈ Σ_K} ‖x − x′‖₂ is the best K-sparse approximation of the vector x when measured in the ℓ2 norm. The requirements on the parameters c, δ of the RIP and the values of C₁, C₂, and C₃ are specific to each algorithm. For example, for the BPIC algorithm, c = 2 and δ = √2 − 1 suffice to obtain the guarantee (22).

The type of guarantee given in Theorem 10 is known as uniform instance optimality, in the sense that the CS recovery error is proportional to that of the best K-sparse approximation to the signal x for any signal x. In fact, the formulation of the CoSaMP, SP, and IHT algorithms was driven by the goal of instance optimality, which has not been shown for older greedy algorithms like MP and OMP. Theorem 10 can also be adapted to recovery of exactly sparse signals from noiseless measurements.

Corollary 11: Let the signal x ∈ Σ_K and write y = Ax. The CoSaMP, SP, IHT, and BP algorithms can exactly recover x from y if A has the (cK, δ)-RIP, where the parameters c, δ of the RIP are specific to each algorithm.

Similarly to Theorem 5, the error in Theorem 10 is proportional to the noise magnitude ‖n‖₂, and the bounds can be tailored to random noise with high probability. The Dantzig selector improves the scaling of the error in the AWGN setting.

Theorem 12 [54]: Let the signal x ∈ Σ_K and write y = Ax + n, where n ~ N(0, σ²I). Suppose that λ = √(2(1 + α) log N) in (14) and that A has the (2K, δ_{2K})- and (3K, δ_{3K})-RIPs with δ_{2K} + δ_{3K} < 1. Then, with probability approaching one (at a rate controlled by α), we have

  ‖x − x̂‖₂ ≤ C σ √((1 + α) K log N).   (23)

Similar results under AWGN have been shown for the OMP and thresholding algorithms [61].

A third class of guarantees relies on metrics additional to coherence and RIP. This class has a nonuniform flavor in the sense that the results apply only for a certain subset of sufficiently sparse signals. Such flexibility allows for significant relaxations on the properties required from the matrix A. The next example has a probabilistic flavor and relies on the coherence property.

Theorem 13 [65]: Let the signal x ∈ Σ_K with support drawn uniformly at random from the (N choose K) available possibilities and nonzero entries drawn independently from a distribution that is equally likely to produce positive and negative values. Write μ = μ(A) and fix δ ∈ (0, 1). If

  K ≤ C N / (‖A‖₂² log(N/δ))  and  μ(A) ≤ C′ / log(N/δ),

then x is the unique solution to BP (11) with probability at least 1 − δ.

In words, the theorem says that as long as the coherence μ(A) and the spectral norm ‖A‖₂ of the CS matrix are small enough, we will be able to recover the majority of K-sparse signals x from their measurements y. Probabilistic results that rely on coherence can also be obtained for the BPDN algorithm (13) [45].

The main difference between the guarantees that rely solely on coherence and those that rely on the RIP and probabilistic sparse signal models is the scaling of the number of measurements M needed for successful recovery of K-sparse signals. According to the bounds (8) and (9), the sparsity level that allows for recovery with high probability in Theorems 10, 12, and 13 is K = O(M/log N), compared with K = O(√M) for the deterministic guarantees provided by Theorems 5, 6, and 7. This so-called square root bottleneck [65] gives an additional reason for the popularity of randomized CS matrices and sparse signal models.

IV. STRUCTURE IN CS MATRICES

While most initial work in CS has emphasized the use of randomized CS matrices whose entries are obtained independently from a standard probability distribution, such matrices are often not feasible for real-world applications due to the cost of multiplying arbitrary matrices with signal vectors of high dimension. In fact, very often the physics of the sensing modality and the capabilities of sensing devices limit the types of CS matrices that can be implemented in a specific application. Furthermore, in the context of analog sampling, one of the prime motivations for CS is to build analog samplers that lead to sub-Nyquist sampling rates. These involve actual hardware and therefore structured sensing devices. Hardware considerations require more elaborate signal models to reduce the number of measurements needed for recovery as much as possible. In this section, we review available alternatives for structured CS matrices; in each case, we provide known performance guarantees, as well as application areas where the structure arises. In Section VI we extend the CS framework to allow for analog sampling, and introduce further structure into the measurement process. This results in new hardware implementations for reduced rate samplers based on extended CS principles. Note that the survey of CS devices given in this section is by no means exhaustive [66]–[68]; our focus is on CS matrices that have been investigated from both a theoretical and an implementational point of view.

A. Subsampled Incoherent Bases

The key concept of a frame's coherence can be extended to pairs of orthonormal bases. This enables a new choice for CS matrices: one simply selects an orthonormal basis that is incoherent with the sparsity basis, and obtains CS measurements by selecting a subset of the coefficients of the signal in the chosen basis [69]. We note that some degree of randomness remains in this scheme, due to the choice of coefficients selected to represent the signal as CS measurements.

1) Formulation: Formally, we assume that a basis Φ is provided for measurement purposes, where each column φ_i of Φ corresponds to a different basis element. Let Φ_Γ be a column submatrix of Φ that preserves the basis vectors with indices Γ, and set A = Φ_Γᵀ.
Under this setup, a different metric arises to evaluate the performance of CS.

Definition 4 [28], [69]: The mutual coherence of the N-dimensional orthonormal bases Φ and Ψ is the maximum absolute value of the inner product between elements of the two bases:

  μ(Φ, Ψ) = max_{1 ≤ i, j ≤ N} |⟨φ_i, ψ_j⟩|,   (24)

where ψ_j denotes the jth column, or element, of the basis Ψ.

The mutual coherence μ(Φ, Ψ) has values in the range [1/√N, 1]. For example, μ(Φ, Ψ) = 1/√N when Φ is the discrete Fourier transform basis, or Fourier matrix, and Ψ is the canonical basis, or identity matrix, and μ(Φ, Ψ) = 1 when both bases share at least one element or column. Note also that the concepts of coherence and mutual coherence are connected by the equality μ([Φ Ψ]) = μ(Φ, Ψ): the coherence of the dictionary obtained by concatenating the two bases equals their mutual coherence. The definition of mutual coherence can also be extended to infinite-dimensional representations of continuous-time (analog) signals [33], [70].

2) Theoretical Guarantees: The following theorem provides a recovery guarantee based on mutual coherence.

Theorem 14 [69]: Let x be a K-sparse signal in Ψ with support Ω ⊂ {1, …, N}, |Ω| = K, and with entries having signs chosen uniformly at random. Choose a subset Γ uniformly at random for the set of observed measurements, with M = |Γ|. Suppose that M ≥ C K N μ²(Φ, Ψ) log(N/δ) and M ≥ C′ log²(N/δ) for fixed values of δ. Then with probability at least 1 − δ, x is the solution to (11).

The number of measurements required by Theorem 14 ranges from O(K log N) to O(N), since N μ²(Φ, Ψ) ∈ [1, N]. It is possible to expand the guarantee of Theorem 14 to compressible signals by adapting an argument of Rudelson and Vershynin in [71] to link coherence and restricted isometry constants.

Theorem 15 [71, Remark 3.5.3]: Choose a subset Γ for the set of observed measurements, with M = |Γ|. Suppose that

  M ≥ C K N μ²(Φ, Ψ) log⁴ N,   (25)

where the constant C depends on the desired RIP constant δ. Then, with high probability, the matrix A = Φ_Γᵀ Ψ has the RIP with constant δ.

Using this theorem, we obtain the guarantee of Theorem 10 for compressible signals with the number of measurements M dictated by the coherence value μ(Φ, Ψ).

3) Applications: There are two main categories of applications where subsampled incoherent bases are used. In the first category, the acquisition hardware is limited by construction to measure directly in a transform domain. The most relevant examples are magnetic resonance imaging (MRI) [72] and tomographic imaging [73], as well as optical microscopy [74], [75]; in all of these cases, the measurements obtained from the hardware correspond to coefficients of the image's 2-D continuous Fourier transform, albeit not typically selected in a randomized fashion. Since the Fourier functions, corresponding to sinusoids, will be incoherent with functions that have localized support, this imaging approach works well in practice for sparsity/compressibility transforms such as wavelets [69], total variation [73], and the standard canonical representation [74].

In the case of optical microscopy, the Fourier coefficients that are measured correspond to the lowpass regime. The highpass values are completely lost. When the underlying signal can change sign, standard sparse recovery algorithms such as BP do not typically succeed in recovering the true underlying vector. To treat the case of recovery from lowpass coefficients, a special-purpose sparse recovery method was developed under the name of Nonlocal Hard Thresholding (NLHT) [74]. This technique attempts to allocate the off-support of the sparse signal in an iterative fashion by performing a thresholding step that depends on the values of the neighboring locations.

Fig. 1. Diagram of the single pixel camera. The incident lightfield (corresponding to the desired image x) is reflected off a digital micro-mirror device (DMD) array whose mirror orientations are modulated in the pseudorandom pattern supplied by the random number generator (RNG). Each different mirror pattern produces a voltage at the single photodiode that corresponds to one measurement y(m). The process is repeated with different patterns M times to obtain a full measurement vector y (taken from [76]).

The second category involves the design of new acquisition hardware that can obtain projections of the signal against a class of vectors. The goal of the matrix design step is to find a basis whose elements belong to the class of vectors that can be implemented on the hardware. For example, a class of single pixel imagers based on optical modulators [60], [76] (see an example in Fig. 1) can obtain projections of an image against vectors that have binary entries. Example bases that meet this criterion include the Walsh–Hadamard and noiselet bases [77]. The latter is particularly interesting for imaging applications, as it is known to be maximally incoherent with the Haar wavelet basis. In contrast, certain elements of the Walsh–Hadamard basis are highly coherent with wavelet functions at coarse scales, due to their large supports. Permuting the entries of the basis vectors (in a random or pseudorandom fashion) helps reduce the coherence between the measurement basis and a wavelet basis. Because the single pixel camera modulates the light field through optical aggregation, it improves the signal-to-noise ratio of the measurements obtained when compared to standard multisensor schemes [78]. Similar imaging hardware architectures have been developed in [79], [80].

An additional example of a configurable acquisition device is the random sampling ADC [81], which is designed for acquisition of periodic, multitone analog signals whose frequency components belong to a uniform grid (similarly to the random demodulator of Section IV-B-3). The sparsity basis for the signal once again is chosen to be the discrete Fourier transform basis. The measurement basis is chosen to be the identity basis, so that the measurements correspond to standard signal samples taken at randomized times. The hardware implementation employs a randomized clock that drives a traditional low-rate ADC to sample the input signal at specific times. As shown in Fig. 2, the random clock is implemented using an FPGA that outputs a predetermined pseudorandom pulse sequence indicating the times at which the signal is sampled. The patterns are timed according to a set of pseudorandom arithmetic progressions. This process is repeated cyclically, establishing a windowed acquisition for signals of arbitrary length. The measurement and sparsity framework used by this randomized ADC is also compatible with sketching algorithms designed for signals that are sparse in the frequency domain [81], [82].

Fig. 2. Diagram of the random sampling ADC. A pseudorandom clock generated by an FPGA drives a low-rate standard ADC to sample the input signal at predetermined pseudorandom times. The samples obtained are then fed into a CS recovery algorithm or a sketching algorithm for Fourier-sparse signals to estimate the frequencies and amplitudes of the tones present in the signal (taken from [81]).

B. Structurally Subsampled Matrices

In certain applications, the measurements obtained by the acquisition hardware do not directly correspond to the sensed signal's coefficients in a particular transform. Rather, the observations are a linear combination of multiple coefficients of the signal. The resulting CS matrix has been termed a structurally subsampled matrix [83].

1) Formulation: Consider a matrix of available measurement vectors that can be described as the product RU, where R is a P × N mixing matrix and U is an N × N basis. The CS matrix A is obtained by selecting M out of the P rows at random, and normalizing the columns of the resulting subsampled matrix. There are two possible downsampling stages: first, R might offer only P < N mixtures to be available as measurements; second, we only preserve M < P of the mixtures available to represent the signal. This formulation includes the use of subsampled incoherent bases simply by letting P = N and R = I, i.e., no coefficient mixing is performed.

2) Theoretical Guarantees: To provide theoretical guarantees we place some additional constraints on the mixing matrix R.

Definition 5: The integrator matrix S, for a divisor P of N, is the P × N matrix whose pth row contains ones in entries (p − 1)N/P + 1 through pN/P and zeros elsewhere.

In words, using S as a mixing matrix sums together intervals of N/P adjacent coefficients of the signal under the transform U. We also use a diagonal modulation matrix D whose nonzero entries are independently drawn from a Rademacher distribution, and formulate our mixing matrix as R = SD.

Theorem 16 [83, Theorem 1]: Let A be a structurally subsampled matrix of size M × N obtained from the basis U and the mixing matrix R = SD via randomized subsampling. Then for each K ≥ 2 and any δ ∈ (0, 1), there exist absolute positive constants c₁ and c₂ such that if

  M ≥ c₁ K N μ²(U) log⁵ N,   (26)

then the matrix A has the (K, δ)-RIP with probability at least 1 − c₂/N.

Similarly to the subsampled incoherent bases case, the possible values of μ(U) provide us with a required number of measurements ranging from O(K log⁵ N) to O(N log⁵ N).

3) Applications: Compressive ADCs are one promising application of CS, which we discuss in detail in Section VI-B-6 after introducing the infinite-dimensional CS framework. A first step in this direction is the architecture known as the random demodulator (RD) [84]. The RD employs structurally subsampled matrices for the acquisition of periodic, multitone analog signals whose frequency components belong in a uniform grid. Such signals have a finite parametrization and therefore fit the finite-dimensional CS setting.

Fig. 3. Block diagram of the random demodulator (taken from [84]).

To be more specific, our aim is to acquire a discrete uniform sampling of a continuous-time signal f(t) observed during an acquisition interval of 1 s, where it is assumed that f(t) is of the form

  f(t) = Σ_{n=0}^{N−1} x(n) e^{j2πnt}.   (27)

The vector x is K-sparse so that only K of its values are nonzero. As shown in Fig. 3, the sampling hardware consists of a mixer element that multiplies the signal f(t) with a chipping sequence at a rate of N chirps/s. This chipping sequence is the output of a pseudorandom number generator. The combination of these two units effectively performs the product of the Nyquist-sampled discrete representation of the signal with the matrix D in the analog domain. The output from the modulator is sampled by an integrate-and-dump sampler, a combination of an accumulator unit and a uniform sampler synchronized with each other, at a rate of M samples per interval. Such sampling effectively performs a multiplication of the output of the modulator by the integrator matrix S. To complete the setup, we set P = M and choose U to be the discrete Fourier transform basis; it is easy to see that these two bases are maximally incoherent. In the RD architecture, all subsampling is performed at this stage; there is no randomized subsampling of the output of the integrator.
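The RD measurement chain just described (chipping sequence, then integrate-and-dump) can be sketched numerically as the matrix product of an integrator, a diagonal Rademacher matrix, and an inverse DFT. The block below is our illustration, not the authors' implementation; the names, sizes, and DFT normalization are our own choices, and the analog mixing is modeled as exact discrete matrix arithmetic:

```python
import numpy as np

rng = np.random.default_rng(3)
N, M, K = 256, 32, 4        # Nyquist length, measurements, active tones
                            # (M divides N: the integrator sums N // M chips)

# K-sparse vector of DFT coefficients, as in (27).
x = np.zeros(N, dtype=complex)
idx = rng.choice(N, K, replace=False)
x[idx] = rng.standard_normal(K) + 1j * rng.standard_normal(K)

# Nyquist-rate time samples of f(t): synthesis (unnormalized inverse DFT).
f = np.fft.ifft(x) * N

# Chipping sequence: the diagonal of the Rademacher modulation matrix D.
d = rng.choice([-1.0, 1.0], size=N)

# Integrate-and-dump sampler (integrator matrix S): sum blocks of chips.
y = (d * f).reshape(M, N // M).sum(axis=1)

# Equivalently, y = Phi @ x with Phi = S D F_inv assembled explicitly.
F_inv = np.fft.ifft(np.eye(N), axis=0) * N
Phi = (d[:, None] * F_inv).reshape(M, N // M, N).sum(axis=1)
assert np.allclose(Phi @ x, y)
```

The final assertion confirms that the sample-domain pipeline (modulate, then block-sum) and the explicit structurally subsampled matrix Phi implement the same linear map, which is the matrix one would hand to any of the recovery algorithms of Section III-C.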
A prototype implementation of this architecture is reported in [85]; similar designs are proposed in [5] and [86]. Note that the number of columns of the resulting CS matrix scales with the maximal frequency in the representation of f(t). Therefore, in practice, this maximal frequency cannot be very large. For example, the implementations reported above reach maximum frequency rates of 1 MHz; the corresponding signal representation therefore has one million coefficients. The matrix multiplications can be implemented using algorithmic tools such as the fast Fourier transform (FFT). In conjunction with certain optimization solvers and greedy algorithms, this approach significantly reduces the complexity of signal recovery. While the original formulation sets the sparsity basis to be the Fourier basis, limiting the set of recoverable signals to periodic multitone signals, we can move beyond structurally subsampled matrices by using redundant Fourier domain frames for sparsity. These frames, together with modified recovery algorithms, can enable acquisition of a wider class of frequency-sparse signals [87], [88].

While the RD is capable of implementing CS acquisition for a specific class of analog signals having a finite parametrization, the CS framework can be adapted to infinite-dimensional signal models, enabling more efficient analog acquisition and digital processing architectures. We defer the details to Section VI.

Fig. 4. Compressive imaging via coded aperture. (a) Example of an ideal point spread function for a camera with a pinhole aperture. The size of the support of the function is dependent on the amount of blur caused by the imager. (b) Coded aperture used for compressive imaging (taken from [94]).

C. Subsampled Circulant Matrices

The use of Toeplitz and circulant structures [89]–[91] as CS matrices was first inspired by applications in communications, including channel estimation and multiuser detection, where a sparse prior is placed on the signal to be estimated, such as a channel response or a multiuser activity pattern. When compared with generic CS matrices, subsampled circulant matrices have a significantly smaller number of degrees of freedom due to the repetition of the matrix entries along the rows and columns.

1) Formulation: A circulant matrix U is a square matrix where the entries in each diagonal are all equal, and where the first entry of the second and subsequent rows is equal to the last entry of the previous row. Since this matrix is square, we perform random subsampling of the rows (in a fashion similar to that described in Section IV-B) to obtain a CS matrix A = RU, where R is an M × N subsampling matrix, i.e., a submatrix of the identity matrix. We dub A a subsampled circulant matrix.

2) Theoretical Guarantees: Even when the sequence defining U is drawn at random from the distributions described in Section III, the particular structure of the subsampled circulant matrix A = RU prevents the use of the proof techniques used in standard CS, which require all entries of the matrix to be independent. However, it is possible to employ different probabilistic tools to provide guarantees for subsampled circulant matrices. The results still require randomness in the selection of the entries of the circulant matrix.

Theorem 17 [91]: Let U be a circulant matrix whose distinct entries are independent random variables following a Rademacher distribution, and let R be an arbitrary M × N identity submatrix. Furthermore, let δ be the smallest value for which (7) holds for all x ∈ Σ_K. Then the expected value of δ is at most a prescribed constant δ₀ provided that

  M ≥ C δ₀⁻² (K log N)^{3/2},

where C is a universal constant. Furthermore, δ concentrates around this expectation: the probability that δ exceeds E[δ] by t decays exponentially in t², for a universal constant in the exponent.

In words, if we have M = O((K log N)^{3/2}), then the RIP required by many algorithms for successful recovery is achieved with high probability.

3) Applications: There are several sensing applications where the signal to be acquired is convolved with the sampling hardware's impulse response before it is measured. Additionally, because convolution is equivalent to a product operator in the Fourier domain, it is possible to speed up the CS recovery process by performing multiplications by the matrices U and Uᵀ via the FFT. In fact, such an FFT-based procedure can also be exploited to generate good CS matrices [92]. First, we design a diagonal matrix in the Fourier domain with entries drawn from a suitable probability distribution. Then, we obtain the measurement matrix by returning to the signal domain and subsampling the rows of the resulting circulant matrix, similarly to the incoherent basis case. While this formulation assumes that the convolution during signal acquisition is circulant, this gap between theory and practice has been studied and shown to be controllable [93].

Our first example concerns imaging architectures. The impulse response of the imaging hardware is known as the point spread function (PSF), and it represents imaging artifacts such as blurring, distortion, and other aberrations; an example is shown in Fig. 4(a). It is possible then to design a compressive imaging architecture by employing an imaging device that has a dense PSF: an impulse response having a support that includes all of the imaging field. This dense PSF is coupled with a downsampling step in the pixel domain to achieve compressive data acquisition [93], [94]. This is in contrast to the random sampling advocated by Theorem 17. A popular way to achieve such a dense PSF is by employing so-called coded apertures, which change the pinhole aperture used for image formation in most cameras to a more complex design. Fig. 4(b) shows an example coded aperture that has been used successfully in compressive imaging [93], [94].

Fig. 5. Schematic for the microelectronic architecture of the convolution-based compressive imager. The acquisition device effectively implements a subsampled circulant CS matrix. Here, Q denotes the quantization operation with resolution Δ (taken from [95]).

Our second example uses special-purpose light sensor arrays that perform the multiplication with a Toeplitz matrix using a custom microelectronic architecture [95], which is shown in Fig. 5. In this architecture, a pixel light sensor array is coupled with a linear feedback shift register (LFSR) whose input is connected to a pseudorandom number generator. The bits in the LFSR control multipliers that are tied to the outputs from the light sensors, thus performing additions and subtractions according to the pseudorandom pattern generated. The weighted outputs are summed in two stages: column-wise sums are obtained at an operational amplifier before quantization, whose outputs are then added together in an accumulator. In this way, the microelectronic architecture calculates the inner product of the image's pixel values with the sequence contained in the LFSR. The output of the LFSR is fed back to its input after the register is full, so that subsequent measurements correspond to inner products with shifts of previous sequences. The output of the accumulator is sampled in a pseudorandom fashion, thus performing the subsampling required for CS. This results in an effective subsampled circulant CS matrix.

D. Separable Matrices

Separable matrices [96], [97] provide computationally efficient alternatives to measure very large signals, such as hypercube samplings from multidimensional data. These matrices have a succinct mathematical formulation as Kronecker products, and feature significant structure present as correlations among matrix subblocks.

1) Formulation: Kronecker product bases are well suited for CS applications concerning multidimensional signals, that is, signals that capture information from an event that spans multiple dimensions, such as spatial, temporal, spectral, etc. These bases can be used both to obtain sparse or compressible representations or as CS measurement matrices.

The Kronecker product of two matrices V and W of sizes P × Q and R × S, respectively, is defined as

  V ⊗ W = [ V(1,1)W   V(1,2)W   …   V(1,Q)W
            V(2,1)W   V(2,2)W   …   V(2,Q)W
               ⋮          ⋮       ⋱      ⋮
            V(P,1)W   V(P,2)W   …   V(P,Q)W ].

Thus, V ⊗ W is a matrix of size PR × QS. Let Ψ₁ and Ψ₂ be bases for ℝ^{N₁} and ℝ^{N₂}, respectively. Then one can find a basis for ℝ^{N₁N₂} as Ψ = Ψ₁ ⊗ Ψ₂. We focus now on CS matrices that can be expressed as the Kronecker product of D matrices:

  A = A₁ ⊗ A₂ ⊗ … ⊗ A_D.   (28)

If we denote the size of A_d as M_d × N_d, then the matrix A has size M × N, with M = Π_{d=1}^{D} M_d and N = Π_{d=1}^{D} N_d.

Next, we describe the use of Kronecker product matrices in CS of multidimensional signals. We denote the entries of a D-dimensional signal X by X(n₁, …, n_D). We call the restriction of a multidimensional signal to fixed indices for all but its dth dimension a d-section of the signal. For example, for a 3-D signal X, the vector X(n₁, n₂, ·) is a 3-section of X.

Kronecker product sparsity bases make it possible to simultaneously exploit the sparsity properties of a multidimensional signal along each of its dimensions. The new basis is simply the Kronecker product of the bases used for each of its d-sections. More formally, we assume that each d-section is sparse or compressible in a basis Ψ_d. We then define a sparsity/compressibility basis for X as Ψ = Ψ₁ ⊗ … ⊗ Ψ_D, and obtain a coefficient vector Θ for the signal ensemble so that x = ΨΘ, where x is a column vector-reshaped representation of X. We then have y = Ax = AΨΘ.

2) Theoretical Guarantees: Due to the similarity between blocks of Kronecker product CS matrices, it is easy to obtain bounds for their performance metrics. Our first result concerns the RIP.
DUARTE AND ELDAR: STRUCTURED COMPRESSED SENSING: FROM THEORY TO APPLICATIONS 4065

Lemma 3 [96], [97]: Let , , be matrices that


have the -RIP, , respectively. Then , defined
in (28), has the -RIP, with

(29)

When is an orthonormal basis, it has the -RIP with


for all . Therefore the RIP constant of the
Kronecker product of an orthonormal basis and a CS matrix is
equal to the RIP constant of the CS matrix.
It is also possible to obtain results on mutual coherence (de-
scribed in Section III) for cases in which the basis used for spar-
sity or compressibility can also be expressed as a Kronecker
product.
Lemma 4 [96], [97]: Let , be bases for for
. Then

(30) Fig. 6. Schematic for the compressive analog CMOS imager. The acquisition
device effectively implements a separable CS matrix (taken from [99]).

Lemma 4 provides a conservation of mutual coherence across


Kronecker products. Since the mutual coherence of each -sec- image is partitioned into blocks of size to form
tions sparsity and measurement bases is upper bounded by one, a set of tiles; here indexes the block locations. The transform
the number of Kronecker product-based measurements neces- imager is designed to perform the multiplication ,
sary for successful recovery of the multidimensional signal (fol- where and are fixed matrices. This product of three matrices
lowing Theorems 14 and 15) is always lower than or equal to the can be equivalently rewritten as , where
corresponding number of necessary partitioned measurements denotes a column vector reshaping of the block [100]. The
that process only a portion of the multidimensional signal along CS measurements are then obtained by randomly subsampling
its dimension at a time, for some . This re- the vector . The compressive transform imager sets and
duction is maximized when the -section measurement basis to be noiselet bases and uses a 2-D undecimated wavelet trans-
is maximally incoherent with the -section sparsity basis . form to sparsify the image.
3) Applications: Most applications of separable CS matrices
involve multidimensional signals such as video sequences V. STRUCTURE IN FINITE-DIMENSIONAL MODELS
and hyperspectral datacubes. Our first example is an exten- Until now we focused on structure in the measurement ma-
sion of the single pixel camera (see Fig. 1) to hyperspectral trix , and considered signals with finite-dimensional sparse
imaging [98]. The aim here is to record the reflectivity of a representations. We now turn to discuss how to exploit structure
scene at different wavelengths; each wavelength has a corre- beyond sparsity in the input signal in order to reduce the number
sponding spectral frame, so that the hyperspectral datacube of measurements needed to faithfully represent it. Generalizing
can be essentially interpreted as the stacking of a sequence the notion of sparsity will allow us to move away from finite-di-
of images corresponding to the different spectra. The single mensional models extending the ideas of CS to reduce sampling
pixel hyperspectral camera is obtained simply by replacing the rates for infinite-dimensional continuous-time signals, which is
single photodiode element by a spectrometer, which records the main goal of sampling theory.
the intensity of the light reaching the sensor for a variety of We begin our discussion in this section within the finite-di-
different wavelenghts. Because the digital micromirror array mensional context by adding structure to the nonzero values of
reflects all the wavelengths of interest, the spectrometer records . We will then turn, in Section VI, to more general, infinite-di-
measurements that are each dependent only on a single spec- mensional notions of structure, that can be applied to a broader
tral band. Since the same patterns are acting as a modulation class of analog signals.
sequence to all the spectral bands in the datacube, the resulting
measurement matrix is expressible as , where A. Multiple Measurement Vectors
is the matrix that contains the patterns programmed into the Historically, the first class of structure that has been consid-
mirrors. Furthermore, it is possible to compress a hyperspectral ered within the CS framework has been that of multiple mea-
datacube by using a hyperbolic wavelet basis, which itself is surement vectors (MMVs) [101] which, similarly to sparse ap-
obtained as the Kronecker product of one-dimensional wavelet proximation, has been an area of interest in signal processing for
bases [96]. more than a decade [21]. In this setting, rather than trying to re-
Our second example concerns the transform imager [99], an cover a single sparse vector , the goal is to jointly recover a set
imaging hardware architecture that implements a separable CS of vectors that share a common support. Stacking these
matrix. A sketch of the transform imager is shown in Fig. 6. The vectors into the columns of a matrix , there will be at most
4066 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 59, NO. 9, SEPTEMBER 2011

nonzero rows in . That is, not only is each vector -sparse, the SMV setting, two main approaches to solving MMV prob-
but the nonzero values occur on a common location set. We use lems are based on convex optimization and greedy methods. The
the notation to denote the set of indices of the analogue of (10) in the MMV case is
nonzero rows.
MMV problems appear quite naturally in many different subject to (32)
applications areas. Early work on MMV algorithms focused
on magnetoencephalography, which is a modality for imaging where we define the norms for matrices as
the brain [21], [102]. Similar ideas were also developed in the
context of array processing [21], [103], [104], equalization
of sparse communication channels [105], [106], and more (33)
recently cognitive radio and multiband communications [9],
[86], [107][109]. with denoting the th row of . With a slight abuse of no-
1) Conditions on Measurement Matrices: As in standard CS, tation, we also consider the quasi-norms with such that
we assume that we are given measurements , where each for any . Optimization based algorithms
vector is of length . Letting be the matrix with relax the norm in (32) and attempt to recover by mixed
columns , our problem is to recover assuming a known mea- norm minimization [17], [112], [114][117]:
surement matrix so that . Clearly, we can apply any
CS method to recover from as before. However, since the subject to (34)
vectors all have a common support, we expect intuitively
to improve the recovery ability by exploiting this joint informa-
for some ; values of and have been
tion. In other words, we should in general be able to reduce the
advocated.
number of measurements needed to represent below ,
The standard greedy approaches in the SMV setting have
where is the number of measurements required to recover one
also been extended to the MMV case [101], [110], [114],
vector for a given matrix .
[118][120]. The basic idea is to replace the residual vector
Since , the rank of satisfies . When
by a residual matrix , which contains the residuals with re-
, all the sparse vectors are multiples of each
spect to each of the measurements, and to replace the surrogate
other, so that there is no advantage to their joint processing.
vector by the -norms of the rows of . For example,
However, when is large, we expect to be able to ex-
making these changes to OMP (Algorithm 1) yields a variant
ploit the diversity in its columns in order to benefit from joint
known as simultaneous orthogonal matching pursuit, shown
recovery. This essential result is captured by the following nec-
as Algorithm 3, where denotes the restriction of to the
essary and sufficient uniqueness condition:
rows indexed by .
Theorem 18 [110]: A necessary and sufficient condition for
the measurements to uniquely determine the jointly Algorithm 3: Simultaneous Orthogonal Matching Pursuit
sparse matrix is that
spark Input: CS matrix , MMV matrix
(31) Output: Row-sparse matrix
The sufficiency result was initially shown for the case Initialize: , , ,
[111]. As shown in [110], [112], we can while halting criterion false do
replace by in (31). The sufficient direction
of this condition was shown in [113] to hold even in the case
where there are infinitely many vectors . A direct conse- , {form residual matrix
-norm vector}
quence of Theorem 18 is that matrices with larger rank can be
recovered from fewer measurements. Alternatively, matrices {update row support
with larger support can be recovered from the same number of with index of residuals row with largest magnitude}
measurements. When and spark takes on its , {update signal
largest possible value equal to , condition (31) becomes estimate}
. Therefore, in this best-case scenario, only {update measurement residual}
measurements per signal are needed to ensure uniqueness. This
end while
is much lower than the value of obtained in standard CS via
the spark (cf. Theorem 4), which we refer to here as the single return
measurement vector (SMV) setting. Furthermore, as we now
show, in the noiseless setting can be recovered by a simple An alternative MMV strategy is the ReMBo (reduce MMV
algorithm, in contrast to the combinatorial complexity needed and boost) algorithm [113]. ReMBo first reduces the problem
to solve the SMV problem from measurements for general to an SMV that preserves the sparsity pattern, and then recovers
matrices . the signal support set; given the support, the measurements can
2) Recovery Algorithms: A variety of algorithms have been be inverted to recover the input. The reduction is performed by
proposed that exploit the joint sparsity in different ways. As in merging the measurement columns with random coefficients.
DUARTE AND ELDAR: STRUCTURED COMPRESSED SENSING: FROM THEORY TO APPLICATIONS 4067

The details of the approach together with a recovery guarantee where indicates an appropriate index set (which can be count-
are given in the following theorem. able or uncountable). Here again the assumption is that the set
Theorem 19: Let be the unique -sparse solution matrix of vectors all have common support of size .
of and let have spark greater than . Let be a Such an infinite-set of equations is referred to as an infinite mea-
random vector with an absolutely continuous distribution and surement vector (IMV) system. The common, finite, support set
define the random vectors and . Consider the can be exploited in order to recover efficiently by solving
random SMV system . Then: an MMV problem [113] Reduction to a finite MMV counterpart
1) for every realization of , the vector is the unique is performed via the continuous to finite (CTF) block, which
-sparse solution of the SMV; aims at robust detection of . The CTF builds a frame (or a
2) with probability one. basis) from the measurements using
According to Theorem 19, the MMV problem is first reduced
to an SMV counterpart, for which the optimal solution is
found. We then choose the support of to be equal to that of
, and invert the measurement vectors over this support. In (37)
practice, computationally efficient methods are used to solve the
SMV counterpart, which can lead to recovery errors in the pres- Typically, is constructed from roughly snapshots ,
ence of noise, or when insufficient measurements are taken. By and the (optional) decomposition allows removal of the noise
repeating the procedure with different choices of , the empir- space [109]. Once the basis is determined, the CTF solves
ical recovery rate can be boosted significantly [113], and lead to an MMV system with . An alternative
superior performance over alternative MMV methods. approach based on the MUSIC algorithm was suggested in [111]
The MMV techniques discussed so far are rank blind, namely, and [122].
they do not explicitly take the rank of , or that of , into The crux of the CTF is that the indices of the nonidentically
account. Theorem 18 highlights the role of the rank of in the zero rows of the matrix that solves the finite underdetermined
recovery guarantees. If and (31) is satisfied, then system coincide with the index set that is associated
every columns of are linearly independent. This, in turn, with the infinite signal-set [113], as incorporated in the
means that , where denotes the column following theorem.
range of the matrix . We can therefore identify the support of Theorem 21 [109]: Suppose that the system of (36) has a
by determining the columns that lie in . One way unique -sparse solution set with support , and that the ma-
to accomplish this is by minimizing the norm of the projections trix satisfies (31). Let be a matrix whose column span is
onto the orthogonal complement of : equal to the span of . Then, the linear system
(35) has a unique -sparse solution with row support
equal .
where is the orthogonal projection onto the range of . Once is found, the IMV system reduces to
The objective in (35) is equal to zero if and only if . Since, , which can be solved simply by computing the
by assumption, the columns of are linearly independent, pseudo-inverse .
once we find the support we can determine as . 3) Performance Guarantees: In terms of theoretical guaran-
We can therefore formalize the following guarantee. tees, it can be shown that MMV extensions of SMV algorithms
Theorem 20 [110]: If and (31) holds, then will recover under similar conditions to the SMV setting in
the algorithm (35) is guaranteed to recover from exactly. the worst-case scenario [17], [112], [119] so that theoretical
In the presence of noise, we choose the values of for equivalence results for arbitrary values of do not predict any
which the expression (35) is minimized. Since (35) leverages the performance gain with joint sparsity. In practice, however, mul-
rank to achieve recovery, we say that this method is rank aware. tichannel reconstruction techniques perform much better than
More generally, any method whose performance improves with recovering each channel individually. The reason for this dis-
increasing rank will be termed rank aware. It turns out that it crepancy is that these results apply to all possible input signals,
is surprisingly simple to modify existing greedy methods, such and are therefore worst-case measures. Clearly, if we input the
as OMP and thresholding, to be rank aware: instead of taking same signal to each channel, namely when , no ad-
inner products with respect to or the residual , at each stage ditional information on the joint support is provided from mul-
the inner products are computed with respect to an orthonormal tiple measurements. However, as we have seen in Theorem 18,
basis for the range of or [110]. higher ranks of the input improve the recovery ability. In par-
The criterion in (35) is similar in spirit to the MUSIC algo- ticular, when , rank-aware algorithms such as
rithm [121], popular in array signal processing, which also ex- (35) recover the true value of from the minimal number of
ploits the signal subspace properties. As we will see below in measurements given in Theorem 18. This property is not shared
Section VI, array processing algorithms can be used to treat a by the other MMV methods.
variety of other structured analog sampling problems. Another way to improve performance guarantees is by posing
The MMV model can be further extended to include the case a probability distribution on and developing conditions under
in which there are possibly infinitely many measurement vectors which is recovered with high probability [101], [118], [119],
[123]. Average case analysis can be used to show that fewer
(36) measurements are needed in order to recover exactly [119].
4068 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 59, NO. 9, SEPTEMBER 2011

temporal regularity on the brain activity [124]. Such temporal


regularization can correct estimation errors that appear as
temporal spikes in EEG/MEG activity. The example in Fig. 7
shows a test MEG activation signal with three active vertices
peaking at separate time instances. A 306-sensor acquisition
configuration was simulated with 3 dB. The perfor-
mance of MEG inversion with independent recovery of each
time instance exhibits spurious activity detections that are
removed by the temporal regularization enforced by the MMV
model. Additionally, the accuracy of the temporal behavior for
each vertex is improved; see [124] for details.
MMV models are also used during CS recovery for certain
infinite-dimensional signal models [18]. We will discuss this ap-
plication in more detail in Section VI-B.

B. Unions of Subspaces

To introduce more general models of structure on the input


signal, we turn now to extend the notion of sparsity to a much
Fig. 7. Example performance of the MMV model for EEG inversion with broader class of signals which can incorporate both finite-di-
temporal regularization. The setup consists of a configuration of 306 sensors
recording simulated brain activity consisting of three vertices peaking at mensional and infinite-dimensional signal representations. The
distinct times over a 120-ms period. The figures show (a) the ground truth enabling property that allows recovery of a sparse vector
EEG activity, (b) the inversion obtained by applying MMV recovery on all the from measurements is the fact that the set cor-
data recorded by the sensors, and (c) the inversion obtained by independently
applying sparse recovery on the data measured at each particular time instance. responds to a union of -dimensional subspaces within .
In each case, we show three representative time instances (30, 65, and 72 ms, More specifically, if we know the nonzero locations of ,
respectively). The results show that MMV is able to exclude spurious activity then we can represent by a -dimensional vector of coeffi-
detections successfully through implicit temporal regularization (taken from
[124]). cients and therefore only measurements are needed in order
to recover it. Therefore, for each fixed location set, is restricted
to a -dimensional subspace, which we denote by . Since the
location set is unknown, there are possible subspaces in
Theorem 22 [119]: Let be drawn from a
which may reside. Mathematically, this means that sparse sig-
probabilistic model in which the indices for its nonzero
nals can be represented by a union of subspaces [15]:
rows are drawn uniformly at random from the possi-
bilities and its nonzero rows (when concatenated) are given
by , where is an arbitrary diagonal matrix and each (38)
entry of is an i.i.d. standard Gaussian random variable. If
then can be recovered exactly from where each subspace , , is a -dimensional sub-
via (34), with and , with high probability. space associated with a specific set of nonzero values.
In a similar fashion to the SMV case (cf. Theorem 13), while For canonically sparse signals, the union is composed of
worst-case results limit the sparsity level to , av- canonical subspaces that are aligned with out of the co-
erage-case analysis shows that sparsity up to order ordinate axes of . Allowing for more general choices of
may enable recovery with high probability. Moreover, under a leads to powerful representations that accommodate many inter-
mild condition on the sparsity and on the matrix , the failure esting signal priors. It is important to note that union models are
probability decays exponentially in the number of channels not closed under linear operations: The sum of two signals from
[119]. a union is generally no longer in . This nonlinear behavior
4) Applications: The MMV model has found several of the signal set renders sampling and recovery more intricate.
applications in the applied CS literature. One example is To date, there is no general methodology to treat all unions in a
in the context of electroencephalography and magnetoen- unified manner. Therefore, we focus our attention on some spe-
cephalography (EEG/MEG) [21], [102]. As mentioned earlier, cific classes of union models, in order of complexity.
sparsity-promoting inversion algorithms have been popular in The simplest class of unions result when the number of sub-
EEG/MEG literature due to their ability to accurately localize spaces comprising the union is finite, and each subspace has finite
electromagnetic source signals. It is also possible to further dimensions. We call this setup a finite union of subspaces (FUS)
improve estimation performance by introducing temporal model. Within this class, we consider below two special cases:
regularization when a sequence of EEG/MEG is available. For Structured sparse supports: This class consists of sparse
example, one may apply the MMV model on the measurements vectors that meet additional restrictions on the support (i.e.,
obtained over a coherent time period, effectively enforcing the set of indices for the vectors nonzero entries). This
DUARTE AND ELDAR: STRUCTURED COMPRESSED SENSING: FROM THEORY TO APPLICATIONS 4069

corresponds to only certain subspaces out of the measurements that are needed to code the exact subspace where
subspaces present in being allowed [20].4 the sparse signal resides. We now specialize this result to the
Sparse sums of subspaces where each subspace com- two FUS classes given earlier.
prising the union is a direct sum of low-dimensional In the structured sparse supports case, corresponds to
subspaces [17] the number of distinct -sparse supports allowed by the
constraints, with ; this implies a reduction in
(39)
the number of measurements needed as compared to the
traditional sparsity model. Additionally, , since
Here, are a given set of subspaces each subspace has dimension .
with dimensions , and denotes a For the sparse sum of subspaces setting, we focus on the
sum over indices. Thus, each subspace corresponds case where each is of dimension ; we then have
to a different choice of subspaces that comprise the . Using the approximation
sum. The dimensionality of the signal representation in this
case will be ; for simplicity, we will often (41)
let for all so that . As we show, this
model leads to block sparsity in which certain blocks in
we conclude that for a given fraction of nonzeros ,
a vector are zero, and others are not [120], [133], [134].
roughly measurements
This framework can model standard sparsity by letting ,
are needed. For comparison, to satisfy the standard RIP
be the one-dimensional subspace spanned by
a larger number is required. The FUS
the canonical vector.
model reduces the total number of subspaces and there-
The two FUS cases above can be combined to allow only certain
fore requires times less measurements to code the signal
sums of subspaces to be part of the union .
subspace. The subspace dimension equals the
More complicated is the setting in which the number of possi-
number of degrees of freedom in .
bilities is still finite while each underlying subspace has infinite
Since the number of nonzeros is the same regardless of the spar-
dimensions. Finally, there may be infinitely many possibilities
sity structure, the term is not reduced in either
of finite or infinite subspaces. These last two classes allow us to
setting.
treat different families of analog signals, and will be considered
There also exists an extension of the coherence property to
in more detail in Section VI.
the sparse sum of subspaces case [120]. We must first elaborate
1) Conditions on Measurement Matrices: Guarantees for
on the notation for this setup. Given a signal our goal is
signal recovery using a FUS model can be developed by ex-
to recover it from measurements where is an appro-
tending tools used in the standard sparse setting. For example,
priate measurement matrix of size . To recover from
the -RIP for FUS models [15][17], [20] is similar to the
we exploit the fact that any can be represented in an
standard RIP where instead of the inequalities in (7) having to
appropriate basis as a block-sparse vector [135]. This follows
be satisfied for all sparse vectors , they have to hold
by first choosing a basis for each of the subspaces . Then,
only for vectors . If the constant is small enough, then
any can be expressed as
it can be shown that recovery algorithms tailored to the FUS
model will recover the true underlying vector . (42)
An important question is how many samples are needed
roughly in order to guarantee stable recovery. This question is
addressed in the following proposition [16], [17], [20], [46]. following the notation of (39), where are the repre-
Proposition 1 [46, Theorem 3.3]: Consider a matrix sentation coefficients in . Let be the column concatenation
of size with entries independently drawn from a of , and denote by the th subblock of a length- vector
sub-Gaussian distribution, and let be composed of sub- over . The th subblock is of length , and
spaces of dimension . Let and be constant the blocks are formed sequentially so that
numbers. If
(43)
(40)

If for a given the th subspace does not appear in the sum


then satisfies the -RIP with probability at least . (39), then . Finally, we can write , where there
As observed in [46], the first term in (40) has the dominant are at most nonzero blocks . Consequently, our union
impact on the required number of measurements for sparse sig- model is equivalent to the model in which is represented by a
nals in an asymptotic sense. This term quantifies the amount of block-sparse vector . The CS measurements can then be written
4While the description of structured sparsity given via unions of subspaces is as , where we denoted . An ex-
deterministic, there exists many different CS recovery approaches that leverage ample of a block-sparse vector with is depicted in Fig. 8.
probabilistic models designed to promote structured sparsity, such as Markov When for all , block sparsity reduces to conventional
random fields, hidden Markov Trees, etc. [125][131], as well as deterministic
approaches that rely on structured sparsity-inducing norm minimization [17], sparsity in the dictionary . We will say that a vector
[132]. is block -sparse if is nonzero for at most indices .
4070 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 59, NO. 9, SEPTEMBER 2011

sparse support models [20], [142][144]. For example, the ap-


proximation algorithm under block sparsity is equivalent to
block thresholding, with the blocks with the largest energies
=
Fig. 8. A block-sparse vector  over I fd ; . . . ; d g. The gray areas repre- (or norms) being preserved; we will show another example
sent ten nonzero entries that occupy two blocks. in greater detail later in this section.
For the sparse sum of subspaces setting, it is possible to for-
mulate optimization-based algorithms for signal recovery. A
The block-sparse model we present here has also been studied convex recovery method can be obtained by minimizing the sum
in the statistical literature, where the objective is often quite of the energy of the blocks . To write down the problem ex-
different. Examples include group selection consistency [136], plicitly, we define the mixed norm as
[137], asymptotic prediction properties [137], [138], and block
sparsity for logistic regression [139]. Block sparsity models are (48)
also interesting in their own right (namely, not only as an equiva-
lence with an underlying union), and appear naturally in several
problems. Examples include DNA microarray analysis [140], We may then recover by solving [17], [134], [136], [137]
[141], equalization of sparse communication channels [105],
and source localization [104]. subject to (49)
We now introduce the adaptation of coherence to sparse sums
of subspaces: the block-coherence of a matrix is defined as The optimization constraints can be relaxed to address the case
of noisy measurements, in a way similar to the standard BPIC
(44) algorithm. Generalizations of greedy algorithms to the block
sparse setting have been developed in [101] and [120].
with denoting the spectral norm of the matrix . Here 3) Recovery Guarantees: Many recovery methods for the
is represented as a concatenation of column-blocks of size FUS model inherit the guarantees of their standard counterparts.
: Our first example deals with the model-based CoSaMP algo-
rithm. Since CoSaMP requires RIP of order , here we must
(45) rely on enlarged unions of subspaces.
Definition 6: For an integer and a FUS model , denote
the -sum of as the sum of subspaces
When , as expected, . More generally,
. (50)
While quantifies global properties of the matrix ,
local properties are characterized by the subcoherence of , de-
fined as following the notation of (39), which contains all sums of
signals belonging in .
(46) We can then pose the following guarantee.
Theorem 23 [20]: Let and let be a
We define for . In addition, if the columns of set of noisy CS measurements. If has the -RIP with
are orthogonal for each , then . , then the signal estimate obtained from iteration of
2) Recovery Algorithms: Like in the MMV setting, it is the model-based CoSaMP algorithm satisfies
possible to extend standard sparsity-based signal recovery
algorithms to the FUS model. For example, greedy algorithms (51)
may be modified easily by changing the thresholding
One can also show that under an additional condition on the
(which finds the best approximation of in the union of sub-
algorithm is stable to signal mismodeling [20]. Similar guaran-
spaces ) to a structured sparse approximation step:
tees exist for the model-based IHT algorithm [20].
(47) Guarantees are also available for the optimization-based ap-
proach used in the sparse sum of subspaces setting.
For example, the CoSaMP algorithm (see Algorithm 2) is modi- Theorem 24 [17]: Let and let be
fied according to the FUS model [20] by changing the following a set of noisy CS measurements, with . If has the
two steps: -RIP with , then the signal estimate
1) prune residual: ; obtained from (49) with relaxed constraints
2) prune signal: . satisfies
A similar change can be made to the IHT algorithm (15) to ob-
tain a model-based IHT variant: (52)

where is defined in (47), and is a sparse sum of sub-


spaces.
Structured sparse approximation algorithms of the form (47) are Finally, we point out that recovery guarantees can also be
feasible and computationally efficient for a variety of structured obtained based on block coherence.
DUARTE AND ELDAR: STRUCTURED COMPRESSED SENSING: FROM THEORY TO APPLICATIONS 4071

Fig. 9. The class of tree-sparse signals is an example of a FUS that enforces structure in the sparse signal support. (a) Example of a tree-structured sparse support
for a 1-D piecewise smooth signals wavelet coefficients (taken from [20]). (b) Example recovery of the N = 5122512 = 262 144-pixel Peppers test image from
M= 40 000 random measurements (15%) using standard CoSaMP ( SNR = )
17.45 dB . (c) Example recovery of the same image from the same measurements
using model-based CoSaMP for the tree-structured FUS model (SNR = )22.6 dB . In both cases, the Daubechies-8 wavelet basis was used as the sparsity/com-
pressibility transform.

Theorem 25 [120]: Both the greedy methods and the optimization-based approach of (49) recover a block-sparse vector from measurements if the block-coherence satisfies

(53)

In the special case in which the columns of are orthonormal for each , we have and therefore the recovery condition becomes . Comparing to Theorem 4 for recovery of a conventional -sparse signal, we can see that, thanks to , making explicit use of block-sparsity leads to guaranteed recovery for a potentially higher sparsity level. These results can also be extended to both adversarial and random additive noise in the measurements [145].

4) Applications: A particularly interesting example of structured sparse supports corresponds to tree-structured supports [20]. For signals that are smooth or piecewise smooth, including natural images, sufficiently smooth wavelet bases provide sparse or compressible representations. Additionally, because each wavelet basis element acts as a discontinuity detector in a local region of the signal at a certain scale, it is possible to link together wavelet basis elements corresponding to a given location at neighboring scales, forming what are known as branches that connect at the coarsest wavelet scale. The corresponding full graph is known as a wavelet tree. For locations where there exist discontinuities, the corresponding branches of wavelet coefficients tend to be large, forming a connected subtree inside the wavelet tree. Thus, the restriction of includes only the subspaces corresponding to this type of structure in the signal's support. Fig. 9(a) shows an example of a tree-sparse 1-D signal support for a wavelet coefficient vector. Since the number of possible supports containing this structure is limited to for a constant , we obtain that random measurements are needed to recover these signals using model-based recovery algorithms. Additionally, there exist approximation algorithms to implement for this FUS model based on both greedy and optimization-based approaches; see [20] for more details. Fig. 9(b) shows an example of improved signal recovery from random measurements leveraging the FUS model.

In our development so far, we did not consider any structure within the subspaces comprising the unions. In certain applications it may be beneficial to add internal structure. For example, the coefficient vectors may themselves be sparse. Such scenarios can be accounted for by adding an penalty in (49) on the individual blocks [146], an approach that is dubbed C-HiLasso in [147]. This allows one to combine the sparsity-inducing property of optimization at the individual feature level with the block-sparsity property of (49) on the group level, obtaining a hierarchically structured sparsity pattern. An example FUS model featuring sparse sums of subspaces that exploits the additional structure described above is shown in Fig. 10. This example is based on identification of digits; a separate subspace is trained for each digit 0, ..., 9, and the task addressed is separation of a mixture of digits from subsampled information. We collect bases that span each of the 10 subspaces into a dictionary , and apply the FUS model where the subspaces considered correspond to mixtures of pairs of digits. The FUS model allows for successful source identification and separation.

VI. STRUCTURE IN INFINITE-DIMENSIONAL MODELS

One of the prime goals of CS is to allow for reduced-rate sampling of analog signals. In fact, many of the original papers in the field state this as a motivating drive for the theory of CS. In this section we focus on union models that include a degree of infiniteness: This can come into play either by allowing for infinitely many subspaces, by letting the subspaces have infinite dimension, or both. As we will show, the resulting models may be used to describe a broad class of structured continuous-time signals. We will then demonstrate how such priors can be translated into concrete hardware solutions that allow sampling and recovery of analog signals at rates far below that dictated by Nyquist.

The approach we present here to reduced-rate sampling is based on viewing analog signals in unions of subspaces, and is therefore fundamentally different than previous attempts to

Fig. 10. Example of recovered digits (3 and 5) from a mixture with 60% of missing components. From left to right: noiseless mixture, observed mixture with missing pixels highlighted in red, recovered digits 3 and 5, and the active set recovered for 180 different mixtures using the C-HiLasso and (13), respectively. The last two figures show the active sets of the recovered coefficient vectors as a matrix (one column per mixture), where black dots indicate nonzero coefficients. The coefficients corresponding to the subspace bases for digits 3 and 5 are marked as pink bands. Notice that C-HiLasso efficiently exploits the FUS model, succeeding in recovering the correct active groups in all the samples. The standard approach (13), which lacks this stronger signal model, is clearly not capable of doing so, and the active sets spread over all the groups (taken from [147]).

treat similar problems [42], [84], [104], [148], [149]. The latter typically rely on discretization of the analog input and are often not concerned with the actual hardware. Furthermore, in some cases the reduced rate analog stage results in high rate DSP. In contrast, in all of the examples below we treat the analog formulation directly, the DSP can be performed in real time at the low rate, and the analog front end can be implemented in hardware using standard analog design components such as modulators, low rate ADCs, and lowpass filters (LPFs). A more detailed discussion on the comparison to discretized approaches can be found in [10] and [150].

In our review, we consider three main cases:
1) finite unions of infinite dimensional spaces;
2) infinite unions of finite dimensional spaces;
3) infinite unions of infinite dimensional spaces.
In each one of the three settings above there is an element that can take on infinite values. We present general theory and results behind each of these cases, and focus in additional detail on a representative example application for each class.

Before describing the three cases, we first briefly introduce the notion of sampling in shift-invariant (SI) subspaces, which plays a key role in the development of standard (subspace) sampling theory [135], [151]. We then discuss how to incorporate structure into SI settings, leading to the union classes outlined above.

A. Shift-Invariant Spaces for Analog Signals

An important signal class in sampling theory is the class of signals in SI spaces [151]-[154]. Such signals are characterized by a set of generators where in principle can be finite or infinite (as is the case in Gabor or wavelet expansions of ). Here, we focus on the case in which is finite. Any signal in such a SI space can be written as

(54)

for some set of sequences and period . This model encompasses many signals used in communication and signal processing including bandlimited functions, splines [151], multiband signals [108], [109], [155], [156], and pulse amplitude modulation signals.

The subspace of signals described by (54) has infinite dimension, since every signal is associated with infinitely many coefficients . Any such signal can be recovered from samples at a rate of ; one possible sampling paradigm at the minimal rate is given in Fig. 11. Here, is filtered with a bank of filters, each with impulse response , which can be almost arbitrary, and the outputs are uniformly sampled with period . Denote by a vector collecting the frequency responses of , . The signal is then recovered by first processing the samples with a filter bank with frequency response , which depends on the sampling filters and the generators (see [18] for details). In this way we obtain the vectors

(55)

containing the frequency responses of the sequences , . Each output sequence is then modulated by a periodic impulse train with period , followed by filtering with the corresponding analog filter .

In the ensuing subsections we consider settings in which further structure is incorporated into the generic SI model (54). In Sections VI-B and VI-D, we treat signals of the form (54) involving a small number of generators, chosen from a finite or infinite set, respectively, while in Section VI-C, we consider a finite-dimensional counterpart of (54) in which the generators are chosen from an infinite set. All of these examples lead to union of subspaces models for analog signals. Our goal is to exploit the available structure in order to reduce the sampling rate.

Before presenting the more detailed applications we point out that in all the examples below the philosophy is similar: we develop an analog sensing stage that consists of simple hardware devices designed to spread out (alias) the signal prior to sampling, in such a way that the samples contain energy from all subspace components. The first step in the digital reconstruction stage identifies the underlying subspace structure. Recovery is then performed within a subspace once the parameters defining the subspace are determined. The difference between the examples is in how the aliasing is obtained and in the digital recovery step that identifies the subspace. This framework has been dubbed Xampling in [9] and [10], which combines CS and sampling, emphasizing that this is an analog counterpart to discrete CS theory and methods.
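A signal in a single-generator SI space of the form (54) can be synthesized numerically. The following sketch is illustrative (the function names and the triangular generator are our choices): it evaluates x(t) = sum_n d[n] g(t - nT) on a dense grid; a sum over several (d_l, g_l) pairs would give the N-generator case.

```python
import numpy as np

def si_signal(coeffs, generator, T, t):
    """Synthesize x(t) = sum_n d[n] * g(t - n*T) for one generator g.
    `generator` is a callable, `coeffs` a finite coefficient sequence."""
    x = np.zeros_like(t)
    for n, d in enumerate(coeffs):
        x += d * generator(t - n * T)
    return x

# Example: a triangular (degree-1 B-spline) generator with support 2T.
T = 1.0
tri = lambda s: np.maximum(1 - np.abs(s / T), 0.0)
t = np.linspace(0, 5, 501)
x = si_signal([1.0, -2.0, 0.5], tri, T, t)
```

Since this generator is interpolating on the grid nT, the synthesized signal satisfies x(nT) = d[n], which makes the example easy to check.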

B. Finite Union of Infinite-Dimensional Subspaces

In this first case, we follow the model (38), (39) where is composed of a finite number of subspaces, and each subspace has infinite dimension (i.e., for ).

1) Analog Signal Model: To incorporate structure into (54) we proceed in two ways (see Section VI-D for the complement). In the first, we assume that only of the generators are active, leading to the model

(56)

where the notation means a union (or sum) over at most elements. If the active generators are known, then it suffices to sample at a rate of corresponding to uniform samples with period at the output of appropriate filters. However, a more difficult question is whether the rate can be reduced if we know that only of the generators are active but do not know in advance which ones. This is a special case of the model (39) where now each of the subspaces is a single-generator SI subspace spanned by . For this model, it is possible to reduce the sampling rate to as low as [18]. Such rate reduction is achieved by using a bank of appropriate sampling filters that replaces the filters in Fig. 11, followed by postprocessing via subspace reduction and solving an MMV problem (cf. Section V-A). We now describe the main ideas underlying this approach.

2) Compressive Signal Acquisition Scheme: A block diagram of the basic architecture is very similar to the one given in Fig. 11. We simply change the sampling filters to sampling filters and replace the filter bank with the CTF block with real-time recovery (cf. Section V-A-2). The design of the filters , relies on two main ingredients:
1) a matrix chosen such that it solves a discrete MMV problem with sparsity ;
2) a set of functions which can be used to sample and reconstruct the entire set of generators according to the Nyquist-rate scheme of Fig. 11.
Based on these ingredients, the compressive sampling filters consist of linear combinations of , with coefficients that depend on the matrix through [18]

(57)

where denotes the conjugate, concatenate the frequency responses of and , respectively, and is an arbitrary invertible matrix representing the discrete-time Fourier transform (DTFT) of a bank of filters. Since this matrix can be chosen arbitrarily, it allows for freedom in selecting the sampling filters.

3) Reconstruction Algorithm: Directly manipulating the expression for the sampled sequences leads to a simple relationship between the measurements and the unknown sequences [18]:

(58)

where the vector collects the measurements at and the vector collects the unknown generator coefficients for time period . Since only out of the sequences are identically nonzero by assumption, the vectors are jointly -sparse. Equation (58) is valid when for all ; otherwise, the samples must first be pre-filtered with the inverse filter bank to obtain (58). Therefore, by properly choosing the sampling filters, we have reduced the recovery problem to an IMV problem, as studied in Section V-A-2. To recover , we therefore rely on the CTF and then reconstruct by interpolating with their generators .

4) Recovery Guarantees: From Theorem 21, it follows that filters are needed in order to ensure recovery for all possible input signals. In practice, since polynomial-time algorithms will be used to solve the equivalent MMV, we will need to increase slightly beyond .

Theorem 26 [18]: Consider a signal of the form (56). Let , be a set of sampling filters defined by (57), where is a size matrix. Then, the signal can be recovered exactly using the scheme of Fig. 11 as long as allows the solution of an MMV system of size with sparsity .

Since our approach to recovering relies on solving an MMV system, the quality of the reconstruction depends directly on the properties of the underlying MMV problem. Therefore, results regarding noise, mismodeling, and suboptimal recovery using polynomial techniques developed in the MMV context can immediately be adapted to this setting.

5) Example Application: We now describe an application of the general analog CS architecture of Fig. 11: sampling of multiband signals at sub-Nyquist rates. We also expand on practical alternatives to this system, which reduce the hardware complexity.

The class of multiband signals models a scenario in which consists of several concurrent radio-frequency (RF) transmissions. A receiver that intercepts a multiband sees the typical spectral support that is depicted in Fig. 12. We assume that the signal contains at most (symmetric) frequency bands with carriers , each of maximal width . The carriers are limited to a maximal frequency of . When the carrier frequencies are fixed, the resulting signal model can be described as a subspace, and standard demodulation techniques may be used to sample each of the bands at low rate. A more challenging scenario is when the carriers are unknown. This situation arises, for example, in spectrum sensing for mobile cognitive radio (CR) receivers [157], which aim at utilizing unused frequency regions on an opportunistic basis.

The Nyquist rate associated with is , which can be quite high in modern applications, on the order of several GHz. On the other hand, by exploiting the multiband structure, it can be shown that a lower bound on the sampling rate with unknown carriers is , as incorporated in the following theorem.

Fig. 11. Sampling and reconstruction in shift-invariant spaces. A compressive signal acquisition scheme can be obtained by reducing the number of sampling filters from N to p < N and replacing the filter bank with a CTF block with real-time recovery (taken from [18]).
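The digital recovery stage of this architecture reduces, after the CTF, to an MMV problem. As a minimal stand-in for the MMV solvers referenced in the text, the sketch below implements simultaneous OMP, a greedy method that selects atoms using the joint correlation across all measurement columns; it is our own illustration, not the specific algorithm of [18]:

```python
import numpy as np

def somp(A, Y, k):
    """Simultaneous OMP sketch for the MMV problem Y = A X, where the
    columns of X share a common support of size k."""
    n = A.shape[1]
    support, R = [], Y.copy()
    for _ in range(k):
        # pick the atom most correlated with all residual columns jointly
        corr = np.linalg.norm(A.conj().T @ R, axis=1)
        for s in support:
            corr[s] = 0.0
        support.append(int(np.argmax(corr)))
        Xs, *_ = np.linalg.lstsq(A[:, support], Y, rcond=None)
        R = Y - A[:, support] @ Xs
    X = np.zeros((n, Y.shape[1]))
    X[support] = Xs
    return X, sorted(support)
```

Because all columns share one support, the joint correlation averages out noise across snapshots, which is exactly why the MMV formulation can outperform solving each column separately.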

We refer to a sampling scheme that does not exploit the carrier frequencies as blind.

Theorem 27 [108]: Consider a multiband model with maximal frequency and total support . Let denote the sampling density of the blind sampling set.⁵ Then, .

In our case the support is equal to . Therefore, as long as , we can potentially reduce the sampling rate below Nyquist.

Our goal now is to use the union of subspaces framework in order to develop a sampling scheme which achieves the lower bound of Theorem 27. To describe a multiband signal as a union of subspaces, we divide the Nyquist range into consecutive, nonoverlapping slices of individual widths as depicted in Fig. 12, such that . Each spectrum slice represents a single bandpass subspace . By choosing , we ensure that no more than spectrum slices are active, i.e., contain signal energy [18]. The conceptual division to spectrum slices does not restrict the band positions; a single band can split between adjacent slices.

One way to realize the sampling scheme of Fig. 11 is through periodic nonuniform sampling (PNS) [108]. This strategy corresponds to choosing

(59)

where is the Nyquist period, and using a sampling period of with . Here are integers which select part of the uniform sampling grid, resulting in uniform sequences

(60)

Fig. 12. Spectrum slices of x(t) are overlayed in the spectrum of the output sequences y_i[n]. In the example, channels i and i' realize different linear combinations of the spectrum slices centered around lf_p, l̃f_p, and l̄f_p. For simplicity, the aliasing of the negative frequencies is not drawn (taken from [9]).

The IMV model (58) that results from PNS has sequences representing the contents of the th bandpass subspace of the relevant spectrum slice [108]. The sensing matrix is a partial discrete Fourier transform (DFT), obtained by taking only the row indices from the full DFT matrix.

We note that PNS was utilized for multiband sampling already in classic studies, though the traditional goal was to approach a rate of samples/s. This rate is optimal according to the Landau theorem [158], though achieving it for all input signals is possible only when the spectral support is known and fixed. When the carrier frequencies are unknown, the optimal rate is [108]. Indeed, [155] and [159] utilized knowledge of the band positions to design a PNS grid and the required interpolation filters for reconstruction. The approaches in [156], [160] were semi-blind: a sampler design independent of band positions combined with the reconstruction algorithm of [155], which requires exact support knowledge. Other techniques targeted the rate by imposing alternative constraints on the input spectrum [111], [122]. Here, we demonstrate how analog CS tools [18] and [113] can lead to a fully blind sampling system of multiband inputs with unknown spectra at the appropriate optimal rate [108]. A more thorough discussion in [108] studies the differences between the analog CS method presented here, based on [18], [108], [113], and earlier approaches.

⁵For a formal definition of these parameters, see [108]. Intuitively, D(R) describes the average sampling rate, where R = {r_i} are the sampling points.
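The partial-DFT structure of the PNS sensing matrix can be written down directly. In the sketch below (sign and scaling conventions are our own illustration), keeping the samples at offsets c_i within each length-L block of the Nyquist grid retains the corresponding rows of the L-point DFT matrix:

```python
import numpy as np

def pns_sensing_matrix(L, offsets):
    """Sketch of the sensing matrix induced by periodic nonuniform sampling:
    row c_i of the L-point DFT matrix is kept for each retained grid offset
    c_i, giving a p x L partial DFT."""
    return np.exp(-2j * np.pi * np.outer(offsets, np.arange(L)) / L)

A = pns_sensing_matrix(12, [0, 3, 4, 7])  # p = 4 kept offsets, L = 12 slices
```

The recovery problem then couples this p x L partial DFT with the jointly sparse spectrum-slice sequences, i.e., exactly the IMV model (58).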

6) Example Hardware: As shown in [108], the PNS strategy can approach the minimal rate of Theorem 27. However, it requires pointwise samplers that are capable of accommodating the wide bandwidth of the input. This necessitates a high bandwidth track and hold device, which is difficult to build at high frequencies. An alternative architecture, referred to as the modulated wideband converter (MWC), was developed in [109]. The MWC replaces the need for a high bandwidth track and hold device by using high bandwidth modulators, which are easier to manufacture. Indeed, off-the-shelf modulators at rates of tens of gigahertz are readily available.

Fig. 13. Block diagram of the modulated wideband converter. The input passes through p parallel branches, where it is mixed with a set of periodic functions p_i(t), lowpass filtered, and sampled at a low rate (taken from [9]).

The MWC combines the spectrum slices according to the scheme depicted in Fig. 13. Its design is modular so that when the carrier frequencies are known the same receiver can be used with fewer channels or a lower sampling rate. Furthermore, by increasing the number of channels or the rate on each channel, the same realization can be used for sampling full band signals at the Nyquist rate.

The MWC consists of an analog front-end with channels. In the th channel, the input signal is multiplied by a periodic waveform with period , lowpass filtered by a generic filter with impulse response with cutoff , and then sampled at rate . The mixing operation scrambles the spectrum of , so that, as before, the spectrum is conceptually divided into slices of width , and a weighted sum of these slices is shifted to the origin [109]. The lowpass filter transfers only the narrowband frequencies up to from that mixture to the output sequence . The output has the aliasing pattern illustrated in Fig. 12. Sensing with the MWC leads to a matrix whose entries are the Fourier expansion coefficients of the periodic sequences .

The MWC can operate with as few as channels and with a sampling rate on each channel, so that it approaches the minimal rate of . Advanced configurations enable additional hardware savings by collapsing the number of branches by a factor of at the expense of increasing the sampling rate of each channel by the same factor [109]. The choice of periodic functions is flexible: The highest Dirac frequency needs to exceed . In principle, any periodic function with high-speed transitions within the period can satisfy this requirement. One possible choice for is a sign-alternating function, with sign intervals within the period [109], [161]. Imperfect sign alternations are allowed as long as periodicity is maintained [9]. This property is crucial since precise sign alternations at high speeds are extremely difficult to maintain, whereas simple hardware wirings ensure that for every . The waveforms need low mutual correlation in order to capture different mixtures of the spectrum. Popular binary patterns, e.g., the Gold or Kasami sequences, are especially suitable for the MWC [161]. Another important practical design aspect is that the lowpass filter does not have to be ideal. A nonflat frequency response can be compensated for in the digital domain, using the algorithm developed in [162].

The MWC has been implemented as a board-level hardware prototype [9].⁶ The hardware specifications cover inputs with a 2-GHz Nyquist rate with spectrum occupation up to 120 MHz. The total sampling rate is 280 MHz, far below the 2-GHz Nyquist rate. In order to save analog components, the hardware realization incorporates the advanced configuration of the MWC [109] with a collapsing factor . In addition, a single shift-register provides a basic periodic pattern, from which periodic waveforms are derived using delays, that is, by tapping different locations of the register. A nice feature of the recovery stage is that it interfaces seamlessly with standard DSPs by providing (samples of) the narrowband information signals. This capability is provided by a digital algorithm that is developed in [10].⁷

The MWC board is a first hardware example of the use of ideas borrowed from CS for sub-Nyquist sampling and low-rate recovery of wideband signals, where the sampling rate is directly proportional to the actual bandwidth occupation and not the highest frequency. Existing implementations of the random demodulator (RD) (cf. Section IV-B-3) recover signals at effective sampling rates below 1 MHz, falling outside of the class of wideband samplers. Additionally, the signal representations used by the RD have size proportional to the Nyquist frequency, leading to recovery problems that are much larger than those posed by the MWC. See [9] and [10] for additional information on the similarities and differences between the MWC, the RD, and other comparable architectures.

⁶A video of experiments and additional documentation for the MWC hardware are available at http://webee.technion.ac.il/Sites/People/YoninaEldar/hardware.html. A graphical package demonstrating the MWC numerically is available at http://webee.technion.ac.il/Sites/People/YoninaEldar/software_det3.html.
⁷The algorithm is available online at http://webee.technion.ac.il/Sites/People/YoninaEldar/software_det4.html.

C. Infinite Union of Finite-Dimensional Subspaces

The second union class we consider is when is composed of an infinite number of subspaces, and each subspace has finite dimension.

1) Analog Signal Model: As we have seen in Section VI-A, the SI model (54) is a convenient way to describe analog signals in infinite-dimensional spaces. We can use a similar approach to describe analog signals that lie within finite-dimensional spaces by restricting the number of unknown gains to be finite. In order to incorporate structure into this model, we assume that each generator has an unknown parameter associated with it, which can take on values in a continuous interval, resulting in the model

(61)

Each possible choice of the set leads to a different -dimensional subspace of signals , spanned by the functions . Since can take on any value in a given interval, the model (61) corresponds to an infinite union of finite dimensional subspaces (i.e., ).

An important example of (61) is when for some unknown time delay , leading to a stream of pulses

(62)

Here, is a known pulse shape and , , are unknown delays and amplitudes. This model was first introduced and studied by Vetterli et al. [6]-[8] as a special case of signals having a finite number of degrees of freedom per unit time, termed finite rate of innovation (FRI) signals.

Our goal is to sample and reconstruct it from a minimal number of samples. The primary interest is in pulses which have small time-support, and therefore the required Nyquist rate would be very high. However, since the pulse shape is known, the signal has only degrees of freedom, and therefore, we expect the minimal number of samples to be , much lower than the number of samples resulting from Nyquist-rate sampling.

A simpler version of the problem is when the signal of (62) is repeated periodically, leading to the model

(63)

where is the known period. This periodic setup is easier to treat because we can exploit the properties of the Fourier series representation of due to the periodicity. The dimensionality and number of subspaces included in the model (38) remain unchanged.

2) Compressive Signal Acquisition: To date, there are no general acquisition methods for signals of the form (61). Instead, we focus on the special case of (63).

Our sampling scheme follows the ideas of [6]-[8] and consists of a filter followed by uniform sampling of the output with period , where is the number of samples in one period, and . The resulting samples can be written as inner products . The following theorem establishes properties of the filter that allow recovery from samples.

Theorem 28 [163]: Consider the -periodic stream of pulses of order given by (63). Choose a set of consecutive indices for which , where is the Fourier transform of the pulse . Then the samples , for , uniquely determine the signal as long as, for any satisfying ,

nonzero (64)
arbitrary otherwise.

Theorem 28 was initially focused on lowpass filters in [6], and was later extended to arbitrary filters in [163].

To see how we can recover from the samples of Theorem 28, we note that the Fourier series coefficients of the periodic pulse stream are given by [6]

(65)

where . Given , the problem of retrieving and in (65) is a standard problem in array processing [6], [164] and can be solved using methods developed in that context such as the matrix pencil [165], subspace-based estimators [166], [167], and the annihilating filter [8]. These methods require Fourier coefficients to determine and . In the next subsection, we show that the vector of Fourier coefficients can be computed from the samples of Theorem 28.

Since the LPF has infinite time support, the approach of (64) cannot work with time-limited signals, such as those of the form (62). A class of filters satisfying (64) that have finite time support are Sum of Sincs (SoS) [163], which are given in the Fourier domain by

(66)

where , . Switching to the time domain,

(67)

which is clearly a time-compact filter with support . For the special case in which and ,

where denotes the Dirichlet kernel.

Alternative approaches to sample finite pulse streams of the form (63) rely on the use of splines [7]; this enables obtaining moments of the signal rather than its Fourier coefficients. The moments are then processed in a similar fashion (see the next subsection for details). However, this approach is unstable for high values of [7]. In contrast, the SoS class can be used for stable reconstruction even for very high values of , e.g., .

Multichannel schemes can also be used to sample pulse streams. This approach was first considered for Dirac streams, where a successive chain of integrators allows obtaining moments of the signal [168]. Unfortunately, the method is highly sensitive to noise.
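The annihilating filter method referenced above admits a compact numerical sketch. Assuming Fourier coefficients of a periodic Dirac stream, x̂[k] = sum_l a_l exp(-j2πk t_l/τ) (notation ours), the delays are read off the roots of the annihilating polynomial:

```python
import numpy as np

def annihilating_filter_delays(xhat, L, tau):
    """Recover L delays t_l from 2L contiguous Fourier coefficients
    xhat[k] = sum_l a_l * exp(-2j*pi*k*t_l/tau), k = 0, ..., 2L-1.
    The annihilating filter h (with h[0] = 1) satisfies
    sum_j h[j]*xhat[k-j] = 0 for k = L, ..., 2L-1, and its roots are
    u_l = exp(-2j*pi*t_l/tau)."""
    # Toeplitz annihilation system; fix h[0] = 1 and solve for the rest.
    T = np.array([[xhat[L + i - j] for j in range(L + 1)] for i in range(L)])
    h = np.concatenate(([1.0], np.linalg.solve(T[:, 1:], -T[:, 0])))
    roots = np.roots(h)
    return np.sort(np.mod(-np.angle(roots) * tau / (2 * np.pi), tau))
```

In practice one would use more than 2L coefficients and a total least-squares (SVD-based) solve for robustness to noise; the exact-data version above shows only the core algebra.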

Thus, we only need consecutive values of to deter-


mine the annihilating filter. Once the filter is found, the values
are retrieved from the zeros of the -transform in (68). Fi-
nally, the Fourier coefficients are computed using (65). For
example, if we have the coefficients , ,
then (65) may be written as

.. .. .. .. .. ..
Fig. 14. Extended sampling scheme using modulating waveforms (taken from . . . . . .
[170]).

scheme consisting of two channels, each with an RC circuit,


was presented in [169] for the special case where there is no Since this Vandermonde matrix is left-invertible, the values
more than one Dirac per sampling period. A more general mul- can be computed by matrix inversion.
tichannel architecture that can treat a broader class of pulses, Reconstruction results for the sampling scheme based on the
while being much more stable, is depicted in Fig. 14 [170]. SoS filter with are depicted in Fig. 15. The original signal
The system is very similar to the MWC presented in the pre- consists of Gaussian pulses, and samples were
vious section. By correct choice of the mixing coefficients, the used for reconstruction. The reconstruction is exact to numerical
Fourier coefficients may be extracted from the samples precision. A comparison of the performance of various methods
by a simple matrix inversion.

3) Recovery Algorithms: In both the single-channel and multichannel approaches, recovery of the unknown delays and amplitudes proceeds in two steps. First, the vector of samples is related to the vector of Fourier coefficients through a mixing matrix, where N denotes the number of samples. When using the SoS approach, the mixing matrix is the product of a diagonal matrix, whose diagonal elements are determined by the filter coefficients, and a Vandermonde matrix whose entries are powers of a complex exponential that depends on the sampling period. For the multichannel architecture of Fig. 14, the mixing matrix consists of the modulation coefficients. The Fourier coefficient vector can then be obtained from the samples by inverting the mixing matrix.

The unknown parameters {t_m, b_m}_{m=1}^{L} are recovered from the Fourier coefficients, which in the standard FRI setting take the form x[k] = sum_{m=1}^{L} b_m u_m^k with u_m = e^{-j2*pi*t_m/tau}, using, e.g., the annihilating filter method. The annihilating filter A(z) is defined by its z-transform

A(z) = sum_{l=0}^{L} A[l] z^{-l} = prod_{m=1}^{L} (1 - u_m z^{-1}).    (68)

That is, the roots of A(z) equal the values u_m, through which the delays t_m can be found. It then follows that

(A * x)[k] = sum_{l=0}^{L} A[l] x[k - l] = sum_{m=1}^{L} b_m u_m^k A(u_m) = 0,    (69)

where the last equality is due to A(u_m) = 0. Assuming without loss of generality that A[0] = 1, the identity in (69) can be written in matrix/vector form as a Toeplitz linear system in the remaining filter coefficients A[1], ..., A[L].

Performance in the presence of noise is depicted in Fig. 15 for a finite stream consisting of three and five pulses. The pulse shape is a Dirac delta, and white Gaussian noise is added to the samples at the proper level in order to reach the desired SNR for all methods. All approaches operate using the same number of samples.

4) Applications: As an example application of FRI, we consider multiple image registration for superresolution imaging. A superresolution algorithm aims at creating a single detailed image, called a super-resolved (SR) image, from a set of low-resolution input images of the same scene. If different images of the same scene are taken such that their relative shifts are not integer multiples of the pixel size, then subpixel information exists among the set. This makes it possible to recover the scene at a higher resolution once the images are properly registered. Image registration involves any group of transformations that removes the disparity between two low-resolution (LR) images. This is followed by image fusion, which blends the properly aligned LR images into a higher resolution output, possibly removing blur and noise introduced by the system [171]. The registration step is crucial in order to obtain a good-quality SR image. The theory of FRI can be extended to provide superresolution imaging. The key idea of this approach is that, using a proper model for the point spread function (PSF) of the scene acquisition system, it is possible to retrieve the underlying continuous geometric moments of the irradiance light-field. From this information, and assuming the disparity between any two images can be characterized by a global affine transformation, the set of images may be registered. The parameters obtained via FRI correspond to the shift vectors that register the different images before image fusion. Fig. 16 shows a real-world superresolution example in which 40 low-resolution images allow an improvement in image resolution by a factor of 8.

A second example application of the FRI model is ultrasonic imaging. In this imaging modality, an ultrasonic pulse is transmitted into a tissue, e.g., the heart, and a map of the underlying tissues is created by locating the echoes of the pulse.
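The two-step recovery outlined in 3) Recovery Algorithms above (solve the Toeplitz system from (69) for the annihilating filter, root it as in (68), then solve a Vandermonde system for the amplitudes) can be sketched numerically. This is a minimal sketch under the standard FRI convention u_m = exp(-2j*pi*t_m/tau) with 2L Fourier coefficients; the function name and symbols are ours, not necessarily the survey's exact notation.

```python
import numpy as np

def annihilating_filter(x, L, tau):
    """Recover L delays/amplitudes from 2L Fourier coefficients
    x[k] = sum_m b_m * u_m**k, u_m = exp(-2j*pi*t_m/tau) (assumed model)."""
    x = np.asarray(x, dtype=complex)
    # Toeplitz system from (69): with A[0] = 1,
    # sum_{l=1}^{L} A[l] x[k-l] = -x[k] for k = L, ..., 2L-1
    T = np.array([[x[k - l] for l in range(1, L + 1)] for k in range(L, 2 * L)])
    a = np.linalg.solve(T, -x[L:2 * L])
    # the roots of the annihilating filter (68) are the values u_m
    u = np.roots(np.concatenate(([1.0], a)))
    t = np.sort(np.mod(-np.angle(u) * tau / (2 * np.pi), tau))
    # amplitudes from the Vandermonde system x[k] = sum_m b_m u_m**k
    V = np.vander(np.exp(-2j * np.pi * t / tau), N=2 * L, increasing=True).T
    b = np.linalg.lstsq(V, x, rcond=None)[0]
    return t, b.real
```

With exact (noiseless) coefficients the delays and amplitudes are recovered to numerical precision; in noise, the Toeplitz system is typically solved in a least-squares or denoised fashion instead.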
Fig. 15. Performance comparison of finite pulse stream recovery using Gaussian [6], B-spline, E-spline [7], and SoS sampling kernels. (a) Reconstructed signal using SoS filters versus the original one; the reconstruction is exact to numerical precision. (b) L = 3 Dirac pulses are present. (c) L = 5 pulses (taken from [163]).
Correct location of the tissues and their edges is crucial for medical diagnosis. Current technology uses high-rate sampling and processing in order to construct the image, demanding high computational complexity. Noting that the received signal is a stream of delayed and weighted versions of the known transmitted pulse shape, FRI sampling schemes can be exploited in order to reduce the sampling rate, and the subsequent processing rates, by several orders of magnitude while still locating the echoes. The received ultrasonic signal is modeled as a finite FRI problem with a Gaussian pulse shape.

In Fig. 17, we consider a signal obtained using a phantom consisting of uniformly spaced pins, mimicking point scatterers, which is scanned by GE Healthcare's Vivid-i portable ultrasound imaging system. The data recorded by a single element in the probe is modeled as a 1D stream of pulses. The recorded signal is depicted in Fig. 17(a). The reconstruction results using the SoS filter are depicted in Fig. 17(b)-(c). The algorithm looked for the strongest echoes, using N = 17 and N = 33 samples. In both simulations, the estimation error in the location of the pulses is around 0.1 mm. These ideas have also been extended recently to allow for 2-D imaging with multiple receive elements [172].

D. Infinite Union of Infinite-Dimensional Subspaces

Extending the finite-dimensional model of Section VI-C to the SI setting (54), we now incorporate structure by assuming that each generator has an unknown parameter associated with it, leading to an infinite union of infinite-dimensional spaces. As with its finite counterpart, there is currently no general sampling framework available to treat such signals. Instead, we focus on the case of streams of pulses with unknown delays.

1) Analog Signal Model: Suppose that the input is a stream of pulses of the form (70), where there are no more than L pulses in an interval of length T, and that the pulses do not overlap interval boundaries. In this case, the signal parameters in each interval can be treated separately, using the schemes of the previous section. In particular, we can use the sampling system of Fig. 14, where now the integral is obtained over intervals of length T [170]. This requires obtaining a sample from each of the channels once every T seconds, and using p channels, resulting in samples taken at the minimal possible rate.

We next turn to treat the more complicated scenario in which the pulses may have support larger than T. This setting can no longer be treated by considering independent problems over periods of T. To simplify, we consider the special case in which the time delays in (70) repeat periodically (but not the amplitudes) [19], [150]. As we will show, in this special case efficient sampling and recovery is possible even using a single filter, and without requiring the pulse to be time limited. Under our assumptions, the input signal can be written as

x(t) = sum_{l=1}^{L} sum_{n in Z} a_l[n] h(t - t_l - nT),    (71)

where {t_l}_{l=1}^{L} is a set of unknown time delays contained in the time interval [0, T), {a_l[n]} are arbitrary bounded-energy sequences, and h(t) is a known pulse shape.

2) Compressive Signal Acquisition: We follow a similar approach to that in Section VI-B, which treats a structured SI setting where there are N possible generators. The difference is that in this current case there are infinitely many possibilities. Therefore, we replace the CTF in Fig. 11 with a block that supports this continuity: we will see that the ESPRIT method essentially replaces the CTF block [173].

A sampling and reconstruction scheme for signals of the form (71) is depicted in Fig. 18 [19]. The analog sampling stage is comprised of p parallel sampling channels. In each channel, the input signal is filtered by a band-limited sampling kernel whose frequency support is contained in a finite interval, and is then fed to a uniform sampler operating at a rate of 1/T, thus providing the sampling sequences. Note that just as in the MWC (Section VI-B-6), the sampling filters can be collapsed to a single filter whose output is sampled at p times the rate of a single channel. The role of the sampling kernels is to spread out the energy of the signal in time, prior to low-rate sampling.

3) Recovery Algorithms: To recover the signal from the samples, a properly designed digital filter correction bank, whose frequency response in the DTFT domain is determined by a correction matrix, is applied to the sampling sequences in a manner similar to (55). The correction matrix depends on the choice
Fig. 16. Example of FRI-based image superresolution. 40 images of a target scene were acquired with a digital camera. (a) Example acquired image. (b) Region of interest (128 x 128 pixels) used for superresolution. (c) Superresolved image of size 1024 x 1024 pixels (SR factor = 8); the PSF in this case is modeled by a B-spline of order 7 (scale 1). (d) Superresolved image of size 1024 x 1024 pixels (SR factor = 8); the PSF in this case is modeled by a B-spline of order 3 (scale 2) (taken from [171]).

Fig. 17. Applying the SoS sampling scheme with b = 1 on real ultrasound imaging data. Results are shown versus the original signal, which uses 4160 samples. (a) Recorded ultrasound imaging signal; the data was acquired by GE Healthcare's Vivid-i ultrasound imaging system. Reconstructed signal (b) using N = 17 samples and (c) using N = 33 samples (taken from [163]).
of the sampling kernels and the pulse shape h(t); its entries are defined in (72). After the digital correction stage, it can be shown that the corrected sample vector is related to the unknown amplitude vector by a Vandermonde matrix which depends on the unknown delays [19]. Therefore, we can now exploit known tools taken from the direction of arrival [174] and spectral estimation [164] literature, such as the well-known ESPRIT algorithm [173], to recover the delays {t_l}. Once the delays are determined, additional filtering operations are applied on the samples to recover the sequences a_l[n]. In particular, referring to Fig. 18, one of the correction matrices is a diagonal matrix whose diagonal elements are determined by the recovered delays, and the other is a Vandermonde matrix whose elements are powers of e^{-j2*pi*t_l/T}. The ESPRIT algorithm is summarized for our setting as Algorithm 4, where E_down and E_up denote the submatrices extracted from E_s by deleting its last and first row, respectively.

Fig. 18. Sampling and reconstruction scheme for signals of the form (71).

Algorithm 4: ESPRIT Algorithm

Input: Sample vectors d[n], number of parameters L.
Output: Delays {t_l}_{l=1}^{L}.

R = sum_n d[n] d^H[n]    {construct correlation matrix}
R = U S V^H    {calculate SVD}
E_s = the L singular vectors of R associated with nonzero singular values
Phi = (E_down)^dagger E_up    {compute ESPRIT matrix}
{lambda_l} = eig(Phi)    {obtain eigenvalues}
t_l = -(T/(2*pi)) arg(lambda_l), l = 1, ..., L    {map eigenvalues to delays}
return {t_l}_{l=1}^{L}

4) Recovery Guarantees: The proposed algorithm is guaranteed to recover the unknown delays and amplitudes as long as the number of sampling channels p is large enough [19].

Theorem 29 [19]: The sampling scheme of Fig. 18 is guaranteed to recover any signal of the form (71) as long as p >= 2L - eta, where eta is the dimension of the minimal subspace
Fig. 19. Comparison between the target-detection performance for the case of nine targets (represented by x) in the delay-Doppler space with tau_max = 10 us, nu_max = 10 kHz, W = 1.2 MHz, and T = 0.48 ms. The probing sequence {x_n} corresponds to a random binary (+/-1) sequence with N = 48, the pulse p(t) is designed to have a nearly flat frequency response, and the pulse repetition interval is 10 us. Recovery of the Doppler-delay plane using (a) a union of subspaces approach, (b) a standard matched filter, and (c) a discretized delay-Doppler plane (taken from [150]).
containing the vector set {a[n] : n in Z}. In addition, the filters must be band-limited to the sampling frequency range and must be chosen such that the matrix of (72) is invertible.

The sampling rate resulting from Theorem 29 is no larger than 2L/T since eta >= 0. For certain signals, the rate can be reduced to (L + 1)/T. This is due to the fact that the outputs are processed jointly, similar to the MMV model. Evidently, the minimal sampling rate is not related to the Nyquist rate of the pulse h(t). Therefore, for wideband pulse shapes, the reduction in rate can be quite substantial. As an example, consider the setup in [175], used for characterization of ultra-wideband wireless indoor channels. Under this setup, pulses with a bandwidth of 1 GHz are transmitted at a rate of 2 MHz. Assuming that there are ten significant multipath components, we can reduce the sampling rate down to 40 MHz, compared with the 2-GHz Nyquist rate.

5) Applications: Problems of the form (71) appear in a variety of different settings. For example, the model (71) can describe multipath medium identification problems, which arise in applications such as radar [176], underwater acoustics [177], wireless communications, and more. In this context, pulses with a known shape are transmitted at a constant rate through a multipath medium, which consists of several propagation paths. As a result, the received signal is composed of delayed and weighted replicas of the transmitted pulses. The delays represent the propagation delay of each path, while the sequences a_l[n] describe the time-varying gain coefficient of each multipath component.

Another important application of (71) is in the context of radar. In this example, we translate the rate reduction to increased resolution with a fixed time-bandwidth product (TBP), thus enabling super-resolution radar from low-rate samples. In radar, the goal is to identify the range and velocity of targets. The delay in this case captures the range, while the time-varying coefficients are a result of the Doppler shift related to the target velocity [150]. More specifically, we assume that several targets can have the same delays but possibly different Doppler shifts, and let {t_l} denote the set of distinct delays. For each delay value there are several associated Doppler shift and reflection coefficient values. We also assume that the system is highly underspread, namely nu_max * tau_max << 1, where nu_max and tau_max denote the maximal Doppler shift and delay. To identify the targets we transmit the signal given in (73), where {x_n} is a known N-length probing sequence and p(t) is a known pulse shape. The received signal can then be described in the form (71), where the sequences a_l[n] satisfy (74).

The delays and the sequences a_l[n] can be recovered using the general scheme for time delay recovery. The Doppler shifts and reflection coefficients are then determined from the sequences a_l[n] using standard spectral estimation tools [164]. The targets can be exactly identified as long as the bandwidth W of the transmitted pulse and the length N of the probing sequence are large enough [150]. This leads to a minimal TBP of the input signal that is much lower than that obtained using standard radar processing techniques, such as matched filtering (MF).

An example of the identification of nine closely spaced targets is illustrated in Fig. 19(a). The sampling filter used is a simple LPF. The original and recovered targets are shown on the Doppler-delay plane. Evidently, all the targets were correctly identified. The result obtained by MF, with the same TBP, is shown in Fig. 19(b). Clearly, the compressive method has superior resolution compared with the standard MF in this low-noise setting. Thus, the union of subspaces viewpoint not only offers a reduced-rate sampling method, but also makes it possible to increase the resolution in target identification for a fixed low TBP when the SNR is high enough, which is of paramount importance in many practical radar problems.

Previous approaches for identifying the unknown delays and gains involve sampling the received signal at the Nyquist rate of the pulse [178]-[180]. However, prior knowledge of the pulse shape results in a parametric viewpoint, and we would expect that the rate should be proportional to the number of degrees of freedom, i.e., the number of paths L.
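The two-stage radar recovery just described (delays via ESPRIT as in Algorithm 4, then Doppler shifts from the recovered sequences via spectral estimation) can be sketched as follows. The model D = V(t) A, the Vandermonde convention V[k, m] = exp(-2j*pi*k*t_m/T), and the use of a plain zero-padded FFT for the Doppler stage are simplifying assumptions of this sketch, not the paper's exact construction.

```python
import numpy as np

def esprit_delays(D, L, T):
    """Algorithm 4 (ESPRIT) sketch: D is a p x n matrix of corrected sample
    vectors d[n], assumed to follow d[n] = V(t) a[n] with Vandermonde
    entries V[k, m] = exp(-2j*pi*k*t_m/T)."""
    R = D @ D.conj().T / D.shape[1]              # construct correlation matrix
    U, _, _ = np.linalg.svd(R)                   # calculate SVD
    Es = U[:, :L]                                # signal-subspace singular vectors
    # rotational invariance: delete the last/first row to form E_down/E_up
    Phi = np.linalg.pinv(Es[:-1, :]) @ Es[1:, :]  # compute ESPRIT matrix
    eigs = np.linalg.eigvals(Phi)                # eigenvalues ~ exp(-2j*pi*t_m/T)
    return np.sort(np.mod(-np.angle(eigs) * T / (2 * np.pi), T))

def doppler_shifts(D, t, T, dt, pad=64):
    """Second stage: invert the Vandermonde system to recover the sequences
    a_l[n], then estimate each Doppler shift with a zero-padded FFT (the
    simplest spectral-estimation tool; dt is the pulse repetition interval)."""
    p = D.shape[0]
    V = np.exp(-2j * np.pi * np.outer(np.arange(p), t) / T)
    A = np.linalg.pinv(V) @ D                    # recover sequences a_l[n]
    M = pad * D.shape[1]
    freqs = np.fft.fftfreq(M, d=dt)
    return np.array([freqs[np.argmax(np.abs(np.fft.fft(a, n=M)))] for a in A])
```

In the noiseless case the delays are recovered exactly and the Doppler estimates are accurate to within an FFT bin; a parametric estimator would remove even that grid limitation.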
Another approach is to quantize the delay-Doppler plane by assuming the delays and Doppler shifts lie on a grid [42], [104], [148], [149]. After discretization, CS tools for finite representations can be used to capture the sparsity on the discretized grid. Clearly, this approach has an inherent resolution limitation. Moreover, in real-world scenarios, the targets do not lie exactly on the grid points. This causes a leakage of their energies in the quantized space into adjacent grid points [36], [87]. In contrast, the union of subspaces model avoids the discretization issue and offers concrete sampling methods that can be implemented efficiently in hardware (when the SNR is not too poor).

E. Discussion

Before concluding, we point out that much of the work in this section (Sections VI-B, VI-C, and VI-D) has been focused on infinite unions. In each case, the approach we introduced is based on a sampling mechanism using analog filters, followed by standard array processing tools for recovery in the digital domain. These techniques allow perfect recovery when no noise is present, and generally degrade gracefully in the presence of noise, as has been analyzed extensively in the array processing literature. Nonetheless, when the SNR is poor, alternative digital recovery methods based on CS may be preferable. Thus, we may increase robustness by using the analog sampling techniques proposed here in combination with standard CS algorithms for recovery in the digital domain. To apply this approach, once we have obtained the desired samples, we can view them as the measurement vector in a standard CS system; the CS matrix is now given by a discretized Vandermonde matrix that captures all possible frequencies to a desired accuracy; and the vector to be recovered is a sparse vector with nonzero elements only in those indices corresponding to actual frequencies [42], [84], [104], [148], [149]. In this way, we combine the benefits of CS with those of analog sampling, without requiring discretization in the sampling stage. The discretization now appears only in the digital domain during recovery and not in the sampling mechanism. If we follow this strategy, then the results developed in Section III regarding recovery in the presence of noise are relevant here as well.

VII. CONCLUSION

In this review, our aim was to summarize applications of the basic CS framework that integrate specific constraints of the problem at hand into the theoretical and algorithmic formulation of signal recovery, going beyond the original random measurement/sparsity model paradigm. Due to constraints given by the relevant sensing devices, the classes of measurement matrices available to us are limited. Similarly, some applications focus on signals that exhibit structure which cannot be captured by sparsity alone and provide additional information that can be leveraged during signal recovery. We also considered the transition between continuous and discrete representations bridged by analog-to-digital converters. Analog signals are continuous-time by nature; in many cases, the application of compressed sensing to this larger signal class requires the formulation of new devices, signal models, and recovery algorithms. It also necessitates a deeper understanding of hardware considerations that must be taken into account in theoretical developments. This acquisition framework, in turn, motivates the design of new sampling schemes and devices that provide the information required for signal recovery in the smallest possible representation.

While it is difficult to cover all of the developments in compressive sensing theory and hardware [66]-[68], [181]-[184], our aim here was to select a few examples that are representative of wider classes of problems and that offer a balance between a useful theoretical background and varied applications. We hope that this summary will be useful to practitioners in signal acquisition and processing who are interested in leveraging the features of compressive sensing in their specific applications. We also hope that this review will inspire further developments in the theory and practice underlying CS: we particularly envision extending the existing framework to broader signal sets and inspiring new implementation and design paradigms. With an eye to the future, more advanced configurations of CS can play a key role in many new frontiers. Some examples already mentioned throughout are cognitive radio, optical systems, medical devices such as MRI, ultrasound, and more. These techniques hold promise for a complete rethinking of many acquisition systems and stretch the limit of current sensing capabilities.

As we have also demonstrated, CS holds promise for increasing resolution by exploiting signal structure. This can revolutionize many applications such as radar and microscopy by making efficient use of the available degrees of freedom in these settings. Consumer electronics, microscopy, civilian and military surveillance, medical imaging, radar, and many other applications rely on ADCs and are resolution-limited. Removing the Nyquist barrier in these applications and increasing resolution can improve the user experience, increase data transfer, improve imaging quality, and reduce exposure time; in other words, make a prominent impact on the analog-digital world surrounding us.

ACKNOWLEDGMENT

The authors would like to thank their colleagues for many useful comments and for their collaboration on many topics related to this review. In particular, they are grateful to M. Wakin for many discussions on the structure of the review; to M. Mishali, H. Rauhut, J. Romberg, Y. Xie, and the anonymous reviewers for offering many helpful comments on an early draft of the manuscript; and to P. L. Dragotti, P. Golland, L. Jacques, Y. Massoud, J. Romberg, and R. Willett for authorizing the use of their figures.

REFERENCES

[1] D. L. Donoho, "Compressed sensing," IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289-1306, Apr. 2006.
[2] E. J. Candès and T. Tao, "Near optimal signal recovery from random projections: Universal encoding strategies?," IEEE Trans. Inf. Theory, vol. 52, no. 12, pp. 5406-5425, Dec. 2006.
[3] R. G. Baraniuk, "Compressive sensing," IEEE Signal Process. Mag., vol. 24, no. 4, pp. 118-120, 124, Jul. 2007.
[4] E. J. Candès, "Compressive sampling," in Proc. Int. Congr. Math., Madrid, Spain, 2006, vol. 3, pp. 1433-1452.
[5] E. J. Candès and M. B. Wakin, "An introduction to compressive sampling," IEEE Signal Process. Mag., vol. 25, no. 2, pp. 21-30, Mar. 2008.
[6] M. Vetterli, P. Marziliano, and T. Blu, "Sampling signals with finite rate of innovation," IEEE Trans. Signal Process., vol. 50, no. 6, pp. 1417-1428, 2002.
[7] P. L. Dragotti, M. Vetterli, and T. Blu, "Sampling moments and reconstructing signals of finite rate of innovation: Shannon meets Strang-Fix," IEEE Trans. Signal Process., vol. 55, no. 5, pp. 1741-1757, May 2007.
[8] T. Blu, P.-L. Dragotti, M. Vetterli, P. Marziliano, and L. Coulot, "Sparse sampling of signal innovations," IEEE Signal Process. Mag., vol. 25, no. 2, pp. 31-40, Mar. 2008.
[9] M. Mishali, Y. C. Eldar, O. Dounaevsky, and E. Shoshan, "Xampling: Analog to digital at sub-Nyquist rates," IET Circuits, Devices, Syst., vol. 5, no. 1, pp. 8-20, Jan. 2011.
[10] M. Mishali and Y. C. Eldar, "Xampling: Compressed sensing for analog signals," in Compressed Sensing: Theory and Applications, Y. C. Eldar and G. Kutyniok, Eds. Cambridge, U.K.: Cambridge Univ. Press, 2012.
[11] H. Nyquist, "Certain topics in telegraph transmission theory," Trans. Amer. Inst. Electr. Eng., vol. 47, pp. 617-644, Apr. 1928.
[12] C. E. Shannon, "Communications in the presence of noise," Proc. IRE, vol. 37, pp. 10-21, Jan. 1949.
[13] E. T. Whittaker, "On the functions which are represented by the expansions of the interpolation theory," Proc. R. Soc. Edinb., vol. 35, pp. 181-194, 1915.
[14] V. A. Kotelnikov, "On the transmission capacity of the ether and of cables in electrical communications," in All-Union Conf. Technol. Reconstruction Commun. Sector and Develop. of Low-Current Eng., Moscow, Russia, 1933.
[15] Y. M. Lu and M. N. Do, "A theory for sampling signals from a union of subspaces," IEEE Trans. Signal Process., vol. 56, no. 6, pp. 2334-2345, 2008.
[16] T. Blumensath and M. E. Davies, "Sampling theorems for signals from the union of finite-dimensional linear subspaces," IEEE Trans. Inf. Theory, vol. 55, no. 4, pp. 1872-1882, Apr. 2009.
[17] Y. C. Eldar and M. Mishali, "Robust recovery of signals from a structured union of subspaces," IEEE Trans. Inf. Theory, vol. 55, no. 11, pp. 5302-5316, 2009.
[18] Y. C. Eldar, "Compressed sensing of analog signals in shift-invariant spaces," IEEE Trans. Signal Process., vol. 57, no. 8, pp. 2986-2997, Aug. 2009.
[19] K. Gedalyahu and Y. C. Eldar, "Time delay estimation from low rate samples: A union of subspaces approach," IEEE Trans. Signal Process., vol. 58, no. 6, pp. 3017-3031, Jun. 2010.
[20] R. G. Baraniuk, V. Cevher, M. F. Duarte, and C. Hegde, "Model-based compressive sensing," IEEE Trans. Inf. Theory, vol. 56, no. 4, pp. 1982-2001, Apr. 2010.
[21] I. F. Gorodnitsky and B. D. Rao, "Sparse signal reconstruction from limited data using FOCUSS: A re-weighted minimum norm algorithm," IEEE Trans. Signal Process., vol. 45, no. 3, pp. 600-616, Mar. 1997.
[22] S. Chen, D. L. Donoho, and M. Saunders, "Atomic decomposition by basis pursuit," SIAM J. Sci. Comput., vol. 20, no. 1, pp. 33-61, 1998.
[23] A. M. Bruckstein, D. L. Donoho, and M. Elad, "From sparse solutions of systems of equations to sparse modeling of signals and images," SIAM Rev., vol. 51, no. 1, pp. 34-81, Feb. 2009.
[24] S. Mallat and Z. Zhang, "Matching pursuit with time-frequency dictionaries," IEEE Trans. Signal Process., vol. 41, no. 12, pp. 3397-3415, Dec. 1993.
[25] S. Foucart, "A note on guaranteed sparse recovery via l1-minimization," Appl. Comput. Harmon. Anal., vol. 29, no. 1, pp. 97-103, 2010.
[26] S. Foucart and R. Gribonval, "Real versus complex null space properties for sparse vector recovery," C. R. Acad. Sci. Paris, Ser. I, vol. 348, no. 15-16, pp. 863-865, 2010.
[27] A. Cohen, W. Dahmen, and R. A. DeVore, "Compressed sensing and best k-term approximation," J. Amer. Math. Soc., vol. 22, no. 1, pp. 211-231, Jan. 2009.
[28] D. L. Donoho and M. Elad, "Optimally sparse representation in general (nonorthogonal) dictionaries via l1 minimization," Proc. Nat. Acad. Sci., vol. 100, no. 5, pp. 2197-2202, Mar. 2003.
[29] D. L. Donoho and X. Huo, "Uncertainty principles and ideal atomic decompositions," IEEE Trans. Inf. Theory, vol. 47, no. 7, pp. 2845-2862, Nov. 2001.
[30] J. A. Tropp, "Greed is good: Algorithmic results for sparse approximation," IEEE Trans. Inf. Theory, vol. 50, no. 10, pp. 2231-2242, Oct. 2004.
[31] R. Gribonval and M. Nielsen, "Sparse representations in unions of bases," IEEE Trans. Inf. Theory, vol. 49, no. 12, pp. 3320-3325, Dec. 2003.
[32] L. R. Welch, "Lower bounds on the maximum cross correlation of signals," IEEE Trans. Inf. Theory, vol. 20, no. 3, pp. 397-399, May 1974.
[33] T. Strohmer and R. Heath, "Grassmannian frames with applications to coding and communication," Appl. Comput. Harmon. Anal., vol. 14, no. 3, pp. 257-275, Nov. 2003.
[34] S. A. Gersgorin, "Über die Abgrenzung der Eigenwerte einer Matrix," Izv. Akad. Nauk SSSR Ser. Fiz.-Mat., vol. 6, pp. 749-754, 1931.
[35] M. A. Herman and T. Strohmer, "General deviants: An analysis of perturbations in compressed sensing," IEEE J. Sel. Topics Signal Process., vol. 4, no. 2, pp. 342-349, Apr. 2010.
[36] Y. Chi, L. L. Scharf, A. Pezeshki, and R. Calderbank, "Sensitivity to basis mismatch in compressed sensing," IEEE Trans. Signal Process., vol. 59, no. 5, pp. 2182-2195, May 2011.
[37] J. Treichler, M. A. Davenport, and R. G. Baraniuk, "Application of compressive sensing to the design of wideband signal acquisition receivers," presented at the U.S./Australia Joint Workshop Defense Appl. Signal Process. (DASP), Lihue, HI, Sep. 2009.
[38] S. Aeron, V. Saligrama, and M. Zhao, "Information theoretic bounds for compressed sensing," IEEE Trans. Inf. Theory, vol. 56, no. 10, pp. 5111-5130, Oct. 2010.
[39] Z. Ben-Haim, T. Michaeli, and Y. C. Eldar, "Performance bounds and design criteria for estimating finite rate of innovation signals," Sep. 2010 [Online]. Available: http://arxiv.org/pdf/1009.2221, preprint.
[40] E. Arias-Castro and Y. C. Eldar, "Noise folding in compressed sensing," IEEE Signal Process. Lett., vol. 18, no. 8, pp. 478-481, 2011.
[41] T. T. Cai, G. Xu, and J. Zhang, "On recovery of sparse signals via l1 minimization," IEEE Trans. Inf. Theory, vol. 55, no. 7, pp. 3388-3397, Jul. 2009.
[42] M. A. Herman and T. Strohmer, "High-resolution radar via compressed sensing," IEEE Trans. Signal Process., vol. 57, no. 6, pp. 2275-2284, Jun. 2009.
[43] R. A. DeVore, "Deterministic constructions of compressed sensing matrices," J. Complex., vol. 23, no. 4, pp. 918-925, Aug. 2007.
[44] D. Donoho, "For most large underdetermined systems of linear equations, the minimal l1-norm solution is also the sparsest solution," Comm. Pure Appl. Math., vol. 59, no. 6, pp. 797-829, 2006.
[45] E. J. Candès and Y. Plan, "Near-ideal model selection by l1 minimization," Ann. Statist., vol. 37, no. 5A, pp. 2145-2177, Oct. 2009.
[46] R. G. Baraniuk, M. Davenport, R. DeVore, and M. Wakin, "A simple proof of the restricted isometry property for random matrices," Construct. Approximation, vol. 28, no. 3, pp. 253-263, 2008.
[47] D. L. Donoho and J. Tanner, "Observed universality of phase transitions in high-dimensional geometry, with implications for modern data analysis and signal processing," Philos. Trans. Roy. Soc. London A, Math. Phys. Sci., vol. 367, no. 1906, pp. 4273-4293, Nov. 2009.
[48] C. Dossal, G. Peyré, and J. Fadili, "A numerical exploration of compressed sampling recovery," Linear Algebra Its Appl., vol. 432, no. 7, pp. 1663-1679, Mar. 2010.
[49] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge, U.K.: Cambridge Univ. Press, 2004.
[50] L. I. Rudin, S. Osher, and E. Fatemi, "Nonlinear total variation based noise removal algorithms," Phys. D, vol. 60, pp. 259-268, Nov. 1992.
[51] J. A. Tropp and S. J. Wright, "Computational methods for sparse solution of linear inverse problems," Proc. IEEE, vol. 98, no. 6, pp. 948-958, Jun. 2010.
[52] J. Haupt and R. Nowak, "Signal reconstruction from noisy random projections," IEEE Trans. Inf. Theory, vol. 52, no. 9, pp. 4036-4048, Sep. 2006.
[53] S. Ji, Y. Xue, and L. Carin, "Bayesian compressive sensing," IEEE Trans. Signal Process., vol. 56, no. 6, pp. 2346-2356, Jun. 2008.
[54] E. Candès and T. Tao, "The Dantzig selector: Statistical estimation when p is much larger than n," Ann. Statist., vol. 35, no. 6, pp. 2313-2351, 2005.
[55] Y. Pati, R. Rezaiifar, and P. Krishnaprasad, "Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition," presented at the Asilomar Conf. Signals, Syst., Comput., Pacific Grove, CA, Nov. 1993.
[56] D. Needell and J. A. Tropp, "CoSaMP: Iterative signal recovery from incomplete and inaccurate samples," Appl. Comput. Harmon. Anal., vol. 26, no. 3, pp. 301-321, May 2008.
[57] W. Dai and O. Milenkovic, "Subspace pursuit for compressive sensing signal reconstruction," IEEE Trans. Inf. Theory, vol. 55, no. 5, pp. 2230-2249, May 2009.
[58] T. Blumensath and M. E. Davies, "Iterative hard thresholding for compressed sensing," Appl. Comput. Harmon. Anal., vol. 27, no. 3, pp. 265-274, Nov. 2008.
[59] D. L. Donoho, M. Elad, and V. N. Temlyakov, "Stable recovery of sparse overcomplete representations in the presence of noise," IEEE Trans. Inf. Theory, vol. 52, no. 1, pp. 6-18, Jan. 2006.
[60] E. J. Candès, J. K. Romberg, and T. Tao, "Stable signal recovery from incomplete and inaccurate measurements," Comm. Pure Appl. Math., vol. 59, no. 8, pp. 1207-1223, 2006.
[61] Z. Ben-Haim, Y. C. Eldar, and M. Elad, "Coherence-based performance guarantees for estimating a sparse vector under random noise," IEEE Trans. Signal Process., vol. 58, no. 10, pp. 5030-5043, Oct. 2010.
[62] Z. Ben-Haim and Y. C. Eldar, "The Cramér-Rao bound for estimating a sparse parameter vector," IEEE Trans. Signal Process., vol. 58, no. 6, pp. 3384-3389, Jun. 2010.
[63] M. B. Wakin and M. A. Davenport, "Analysis of orthogonal matching pursuit using the restricted isometry property," IEEE Trans. Inf. Theory, vol. 56, no. 9, pp. 4395-4401, Sep. 2010.
[64] T. Zhang, "Sparse recovery with orthogonal matching pursuit under RIP," May 2010 [Online]. Available: http://arxiv.org/pdf/1005.2249, preprint.
[65] J. A. Tropp, "On the conditioning of random subdictionaries," Appl. Comput. Harmon. Anal., vol. 25, pp. 1-24, 2008.
[66] M. E. Gehm, R. John, D. Brady, R. Willett, and T. J. Schulz, "Single-shot compressive spectral imaging with a dual-disperser architecture," Opt. Exp., vol. 15, no. 21, pp. 14013-14027, Oct. 2007.
[67] M. A. Neifeld and J. Ke, "Optical architectures for compressive imaging," Appl. Opt., vol. 46, no. 22, pp. 5293-5303, Jul. 2007.
[68] A. Wagadarikar, R. John, R. Willett, and D. Brady, "Single disperser design for coded aperture snapshot spectral imaging," Appl. Opt., vol. 47, no. 10, pp. B44-B51, Apr. 2008.
[69] E. J. Candès and J. K. Romberg, "Sparsity and incoherence in compressive sampling," Inverse Problems, vol. 23, no. 3, pp. 969-985, 2007.
[70] Y. C. Eldar, "Uncertainty relations for shift-invariant analog signals," IEEE Trans. Inf. Theory, vol. 55, no. 12, pp. 5742-5757, Dec. 2009.
[71] M. Rudelson and R. Vershynin, "On sparse reconstruction from Fourier and Gaussian measurements," Comm. Pure Appl. Math., vol. 61, no. 8, pp. 1025-1045, Aug. 2008.
[72] M. Lustig, D. L. Donoho, J. M. Santos, and J. M. Pauly, "Compressed sensing MRI," IEEE Signal Process. Mag., vol. 25, no. 2, pp. 72-82, Mar. 2008.
[73] E. J. Candès, J. K. Romberg, and T. Tao, "Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information," IEEE Trans. Inf. Theory, vol. 52, no. 2, pp. 489-509, 2006.
[84] J. A. Tropp, J. N. Laska, M. F. Duarte, J. K. Romberg, and R. G. Baraniuk, "Beyond Nyquist: Efficient sampling of sparse bandlimited signals," IEEE Trans. Inf. Theory, vol. 56, no. 1, pp. 520-544, Jan. 2010.
[85] T. Ragheb, J. N. Laska, H. Nejati, S. Kirolos, R. G. Baraniuk, and Y. Massoud, "A prototype hardware for random demodulation based compressive analog-to-digital conversion," in Proc. IEEE Midwest Symp. Circuits Syst., Knoxville, TN, Aug. 2008, pp. 37-40.
[86] Z. Yu, S. Hoyos, and B. M. Sadler, "Mixed-signal parallel compressed sensing and reception for cognitive radio," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Las Vegas, NV, Apr. 2008, pp. 3861-3864.
[87] M. F. Duarte and R. G. Baraniuk, "Spectral compressive sensing," Feb. 2010 [Online]. Available: http://www.math.princeton.edu/~mduarte/images/SCS-TSP.pdf, preprint.
[88] G. C. Valley and T. J. Shaw, "Compressive sensing recovery of sparse signals with arbitrary frequencies," IEEE Trans. Signal Process., 2010, submitted for publication.
[89] J. D. Haupt, W. U. Bajwa, G. Raz, and R. Nowak, "Toeplitz compressed sensing matrices with applications to sparse channel estimation," IEEE Trans. Inf. Theory, vol. 56, no. 11, pp. 5862-5875, Nov. 2010.
[90] H. Rauhut, "Compressive sensing and structured random matrices," in Theoretical Foundations and Numerical Methods for Sparse Recovery, ser. Radon Series on Computational and Applied Mathematics, vol. 9, H. Rauhut, Ed. Berlin, Germany: De Gruyter, 2010.
[91] H. Rauhut, J. K. Romberg, and J. A. Tropp, "Restricted isometries for partial random circulant matrices," Appl. Comput. Harmon. Anal., May 2011, to be published.
[92] J. K. Romberg, "Compressive sensing by random convolution," SIAM J. Imag. Sci., vol. 2, no. 4, pp. 1098-1128, Dec. 2009.
[93] R. F. Marcia and R. M. Willett, "Compressive coded aperture superresolution image reconstruction," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Las Vegas, NV, Mar. 2008, pp. 833-836.
[94] R. Marcia, Z. Harmany, and R. Willett, "Compressive coded aperture imaging," in Computational Imaging VII, San Jose, CA, Jan. 2009, vol. 7246, Proc. SPIE.
[95] L. Jacques, P. Vandergheynst, A. Bibet, V. Majidzadeh, A. Schmid, and Y. Leblebici, "CMOS compressed imaging by random convolution," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Taipei, Taiwan, Apr. 2009, pp. 1113-1116.
[96] M. F. Duarte and R. G. Baraniuk, "Kronecker compressive sensing,"
[74] S. Gazit, A. Szameit, Y. C. Eldar, and M. Segev, Super-resolution 2009 [Online]. Available: http://www.math.princeton.edu/~md-
and reconstruction of sparse sub-wavelength images, Opt. Exp., vol. uarte/images/KCS-TIP11.pdf, preprint.
17, pp. 2392023946, 2009. [97] Y. Rivenson and A. Stern, Compressed imaging with a separable
[75] Y. Shechtman, S. Gazit, A. Szameit, Y. C. Eldar, and M. Segev, Super- sensing operator, IEEE Signal Process. Lett., vol. 16, no. 6, pp.
resolution and reconstruction of sparse images carried by incoherent 449452, Jun. 2009.
light, Opt. Lett., vol. 35, pp. 2392023946, Apr. 2010. [98] T. Sun and K. F. Kelly, Compressive sensing hyperspectral imager,
[76] M. F. Duarte, M. A. Davenport, D. Takhar, J. N. Laska, T. Sun, K. presented at the Comput. Optical Sens. Imag. (COSI), San Jose, CA,
F. Kelly, and R. G. Baraniuk, Single pixel imaging via compressive Oct. 2009.
sampling, IEEE Signal Process. Mag., vol. 25, no. 2, pp. 8391, Mar. [99] R. Robucci, J. Gray, L. K. Chiu, J. K. Romberg, and P. Hasler, Com-
2008. pressive sensing on a CMOS separable-transform image sensor, Proc.
[77] R. Coifman, F. Geshwind, and Y. Meyer, Noiselets, Appl. Comp. IEEE, vol. 98, no. 6, pp. 10891101, Jun. 2010.
Harmon. Anal., vol. 10, pp. 2744, 2001. [100] R. A. Horn and C. R. Johnson, Topics in Matrix Analysis. Cambridge,
[78] R. A. DeVerse, R. R. Coifman, A. C. Coppi, W. G. Fateley, F. Gesh- U.K.: Cambridge Univ. Press, 1991.
wind, R. M. Hammaker, S. Valenti, F. J. Warner, and G. L. Davis, Ap- [101] D. Baron, M. B. Wakin, M. F. Duarte, S. Sarvotham, and R. G. Bara-
plication of spatial light modulators for new modalities in spectrometry niuk, Distributed compressed sensing, Electr. Comput. Eng. Dept.,
and imaging, in Spectral Imaging: Instrumentation, Applications, and Rice Univ., Houston, TX, Tech. Rep. TREE-0612, Nov. 2006.
Analysis II, San Jose, CA, Jan. 2003, vol. 4959, Proc. SPIE, pp. 1222. [102] J. W. Phillips, R. M. Leahy, and J. C. Mosher, MEG-based imaging
[79] W. L. Chan, K. Charan, D. Takhar, K. F. Kelly, R. G. Baraniuk, and of focal neuronal current sources, IEEE Trans. Med. Imag., vol. 16,
D. M. Mittleman, A single-pixel terahertz imaging system based on no. 3, pp. 338348, June 1997.
compressed sensing, Appl. Phy. Lett., vol. 93, no. 12, Sep. 2008. [103] R. Gribonval, Sparse decomposition of stereo signals with matching
[80] P. Ye, J. L. Paredes, G. R. Arce, Y. Wu, C. Chen, and D. W. Prather, pursuit and application to blind separation of more than two sources
Compressive confocal microscopy, in IEEE Int. Conf. Acoust., from a stereo mixture, presented at the IEEE Int. Conf. Acoust.,
Speech, Signal Process. (ICASSP), Taipei, Taiwan, Apr. 2009, pp. Speech, Signal Process. (ICASSP), Minneapolis, MN, Apr. 1993.
429432. [104] D. Malioutov, M. Cetin, and A. S. Willsky, A sparse signal recon-
[81] S. Pfetsch, T. Ragheb, J. Laska, H. Nejati, A. Gilbert, M. Strauss, R. struction perspective for source localization with sensor arrays, IEEE
Baraniuk, and Y. Massoud, On the feasibility of hardware implemen- Trans. Signal Process., vol. 53, no. 8, pp. 30103022, Aug. 2005.
tation of sub-Nyquist random-sampling based analog-to-information [105] S. F. Cotter and B. D. Rao, Sparse channel estimation via matching
conversion, in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), Seattle, pursuit with application to equalization, IEEE Trans. Commun., vol.
WA, May 2008, pp. 14801483. 50, no. 3, pp. 374377, Mar. 2002.
[82] A. C. Gilbert, M. J. Strauss, and J. A. Tropp, A tutorial on fast Fourier [106] I. J. Fevrier, S. B. Gelfand, and M. P. Fitz, Reduced complexity de-
sampling, IEEE Signal Process. Mag., vol. 25, no. 2, pp. 5766, Mar. cision feedback equalization for multipath channels with large delay
2008. spreads, IEEE Trans. Commun., vol. 47, no. 6, pp. 927937, Jun. 1999.
[83] W. U. Bajwa, A. Sayeed, and R. Nowak, A restricted isometry prop- [107] J. A. Bazerque and G. B. Giannakis, Distributed spectrum sensing for
erty for structurally subsampled unitary matrices, in Proc. Allerton cognitive radio networks by exploiting sparsity, IEEE Trans. Signal
Conf. Commun., Control, Comput., Monticello, IL, Sep. 2009, pp. Process., vol. 58, no. 3, pp. 18471862, Mar. 2010.
10051012.
[108] M. Mishali and Y. C. Eldar, "Blind multiband signal reconstruction: Compressed sensing for analog signals," IEEE Trans. Signal Process., vol. 57, pp. 993–1009, Mar. 2009.
[109] M. Mishali and Y. C. Eldar, "From theory to practice: Sub-Nyquist sampling of sparse wideband analog signals," IEEE J. Sel. Topics Signal Process., vol. 4, no. 2, pp. 375–391, Apr. 2010.
[110] M. E. Davies and Y. C. Eldar, "Rank awareness in joint sparse recovery," Apr. 2010 [Online]. Available: http://arxiv.org/pdf/1004.4529, preprint.
[111] P. Feng, "Universal minimum-rate sampling and spectrum-blind reconstruction for multiband signals," Ph.D. dissertation, Univ. of Illinois, Urbana, IL, 1998.
[112] J. Chen and X. Huo, "Theoretical results on sparse representations of multiple-measurement vectors," IEEE Trans. Signal Process., vol. 54, no. 12, pp. 4634–4643, Dec. 2006.
[113] M. Mishali and Y. C. Eldar, "Reduce and boost: Recovering arbitrary sets of jointly sparse vectors," IEEE Trans. Signal Process., vol. 56, no. 10, pp. 4692–4702, Oct. 2008.
[114] J. A. Tropp, A. C. Gilbert, and M. J. Strauss, "Algorithms for simultaneous sparse approximation. Part I: Greedy pursuit," Signal Process., vol. 86, pp. 572–588, Apr. 2006.
[115] J. A. Tropp, "Algorithms for simultaneous sparse approximation. Part II: Convex relaxation," Signal Process., vol. 86, pp. 589–602, Apr. 2006.
[116] M. Fornasier and H. Rauhut, "Recovery algorithms for vector valued data with joint sparsity constraints," SIAM J. Numer. Anal., vol. 46, no. 2, pp. 577–613, 2008.
[117] S. F. Cotter, B. D. Rao, K. Engan, and K. Kreutz-Delgado, "Sparse solutions to linear inverse problems with multiple measurement vectors," IEEE Trans. Signal Process., vol. 53, no. 7, pp. 2477–2488, Jul. 2005.
[118] R. Gribonval, H. Rauhut, K. Schnass, and P. Vandergheynst, "Atoms of all channels, unite! Average case analysis of multi-channel sparse recovery using greedy algorithms," J. Fourier Anal. Appl., vol. 14, no. 5, pp. 655–687, 2008.
[119] Y. C. Eldar and H. Rauhut, "Average case analysis of multichannel sparse recovery using convex relaxation," IEEE Trans. Inf. Theory, vol. 56, no. 1, pp. 505–519, Jan. 2010.
[120] Y. C. Eldar, P. Kuppinger, and H. Bölcskei, "Block-sparse signals: Uncertainty relations and efficient recovery," IEEE Trans. Signal Process., vol. 58, no. 6, pp. 3042–3054, Jun. 2010.
[121] R. O. Schmidt, "Multiple emitter location and signal parameter estimation," in Proc. RADC Spectral Estimation Workshop, Rome, NY, Oct. 1979, pp. 243–258.
[122] P. Feng and Y. Bresler, "Spectrum-blind minimum-rate sampling and reconstruction of multiband signals," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Atlanta, GA, May 1996, vol. 3, pp. 1688–1691.
[123] K. Schnass and P. Vandergheynst, "Average performance analysis for thresholding," IEEE Signal Process. Lett., vol. 14, no. 11, pp. 828–831, Nov. 2007.
[124] W. Ou, M. S. Hämäläinen, and P. Golland, "A distributed spatio-temporal EEG/MEG inverse solver," NeuroImage, vol. 44, no. 3, pp. 932–946, Feb. 2009.
[125] V. Cevher, M. F. Duarte, C. Hegde, and R. G. Baraniuk, "Sparse signal recovery using Markov random fields," presented at the Workshop Neural Info. Process. Syst. (NIPS), Vancouver, Canada, Dec. 2008.
[126] P. Schniter, "Turbo reconstruction of structured sparse signals," presented at the Conf. Inf. Sci. Syst. (CISS), Princeton, NJ, Mar. 2010.
[127] D. Baron, S. Sarvotham, and R. G. Baraniuk, "Bayesian compressive sensing via belief propagation," IEEE Trans. Signal Process., vol. 58, no. 1, pp. 269–280, Jan. 2010.
[128] L. He and L. Carin, "Exploiting structure in wavelet-based Bayesian compressive sensing," IEEE Trans. Signal Process., vol. 57, no. 9, pp. 3488–3497, Sep. 2009.
[129] S. Ji, D. Dunson, and L. Carin, "Multi-task compressive sensing," IEEE Trans. Signal Process., vol. 57, no. 1, pp. 92–106, Jan. 2009.
[130] L. He, H. Chen, and L. Carin, "Tree-structured compressive sensing with variational Bayesian analysis," IEEE Signal Process. Lett., vol. 17, no. 3, pp. 233–236, Mar. 2010.
[131] T. Faktor, Y. C. Eldar, and M. Elad, "Exploiting statistical dependencies in sparse representations for signal recovery," Oct. 2010 [Online]. Available: http://arxiv.org/pdf/1010.5734, preprint.
[132] R. Jenatton, J. Y. Audibert, and F. Bach, "Structured variable selection with sparsity inducing norms," Apr. 2009 [Online]. Available: http://arxiv.org/pdf/0904.3523, preprint.
[133] L. Peotta and P. Vandergheynst, "Matching pursuit with block incoherent dictionaries," IEEE Trans. Signal Process., vol. 55, no. 9, pp. 4549–4557, Sep. 2007.
[134] M. Yuan and Y. Lin, "Model selection and estimation in regression with grouped variables," J. Roy. Stat. Soc. Ser. B Stat. Methodol., vol. 68, no. 1, pp. 49–67, 2006.
[135] Y. C. Eldar and T. Michaeli, "Beyond bandlimited sampling," IEEE Signal Process. Mag., vol. 26, no. 3, pp. 48–68, May 2009.
[136] F. R. Bach, "Consistency of the group lasso and multiple kernel learning," J. Mach. Learn. Res., vol. 9, pp. 1179–1225, 2008.
[137] Y. Nardi and A. Rinaldo, "On the asymptotic properties of the group lasso estimator for linear models," Electron. J. Statist., vol. 2, pp. 605–633, 2008.
[138] M. Stojnic, F. Parvaresh, and B. Hassibi, "On the reconstruction of block-sparse signals with an optimal number of measurements," IEEE Trans. Signal Process., vol. 57, no. 8, pp. 3075–3085, Aug. 2009.
[139] L. Meier, S. van de Geer, and P. Bühlmann, "The group lasso for logistic regression," J. R. Statist. Soc. B, vol. 70, no. 1, pp. 53–77, 2008.
[140] S. Erickson and C. Sabatti, "Empirical Bayes estimation of a sparse vector of gene expression changes," Statist. Appl. Genetics Mol. Biol., vol. 4, no. 1, p. 22, 2005.
[141] F. Parvaresh, H. Vikalo, S. Misra, and B. Hassibi, "Recovering sparse signals using sparse measurement matrices in compressed DNA microarrays," IEEE J. Sel. Topics Signal Process., vol. 2, no. 3, pp. 275–285, Jun. 2008.
[142] C. Hegde, M. F. Duarte, and V. Cevher, "Compressive sensing recovery of spike trains using structured sparsity," presented at the Workshop Signal Process. Adapt. Sparse Structured Representations (SPARS), Saint Malo, France, Apr. 2009.
[143] V. Cevher, P. Indyk, C. Hegde, and R. G. Baraniuk, "Recovery of clustered sparse signals from compressive measurements," presented at the Int. Conf. Sampling Theory Appl. (SAMPTA), Marseille, France, May 2009.
[144] M. F. Duarte, V. Cevher, and R. G. Baraniuk, "Model-based compressive sensing for signal ensembles," presented at the Allerton Conf. Commun., Control, Comput., Monticello, IL, Sep. 2009.
[145] Z. Ben-Haim and Y. C. Eldar, "Near-oracle performance of greedy block-sparse estimation techniques from noisy measurements," IEEE J. Sel. Topics Signal Process., 2010 [Online]. Available: http://arxiv.org/pdf/1009.0906, to be published.
[146] J. Friedman, T. Hastie, and R. Tibshirani, "A note on the group Lasso and a sparse group Lasso," Jan. 2010 [Online]. Available: http://arxiv.org/pdf/1001.0736, preprint.
[147] P. Sprechmann, I. Ramirez, G. Sapiro, and Y. C. Eldar, "C-HiLasso: A collaborative hierarchical sparse modeling framework," IEEE Trans. Signal Process., Jun. 2010 [Online]. Available: http://arxiv.org/abs/1006.1346, preprint, to be published.
[148] A. W. Habboosh, R. J. Vaccaro, and S. Kay, "An algorithm for detecting closely spaced delay/Doppler components," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Munich, Germany, Apr. 1997, pp. 535–538.
[149] W. U. Bajwa, A. M. Sayeed, and R. Nowak, "Learning sparse doubly-selective channels," in Proc. Allerton Conf. Commun., Control, Comput., Monticello, IL, Sep. 2008, pp. 575–582.
[150] W. U. Bajwa, K. Gedalyahu, and Y. C. Eldar, "Identification of parametric underspread linear systems with application to super-resolution radar," IEEE Trans. Signal Process., vol. 59, no. 6, pp. 2548–2561, Jun. 2011.
[151] M. Unser, "Sampling–50 years after Shannon," Proc. IEEE, vol. 88, pp. 569–587, Apr. 2000.
[152] C. de Boor, R. DeVore, and A. Ron, "The structure of finitely generated shift-invariant spaces in L_2(R^d)," J. Funct. Anal., vol. 119, no. 1, pp. 37–78, 1994.
[153] A. Ron and Z. Shen, "Frames and stable bases for shift-invariant subspaces of L_2(R^d)," Can. J. Math., vol. 47, no. 5, pp. 1051–1094, 1995.
[154] A. Aldroubi and K. Gröchenig, "Non-uniform sampling and reconstruction in shift-invariant spaces," SIAM Rev., vol. 43, pp. 585–620, 2001.
[155] Y.-P. Lin and P. P. Vaidyanathan, "Periodically nonuniform sampling of bandpass signals," IEEE Trans. Circuits Syst. II, vol. 45, no. 3, pp. 340–351, Mar. 1998.
[156] C. Herley and P. W. Wong, "Minimum rate sampling and reconstruction of signals with arbitrary frequency support," IEEE Trans. Inf. Theory, vol. 45, no. 5, pp. 1555–1564, Jul. 1999.
[157] J. Mitola, III, "Cognitive radio for flexible mobile multimedia communications," Mobile Netw. Appl., vol. 6, no. 5, pp. 435–441, 2001.
[158] H. Landau, "Necessary density conditions for sampling and interpolation of certain entire functions," Acta Mathematica, vol. 117, no. 1, pp. 37–52, Jul. 1967.
[159] A. Kohlenberg, "Exact interpolation of band-limited functions," J. Appl. Phys., vol. 24, no. 12, pp. 1432–1436, Dec. 1953.
[160] R. Venkataramani and Y. Bresler, "Perfect reconstruction formulas and bounds on aliasing error in sub-Nyquist nonuniform sampling of multiband signals," IEEE Trans. Inf. Theory, vol. 46, no. 6, pp. 2173–2183, Sep. 2000.
[161] M. Mishali and Y. C. Eldar, "Expected-RIP: Conditioning of the modulated wideband converter," in IEEE Inf. Theory Workshop (ITW), Taormina, Italy, Oct. 2009, pp. 343–347.
[162] Y. Chen, M. Mishali, Y. C. Eldar, and A. O. Hero, III, "Modulated wideband converter with non-ideal lowpass filters," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Dallas, TX, Apr. 2010, pp. 3630–3633.
[163] R. Tur, Y. C. Eldar, and Z. Friedman, "Innovation rate sampling of pulse streams with application to ultrasound imaging," IEEE Trans. Signal Process., vol. 59, no. 4, pp. 1827–1842, Apr. 2011.
[164] P. Stoica and R. Moses, Introduction to Spectral Analysis. Englewood Cliffs, NJ: Prentice-Hall, 1997.
[165] Y. Hua and T. K. Sarkar, "Matrix pencil method for estimating parameters of exponentially damped/undamped sinusoids in noise," IEEE Trans. Acoust., Speech, Signal Process., vol. 70, pp. 1272–1281, May 1990.
[166] S. Y. Kung, K. S. Arun, and B. D. Rao, "State-space and singular-value decomposition-based approximation methods for the harmonic retrieval problem," J. Opt. Soc. Amer., vol. 73, no. 12, pp. 1799–1811, Dec. 1983.
[167] B. D. Rao and K. S. Arun, "Model based processing of signals: A state space approach," Proc. IEEE, vol. 80, no. 2, pp. 283–309, 1992.
[168] J. Kusuma and V. K. Goyal, "Multichannel sampling of parametric signals with a successive approximation property," in Proc. IEEE Int. Conf. Image Process. (ICIP), Oct. 2006, pp. 1265–1268.
[169] C. S. Seelamantula and M. Unser, "A generalized sampling method for finite-rate-of-innovation-signal reconstruction," IEEE Signal Process. Lett., vol. 15, pp. 813–816, 2008.
[170] K. Gedalyahu, R. Tur, and Y. C. Eldar, "Multichannel sampling of pulse streams at the rate of innovation," IEEE Trans. Signal Process., vol. 59, no. 4, pp. 1491–1504, Apr. 2011.
[171] L. Baboulaz and P. L. Dragotti, "Exact feature extraction using finite rate of innovation principles with an application to image super-resolution," IEEE Trans. Image Process., vol. 18, no. 2, pp. 281–298, Feb. 2009.
[172] N. Wagner, Y. C. Eldar, A. Feuer, G. Danin, and Z. Friedman, "Xampling in ultrasound imaging," in Medical Imaging: Ultrasonic Imaging, Tomography, and Therapy, Lake Buena Vista, FL, 2011, vol. 7968, Proc. SPIE.
[173] R. Roy and T. Kailath, "ESPRIT-estimation of signal parameters via rotational invariance techniques," IEEE Trans. Acoust., Speech, Signal Process., vol. 37, no. 7, pp. 984–995, Jul. 1989.
[174] H. Krim and M. Viberg, "Two decades of array signal processing research: The parametric approach," IEEE Signal Process. Mag., vol. 13, no. 4, pp. 67–94, Jul. 1996.
[175] M. Z. Win and R. A. Scholtz, "Characterization of ultra-wide bandwidth wireless indoor channels: A communication-theoretic view," IEEE J. Sel. Areas Commun., vol. 20, no. 9, pp. 1613–1627, Dec. 2002.
[176] A. Quazi, "An overview on the time delay estimate in active and passive systems for target localization," IEEE Trans. Acoust., Speech, Signal Process., vol. 29, no. 3, pp. 527–533, 1981.
[177] R. J. Urick, Principles of Underwater Sound. New York: McGraw-Hill, 1983.
[178] A. Bruckstein, T. J. Shan, and T. Kailath, "The resolution of overlapping echos," IEEE Trans. Acoust., Speech, Signal Process., vol. 33, no. 6, pp. 1357–1367, 1985.
[179] M. A. Pallas and G. Jourdain, "Active high resolution time delay estimation for large BT signals," IEEE Trans. Signal Process., vol. 39, no. 4, pp. 781–788, Apr. 1991.
[180] F.-X. Ge, D. Shen, Y. Peng, and V. O. K. Li, "Super-resolution time delay estimation in multipath environments," IEEE Trans. Circuits Syst. I, vol. 54, no. 9, pp. 1977–1986, Sep. 2007.
[181] L. Borup, R. Gribonval, and M. Nielsen, "Beyond coherence: Recovering structured time-frequency representations," Appl. Comp. Harmon. Anal., vol. 24, no. 1, pp. 120–128, Jan. 2008.
[182] R. G. Baraniuk and M. B. Wakin, "Random projections of smooth manifolds," Found. Comput. Math., vol. 9, no. 1, pp. 51–77, Jan. 2009.
[183] E. J. Candès and Y. Plan, "Matrix completion with noise," Proc. IEEE, vol. 98, no. 6, pp. 925–936, Jun. 2010.
[184] A. Hansen, "Generalized sampling and infinite dimensional compressed sensing," Feb. 2011 [Online]. Available: http://www.damtp.cam.ac.uk/user/na/people/Anders/Inf_CS43.pdf, preprint.

Marco F. Duarte (S'99–M'09) received the B.Sc. degree in computer engineering (with distinction) and the M.Sc. degree in electrical engineering from the University of Wisconsin-Madison in 2002 and 2004, respectively, and the Ph.D. degree in electrical engineering from Rice University, Houston, TX, in 2009.
He was an NSF/IPAM Mathematical Sciences Postdoctoral Research Fellow in the Program of Applied and Computational Mathematics at Princeton University, Princeton, NJ, from 2009 to 2010, and in the Department of Computer Science at Duke University, Durham, NC, from 2010 to 2011. He is currently an Assistant Professor in the Department of Electrical and Computer Engineering at the University of Massachusetts, Amherst, MA. His research interests include machine learning, compressed sensing, sensor networks, and optical image coding.
Dr. Duarte received the Presidential Fellowship and the Texas Instruments Distinguished Fellowship in 2004 and the Hershel M. Rich Invention Award in 2007, all from Rice University. He coauthored (with C. Hegde and V. Cevher) the Best Student Paper at the 2009 International Workshop on Signal Processing with Adaptive Sparse Structured Representations (SPARS). He is also a member of Tau Beta Pi.

Yonina C. Eldar (S'98–M'02–SM'07) received the B.Sc. degree in physics and the B.Sc. degree in electrical engineering both from Tel-Aviv University (TAU), Tel-Aviv, Israel, in 1995 and 1996, respectively, and the Ph.D. degree in electrical engineering and computer science from the Massachusetts Institute of Technology (MIT), Cambridge, in 2002.
From January 2002 to July 2002, she was a Postdoctoral Fellow at the Digital Signal Processing Group at MIT. She is currently a Professor in the Department of Electrical Engineering at the Technion–Israel Institute of Technology, Haifa. She is also a Research Affiliate with the Research Laboratory of Electronics at MIT and a Visiting Professor at Stanford University, Stanford, CA. Her research interests are in the broad areas of statistical signal processing, sampling theory and compressed sensing, optimization methods, and their applications to biology and optics.
Dr. Eldar was in the program for outstanding students at TAU from 1992 to 1996. In 1998, she held the Rosenblith Fellowship for study in electrical engineering at MIT, and in 2000, she held an IBM Research Fellowship. From 2002 to 2005, she was a Horev Fellow of the Leaders in Science and Technology program at the Technion and an Alon Fellow. In 2004, she was awarded the Wolf Foundation Krill Prize for Excellence in Scientific Research, in 2005 the Andre and Bella Meyer Lectureship, in 2007 the Henry Taub Prize for Excellence in Research, in 2008 the Hershel Rich Innovation Award, the Award for Women with Distinguished Contributions, the Muriel & David Jacknow Award for Excellence in Teaching, and the Technion Outstanding Lecture Award, in 2009 the Technion's Award for Excellence in Teaching, in 2010 the Michael Bruno Memorial Award from the Rothschild Foundation, and in 2011 the Weizmann Prize for Exact Sciences. She is a member of the IEEE Signal Processing Theory and Methods technical committee and the Bio Imaging Signal Processing technical committee, an Associate Editor for the SIAM Journal on Imaging Sciences, and on the Editorial Board of Foundations and Trends in Signal Processing. In the past, she served as an Associate Editor for the IEEE TRANSACTIONS ON SIGNAL PROCESSING, the EURASIP Journal of Signal Processing, and the SIAM Journal on Matrix Analysis and Applications.
