Questions:
1. Please can you introduce yourself and describe your role at NTU?
2. What are your main research interests?
3. How do you normally publish your research outputs?
4. What does open access mean to you?
5. Why is open access important to you?
6. What are the benefits of open access publishing?
7. What advice would you give colleagues in relation to open access publishing?
Possible things to consider for inclusion in your answers:
What advice would you give colleagues in relation to open access publishing?
- If you have not published your research through Gold OA, you can still be RCUK and REF compliant by submitting the bibliographic data and full text to IRep as soon as the work is accepted for publication
- Depositing in IRep is easy
- Help is available: the Library Research Team can provide guidance on choosing where to publish, copyright, embargoes, and APCs
Principal component analysis, or PCA, is a technique that is widely used for applications such as dimensionality reduction, lossy data compression, feature extraction, and data visualization (Jolliffe, 2002). It is also known as the Karhunen-Loève transform.

There are two commonly used definitions of PCA that give rise to the same algorithm. PCA can be defined as the orthogonal projection of the data onto a lower dimensional linear space, known as the principal subspace, such that the variance of the projected data is maximized (Hotelling, 1933). Equivalently, it can be defined as the linear projection that minimizes the average projection cost, defined as the mean squared distance between the data points and their projections (Pearson, 1901).

Consider a data set of observations {xn} where n = 1, ..., N, and xn is a Euclidean variable with dimensionality D. Our goal is to project the data onto a space having dimensionality M < D while maximizing the variance of the projected data. For the moment, we shall assume that the value of M is given. Later in this chapter, we shall consider techniques to determine an appropriate value of M from the data.
To begin with, consider the projection onto a one-dimensional space (M = 1). We can define the direction of this space using a D-dimensional vector u1, which for convenience (and without loss of generality) we shall choose to be a unit vector so that u1^T u1 = 1 (note that we are only interested in the direction defined by u1, not in the magnitude of u1 itself). Each data point xn is then projected onto a scalar value u1^T xn. The mean of the projected data is u1^T x̄, where x̄ is the sample set mean given by
\[ \bar{x} = \frac{1}{N} \sum_{n=1}^{N} x_n \]   (12.1)

and the variance of the projected data is given by

\[ \frac{1}{N} \sum_{n=1}^{N} \left( u_1^T x_n - u_1^T \bar{x} \right)^2 = u_1^T S u_1 \]   (12.2)

where S is the data covariance matrix defined by

\[ S = \frac{1}{N} \sum_{n=1}^{N} (x_n - \bar{x})(x_n - \bar{x})^T. \]   (12.3)
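As a quick numerical check of (12.2), the following sketch confirms that the variance of the scalar projections equals u1^T S u1. The data set, the unit vector, and all variable names are illustrative choices, not from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))        # N = 100 data points of dimension D = 3 (illustrative)
u1 = rng.normal(size=3)
u1 /= np.linalg.norm(u1)             # unit vector, so u1^T u1 = 1

x_bar = X.mean(axis=0)               # sample mean, equation (12.1)
S = (X - x_bar).T @ (X - x_bar) / len(X)   # covariance matrix, equation (12.3)

proj = X @ u1                        # scalar projections u1^T x_n
var_proj = np.mean((proj - u1 @ x_bar) ** 2)   # left-hand side of (12.2)

print(np.isclose(var_proj, u1 @ S @ u1))       # True: matches u1^T S u1
```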
We now maximize the projected variance u1^T S u1 with respect to u1. Clearly, this has to be a constrained maximization to prevent ||u1|| → ∞. The appropriate constraint comes from the normalization condition u1^T u1 = 1. To enforce this constraint, we introduce a Lagrange multiplier that we shall denote by λ1, and then make an unconstrained maximization of
\[ u_1^T S u_1 + \lambda_1 \left( 1 - u_1^T u_1 \right). \]   (12.4)
By setting the derivative with respect to u1 equal to zero, we see that this quantity will have a stationary point when

\[ S u_1 = \lambda_1 u_1 \]   (12.5)

which says that u1 must be an eigenvector of S. If we left-multiply by u1^T and make use of u1^T u1 = 1, we see that the variance is given by

\[ u_1^T S u_1 = \lambda_1 \]   (12.6)

and so the variance will be a maximum when we set u1 equal to the eigenvector having the largest eigenvalue λ1. This eigenvector is known as the first principal component.
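A minimal sketch of this result, on a made-up correlated data set: the eigenvector of S with the largest eigenvalue achieves a projected variance of exactly λ1, as in (12.6), and no random unit direction exceeds it:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))   # correlated toy data (illustrative)
x_bar = X.mean(axis=0)
S = (X - x_bar).T @ (X - x_bar) / len(X)

# eigh returns eigenvalues of a symmetric matrix in ascending order
eigvals, eigvecs = np.linalg.eigh(S)
u1, lam1 = eigvecs[:, -1], eigvals[-1]    # first principal component and its eigenvalue

assert np.isclose(u1 @ S @ u1, lam1)      # equation (12.6)

# no random unit direction achieves a higher projected variance
for _ in range(1000):
    u = rng.normal(size=4)
    u /= np.linalg.norm(u)
    assert u @ S @ u <= lam1 + 1e-12
```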
We can define additional principal components in an incremental fashion by choosing each new direction to be that which maximizes the projected variance
amongst all possible directions orthogonal to those already considered. If we consider the general
case of an M-dimensional projection space, the optimal linear projection for which the variance of the projected data is maximized is now defined by the M eigenvectors u1, ..., uM of the data covariance matrix S corresponding to the M largest eigenvalues λ1, ..., λM. This is easily shown using proof by induction.
To summarize, principal component analysis involves evaluating the mean x̄ and the covariance matrix S of the data set and then finding the M eigenvectors of S corresponding to the M largest eigenvalues. Algorithms for finding eigenvectors and eigenvalues can be found in Golub and Van Loan (1996). Note that the computational cost of computing the full eigenvector decomposition for a matrix of size D × D is O(D³). If we plan to project our data onto the first M principal components, then we only need to find the first M eigenvalues and eigenvectors. This can be done with more efficient techniques, such as the power method (Golub and Van Loan, 1996), that scale like O(MD²).
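The power method is simple to sketch. The following illustrative implementation (the function name, convergence tolerance, and toy covariance are my own choices, not from the text) repeatedly multiplies a vector by S and renormalizes, converging to the eigenvector with the largest eigenvalue:

```python
import numpy as np

def power_method(S, n_iter=1000, tol=1e-10, seed=0):
    """Leading eigenvector/eigenvalue of a symmetric PSD matrix S by power iteration."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=S.shape[0])
    u /= np.linalg.norm(u)
    for _ in range(n_iter):
        v = S @ u
        v /= np.linalg.norm(v)            # renormalize to a unit vector
        if np.linalg.norm(v - u) < tol:   # converged
            break
        u = v
    lam = u @ S @ u                       # Rayleigh quotient gives the eigenvalue
    return u, lam

# Usage on a toy covariance matrix: agrees with a full eigendecomposition.
S = np.cov(np.random.default_rng(2).normal(size=(5, 200)))
u1, lam1 = power_method(S)
print(np.isclose(lam1, np.linalg.eigvalsh(S).max()))   # True
```

Subsequent components can be obtained the same way after deflating S, i.e. subtracting lam1 * np.outer(u1, u1) and iterating again.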
Because we wish to find a sequential sampling scheme, we shall suppose that a set of samples and weights have been obtained at time step n, and that we have subsequently observed the value of x_{n+1}, and we wish to find the weights and samples at time step n + 1. We first sample from the distribution p(z_{n+1} | X_n). This is straightforward since, again using Bayes' theorem,

\[
\begin{aligned}
p(z_{n+1} \mid X_n) &= \int p(z_{n+1} \mid z_n, X_n)\, p(z_n \mid X_n)\, \mathrm{d}z_n \\
&= \int p(z_{n+1} \mid z_n)\, p(z_n \mid X_n)\, \mathrm{d}z_n \\
&= \int p(z_{n+1} \mid z_n)\, p(z_n \mid x_n, X_{n-1})\, \mathrm{d}z_n \\
&= \frac{\displaystyle\int p(z_{n+1} \mid z_n)\, p(x_n \mid z_n)\, p(z_n \mid X_{n-1})\, \mathrm{d}z_n}{\displaystyle\int p(x_n \mid z_n)\, p(z_n \mid X_{n-1})\, \mathrm{d}z_n} \\
&= \sum_{l} w_n^{(l)}\, p\bigl(z_{n+1} \mid z_n^{(l)}\bigr)
\end{aligned}
\]   (13.119)

where we have made use of the conditional independence properties

\[ p(z_{n+1} \mid z_n, X_n) = p(z_{n+1} \mid z_n) \]   (13.120)

\[ p(x_n \mid z_n, X_{n-1}) = p(x_n \mid z_n) \]   (13.121)

which follow from the application of the d-separation criterion to the graph in Figure 13.5. The distribution given by (13.119) is a mixture distribution, and samples can be drawn by choosing a component l with probability given by the mixing coefficients w^(l) and then drawing a sample from the corresponding component.

In summary, we can view each step of the particle filter algorithm as comprising two stages. At time step n, we have a sample representation of the posterior distribution p(z_n | X_n) expressed as samples {z_n^(l)} with corresponding weights {w_n^(l)}. This can be viewed as a mixture representation of the form (13.119). To obtain the corresponding representation for the next time step, we first draw L samples from the mixture distribution (13.119), and then for each sample we use the new observation x_{n+1} to evaluate the corresponding weights w_{n+1}^(l) ∝ p(x_{n+1} | z_{n+1}^(l)). This is illustrated, for the case of a single variable z, in Figure 13.23.

The particle filtering, or sequential Monte Carlo, approach has appeared in the literature under various names including the bootstrap filter (Gordon et al., 1993), survival of the fittest (Kanazawa et al., 1995), and the condensation algorithm (Isard and Blake, 1998).
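The two-stage update is straightforward to sketch in code. Here is a minimal illustration for a linear-Gaussian state space model; the model, its parameters, the observation sequence, and all function names are my own illustrative choices, not from the text. Sampling from the mixture (13.119) amounts to resampling particles according to their weights and propagating them through p(z_{n+1} | z_n); the new weights come from the likelihood p(x_{n+1} | z_{n+1}):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Illustrative linear-Gaussian state space model (assumed, not from the text):
#   z_{n+1} = a * z_n + process noise,    x_n = z_n + observation noise
a, sigma_z, sigma_x = 0.9, 0.5, 1.0

def particle_filter_step(z, w, x_next):
    """One particle filter update: samples/weights at step n -> step n+1."""
    L = len(z)
    # Stage 1: draw L samples from the mixture (13.119) -- choose components
    # with probability w^(l), then sample from p(z_{n+1} | z_n^(l)).
    idx = rng.choice(L, size=L, p=w)
    z_next = a * z[idx] + sigma_z * rng.normal(size=L)
    # Stage 2: reweight with the likelihood of the new observation,
    # w_{n+1}^(l) proportional to p(x_{n+1} | z_{n+1}^(l)), then normalize.
    w_next = norm.pdf(x_next, loc=z_next, scale=sigma_x)
    return z_next, w_next / w_next.sum()

# Usage: track a single scalar state over a made-up observation sequence.
L = 1000
z = rng.normal(size=L)          # initial particles from the prior
w = np.full(L, 1.0 / L)         # uniform initial weights
for x in [0.3, 0.1, -0.4]:
    z, w = particle_filter_step(z, w, x)
print("posterior mean estimate:", np.sum(w * z))
```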