
Data Analysis & Tools Question Bank

Unit 1
1. What is the purpose of using C in statistical computing? Or: what features of the C programming
language can be utilized in data analysis?
2. What are the different types of variables available in C? Explain their scope.
3. Describe the usage of frames in C with an example.
4. State the stepwise process of compilation.
5. Write a C program to compute the likelihood that a student shares the first person's birthday & the
likelihood that any 2 students share a birthday.
6. Write a note on declaring types & initializing in C.
7. How is a function defined in C? Explain call by value.
8. Explain the concept of pointers & usage in C.
9. Compare call by address & call by value, or state the differences between pass-by-value and pass-by-reference.
10. Explain memory allocation in C.
11. Explain malloc & calloc with an example.
12. Write a program that uses a pointer as an array and fills it with square numbers.
13. What are strings? How are string functions defined in C?
14. Explain the following string functions in C:
strlen, strncpy, strncat, snprintf, asprintf, strcmp
15. Explain assert macro with an example.
16. How to test functions using assert.
17. What is SQLite? What are its advantages when used in a data analysis project?
18. Explain the select, from & where clauses & the distinct keyword.
19. Explain use of group by & having keyword.
20. Write a note on aggregate functions.
21. How to carry out sorting in databases.
22. How to limit the number of records in the display.
23. State the purpose of limit and offset clauses with examples
24. Write a note on DDL commands in databases.
25. How to delete a table from a database. What is the use of the apop_table_exists function from the Apophenia
library?
26. How to insert, delete & update records in a table.
27. Write a note on joins.
28. Explain subqueries in detail.
29. Explain apophenia functions used to fold queries into C code.
30. Explain the following apophenia functions
apop_db_to_crosstab
apop_db_merge
apop_crosstab_to_db
31. Describe the use of the following functions
a. apop_db_open() or apop_db_open(NULL), apop_db_close(0) or apop_db_close(1)
b. apop_query_to_text(), apop_data_show(), apop_query_to_data(), apop_query_to_matrix()
c. apop_data_print(), apop_matrix_print(), apop_vector_print()
32. What are the two ways of creating tables in SQL?
33. Explain how a user can switch from SQLite to another database such as MySQL.

Unit 2

1. What are the naming conventions of the GSL, Apophenia & GLib libraries?
2. List & explain the functions used for matrix & vector operations.
3. Explain the purpose of apop_data structure.
4. State the uses of get, set and point in apop_data structure
5. State the differences between apop_data() and apop_model()
6. List & explain the Apophenia functions for getting, setting & pointing to apop_data.
7. Explain the use of apop_data_stack function.
8. Explain the different types of matrix and vector operations using GSL library
9. Explain the apop library function which transforms the row state to column state
10. Write the code to
   1. Copy a text file to another text file.
   2. Copy a gsl_vector to another gsl_vector.
   3. Copy a gsl_matrix to another gsl_matrix.
   4. Copy an apop_data to another apop_data.
   5. Copy a double[] to another double[].
   6. Convert a text file to a db table.
   7. Convert a text file to apop_data.
   8. Convert a multidimensional double array to gsl_vector & gsl_matrix.
   9. Convert a single-dimensional double array to gsl_vector & gsl_matrix.
   10. Convert gsl_vector to gsl_matrix.
   11. Convert gsl_vector & gsl_matrix to apop_data.
   12. Convert gsl_vector to double[].
11. Explain the use of the apop_opts.output_type setting & its different choices.
12. Explain the functions apop_vector_print, apop_matrix_print, apop_data_print.
13. How to get data out of a database & assign it to a double variable, gsl_vector, gsl_matrix & apop_data.
14. Write the Apophenia library calls to retrieve the 4th row of a matrix as row_v, the 5th column of the
matrix as col_v & a 4x5 submatrix whose (0,0) element is at location (2,5).
15. Explain apop_dot function.
16. List the functions used to obtain determinant & inverse of the matrix.
17. What is the purpose of the special floating-point values INFINITY, -INFINITY & NAN?
18. Explain the use of the set term & set out commands in gnuplot.
19. Explain plot, replot & splot with an example in gnuplot.
20. Explain the following gnuplot commands:
set style
set pointtype
set linetype
set title
set xlabel
set ylabel
set key
set xrange, set yrange
set xtics, set ytics
21. Write a program in C which has a function to open a pipe to gnuplot and plot a vector.
22. Explain with a program how to define a self executing gnuplot script.
23. Explain the function apop_plot_lattice()
24. What are error bars & how to plot data using error bars.
25. What is a histogram & which Apophenia function can be used to plot a histogram? Give an example.
26. Write a note on log plots.
27. Write a note on pruning & jittering.
28. What are the different types of styles used while plotting graphs?
29. How to convert a matrix to basic plot? Explain with an example
30. Explain animation with gnuplot.
31. Explain plotting of graph with gnuplot.
Unit 3
1. How to define a function pointer? What is the use of function pointer?
2. Explain the use of typedef.
3. Explain linked lists. Also list the functions from GLib used to implement linked lists.
4. Explain binary trees. Also list the functions from GLib used to implement binary trees.
5. Explain working with interactive parameters, environment variables & parameter files.
6. How to read parameters from the command line in C programs.
7. Write a note on the getopt function.
8. Write a note on macros in C.
9. Explain the significance of the following tools:
memory debugger, revision control, the profiler
Part II: Statistics
1. Define statistic (in terms of function).
2. What are moments?
3. Explain the term expected value in relation with probability distribution. Also specify its formula in
case of discrete & continuous probability distribution.
4. Define standard deviation & variance. Also, if E[x] denotes the expectation of variable x, then show that
var(x) = E[x^2] - (E[x])^2.
5. Derive an expression for mean squared error.
6. Explain within group & among group variance.
7. What is coefficient of determination & what is its significance.
8. What is covariance & give the covariance matrix.

9. What are raw & central moments? How are they obtained?
10. Explain the significance of skewness & kurtosis. Also state the formulae for the calculation of
skewness and kurtosis.
11. List & explain the Apophenia functions along with the syntax to obtain mean, variance, skewness,
kurtosis, covariance & correlation.
12. Explain the apophenia function apop_matrix_summarize.
13. Explain how to obtain quantiles, median & deciles of data.
14. List the Apophenia functions to obtain quantiles, median & deciles of data.
15. List & explain the discrete probability distributions (including the pmf, mean & variance).
16. List & explain the continuous probability distributions (including the pdf & cdf).
17. State Bayes' rule & explain its significance.
18. Explain the method of moving average & also specify the apophenia function to obtain moving
average.
Unit 4
1. Explain the role of PCA in linear projection.
2. Explain linear projection with OLS.
3. Explain GLS & weighted least squares method.
4. Write a note on probit & logit models.
5. State the central limit theorem & explain its applications.
6. Discuss chi square distribution in short.
7. Discuss Student's t distribution in short.
8. Discuss F distribution in short.
9. List the procedure of z test.
10. Explain the application of chi square & F based test.
11. Write a note on Anova methods.
12. Compare anova & regression.
13. What is the purpose of goodness of fit test.
14. Explain goodness of fit test using chi square.
15. Explain goodness of fit test using Kolmogorov's method.

Unit 5
1. Explain the purpose of the log likelihood test. Also state the log likelihood function, score S & information
matrix.
2. State the Cramér-Rao lower bound.
3. State the Neyman-Pearson lemma.
4. Discuss the methods of finding optima. OR
5. Explain the Nelder-Mead Simplex method of finding optima.
6. Explain the conjugate gradient method of finding optima.
7. Explain the root finding method of finding optima.
8. Explain the simulated annealing method of finding optima.
9. Compare global & local optima.
10. Explain the use of apop_estimate_restart function.
11. What are Monte Carlo methods?
12. List the GSL functions to generate random numbers.
13. How to plot a distribution using random numbers. Explain with an example.
14. Explain the use of the apop_histogram function.
15. Write a note on the bootstrapping method of estimating variance & standard error.
16. Write a note on Markov chain Monte Carlo.
17. Explain the testing of bimodality.

Problems:
On distributions: Binomial, Poisson and Normal
P #1. It is found that, in general, 2 out of 5 persons are swimmers.
If 4 persons are selected at random, what is the probability that
(i) two of them are swimmers and
(ii) at least one of them is a swimmer?
P #2. A bomber wants to destroy a bridge. Two bombs are sufficient to destroy
it. If 4 bombs are dropped, what is the probability that it is destroyed, if
the chance of a bomb hitting the target is 0.4?
P #3. If the chance that any of the 5 telephone lines is busy at any instant is 0.1,
find the probability that all the lines are busy. Also find the probability
that not more than 3 lines are busy.
P #4. An industrial establishment is such that 20% of workers are liable to a
disease. Find the probability that out of 6 workers (i) four, (ii) at least four
catch the disease.
P #5. The probability that a seed from a certain lot will germinate is 0.7. If 8
seeds are sown, find the probability that at least 6 of them will germinate.
P #6. The probability that India wins a cricket test match against England is
given to be 1/3. If India and England play 3 test matches, what is the
probability that (i) India will lose all the 3 test matches, (ii) India will
win at least one test match?
P #7. The percentage of defective blades manufactured by a firm is known to be
10%. If a packet of 5 blades produced by this firm is selected at random,
determine the probability that there are exactly 2 defectives in this packet.
P #8. A box contains 100 CDs, 20 of which are defective. 10 are selected for
inspection. Find the probability that (i) all 10 are defective (ii) all 10 are good
(iii) at least one is defective (iv) at most 2 are defective.
P #9. An industrial chemical that will retard the spread of fire in paint has been
developed. The local sales representative has estimated, from past
experience, that 48% of the sales calls result in an order.
a) If eight sales calls are made in a day, what is the probability of receiving
exactly six orders?
b) If four sales calls are made before lunch, what is the probability that one
or fewer results in an order?

P #10. The number of hurricanes hitting the coast of Florida annually follows a Poisson
distribution with a mean of 0.8.
a) what is the probability that more than two hurricanes will hit the Florida
coast in a year?
b) what is the probability that exactly one hurricane will hit the coast of Florida
in a year?
P #11. Arrivals at a bank teller's drive-through window are Poisson distributed at the
rate of 1.2 per minute. What is the probability of:
(a) zero arrivals in the next minute (b) zero arrivals in the next two minutes?
P #12. A computer repair person is beeped each time there is a call for service.
The number of beeps per hour is known to occur in accordance with a
Poisson distribution with a mean of two per hour. Find the probability of three
beeps in the next hour. (Given e^-2 = 0.1353)
P #13. A local electrical appliances shop has found from experience that the demand
for tube lights is distributed as Poisson with a mean of four tubes per week. If
the shop keeps six tubes during a particular week, what is the probability that the
demand will exceed the supply during that week? (Given e^-4 = 0.0183)
P #14. If 2% of electric bulbs manufactured by a certain company are defective, find
the probability that in a sample of 200 bulbs
(i) less than two bulbs
(ii) more than three bulbs are defective.
P #15. It is known from past experience that in a certain plant there are on
average four industrial accidents per month. Find the probability that in a
given month there will be less than four accidents.
P #16. If X is normally distributed with mean 100 and s.d. 10 , find
(i) P(90 < X < 105 ) (ii) P(X < 120 ) (iii) P( X > 85) (iv) P(105 < X < 115)
P #17. If the heights of 1000 soldiers in a regiment are normally distributed with a
mean of 172 cm. and s.d. of 5 cm. , how many soldiers have heights greater
than 180 cm?
P #18. The income distribution of a group of 10000 persons was found to be normal
with mean of Rs. 7500 per month and s.d. of Rs. 500 per month. What
percentage of this group had income (i) exceeding Rs. 6680 (ii) not more
than Rs. 7000?
P #19. The linear measurements of the items of a product are approximately
Normally distributed with a mean of 20 cm. and a s.d. of 4 cm. Items which
measure between 18 cm. and 23 cm. are sold at Rs. 5 each and the other
items at Rs. 3 each. Find the total amount collected if in all 10000 items are
sold. How many items must be of measurement 26 cm. or more ?

P #20. IQ scores are normally distributed throughout society , with mean 100 and
s.d. 15. (a) A person with an IQ of 140 or higher is called a genius. What
proportion of society is in genius category ?
(b) What proportion of society will miss the genius category by 5 or less
points ?
(c) Suppose that an IQ of 110 or higher is required to make it through an
accredited college or university. What proportion could be eliminated from
completing a higher education by a low IQ score?
P # 21. A set of examination marks is approximately normally distributed with mean
of 75 and standard deviation of 5. If top 5% students get grade A and the
bottom 25% get grade F , what marks is the lowest A and what marks is the
highest F ?
P #22. The distribution of monthly incomes of a group of 3000 factory workers
follows a normal distribution with mean equal to Rs. 10000 and s.d.
Rs. 2000. Find (i) the percentage of workers having a monthly income of
more than Rs. 12000
(ii) the number of workers having a monthly income of less than Rs. 9000
(iii) the highest monthly income among the lowest paid 100 workers
(iv) the least monthly income among the highest paid 100 workers
Multinomial distribution

The following formula gives the probability of obtaining a specific set of outcomes when there are
three possible outcomes for each event:

$$p = \frac{n!}{n_1!\, n_2!\, n_3!}\; p_1^{n_1}\, p_2^{n_2}\, p_3^{n_3}$$

where
p is the probability,
n is the total number of events
n1 is the number of times Outcome 1 occurs,
n2 is the number of times Outcome 2 occurs,
n3 is the number of times Outcome 3 occurs,
p1 is the probability of Outcome 1
p2 is the probability of Outcome 2, and
p3 is the probability of Outcome 3.

For the chess example,

n = 12 (12 games are played),
n1 = 7 (number won by Player A),
n2 = 2 (number won by Player B),
n3 = 3 (number drawn),
p1 = 0.40 (probability Player A wins),
p2 = 0.35 (probability Player B wins),
p3 = 0.25 (probability of a draw).
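
As a quick arithmetic check of the chess example, the values above can be substituted into the multinomial formula p = n!/(n1! n2! n3!) * p1^n1 * p2^n2 * p3^n3. The sketch below is only an illustration: the function name multinomial_pmf is hypothetical (it is not from GSL or Apophenia), and Python is used here simply as a calculator; the same steps carry over to C.

```python
from math import factorial

def multinomial_pmf(counts, probs):
    """Probability of observing exactly `counts` outcomes, given per-outcome `probs`.

    Hypothetical helper for illustration; works for any number of outcomes k.
    """
    n = sum(counts)                    # total number of events
    coef = factorial(n)
    for c in counts:                   # n! / (n1! n2! ... nk!), exact integer arithmetic
        coef //= factorial(c)
    p = float(coef)
    for c, q in zip(counts, probs):    # multiply by p1^n1 * p2^n2 * ... * pk^nk
        p *= q ** c
    return p

# Chess example: 12 games, A wins 7, B wins 2, 3 are drawn.
chess = multinomial_pmf([7, 2, 3], [0.40, 0.35, 0.25])
print(round(chess, 4))  # prints 0.0248
```

Here the multinomial coefficient is 12!/(7! 2! 3!) = 7920, so the probability of exactly this result is about 0.0248.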

The formula for k outcomes is

$$p = \frac{n!}{n_1!\, n_2! \cdots n_k!}\; p_1^{n_1}\, p_2^{n_2} \cdots p_k^{n_k}$$

Hypergeometric distribution

Problems on Z test
1. A random sample of 100 bundles gives a mean of 8.5 tons and a standard deviation of 4 tons. Can the
sample be regarded as drawn from a population with mean 7 tons? Test this at the 5% level of
significance.
2. The average life of an Indian is 70 years. A random sample of 100 Indians has an average life of
71.8 years with a standard deviation of 7.8 years. Test the hypothesis.

Problems on t test
formula:

Test the hypothesis for the following data:

Question paper pattern for Credit Based Semester and Grading System (CBSGS)
Duration: 2 hrs
Total marks: 60

Question   Instruction                  Sub-questions (6 marks each)   Marks
Q1         Attempt any two questions    a, b, c, d                     12
Q2         Attempt any two questions    a, b, c, d                     12
Q3         Attempt any two questions    a, b, c, d                     12
Q4         Attempt any two questions    a, b, c, d                     12
Q5         Attempt any two questions    a, b, c, d                     12

Each major question covers one unit.

Theory questions can be from all the units. Problems will be based on distributions, Z-test, t-test and
F-test (refer to the question bank for the sample problems).

Reference Books for Theory and Problems:

1. Modeling with Data: Tools and Techniques for Scientific Computing, Ben Klemens,
Princeton University Press
2. Computational Statistics, James E. Gentle, Springer
3. Computational Statistics, Second Edition, Geof H. Givens and Jennifer A. Hoeting, Wiley
Publications
4. Introduction to Mathematical Statistics, Robert Hogg & A. T. Craig, Pearson Education
5. Introduction to the Theory of Statistics, Graybill & Boes, Tata McGraw-Hill
6. Introductory Statistics (Preliminary Edition): A Problem-Solving Approach, Stephen
Kokoska
7. Discrete Event System Simulation, Jerry Banks, John S. Carson II, Barry L. Nelson,
David M. Nicol
