Proposal Markov Chain - Original Selected Proposal For GSoC 2017.

Project Info:
Project Title: Markovchain package
Project short title: This project aims to extend the functionality and
capabilities of the R package ‘markovchain’ to give statisticians a more
functional tool for analysing stochastic processes related to Markov chains
(MCs).
URL of project idea - https://github.com/rstats-gsoc/gsoc2017/wiki/The-markovchain-package
Bio of Student:
I am a computer science student at the Indian Institute of Technology
(Banaras Hindu University), Varanasi, India. I have the R coding experience
needed to build this package. I have previously worked on datasets such as the
Hubway visualization challenge and the MovieLens dataset. I am also currently
building a Shiny web application using the rgbif package (GitHub link below). A
similar visualization application could be part of the proposed package. In
addition to R, I am familiar with C++ programming and the Rcpp package. I
implemented all the assignments and projects in my data structures and
algorithms course in C++, so I have a good command of that language as well. I
am also familiar with git/GitHub (version control).
Academics:
I have taken several MOOCs (Massive Open Online Courses) related to data science
in R, including the Data Science Specialization courses on Coursera. In total, I
have taken three courses in computer programming along with a data structures
course. I am currently attending an Algorithms course and an Artificial
Intelligence course at my college. I have also taken a statistics and
probability course at my institute. As an assignment in one of these courses I
implemented a Hidden Markov Model and used the Viterbi algorithm to perform
part-of-speech tagging, so I am familiar with stochastic processes. Link -
https://github.com/vandit15/AI-lab-codes .
Previous Works References:

1. Data Analysis using R:
● Analysis of Hubway visualization dataset.
● Movielens dataset (Applying recommender system)
● Kaggle Titanic solution in R (random forest algorithm)
● Analysis of iris dataset.
Link – https://github.com/vandit15
2. Built a website for training and placement cell IIT(BHU) using django for
backend, MySQL for database management and materializecss for front-end
designing.
Link - https://github.com/vandit15/IITBHU-TPO-site
3. Built a tank game using pygame and python2.7
Link - https://github.com/vandit15/tank
4. Building a signature verification system using Convolutional Neural
Networks (CNN) and then applying similarity functions to detect forgeries
using KL-KS test (in progress).
Link - https://github.com/DeepVisionaries/Signature-Verification
Contact Information:
Student Name: Vandit Jain
Student postal address: 138-A, R K Puram, near Gauri hospital, Kota, Rajasthan,
India (pin code-324005)
Phone number: (+91) 8764340070, 7233013328
Email: jainvandit15@gmail.com, vandit.jain.cse15@itbhu.ac.in
Student affiliation:
Institution: Indian Institute of Technology (Banaras Hindu University), Varanasi,
India
Program: Bachelor of Technology (B.Tech) in Computer Science and Engineering.
Stage of completion: Part 2 (4th Semester)
Contact to verify:
Dr. Rajeev Shrivastava
Professor, Department of Computer Science and Engineering
Email: rs.cse@iitbhu.ac.in
Schedule Conflicts:
I will not be taking up any internship, part-time job or other work during the
summer of 2017, so I have no conflicts with the GSoC schedule. I am willing to
devote the whole three months to the success of my GSoC project.
Mentors:
Mentor-1 - Sai Bhargav Yalamanchi
Mentor-2 - Giorgio A. Spedicato
I established contact with the mentors after solving the tests for the project. We
have been in contact since then.
Coding plan and Methods:
At its heart, the project is to improve the markovchain package: improve the run
time of current functions and add more functions.
Optimisation of current functions – I would look for places in the current
package where run time can be improved. This requires reviewing the code. R has
packages, microbenchmark among others, that can be used to time code and find
bottlenecks. Looping in R is quite slow. Once the slow parts of the code have
been identified, the task is to speed them up. If I find a slow loop, I would
replace it with the apply family of functions, which can improve the running
time. Functions written in R that are still slow can be rewritten in C++ using
the Rcpp package, which usually improves running time considerably. Fine-tuning
current functions also includes updating the documentation and unit tests to
match the changes made. The package also uses RcppParallel, and I intend to use
it wherever possible.
Continuous-time Markov Chain – A considerable amount of work has already been
done on CTMCs in the package. I will implement more functions, which include:
● Getting the probabilities of states at any given time t. For a CTMC object of
the S4 class already implemented, I would add a function to evaluate P(t).
● Functions to get the generator matrix and to plot the transition diagram.
● Function for the expected hitting time (from some state j to i).
● Implementing imprecise CTMCs, using ideas from
https://arxiv.org/pdf/1611.05796.pdf https://arxiv.org/pdf/1702.07150.pdf .
Implementation of basic infrastructure and methods – algorithms for
computing lower expectations of functions that depend on the state at any
finite number of time points.
These functions would be written to work with the currently implemented code.
Studying imprecise CTMCs and then implementing these functions would take a
considerable share of the project time.
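To illustrate the P(t) item in the list above, here is a minimal sketch in Python rather than the package's R/Rcpp code. It assumes the generator matrix Q is given as a nested list whose rows sum to zero, and it evaluates P(t) = exp(Qt) by uniformization, which is one standard method; the proposal itself does not fix a method. The names transition_matrix and mat_mul are hypothetical and are not part of the markovchain package.

```python
import math


def mat_mul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transition_matrix(generator, t, tol=1e-12, max_terms=10000):
    """Approximate P(t) = exp(Q t) for a CTMC generator Q by uniformization."""
    n = len(generator)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    rate = max(-generator[i][i] for i in range(n))  # uniformization rate
    if rate == 0.0 or t == 0.0:
        return identity
    # Transition matrix of the uniformized discrete-time chain: I + Q / rate.
    jump = [[identity[i][j] + generator[i][j] / rate for j in range(n)]
            for i in range(n)]
    weight = math.exp(-rate * t)  # Poisson(rate * t) probability of 0 jumps
    power = identity              # jump ** k, starting with k = 0
    result = [[weight * x for x in row] for row in power]
    covered = weight              # Poisson mass accounted for so far
    k = 0
    while 1.0 - covered > tol and k < max_terms:
        k += 1
        weight *= rate * t / k
        power = mat_mul(power, jump)
        for i in range(n):
            for j in range(n):
                result[i][j] += weight * power[i][j]
        covered += weight
    return result


# Two-state example: leave state 0 at rate 2, leave state 1 at rate 1.
Q = [[-2.0, 2.0], [1.0, -1.0]]
P_half = transition_matrix(Q, 0.5)
```

For this two-state chain the result can be checked against the closed form P_00(t) = 1/3 + (2/3) exp(-3t). The sketch is meant for moderate values of rate * t; for very large values exp(-rate * t) underflows and a scaled variant would be needed.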
Higher Order Multivariate MCs – A considerable amount of time was spent on
implementing HOMMCs in the previous year of GSoC (2016). Continuing that
work, I will write functions to generate a random sequence from a chain object
and initial conditions. This adds more features to the present work. I will
follow the methods already used there: the S4 classes for HOMMCs are maintained
in R and the fitting functions are implemented in Rcpp. The package's main
vignette will be updated as well.
Building important graphics features – The graphics-related functions in the
package are currently minimal. For visualising large and complex Markov chains,
a plotting feature would let users view the transition probability diagram for
communicating classes. R packages like ggplot2 and plotly would be helpful for
this task, as would my experience with visualisation in R.
Joint Distributions of the number of visits for Finite-State MCs – This function,
when implemented, is expected to return the joint probability distribution of the
number of visits to the various states of the DTMC during the first N steps or
before the Nth visit.
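One way to compute the joint distribution described above is a forward recursion over pairs (current state, visit counts). The Python sketch below is only an illustration under stated assumptions: visits are counted at steps 1 to N (the starting position at time 0 is not counted), the transition matrix is a nested list of exact fractions, and the function name visit_count_distribution is hypothetical. The proposal does not say which convention or algorithm the package would adopt.

```python
from collections import defaultdict
from fractions import Fraction


def visit_count_distribution(p, start, n_steps):
    """Joint pmf of the number of visits to each state during steps 1..n_steps."""
    n = len(p)
    # Maps (current state, tuple of visit counts) to its probability.
    dist = {(start, (0,) * n): Fraction(1)}
    for _ in range(n_steps):
        nxt = defaultdict(Fraction)
        for (state, counts), prob in dist.items():
            for j in range(n):
                if p[state][j] == 0:
                    continue
                bumped = counts[:j] + (counts[j] + 1,) + counts[j + 1:]
                nxt[(j, bumped)] += prob * p[state][j]
        dist = nxt
    # Marginalise out the current state to keep only the visit counts.
    joint = defaultdict(Fraction)
    for (_, counts), prob in dist.items():
        joint[counts] += prob
    return dict(joint)


# Two-state example: from state 0 move to either state with probability 1/2,
# from state 1 always return to state 0.
half = Fraction(1, 2)
P = [[half, half], [Fraction(1), Fraction(0)]]
joint_two_steps = visit_count_distribution(P, start=0, n_steps=2)
```

The number of (state, counts) pairs grows quickly with N and the number of states, so this direct recursion is only practical for small chains.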
Stability tests for a Markov Chain – The idea presented in this paper
(https://arxiv.org/pdf/1608.03257.pdf) about the stability of Markov chains is
worth implementing. The paper describes a simulated annealing based approach,
which can be added as a feature. For example, in queueing applications the
stability of the Markov chain offers the guarantee that service has been
sufficiently provisioned to cope with the load imposed on the network in the
long run.
Markovchain Statistics - Currently only the computation of the first passage time
has been implemented. I will spend time building two functions. The first
extends the first passage time pdf computation to a set of states A and adds the
expected first passage time. The second takes two disjoint sets A and B and an
initial state i and returns the probability that A is hit before B. Each of
these quantities can be obtained by solving a set of equations of similar form
but with varying initial conditions, taking the ‘minimal’ solution. The functions
would be implemented using the ideas given in
(http://www2.math.uu.se/~takis/L/McRw/mcrw.pdf). Proper unit testing and
documentation using roxygen2 would be an important part of this work.
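The two quantities above can be sketched as small linear systems. The Python code below is an illustration, not the planned R/Rcpp implementation. It assumes that every state outside the target sets can reach them, so that each system has a unique solution and the question of choosing the minimal solution does not arise. Hitting times are counted from time 0, so a chain started inside the set has hitting time 0. The names solve, restricted_system, hit_before and expected_hitting_time are hypothetical.

```python
from fractions import Fraction


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination in exact arithmetic."""
    n = len(a)
    m = [[Fraction(x) for x in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Raises StopIteration if the system is singular.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [x / scale for x in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [row[n] for row in m]


def restricted_system(p, free):
    """Left-hand side (I - P) restricted to the states listed in `free`."""
    return [[(1 if i == j else 0) - p[i][j] for j in free] for i in free]


def hit_before(p, a, b):
    """h[i] = probability that the chain started at i reaches set a before set b."""
    free = [i for i in range(len(p)) if i not in a and i not in b]
    rhs = [sum(p[i][j] for j in a) for i in free]
    solved = dict(zip(free, solve(restricted_system(p, free), rhs)))
    return [Fraction(1) if i in a else Fraction(0) if i in b else solved[i]
            for i in range(len(p))]


def expected_hitting_time(p, a):
    """k[i] = expected number of steps until the chain started at i enters a."""
    free = [i for i in range(len(p)) if i not in a]
    solved = dict(zip(free, solve(restricted_system(p, free), [1] * len(free))))
    return [solved.get(i, Fraction(0)) for i in range(len(p))]


# Symmetric gambler's ruin on states 0..4; states 0 and 4 are absorbing.
half = Fraction(1, 2)
P = [[1, 0, 0, 0, 0],
     [half, 0, half, 0, 0],
     [0, half, 0, half, 0],
     [0, 0, half, 0, half],
     [0, 0, 0, 0, 1]]
print(hit_before(P, {4}, {0}))           # chance of reaching 4 before 0
print(expected_hitting_time(P, {0, 4}))  # mean steps until absorption
```

For this walk the known closed forms are i/4 for the probability of reaching 4 before 0 and i(4 - i) for the expected absorption time from state i, which gives a convenient unit test.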
Computation of Rewards – This is something new to the markovchain package. It
basically means implementing a set of functions that give the expected reward
before a set of states is hit. Also, for Markov chains possessing a positive
recurrent state, given a bounded reward function, the time-average reward can
be easily computed.
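Both reward quantities mentioned above can be sketched with simple iterations. The Python code below is illustrative only and rests on stated assumptions: rewards are non-negative and are collected at every state visited before the target set is entered (including the starting state), and for the time average the chain is finite, irreducible and aperiodic, so that power iteration converges to its stationary distribution. The function names expected_reward_before_hit and time_average_reward are hypothetical.

```python
def expected_reward_before_hit(p, reward, target, tol=1e-12, max_iter=100000):
    """Expected total reward collected before the chain first enters `target`.

    Iterates w <- reward + P w from w = 0, with w fixed at 0 on `target`;
    for non-negative rewards the iterates increase towards the minimal solution.
    """
    n = len(p)
    w = [0.0] * n
    for _ in range(max_iter):
        new = [0.0 if i in target
               else reward[i] + sum(p[i][j] * w[j] for j in range(n))
               for i in range(n)]
        if max(abs(x - y) for x, y in zip(new, w)) < tol:
            return new
        w = new
    return w


def time_average_reward(p, reward, tol=1e-14, max_iter=100000):
    """Long-run average reward sum_i pi[i] * reward[i], pi by power iteration."""
    n = len(p)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        new = [sum(pi[i] * p[i][j] for i in range(n)) for j in range(n)]
        converged = max(abs(x - y) for x, y in zip(new, pi)) < tol
        pi = new
        if converged:
            break
    return sum(pi[i] * reward[i] for i in range(n))


# Symmetric random walk on 0..4 absorbed at 0 and 4, with reward i in state i.
walk = [[1.0, 0.0, 0.0, 0.0, 0.0],
        [0.5, 0.0, 0.5, 0.0, 0.0],
        [0.0, 0.5, 0.0, 0.5, 0.0],
        [0.0, 0.0, 0.5, 0.0, 0.5],
        [0.0, 0.0, 0.0, 0.0, 1.0]]
rewards_before_absorption = expected_reward_before_hit(walk, [0, 1, 2, 3, 4], {0, 4})

# Two-state ergodic chain with rewards 1 and 7.
average = time_average_reward([[0.9, 0.1], [0.5, 0.5]], [1.0, 7.0])
```

Setting every reward to 1 in expected_reward_before_hit recovers the expected hitting time of the target set, which links this section to the statistics functions proposed earlier.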
Timeline:
According to the coding plan, the timeline is set so that substantial
deliverables are implemented by both mid-term evaluations (June 30th, July 28th)
and the final evaluation (29th August).
Pre community Bonding Period (April 3rd – May 4th) - I would spend this period
improving my knowledge of Markov chains from several sources, one of them
definitely being Dobrow, Introduction to Stochastic Processes with R. I have taken
a basic course in statistics and probability and have also implemented a hidden
Markov model as an assignment in the Artificial Intelligence course, which would
help. I would also brush up my R skills, especially Rcpp.
Community Bonding Period (May 5th – May 29th) – This period is important, as it
would be spent discussing the structure of the proposed functions for the
project. I would also go through the whole package, as so far I have read only
a few of its functions (while solving the tests). I would at least write pseudo
code or a summary for some functions (after studying the papers referred to)
and start implementing them if time permits. I also intend to do
optimisation-related work in this period.
Coding Period -
Continuing from the work done in the Community Bonding Period, the coding period
would be divided as follows:
30th May - 4th June – Complete optimization related work carrying on from the
community bonding period.
5th June - 12th June – Discuss with mentors and write pseudo code for functions
related to CTMCs. For P(t), read page 301 of the book.
13th June – 25th June – Implement the pseudo code decided.
26th June - 28th June – Build unit tests for the functions implemented above.
29th June - Write documentation for the new functions.
30th June - This marks the deadline of the 1st evaluation.
1st July - 4th July – Discuss with mentors and write pseudo code for implementing
stability tests.
5th July - 12th July – Implement the functions discussed in the previous week.
13th July - 16th July - Discuss with mentors and write pseudo code for functions
related to markovchain statistics.
17th July - 24th July – Implement the functions discussed in the previous week.
25th July - 26th July – Unit testing of functions implemented after first evaluation.
27th July – Write documentation for the implemented functions.
28th July – This marks the deadline of the 2nd evaluation.
29th July - 30th July - Discuss with mentors and write pseudo code for
improvement in graphics for the package.
31st July - 4th August – Implement the graphics functions for the package and
update documentation.
5th August - 9th August - Discuss with mentors and write pseudo code for
proposed functions related to HOMMCs.
10th August - 13th August - Implement the discussed functions.
14th August - 15th August – Unit testing and documentation for modified and newly
implemented functions.
16th August - 20th August – Implement miscellaneous functions related to the
computation of rewards and the number of visits for finite-state MCs.
21st August - 28th August – Revising all modifications, updating documentation,
unit-testing and bug-fixing.
29th August - Final evaluation of the project.
Management of the coding project:

● I would try to maintain close contact with the mentors discussing ongoing
work and future steps.
● I will maintain unit tests in order to check that package functionalities
are preserved.
● Documentation and vignette building would be kept up to date as a
background process.
● Code would be hosted on github from the beginning of the project.
Tests:
For the markovchain package, I had to submit pull requests for issues #106 and
#115.
Issue #106 pertains to NA handling for the functions markovchainFit and
markovchainListFit.
Issue #115 pertains to round-off error in the steadyStates function. This is the
link to my fork:
https://github.com/vandit15/markovchain
