Presentation
Nam Le
Email: namlehai90@gmail.com
Short Resume
2015, MSc (by research) in Computer Science, Le Quy Don Technical University
Research interests:
+ Natural Computing theory and applications:
- Genetic Programming, Genetic Algorithms, Simulated Annealing
+ Computational Biology:
Applying natural computing to solve NP-hard problems in biology:
- Gene mapping: recovering gene markers after cutting DNA into segments
(Double, Partial, and Simplified Partial Digest problems).
- (In addition) Phylogenetic tree reconstruction.
Research experience:
+ Network Security Lab, Le Quy Don Technical University: 2 small projects
- Adaptive operators for genetic programming
- Genetic programming for network intrusion detection
+ Hanu R&D (Professor Hoai is the director), 2 prospective research projects:
- Robustness of GP to noise
- Stochastic fitness in GP
+ Self-study: Physical mapping
Nam Le 09 November, 2015
Bioinformatics research:
(Independently)
1. Heuristics for Physical mapping
Implement genetic algorithms for the Simplified
Partial Digest Problem.
vs
1.
Double digest
The decision problem of the DDP is NP-complete.
All algorithms struggle once there are more than 10
restriction sites for each enzyme.
A solution may not be unique, and the number of solutions
grows exponentially.
DDP is a favorite mapping method since the experiments
are easy to conduct.
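To make the problem statement concrete, here is a minimal exhaustive-search sketch in Python. It is only an illustration of the DDP itself, not the genetic algorithm discussed in these slides. The names `cut_sites` and `double_digest` are hypothetical, and the inputs `a`, `b`, `c` are assumed to be the fragment lengths from the first enzyme alone, the second enzyme alone, and both enzymes together.

```python
from collections import Counter
from itertools import accumulate, permutations


def cut_sites(fragments):
    """Cut positions implied by an ordered list of fragment lengths."""
    return list(accumulate(fragments))[:-1]  # the last value is the molecule end


def double_digest(a, b, c):
    """Yield every (ordering of a, ordering of b) whose merged cuts reproduce c.

    Exhaustive search over all orderings: factorial time, so it is only
    usable on tiny instances.
    """
    if sum(a) != sum(b) or sum(a) != sum(c):
        return  # the three digests must describe the same molecule length
    total = sum(a)
    target = Counter(c)
    for order_a in set(permutations(a)):
        sites_a = set(cut_sites(order_a))
        for order_b in set(permutations(b)):
            sites = sorted(sites_a | set(cut_sites(order_b)))
            bounds = [0] + sites + [total]
            fragments = [hi - lo for lo, hi in zip(bounds, bounds[1:])]
            if Counter(fragments) == target:
                yield order_a, order_b


# Toy instance: molecule of length 10, enzyme A cuts at 2 and 7, enzyme B at 4.
solutions = set(double_digest([2, 5, 3], [4, 6], [2, 2, 3, 3]))
print(solutions)
```

Even this toy instance returns two orderings (the map and its mirror image), which illustrates why a solution may not be unique.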
We assume that the multiplicity of a fragment can be detected, i.e., the
number of restriction fragments of the same length can be determined (e.g., by
observing twice as much fluorescence intensity for a double fragment as for a
single fragment).
n:
X:
PDP analysis
No polynomial-time algorithm is known for SPDP.
Let Γ = {γ_1, ..., γ_2N} be the multi-set of all
fragment lengths obtained by the short
experiment, and
let Λ = {λ_1, ..., λ_(N+1)} be the multi-set of all
fragment lengths obtained by the long
experiment,
where N is the number of restriction sites in S.
Here is an example: Given these (unknown)
restriction sites (in kb): 2 8 9 13 16
We obtain Λ = {2kb, 6kb, 1kb, 4kb, 3kb}.
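The example can be reproduced with a short Python sketch. It rests on assumptions that the slide does not state explicitly: the last coordinate, 16, is taken to be the end of the molecule (so there are N = 4 interior restriction sites); the long experiment is taken to cut at every site; and the short experiment is taken to cut each copy of the molecule at exactly one site. The function names are hypothetical.

```python
def long_digest(sites, length):
    """Fragment lengths when the molecule is cut at every site (N + 1 fragments)."""
    bounds = [0] + sorted(sites) + [length]
    return [hi - lo for lo, hi in zip(bounds, bounds[1:])]


def short_digest(sites, length):
    """Fragment lengths when each copy is cut at exactly one site (2N fragments)."""
    fragments = []
    for site in sorted(sites):
        fragments += [site, length - site]  # the two pieces around this single cut
    return fragments


# Assumption: sites at 2, 8, 9, 13 kb on a 16 kb molecule.
sites, length = [2, 8, 9, 13], 16
print(long_digest(sites, length))   # lengths between consecutive cuts
print(short_digest(sites, length))  # complementary pairs that each sum to 16
```

SPDP asks for the reverse direction: recovering the site positions from these two multi-sets alone.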
Topic description
Statement of problem
Objectives
Research question
Research Method
Results
The Problem
Real-world data always has noise
Noisy data is one of the main causes of overfitting in any learning mechanism
Noisy data prevents EAs in general from
converging well to the optimal point.
Objectives
Main objective:
To visualize the robustness of GP and the impact of
noise on over-fitting property of GP.
Sub-objectives:
1. To build and contribute standard noisy data sets to
GP benchmark problems.
2. To classify the hardness / difficulty level of those
problems based on the robustness of GP to noise.
Research questions
1. Analyse and propose an effective noise-generating model.
2. How is the effectiveness of the GP learning model
affected by different types of noise and
noise levels? And which noise level causes GP
to over-fit?
3. Which types of problems is GP good at?
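One possible noise-generating model for question 1 is sketched below in Python. It is an assumption for illustration, not the model proposed in this work: zero-mean Gaussian noise is added to the target values, with a standard deviation equal to a chosen noise level times the spread of the clean targets. The function name, the `level` parameter, and the quartic benchmark used here are all hypothetical choices.

```python
import random
import statistics


def add_gaussian_noise(targets, level, seed=0):
    """Return targets perturbed by zero-mean Gaussian noise.

    level: noise standard deviation as a fraction of the spread
           (population standard deviation) of the clean targets.
    """
    rng = random.Random(seed)  # fixed seed keeps the data set reproducible
    sigma = level * statistics.pstdev(targets)
    return [y + rng.gauss(0.0, sigma) for y in targets]


# Assumed benchmark: a quartic polynomial sampled on [-1, 1].
xs = [i / 10 for i in range(-10, 11)]
clean = [x**4 + x**3 + x**2 + x for x in xs]

# One noisy copy of the training targets per noise level.
noisy_sets = {level: add_gaussian_noise(clean, level) for level in (0.0, 0.05, 0.1, 0.2)}
```

Training GP on each noisy set while testing on the clean targets is one way to see at which noise level the test error starts to diverge from the training error.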
Question 3:
Experimenting on BVGP (Bias / variance GP)
Rank problems (Benchmarks and Real-UCI) based on the
difficulty for GP to solve
Results
Question 1:
Distribution is invariant after adding noise.
Generating noisy data for experiment
Question 2:
Collect results from experiment.
Effect of noise
Over-fitted problems
Over-fitted problems
Topic description
Statement of problem
Objectives
Research question
Research Method
Results
The Problem
Real-world data always has noise =>
uncertainty
Fitness functions in EAs are always accompanied by
noise
A noisy fitness function can mistakenly assign a
high fitness to a poor individual, and
conversely.
+ f(x1) > f(x2) does not mean trueF(x1) >
trueF(x2) in a noisy environment.
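The misranking effect can be illustrated with a small Python simulation. It is a sketch under assumptions, not the Stochastic GP method studied here: fitness is taken to be the true value plus zero-mean Gaussian noise, higher fitness is better, and averaging repeated evaluations (a standard resampling idea) is used only as a point of comparison. All names and parameters are hypothetical.

```python
import random


def noisy_fitness(true_value, sigma, rng):
    """Observed fitness: the true fitness plus zero-mean Gaussian noise."""
    return true_value + rng.gauss(0.0, sigma)


def misranking_rate(true_a, true_b, sigma, samples=1, trials=10000, seed=0):
    """Fraction of trials in which the truly better individual (a) does not win.

    Assumes true_a > true_b. Each individual's observed fitness is the mean
    of `samples` independent noisy evaluations.
    """
    rng = random.Random(seed)
    wrong = 0
    for _ in range(trials):
        f_a = sum(noisy_fitness(true_a, sigma, rng) for _ in range(samples)) / samples
        f_b = sum(noisy_fitness(true_b, sigma, rng) for _ in range(samples)) / samples
        if f_a <= f_b:
            wrong += 1
    return wrong / trials


no_noise = misranking_rate(1.0, 0.9, sigma=0.0)
single = misranking_rate(1.0, 0.9, sigma=0.5, samples=1)
averaged = misranking_rate(1.0, 0.9, sigma=0.5, samples=25)
print(no_noise, single, averaged)
```

With no noise the better individual always wins. With noise a single evaluation often ranks the pair wrongly, and averaging more evaluations lowers that rate at the cost of extra fitness evaluations.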
Objectives
Main objective:
To insert uncertainty into stochastic selection in GP.
Sub-objectives:
1. To figure out whether or not Stochastic GP can mitigate
the negative effects caused by noise in the 1st phase (over-fitting).
Results
Question 1:
Distribution is invariant after adding noise.
Generating noisy data for experiment
Question 2:
Collect results from experiment.
Effect of noise
Over-fitted problems