Clustering
[Figure: taxonomy of clustering methods. Labels: Hierarchical, Partitional, K-means, Biclustering algorithms]
Biclustering
A biclustering method is an unsupervised learning method
which looks for sub-matrices in a data matrix with a high
similarity of elements.
It clusters the rows and columns of a data matrix simultaneously.
Many algorithms exist: statistics-based, AI, and machine learning.
Software:
Biclust
BiclustGUI
FAST
Setia Pramana 13
Clustering vs. Biclustering
Biclustering - Identifies groups of genes with similar/coherent
expression patterns under a specific subset of the conditions.
Clustering - Identifies groups of genes/conditions that show similar
activity
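To make the contrast concrete, here is a minimal sketch on a hand-made toy matrix (rows as genes, columns as conditions). The values, the chosen index sets, and the use of mean pairwise Pearson correlation as the "similarity" measure are all assumptions for illustration, not part of the original slides. It suggests how a group of genes can look unrelated across all conditions yet be perfectly coherent on a subset of them.

```python
import numpy as np

# Toy expression matrix (hypothetical values): rows = genes, columns = conditions.
data = np.array([
    [1.0, 2.0, 3.0, 9.0, 1.0, 5.0],
    [2.0, 3.0, 4.0, 1.0, 8.0, 2.0],
    [5.0, 6.0, 7.0, 4.0, 4.0, 9.0],
    [7.0, 1.0, 8.0, 2.0, 6.0, 3.0],
])

genes = [0, 1, 2]   # assumed bicluster rows
conds = [0, 1, 2]   # assumed bicluster columns


def mean_pairwise_corr(m):
    """Average Pearson correlation over all distinct pairs of rows of m."""
    c = np.corrcoef(m)
    upper = np.triu_indices_from(c, k=1)
    return float(c[upper].mean())


# Clustering view: compare the genes across ALL conditions.
full_corr = mean_pairwise_corr(data[genes])

# Biclustering view: compare the same genes on a SUBSET of conditions.
sub_corr = mean_pairwise_corr(data[np.ix_(genes, conds)])

print(f"all conditions: {full_corr:.2f}, subset of conditions: {sub_corr:.2f}")
```

On this toy matrix the three genes follow the same pattern on the first three conditions only, so the subset correlation is far higher than the correlation computed over every condition.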
Different Types of Biclusters
Bicluster Structure
Why Biclustering?
Biclustering is the key technique to use when
Only a small set of the genes participates in a cellular process of interest.
An interesting cellular process is active only in a subset of the conditions.
A single gene may participate in multiple pathways that may or may not be
coactive under all conditions.
Biclustering Algorithms
Based on evaluation measures
Non metric-based biclustering
Biclustering: Cheng and Church approach
Cheng and Church approach:
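Find biclusters with mean squared residue (H) < h, where
H(I, J) = (1 / (|I| |J|)) * sum over i in I, j in J of (b_ij - b_iJ - b_Ij + b_IJ)^2
b_ij, b_iJ, b_Ij, and b_IJ represent the element in the ith row (condition)
and jth column, the row mean, the column mean, and the mean of the whole bicluster.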
Remove the row/col that reduces H the most
Add rows/cols that do not increase H
Stop when H < h
Mask bicluster with random values
Repeat to find next bicluster
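The steps above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the implementation used by Biclust or the original Cheng and Church code: the function names (`msr`, `find_bicluster`, `find_biclusters`), the `min_size` guard, the brute-force search over candidate removals, and the toy matrix are all hypothetical choices made for readability.

```python
import numpy as np


def msr(data, rows, cols):
    """Mean squared residue H of the sub-matrix given by rows and cols."""
    b = data[np.ix_(rows, cols)]
    residue = b - b.mean(axis=1, keepdims=True) - b.mean(axis=0, keepdims=True) + b.mean()
    return float((residue ** 2).mean())


def find_bicluster(data, h, min_size=2):
    """Greedy deletion, then addition; returns sorted row and column indices."""
    rows = list(range(data.shape[0]))
    cols = list(range(data.shape[1]))
    # Deletion: remove the row/column whose removal reduces H the most,
    # and stop as soon as H < h (or nothing more may be removed).
    while msr(data, rows, cols) >= h:
        candidates = []
        if len(rows) > min_size:
            candidates += [(msr(data, [r for r in rows if r != i], cols), 0, i) for i in rows]
        if len(cols) > min_size:
            candidates += [(msr(data, rows, [c for c in cols if c != j]), 1, j) for j in cols]
        if not candidates:
            break
        _, axis, idx = min(candidates)
        (rows if axis == 0 else cols).remove(idx)
    # Addition: put back rows/columns that do not increase H.
    for i in range(data.shape[0]):
        if i not in rows and msr(data, rows + [i], cols) <= msr(data, rows, cols):
            rows.append(i)
    for j in range(data.shape[1]):
        if j not in cols and msr(data, rows, cols + [j]) <= msr(data, rows, cols):
            cols.append(j)
    return sorted(rows), sorted(cols)


def find_biclusters(data, h, n, seed=0):
    """Find n biclusters, masking each one with random values before the next search."""
    rng = np.random.default_rng(seed)
    work = np.array(data, dtype=float)
    found = []
    for _ in range(n):
        rows, cols = find_bicluster(work, h)
        found.append((rows, cols))
        work[np.ix_(rows, cols)] = rng.uniform(work.min(), work.max(), (len(rows), len(cols)))
    return found


# Toy matrix (hypothetical): the first three rows are additive, the last is not.
toy = np.array([[1, 2, 3], [2, 3, 4], [4, 5, 6], [9, -7, 15]], dtype=float)
print(find_bicluster(toy, h=0.1))
```

Trying every possible removal is only practical for small matrices; the published algorithm relies on cheaper row and column scores to choose what to delete, which this sketch does not attempt to reproduce.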
Software
Biclust R
Software
BiclustGUI
Biclust Shiny (https://uhasselt.shinyapps.io/shiny-biclust/)
Software
FAST
Forum Analisis Statistik
STIS-BPS
Applications: Biclustering in Bioinformatics
Genes not regulated under all conditions
Genes regulated by multiple factors/processes concurrently
Key to determine function of genes
Key to determine classification of conditions
Applications
Goals
Cluster the districts in Java based on the education indicators
Study the spatial effect
Investigate cluster-specific indicators
Data
Quality: Illiteracy, Expected Years of Schooling, and Mean Years of Schooling
Participation: Net Enrollment Rate and Dropout Rate
Facilities: Student-Teacher Ratio and Student-School Ratio
118 districts in Java
Sources: Susenas 2014; Region in Figure, 2014.
Education Indicators
Net Enrollment Rate, Primary School (PS)
Net Enrollment Rate, Junior High School (JHS)
Expected Years of Schooling
Average Years of Schooling
Adult Literacy Rate
Dropout Rate in PS
Dropout Rate in JHS
Ratio of Students per School, PS
Ratio of Students per School, JHS
Ratio of Students per Teacher, PS
Ratio of Students per Teacher, JHS
Approaches
Fuzzy C-Means
Fuzzy Geographically Weighted Clustering
Biclustering: Cheng and Church Method
Biclustering CC
Results: CC Biclustering
Discussion
Much more to be explored
Broad Applications
Ongoing: Social vulnerability Index
Many things to develop:
Techniques for comparing the performance of biclustering algorithms