
Biclustering

Overview and Applications


Setia Pramana
Clustering
Clustering: the process of grouping a set of objects into classes of
similar objects
Most common form of unsupervised learning
Unsupervised learning = learning from raw data, as opposed to
supervised learning, where the class of each training example is given
Clustering vs. Class prediction
Class prediction:
A learning set of objects with known classes
Goal: put new objects into existing classes
Also called: Supervised learning, or classification
Clustering:
No learning set, no given classes
Goal: discover the best classes or groupings
Also called: Unsupervised learning, or class discovery
Clustering Techniques

Clustering
Hierarchical: Single Link, Complete Link, Average Link, Ward's
Partitional: Square Error (K-means), Mixture Maximization (Expectation Maximization)

Data Mining and Knowledge Discovery
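The hierarchical linkage rules listed above differ only in how they measure the distance between two clusters. A minimal plain-Python sketch, assuming Euclidean distance between points; the function names and the two toy clusters are illustrative:

```python
import math
from itertools import product

def euclidean(p, q):
    """Euclidean distance between two points given as sequences of numbers."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def single_link(A, B):
    """Distance between the closest pair of points, one taken from each cluster."""
    return min(euclidean(a, b) for a, b in product(A, B))

def complete_link(A, B):
    """Distance between the farthest pair of points, one taken from each cluster."""
    return max(euclidean(a, b) for a, b in product(A, B))

def average_link(A, B):
    """Mean distance over all cross-cluster pairs of points."""
    return sum(euclidean(a, b) for a, b in product(A, B)) / (len(A) * len(B))

A = [(0.0, 0.0), (0.0, 1.0)]
B = [(3.0, 0.0), (4.0, 0.0)]
print(single_link(A, B))    # 3.0
print(complete_link(A, B))  # sqrt(17), about 4.123
print(average_link(A, B))
```

Ward's method instead merges the pair of clusters whose union gives the smallest increase in the total within-cluster sum of squares.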


Issues in clustering
Used to explore and visualize data, with few preconceptions
Many subjective choices must be made, so a clustering output tends
to be subjective
It is difficult to get truly statistically significant conclusions
Algorithms will always produce clusters, whether any exist in the data
or not
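The last point is easy to see in practice. The sketch below runs a plain-Python version of Lloyd's k-means algorithm on uniform random points, which have no cluster structure at all, and it still returns k groups. The `kmeans` function, its parameters and the data are illustrative assumptions:

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen data points
    labels = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))
            for x in points
        ]
        # Update step: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [x for x, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids

# 200 points drawn uniformly from the unit square: no real cluster structure.
rng = random.Random(42)
noise = [(rng.random(), rng.random()) for _ in range(200)]
labels, centroids = kmeans(noise, k=4)
print(len(centroids), "clusters reported for structureless data")
```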
Clustering
Cluster the rows: K-means, hierarchical clustering
Cluster the columns: K-means, hierarchical clustering
Cluster both simultaneously: biclustering algorithms

Biclustering
A biclustering method is an unsupervised learning method
which looks for sub-matrices in a data matrix with a high
similarity of elements.
Simultaneous clustering of the rows and columns of a data
matrix
Many algorithms: statistics-based, AI, and machine learning approaches.
Software:
Biclust
BiclustGUI
FAST
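In these terms, a bicluster is a pair (I, J) of row and column index sets, and the bicluster itself is the submatrix on those rows and columns. A toy sketch in plain Python; the matrix values and the `submatrix` helper are made up for illustration:

```python
def submatrix(matrix, rows, cols):
    """Return the submatrix of `matrix` on the given row and column indices."""
    return [[matrix[i][j] for j in cols] for i in rows]

# Toy 4 x 5 data matrix (rows = genes, columns = conditions); values are made up.
X = [
    [1, 2, 9, 2, 7],
    [3, 4, 1, 4, 5],
    [5, 3, 8, 3, 0],
    [6, 1, 2, 1, 4],
]

# Rows 0 and 2 do not follow a common pattern over all columns, but on
# columns 1 and 3 row 2 is exactly row 0 shifted by +1: a bicluster (I, J).
I, J = [0, 2], [1, 3]
print(submatrix(X, I, J))  # [[2, 2], [3, 3]]
```

Clustering the rows would compare rows 0 and 2 over all five columns, where the shared pattern is diluted by the other three.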
Clustering vs. Biclustering
Biclustering - Identifies groups of genes with similar/coherent
expression patterns under a specific subset of the conditions.
Clustering - Identifies groups of genes (or conditions) that show similar
activity across all conditions (or all genes)
Different Types of Biclusters
Bicluster Structure
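Bicluster types are commonly described by the pattern inside the submatrix: constant values, constant rows, constant columns, additive (shifting) patterns b_ij = mu + a_i + c_j, and multiplicative (scaling) patterns b_ij = a_i * c_j. A short sketch that builds toy examples of each; the function names and values are illustrative assumptions:

```python
def additive(row_effects, col_effects, mu=0):
    """Additive (shifting) bicluster: b_ij = mu + a_i + c_j."""
    return [[mu + a + c for c in col_effects] for a in row_effects]

def multiplicative(row_effects, col_effects):
    """Multiplicative (scaling) bicluster: b_ij = a_i * c_j."""
    return [[a * c for c in col_effects] for a in row_effects]

# Constant, constant-row and constant-column biclusters are special cases of the additive model.
constant = additive([0, 0], [0, 0, 0], mu=5)  # [[5, 5, 5], [5, 5, 5]]
constant_rows = additive([1, 2], [0, 0, 0])   # [[1, 1, 1], [2, 2, 2]]
constant_cols = additive([0, 0], [1, 2, 3])   # [[1, 2, 3], [1, 2, 3]]

print(additive([0, 1, 3], [1, 4, 2]))        # [[1, 4, 2], [2, 5, 3], [4, 7, 5]]
print(multiplicative([1, 2, 3], [1, 4, 2]))  # [[1, 4, 2], [2, 8, 4], [3, 12, 6]]
```

An additive bicluster has a mean squared residue of zero, the score used by the Cheng and Church approach; a multiplicative one generally does not.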
Why Biclustering?
Biclustering is the key technique to use when
Only a small set of the genes participates in a cellular process of interest.
An interesting cellular process is active only in a subset of the conditions.
A single gene may participate in multiple pathways that may or may not be
coactive under all conditions
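The first two points can be illustrated with two hypothetical gene profiles that move together on four of eight conditions and not on the rest. A correlation computed over all conditions, as whole-matrix clustering of genes would use, hides the local pattern. The profiles and the `pearson` helper are made up for illustration:

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

# Hypothetical expression profiles of two genes over 8 conditions.
gene_a = [1, 2, 3, 4, 5, 1, 4, 2]
gene_b = [1, 2, 3, 4, 1, 5, 2, 4]

subset = [0, 1, 2, 3]  # conditions where the shared process is active
print(round(pearson(gene_a, gene_b), 2))  # -0.29 over all conditions
print(pearson([gene_a[j] for j in subset], [gene_b[j] for j in subset]))  # 1.0 on the subset
```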
Biclustering Algorithms
Based on evaluation measures (metric-based)
Non-metric-based biclustering
Biclustering: Cheng and Church approach
Cheng and Church approach:
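Find biclusters with mean squared residue H < h, where

$$H(I,J) = \frac{1}{|I|\,|J|} \sum_{i \in I,\; j \in J} \left( b_{ij} - b_{iJ} - b_{Ij} + b_{IJ} \right)^2$$

$b_{ij}$, $b_{iJ}$, $b_{Ij}$ and $b_{IJ}$ denote the element in row i and column j, the mean of row i, the mean of column j, and the overall mean of the bicluster (I, J).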
Remove the row/column that reduces H the most
Repeat until H < h
Add rows/columns that do not increase H
Mask bicluster with random values
Repeat to find next bicluster
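A minimal plain-Python sketch of these steps, under simplifying assumptions: deletion removes one row or column at a time, choosing by brute force the removal that lowers H the most; addition puts back any row or column that does not raise H; the search never shrinks below 2 rows or 2 columns; masking draws uniform values between the data minimum and maximum. Cheng and Church's original algorithm ranks rows and columns by their mean residue instead of brute force and adds a faster multiple-node deletion step, so treat this as an illustration rather than the Biclust implementation. All function names are hypothetical:

```python
import random

def msr(B, rows, cols):
    """Mean squared residue H(I, J) of the submatrix of B on the given rows and columns."""
    row_mean = {i: sum(B[i][j] for j in cols) / len(cols) for i in rows}
    col_mean = {j: sum(B[i][j] for i in rows) / len(rows) for j in cols}
    all_mean = sum(row_mean.values()) / len(rows)
    total = sum((B[i][j] - row_mean[i] - col_mean[j] + all_mean) ** 2
                for i in rows for j in cols)
    return total / (len(rows) * len(cols))

def find_bicluster(B, h, min_size=2):
    """Greedy search for one bicluster with H < h (simplified deletion, then addition)."""
    rows, cols = list(range(len(B))), list(range(len(B[0])))
    # Deletion: drop the single row or column whose removal lowers H the most.
    while msr(B, rows, cols) >= h and (len(rows) > min_size or len(cols) > min_size):
        candidates = []
        if len(rows) > min_size:
            candidates += [(msr(B, [r for r in rows if r != i], cols), 0, i) for i in rows]
        if len(cols) > min_size:
            candidates += [(msr(B, rows, [c for c in cols if c != j]), 1, j) for j in cols]
        _, axis, idx = min(candidates)
        (rows if axis == 0 else cols).remove(idx)
    # Addition: put back columns, then rows, that do not increase H.
    for j in range(len(B[0])):
        if j not in cols and msr(B, rows, cols + [j]) <= msr(B, rows, cols):
            cols.append(j)
    for i in range(len(B)):
        if i not in rows and msr(B, rows + [i], cols) <= msr(B, rows, cols):
            rows.append(i)
    return sorted(rows), sorted(cols)

def cheng_church(B, h, n_biclusters, seed=0):
    """Find several biclusters, masking each one with random values before the next search."""
    B = [row[:] for row in B]  # work on a copy so the caller's data is untouched
    lo, hi = min(map(min, B)), max(map(max, B))
    rng = random.Random(seed)
    found = []
    for _ in range(n_biclusters):
        rows, cols = find_bicluster(B, h)
        found.append((rows, cols))
        for i in rows:
            for j in cols:
                B[i][j] = rng.uniform(lo, hi)
    return found

# Toy data: rows 0-2 x columns 0-2 form an additive bicluster (H = 0); row 3 is noise.
X = [
    [0, 10, 20],
    [1, 11, 21],
    [2, 12, 22],
    [9, 0, 5],
]
print(msr(X, [0, 1, 2], [0, 1, 2]))  # 0.0
print(find_bicluster(X, h=1.0))      # ([0, 1, 2], [0, 1, 2])
print(cheng_church(X, h=1.0, n_biclusters=2))
```

Brute-force deletion costs one H evaluation per remaining row and column at every step, which is fine for a toy matrix but is exactly what the original mean-residue ranking avoids on real data.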
Software
Biclust (R package)
BiclustGUI
Biclust Shiny (https://uhasselt.shinyapps.io/shiny-biclust/)
FAST (Forum Analisis Statistik, STIS-BPS)
Applications: Biclustering in Bioinformatics
Genes not regulated under all conditions
Genes regulated by multiple factors/processes concurrently
Key to determining the function of genes
Key to determining the classification of conditions
Applications
Goals
Cluster the districts in Java based on the education indicators
Study the spatial effect
Investigate cluster-specific indicators

Data
Quality: Illiteracy, Expected Years of Schooling, and Mean Years of Schooling
Participation: Net Enrollment Rate and Dropout Rate
Facilities: Student-Teacher Ratio and Student-School Ratio
118 districts in Java
Sources: Susenas 2014; Region in Figures, 2014

Education Indicators
Net Enrollment Rate, Primary School (PS)
Net Enrollment Ratio, Junior High School (JHS)
Adult Literacy Rate
Dropout Rate in PS
Dropout Rate in JHS
Expected Years of Schooling
Average Years of Schooling
Ratio of Students per School, PS
Ratio of Students per School, JHS
Ratio of Students per Teacher, PS
Ratio of Students per Teacher, JHS

Approaches
Fuzzy C-Means
Fuzzy Geographically Weighted Clustering
Biclustering: Cheng and Church Method
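Of the three approaches, Fuzzy C-Means is the simplest to sketch: unlike K-means, each object (here, a district) receives a membership degree in every cluster rather than a single label. A minimal plain-Python version of the standard alternating updates, assuming fuzzifier m = 2, Euclidean distance and made-up one-dimensional data; the function name and parameters are illustrative:

```python
import math
import random

def fuzzy_c_means(points, c, m=2.0, iters=100, seed=0):
    """Fuzzy C-Means: alternate center and membership updates for a fixed number of iterations."""
    rng = random.Random(seed)
    n, dims = len(points), len(points[0])
    # Random initial memberships; each point's memberships sum to 1.
    U = []
    for _ in range(n):
        w = [rng.random() for _ in range(c)]
        U.append([x / sum(w) for x in w])
    centers = []
    for _ in range(iters):
        # Centers: means of all points weighted by membership ** m.
        centers = []
        for k in range(c):
            wk = [U[i][k] ** m for i in range(n)]
            centers.append(tuple(
                sum(wi * p[d] for wi, p in zip(wk, points)) / sum(wk) for d in range(dims)))
        # Memberships: u_ik = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1)).
        for i, p in enumerate(points):
            dist = [max(math.dist(p, v), 1e-12) for v in centers]  # avoid division by zero
            U[i] = [1.0 / sum((dist[k] / dist[j]) ** (2 / (m - 1)) for j in range(c))
                    for k in range(c)]
    return U, centers

points = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
U, centers = fuzzy_c_means(points, c=2)
for p, u in zip(points, U):
    print(p, [round(x, 3) for x in u])
```

The fuzzifier m controls how soft the memberships are: values close to 1 push them toward hard 0/1 labels, larger values spread them more evenly.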

Biclustering CC

Results: CC Biclustering

Discussion
Much more to be explored
Broad applications
Ongoing: Social Vulnerability Index
Many things to develop:
techniques for comparing the performance of biclustering algorithms
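One simple way to compare biclustering results against a known answer (for example, biclusters planted in simulated data) is to treat each bicluster as the set of matrix cells it covers and score overlaps with the Jaccard index. A minimal sketch; the scoring rule and function names are illustrative assumptions:

```python
def cells(bicluster):
    """Set of (row, column) cells covered by a bicluster given as (rows, cols)."""
    rows, cols = bicluster
    return {(i, j) for i in rows for j in cols}

def jaccard(b1, b2):
    """Jaccard index |A & B| / |A | B| of the cell sets of two biclusters."""
    a, b = cells(b1), cells(b2)
    return len(a & b) / len(a | b)

def match_score(found, reference):
    """Mean, over the found biclusters, of the best Jaccard overlap with any reference bicluster."""
    return sum(max(jaccard(f, r) for r in reference) for f in found) / len(found)

truth = [([0, 1, 2], [0, 1, 2])]                 # e.g., a planted bicluster in simulated data
found = [([0, 1], [0, 1, 2]), ([3, 4], [3, 4])]  # hypothetical output of some algorithm
print(match_score(found, truth))  # 0.333..., one good partial match and one miss
print(match_score(truth, found))  # 0.666..., how well the true bicluster is recovered
```

Swapping the arguments, as in the last line, turns the question from "how many found biclusters are real" into "how many real biclusters were found".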
