
Recognition of 3D Objects

or, 3D Recognition of Objects


Alec Rivers
Overview
3D object recognition was dead; now it's
coming back
These papers are all from the last 2 years
Doesn't really work yet, but it's just the
beginning
Papers
The Layout Consistent Random Field for Recognizing and Segmenting Partially Occluded Objects
CVPR 2006

3D LayoutCRF for Multi-View Object Class Recognition and Segmentation
CVPR 2007

3D Generic Object Categorization, Localization and Pose Estimation
ICCV 2007
The Layout Consistent Random Field for
Recognizing and Segmenting Partially Occluded
Objects

John Winn (Microsoft Research Cambridge)
Jamie Shotton (University of Cambridge)
Introduction
Needed to understand next paper
It's 2D
What does it try to solve?
Recognize one class of object at one pose and one
scale, but with occlusions
Does it work?
Yes, really well, especially given occlusions
Introduction
What is interesting about it?
Segments objects
Interesting methods
No sliding windows
Multiple instances for free
Overview
Instead of sparse parts at features, use a
densely covering part grid

[Fischler & Elschlager 73]


[Winn & Shotton 06]
Recognizing a New Image Overview
Walk through an example
Recognizing a New Image Overview
1. Pixels guess their part
Recognizing a New Image Overview
2. Maximize layout consistency
Layout Consistency
Defined pairwise between two pixels:
PI, PJ => Bool
Means pixels I, J could be part of one instance
Toy example:
Object: 1,2,3,4,5
Image:
2,3,4,5,0,0,1,2,3,4,5,2,3,4,5,0,0
instance 1 instance 2 instance 3

occlusion
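A minimal 1D sketch of the toy example, assuming two adjacent foreground pixels are layout consistent when the part index either stays the same (a stretched part) or advances by one. This rule and the function names are illustrative stand-ins, not the paper's 2D definition. Cutting the row at background pixels and at every inconsistent pair recovers the three instances.

```python
def layout_consistent(left, right):
    """Assumed 1D rule: same part (stretch) or the next part in the grid."""
    return right == left or right == left + 1


def split_instances(labels):
    """Split a row of part labels (0 = background) into object instances."""
    instances, current = [], []
    for label in labels:
        if label == 0:
            # Background closes whatever instance was being collected.
            if current:
                instances.append(current)
                current = []
            continue
        if current and not layout_consistent(current[-1], label):
            # Inconsistent foreground neighbours: an occluding border.
            instances.append(current)
            current = []
        current.append(label)
    if current:
        instances.append(current)
    return instances


image = [2, 3, 4, 5, 0, 0, 1, 2, 3, 4, 5, 2, 3, 4, 5, 0, 0]
print(split_instances(image))
```

The jump from 5 to 2 is the only inconsistent foreground pair, which is exactly where the occlusion sits in the toy image.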
Layout Consistency
In 2D, consistent IFF their relative assignments
could exist in a deformed regular grid
Formally:
Overview
2. Maximize layout consistency
Layout Consistency
3. Find consistent regions; create instances
Possible due to layout inconsistency at occluding borders
Overview
1. Pixels guess parts
2. Maximize layout consistency
3. Create instances

[Winn & Shotton 06]


Implementation Details
Trained on manually segmented data
Crux of algorithm is conditional distribution
Like a probability for each possibility, or a score
Algorithm is just finding maximum
Part Appearance
Each pixel prefers parts that match
surrounding image data
Randomized decision trees
Multiple trees, each trained on a subset of the
data
Each node is a maximal-information-gain binary test on
two nearby pixels' intensities
Each leaf is a histogram of part possibilities
Actual preference is average over all trees
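A rough sketch of how a pixel's part preference could be read out of such a forest. It assumes the binary test thresholds the intensity difference of two pixels at fixed offsets from the pixel being classified. The class names, the exact form of the test, and the border clamping are assumptions for illustration, and training (choosing tests by information gain) is not shown.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Leaf:
    histogram: List[float]  # one entry per part label


@dataclass
class Node:
    offset_a: Tuple[int, int]  # (row, col) offset of the first probe pixel
    offset_b: Tuple[int, int]  # (row, col) offset of the second probe pixel
    threshold: float
    below: Union["Node", Leaf]  # taken when the difference is <= threshold
    above: Union["Node", Leaf]  # taken when the difference is > threshold


def intensity(image, row, col):
    """Read a pixel, clamping coordinates to the image border."""
    row = min(max(row, 0), len(image) - 1)
    col = min(max(col, 0), len(image[0]) - 1)
    return image[row][col]


def tree_histogram(tree, image, row, col):
    """Walk one tree from the root to a leaf for the pixel at (row, col)."""
    node = tree
    while isinstance(node, Node):
        a = intensity(image, row + node.offset_a[0], col + node.offset_a[1])
        b = intensity(image, row + node.offset_b[0], col + node.offset_b[1])
        node = node.above if a - b > node.threshold else node.below
    return node.histogram


def forest_part_distribution(trees, image, row, col):
    """Average the leaf histograms reached in every tree."""
    histograms = [tree_histogram(tree, image, row, col) for tree in trees]
    return [sum(values) / len(histograms) for values in zip(*histograms)]
```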
Deformed Training Part Labelings
Fits parts tighter
1. Label by grid
2. Learn from data
3. Apply to data
4. Set guesses as truth
5. Relearn
Part Layout
Preference for layout consistency plus additional pairwise costs:
Helps remove noise
Align edges along image edges
Part Layout
Return to toy example
Just appearance:
1,2,0,4,5,0,0,1,2,3,3,4,0,0,1,0
With layout costs:
1,2,3,4,5,0,0,1,2,3,3,4,0,0,0,0
instance 1 instance 2
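The toy example can be reproduced with a small sketch, under made-up costs: 1 for overriding a pixel's appearance guess, 0 for a layout-consistent neighbour pair, 0.75 for an object/background edge, and 2 for two inconsistent foreground parts. Because the toy image is a 1D chain, the sketch finds the exact minimum with dynamic programming (Viterbi), which is a swapped-in technique and not the paper's optimizer. All names and numbers here are assumptions.

```python
def pairwise_cost(left, right):
    """Assumed layout costs for two neighbouring labels (0 = background)."""
    if left == right or (left > 0 and right == left + 1):
        return 0.0   # layout consistent
    if left == 0 or right == 0:
        return 0.75  # object/background edge
    return 2.0       # inconsistent foreground parts


def map_labels(guesses, num_parts):
    """Exact minimum-cost labeling of a 1D row by dynamic programming."""
    labels = list(range(num_parts + 1))
    best = [0.0 if label == guesses[0] else 1.0 for label in labels]
    back = []
    for guess in guesses[1:]:
        new_best, pointers = [], []
        for label in labels:
            costs = [best[prev] + pairwise_cost(prev, label) for prev in labels]
            prev = min(labels, key=lambda p: costs[p])
            unary = 0.0 if label == guess else 1.0
            new_best.append(costs[prev] + unary)
            pointers.append(prev)
        best = new_best
        back.append(pointers)
    # Trace the cheapest path back from the last pixel.
    label = min(labels, key=lambda l: best[l])
    path = [label]
    for pointers in reversed(back):
        label = pointers[label]
        path.append(label)
    return path[::-1]


appearance = [1, 2, 0, 4, 5, 0, 0, 1, 2, 3, 3, 4, 0, 0, 1, 0]
print(map_labels(appearance, num_parts=5))
```

With these costs the stray 0 inside the first instance becomes part 3 and the isolated 1 near the end becomes background, matching the "with layout costs" row.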
Instance Layout
Apply weak force trying to keep parts at sane
positions relative to instance data (centroid, L/R flip)

Toy example: 0,1,1,1,1,1,2,3,4,5 is bad!


Implementation
Theoretically, finding the global maximum of the conditional distribution
This is MAP estimation
MAP = Maximum A Posteriori
In reality, using tricks to find a local maximum
α-expansion, annealed expansion move
Approximating MAP Estimation
Global maximum is intractable
α-expansion
Start with given configuration
For a given new label, ask each pixel: do you want
to switch?
Can be solved efficiently with graph cuts
Repeat over all part labels
Annealed expansion move
Relabel grid, but offset to avoid local maxima
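A hedged sketch of the expansion-move loop on a 1D row of pixels. In each move every pixel either keeps its label or switches to the candidate label. On a chain that binary choice can be solved exactly with dynamic programming, which stands in here for the graph cut used on a 2D grid. The unit data cost, the Potts smoothness cost, and all function names are assumptions for illustration. The annealed variant is not shown.

```python
def potts(a, b, penalty=0.6):
    """Assumed smoothness cost: a fixed penalty when neighbours differ."""
    return 0.0 if a == b else penalty


def energy(labels, guesses, pairwise):
    unary = sum(0.0 if l == g else 1.0 for l, g in zip(labels, guesses))
    return unary + sum(pairwise(a, b) for a, b in zip(labels, labels[1:]))


def expansion_move(labels, alpha, guesses, pairwise):
    """Best labeling where each pixel keeps its label or switches to alpha."""
    options = [(label, alpha) for label in labels]  # index 0 keeps, 1 switches

    def unary(i, label):
        return 0.0 if label == guesses[i] else 1.0

    best = [unary(0, label) for label in options[0]]
    back = []
    for i in range(1, len(labels)):
        new_best, pointers = [], []
        for label in options[i]:
            costs = [best[k] + pairwise(options[i - 1][k], label) for k in (0, 1)]
            k = 0 if costs[0] <= costs[1] else 1
            new_best.append(costs[k] + unary(i, label))
            pointers.append(k)
        best = new_best
        back.append(pointers)
    k = 0 if best[0] <= best[1] else 1
    choice = [k]
    for pointers in reversed(back):
        k = pointers[k]
        choice.append(k)
    choice.reverse()
    return [options[i][k] for i, k in enumerate(choice)]


def alpha_expansion(guesses, num_labels, pairwise):
    """Cycle over all labels until no expansion move lowers the energy."""
    labels = list(guesses)
    improved = True
    while improved:
        improved = False
        for alpha in range(num_labels):
            candidate = expansion_move(labels, alpha, guesses, pairwise)
            if energy(candidate, guesses, pairwise) < energy(labels, guesses, pairwise):
                labels, improved = candidate, True
    return labels


print(alpha_expansion([1, 1, 2, 1, 1], num_labels=3, pairwise=potts))
```

The loop only accepts moves that strictly lower the energy, so it always stops, but it stops at a local optimum with respect to expansion moves rather than a guaranteed global one.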
Results

Oh, snap!
Thoughts
Bottom-up system is great
No sliding windows
Multiple instances for free
Information about segment boundaries:
occlusion vs. completion
Reason about complete segment boundaries?
3D LayoutCRF for Multi-View Object Class
Recognition and Segmentation

Derek Hoiem (Carnegie Mellon University)
Carsten Rother, John Winn (Microsoft Research Cambridge)
Introduction
What does it try to solve?
Extend LayoutCRF to be pose and scale invariant
Does it work?
Improvements to LayoutCRF work;
3D information does little
What is interesting about it?
One method for combining 2D methods with a 3D
framework
The improvements to 2D are good
Overview
Generate rough 3D model of class

Parts created over 3D model


Overview
Probability distribution
Refinements
Part layout, instance layout take into account
3D position
Refinements
New term: Instance cost
Instance Cost
Eliminates false positives
LayoutCRF: object-background cost
Explain multiple groups with one instance
Refinements
New term: Instance appearance
Instance appearance
Learn color distribution for each instance
Separate groups of pixels: definitely object,
definitely background
Use these to learn colors
Apply cost to non-standard-color pixels

This would fail
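One way to picture the instance appearance term is a pair of smoothed color histograms, one learned from the confident object pixels and one from the confident background pixels, with a per-pixel cost given by their log ratio. This is a sketch under assumptions: the 4-bins-per-channel quantization, the add-one smoothing, and the function names are all invented for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def quantize(color, bins=4):
    """Map an (r, g, b) color with 0-255 channels to a coarse histogram cell."""
    return tuple(channel * bins // 256 for channel in color)


def learn_color_model(colors, bins=4):
    """Return a function giving the smoothed probability of a color."""
    counts = Counter(quantize(color, bins) for color in colors)
    total = len(colors)
    num_cells = bins ** 3

    def probability(color):
        # Add-one smoothing keeps unseen colors from getting zero probability.
        return (counts[quantize(color, bins)] + 1) / (total + num_cells)

    return probability


def object_color_cost(color, object_model, background_model):
    """Cost of calling a pixel 'object': positive if it looks like background."""
    return math.log(background_model(color)) - math.log(object_model(color))


object_model = learn_color_model([(250, 0, 0)] * 10)      # confident object pixels
background_model = learn_color_model([(0, 0, 250)] * 10)  # confident background pixels
print(object_color_cost((255, 10, 10), object_model, background_model))
```

A model like this has nothing to separate when object and background share the same colors.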


Implementation Details
Parts are learned separately for each 45° viewing range,
and for different scales
Instance layout is also discretized by viewpoint
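As a small illustration of the viewpoint discretization, an azimuth angle can be mapped to one of eight 45° ranges, each of which would have its own part set. Where the first bin starts (0° here) and the function name are assumptions.

```python
def viewpoint_bin(azimuth_degrees, bin_width=45):
    """Index of the viewing range containing the angle (0-7 for 45-degree bins)."""
    return int((azimuth_degrees % 360) // bin_width)


print([viewpoint_bin(angle) for angle in (0, 44.9, 45, 359, -10)])
```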
Results Comparison to LCRF
A little better (+8% recall)
BUT they actually turn off 3D information for this comparison
Better segmentation
Results PASCAL 2006
61% precision-recall
Previous best: 45%
But, reduced test set
Without 3D: -5%
Without color: -5%
Thoughts
Color, instance costs very nice
Shoehorns LCRF into 3D without much success
LCRF is already somewhat viewpoint-invariant:
segments can stretch
3D Generic Object Categorization, Localization
and Pose Estimation

Silvio Savarese (University of Illinois at Urbana-Champaign)
Fei-Fei Li (Princeton University)
Introduction
What does it try to solve?
Multiclass pose-invariant, scale-invariant object
recognition
Does it work?
Not well. But it may be due to implementation
Why is it interesting?
Attempts to learn the actual 3D structure of an object
Interesting data structure for 3D info
Overview Data Structure
Decompose object into large parts; find canonical view
Relate parts by mutual appearance
Related Work Aspect Graphs

Aspect graph
of a cube:

Image [Khoh & Kovesi, 99]

Represent stable views rather than parts


Data Structure for Cube

Top

Back Left Front Right

Bottom
Related Work
Constellation models

vs.

Similar, but wraps around in 3D


Implementation Links
Link from canonical PI to PJ consists of a matrix HIJ
Matrix defines transformation to observe PJ when PI is viewed canonically
AIJ is skew, tIJ is translation
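A minimal sketch of such a link, assuming it is a 3x3 matrix in homogeneous coordinates with the 2x2 skew AIJ in the upper-left block and the translation tIJ in the last column. That layout and the helper names are assumptions for illustration.

```python
def make_link(a, t):
    """Assemble an assumed 3x3 link matrix from a 2x2 skew and a translation."""
    return [
        [a[0][0], a[0][1], t[0]],
        [a[1][0], a[1][1], t[1]],
        [0.0, 0.0, 1.0],
    ]


def apply_link(h, point):
    """Map a 2D point through the link using homogeneous coordinates."""
    x, y = point
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return (u / w, v / w)


# No skew, shift by (3, 4): part J appears displaced from part I.
link = make_link([[1.0, 0.0], [0.0, 1.0]], (3.0, 4.0))
print(apply_link(link, (1.0, 1.0)))
```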
Implementation Links

HIJ
Part J
canonical view
Part I
canonical view
Implementation Links

HJI

Part I
canonical view
Part J
canonical view
Overview
Learn data structure from images
(unsupervised)
Apply to new image by recognizing parts and
selecting model that best accounts for their
appearances
Implementation Learning Parts
Tricky implementation!
Part = collection of SIFT features
For each pair of images of the same
instance:
1. Find set M of shared SIFT features
2. RANSAC M to find a group of pairs
that transform together
3. Group close-together parts of M
into candidate parts
Background: What is RANSAC?
Finds subset of data that
is accounted for by some
model; ignores outliers
1. Guess points
2. Fit model
3. Select matching points
4. Calculate error
Repeat!
RANSAC
In our case: find points for which a
homographic transformation of
the points in image I yields the
points in image J
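The RANSAC loop can be sketched as follows. To keep it short, the model here is a pure 2D translation between matched points rather than a homography, so this is a simplified stand-in for what the paper fits. The threshold, iteration count, and function names are assumptions.

```python
import random


def fit_translation(pairs):
    """Least-squares translation: the mean displacement of the matches."""
    n = len(pairs)
    dx = sum(q[0] - p[0] for p, q in pairs) / n
    dy = sum(q[1] - p[1] for p, q in pairs) / n
    return (dx, dy)


def residual(model, pair):
    """Distance between the translated point from image I and its match in J."""
    (px, py), (qx, qy) = pair
    return ((px + model[0] - qx) ** 2 + (py + model[1] - qy) ** 2) ** 0.5


def ransac(pairs, iterations=50, threshold=1.0, seed=0):
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        sample = rng.sample(pairs, 1)            # 1. guess points (minimal set)
        model = fit_translation(sample)          # 2. fit model
        inliers = [pair for pair in pairs        # 3. select matching points
                   if residual(model, pair) <= threshold]
        if len(inliers) > len(best_inliers):     # 4. score, keep the best
            best_model, best_inliers = fit_translation(inliers), inliers
    return best_model, best_inliers


matches = [((i, 2 * i), (i + 5, 2 * i - 2)) for i in range(8)]
matches += [((0, 0), (40, 40)), ((1, 1), (-30, 7)), ((2, 2), (9, -50))]
model, inliers = ransac(matches)
print(model, len(inliers))
```

A homography needs four point pairs per minimal sample instead of one, but the guess, fit, select, score loop is the same.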
Implementation Canonical Views
Goal: front-facing view of part
Construct directed graph
Direction means more front-facing
Traverse to find canonical view

How to go from pairwise-defined to graph?
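One possible reading of the traversal, as a sketch only: store for each view of a part the neighbouring view judged more front-facing, then follow those edges until a view with no outgoing edge is reached. The single-outgoing-edge representation and the names are assumptions. How the pairwise comparisons become a graph is left open above, and this sketch does not answer that.

```python
def canonical_view(more_frontal, start):
    """Follow 'more front-facing' edges from start until they run out."""
    view, seen = start, {start}
    while view in more_frontal:
        view = more_frontal[view]
        if view in seen:
            raise ValueError("cycle in view graph")
        seen.add(view)
    return view


# Hypothetical views of one part: an edge points to the more front-facing view.
edges = {"view1": "view2", "view2": "view3", "view4": "view3"}
print(canonical_view(edges, "view1"))
```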


Implementation
Upshot: a collection of
parts with canonical
views and links
Recognizing a New Image
1. Extract SIFT features
2. Use scanning windows to get 5 best canonical
part matches
3. For every pair of found parts, for each model,
score how well the model accounts for their
relative appearances
4. Select the model with the best score
Results
Not stellar
New test set
Overfit?
Comparison?
Results
Thoughts
Low performance may make it useless as a
system, but the data structure is very nice
Implementation has a lot of tricky parts
Doesn't seem to select great canonical parts
I wonder if there's a simpler way
Are SIFT features the right choice?
Extremely Confusing Figure

Each dashed box indicates a particular view. A subset of the canonical parts is presented for each view. Part relationships are denoted by arrows.
Overall Conclusions
3D is just starting out. Doesn't work too well
right now, but neither did MV at the
beginning.
LayoutCRF:
Nice method to learn 2D patches
3D Object Categorization:
Nice conceptual model relating 3D parts
Possible to combine strengths of both?
