
Report on scientific papers.

Neural networks: building high-level features using unsupervised learning.


Wilson R. Tingo Y. tyn20@hotmail.com

State of the art
Recent studies observe that it is quite time-intensive to train deep learning algorithms, and it is supposed that the long training time is partially responsible for the lack of high-level features reported in the literature. For example, researchers typically reduce the sizes of datasets and models in order to train networks in a practical amount of time, and these reductions limit the learning of high-level features. This problem is addressed by scaling up the core components involved in training deep networks: the dataset, the model, and the computational resources. First, a large dataset is generated by sampling random frames from random YouTube videos.

ABSTRACT: Unsupervised learning makes it possible to build high-level features from unlabeled data alone. For example, is it possible to learn the features of a face (a face detector) using only unlabeled images? To answer this, a neural network composed of several locally connected layers is trained; the model has 1 billion connections, and the dataset consists of 10 million 200x200-pixel images downloaded from the Internet. The network is trained using model parallelism and asynchronous SGD (stochastic gradient descent) on a cluster of 1,000 machines for three days. The experimental results reveal that it is possible to train a face detector without having to label images as containing a face or not. The network is also sensitive to other high-level concepts such as cat faces and human bodies.
KEYWORDS: artificial neural networks, artificial intelligence, face detector, unsupervised learning.

Figure 2. Thirty randomly-selected training images (shown before the whitening step).

! "N#RO $%#"ON
The focus of this work is to build high-level, class-specific feature detectors from unlabeled images; that is, it investigates the feasibility of building high-level features from only unlabeled data. Perhaps more importantly, it answers an intriguing question: can a neuron learn such concepts from unlabeled data? Unsupervised feature learning and deep learning are methodologies in machine learning for building features from unlabeled data, and using unlabeled data in the wild to learn features is the key idea behind the self-taught learning framework.

A subset of training images is shown to check the proportion of faces in the dataset; an OpenCV face detector is run on 60x60 randomly-sampled patches from the dataset.
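To make this check concrete, the following is a minimal sketch using OpenCV's bundled Haar cascade. The cascade file, detector settings, and the patch-sampling helper are assumptions for illustration, not the exact configuration used in the original experiment.

import cv2
import numpy as np

# Haar cascade shipped with the opencv-python package.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def contains_face(patch_bgr):
    """Return True if the Haar cascade fires on a small color patch."""
    gray = cv2.cvtColor(patch_bgr, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=3)
    return len(faces) > 0

def face_fraction(images, n_patches=10000, size=60, seed=0):
    """Estimate the fraction of random size x size patches that contain a face.

    Assumes every image in `images` is larger than `size` in both dimensions
    (the training images here are 200x200).
    """
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_patches):
        img = images[rng.integers(len(images))]
        h, w = img.shape[:2]
        y, x = rng.integers(h - size), rng.integers(w - size)
        if contains_face(img[y:y + size, x:x + size]):
            hits += 1
    return hits / n_patches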

3 ALGORITHMS
Figure 1. The architecture and parameters in one layer of the network. The overall network replicates this structure three times; for simplicity, the images are drawn in 1D.

2 TRAINING SET CONSTRUCTION


The training dataset is constructed by sampling frames from 10 million YouTube videos. To avoid duplicates, each video contributes only one image to the dataset. Each example is a 200x200-pixel color image.
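A minimal sketch of that one-image-per-video rule, assuming OpenCV for video decoding; the real pipeline's frame selection and preprocessing are not specified here, so the random seek and plain resize below are illustrative choices.

import random
import cv2

def sample_frame(video_path, size=200, rng=random.Random(0)):
    """Grab one random frame from one video and resize it to size x size."""
    cap = cv2.VideoCapture(video_path)
    n_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    if n_frames > 0:
        # Seek to a random frame so each video contributes exactly one image.
        cap.set(cv2.CAP_PROP_POS_FRAMES, rng.randrange(n_frames))
    ok, frame = cap.read()
    cap.release()
    if not ok:
        return None
    return cv2.resize(frame, (size, size))  # 200x200 color image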


After training, a held-out test set (described in Section 4.1 below) is used to measure the performance of each neuron in classifying faces against distractors.

4.3. RECOGNITION
Surprisingly, the best neuron in the network performs very well at recognizing faces, achieving 81.7% accuracy in detecting faces against the distractors.
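One simple way to read "best neuron accuracy" is to score every neuron as a one-feature classifier and keep the best score. The sketch below does this by sweeping an activation threshold per neuron; it is a simplified stand-in for the selection protocol of the experiments, and the array names are assumptions.

import numpy as np

def best_neuron_accuracy(activations, labels):
    """Score each neuron as a face-vs-distractor classifier.

    activations: shape (n_images, n_neurons), responses on the test set.
    labels: shape (n_images,), 1 for face images, 0 for distractors.
    """
    best = 0.0
    for a in activations.T:                   # one neuron at a time
        for t in np.unique(a):                # candidate thresholds
            acc = np.mean((a >= t) == labels)
            best = max(best, acc, 1.0 - acc)  # allow anti-correlated neurons
    return best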

4.4. VISUALIZATION
In this section, two visualization techniques are used to verify whether the optimal stimulus of the neuron is indeed a face. The first method is visualizing the most responsive stimuli in the test set; since the test set is large, this method can reliably detect near-optimal stimuli of the tested neuron. The second approach is to perform numerical optimization to find the optimal stimulus. Figure 3 confirms that the tested neuron indeed learns the concept of faces.
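The numerical-optimization approach can be read as maximizing the neuron's activation over input images constrained to the unit sphere. Below is a minimal projected-gradient-ascent sketch; the gradient callable f_grad, the step size, and the iteration count are all assumptions.

import numpy as np

def optimal_stimulus(f_grad, dim, steps=500, lr=0.1, seed=0):
    """Projected gradient ascent for x* = argmax f(x) subject to ||x|| = 1.

    f_grad(x) must return the gradient of the tested neuron's activation
    with respect to the (flattened) input x; how that gradient is obtained,
    e.g. by backpropagation through the trained network, is left open here.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(dim)
    x /= np.linalg.norm(x)
    for _ in range(steps):
        x = x + lr * f_grad(x)    # step uphill on the activation
        x /= np.linalg.norm(x)    # project back onto the unit sphere
    return x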

3.1 ARCHITECTURE
The algorithm is built upon these ideas and can be viewed as a sparse deep autoencoder with three important ingredients: local receptive fields, pooling, and local contrast normalization. The deep autoencoder is constructed by replicating three times the same stage, composed of local filtering, local pooling, and local contrast normalization. The output of one stage is the input to the next, and the overall model can be interpreted as a nine-layered network (see Figure 1).
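As a toy illustration of that stage structure, here is a dense sketch in NumPy. It simplifies aggressively: W1 is dense rather than local and untied, H is an assumed fixed pooling matrix, and a global standardization stands in for true local contrast normalization.

import numpy as np

def stage_forward(x, W1, H, eps=1e-2):
    """One simplified stage: filtering, L2 pooling, contrast normalization."""
    h = W1.T @ x                     # first sublayer: linear filtering
    p = np.sqrt(H @ (h ** 2) + eps)  # second sublayer: L2 pooling
    p = p - p.mean()                 # third sublayer: crude stand-in for
    return p / max(p.std(), eps)     # local contrast normalization

def network_forward(x, stages):
    """Chain three stages; the output of one stage is the input to the next."""
    for W1, H in stages:
        x = stage_forward(x, W1, H)
    return x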

3.2 LEARNING AND OPTIMIZATION

Learning: during learning, the parameters of the second sublayers (H) are fixed to uniform weights, whereas the encoding weights W1 and decoding weights W2 of the first sublayers are adjusted by solving the optimization problem. Optimization: all parameters in the model are trained jointly, with the objective being the sum of the objectives of the three layers.
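A sketch of what such a per-layer objective can look like, following the description above: a reconstruction term through the encoding and decoding weights, plus an L2-pooled sparsity term with H held fixed. The dense matrices and the hyperparameters lam and eps are assumptions for illustration.

import numpy as np

def layer_objective(X, W1, W2, H, lam=0.1, eps=1e-2):
    """One layer's objective: reconstruction error plus pooled sparsity."""
    total = 0.0
    for x in X:
        h = W1.T @ x                                     # encode
        total += np.sum((W2 @ h - x) ** 2)               # reconstruction error
        total += lam * np.sum(np.sqrt(eps + H @ h**2))   # sparse pooling penalty
    return total

The full objective is then the sum of this term over the three layers, minimized jointly over the W1 and W2 of each layer while H stays fixed.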

4 EXPERIMENTS ON FACES
In this section, we describe our analysis of the learned representations in recognizing faces.

4.1. TEST SET


The test set consists of 37,000 images sampled from two datasets: Labeled Faces In the Wild and ImageNet.

4.2. EXPERIMENTAL PROTOCOLS

Figure 3. Top: the top 48 stimuli of the best neuron from the test set. Bottom: the optimal stimulus according to numerical constraint optimization.


5 CONCLUSION
In this work, we simulated high-level, class-specific neurons using unlabeled data. We achieved this by combining ideas from recently developed algorithms that learn invariances from unlabeled data.

6 REFERENCES
[1] http://research.google.com/pubs/pub38115.html
[2] http://arxiv.org/abs/1112.6209
[3] http://arxiv.org/pdf/1112.6209.pdf
