CS/ECE/ME 539
Professor Hu
UW-Madison
MWF 1:20p
David A. Gerasimow
The Design and Implementation of a Dynamic Data MLP to
Predict Motion Picture Revenue
Table of Contents

Introduction: Preface, Past Research, Improvements Over Past Research
Initial Data Collection
Data Collection Improvements
Data Encoding
Pre-analysis of Data
Development of the Dynamic Data Neural Network
Step 1 of the UpdateWizard: Downloading
Step 2 of the UpdateWizard: Updating
Step 3 of the UpdateWizard: Creating Training and Testing Files
Development of the MLP using Dynamic Data
Using the Dynamic Data MLP
Choice 1 of moviesbp.m
Choice 2 of moviesbp.m
Figure 1: DataExtractor Screenshot
Figure 2: DataConcatenator Screenshot
Figure 3: DataConverter Screenshot: Films Removed From Data File
Figure 4: DataConverter Screenshot: Films to be Updated
Figure 5: Results of preanalysis.m
Figure 6: UpdateWizard Screenshot Step 1
Figure 7: UpdateWizard Screenshot Step 2
Figure 8: UpdateWizard Screenshot Step 3
Figure 9: NewMovie Screenshot
Discussion of Results
Bibliography
VB Source Code
Preface
For the last century, film has been one of the American public's favorite
entertainment media. Large production companies often spend hundreds of
millions of dollars to create a single film, yet the amount of money spent on
a film seems to have little bearing on its success. The Blair Witch Project,
for instance, was made for under one million dollars but earned over
twenty-nine million dollars in its first weekend at the box office.
Waterworld, on the other hand, starring superstar Kevin Costner, cost roughly
one hundred seventy-five million dollars to produce but made back less than
half of that amount in domestic box office revenue.
Predicting how much a movie will earn in opening-weekend box office revenue is
notoriously difficult. Many aspects of a movie are subjective, and public
taste changes quickly and unpredictably. A mathematical model that predicts
how much a film will make would let production companies maximize profit by
skipping development projects likely to hurt their margins.
Past Research
In the fall semester of 2001, a CS/ECE/ME 539 student attempted to predict
the opening-weekend box office revenue of a given film using an artificial
neural network. He claimed that an accurate prediction of a movie's total
gross can be obtained by examining its opening weekend, since the two are
roughly proportional: a film with a huge opening weekend is likely to earn a
lot of money in the long run. His logic is sound, and I use it again in this
project.
The network's inputs are the film's characteristics, such as genre, rating,
and runtime. Despite his thorough work, there are deficiencies in his project,
and this project aims to be a major improvement over his results: it produces
higher correct classification rates and, at the same time, allows future users
to easily update the data files. A major component of this report is the
development of what I call a dynamic data neural network, whose training data
is automatically updated weekly and therefore accumulates more and more
training examples. Instead of the project ending with the end of the semester,
future classes will be able to easily update this project's results, and the
network's correct classification rates will improve over time.
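The weekly update cycle described above can be sketched as follows. This is a
Python illustration of the idea only, not the actual UpdateWizard code (which
is a Visual Basic application); the file names, record format, and test
fraction are hypothetical.

```python
import random

def weekly_update(data_file, new_records, train_file, test_file,
                  test_fraction=0.2):
    """Append newly released films to the master data file, then
    regenerate the training and testing splits from the full history."""
    # Accumulate: the master data file only ever grows week over week.
    with open(data_file, "a") as f:
        for record in new_records:
            f.write(record + "\n")

    with open(data_file) as f:
        records = [line.strip() for line in f if line.strip()]

    # Re-split so the network can be retrained on all data collected so far.
    random.shuffle(records)
    n_test = int(len(records) * test_fraction)
    with open(test_file, "w") as f:
        f.write("\n".join(records[:n_test]))
    with open(train_file, "w") as f:
        f.write("\n".join(records[n_test:]))
```

The key design point is that the split is regenerated from the entire
accumulated history each week, so the training set keeps growing rather than
being frozen at the end of one semester.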
Data Encoding
The file movies.txt is the final data file produced after the procedures
described above have been followed. Many features of a film are not numerical,
so I created an encoding scheme that makes the non-numerical data fields
usable by the multilayer perceptron.
Genre
  Action                20
  Comedy                21
  Drama                 22
  Family                23
  Horror                24
  Mystery               25
  Animation             26
  Romance               27
  Sci-Fi                28
  Thriller              29
  Western               30

Rating
  G                     -5
  PG                    -4
  PG-13                 -3
  R                     -2

Distributor
  Sony                   1
  Universal              2
  Warner Brothers        3
  Fox                    4
  New Line               5
  Buena Vista            6
  Paramount              7
  MGM/United Artists     8
  MGM                    9
  DreamWorks            10
  Miramax               11
  TriStar               12
  Columbia              13
  Artisan               14
  Polygram              15
  USA Films             16
  Orion                 17
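The encoding scheme can be expressed as simple lookup tables. The following is
a Python sketch (the report's actual tools are written in Visual Basic and
MATLAB); the codes for the G and R ratings are inferred from the pattern of
the rating table, and the feature order in the returned vector is an
assumption for illustration.

```python
# Lookup tables taken from the encoding scheme above.
GENRE = {"Action": 20, "Comedy": 21, "Drama": 22, "Family": 23, "Horror": 24,
         "Mystery": 25, "Animation": 26, "Romance": 27, "Sci-Fi": 28,
         "Thriller": 29, "Western": 30}
RATING = {"G": -5, "PG": -4, "PG-13": -3, "R": -2}
DISTRIBUTOR = {"Sony": 1, "Universal": 2, "Warner Brothers": 3, "Fox": 4,
               "New Line": 5, "Buena Vista": 6, "Paramount": 7,
               "MGM/United Artists": 8, "MGM": 9, "DreamWorks": 10,
               "Miramax": 11, "TriStar": 12, "Columbia": 13, "Artisan": 14,
               "Polygram": 15, "USA Films": 16, "Orion": 17}

def encode_film(genre, rating, distributor):
    """Map the non-numerical fields of one film to their numeric codes."""
    return [GENRE[genre], RATING[rating], DISTRIBUTOR[distributor]]
```

For example, an action film rated PG-13 and distributed by Sony would encode
to the vector [20, -3, 1].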
Mean and standard deviation of eight trials to determine the optimal learning
rate, α, and momentum constant, μ. Selected combination (α = 0.1, μ = 0.7)
marked with an asterisk.

α = 0.1
Trial    μ = 0.1    μ = 0.3    μ = 0.5    μ = 0.7*
1        52.1739    54.3248    50.3244    52.3487
2        53.8753    52.3408    54.3324    57.2349
3        48.2246    50.3140    55.3398    54.1437
4        50.3491    51.6512    52.3202    55.2304
5        52.4582    49.0293    53.4839    52.2095
6        52.3498    51.6612    49.9238    58.3094
7        49.3014    50.2323    53.3204    56.4205
8        52.9586    52.8437    54.8942    53.3657
Mean     51.461     51.550     52.992     54.908
Std      1.9544     1.6750     2.0092     2.2731

α = 0.3
Trial    μ = 0.1    μ = 0.3    μ = 0.5    μ = 0.7
1        49.0239    50.3249    50.8708    53.9238
2        51.0235    49.9929    51.3047    52.3770
3        50.1203    51.3418    49.3140    55.0431
4        50.7654    53.1238    49.0219    52.2908
5        52.5478    51.2095    54.3014    52.5438
6        49.3140    51.5637    52.0237    54.0324
7        51.5036    50.3140    53.2878    54.1487
8        51.1126    50.4327    53.2049    55.8714
Mean     50.676     51.038     51.666     53.779
Std      1.1589     1.0170     1.9033     1.3041

α = 0.5
Trial    μ = 0.1    μ = 0.3    μ = 0.5    μ = 0.7
1        50.1239    49.0973    51.3094    50.3496
2        47.1344    50.8917    50.3410    51.2508
3        46.1238    51.8187    51.5209    51.2540
4        48.0283    50.2398    51.5141    52.0295
5        49.8721    48.2102    52.2984    50.0493
6        45.2938    48.1234    52.2085    54.0321
7        43.1734    50.3209    53.3407    55.0132
8        50.1919    52.5407    50.4310    49.0829
Mean     47.493     50.155     51.621     51.633
Std      2.5537     1.6065     0.9953     2.0102

α = 0.7
Trial    μ = 0.1    μ = 0.3    μ = 0.5    μ = 0.7
1        43.2308    44.4039    46.3897    44.1230
2        46.0943    42.0540    44.8724    46.2347
3        43.0529    46.4289    47.0251    45.3047
4        43.2308    47.0319    46.1421    49.0329
5        47.4230    49.1098    48.2437    50.0987
6        44.0132    46.4877    50.1239    49.8713
7        46.3209    45.4827    49.3409    47.0929
8        47.4980    47.0121    48.2138    50.1319
Mean     45.108     46.001     47.544     47.736
Std      1.9267     2.0897     1.7537     2.3663
Mean and standard deviation of eight trials to determine the optimal number of
hidden layers, HL. Selected number of hidden layers marked with an asterisk.

Trial    HL = 1*    HL = 2     HL = 3
1        56.3094    54.1708    53.4308
2        57.9541    52.3407    52.5499
3        58.3420    49.1203    53.0023
4        57.4092    51.8274    48.2348
5        54.2231    53.3479    50.3299
6        55.3209    52.5027    51.4390
7        58.3428    53.0523    53.2581
8        57.9112    49.1042    50.2109
Mean     56.977     51.933     51.557
Std      1.5360     1.8776     1.8458
Mean and standard deviation of eight trials to determine the optimal number of
hidden neurons, H, in the single hidden layer. Selected number of hidden
neurons marked with an asterisk.

Trial    H = 2      H = 4      H = 6*     H = 8
1        48.2351    50.3460    54.5407    52.2308
2        49.4239    50.0148    53.4509    54.3498
3        47.3402    48.0129    57.2098    52.1487
4        45.4223    52.5489    55.2340    51.5023
5        47.3100    51.0253    54.2390    50.2530
6        47.1236    50.0544    53.5409    52.4879
7        45.9810    52.2587    56.4308    53.3205
8        46.3498    53.3980    55.2107    55.3991
Mean     47.148     50.957     54.982     52.712
Std      1.2763     1.7294     1.3280     1.6205
The mean and standard deviation values were computed using alphamu.m, hl.m, and h.m.
These MATLAB files read data from files stats_am.txt, stats_h.txt, and stats_hl.txt.
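The per-column statistics those m-files compute can be illustrated as follows.
This is a Python sketch, not the actual alphamu.m, hl.m, or h.m; it assumes
the stats_*.txt files hold one trial per row and one parameter setting per
column (and, like MATLAB's std, uses the sample standard deviation).

```python
import numpy as np

def trial_stats(path):
    """Read a whitespace-separated table of classification rates
    (rows = trials, columns = parameter settings) and return the
    per-column mean and sample standard deviation."""
    rates = np.loadtxt(path)
    # ddof=1 matches MATLAB's default (n-1 in the denominator).
    return rates.mean(axis=0), rates.std(axis=0, ddof=1)
```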
Based on the above trials, the MLP configuration is as follows:

Learning Rate               0.1
Momentum Constant           0.7
Number of Hidden Layers     1
Number of Hidden Neurons    6
Maximum Number of Epochs    5000
Samples Per Epoch           64
Scaling of Input            [-5, 5]
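The selected configuration corresponds to a network of the following shape.
This is an illustrative Python/NumPy sketch of one backpropagation step with a
momentum term, not the course's bp.m; the input dimension, output encoding,
activation function, and weight initialization are placeholder assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 10, 6, 4   # placeholder input size; four classes
lr, momentum = 0.1, 0.7            # selected learning rate and momentum

W1 = rng.standard_normal((n_in, n_hidden)) * 0.1
W2 = rng.standard_normal((n_hidden, n_out)) * 0.1
V1 = np.zeros_like(W1)             # momentum (velocity) terms
V2 = np.zeros_like(W2)

def forward(X):
    h = np.tanh(X @ W1)            # the single hidden layer, 6 neurons
    return h, np.tanh(h @ W2)

def train_step(X, T):
    """One batch backprop update with momentum:
    v <- momentum*v + lr*gradient; w <- w + v."""
    global W1, W2, V1, V2
    h, y = forward(X)
    d_out = (T - y) * (1 - y ** 2)          # delta rule, tanh derivative
    d_hid = (d_out @ W2.T) * (1 - h ** 2)
    V2 = momentum * V2 + lr * (h.T @ d_out) / len(X)
    V1 = momentum * V1 + lr * (X.T @ d_hid) / len(X)
    W2 += V2
    W1 += V1
    return float(((T - y) ** 2).mean())     # MSE before this update
```

With α = 0.1 and μ = 0.7, each weight change blends the current gradient with
seventy percent of the previous change, which is what the momentum trials
above were tuning.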
After using the UpdateWizard to update the data file and create training and testing files,
the dynamic data MLP is ready to use. From the MATLAB prompt, run moviesbp.m by
entering moviesbp. Note that moviesbp.m requires many support m-files that are not
included in the *.zip file. They are, however, available for download from the
CS/ECE/ME 539 website http://www.cae.wisc.edu/~ece539/fall03/index.html in the
section entitled MATLAB Files Used in the Class.
When moviesbp.m begins, the user has two choices.
Choice 1 of moviesbp.m
Choice 1, "Predict the Revenue of a Newly Released Film," allows the user to
test the dynamic data MLP on a new movie. First, however, the user must run
the Windows application NewMovie (newmovie.exe). This program, developed in
Visual Basic 6.0, provides a graphical user interface for entering the
characteristics of a new film; it then creates a file called
testsinglemovie.txt based on the entered characteristics.
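The effect of NewMovie can be mimicked from a script. This Python sketch is
not the VB program, and the layout of testsinglemovie.txt is an assumption;
here it is written as a single line of space-separated encoded feature values.

```python
def write_single_movie(path, features):
    """Write one film's encoded feature vector as a single line
    that a test routine could read back."""
    with open(path, "w") as f:
        f.write(" ".join(str(v) for v in features) + "\n")

# Hypothetical encoded film: genre 20, rating -3, distributor 1, runtime 120.
write_single_movie("testsinglemovie.txt", [20, -3, 1, 120])
```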
Once that file is created, moviesbp.m can be run. After option one is
selected, the MLP is trained per the user's instructions and then tested on
the film information contained in testsinglemovie.txt. Finally, moviesbp.m
classifies the movie and predicts its revenue according to the classification
scheme the user chose earlier; consult classes.txt for a description of the
classification schemes.
A screenshot of NewMovie in action can be found on page 18 (fig. 9).
The source code for NewMovie can be found on pages 21+.
Choice 2 of moviesbp.m
Choice 2, "Simply Train and Test the MLP," allows the user to train and test
the dynamic data MLP. It runs much like Professor Hu's bp.m; the neuron
weights can be found in the variable w. It also outputs the confusion
matrices and classification rates for the training and testing datasets.
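The confusion matrix and classification rate that Choice 2 reports can be
computed as follows. This is a generic Python sketch of the metric, not the
m-file itself; class labels are assumed to be integers starting at zero.

```python
import numpy as np

def confusion_and_rate(true_labels, predicted_labels, n_classes):
    """Rows = true class, columns = predicted class; the classification
    rate is the fraction of samples on the diagonal."""
    C = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(true_labels, predicted_labels):
        C[t, p] += 1
    rate = np.trace(C) / len(true_labels)
    return C, rate
```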
Discussion of Results
The classification rates of the dynamic data multilayer perceptron range from
fifty-four to fifty-nine percent, roughly a four percent improvement over the
similar project performed in the fall of 2001. As discussed in the
introduction, predicting the box-office success of a film is difficult, so
classifying films correctly more than half of the time is a good result given
that there are four classes: a classifier no better than random guessing
would achieve rates of only about twenty-five percent. This project is a
success because it improved upon past results.
The most interesting aspect of the project was the UpdateWizard. The MLP can
easily be retrained on data that is constantly changing, and the UpdateWizard
makes that entire process easy and seamless. My hope is that, over time, the
wizard will accumulate more and more data, causing correct classification
rates to improve further. The UpdateWizard's functionality exceeded my
original expectations: it is easy to use and rarely makes mistakes. This
component of the project is a success, especially because no CS/ECE/ME 539
student has attempted such an application of neural networks in the past.
Bibliography

Film Industry:
Rand, Philip. A Guide to the Film Industry. London: Emerald, 2003.

Visual Basic References:
David, Harold. Visual Basic 5 Secrets. Foster City, CA: IDG Books Worldwide,
1997.
Mansfield, Richard. The Visual Guide to Visual Basic for Windows: The
Illustrated, Plain-English Encyclopedia to the Windows Programming Language,
Version 3.0, 2nd Edition. Chapel Hill, NC: Ventana Press, Inc., 1993.

Neural Networks:
Haykin, Simon. Neural Networks: A Comprehensive Foundation, 2nd Edition.
Upper Saddle River, NJ: Prentice-Hall, Inc., 1999.
Neelakanta, Perambur S., ed. Information-Theoretic Aspects of Neural
Networks. Boca Raton, FL: CRC Press, 1999.