
Abstract

The men's NCAA tournament is a yearly event in which teams representing 64 colleges and
universities play for the right to be crowned champion. Many Americans try to pick the winners
of these games for office pools and friendly competitions. This analysis uses regular season
data from the teams that participated in the 2015 NCAA tournament to pick winners of first
round games in the 2016 NCAA tournament. Using discriminant analysis, principal component
analysis, and stepwise selection, the analysis arrives at a method that correctly predicts the
result of 18 of the 24 first round games considered in the 2016 NCAA tournament.
Introduction
Every year in March, the eyes of the sporting world turn to the men's NCAA basketball
tournament. The tournament is single elimination (lose one game and go home) and consists of 64
collegiate teams. The opening round of the tournament features 32 games spread out over the
course of two days in what is the ultimate exercise in sports binging. While the popularity of
NCAA men's basketball lags behind that of its professional counterpart, the NBA, the tournament
is wildly popular. The reason for this is simple: brackets.
According to a recent survey, over half of working American adults participate in at least one
betting pool for the NCAA tournament. Participants try to predict the outcome of the tournament
by picking the winner of each game, and the person with the best predicted bracket wins.
Essentially, Americans love the tournament because they have a stake in every game. In addition,
the NCAA tournament is popular because of its seeming unpredictability. The tournament is rife
with upsets which leave observers asking, "Who would have guessed that?"
This analysis is intended to find variables that matter and to create a system for effectively and
correctly picking the outcome of games in the first round of the tournament. We hypothesize that
a successful method will be determined. Furthermore, we also hypothesize that the number of
players who score 10 points per game, RPI, rebound differential, and free throw percentage will
be among the important factors.
Data
This project utilizes two data sets: one with regular season data from teams that made the 2015
men's NCAA tournament and another with the same variables from teams that made the 2016
men's NCAA tournament. In the tournament, the bracket is broken up into four regions, each
consisting of 16 teams. In each region, the teams are seeded from 1 to 16, where the 1 seed is
theoretically the best team and the 16 seed is theoretically the worst team. The first round
matchups consist of the 1 seed v. 16 seed, 2 seed v. 15 seed, 3 seed v. 14 seed, and so on. To
date, a 16 seed has never beaten a 1 seed, and it is rare for a 15 seed to beat a 2 seed. As
such, the data sets contain no information on the 15 or 16 seeds in the tournament. So, each
data set contains regular season data on the top 56 teams in the tournament for its respective
year.
The first part of the analysis utilizes data for each individual team as well as coding for which
teams won their first round game. The analysis then uses that coding in an attempt to predict
which teams will win their first round game. However, when attempting to pick the winner of a
basketball game, it is important to consider both teams playing the game. As a result, the
second part of the analysis modifies the data set to include data regarding both teams playing in
each game. To do so, each game or matchup has one observation which includes the same
variables. The value of each variable represents the difference between the two teams playing.
Essentially, the data vector for the non-favored team is subtracted from the data vector for the
favored team. For example, in 2015, Oklahoma won 22 games and Albany won 24 games.
Oklahoma was the 3 seed and Albany was the 14 seed. Subtracting Albany's value for Wins from
Oklahoma's value for Wins gives the value for Wins in the new data set for the observation
for their first round game. So the value for Wins in the game OuAlb is -2 because 22 - 24 = -2.
Each game is then coded as either an upset or a non-upset. Because the original data sets do
not include the eight teams seeded 15 or 16, 8 of the 32 first round games are not included
in the second part of the analysis. The variables are as follows:
Team        School name
Wins        Total regular season wins
RoadWins    Total regular season wins occurring on the opponent's home court
RPI         A computerized ranking system where 1 is the best and 353 is the worst
WARPI50     Number of wins against the top 50 teams according to RPI
WARPI100    Number of wins against the top 100 teams according to RPI
RWARPI50    Number of wins on the road against the top 50 teams according to RPI
RWARPI100   Number of wins on the road against the top 100 teams according to RPI
WL10        Number of wins in the last 10 games
BIGS        Number of players taller than 6' 10"
FRESH       Number of players who are freshmen
SENIOR      Number of players who are seniors
MEANHT      Mean height of players on the roster in inches
PPG         Mean number of points scored
OPPG        Mean number of points scored by opponents
DPPG        Mean difference between points scored and points scored by opponents
_10PT       Number of players who average 10 points per game
_50Pct      Number of players who make at least 50% of their shot attempts
_40Pct3     Number of players who make at least 40% of their 3 point shot attempts
FGPCT       Field goal percentage for the whole team
FTPCT       Free throw percentage for the whole team
_3PCT       3 point field goal percentage for the whole team
TODIFF      Mean difference between the number of turnovers committed by the team and its opponents
TOPG        Mean number of turnovers per game
ASTPG       Mean number of assists per game
REBDIFF     Mean difference between the number of rebounds for the team and its opponents

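The favorite-minus-underdog differencing described in the Data section can be sketched in a few lines of Python. This is only an illustration, not the procedure used to build the matchup data sets: the function name matchup_row is hypothetical, and only three of the variables are shown. The values are the Oklahoma and Albany entries from the 2015 data in the appendix.

```python
# Regular season values for three variables, copied from the 2015 data in the appendix.
favorite = {"Wins": 22, "RoadWins": 6, "RPI": 18}    # Oklahoma, 3 seed
underdog = {"Wins": 24, "RoadWins": 12, "RPI": 99}   # Albany, 14 seed


def matchup_row(fav, dog):
    """Return favorite-minus-underdog differences for every variable in fav."""
    return {name: fav[name] - dog[name] for name in fav}


ou_alb = matchup_row(favorite, underdog)
print(ou_alb)
```

The three differences agree with the first three values of the OuAlb observation in the appendix (-2, -6, -81).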

Analysis
Discriminant analysis is a multivariate technique that uses a sample with known class labels to
build a linear discriminant function, which is then used to classify a sample whose labels are
unknown. In this analysis, the 2015 men's NCAA tournament dataset is the known sample used to
classify the teams in the 2016 men's NCAA tournament dataset.
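To make the idea concrete, here is a minimal two-class linear discriminant sketch in plain Python. It is not the SAS procedure used in this analysis: it handles only two predictors, the training values are made up for illustration (loosely styled as RPI and _10PT pairs), and all function names are hypothetical. The priors of 0.42 and 0.58 are the ones used in this analysis. Each class gets the score x'S^-1 m - 0.5 m'S^-1 m + log(prior), where S is the pooled covariance matrix and m the class mean, and an observation is assigned to the class with the larger score.

```python
import math


def mean_vector(rows):
    """Column means of a list of (x1, x2) observations."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def pooled_covariance(groups):
    """Pooled within-class 2x2 covariance matrix over all classes."""
    total = [[0.0, 0.0], [0.0, 0.0]]
    n_obs = 0
    for rows in groups.values():
        m = mean_vector(rows)
        n_obs += len(rows)
        for r in rows:
            d = (r[0] - m[0], r[1] - m[1])
            for a in range(2):
                for b in range(2):
                    total[a][b] += d[a] * d[b]
    dof = n_obs - len(groups)
    return [[total[a][b] / dof for b in range(2)] for a in range(2)]


def invert_2x2(m):
    """Inverse of a 2x2 matrix (assumes a nonzero determinant)."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def fit(groups, priors):
    """Estimate class means and the inverse pooled covariance."""
    cov_inv = invert_2x2(pooled_covariance(groups))
    means = {label: mean_vector(rows) for label, rows in groups.items()}
    return means, cov_inv, priors


def classify(x, model):
    """Assign x to the class with the largest linear discriminant score."""
    means, cov_inv, priors = model
    best_label, best_score = None, -math.inf
    for label, m in means.items():
        w = [cov_inv[0][0] * m[0] + cov_inv[0][1] * m[1],
             cov_inv[1][0] * m[0] + cov_inv[1][1] * m[1]]
        score = (x[0] * w[0] + x[1] * w[1]
                 - 0.5 * (m[0] * w[0] + m[1] * w[1])
                 + math.log(priors[label]))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training sample: class 0 lost in the first round, class 1 won.
training = {0: [(40, 2), (55, 3), (60, 2), (48, 3)],
            1: [(5, 4), (12, 3), (20, 4), (9, 5)]}
model = fit(training, priors={0: 0.42, 1: 0.58})
print(classify((8, 4), model), classify((58, 2), model))
```

With more than two predictors the same scores apply, but the 2x2 inverse would have to be replaced by a general matrix inverse.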
Table 1: Classification Summary for NCAA2016

Table 1 shows the classification summary for the 2016 men's NCAA tournament. We define 0 as a
team which lost in the first round and 1 as a team which won in the first round, with prior
probabilities of 0.42 for class 0 and 0.58 for class 1. The estimated misclassification rate
shown in the table is 37.19%, and the observed misclassification rate is 37.5%, which means the
discriminant analysis correctly classifies 35 out of 56 teams.
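The link between the observed misclassification rate and the count of correctly classified teams is simple arithmetic, sketched below. The label vectors are placeholders that reproduce only the counts reported above (35 correct, 21 incorrect); they are not the actual team-by-team classifications, and the function name is hypothetical.

```python
def misclassification_rate(actual, predicted):
    """Share of observations whose predicted class differs from the actual class."""
    wrong = sum(1 for a, p in zip(actual, predicted) if a != p)
    return wrong / len(actual)


# Placeholder labels that match the reported counts: 35 of 56 teams classified correctly.
actual = [1] * 56
predicted = [1] * 35 + [0] * 21

rate = misclassification_rate(actual, predicted)
print(f"{rate:.1%}")
```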
Using all the variables in the sample to build a linear discriminant function would take a
considerable amount of time to calculate and could lower the precision. The calculation can also
be hard to complete because of the correlations among these variables. In addition, variables
with low discriminant ability may interfere with building a good discriminant function. Thus, it
is necessary to select only the variables with high discriminant ability for the analysis.
There are three main methods for selecting variables: forward selection, backward selection,
and stepwise selection. In this research, stepwise selection is used to produce the following
table.
Table 2: Stepwise Procedure for 2015 men's NCAA tournament

Table 2 shows that the variables RPI, _10PT, BIGS, TOPG, and RWARPI100 are significant and have
high discriminant ability. In this table, the p-value of RPI is much smaller than the others,
which indicates that RPI is the most important factor for discriminating between the classes.
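As a rough illustration of how a stepwise search starts, the sketch below ranks candidate variables by a one-way F statistic between the two classes. With no variables selected yet, the entry criterion at the first step reduces to this univariate F; later steps, which adjust for variables already in the model, and the removal test are not shown. The data and the names var_a and var_b are made up, and this is not the procedure that produced Table 2.

```python
def f_statistic(group_a, group_b):
    """One-way ANOVA F statistic for two groups (1 and n - 2 degrees of freedom).

    Assumes the within-group sum of squares is not zero.
    """
    n_a, n_b = len(group_a), len(group_b)
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    grand = (sum(group_a) + sum(group_b)) / (n_a + n_b)
    ss_between = n_a * (mean_a - grand) ** 2 + n_b * (mean_b - grand) ** 2
    ss_within = (sum((x - mean_a) ** 2 for x in group_a)
                 + sum((x - mean_b) ** 2 for x in group_b))
    return ss_between / (ss_within / (n_a + n_b - 2))


# Made-up values: each variable is split into (class 0 values, class 1 values).
candidates = {
    "var_a": ([1, 2, 3], [4, 5, 6]),   # class means differ clearly
    "var_b": ([1, 5, 3], [2, 4, 3]),   # class means are identical
}

f_values = {name: f_statistic(a, b) for name, (a, b) in candidates.items()}
ranking = sorted(f_values, key=f_values.get, reverse=True)
print(ranking, f_values)
```

The variable with the largest F is the first candidate to enter, provided its p-value is below the chosen entry threshold.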
Table 3: Classification Summary for NCAA2016 after stepwise procedure.

Table 3 shows the classification summary for a discriminant analysis using only those variables
indicated by the stepwise selection. Using prior probabilities of 0.42 and 0.58 for 0 and 1
respectively, both the estimated and observed misclassification rates are 35.7%. Compared to the
estimated misclassification rate of 37.19% before the stepwise procedure, this is an improvement
of 1.49 percentage points.
Table 4: Classification Summary for matchup2016

As discussed in the data section, the second part of the analysis considers matchups as
opposed to individual teams. Table 4 shows the classification summary for matchups in 2016. In
this table, fav means the favorite won the game, that is, the team with the lower-numbered seed
(the team that is supposed to be better) won. On the other hand, up means the game was an upset,
that is, the team that was supposed to be worse won. Using prior probabilities of 0.6 and 0.4
for fav and up, respectively, the estimated misclassification rate shown in the table is 31.67%,
though the observed misclassification rate is 33%. In short, 16 of the 24 matchups are correctly
classified as up or fav.

As in the first part of the analysis, the stepwise selection procedure is used in order to exclude
the variables with low discriminant ability and focus on the variables with high discriminant
ability.

Table 5: Stepwise Procedure for 2016 men's NCAA tournament

Table 5 shows that in the stepwise procedure, the variables RWARPI100, REBDIFF, BIGS,
_10PT, DPPG, and _50PCt are significant and have high discriminant ability. In this table, the
p-value of RWARPI100 is much smaller than the others, which indicates that RWARPI100 might be
the most important factor for discriminating between upsets and non-upsets.
Table 6: Classification Summary for MATCHUP2016 using variables indicated by the stepwise selection

Table 6 shows the classification summary for a discriminant analysis using only those variables
indicated by the stepwise selection in matchups for the 2016 dataset. After the stepwise
procedure, discriminant analysis is used without defining any prior probability. The estimated
misclassification rate shown in the table is 28.33%, while the observed misclassification rate is
29.17%. Compared to the observed misclassification rate before the stepwise procedure, this is
better by about 4 percentage points.

Table 7: Summary of Principal Components

In an attempt to reduce the number of variables and possibly decrease misclassification rate, a
principal component analysis was performed on the six variables found to be most significant by
the stepwise procedure. The first four principal components were selected for discriminant
analysis, accounting for approximately 85% of variation in the data.
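Choosing how many principal components to keep from the cumulative share of variance can be sketched as follows. The eigenvalues below are hypothetical and are not the values behind Table 7; they are picked only so that six components of standardized variables sum to 6 and the first four reach roughly 85%. The function names are also illustrative.

```python
def cumulative_share(eigenvalues):
    """Cumulative proportion of total variance, largest eigenvalue first."""
    total = sum(eigenvalues)
    shares, running = [], 0.0
    for value in sorted(eigenvalues, reverse=True):
        running += value
        shares.append(running / total)
    return shares


def components_needed(eigenvalues, threshold):
    """Smallest number of leading components whose cumulative share meets the threshold."""
    for k, share in enumerate(cumulative_share(eigenvalues), start=1):
        if share >= threshold:
            return k
    return len(eigenvalues)


# Hypothetical eigenvalues for six standardized variables (illustration only).
eigenvalues = [2.1, 1.3, 1.0, 0.7, 0.5, 0.4]
shares = cumulative_share(eigenvalues)
k = components_needed(eigenvalues, threshold=0.80)
print(k, [round(s, 3) for s in shares])
```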
Table 8: Description of Principal Components

Principal Component 1: Heavily weights scoring differential, number of players scoring 10
    points per game, and number of players shooting better than 50%, thus measuring offensive
    ability
Principal Component 2: Heavily weights number of wins against point differential and rebound
    differential
Principal Component 3: Heavily weights rebound differential and number of tall players, which
    combined may be a measure of a team's defensive abilities
Principal Component 4: Heavily weights the negative correlation between player height and
    shooting accuracy and the remaining measurements
Principal Component 5: Heavily weights the number of players who average 10 points per
    game against the remaining measurements
Principal Component 6: Heavily weights point differential against the remaining
    measurements

Table 9: Discriminant Analysis on PC1-PC4

A discriminant analysis using these four principal components yields an estimated
misclassification rate of about 55.33% and an observed misclassification rate of 50%. Seven
favorites were classified as upsets, and five upsets were classified as favorites. Therefore, this
model is inadequate for predicting the 2016 results and further analysis is needed. To
determine whether the principal components can be useful in this analysis, a stepwise selection
based on discriminant ability was conducted on the principal components.
Table 10: Stepwise Selection Summary for the Six Principal Components of the Six Variables Yielded by the Previous
Stepwise Selection

As indicated in Table 10, the second, fourth, and sixth components were found to be most
significant and were selected for further analysis. These three principal components were used
to conduct a discriminant analysis. The results are displayed in Table 11 below.
Table 11: Discriminant Analysis on PC2, PC4, and PC6

Using the second, fourth, and sixth principal components results in a misclassification rate of
only 25%. This is a substantial improvement over the model with four principal components. In
fact, this is the lowest misclassification rate found in this analysis, and we venture to say this
model is useful in predicting the 2016 results.

Table 12: Misclassified Games. Result is what actually happened. _INTO_ is how the discriminant
analysis classified the game. Misclassified games are highlighted.

The model misclassified 3 upsets as favorites and 3 favorites as upsets. Although a total of six
games were misclassified, this model correctly classifies 75% of the game results. Therefore,
it is a substantial improvement over the baseline model of selecting only favorites, which is
only about 50% accurate.
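The baseline of always picking the favorite can be checked directly against the result codes in the matchup2016 data listed in the appendix. The sketch below copies those 24 codes in the order they appear there and compares the share of fav results with the 18 of 24 games the model classified correctly; the variable names are illustrative.

```python
# Result codes of the 24 games in the matchup2016 data, in appendix order.
results = [
    "fav", "fav", "up", "fav", "up", "fav", "fav", "fav",
    "fav", "up", "fav", "up", "up", "up", "fav", "up",
    "fav", "up", "fav", "up", "up", "fav", "up", "up",
]

# Always picking the favorite is right exactly when the result is coded fav.
baseline_accuracy = results.count("fav") / len(results)
model_accuracy = 18 / 24  # games classified correctly by the PC2, PC4, PC6 model

print(f"baseline {baseline_accuracy:.1%}, model {model_accuracy:.1%}")
```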

Plot 1: Each 2016 matchup plotted with principal component 6 against principal component 2

Plot 2: Each 2016 matchup plotted with principal component 6 against principal component 4

Clearly, the sixth principal component separates the games into two groups, upsets and favorites.
This component is therefore the main indicator of who will win, since it primarily measures the
mean difference in scores between opposing teams.
OrstVcu (Oregon State v. Virginia Commonwealth University) and TechBut (Texas Tech v. Butler)
are outliers with respect to the fourth and sixth principal components. This makes sense since
principal component 4 measures the number of players taller than 6'10", and these matchups each
had a value of 5 for this variable, more than any other game. Additionally, the sixth component
measures scoring differential, and these matchups had low values for this variable.
Conclusion
This analysis was intended to determine a method for effectively and correctly selecting winners
for games in the first round of the men's NCAA tournament, and that intent was fulfilled. By
utilizing discriminant analysis, principal component analysis, and the stepwise selection
procedure, and by considering data from the 2015 tournament, the result was correctly selected
for 18 of 24 first round games in the 2016 tournament, a 75% success rate. This method
outperforms what is widely considered the baseline selection process (which merely consists of
selecting only favorites) by more than 20 percentage points, as favorites won 13 of the 24 games
in 2016. Additionally, the variables which matter when selecting the winners are: road wins
against the RPI top 100, rebound differential, number of players who average 10 points per game,
number of players who shoot 50% from the field, number of players on the roster taller than
6'10", and scoring differential. Basically, in order to win a first round game, it helps if a
team is taller than the other team and better at scoring, rebounding, and defending. Several of
these variables are those posed in the hypothesis, and the hypothesis that a successful method
could be found was correct.
Future analyses and studies should include a broader range of variables. Most variables used in
this analysis are traditional offensive statistics. Defensive statistics as well as newer metrics
such as points per possession could increase accuracy. It is also possible that using data from
multiple years instead of one year could increase accuracy, so future analyses should include
data from multiple years. Also, this analysis only considered first round games. In order to
develop a more complete understanding of the NCAA tournament, further studies should
explore methods for picking winners in the second round and beyond.

Appendix
Code:
/*Original Dataset from 2015*/
data ncaa;
input Team$ 1-11 Wins RoadWins RPI WARPI50 WARPI100 RWARPI50 RWARPI100 WL10 BIGS FRESH SENIOR
MEANHT PPG OPPG DPPG _10PT _50PCt _40PCT3 FGPCT FTPCT _3PCT TODIFF TOPG ASTPG REBDIFF TWins;
cards;
Kentucky    34 10 1 6 12 2 8 10 4 4 3 77 74.8 54 20.8 2 4 2 0.468 0.721 0.344 3.4 10.5 14.7 7.4 4
Wisconsin   31 10 4 9 19 2 5 9 2 5 4 78 71.9 56.1 15.8 3 3 2 0.48 0.763 0.357 2.6 7.4 12.7 6 5
Villanova   32 9 2 13 17 4 4 10 1 2 3 77 76.3 60.9 15.4 4 2 2 0.47 0.727 0.389 3.4 10.9 15.9 2.3 1
Duke        29 9 6 11 17 5 6 9 2 4 2 77 80.6 65.6 15 4 2 2 0.502 0.691 0.386 1.3 11.2 15.5 6.2 6
Kansas      26 5 3 12 18 3 3 6 2 6 1 77 71.2 64.7 6.5 2 2 2 0.44 0.72 0.375 -1.1 12.8 13.3 3.6 1
Arizona     31 8 5 9 18 2 5 10 3 4 3 77 70.4 58.6 11.8 4 3 0 0.489 0.699 0.36 2.9 11.2 14.2 8.8 3
Virgina     29 11 7 8 13 5 7 8 2 5 3 77 65.3 50.8 14.5 3 3 1 0.463 0.723 0.361 1.1 9.5 12.9 7.8 1
Gonzaga     32 10 8 6 9 2 3 9 4 6 4 77 79.1 60.9 18.2 4 5 2 0.524 0.691 0.408 1 10.6 16.5 7.2 3
NotreDame   29 7 16 7 11 3 3 8 1 4 2 78 78.8 65.6 13.2 5 5 4 0.51 0.72 0.392 1.9 9.4 15.1 0.1 3
Baylor      24 6 10 8 12 2 3 6 0 4 2 76 69.5 60.3 9.2 4 1 1 0.434 0.67 0.377 -0.3 12.5 14.6 8 0
Oklahoma    22 6 18 12 12 4 4 7 1 4 3 76 71.9 62.8 9.1 4 2 1 0.436 0.735 0.343 1.7 12.1 12.5 1.1 2
IowaSt      25 5 9 13 16 4 4 8 1 3 3 76 78.4 69.3 9.1 5 4 3 0.48 0.696 0.366 1.9 11 16.2 1.2 0
Maryland    27 6 13 5 10 2 3 8 2 5 8 78 69.5 63.2 6.3 3 1 2 0.438 0.757 0.372 -0.8 12 10.9 1.5 1
UNC         24 7 11 7 11 1 2 6 2 3 4 77 77.9 68.4 9.5 4 3 0 0.475 0.7 0.345 -0.3 12.7 17.7 8.2 2
Louisville  24 15 21 3 9 0 2 5 4 8 2 77 69.2 59.5 9.7 4 2 0 0.429 0.66 0.304 2.7 11.7 11.8 3 3
Gtown       21 5 25 4 8 1 2 6 2 5 5 78 70.7 64.6 6.1 2 2 2 0.455 0.702 0.347 0.9 12.6 13.2 2.8 1
WVirginia   23 5 24 7 10 1 2 5 1 3 4 76 73.9 66.8 7.1 2 0 0 0.412 0.66 0.318 6.6 13.1 14.5 3.6 2
Arkansas    26 7 20 4 12 2 4 7 2 3 2 77 78 70.1 7.9 2 2 1 0.447 0.724 0.35 4.2 11.7 16.1 0.4 1
NIowa       30 9 14 3 8 1 2 9 0 4 5 76 65.4 54.3 11.1 1 1 5 0.483 0.726 0.397 0.8 10.5 11.9 2.4 1
Utah        24 6 19 3 7 1 1 6 2 6 3 77 72.1 56.9 15.2 3 3 3 0.485 0.699 0.404 0.2 11.4 14.3 5.1 2
Butler      22 7 31 6 8 2 3 6 0 3 4 76 69.6 61.2 8.4 3 2 1 0.439 0.68 0.358 1.2 11.4 11.5 6.5 1
Xavier      21 5 30 8 11 3 3 6 5 6 3 77 73.6 67.7 5.9 3 2 1 0.473 0.725 0.349 0.9 12.1 16.4 3.6 2
Providence  22 5 22 6 13 2 3 5 2 7 3 78 70.2 65.5 4.7 3 0 1 0.442 0.71 0.31 1.2 11.8 14.1 3.9 0
SMU         27 9 12 5 13 2 5 9 2 0 7 77 69.4 59.8 9.6 5 5 3 0.479 0.705 0.359 0.2 12.6 15.5 6.9 0
Wichita     28 10 17 2 7 0 2 9 2 7 2 76 69.7 55.8 13.9 4 3 0 0.446 0.688 0.357 3.9 9.4 13.9 5.3 2
VCU         26 8 15 6 14 1 4 7 0 5 3 76 72.5 62.5 10 2 2 1 0.42 0.656 0.342 5.5 10.6 12.6 -1.3 0
MichiganSt  23 7 23 4 10 1 4 7 0 4 3 76 71.9 63.4 8.5 3 4 2 0.471 0.633 0.386 -0.6 11.6 17.1 6.8 4
Iowa        21 7 43 4 9 2 5 7 2 4 4 77 69.4 61.9 7.5 2 1 0 0.427 0.745 0.332 1.2 11.3 14.4 4.2 1
Cincinnati  22 6 37 6 8 3 3 6 2 4 2 77 62.4 55.3 7.1 1 2 1 0.453 0.674 0.333 -0.6 12.8 11.9 5.1 1
Oregon      25 5 27 3 9 0 2 8 2 6 4 76 75.6 70.7 4.9 3 2 1 0.461 0.76 0.36 -0.2 11.8 14.2 1.4 1
NCSt        20 5 39 4 10 2 2 6 0 4 3 77 70.4 65.4 5 3 1 1 0.437 0.683 0.345 -0.5 10.5 11.6 3.3 2
SDSU        26 7 26 4 7 0 1 8 2 4 4 77 63.8 53.1 10.7 3 2 1 0.419 0.629 0.32 1.9 11.4 10.6 3.2 1
Purdue      21 4 55 4 9 0 1 6 2 5 2 77 70 64.5 5.5 2 3 0 0.453 0.685 0.335 -1.4 13.1 14.9 4.5 0
OklahomaSt  18 4 49 6 5 3 4 4 2 5 6 76 67.3 62.3 5 3 1 0 0.439 0.726 0.35 1.5 12.3 11.5 -2 0
LSU         22 8 57 3 12 2 5 6 4 4 2 77 73.7 67.7 6 4 3 0 0.456 0.689 0.339 -0.9 14.6 15.7 2.3 0
StJohn      21 4 44 5 9 2 3 7 3 6 5 77 71.2 67.6 3.6 4 1 0 0.441 0.692 0.353 2.5 10.7 12.6 -3.7 0
Indiana     20 3 61 4 8 0 1 4 1 7 0 78 77.5 71.4 6.1 3 2 4 0.466 0.715 0.403 -0.7 11.5 14 3.2 0
OhioState   23 4 41 1 8 0 1 6 3 5 6 78 75.8 62.4 13.4 3 3 3 0.486 0.678 0.372 3.4 11.3 15.4 2.9 1
Georgia     21 8 38 0 9 0 4 6 0 2 3 76 68.3 64.2 4.1 5 1 2 0.435 0.689 0.344 -1.7 12.8 12.6 4.2 0
Davidson    24 9 35 2 7 0 3 9 1 6 2 76 79.9 69 10.9 4 3 4 0.47 0.711 0.397 2.2 9.6 17.2 0.9 0
Texas       20 4 42 3 8 0 2 5 2 4 2 77 67.9 60.4 7.5 3 2 1 0.436 0.725 0.339 -3.7 12.7 13.2 8.3 0
Mississippi 20 8 60 3 9 2 4 5 1 2 5 78 72.6 67.5 5.1 3 3 1 0.426 0.778 0.338 1.3 11.3 12.8 2.8 0
BYU         25 8 36 1 4 1 1 8 5 6 5 78 83.6 72.6 11 4 1 3 0.467 0.768 0.388 1.4 11.8 16.8 4.9 0
BoiseSt     25 8 40 3 4 1 2 8 3 4 6 77 70.8 60.3 10.5 4 3 5 0.445 0.663 0.394 2.4 10.3 12.1 2.5 0
Dayton      25 5 32 1 6 1 1 7 1 3 2 75 68.2 60.9 7.3 3 2 1 0.471 0.711 0.397 2.1 9.6 17.2 -1.1 1
UCLA        20 2 48 2 5 0 1 6 3 5 2 78 72 68 4 5 1 1 0.441 0.676 0.363 0.7 11.9 13.9 3.9 2
Buffalo     23 9 28 0 4 0 1 8 2 5 2 76 75 68.3 6.7 4 2 1 0.435 0.722 0.34 2.3 11.3 13.2 3.1 0
Wofford     28 10 47 1 3 1 1 9 0 4 3 75 67 59.8 7.2 3 4 1 0.458 0.689 0.376 1.9 10.9 13.4 1.4 0
Wyoming     25 5 71 5 5 1 1 6 2 5 6 77 61.7 56 5.7 2 3 0 0.462 0.707 0.323 -0.3 11.2 14.3 0.1 0
SFAustin    29 10 33 0 4 0 2 9 0 3 4 75 79.5 64.6 14.9 2 3 4 0.491 0.734 0.386 3.2 14.1 17.8 5.4 0
Valpo       28 8 50 0 3 0 0 9 1 3 2 77 69.8 59.3 10.5 3 1 1 0.459 0.678 0.378 -0.3 12.5 13.1 7.2 0
Harvard     22 9 52 0 4 0 1 8 0 3 7 77 64.2 57.2 7 3 3 1 0.435 0.713 0.353 0.9 11.8 12.9 3.7 0
UCI         21 6 88 0 3 0 0 7 4 3 3 78 67.9 62.3 5.6 4 3 3 0.461 0.683 0.39 -0.2 11.6 14.2 1.5 0
EWash       26 11 84 0 1 0 1 7 1 5 3 76 80.8 73.6 7.2 4 2 4 0.48 0.723 0.403 1.5 10.8 13.2 0.2 0
Northeaster 23 8 86 0 3 0 1 7 0 3 2 77 68.6 65 3.6 4 4 2 0.486 0.725 0.388 -3.7 13.7 14.3 4.8 0
GeorgiaSt   24 8 53 0 1 0 0 9 1 6 3 75 72 62.2 9.8 3 6 0 0.48 0.727 0.331 4.5 10.7 13.6 -0.6 1
Albany      24 12 99 0 1 0 1 9 1 5 1 76 65.5 60.2 5.3 4 2 1 0.44 0.761 0.36 0.8 11.7 10.4 4.8 0
UAB         19 3 29 1 3 0 0 6 0 6 2 76 68.9 67.7 1.2 1 2 2 0.43 0.741 0.332 -0.3 13.6 14.4 1.6 1
;

/*Data from 2016*/


data ncaa2016;
input Team$ 1-11 Wins RoadWins RPI WARPI50 WARPI100 RWARPI50 RWARPI100 WL10 BIGS FRESH SENIOR
MEANHT PPG OPPG DPPG _10PT _50PCt _40PCT3 FGPCT FTPCT _3PCT TODIFF TOPG ASTPG REBDIFF round1;
cards;
Kansas      30 7 1 15 20 5 6 10 2 3 4 78 81.6 67.7 13.9 4 5 5 0.494 0.706 0.426 0.4 12.6 15.9 5.5 1
Oregon      28 6 2 10 20 2 5 8 2 2 6 76 78.8 69.1 9.7 4 2 1 0.467 0.723 0.383 2.7 11.6 13.6 3.4 1
UNC         28 6 5 5 15 1 4 8 3 2 6 77 82.3 69.6 12.7 4 3 0 0.479 0.738 0.314 2 10.9 18.1 8.6 1
Virginia    26 5 3 9 16 1 3 7 2 4 5 77 70.4 59.7 10.7 3 4 3 0.487 0.754 0.405 2.3 9.4 14.3 3.8 1
Villanova   29 10 4 7 15 3 6 8 1 4 5 77 77 63.7 13.3 4 3 0 0.467 0.777 0.344 2.9 11.1 16.1 1.6 1
Oklahoma    25 6 6 10 14 2 3 6 2 5 5 77 80.4 70.4 10 4 2 4 0.459 0.728 0.426 -0.4 13 14.5 2.1 1
Xavier      27 8 7 6 12 1 3 7 4 3 4 78 81.3 71 10.3 4 3 1 0.452 0.731 0.362 1.4 12.8 16.4 7.5 1
MichiganSt  29 8 12 8 13 0 2 9 1 5 4 77 79.8 63.4 16.4 3 4 5 0.484 0.73 0.434 -2.3 11.8 20.6 11.4 0
MiamiFL     25 6 9 7 16 1 3 7 3 3 4 77 75.6 66.8 8.8 4 4 2 0.477 0.751 0.366 0.6 10.7 12.8 2 1
TexasA&M    26 5 18 5 14 0 1 8 2 7 5 76 75.9 65.5 10.4 3 2 2 0.449 0.673 0.35 2.4 11.8 16.9 4.4 1
WVirginia   26 7 10 9 13 3 5 7 1 4 3 77 79.2 66.6 12.6 2 2 2 0.452 0.67 0.329 3.6 14 14.8 8.6 0
Utah        25 5 8 9 18 1 3 9 2 5 4 77 77.6 69.1 8.5 4 3 4 0.489 0.714 0.363 -2 12.3 15.6 4.6 1
Cal         23 4 16 7 14 0 2 8 3 3 3 78 75.1 67 8.1 5 2 2 0.461 0.656 0.369 -2.6 12.2 13 6.8 0
Duke        23 5 20 6 11 1 2 6 3 7 2 79 81.5 72.1 9.4 5 2 3 0.459 0.722 0.387 2.1 9.9 13.5 -0.6 1
Kentucky    26 4 11 3 15 0 3 8 3 7 1 77 79.7 68.3 11.4 3 3 2 0.479 0.683 0.37 1.3 11.1 14.4 5.2 1
IowaSt      21 4 23 8 12 1 2 5 2 4 4 78 81.8 75 6.8 7 3 2 0.502 0.706 0.38 0.8 11.6 16.5 -0.5 1
Maryland    24 4 14 5 8 1 2 5 3 4 4 77 76.1 66.3 9.8 5 3 2 0.488 0.76 0.374 -1.6 12.9 14.1 2.6 1
Baylor      22 6 25 5 10 3 4 5 1 3 5 76 77.2 69.3 7.9 4 3 0 0.467 0.727 0.367 0.9 12.9 17.8 7.6 0
Indiana     25 6 24 6 10 1 2 7 1 5 5 77 82.3 68.9 13.4 4 4 7 0.501 0.723 0.415 -0.5 13.7 16 6.5 1
Purdue      26 5 15 4 10 1 2 7 3 5 5 77 77.7 64.6 13.1 4 3 3 0.472 0.742 0.368 -2.7 11.9 17.6 10.5 0
Arizona     25 5 26 4 13 1 4 7 3 5 5 77 81.2 69 12.2 3 4 1 0.482 0.723 0.383 -1.6 12.8 14.5 9.2 0
Texas       20 5 27 7 13 2 4 5 4 3 6 77 71.3 68.1 3.2 3 3 0 0.432 0.665 0.339 2.1 10.4 11.5 -1.3 0
NotreDame   21 5 31 5 8 1 1 6 2 4 2 78 75.7 70.6 5.1 5 2 1 0.471 0.745 0.369 -0.3 9.7 13.5 1.8 1
SetonHall   25 7 19 6 10 1 2 8 0 4 1 77 74.8 67.8 7 4 3 0 0.45 0.666 0.353 0 13.8 13.5 3.6 0
Iowa        21 6 29 5 8 2 3 4 1 7 5 78 78.1 68.7 9.4 2 1 4 0.45 0.742 0.382 2.8 10.4 16 0.8 1
OregonSt    19 4 33 4 11 0 2 6 6 7 6 78 72.1 70 2.1 3 1 0 0.441 0.671 0.37 2.2 11.5 13.5 -2.5 0
Wisconsin   20 6 43 5 10 2 3 7 1 8 1 77 68.9 64.6 4.3 3 1 0 0.43 0.707 0.358 0.9 11 11.6 3.1 1
Dayton      25 8 22 3 9 1 2 6 1 7 2 76 73.2 65.8 7.4 4 3 1 0.46 0.673 0.347 -0.4 13.1 14.7 4.6 0
Colorado    21 4 35 4 9 0 1 5 1 2 6 76 76 70.7 5.3 3 1 3 0.425 0.738 0.392 -2.3 13.4 13.8 8.6 0
StJoe       27 10 21 4 9 0 2 7 1 4 4 77 77.6 69.9 7.7 3 2 0 0.454 0.713 0.327 1.3 10.1 15.1 2.8 1
USC         21 3 51 5 9 0 1 3 4 2 1 79 80.8 74.8 6 5 2 3 0.458 0.678 0.385 0.1 12.2 14.9 1.7 0
TexasTech   19 3 36 6 8 1 1 6 3 4 2 77 72.4 69.8 2.6 4 3 2 0.447 0.746 0.344 0.8 12 13 0.3 0
UConn       24 6 32 3 8 1 2 7 4 3 5 78 73.4 63.1 10.3 4 2 0 0.459 0.787 0.362 1.3 11.1 13.9 1.7 1
Cincinnati  22 6 48 4 7 2 2 6 1 3 4 77 73.2 62.9 10.3 4 3 2 0.428 0.704 0.345 2.6 11.1 15.5 4.7 0
Providence  23 7 40 2 8 1 3 5 0 4 0 77 74 69.7 4.3 3 0 0 0.422 0.727 0.321 3.2 11.5 15.9 -0.9 1
Butler      21 5 56 4 6 2 2 7 1 2 5 77 80.6 71.2 9.4 4 2 2 0.466 0.731 0.387 2.9 10.2 14.4 3.1 1
Temple      21 7 59 5 7 2 3 7 2 4 5 78 68.7 67.4 1.3 3 0 1 0.405 0.684 0.34 1.6 9.2 13.5 -1.1 0
VCU         24 7 37 2 8 1 4 7 1 4 2 76 77.2 67.3 9.9 4 3 2 0.45 0.69 0.356 3.6 11.4 14.4 2.9 1
Pittsburgh  20 3 53 2 9 1 3 4 1 3 5 78 76 67.9 8.1 3 3 2 0.46 0.754 0.348 -0.2 11.7 16.9 7.2 0
Syracuse    19 3 72 5 8 1 1 5 1 8 5 76 70.2 65.7 4.5 4 2 2 0.427 0.683 0.361 1.5 12.1 14.1 -1 1
Wichita     23 9 47 1 4 0 2 7 2 5 5 77 73.1 59 14.1 2 3 1 0.432 0.693 0.331 5.6 9.8 14.3 4.4 1
NIowa       21 6 70 4 8 1 2 9 2 4 3 77 68 62.9 5.1 4 1 2 0.457 0.753 0.375 1.9 9.8 11.9 -4.1 1
Michigan    23 5 57 4 4 0 0 5 2 3 2 78 74.3 67.5 6.8 4 4 3 0.466 0.737 0.384 2.4 9.8 15 -1.1 0
Gonzaga     25 9 45 2 5 0 1 8 4 3 4 77 79.7 66.2 13.5 4 2 2 0.486 0.76 0.378 -1.2 11.3 13.9 7.4 1
SDakotaSt   24 8 28 0 3 0 0 8 1 5 5 76 76.3 67.8 8.5 3 3 2 0.451 0.739 0.357 0.4 11.8 11.3 5.2 0
Yale        21 9 44 1 1 0 0 9 1 6 5 77 75.2 63.1 12.1 3 3 0 0.471 0.663 0.374 -1 13.4 15.2 10.9 1
Chattanooga 27 11 50 1 5 1 3 8 2 4 5 77 75.8 66.6 9.2 4 2 2 0.456 0.733 0.364 1.7 12.4 14.6 3.2 0
LittleRock  27 12 42 1 4 1 2 8 2 1 5 76 70.9 59.6 11.3 3 2 3 0.458 0.734 0.388 3.5 10.5 13.1 -0.1 1
Hawaii      27 7 80 0 7 0 2 8 2 4 3 76 77.6 66.5 11.1 3 3 1 0.462 0.681 0.327 1.6 13.2 15.8 4.3 1
UNCWilmingt 23 8 46 0 4 0 2 8 1 4 3 77 79.2 71.4 7.8 4 4 1 0.456 0.702 0.336 3.7 11.4 13.4 0.8 0
StonyBrook  24 11 60 1 2 0 0 8 2 3 3 77 76.8 63.4 13.4 4 2 3 0.476 0.672 0.372 1.2 11.4 16.6 7.7 0
Iona        22 9 77 0 2 0 1 9 1 4 6 76 79.6 73.7 5.9 4 2 3 0.456 0.712 0.372 1.2 12.8 16.8 -0.5 0
Buffalo     19 5 91 1 2 0 0 6 2 7 3 76 77.6 75.1 2.5 4 0 1 0.438 0.71 0.337 -0.3 13.7 12.7 2.5 0
UWGreenBay  21 8 112 2 2 0 0 8 1 2 2 77 84.2 79.7 4.5 3 2 2 0.448 0.659 0.35 4.7 12.1 13.9 -2 0
SFAustin    23 9 61 0 3 0 1 10 1 7 5 76 80.7 63.2 17.5 3 3 1 0.484 0.73 0.368 6.5 12.4 18.9 2.7 1
FresnoSt    23 7 66 2 2 0 0 9 1 4 3 77 75.3 70.4 4.9 3 3 2 0.434 0.692 0.342 4.3 10.7 13.5 0.5 0
;

proc print data=ncaa2016 noobs;
run;
/*Remove teams that lost in the play in games and code either first round win or loss*/
data ncaa3;
set ncaa;
if team='BoiseSt' then delete;
if team='BYU' then delete;
round1=.;
if (twins>0) then round1=1;
if (twins=0) then round1=0;
run;
proc print data=ncaa3;
run;

/*Examine correlation and covariance matrices*/


proc corr cov data=ncaa3;
var Wins -- REBDIFF;
run;
/*Code for Table 1*/
/*Discriminant Analysis using full data set*/
PROC DISCRIM data=ncaa3 testdata=ncaa2016 testout=ncaawins CROSSVALIDATE MAHALANOBIS;
CLASS round1;
VAR wins--rebdiff;
priors '0'=.42 '1'=.58;
RUN;
proc print data=ncaawins;
var team round1 _into_;
Run;
/*Table 2*/
/*Stepwise selection to determine which variables have high discriminant ability*/
PROC STEPDISC DATA=ncaa3 METHOD=Stepwise sle=.4 sls=.2;
CLASS round1;
VAR wins--rebdiff;
run;
/*Table 3*/
/*Discriminant Analysis using only those variables determined to be significant by the stepdisc
procedure*/
proc discrim data=ncaa3 testdata=ncaa2016 testout=ncaawins2 crossvalidate mahalanobis;
class round1;
var rpi _10pt bigs topg rwarpi100;
priors '0'=.42 '1'=.58;
run;
proc print data=ncaawins2;
var team round1 _into_;
run;

/*compute difference vectors*/


data matchups;
input game$ Wins RoadWins RPI WARPI50 WARPI100 RWARPI50 RWARPI100 WL10 BIGS FRESH SENIOR MEANHT
PPG OPPG DPPG _10PT _50PCt _40PCT3 FGPCT FTPCT _3PCT TODIFF TOPG ASTPG REBDIFF result$;
cards;
NdNe 6 -1 -70 7 8 3 2 1 1 1 0 1 10.2 0.6 9.6 1 1 2 0.024 -0.005 0.004 5.6 -4.3 0.8 -4.7 fav
BayGast 0 -2 -43 8 11 2 3 -3 -1 -2 -1 1 -2.5 -1.9 -0.6 1 -5 1 -0.046 -0.057 0.046 -4.8 1.8 1 8.6 up
OuAlb -2 -6 -81 12 11 4 3 -2 0 -1 2 0 6.4 2.6 3.8 0 0 0 -0.004 -0.026 -0.017 0.9 0.4 2.1 -3.7 fav
IastDav 6 2 -20 12 13 4 4 2 1 -3 1 0 9.5 1.6 7.9 4 2 1 0.05 -0.045 0.034 2.2 -2.6 1.8 -0.4 up
MdValpo -1 -2 -37 5 7 2 3 -1 1 2 6 1 -0.3 3.9 -4.2 0 0 1 -0.021 0.079 -0.006 -0.5 -0.5 -2.2 -5.7 fav
UncHar 2 -2 -41 7 7 1 1 -2 2 0 -3 0 13.7 11.2 2.5 1 0 -1 0.04 -0.013 -0.008 -1.2 0.9 4.8 4.5 fav
LouUci 3 9 -67 3 6 0 2 -2 0 5 -1 -1 1.3 -2.8 4.1 0 -1 -3 -0.032 -0.023 -0.086 2.9 0.1 -2.4 1.5 fav
GtownEwash -5 -6 -59 4 7 1 1 -1 1 0 2 2 -10.1 -9 -1.1 -2 0 -2 -0.025 -0.021 -0.056 -0.6 1.8 0 2.6 fav
WvaBuff 0 -4 -4 7 6 1 1 -3 -1 -2 2 0 -1.1 -1.5 0.4 -2 -2 -1 -0.023 -0.062 -0.022 4.3 1.8 1.3 0.5 fav
ArkWoff -2 -3 -27 3 9 1 3 -2 2 -1 -1 2 11 10.3 0.7 -1 -2 0 -0.011 0.035 -0.026 2.3 0.8 2.7 -1 fav
NiaWyo 5 4 -57 -2 3 0 1 3 -2 -1 -1 -1 3.7 -1.7 5.4 -1 -2 5 0.021 0.019 0.074 1.1 -0.7 -2.4 2.3 fav
UtSfa -5 -4 -14 3 3 1 -1 -3 2 3 -1 2 -7.4 -7.7 0.3 1 0 -1 -0.006 -0.035 0.018 -3 -2.7 -3.5 -0.3 fav
ButTex 2 3 -11 3 0 2 1 1 -2 -1 2 -1 1.7 0.8 0.9 0 0 0 0.003 -0.045 0.019 4.9 -1.3 -1.7 -1.8 fav
XavMiss 1 -3 -30 5 2 1 -1 1 4 4 -2 -1 1 0.2 0.8 0 -1 0 0.047 -0.053 0.011 -0.4 0.8 3.6 0.8 fav
ProvDay -3 0 -10 5 7 1 2 -2 1 4 1 3 2 4.6 -2.6 0 -2 0 -0.029 -0.001 -0.087 -0.9 2.2 -3.1 5 up
SmuUcla 7 7 -36 3 8 2 4 3 -1 -5 5 -1 -2.6 -8.2 5.6 0 4 2 0.038 0.029 -0.004 -0.5 0.7 1.6 3 up
WichIu 8 7 -44 -2 -1 0 1 5 1 0 2 -2 -7.8 -15.6 7.8 1 1 -4 -0.02 -0.027 -0.046 4.6 -2.1 -0.1 2.1 fav
ROW18 3 4 -26 5 6 1 3 1 -3 0 -3 -2 -3.3 0.1 -3.4 -1 -1 -2 -0.066 -0.022 -0.03 2.1 -0.7 -2.8 -4.2 up
MsuGa 2 -1 -15 4 1 1 0 1 0 2 0 0 3.6 -0.8 4.4 -2 3 0 0.036 -0.056 0.042 1.1 -1.2 4.5 2.6 fav
IaDavid -3 -2 8 2 2 2 2 -2 1 -2 2 1 -10.5 -7.1 -3.4 -2 -2 -4 -0.043 0.034 -0.065 -1 1.7 -2.8 3.3 fav
CincPur 1 2 -18 2 -1 3 2 0 0 -1 0 0 -7.6 -9.2 1.6 -1 -1 1 0 -0.011 -0.002 0.8 -0.3 -3 0.6 fav
OreOkst 7 1 -22 -3 4 -3 -2 4 0 1 -2 0 8.3 8.4 -0.1 0 1 1 0.022 0.034 0.01 -1.7 -0.5 2.7 3.4 fav
NcstLsu -2 -3 -18 1 -2 0 -3 0 -4 0 1 0 -3.3 -2.3 -1 -1 -2 1 -0.019 -0.006 0.006 0.4 -4.1 -4.1 1 fav
SdsuStj 5 3 -18 -1 -2 -2 -2 1 -1 -2 -1 0 -7.4 -14.5 7.1 -1 1 1 -0.022 -0.063 -0.033 -0.6 0.7 -2 6.9 fav
;
data matchup2016;
input game$ Wins RoadWins RPI WARPI50 WARPI100 RWARPI50 RWARPI100 WL10 BIGS FRESH SENIOR MEANHT
PPG OPPG DPPG _10PT _50PCt _40PCT3 FGPCT FTPCT _3PCT TODIFF TOPG ASTPG REBDIFF result$;
cards;
MiaBuff 6 1 -82 6 14 1 3 1 1 -4 1 1 -2 -8.3 6.3 0 4 1 0.039 0.041 0.029 0.9 -3 0.1 -0.5 fav
TxamGb 5 -3 -94 3 12 0 1 0 1 5 3 -1 -8.3 -14.2 5.9 0 0 0 0.001 0.014 0 -2.3 -0.3 3 6.4 fav
WvaSfa 3 -2 -51 9 10 3 4 -3 0 -3 -2 1 -1.5 3.4 -4.9 -1 -1 1 -0.032 -0.06 -0.039 -2.9 1.6 -4.1 5.9
up
UtaFres 2 -2 -58 7 16 1 3 0 1 1 1 0 2.3 -1.3 3.6 1 0 2 0.055 0.022 0.021 -6.3 1.6 2.1 4.1 fav
CalHaw -4 -3 -64 7 7 0 0 0 1 -1 0 2 -2.5 0.5 -3 2 -1 1 -0.001 -0.025 0.042 -4.2 -1 -2.8 2.5 up
DukWil 0 -3 -26 6 7 1 0 -2 2 3 -1 2 2.3 0.7 1.6 1 -2 2 0.003 0.02 0.051 -1.6 -1.5 0.1 -1.4 fav
KySton 2 -7 -49 2 13 0 3 0 1 4 -2 0 2.9 4.9 -2 -1 1 -1 0.003 0.011 -0.002 0.1 -0.3 -2.2 -2.5 fav
IastIona -1 -5 -54 8 10 1 1 -4 1 0 -2 2 2.2 1.3 0.9 3 1 -1 0.046 -0.006 0.008 -0.4 -1.2 -0.3 0
fav
MdSdst 0 -4 -14 5 5 1 2 -3 2 -1 -1 1 -0.2 -1.5 1.3 2 0 0 0.037 0.021 0.017 -2 1.1 2.8 -2.6 fav
BayYale 1 -3 -19 4 9 3 4 -4 0 -3 0 -1 2 6.2 -4.2 1 0 0 -0.004 0.064 -0.007 1.9 -0.5 2.6 -3.3 up
IuChat -2 -5 -26 5 5 0 -1 -1 -1 1 0 0 6.5 2.3 4.2 0 2 5 0.045 -0.01 0.051 -2.2 1.3 1.4 3.3 fav
PurLr -1 -7 -27 3 6 0 0 -1 1 4 0 1 6.8 5 1.8 1 1 0 0.014 0.008 -0.02 -6.2 1.4 4.5 10.6 up
AzWich 2 -4 -21 3 9 1 2 0 1 0 0 0 8.1 10 -1.9 1 1 0 0.05 0.03 0.052 -7.2 3 0.2 4.8 up
TxNia -1 -1 -43 3 5 1 2 -4 2 -1 3 0 3.3 5.2 -1.9 -1 2 -2 -0.025 -0.088 -0.036 0.2 0.6 -0.4 2.8 up
NdMich -2 0 -26 1 4 1 1 1 0 1 0 0 1.4 3.1 -1.7 1 -2 -2 0.005 0.008 -0.015 -2.7 -0.1 -1.5 2.9 fav
SetGon 0 -2 -26 4 5 1 1 0 -4 1 -3 0 -4.9 1.6 -6.5 0 1 -2 -0.036 -0.094 -0.025 1.2 2.5 -0.4 -3.8 up
IaTemp 0 -1 -30 0 1 0 0 -3 -1 3 0 0 9.4 1.3 8.1 -1 1 3 0.045 0.058 0.042 1.2 1.2 2.5 1.9 fav
OrstVcu -5 -3 -4 2 3 -1 -2 -1 5 3 4 2 -5.1 2.7 -7.8 -1 -2 -2 -0.009 -0.019 0.014 -1.4 0.1 -0.9 -5.4 up
WisPit 0 3 -10 3 1 1 0 3 0 5 -4 -1 -7.1 -3.3 -3.8 0 -2 -2 -0.03 -0.047 0.01 1.1 -0.7 -5.3 -4.1 fav
DaySyr 6 5 -50 -2 1 0 1 1 0 -1 -3 0 3 0.1 2.9 0 1 -1 0.033 -0.01 -0.014 -1.9 1 0.6 5.6 up
CoUcon -3 -2 3 1 1 -1 -1 -2 -3 -1 1 -2 2.6 7.6 -5 -1 -1 3 -0.034 -0.049 0.03 -3.6 2.3 -0.1 6.9 up
StjoCin 5 4 -27 0 2 -2 0 1 0 1 0 0 4.4 7 -2.6 -1 -1 -2 0.026 0.009 -0.018 -1.3 -1 -0.4 -1.9 fav
UscProv -2 -4 11 3 1 -1 -2 -2 4 -2 1 2 6.8 5.1 1.7 2 2 3 0.036 -0.049 0.064 -3.1 0.7 -1 2.6 up
TechBut -2 -1 -23 0 5 -2 0 -1 5 5 1 1 -8.5 -1.2 -7.3 -1 -1 -2 -0.025 -0.06 -0.017 -0.7 1.3 -0.9 -5.6 up
;

proc print data=matchups;
run;
proc print data=matchup2016;
run;
/*Table 4*/
/*Discriminant analysis for sorting 2016 matchups into upsets and non-upsets using all
variables*/
PROC DISCRIM data=matchups testdata=matchup2016 Testout=_20161 CROSSVALIDATE MAHALANOBIS;

CLASS result;
VAR wins--rebdiff;
priors 'up'=.4 'fav'=.6;
RUN;
proc print data=_20161;
var game result _into_;
run;
/*Table 5*/
/*Select Variables that are significant*/
PROC STEPDISC DATA=matchups METHOD=Stepwise sle=.4 sls=.2;
CLASS result;
VAR Wins -- REBDIFF;
run;

/*discriminant analysis using variables determined by stepwise method*/


/**********************************************************************************/
/*Best So Far 71%*/
proc discrim data=matchups testdata=matchup2016 testout=steptest;
class result;
var rwarpi100 rebdiff bigs _10pt dppg _50pct;
run;
proc print data=steptest;
var game result _into_;
run;
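/*Hedged sketch, not part of the original program: cross-tabulate the actual result against
the predicted class _into_ so the success rate can be read from the diagonal cells.
Assumes the steptest data set created above.*/
proc freq data=steptest;
tables result*_into_ / norow nocol;
run;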
/*Table 6*/
/*stepwise method with priors*/
proc discrim data=matchups testdata=matchup2016 testout=steptest2;
class result;
var rwarpi100 rebdiff bigs _10pt dppg _50pct;
priors 'up'=.4 'fav'=.6;
run;
proc print data=steptest2;
var game result _into_;
run;

/*Table 7*/
/*Find Principal components for variables determined in proc stepdisc*/
proc princomp data=matchups out=matchstepprin;
var rwarpi100 rebdiff bigs _10pt dppg _50pct;
run;
data matchup20162;
set matchup2016;
prin1= .264007*rwarpi100-.202955*rebdiff+.284637*bigs+.522868*_10pt+.534842*dppg+.498644*_50pct;
prin2=-.642638*rwarpi100+.552524*rebdiff-.124618*bigs-.097433*_10pt+.406562*dppg+.302355*_50pct;
prin3=.009637*rwarpi100+.52538*rebdiff+.67893*bigs+.32405*_10pt-.135052*dppg-.37375*_50pct;
prin4=.420703*rwarpi100+.405756*rebdiff-.611416*bigs+.335003*_10pt+.251843*dppg-.329985*_50pct;
prin5=.582313*rwarpi100+.385183*rebdiff+.175238*bigs-.612492*_10pt+.065988*dppg+.319898*_50pct;
prin6=.033831*rwarpi100+.253966*rebdiff-.194866*bigs+.353170*_10pt-.680171*dppg+.555898*_50pct;
run;
proc print data=matchup20162;
run;
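/*Hedged alternative, not part of the original analysis: PROC PRINCOMP uses the correlation
matrix by default, so the Prin1-Prin6 scores in matchstepprin come from standardized
variables, whereas the data step above applies the eigenvectors to the raw 2016 values.
A minimal sketch that scores the 2016 data on the same scale with PROC SCORE; the data set
names pcstats and matchup2016pc are hypothetical.*/
proc princomp data=matchups outstat=pcstats noprint;
var rwarpi100 rebdiff bigs _10pt dppg _50pct;
run;
proc score data=matchup2016 score=pcstats out=matchup2016pc;
var rwarpi100 rebdiff bigs _10pt dppg _50pct;
run;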

/*Table 9*/
/*Discriminant Analysis using principal components from previous princomp procedure*/
PROC DISCRIM data=matchstepprin testdata=matchup20162 Testout=_20162 MAHALANOBIS;
CLASS result;
VAR prin1--prin4;
priors 'up'=.3 'fav'=.7;
RUN;

proc print data=_20162;
var game result _into_;
run;
/*Table 10*/
/*Really low success rate for the principal components (50%), which is curious. Run stepwise
analysis on princomps*/
PROC STEPDISC DATA=matchstepprin METHOD=Stepwise sle=.4 sls=.2;
CLASS result;
VAR prin1--prin6;
run;
/*Table 11*/
/*Stepwise analysis yielded principal components 2, 4, and 6 as significant*/
/*75% success with 3 variables!*/
PROC DISCRIM data=matchstepprin testdata=matchup20162 Testout=_20164 MAHALANOBIS;
CLASS result;
VAR prin2 prin4 prin6;
priors 'up'=.3 'fav'=.7;
RUN;

/*Table 12*/
proc print data=_20164;
var game result _into_;
run;
/*Plots of the original data and sorted data with respect to the three principal components*/
Proc GPlot Data=matchstepprin;
Plot Prin2*Prin4=result Prin2*Prin6=result Prin4*Prin6=result/ VAxis=Axis1 HAxis=Axis2 Frame;
Axis1 Order=(-3 To 3 By 1);
Axis2 Order=(-3 To 3 By 1);
Symbol1 C=Black V=Dot I=None PointLabel=("#game");
Symbol2 C=Red V=Dot I=None PointLabel=("#game");
Run;
/*Plots 1 and 2*/
Proc GPlot Data=matchup20162;
Plot Prin2*Prin4=result Prin2*Prin6=result Prin4*Prin6=result/ VAxis=Axis1 HAxis=Axis2 Frame;
Axis1 Order=(-10 To 10 By 1);
Axis2 Order=(-7 To 7 By 1);
Symbol1 C=Black V=Dot I=None PointLabel=("#game");
Symbol2 C=Red V=triangle I=None PointLabel=("#game");
Run;
