
Acct 7397 Data Analytics 1 & 2

Work with competence and integrity: In the end, we seek evidence based
decision making - not decision based evidence making.
My Summer Vacation
current trends

...the transformation of Goldman Sachs, and increasingly other Wall Street firms, that began with the rise in computerized trading but has accelerated over the past five years, moving into more fields of finance that humans once dominated...

In addition to back-office workers, machines are replacing a lot of highly paid people, too.

Average compensation for staff in sales, trading, and research at the 12 largest global investment banks, of which Goldman is one, is $500,000 in salary and bonus.

For the highly paid who remain, the pay of the average managing director at Goldman will probably get even bigger, as there are fewer lower-level people to share the profits with, he says.

His expertise makes him suited to the task of CFO, a role more typically held by accountants: "Everything we do is underpinned by math and a lot of software."

Goldman's new consumer lending platform is entirely run by software, with no human intervention.
current trends

If machines can do accounting, can they do auditing and tax?


current trends

EY
What's the likelihood that the machines will replace accounting and audit work? If they can do accounting work, what can't they do?

Answer: We don't know yet, but so far, there seems to be no ceiling.

2020 will be a different world.


...but as jobs are eliminated, others are created.

Professional Competency (EY)
CPA skills are not enough anymore, and the workforce is becoming more and more competitive.

Professional Opportunity
The job market is changing rapidly and most of the lower-level admin jobs are disappearing; at the same time, analytics jobs are exploding.

McKinsey
Analytics job market today - trends
[Chart: growth in analytics roles; the Analytics Lead / Business Translator role is our target, and this position will increase faster than the others]

...Analytics job market today
Scope and Direction of Analytics
[Timeline figure]
Major disruptions: 2000 Internet, 2010 Mobility, 2020 Artificial Intelligence
Eras: 1996 The Data Warehouse Toolkit (Ralph Kimball); 2000-2010 Business Intelligence / Big Data; 2015- Learning Systems
Tools: IDEA, ACL; Visualization (Tableau, Power BI); Data Warehouses (SSAS); ETL - Extract, Transform, Load (SSIS, Informatica)
Scope of Data Analytics Courses
[Same timeline as above, annotated for course coverage with the labels "Tomorrow's Analyst Tools", "Covered in multiple courses", "Obsolete", and "IT Focused (re: DISC data track)" applied across IDEA / ACL, Visualization (Tableau, Power BI), Data Warehouses (SSAS), and ETL - Extract, Transform, Load (SSIS, Informatica)]
Learning Systems
Two Broad Applications: Deep Learning and Process Automation

Deep Learning Thought Exercise:


You're auditing a large manufacturing company. You're doing preliminary analysis, and you can see that the operating margin has decreased. Possible explanations?

Revenue: Change of volume or price? product mix? product technology and function? markets?
customers? new products, deprecated products? distribution channels? capacity, competitors, perceived
customer value, delivery costs

Costs: Change in capacity, transportation costs, inventory turn, manufacturing strategy / logistics,
materials market, procurement strategy, suppliers, product design and BOM

How many potential dimensions for analysis?

How many interrelationships?

Can you analyze this with a spreadsheet or visualization tool?

If you're in charge of the audit, do you think you need to understand this? What about managers in the company?

Can the machines analyze this? If machines can identify hidden drivers, do you think the client would
consider that valuable?
Learning Systems
Two Broad Applications: Deep Learning and Process Automation

Process Automation Thought Exercise:


You're auditing a large manufacturing company and you need to vouch revenues.

Data: What data needs to be considered? Do you need the entire Order-to-Cash process? Contracts -
T&Cs, Sales Order transactions, Shipping Transactions, Customs transactions, Payments.

How extensive and complex is that data? Where is it? How do you match and what about partial
matching? Will you need to read descriptions and documents? Make judgements?

Can machines do this work? Better than people?


Prerequisite Assumptions DA2
R Language (working knowledge or DA1)
Data Acquisition and Description
Analysis of Variance / Covariance, Correlation, Principal Components
Basic Calculus and Linear Algebra (cheat sheets on Blackboard)
Basic Statistics (cheat sheets on Blackboard)

Attitude
It's a competitive world. It takes hundreds of things done right to get a promotion, and one thing done wrong to get fired. Take your job seriously (and this class is your job right now):
Come to class prepared (there will be pop quizzes)
Apply what you learn: ask yourself "what if" and test your knowledge
Take responsibility: this is graduate school
Professor
Ellen Terry

http://econolytics.org

ewterry@bauer.uh.edu
MH 360K
713-743-4820

Background:

JP Morgan - Vice President Data Science


General Electric - Director Planning and Programs
Microsoft Corporation - Industry Solution Architect
+ Research at the Santa Fe Institute and United Nations
Deloitte => Polaris Consulting - Principal Consultant
Syllabus DA2 (tentative)
Introduction - Statistical Learning Theory
Ch 1 (Intro) & 2 (statistical learning) ISL + Class Material
Regression
Ch 3 ISL (linear regression) + Class Material
Ch 6 ISL (linear model selection and regularization) + Class Material
Ch 7 ISL (moving beyond linearity) + Class Material
Classification
Ch 4 ISL (classification) + Class Material
Support Vector Machines
Ch 9 ISL (support vector machines) + Class Material
Resampling
Ch 5 ISL (resampling methods) + Class Material
Projects
We'll learn in R Studio and do projects in R Studio; we may use AML (TBD depending on project scope and budget).
Data Analytics 2
Grading:
1 Mid-Term Exam - MC + Problems (R Code files) (40%)
Project Review - Team Score %*% Ranking by Leader (30%)
n Pop Quizzes (20%) - MC and/or Problems drawn from ISL and class material
Do not miss class and don't be late (no makeups)!
n Homework assignments (10%)
Resources:
Required: Introduction to Statistical Learning (James, Witten, Hastie, Tibshirani)
free download: http://www-bcf.usc.edu/~gareth/ISL/ (exam questions will be pulled from this book, whether they're covered in class or not)
R Studio (on your computer or Lab Server)
Azure Machine Learning (team allocations; you can sign up with an individual account for free, with limits)
Optional Reads:
Elements of Statistical Learning (Hastie, Tibshirani, Friedman)
All of Statistics (Larry Wasserman)
All of Nonparametric Statistics (Larry Wasserman)
Statistical Learning Body of Knowledge

The body of knowledge is VAST, COMPLEX, and growing at a BREATHTAKING PACE. But that's too bad; you have to know it anyway. You can learn as you go, but be aware of the implications: not doing the research and applying best practices is incompetence, which is unethical and introduces legal exposure to you and your firm.

Analytics Leads (business translators) take responsibility for projects, which means you have to know enough to set direction, garner respect (respect is earned through competency, not title), make decisions, and fill gaps (resources are scarce and you'll have to pick up the slack; you're still responsible to deliver on time!).
Modeling

[Diagram: Model, Parameters, Hyperparameters, and Data - all interconnected]

Modeling is iterative and intuitive - all the elements are interconnected in a complex (and sometimes perplexing) network. Discovery and changes to models, parameters, hyperparameters, and data impact all the elements, and finding the sweet spot is both art and science. That's why the modeling process is called an experiment. Good data scientists are algorithm whisperers.

And that's why we spend a lot of time on theory and intuition.


Modeling

Model Development and Model Evaluation

In search of f̂

Why estimate f?
Prediction
Estimation of a value ŷ (note: prediction is not just about the future; time is just another dimension - e.g., you might estimate a past ŷ to compare with actual y in assurance)
Inference
Description of the underlying data and relationships

How do we estimate f?
Parametric Methods. Fit f̂ to the data based on an assumption about the form of f, using statistical methods to determine the model parameters
Non-Parametric Methods. Develop the form of f̂ based on the data (within broad groups: regression (continuous y), classification (discrete y), and subgroups (e.g., support vector machines, decision trees, regression))
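A quick R sketch of the two approaches (my illustration, using the mtcars dataset that ships with R, not an example from ISL): lm() assumes a form for f and estimates its parameters; loess() lets the data determine the form locally.

# parametric: assume f is linear in hp, then estimate the parameters
fit_param <- lm(mpg ~ hp, data = mtcars)

# non-parametric: no fixed form for f; a local fit is built from the data
fit_nonparam <- loess(mpg ~ hp, data = mtcars)

new_x <- data.frame(hp = c(100, 200))
predict(fit_param, new_x)     # predictions from the assumed linear form
predict(fit_nonparam, new_x)  # predictions from the locally fitted form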
George Box

PhD from the University of London in 1953, under the supervision of Egon Pearson (you should know Karl Pearson)
Created the Department of Statistics at the University of Wisconsin
Married Joan Fisher, Ronald Fisher's daughter (you should know Ronald Fisher)

All models are wrong but some are useful

Ellen's Opinion: It's mostly important that you have applied the existing body of knowledge in a competent manner. If the model is the best you have (not so obvious), then go with it (just be transparent and always keep project sponsors in the loop). Telling managers and clients that the business model is invalid because of some theoretical issue doesn't inspire them to write checks and continue your employment.

Work with competence and integrity: In the end, we seek evidence based decision making - not decision
based evidence making.
Model Flexibility vs. Interpretability

[Figure: interpretability vs. flexibility trade-off, from most interpretable / least flexible to least interpretable / most flexible - Linear Regression, Polynomial and Non-Linear Least Squares, Local Models (Splines and LOESS), Support Vector Machines]

In general, more restrictive models will be better for inference (understanding relationships between predictors and response variables) and ensembling (consolidating algorithms within larger models, or piping data between models).

More flexible models are usually more complex and involve more parameters, leading to issues with overfitting and requiring more training data and longer processing times.
Central Concept: Bias vs Variance

Don't get these definitions confused with data description and distributions; we're talking about models here:

Bias. The difference between f̂(x) and f(x) due to the model parameters (testing)
Variance. The difference between the f̂(x)'s due to training data samples (training)

Mathematically:

Err(x) = (E[f̂(x)] - f(x))² + E[(f̂(x) - E[f̂(x)])²] + σ²ε
Test MSE = Bias² (model parameters) + Variance (training samples) + irreducible random error
The bias and variance terms are the reducible error; σ²ε is irreducible.
General rule: as you move to more complex models, bias decreases and variance increases
Bias vs Variance

[Figure: Test MSE and Training MSE vs. model flexibility, with the minimum possible Test MSE marked]
Comparing different f̂ models to the true f: non-linear data (composite for all models)

bias vs variance

Comparing different f̂ models to the true f: ~linear data

bias vs variance

Fitting f̂ models to the true f with highly non-linear data
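A minimal simulation sketch of the pattern in these figures (my own example with a made-up true f, not taken from ISL): as polynomial degree (flexibility) increases, training MSE keeps falling while test MSE eventually turns back up.

# simulate noisy data from a known non-linear f, then fit polynomials of degree 1..10
set.seed(1)
f <- function(x) sin(2 * x)   # the "true" f
x_train <- runif(100, 0, 3); y_train <- f(x_train) + rnorm(100, sd = 0.3)
x_test  <- runif(100, 0, 3); y_test  <- f(x_test)  + rnorm(100, sd = 0.3)

mse <- function(y, yhat) mean((y - yhat)^2)
results <- data.frame(degree = 1:10, train_mse = NA, test_mse = NA)
for (d in 1:10) {
  fit <- lm(y_train ~ poly(x_train, d))
  results$train_mse[d] <- mse(y_train, fitted(fit))
  results$test_mse[d]  <- mse(y_test, predict(fit, newdata = data.frame(x_train = x_test)))
}
results   # training MSE keeps falling; test MSE falls, then rises (overfitting)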
DA1 Review: R Matrix Creation
A matrix is m x n (rows, columns), so A[2,1] is the second row, first column element in a matrix

#create a matrix and a vector (several ways to do it)


A <- matrix ( c(2, 2)) # this creates a 2 x 1 matrix (a single-column vector)
A <- cbind(A, c(3,1)) # then combine vectors with cbind or
A <- matrix (c(2,3), nrow =1)
A <- rbind(A, c(2,1)) # then combine rows
A <- matrix( c(2, 2, 3, 1), nrow=2, ncol = 2) # or just create the matrix at once.
write.csv(A, file = "Class 1 - Foundations/A.csv", row.names = FALSE)
A <- read.csv(file = "Class 1 - Foundations/A.csv", header=TRUE, stringsAsFactors=FALSE)

# read.csv creates a dataframe (covered later)


str(A) #you always need to check on the data types that dataframes create
A <- as.matrix(A)

# OK, let's create a couple of vectors and move on with matrix operations
B <- c(3,2) # vector 1
C <- c(0,1) # Vector 2
D <- A #easy to duplicate data structures
D <- A[,1] # or parts - notice that D becomes a vector
D <- cbind(D, A[,2]) # and back again
D <- t(D) #transpose a matrix

A matrix transposition rotates the matrix on the main diagonal (from 1,1)
DA1 Review : Basic Matrix Operations

Operator or Function          Description
A+B, A-B                      Addition / subtraction (must be same structure; always an element operation)
t(A)                          Transpose
A*B                           Element multiplication (the product of vectors or matrices, e.g., product * price)
A %*% B                       Matrix multiplication (important)
A %o% B                       Outer product, AB'
crossprod(A,B), crossprod(A)  A'B and A'A, respectively
DA1 Review : Matrix Addition and Multiplication

D <- A+D # has to be the same structure; addition / subtraction is always an element operation
E <- B+C
E <- B-C

E <- A*B # element multiplication, e.g., E[1,1] = A[1,1] * B[1]

G <- A %*% B # matrix multiplication (dot product), e.g., G[1,1] = A[1,1]*B[1] + A[1,2]*B[2]

t(A)
H <- crossprod(A,B) # equivalent to t(A) %*% B
DA1 Review: Diagonals and Determinants
diag(x) creates a diagonal matrix with the elements of x in the principal diagonal

E <- diag(A) # if you feed it a matrix, it gives you back a vector of the diagonal
I <- diag(2) # if you feed it a number n, it creates an n x n identity matrix

det(x) returns the determinant of x

J <- det(A) # (2*1) - (3*2) = -4

K <- solve(A) # the inverse; solve() is used to solve linear systems of equations and to achieve matrix division, among other things

To get the inverse A^-1 of a 2 x 2 matrix: swap the positions of a and d, put negatives in front of b and c, and divide everything by the determinant (that's why you have solve()). A matrix times its inverse equals the identity (the definition of an inverse): A^-1 A = I.
Check L below: element-wise multiplication K*A does not give the identity matrix (that requires %*%, shown later).

L <- K*A
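A short numeric check of the 2 x 2 inverse rule above (a sketch reusing the A created earlier): swap a and d, negate b and c, divide by the determinant, and compare with solve().

A <- matrix(c(2, 2, 3, 1), nrow = 2, ncol = 2)  # the same A as before
A_inv_manual <- matrix(c(A[2,2], -A[2,1], -A[1,2], A[1,1]), nrow = 2) / det(A)
A_inv_manual - solve(A)    # all zeros: the manual inverse matches solve(A)
round(A_inv_manual %*% A)  # the identity matrix: A^-1 %*% A = I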
DA1 Review: Eigenvector transformation
Eigenvectors & Eigenvalues
Originally utilized to study principal axes of the rotational motion of rigid bodies, eigenvalues and eigenvectors have a wide range of applications, for example in stability analysis, vibration analysis, atomic orbitals, facial recognition, and matrix diagonalization. In essence, an eigenvector v of a linear transformation T is a non-zero vector that, when T is applied to it, does not change direction. Applying T to the eigenvector only scales the eigenvector by the scalar value λ, called an eigenvalue. This condition can be written as the equation:

Av = λv, or (A - λI)v = 0, which has a non-zero solution v only if det(A - λI) = 0

λ is a scalar eigenvalue associated with an eigenvector v that can be used for transformation of a matrix.
Eigenvalues and eigenvectors:

The eigenvector v is non-zero
Defined for n x n (square) matrices only (where the matrix is diagonalizable)
From a geometrical perspective, the transformation does not change the direction of the eigenvector v (next slide)
There always exists at least one eigenvalue / eigenvector pair

When eigenvectors are applied to a linear transformation, the matrix just gets scaled, and the transformation still tells us what we need to know about the original matrix.

It breaks the linear transformation down into simple operations.

To solve for eigenvalues: det(A - λI) = 0
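A quick numeric check of Av = λv (my sketch, reusing the 2 x 2 matrix from the earlier review): eigen() returns the eigenvalues and eigenvectors, and applying A to an eigenvector only scales it.

A <- matrix(c(2, 2, 3, 1), nrow = 2, ncol = 2)
e <- eigen(A)
lambda1 <- e$values[1]   # first eigenvalue
v1 <- e$vectors[, 1]     # its eigenvector
A %*% v1                 # equals...
lambda1 * v1             # ...lambda1 * v1: the direction is unchanged, only the scale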
DA1 Review: Dot Product Transformation
A matrix can be thought of as defining a linear transformation in space

Z <- A%*%B

When dot products are applied to a linear transformation, the transformation still tells us what we need to know about the original matrix.

Also called the scalar product.


DA1 Review: Eigenvectors as scalars
library(ggplot2)
library(grid)   # for arrow() and unit()

V1 <- data.frame(x=c(0, 3), y=c(0, 4))

p <- ggplot(V1, aes(x=x, y=y)) + geom_point(color="black")
p <- p + geom_segment(aes(x = V1[1,1], y = V1[1,2], xend = V1[2,1], yend = V1[2,2]), arrow = arrow(length = unit(0.5, "cm")))
p <- p + xlim(0, 20) + ylim(0, 20)
p

# create and draw the eigenvector (transpose first)
tV1 <- t(V1)
eV1 <- cbind(eigen(tV1)$values, eigen(tV1)$vectors)
ev2 <- eigen(V1)$values

V2 <- data.frame(X=c(0, (eV1[1,1]*eV1[1,2])), Y=c(0, (eV1[1,1]*eV1[2,2])))
p <- p + geom_segment(aes(x = V2[1,1], y = V2[1,2], xend = V2[2,1], yend = V2[2,2]), col="blue", linetype="dashed", arrow = arrow(length = unit(0.5, "cm")))
p

# create and draw the dot product vector
V3 <- as.matrix(V1)
dpV3 <- V3 %*% V3
p <- p + geom_segment(aes(x = dpV3[1,1], y = dpV3[1,2], xend = dpV3[2,1], yend = dpV3[2,2]), col="red", linetype="dashed", arrow = arrow(length = unit(0.5, "cm")))
p

Term: scalar

Note that the eigenvector and the dot product vector scale the original vector, but the direction doesn't change (i.e., it gives us a mechanism to transform (scale) data without changing direction).
DA1 Review: More Definitions
Vector Norm (sometimes called the magnitude)

|x|2 = sqrt( Σi xi² ); e.g., x = c(1,2,3), then |x|2 = sqrt(14)

Note: the norm of a matrix is often written ||x||
The L2 norm (or Euclidean norm) is the most common (there are different ways to calculate a norm and they give different answers). More on this later.
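Two equivalent ways to get the L2 norm in R (a small sketch; base norm() expects a matrix, so the vector is coerced first):

x <- c(1, 2, 3)
sqrt(sum(x^2))           # sqrt(14), straight from the definition above
norm(as.matrix(x), "2")  # same value from the built-in 2-norm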
Get clear on Transpose (a few examples)

A <- matrix(c(1, 2), nrow = 1, ncol = 2)


tA <- t(A)

B <- matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2)


tB <- t(B)

C <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)


tC <- t(C)
DA1 Review: More Definitions
Determinants

Vectors don't have determinants, but they have norms |x|
Matrices have norms ||x|| and determinants |x| (I know, confusing)

Recall earlier:

# check to make sure it's right - should be the identity matrix
L <- K %*% A
DA1 Review: matrix decomposition

Remember factoring in HS algebra?

x^2 + 4x + 3 = (x + 3)(x + 1)

You can factor matrices too, which turns out to be very useful.

# singular value decomposition: X = U D V'
X <- matrix( c(1, 1, 1, 1, 1, 2, 3, 4), nrow=4, ncol = 2)

SingVal <- svd(X)
U <- SingVal$u
D <- diag(SingVal$d)
V <- t(SingVal$v)

X2 <- U %*% D %*% V   # X2 reconstructs X = U D V'
DA1 Review: Correlation

library(RODBC)
library(tidyverse)

# NOTE: SERVER and DATABASE CHANGE!!!!!!!!!!!
myServer <- "tcp:analyticslab.database.windows.net,1433"
myUser <- "Student"
myPassword <- "Acct7397"
myDatabase <- "Accounting"
myDriver <- "ODBC Driver 13 for SQL Server" # must correspond to an entry in the Drivers tab of "ODBC Data Sources"

connectionString <- str_c(
  "Driver=", myDriver,
  ";Server=", myServer,
  ";Database=", myDatabase,
  ";Uid=", myUser,
  ";Pwd=", myPassword)

sq <- function (myQuery){
  conn <- odbcDriverConnect(connectionString)
  tQuery <- (sqlQuery(conn, myQuery))
  close(conn)
  return (tQuery)
}

myQuery <- "
SELECT
  [Obs]
  ,[TV]
  ,[Radio]
  ,[Newspaper]
  ,[Sales]
FROM [dbo].[Advertising]
"

Advertising <- sq(myQuery)
Ad <- dplyr::select(Advertising, Sales, TV, Radio, Newspaper)
cor(Ad, method = 'pearson', use = 'pairwise')
Normal equations

Solving the derivatives directly

Setting the derivatives to zero gives:
0 = 8β₁ + 20β₂ - 56
0 = 20β₁ + 60β₂ - 154

# i.e., solve the linear system X b = B
X <- matrix( c(8, 20, 20, 60), nrow=2, ncol = 2)
B <- matrix( c(56, 154), nrow=2, ncol = 1)
solve(X, B)
# [1,] 3.5
# [2,] 1.4

Solving the equations using matrix algebra

β̂ = (XᵀX)⁻¹ (Xᵀy)

X <- cbind(1, mydata$X)
y <- mydata$Y
# we can solve this from the raw data by using a transpose
betaHat <- solve(t(X) %*% X) %*% t(X) %*% y
print(betaHat)
# [1,] 3.5
# [2,] 1.4
normal equations

Solving using singular value decomposition

# now solving using SVD
x <- t(X) %*% X
duv <- svd(x)
x.inv <- duv$v %*% diag(1 / duv$d) %*% t(duv$u)
x.pseudo.inv <- x.inv %*% t(X)
w <- x.pseudo.inv %*% y
w
# [1,] 3.5
# [2,] 1.4

# note: we can also use SVD for dimension reduction (like PCA)
# it's also used in advanced numerical solutions (we won't be doing that here)
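On the dimension-reduction note above, a small sketch (my example, using the built-in mtcars data): the right singular vectors of a centered data matrix match the principal component loadings from prcomp().

Xc <- scale(as.matrix(mtcars[, c("mpg", "hp", "wt")]), center = TRUE, scale = FALSE)
svd(Xc)$v                            # right singular vectors
prcomp(Xc, center = FALSE)$rotation  # PCA loadings: identical up to column signs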
fun with vectors
# exercise for fun

library(ggplot2)
library(grid)   # for arrow() and unit()

# get the vector magnitude (Euclidean norm)
norm_vec <- function(x) sqrt(sum(x^2))

V1 <- data.frame(x=c(0, 3), y=c(0, 4))
mV1 <- norm_vec(V1)

p <- ggplot(V1, aes(x=x, y=y)) + geom_point(color="black")
p <- p + geom_segment(aes(x = V1[1,1], y = V1[1,2], xend = V1[2,1], yend = V1[2,2]), arrow = arrow(length = unit(0.5, "cm")))
p <- p + xlim(0, 20) + ylim(0, 20)
p

# create and draw the eigenvector (transpose first)
tV1 <- t(V1)
eV1 <- cbind(eigen(tV1)$values, eigen(tV1)$vectors)
ev2 <- eigen(V1)$values

V2 <- data.frame(X=c(0, (eV1[1,1]*eV1[1,2])), Y=c(0, (eV1[1,1]*eV1[2,2])))
p <- p + geom_segment(aes(x = V2[1,1], y = V2[1,2], xend = V2[2,1], yend = V2[2,2]), col="blue", linetype="dashed", arrow = arrow(length = unit(0.5, "cm")))
p

# create and draw the dot product vector
V3 <- as.matrix(V1)
dpV3 <- V3 %*% V3
p <- p + geom_segment(aes(x = dpV3[1,1], y = dpV3[1,2], xend = dpV3[2,1], yend = dpV3[2,2]), col="red", linetype="dashed", arrow = arrow(length = unit(0.5, "cm")))
p

# calculate the direction vector and show that the norm = 1
cosX <- V1[2,1]/mV1
cosY <- V1[2,2]/mV1
V4 <- data.frame(X=c(0, cosX), Y=c(0, cosY))
p <- p + geom_segment(aes(x = V4[1,1], y = V4[1,2], xend = V4[2,1], yend = V4[2,2]), col="red", linetype="dashed", arrow = arrow(length = unit(0.5, "cm")))
p
norm_vec(V4)
mV4 <- as.matrix(V4)
norm(mV4, type = '2')

# draw a right triangle (just for visual reference)
p <- p + geom_segment(aes(x = V1[2,1], y = V1[1,1], xend = V1[2,1], yend = V1[2,2]), col="blue", linetype="dashed")
p
fun with vectors
[Timeline: Euclid, ~300 B.C., Alexandria, Ptolemaic Egypt; Non-Euclidean Geometry, ~1800, Gauss; extended (kernel functions), ~1900, Hilbert; Einstein]

Many of the underlying principles that form the basis of statistical learning theory are still based on Euclid's axioms; we just extend them into infinite dimensions using the work of Gauss and Hilbert (and many, many others).
