
Original Work Product Progress Assessment

Date: April 16, 2018

Introduction and Statement of Purpose:

The purpose of this Original Work Project is to familiarize myself with Data
Programming processes and principles as well as foundational Machine Learning. I will
do this by working with NCAA basketball tournament data. The product itself will
contain three components: spreadsheet analysis, visualizations, and an application
utilizing Machine Learning to make predictions. Through working on this project, I
hope to build upon my programming skills, gain an understanding of Machine
Learning, and learn what it is like to be a Software Developer. Furthermore, I am eager
to gain experience in data programming.
Spreadsheet Analysis:
Objective

Spreadsheet analysis consists of analyzing and understanding the available data by looking at the specific fields. This assisted me in noting which fields I have to use and allowed me to familiarize myself with the data.
Materials

 Google Sheets
 BigQuery
Process and Reflections

My first assignment was to create a Data Dictionary. I learned that a data dictionary defines the type of each variable and includes a description of the variable's purpose. I realized that data dictionaries increase the readability of the data and eventually allow more people to use the data, while also increasing the programmers' familiarity with it. Moreover, looking at the different fields forced me to research basketball terminology relevant to my project and guided me. I also learned that it is important to look at how the fields are named so as to maintain consistent casing throughout the program. One challenge I came across was that, at times, values were not assigned the type I would have assigned them. Furthermore, this was a relatively time-consuming procedure due to the number of fields.

The next step was to look at the data using BigQuery. In order to complete these tasks, I worked on building a foundational knowledge of SQL, as BigQuery uses SQL. In BigQuery, my primary task was to look at specific data points by limiting the query size. After looking through the data sets in BigQuery, I was astonished at their size. I used simple SQL commands to limit the number of points and the fields covered in my table. This allowed me to look at smaller amounts of data.
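The field- and row-limiting described above can be sketched as a small Python helper that assembles such a query. The table and field names below are hypothetical placeholders, not the actual schema of the NCAA dataset:

```python
def build_limited_query(fields, table, limit):
    """Assemble a SQL query that selects only the given fields
    and caps the number of rows returned."""
    field_list = ", ".join(fields)
    return f"SELECT {field_list} FROM `{table}` LIMIT {limit}"

# Hypothetical example: pull a small sample of three assumed fields.
query = build_limited_query(
    ["team_name", "points", "attendance"],  # assumed field names
    "ncaa_basketball.games",                # assumed table name
    100,
)
print(query)
```

Selecting only the needed fields and adding a `LIMIT` clause is what keeps the result small enough to inspect by hand.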
Graphical Representations:
Objective

Creating graphical representations revolves around using software to create visuals that highlight patterns between certain fields. This step allows me to formulate a deeper understanding of the data, notice correlations, and look more closely at smaller amounts of data.
Materials

 Google Sheets
 BigQuery
 Cloud Data Studio
Process and Reflections
This part of the project was relatively undefined, as I simply had to create graphical representations using Sheets, BigQuery, and Cloud Data Studio. For one specific example, while experimenting with BigQuery, I was curious how the number of empty seats affects the points scored. I hypothesized that a greater number of empty seats would correlate with fewer points scored, as it could mean the game is not as significant or that the players are not as supported. To test this, I used SQL to create a table that focuses on the empty seats and points. However, looking at the table told me minimal information and confused me, as I could not identify any correlation. This prompted me to export the table of values I found in BigQuery and create a scatterplot in Google Sheets. The scatterplot showed me that there is no apparent correlation between the points scored and the fullness of the venue. After looking at this chart, I realized I might have gotten more accurate results had I considered factors such as the percentage of empty seats: larger venues may have more empty seats yet still look fuller. Additionally, after speaking with my mentor about the lack of results, he explained that audience sizes often decrease as teams lose, so better teams or later games can end up with fewer people watching. Therefore, there really is no simple correlation between attendance and success. Even though this experiment did not work out, I learned to be more specific with my queries, which means becoming even more comfortable with SQL. I also learned that context is extremely important before formulating a query.
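The percent-of-seats idea above can be sketched in plain Python: normalize attendance by venue capacity before comparing it to points scored. The sample rows and field layout are made up for illustration, not taken from the actual dataset:

```python
def pearson_correlation(xs, ys):
    """Plain-Python Pearson correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x ** 0.5 * var_y ** 0.5)

# Hypothetical (capacity, attendance, points) rows.
games = [(20000, 15000, 70), (9000, 8500, 82), (15000, 6000, 65)]

# Percent of seats filled, rather than the raw number of empty seats,
# so that large and small venues are comparable.
fill_pct = [att / cap for cap, att, _ in games]
points = [pts for _, _, pts in games]

r = pearson_correlation(fill_pct, points)
```

A value of `r` near zero would match the "no apparent correlation" result seen in the scatterplot.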
Following my work with BigQuery and Sheets, I started working with Cloud Data Studio. The first chart I made was a map of the points scored from each region. Mr. Truty thought this chart was interesting and checked whether the visuals seemed viable. This struck me as fascinating, because I had created the charts and gone immediately to analysis; I had not gone through the plots on the chart to check whether the actual data points made sense. Luckily, they did: better teams had larger dots corresponding to more points. However, I learned that the disparity in the size of the dots could also be accounted for by the fact that better teams play more games through the tournament, so they accumulate more points. This observation by my mentor showed me the level of detail with which I have to look at charts. It is equally important to reason about why there are patterns as it is to observe the patterns. Furthermore, I observed that my mentor also looked at how the chart was generated from the given data. This is another practice I should adopt, as it could help me understand how to program certain logic as well as whether the results are accurate.

My second chart looked at whether Pacific-12 teams performed better in conference games. My mentor was particularly interested in this one, as it showed an interesting correlation of teams performing better during conference games, with a couple of exceptions. Furthermore, the difference was a few thousand points, which added significance to the correlation. At the same time, Mr. Truty suggested I add more columns, such as school name; he explained this would make the data easier to examine. My last chart depicted the lowest attendances by state. This was probably the least interesting chart; however, the results made sense.

Now, I will be programmatically constructing a table with geolocation and distance fields. I was extremely confused as to where to start, so I looked at the available data. This helped me realize that I could use the school and venue names to identify their respective geolocations and then develop my own algorithm for a distance that best suits my purposes. After that, I could not identify the most apt API, so Mr. Truty assisted me by suggesting I use the Geolocation API. I had originally been leaning towards the Distance Matrix API, so as to avoid working with geolocations. However, I learned that formulating an algorithm and calculating distance from geolocations would be better, as I would be certain of how the distance was calculated. I also learned that good tables contain the data for the intermediate steps, so I should not avoid using the geolocations. Furthermore, Mr. Truty suggested I use the Python client API, as it is the simplest to work with, and that I code using the command prompt.

In order to successfully complete this step, I had to work on coding in Python and research the steps associated with using the Geolocation API. I also needed to get accustomed to using the command prompt to code. I thought coding in the command prompt was going to be extraordinarily difficult, as I assumed it would require many statements prior to the actual code. However, through some research, I soon realized that all I have to do is type "python". Overall, I find it extremely convenient, as I do not need an IDE. After getting accustomed to the Command Prompt interface, I started researching how to connect my code to the API. Following this, I used the API to look at some basic methods and asked my mentor for help with connecting my work to a spreadsheet and storing my command prompt code in a file. Next, I will code the actual table and attempt to use batch programming in order to identify the distances for multiple teams and games.
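Once each school and venue has a geolocation, a straight-line distance between them can be computed with the standard haversine formula. This is one reasonable candidate for the distance algorithm described above, sketched here as an illustration rather than the final implementation:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two
    (latitude, longitude) points, using an Earth radius of ~6371 km."""
    r = 6371.0
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# One degree of latitude is roughly 111 km.
d = haversine_km(0.0, 0.0, 1.0, 0.0)
```

Because the formula is written out explicitly, there is no ambiguity about how the distance was calculated, which was the stated reason for preferring geolocations over the Distance Matrix API.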

Machine Learning Application:


Objective

This part of the project is based on using Machine Learning APIs along with the data to program a mobile application that allows users to obtain predictions by putting in hypothetical data.
Materials

 API - not sure yet
 Xcode
Process and Reflections

I have not yet begun this portion of my final product. The first step will be to identify an appropriate API to assist me in coding. After that, I will have to research how to use the API from my application and design the application. Finally, I will code the application and edit it so that it works across a variety of devices. All of this will be done through Xcode. When completed, the application will allow users to put in data for a particular team and look at the effects of that data.
Conclusion

Although this final product has been and will continue to be challenging, I am eager to learn as much as possible and work hard on the project. I am currently about 40 percent of the way there and will work towards completion well before Final Presentation Night. I am certain that with commitment the application will appear professional and work as I intend it to. Additionally, I am sure the charts will be excellent, insightful, and well thought out. Furthermore, I intend to put a lot of effort into the product so that the application gives insightful results with close-to-accurate success rates, and I will watch the application to track its accuracy in future tournaments.

So far, this project has been allowing me to explore areas of programming with which I have not yet had much experience: Data Programming and Machine Learning. Exposure to both of these fields is especially apt, as both are growing. Therefore, early exposure will allow me to be successful in college and beyond. I am optimistic that, due to my learning through ISM and my internal passion for this topic, I will be able to go far in the software industry.

Even though I may not publish this application to the public, I am still going to put my best efforts into the project. My product will reflect meticulousness as well as months of dedicated focus. Additionally, I will be sharing the steps, along with my mentor, on the Google blogs so as to enable more students to explore data programming. Furthermore, since I will be using the same resources in the future when I work in the software industry, I want my program to be something I can look back at for quick reference. Therefore, I will make sure to code neatly and keep my work organized.

In conclusion, I am ecstatic about this truly marvelous learning opportunity. I am glad to have gotten to work with such a passionate and dedicated mentor, and I am glad that I got to start my exploration of Machine Learning and Data Programming. I hope to have a final product that illustrates the multitude of information I have gathered as well as my passion for the technical industry.
