Ebook800 pages7 hours

Machine Learning with R

Name: Machine Learning with R
Brand: Packt Publishing
Rating: 3.8 (9 reviews)

By Brett Lantz

Rating: 4 out of 5 stars

4/5

()

Read preview

About this ebook

In Detail

Machine learning, at its core, is concerned with transforming data into actionable knowledge. This fact makes machine learning well-suited to the present-day era of "big data" and "data science". Given the growing prominence of R—a cross-platform, zero-cost statistical programming environment—there has never been a better time to start applying machine learning. Whether you are new to data science or a veteran, machine learning with R offers a powerful set of methods for quickly and easily gaining insight from your data.

"Machine Learning with R" is a practical tutorial that uses hands-on examples to step through real-world application of machine learning. Without shying away from the technical details, we will explore Machine Learning with R using clear and practical examples. Well-suited to machine learning beginners or those with experience. Explore R to find the answer to all of your questions.

How can we use machine learning to transform data into action? Using practical examples, we will explore how to prepare data for analysis, choose a machine learning method, and measure the success of the process.

We will learn how to apply machine learning methods to a variety of common tasks including classification, prediction, forecasting, market basket analysis, and clustering. By applying the most effective machine learning methods to real-world problems, you will gain hands-on experience that will transform the way you think about data.

"Machine Learning with R" will provide you with the analytical tools you need to quickly gain insight from complex data.

Approach

Written as a tutorial to explore and understand the power of R for machine learning. This practical guide that covers all of the need to know topics in a very systematic way. For each machine learning approach, each step in the process is detailed, from preparing the data for analysis to evaluating the results. These steps will build the knowledge you need to apply them to your own data science tasks.

Who this book is for

Intended for those who want to learn how to use R's machine learning capabilities and gain insight from your data. Perhaps you already know a bit about machine learning, but have never used R; or perhaps you know a little R but are new to machine learning. In either case, this book will get you up and running quickly. It would be helpful to have a bit of familiarity with basic programming concepts, but no prior experience is required.

Skip carousel

LanguageEnglish

PublisherPackt Publishing

Release dateOct 25, 2013

ISBN9781782162155

Author

Brett Lantz

"Brett Lantz has spent the past 10 years using innovative data methods to understand human behavior. A sociologist by training, he was first enchanted by machine learning while studying a large database of teenagers' social networking website profiles. Since then, he has worked on interdisciplinary studies of cellular telephone calls, medical billing data, and philanthropic activity, among others. When he's not spending time with family, following college sports, or being entertained by his dachshunds, he maintains dataspelunking.com, a website dedicated to sharing knowledge about the search for insight in data."

Related to Machine Learning with R

Related ebooks

Skip carousel

Learning Predictive Analytics with Python
Ebook
Learning Predictive Analytics with Python
byKumar Ashish
Rating: 0 out of 5 stars
0 ratings
R for Data Science
Ebook
R for Data Science
byDan Toomey
Rating: 5 out of 5 stars
5/5
Principles of Data Science
Ebook
Principles of Data Science
bySinan Ozdemir
Rating: 4 out of 5 stars
4/5
Mastering Python for Data Science
Ebook
Mastering Python for Data Science
bySamir Madhavan
Rating: 3 out of 5 stars
3/5
Mastering Machine Learning with R
Ebook
Mastering Machine Learning with R
byLesmeister Cory
Rating: 0 out of 5 stars
0 ratings
Learning Predictive Analytics with R
Ebook
Learning Predictive Analytics with R
byMayor Eric
Rating: 0 out of 5 stars
0 ratings
Learning R Programming
Ebook
Learning R Programming
byKun Ren
Rating: 5 out of 5 stars
5/5
Mastering Predictive Analytics with R
Ebook
Mastering Predictive Analytics with R
byRui Miguel Forte
Rating: 4 out of 5 stars
4/5
Python: Real-World Data Science
Ebook
Python: Real-World Data Science
byRobert Layton
Rating: 0 out of 5 stars
0 ratings
Learning pandas
Ebook
Learning pandas
byHeydt Michael
Rating: 4 out of 5 stars
4/5
Python Data Analysis
Ebook
Python Data Analysis
byIvan Idris
Rating: 4 out of 5 stars
4/5
Machine Learning with R, the tidyverse, and mlr
Ebook
Machine Learning with R, the tidyverse, and mlr
byHefin Rhys
Rating: 0 out of 5 stars
0 ratings
R: Recipes for Analysis, Visualization and Machine Learning
Ebook
R: Recipes for Analysis, Visualization and Machine Learning
byAtmajitsinh Gohil
Rating: 0 out of 5 stars
0 ratings
Handbook of Statistical Analysis and Data Mining Applications
Ebook
Handbook of Statistical Analysis and Data Mining Applications
byRobert Nisbet
Rating: 4 out of 5 stars
4/5
Practical Data Analysis
Ebook
Practical Data Analysis
byHector Cuesta
Rating: 4 out of 5 stars
4/5
Data Mining Applications with R
Ebook
Data Mining Applications with R
byYanchang Zhao
Rating: 4 out of 5 stars
4/5
Python: Real World Machine Learning
Ebook
Python: Real World Machine Learning
byJohn Hearty
Rating: 0 out of 5 stars
0 ratings
Mastering Scientific Computing with R
Ebook
Mastering Scientific Computing with R
byPaul Gerrard
Rating: 3 out of 5 stars
3/5
Python: Deeper Insights into Machine Learning
Ebook
Python: Deeper Insights into Machine Learning
byJohn Hearty
Rating: 0 out of 5 stars
0 ratings
Introduction to Statistical and Machine Learning Methods for Data Science
Ebook
Introduction to Statistical and Machine Learning Methods for Data Science
byCarlos Andre Reis Pinheiro
Rating: 0 out of 5 stars
0 ratings
Introducing Data Science: Big data, machine learning, and more, using Python tools
Ebook
Introducing Data Science: Big data, machine learning, and more, using Python tools
byDavy Cielen
Rating: 5 out of 5 stars
5/5
Simulation for Data Science with R
Ebook
Simulation for Data Science with R
byMatthias Templ
Rating: 0 out of 5 stars
0 ratings
Python Machine Learning By Example
Ebook
Python Machine Learning By Example
byYuxi (Hayden) Liu
Rating: 4 out of 5 stars
4/5
Data Mining: Concepts and Techniques
Ebook
Data Mining: Concepts and Techniques
byJiawei Han
Rating: 4 out of 5 stars
4/5
Learning Data Mining with Python
Ebook
Learning Data Mining with Python
byRobert Layton
Rating: 0 out of 5 stars
0 ratings
Practical Machine Learning
Ebook
Practical Machine Learning
byGollapudi Sunila
Rating: 2 out of 5 stars
2/5
Machine Learning in Action
Ebook
Machine Learning in Action
byPeter Harrington
Rating: 0 out of 5 stars
0 ratings
Practical Data Analysis Cookbook
Ebook
Practical Data Analysis Cookbook
byTomasz Drabas
Rating: 0 out of 5 stars
0 ratings
Learning Social Media Analytics with R
Ebook
Learning Social Media Analytics with R
byDipanjan Sarkar
Rating: 0 out of 5 stars
0 ratings
Building Machine Learning Systems with Python
Ebook
Building Machine Learning Systems with Python
byWilli Richert
Rating: 4 out of 5 stars
4/5

Computers For You

Skip carousel

The ChatGPT Millionaire Handbook: Make Money Online With the Power of AI Technology
Ebook
The ChatGPT Millionaire Handbook: Make Money Online With the Power of AI Technology
byTJ Books
Rating: 0 out of 5 stars
0 ratings
Mastering ChatGPT: 21 Prompts Templates for Effortless Writing
Ebook
Mastering ChatGPT: 21 Prompts Templates for Effortless Writing
byCea West
Rating: 5 out of 5 stars
5/5
SQL QuickStart Guide: The Simplified Beginner's Guide to Managing, Analyzing, and Manipulating Data With SQL
Ebook
SQL QuickStart Guide: The Simplified Beginner's Guide to Managing, Analyzing, and Manipulating Data With SQL
byWalter Shields
Rating: 4 out of 5 stars
4/5
Machine Learning for Beginners: An Introduction for Beginners, Why Machine Learning Matters Today and How Machine Learning Networks, Algorithms, Concepts and Neural Networks Really Work
Ebook
Machine Learning for Beginners: An Introduction for Beginners, Why Machine Learning Matters Today and How Machine Learning Networks, Algorithms, Concepts and Neural Networks Really Work
bySteven Cooper
Rating: 4 out of 5 stars
4/5
Artificial Intelligence: The Complete Beginner’s Guide to the Future of A.I.
Ebook
Artificial Intelligence: The Complete Beginner’s Guide to the Future of A.I.
byJohn Adamssen
Rating: 4 out of 5 stars
4/5
The Self-Taught Computer Scientist: The Beginner's Guide to Data Structures & Algorithms
Ebook
The Self-Taught Computer Scientist: The Beginner's Guide to Data Structures & Algorithms
byCory Althoff
Rating: 0 out of 5 stars
0 ratings
Creating Online Courses with ChatGPT | A Step-by-Step Guide with Prompt Templates
Ebook
Creating Online Courses with ChatGPT | A Step-by-Step Guide with Prompt Templates
byCea West
Rating: 4 out of 5 stars
4/5
Excel Essentials: A Step-by-Step Guide with Pictures for Absolute Beginners to Master the Basics and Start Using Excel with Confidence
Ebook
Excel Essentials: A Step-by-Step Guide with Pictures for Absolute Beginners to Master the Basics and Start Using Excel with Confidence
byNigel Tillery
Rating: 0 out of 5 stars
0 ratings
Data Science from Scratch: The #1 Data Science Guide for Everything A Data Scientist Needs to Know: Python, Linear Algebra, Statistics, Coding, Applications, Neural Networks, and Decision Trees
Ebook
Data Science from Scratch: The #1 Data Science Guide for Everything A Data Scientist Needs to Know: Python, Linear Algebra, Statistics, Coding, Applications, Neural Networks, and Decision Trees
bySteven Cooper
Rating: 4 out of 5 stars
4/5
Python for Beginners. A Smarter Way to Learn Python in 5 Days and Remember it Longer. With Easy Step by Step Guidance and Hands on Examples. (Python Crash Course-Programming for Beginners)
Ebook
Python for Beginners. A Smarter Way to Learn Python in 5 Days and Remember it Longer. With Easy Step by Step Guidance and Hands on Examples. (Python Crash Course-Programming for Beginners)
byArthur T. Brooks
Rating: 0 out of 5 stars
0 ratings
Procreate for Beginners: Introduction to Procreate for Drawing and Illustrating on the iPad
Ebook
Procreate for Beginners: Introduction to Procreate for Drawing and Illustrating on the iPad
byAaron Smith
Rating: 0 out of 5 stars
0 ratings
People Skills for Analytical Thinkers
Ebook
People Skills for Analytical Thinkers
byGilbert Eijkelenboom
Rating: 5 out of 5 stars
5/5
Network+ Study Guide & Practice Exams
Ebook
Network+ Study Guide & Practice Exams
byRobert Shimonski
Rating: 4 out of 5 stars
4/5
Childhood Unplugged: Practical Advice to Get Kids Off Screens and Find Balance
Ebook
Childhood Unplugged: Practical Advice to Get Kids Off Screens and Find Balance
byKatherine Johnson Martinko
Rating: 0 out of 5 stars
0 ratings
Deep Search: How to Explore the Internet More Effectively
Ebook
Deep Search: How to Explore the Internet More Effectively
byAlan Pearce
Rating: 5 out of 5 stars
5/5
Grokking Algorithms: An illustrated guide for programmers and other curious people
Ebook
Grokking Algorithms: An illustrated guide for programmers and other curious people
byAditya Bhargava
Rating: 4 out of 5 stars
4/5
Elon Musk
Ebook
Elon Musk
byWalter Isaacson
Rating: 4 out of 5 stars
4/5
Alan Turing: The Enigma: The Book That Inspired the Film The Imitation Game - Updated Edition
Ebook
Alan Turing: The Enigma: The Book That Inspired the Film The Imitation Game - Updated Edition
byAndrew Hodges
Rating: 4 out of 5 stars
4/5
CompTIA IT Fundamentals (ITF+) Study Guide: Exam FC0-U61
Ebook
CompTIA IT Fundamentals (ITF+) Study Guide: Exam FC0-U61
byQuentin Docter
Rating: 0 out of 5 stars
0 ratings
CompTIA Security+ Practice Questions
Ebook
CompTIA Security+ Practice Questions
byIP Specialist
Rating: 2 out of 5 stars
2/5
The Invisible Rainbow: A History of Electricity and Life
Ebook
The Invisible Rainbow: A History of Electricity and Life
byArthur Firstenberg
Rating: 4 out of 5 stars
4/5
Going Text: Mastering the Command Line
Ebook
Going Text: Mastering the Command Line
byBrian Schell
Rating: 4 out of 5 stars
4/5
The Simulation Hypothesis: An MIT Computer Scientist Shows Why AI, Quantum Physics and Eastern Mystics All Agree We Are In a Video Game
Ebook
The Simulation Hypothesis: An MIT Computer Scientist Shows Why AI, Quantum Physics and Eastern Mystics All Agree We Are In a Video Game
byRizwan Virk
Rating: 5 out of 5 stars
5/5
Remote/WebCam Notarization : Basic Understanding
Ebook
Remote/WebCam Notarization : Basic Understanding
byJeannie Eunice Franks
Rating: 3 out of 5 stars
3/5
101 Awesome Builds: Minecraft® Secrets from the World's Greatest Crafters
Ebook
101 Awesome Builds: Minecraft® Secrets from the World's Greatest Crafters
byTriumph Books
Rating: 4 out of 5 stars
4/5
Master Builder Roblox: The Essential Guide
Ebook
Master Builder Roblox: The Essential Guide
byTriumph Books
Rating: 4 out of 5 stars
4/5
Dark Aeon: Transhumanism and the War Against Humanity
Ebook
Dark Aeon: Transhumanism and the War Against Humanity
byJoe Allen
Rating: 5 out of 5 stars
5/5
Standard Deviations: Flawed Assumptions, Tortured Data, and Other Ways to Lie with Statistics
Ebook
Standard Deviations: Flawed Assumptions, Tortured Data, and Other Ways to Lie with Statistics
byGary Smith
Rating: 4 out of 5 stars
4/5
YouTube: How to Build and Optimize Your First YouTube Channel, Marketing, SEO, Tips and Strategies for YouTube Channel Success
Ebook
YouTube: How to Build and Optimize Your First YouTube Channel, Marketing, SEO, Tips and Strategies for YouTube Channel Success
byTommy Swindali
Rating: 4 out of 5 stars
4/5
Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are
Ebook
Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are
bySeth Stephens-Davidowitz
Rating: 4 out of 5 stars
4/5

Related podcast episodes

Skip carousel

Data Visualization with Manuel Lima: Gabi Ferrara and Jon Foust are back today and joined by fellow Googler Manuel Lima.
Podcast episode
Data Visualization with Manuel Lima: Gabi Ferrara and Jon Foust are back today and joined by fellow Googler Manuel Lima.
byGoogle Cloud Platform Podcast
0 ratings
0% found this document useful
[DataFramed Careers Series #2] What Makes a Great Data Science Portfolio
Podcast episode
[DataFramed Careers Series #2] What Makes a Great Data Science Portfolio
byDataFramed
0 ratings
0% found this document useful
#70 Beyond the Language Wars: R & Python for the Modern Data Scientist
Podcast episode
#70 Beyond the Language Wars: R & Python for the Modern Data Scientist
byDataFramed
0 ratings
0% found this document useful
#1 Data Science, Past, Present and Future: Hilary Mason talks about the past, present, and future of data science with Hugo. Hilary is the VP of Research at Cloudera Fast Forward, a machine intelligence research company, and the data scientist in residence at Accel. If you want to hear about wh...
Podcast episode
#1 Data Science, Past, Present and Future: Hilary Mason talks about the past, present, and future of data science with Hugo. Hilary is the VP of Research at Cloudera Fast Forward, a machine intelligence research company, and the data scientist in residence at Accel. If you want to hear about wh...
byDataFramed
100%
100% found this document useful
Defining Success: Metrics and KPIs - Adam Sroka
Podcast episode
Defining Success: Metrics and KPIs - Adam Sroka
byDataTalks.Club
0 ratings
0% found this document useful
MLOps Coffee Sessions #11: Analyzing “Continuous Delivery and Automation Pipelines in ML" // Part 3
Podcast episode
MLOps Coffee Sessions #11: Analyzing “Continuous Delivery and Automation Pipelines in ML" // Part 3
byMLOps.community
0 ratings
0% found this document useful
Humans in the Loop - Lina Weichbrodt
Podcast episode
Humans in the Loop - Lina Weichbrodt
byDataTalks.Club
0 ratings
0% found this document useful
051: Strategy evaluation techniques, flaws and solutions with Dave Walton: Today we’re covering a topic which can really be a concern for traders of all levels, from beginner to pro, and that is the topic of strategy evaluation. Have you ever found that real-life performance does not match expected results? Or perhaps you...
Podcast episode
051: Strategy evaluation techniques, flaws and solutions with Dave Walton: Today we’re covering a topic which can really be a concern for traders of all levels, from beginner to pro, and that is the topic of strategy evaluation. Have you ever found that real-life performance does not match expected results? Or perhaps you...
byBetter System Trader
0 ratings
0% found this document useful
Q4: Scott Sanderson – Portfolio Optimization: Risk Preferences In, Trades Out: CWT x Quantopian Pt. 4 – When one has a price model that they think will work well for forecasting returns, the next step is to actually trade it. This isn’t that simple for a variety of reasons. For one thing, you need to...
Podcast episode
Q4: Scott Sanderson – Portfolio Optimization: Risk Preferences In, Trades Out: CWT x Quantopian Pt. 4 – When one has a price model that they think will work well for forecasting returns, the next step is to actually trade it. This isn’t that simple for a variety of reasons. For one thing, you need to...
byChat With Traders
0 ratings
0% found this document useful
Conquering the Last Mile in Data - Caitlin Moorman
Podcast episode
Conquering the Last Mile in Data - Caitlin Moorman
byDataTalks.Club
0 ratings
0% found this document useful
Amazon PPC Automation with Micheal Hecker – SS012: Micheal has big data and a 5 step approach for Amazon PPC Automation to optimise and automate your ad spend. We cover keyword research and search term reports, how to calculate keyword relevance and product data optimisation along with ad group...
Podcast episode
Amazon PPC Automation with Micheal Hecker – SS012: Micheal has big data and a 5 step approach for Amazon PPC Automation to optimise and automate your ad spend. We cover keyword research and search term reports, how to calculate keyword relevance and product data optimisation along with ad group...
bySeller Sessions Amazon FBA and Private Label
0 ratings
0% found this document useful
Becoming a Data-led Professional - Arpit Choudhury
Podcast episode
Becoming a Data-led Professional - Arpit Choudhury
byDataTalks.Club
0 ratings
0% found this document useful
MLOps Coffee Sessions #10 Analyzing the Article “Continuous Delivery and Automation Pipelines in Machine Learning" // Part 2
Podcast episode
MLOps Coffee Sessions #10 Analyzing the Article “Continuous Delivery and Automation Pipelines in Machine Learning" // Part 2
byMLOps.community
0 ratings
0% found this document useful
The Art & Science of Finding You Top Performers: The Art & Science of Finding You Top Performers Advanced Insights into Data Analysis and Optimization with Dr. Ellis Welcome to this episode of Seller Sessions, where we dive deep into the nuanced world of data analysis and optimisation with the...
Podcast episode
The Art & Science of Finding You Top Performers: The Art & Science of Finding You Top Performers Advanced Insights into Data Analysis and Optimization with Dr. Ellis Welcome to this episode of Seller Sessions, where we dive deep into the nuanced world of data analysis and optimisation with the...
bySeller Sessions Amazon FBA and Private Label
0 ratings
0% found this document useful
Privacy-aware Data Pipelines with Skyflow’s Piper Keyes: A data analytics pipeline is important to modern businesses because it allows them to extract valuable insights from the large amounts of data they generate and collect on a daily basis. This leads to better decision making, improved efficiency, and ...
Podcast episode
Privacy-aware Data Pipelines with Skyflow’s Piper Keyes: A data analytics pipeline is important to modern businesses because it allows them to extract valuable insights from the large amounts of data they generate and collect on a daily basis. This leads to better decision making, improved efficiency, and ...
byPartially Redacted: Data Privacy, Security & Compliance
0 ratings
0% found this document useful
Data Observability - Barr Moses
Podcast episode
Data Observability - Barr Moses
byDataTalks.Club
0 ratings
0% found this document useful
Product Owners in Data Science - Anna Hannemann
Podcast episode
Product Owners in Data Science - Anna Hannemann
byDataTalks.Club
0 ratings
0% found this document useful
15: Lifecycle: A Martech Saga part 4: Picking the right MQL model: You need a good MQL model so that marketing leads make it to sales and get followed up. There are a lot of ways to define MQLs and pass them over. It’s very common to have a lead scoring model, and it’s the best way to get to build a scalable, highly auto
Podcast episode
15: Lifecycle: A Martech Saga part 4: Picking the right MQL model: You need a good MQL model so that marketing leads make it to sales and get followed up. There are a lot of ways to define MQLs and pass them over. It’s very common to have a lead scoring model, and it’s the best way to get to build a scalable, highly auto
byHumans of Martech
0 ratings
0% found this document useful
Similarities and Differences between ML and Analytics - Rishabh Bhargava
Podcast episode
Similarities and Differences between ML and Analytics - Rishabh Bhargava
byDataTalks.Club
0 ratings
0% found this document useful
Episode 76: Step 09 (Collecting AP Exam Fees)
Podcast episode
Episode 76: Step 09 (Collecting AP Exam Fees)
byCoordinated
0 ratings
0% found this document useful
Expert Strategies Unveiled for ERP Success with Stephanie Forbes: Part 1 - Contract Considerations
Podcast episode
Expert Strategies Unveiled for ERP Success with Stephanie Forbes: Part 1 - Contract Considerations
byArt of Consulting Podcast
0 ratings
0% found this document useful
Better Done Than Perfect. Using Surveys for Customer Success with Moritz Dausinger: Today we have another episode of Better Done Than Perfect. Listen in as we talk with Moritz Dausinger, founder of Refiner. Moritz shares the story behind his survey tool, when and how to survey your users, and many other tips for making the most of the survey data.
Podcast episode
Better Done Than Perfect. Using Surveys for Customer Success with Moritz Dausinger: Today we have another episode of Better Done Than Perfect. Listen in as we talk with Moritz Dausinger, founder of Refiner. Moritz shares the story behind his survey tool, when and how to survey your users, and many other tips for making the most of the survey data.
byUI Breakfast: UI/UX Design and Product Strategy
0 ratings
0% found this document useful
Episode 385: A Tech Stack to Help You Beat the iOS Changes with Chris Mercer
Podcast episode
Episode 385: A Tech Stack to Help You Beat the iOS Changes with Chris Mercer
byPerpetual Traffic
0 ratings
0% found this document useful
Setting the Standard: Impact of Method Standardization in Chromatography
Podcast episode
Setting the Standard: Impact of Method Standardization in Chromatography
byThe Analytical Wavelength
0 ratings
0% found this document useful
Build Your Own Data Pipeline - Andreas Kretz
Podcast episode
Build Your Own Data Pipeline - Andreas Kretz
byDataTalks.Club
0 ratings
0% found this document useful
Analytics for a Better World - Parvathy Krishnan
Podcast episode
Analytics for a Better World - Parvathy Krishnan
byDataTalks.Club
0 ratings
0% found this document useful
Dashboards in IT Ops, and the benefits Opening up your IT infrastructure to all
Podcast episode
Dashboards in IT Ops, and the benefits Opening up your IT infrastructure to all
byTechnology Now
0 ratings
0% found this document useful
Understanding Time-Series Database Patterns
Podcast episode
Understanding Time-Series Database Patterns
byThe Cloudcast
0 ratings
0% found this document useful
45. Using Data In Your 2nd Semester Programming
Podcast episode
45. Using Data In Your 2nd Semester Programming
byCounselor Chat Podcast
0 ratings
0% found this document useful
11: Which MCAT Materials are the Best for Me?: Session 11 Your questions, answered here on the OldPreMeds Podcast. Ryan and Rich again dive into the forums over at OldPreMeds.org where they pull a question and deliver the answers right on to you. Today, they discuss about the best resource...
Podcast episode
11: Which MCAT Materials are the Best for Me?: Session 11 Your questions, answered here on the OldPreMeds Podcast. Ryan and Rich again dive into the forums over at OldPreMeds.org where they pull a question and deliver the answers right on to you. Today, they discuss about the best resource...
byOldPreMeds Podcast
0 ratings
0% found this document useful

Skip carousel

Scikit-Learn: The Ultimate Python Library
APC
Article
Scikit-Learn: The Ultimate Python Library
Jul 15, 2019
4 min read
Manipulate Data Like A Pro With Pandas
Linux Format
Article
Manipulate Data Like A Pro With Pandas
Jul 27, 2021
7 min read
How Image Recognition Works
APC
Article
How Image Recognition Works
Nov 4, 2019
4 min read
How Mature Is Your Organisation With Regards To Digital And Web Analytics?
NZ Marketing
Article
How Mature Is Your Organisation With Regards To Digital And Web Analytics?
Jun 9, 2021
1 min read
How And Where You Use Machine-learning
APC
Article
How And Where You Use Machine-learning
Oct 7, 2019
4 min read
How To Train Computers Faster For ‘Extreme’ Datasets
Futurity
Article
How To Train Computers Faster For ‘Extreme’ Datasets
Dec 12, 2019
4 min read
The Infrastructure of an AI Factory
Techfastly
Article
The Infrastructure of an AI Factory
Mar 3, 2021
Data is a crucial element for machine learning algorithms. It can be considered as a fuel of AI factories. Collection of useful data and feeding it into frameworks and models is the foremost step. Data acts as a case or example that the algorithms re
1 min read
Machine Learning – With Zero Programming
APC
Article
Machine Learning – With Zero Programming
Aug 12, 2019
6 min read
Powering Costing With Artificial Intelligence: The Case Of Vodafone Procurement
The European Business Review
Article
Powering Costing With Artificial Intelligence: The Case Of Vodafone Procurement
May 25, 2021
8 min read
Leadership Forum: Making Digital Transformation A Reality
Rotman Management
Article
Leadership Forum: Making Digital Transformation A Reality
Jan 1, 2018
Glenda Crisp Senior Vice President and Chief Data Officer, TD Bank Group + Connie Bonello Associate Partner, Financial Services, IBM Canada IN MOST OF TODAY’S ORGANIZATIONS, data underpins every transaction, operation and interaction. And yet, the ab
8 min read
Q&A
Rotman Management
Article
Q&A
May 1, 2023
Describe the capability that companies like Netflix, UPS, Amazon and Caesars Entertainment have in common. These are all leading firms in their industries with respect to leveraging analytics as a source of competitive advantage. We now have so much
7 min read
Pivoting To First-party Data
NZ Marketing
Article
Pivoting To First-party Data
Jun 9, 2021
5 min read
There’s A New Career In Town
True Love
Article
There’s A New Career In Town
Oct 21, 2019
2 min read
Help Yourself To Avoid These Pitfalls
MacLife
Article
Help Yourself To Avoid These Pitfalls
Dec 11, 2018
GETTING UP TO full speed with the Shortcuts app takes time, and you’ll inevitably make a few mistakes along the way. Having to troubleshoot your efforts doesn’t mean you’ve failed — with years of experience, even professional programmers do this. Tak
2 min read
Google Answer Box Strategy
Techfastly
Article
Google Answer Box Strategy
Sep 21, 2020
Leveraging the Google PAA (People Also Ask) element on a Search Results Page for Targeted Content Creation with a Python Scraper All businesses that are online today are creating content at a furious pace. According to Technavio, a research firm, con
7 min read
Data Driven Trails
Australian Mountain Bike
Article
Data Driven Trails
Mar 20, 2022
2 min read
Creating Value From Data: A MONETIZATION FRAMEWORK
Rotman Management
Article
Creating Value From Data: A MONETIZATION FRAMEWORK
Sep 1, 2023
9 min read
LIKE A PRO… Master Of Technology
Cycling Plus
Article
LIKE A PRO… Master Of Technology
Aug 5, 2020
2 min read
Professor Newman On... The Trials Of Product Testing
Amateur Photographer
Article
Professor Newman On... The Trials Of Product Testing
Jun 25, 2019
Pity the product reviewer. Whatever they do, there will be a group of people who will insist that they have been unfair to their favoured product. One wonders why, if someone already knows how good a product is, they would be consulting a review, but
2 min read
Essentially There Many Fundamental Questions
The European Business Review
Article
Essentially There Many Fundamental Questions
May 25, 2021
1 What we are trying to assess? The answer appears to be no: selectors are still interested in an individual’s ability, personality and motivation as well as their integrity and health. Whilst new concepts appear every so often (e.g agility, resilien
1 min read
Conquering Compliance Issues
Shop Talk
Article
Conquering Compliance Issues
Jun 1, 2022
3 min read
Analytics Adolescent Or Mature Metrics Master?
NZ Marketing
Article
Analytics Adolescent Or Mature Metrics Master?
Jun 25, 2020
An analytics company that specialise in advanced web analytics, Absolute Analytics, operates on the premise that every business has the right to actionable data and measurable results. From audit stage and set-up to training and reporting, these Alph
4 min read
Federated Learning Uses The Data Right On Our Devices
Futurity
Article
Federated Learning Uses The Data Right On Our Devices
Jul 21, 2022
2 min read
How To Make Sense From And With AI ?
The European Business Review
Article
How To Make Sense From And With AI ?
Sep 25, 2021
4 min read
Better Together: Behavioural Science + Data Science
Rotman Management
Article
Better Together: Behavioural Science + Data Science
May 1, 2020
IMAGINE THIS SCENARIO: You are designing a new customer experience to drive a shift in customer behaviour. You have reviewed the reports and dashboards describing current behaviour. You have asked customers how they felt and incorporated their feedba
5 min read
How Spooky Science Helps Us Peer Inside The Planets
All About Space
Article
How Spooky Science Helps Us Peer Inside The Planets
Dec 3, 2020
An assistant professor of computational science at the EPFL research centre in Lausanne, Switzerland, involved in the current research on metallic hydrogen. Could you explain how the machine-learning techniques used in your research work? Why were th
1 min read
Generative AI: What Leaders Need To Know
Rotman Management
Article
Generative AI: What Leaders Need To Know
Jan 1, 2024
12 min read
The Race To Exascale Supercomputers
Maximum PC
Article
The Race To Exascale Supercomputers
Jun 21, 2022
9 min read
The Deep Learning Revolution For Artificial Intelligence
Facility Management
Article
The Deep Learning Revolution For Artificial Intelligence
Mar 28, 2019
3 min read
CULTURE SHIFT – An Indispensable Shift To Building An AI-Powered Organisation
Techfastly
Article
CULTURE SHIFT – An Indispensable Shift To Building An AI-Powered Organisation
May 3, 2021
5 min read

Related categories

Skip carousel

Reviews for Machine Learning with R

Rating: 3.7777777777777777 out of 5 stars

4/5

9 ratings0 reviews

Book preview

Machine Learning with R - Brett Lantz

Machine Learning with R

Credits

About the Author

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers and more

Why Subscribe?

Free Access for Packt account holders

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Errata

Piracy

Questions

1. Introducing Machine Learning

The origins of machine learning

Uses and abuses of machine learning

Ethical considerations

How do machines learn?

Abstraction and knowledge representation

Generalization

Assessing the success of learning

Steps to apply machine learning to your data

Choosing a machine learning algorithm

Thinking about the input data

Thinking about types of machine learning algorithms

Matching your data to an appropriate algorithm

Using R for machine learning

Installing and loading R packages

Installing an R package

Installing a package using the point-and-click interface

Loading an R package

Summary

2. Managing and Understanding Data

R data structures

Vectors

Factors

Lists

Data frames

Matrixes and arrays

Managing data with R

Saving and loading R data structures

Importing and saving data from CSV files

Importing data from SQL databases

Exploring and understanding data

Exploring the structure of data

Exploring numeric variables

Measuring the central tendency – mean and median

Measuring spread – quartiles and the five-number summary

Visualizing numeric variables – boxplots

Visualizing numeric variables – histograms

Understanding numeric data – uniform and normal distributions

Measuring spread – variance and standard deviation

Exploring categorical variables

Measuring the central tendency – the mode

Exploring relationships between variables

Visualizing relationships – scatterplots

Examining relationships – two-way cross-tabulations

Summary

3. Lazy Learning – Classification Using Nearest Neighbors

Understanding classification using nearest neighbors

The kNN algorithm

Calculating distance

Choosing an appropriate k

Preparing data for use with kNN

Why is the kNN algorithm lazy?

Diagnosing breast cancer with the kNN algorithm

Step 1 – collecting data

Step 2 – exploring and preparing the data

Transformation – normalizing numeric data

Data preparation – creating training and test datasets

Step 3 – training a model on the data

Step 4 – evaluating model performance

Step 5 – improving model performance

Transformation – z-score standardization

Testing alternative values of k

Summary

4. Probabilistic Learning – Classification Using Naive Bayes

Understanding naive Bayes

Basic concepts of Bayesian methods

Probability

Joint probability

Conditional probability with Bayes' theorem

The naive Bayes algorithm

The naive Bayes classification

The Laplace estimator

Using numeric features with naive Bayes

Example – filtering mobile phone spam with the naive Bayes algorithm

Step 1 – collecting data

Step 2 – exploring and preparing the data

Data preparation – processing text data for analysis

Data preparation – creating training and test datasets

Visualizing text data – word clouds

Data preparation – creating indicator features for frequent words

Step 3 – training a model on the data

Step 4 – evaluating model performance

Step 5 – improving model performance

Summary

5. Divide and Conquer – Classification Using Decision Trees and Rules

Understanding decision trees

Divide and conquer

The C5.0 decision tree algorithm

Choosing the best split

Pruning the decision tree

Example – identifying risky bank loans using C5.0 decision trees

Step 1 – collecting data

Step 2 – exploring and preparing the data

Data preparation – creating random training and test datasets

Step 3 – training a model on the data

Step 4 – evaluating model performance

Step 5 – improving model performance

Boosting the accuracy of decision trees

Making some mistakes more costly than others

Understanding classification rules

Separate and conquer

The One Rule algorithm

The RIPPER algorithm

Rules from decision trees

Example – identifying poisonous mushrooms with rule learners

Step 1 – collecting data

Step 2 – exploring and preparing the data

Step 3 – training a model on the data

Step 4 – evaluating model performance

Step 5 – improving model performance

Summary

6. Forecasting Numeric Data – Regression Methods

Understanding regression

Simple linear regression

Ordinary least squares estimation

Correlations

Multiple linear regression

Example – predicting medical expenses using linear regression

Step 1 – collecting data

Step 2 – exploring and preparing the data

Exploring relationships among features – the correlation matrix

Visualizing relationships among features – the scatterplot matrix

Step 3 – training a model on the data

Step 4 – evaluating model performance

Step 5 – improving model performance

Model specification – adding non-linear relationships

Transformation – converting a numeric variable to a binary indicator

Model specification – adding interaction effects

Putting it all together – an improved regression model

Understanding regression trees and model trees

Adding regression to trees

Example – estimating the quality of wines with regression trees and model trees

Step 1 – collecting data

Step 2 – exploring and preparing the data

Step 3 – training a model on the data

Visualizing decision trees

Step 4 – evaluating model performance

Measuring performance with mean absolute error

Step 5 – improving model performance

Summary

7. Black Box Methods – Neural Networks and Support Vector Machines

Understanding neural networks

From biological to artificial neurons

Activation functions

Network topology

The number of layers

The direction of information travel

The number of nodes in each layer

Training neural networks with backpropagation

Modeling the strength of concrete with ANNs

Step 1 – collecting data

Step 2 – exploring and preparing the data

Step 3 – training a model on the data

Step 4 – evaluating model performance

Step 5 – improving model performance

Understanding Support Vector Machines

Classification with hyperplanes

Finding the maximum margin

The case of linearly separable data

The case of non-linearly separable data

Using kernels for non-linear spaces

Performing OCR with SVMs

Step 1 – collecting data

Step 2 – exploring and preparing the data

Step 3 – training a model on the data

Step 4 – evaluating model performance

Step 5 – improving model performance

Summary

8. Finding Patterns – Market Basket Analysis Using Association Rules

Understanding association rules

The Apriori algorithm for association rule learning

Measuring rule interest – support and confidence

Building a set of rules with the Apriori principle

Example – identifying frequently purchased groceries with association rules

Step 1 – collecting data

Step 2 – exploring and preparing the data

Data preparation – creating a sparse matrix for transaction data

Visualizing item support – item frequency plots

Visualizing transaction data – plotting the sparse matrix

Step 3 – training a model on the data

Step 4 – evaluating model performance

Step 5 – improving model performance

Sorting the set of association rules

Taking subsets of association rules

Saving association rules to a file or data frame

Summary

9. Finding Groups of Data – Clustering with k-means

Understanding clustering

Clustering as a machine learning task

The k-means algorithm for clustering

Using distance to assign and update clusters

Choosing the appropriate number of clusters

Finding teen market segments using k-means clustering

Step 1 – collecting data

Step 2 – exploring and preparing the data

Data preparation – dummy coding missing values

Data preparation – imputing missing values

Step 3 – training a model on the data

Step 4 – evaluating model performance

Step 5 – improving model performance

Summary

10. Evaluating Model Performance

Measuring performance for classification

Working with classification prediction data in R

A closer look at confusion matrices

Using confusion matrices to measure performance

Beyond accuracy – other measures of performance

The kappa statistic

Sensitivity and specificity

Precision and recall

The F-measure

Visualizing performance tradeoffs

ROC curves

Estimating future performance

The holdout method

Cross-validation

Bootstrap sampling

Summary

11. Improving Model Performance

Tuning stock models for better performance

Using caret for automated parameter tuning

Creating a simple tuned model

Customizing the tuning process

Improving model performance with meta-learning

Understanding ensembles

Bagging

Boosting

Random forests

Training random forests

Evaluating random forest performance

Summary

12. Specialized Machine Learning Topics

Working with specialized data

Getting data from the Web with the RCurl package

Reading and writing XML with the XML package

Reading and writing JSON with the rjson package

Reading and writing Microsoft Excel spreadsheets using xlsx

Working with bioinformatics data

Working with social network data and graph data

Improving the performance of R

Managing very large datasets

Making data frames faster with data.table

Creating disk-based data frames with ff

Using massive matrices with bigmemory

Learning faster with parallel computing

Measuring execution time

Working in parallel with foreach

Using a multitasking operating system with multicore

Networking multiple workstations with snow and snowfall

Parallel cloud computing with MapReduce and Hadoop

GPU computing

Deploying optimized learning algorithms

Building bigger regression models with biglm

Growing bigger and faster random forests with bigrf

Training and evaluating models in parallel with caret

Summary

Index

Machine Learning with R

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: October 2013

Production Reference: 1211013

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-78216-214-8

www.packtpub.com

Cover Image by Abhishek Pandey (<abhishek.pandey1210@gmail.com>)

Credits

Author

Brett Lantz

Reviewers

Jia Liu

Mzabalazo Z. Ngwenya

Abhinav Upadhyay

Acquisition Editor

James Jones

Lead Technical Editor

Azharuddin Sheikh

Technical Editors

Pooja Arondekar

Pratik More

Anusri Ramchandran

Harshad Vairat

Project Coordinator

Anugya Khurana

Proofreaders

Simran Bhogal

Ameesha Green

Paul Hindle

Indexer

Tejal Soni

Graphics

Ronak Dhruv

Production Coordinator

Nilesh R. Mohite

Cover Work

Nilesh R. Mohite

About the Author

Brett Lantz has spent the past 10 years using innovative data methods to understand human behavior. A sociologist by training, he was first enchanted by machine learning while studying a large database of teenagers' social networking website profiles. Since then, he has worked on interdisciplinary studies of cellular telephone calls, medical billing data, and philanthropic activity, among others. When he's not spending time with family, following college sports, or being entertained by his dachshunds, he maintains dataspelunking.com, a website dedicated to sharing knowledge about the search for insight in data.

This book could not have been written without the support of my family and friends. In particular, my wife Jessica deserves many thanks for her patience and encouragement throughout the past year. My son Will (who was born while Chapter 10 was underway), also deserves special mention for his role in the writing process; without his gracious ability to sleep through the night, I could not have strung together a coherent sentence the next morning. I dedicate this book to him in the hope that one day he is inspired to follow his curiosity wherever it may lead.

I am also indebted to many others who supported this book indirectly. My interactions with educators, peers, and collaborators at the University of Michigan, the University of Notre Dame, and the University of Central Florida seeded many of the ideas I attempted to express in the text. Additionally, without the work of researchers who shared their expertise in publications, lectures, and source code, this book might not exist at all. Finally, I appreciate the efforts of the R team and all those who have contributed to R packages, whose work ultimately brought machine learning to the masses.

About the Reviewers

Jia Liu holds a Master's degree in Statistics from the University of Maryland, Baltimore County, and is presently a PhD candidate in statistics from Iowa State University. Her research interests include mixed-effects model, Bayesian method, Boostrap method, reliability, design of experiments, machine learning and data mining. She has two year's experience as a student consultant in statistics and two year's internship experience in agriculture and pharmaceutical industry.

Mzabalazo Z. Ngwenya has worked extensively in the field of statistical consulting and currently works as a biometrician. He holds an MSc in Mathematical Statistics from the University of Cape Town and is at present studying for a PhD (at the School of Information Technology, University of Pretoria), in the field of Computational Intelligence. His research interests include statistical computing, machine learning, and spatial statistics. Previously, he was involved in reviewing Learning RStudio for R Statistical Computing (Van de Loo and de Jong, 2012), and R Statistical Application Development by Example beginner's guide (Prabhanjan Narayanachar Tattar , 2013).

Abhinav Upadhyay finished his Bachelor's degree in 2011 with a major in Information Technology. His main areas of interest include machine learning and information retrieval.

In 2011, he worked for the NetBSD Foundation as part of the Google Summer of Code program. During that period, he wrote a search engine for Unix manual pages. This project resulted in a new implementation of the apropos utility for NetBSD.

Currently, he is working as a Development Engineer for SocialTwist. His day-to-day work involves writing system level tools and frameworks to manage the product infrastructure.

He is also an open source enthusiast and quite active in the community. In his free time, he maintains and contributes to several open source projects.

www.PacktPub.com

Support files, eBooks, discount offers and more

You might want to visit www.PacktPub.com for support files and downloads related to your book.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

http://PacktLib.PacktPub.com

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can access, read and search across Packt's entire library of books.

Why Subscribe?

Fully searchable across every book published by Packt

Copy and paste, print and bookmark content

On demand and accessible via web browser

Free Access for Packt account holders

If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view nine entirely free books. Simply use your login credentials for immediate access.

Preface

Machine learning, at its core, is concerned with algorithms that transform information into actionable intelligence. This fact makes machine learning well-suited to the present day era of Big Data. Without machine learning, it would be nearly impossible to keep up with the massive stream of information.

Given the growing prominence of R—a cross-platform, zero-cost statistical programming environment—there has never been a better time to start using machine learning. R offers a powerful but easy-to-learn set of tools that can assist you with finding data insights.

By combining hands-on case studies with the essential theory that you need to understand how things work under the hood, this book provides all the knowledge that you will need to start applying machine learning to your own projects.

What this book covers

Chapter 1, Introducing Machine Learning, presents the terminology and concepts that define and distinguish machine learners, as well as a method for matching a learning task with the appropriate algorithm.

Chapter 2, Managing and Understanding Data, provides an opportunity to get your hands dirty working with data in R. Essential data structures and procedures used for loading, exploring, and understanding data are discussed.

Chapter 3, Lazy Learning – Classification Using Nearest Neighbors, teaches you how to understand and apply a simple yet powerful learning algorithm to your first machine learning task: identifying malignant samples of cancer.

Chapter 4, Probabilistic Learning – Classification Using Naive Bayes, reveals the essential concepts of probability that are used in cutting-edge spam filtering systems. You'll learn the basics of text mining in the process of building your own spam filter.

Chapter 5, Divide and Conquer – Classification Using Decision Trees and Rules, explores a couple of learning algorithms whose predictions are not only accurate but easily explained. We'll apply these methods to tasks where transparency is important.

Chapter 6, Forecasting Numeric Data – Regression Methods, introduces machine learning algorithms used for making numeric predictions. As these techniques are heavily embedded in the field of statistics, you will also learn the essential metrics needed to make sense of numeric relationships.

Chapter 7, Black Box Methods – Neural Networks and Support Vector Machines, covers two extremely complex yet powerful machine learning algorithms. Though the mathematics may appear intimidating, we will work through examples that illustrate their inner workings in simple terms.

Chapter 8, Finding Patterns – Market Basket Analysis Using Association Rules, exposes the algorithm for the recommendation systems used at many retailers. If you've ever wondered how retailers seem to know your purchasing habits better than you know them yourself, this chapter will reveal their secrets.

Chapter 9, Finding Groups of Data – Clustering with k-means, is devoted to a procedure that locates clusters of related items. We'll utilize this algorithm to identify segments of profiles within a web-based community.

Chapter 10, Evaluating Model Performance, provides information on measuring the success of a machine learning project, and obtaining a reliable estimate of the learner's performance on future data.

Chapter 11, Improving Model Performance, reveals the methods employed by the teams found at the top of machine learning competition leader boards. If you have a competitive streak, or simply want to get the most out of your data, you'll need to add these techniques to your repertoire.

Chapter 12, Specialized Machine Learning Topics, explores the frontiers of machine learning. From working with Big Data to making R work faster, the topics covered will help you push the boundaries of what is possible with R.

What you need for this book

The examples in this book were written for and tested with R Version 2.15.3 on both Microsoft Windows and Mac OS X, though they are likely to work with any recent version of R.

Who this book is for

This book is intended for anybody hoping to use data for action. Perhaps you already know a bit about machine learning, but have never used R; or perhaps you know a little R but are new to machine learning. In any case, this book will get you up and running quickly. It would be helpful to have a bit of familiarity with basic math and programming concepts, but no prior experience is required. You need only curiosity.

Conventions

In this book, you will find a number of styles of text that distinguish between different kinds of information. Here are some examples of these styles, and an explanation of their meaning.

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: To fit a linear regression model to data with R, the lm() function can be used.

Any command-line input or output is written as follows:

> pairs.panels(insurance[c(age, bmi, children, charges)])

New terms and important words are shown in bold. Words that you see on the screen, in menus or dialog boxes for example, appear in the text like this: Instead, ham messages use words such as can, sorry, need, and time.

Note

Warnings or important notes appear in a box like this.

Tip

Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or may have disliked. Reader feedback is important for us to develop titles that you really get the most out of.

To send us general feedback, simply send an e-mail to <feedback@packtpub.com>, and mention the book title via the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide on www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you would report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the errata submission form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded on our website, or added to any list of existing errata, under the Errata section of that title. Any existing errata can be viewed by selecting your title from http://www.packtpub.com/support.

Piracy

Piracy of copyright material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works, in any form, on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at <copyright@packtpub.com> with a link to the suspected pirated material.

We appreciate your help in protecting our authors, and our ability to bring you valuable content.

Questions

You can contact us at <questions@packtpub.com> if you are having a problem with any aspect of the book, and we will do our best to address it.

Chapter 1. Introducing Machine Learning

If science fiction stories are to be believed, teaching machines to learn will inevitably lead to apocalyptic wars between machines and their makers. In the early stages, computers are taught to play simple games of tic-tac-toe and chess. Later, machines are given control of traffic lights and communications, followed by military drones and missiles. The machines' evolution takes an ominous turn once the computers become sentient and learn how to teach themselves. Having no more need for human programmers, humankind is then deleted.

Thankfully, at the time of this writing, machines still require user input.

Your impressions of machine learning may be very heavily influenced by these types of mass media depictions of artificial intelligence. And even though there may be a hint of truth to such tales; in reality, machine learning is focused on more practical applications. The task of teaching a computer to learn is tied more closely to a specific problem that would be a computer that can play games, ponder philosophy, or answer trivial questions. Machine learning is more like training an employee than raising a child.

Putting these stereotypes aside, by the end of this chapter, you will have gained a far more nuanced understanding of machine learning. You will be introduced to the fundamental concepts that define and differentiate the most commonly used machine learning approaches.

You will learn:

The origins and practical applications of machine learning

How knowledge is defined and represented by computers

The basic concepts that differentiate machine learning approaches

In a single sentence, you could say that machine learning provides a set of tools that use computers to transform data into actionable knowledge. To learn more about how the process works, read on.

The origins of machine learning

Since birth, we are inundated with data. Our body's sensors—the eyes, ears, nose, tongue, and nerves—are continually assailed with raw data that our brain translates into sights, sounds, smells, tastes, and textures. Using language, we are able to share these experiences with others.

The earliest databases recorded information from the observable environment. Astronomers recorded patterns of planets and stars; biologists noted results from experiments crossbreeding plants and animals; and cities recorded tax payments, disease outbreaks, and populations. Each of these required a human being to first observe and second, record the observation. Today, such observations are increasingly automated and recorded systematically in ever-growing computerized databases.

The invention of electronic sensors has additionally contributed to an increase in the richness of recorded data. Specialized sensors see, hear, smell, or taste. These sensors process the data far differently than a human being would, and in many ways, this is a benefit. Without the need for translation into human language, the raw sensory data remains objective.

Tip

It is important to note that although a sensor does not have a subjective component to its observations, it does not necessarily report truth (if such a concept can be defined). A camera taking photographs in black and white might provide a far different depiction of its environment than one shooting pictures in color. Similarly, a microscope provides a far different depiction of reality than a telescope.

Between databases and sensors, many aspects of our lives are recorded. Governments, businesses, and individuals are recording and reporting all manners of information from the monumental to the mundane. Weather sensors record temperature and pressure data, surveillance cameras watch sidewalks and subway tunnels, and all manner of electronic behaviors are monitored: transactions, communications, friendships, and many others.

This deluge of data has led some to state that we have entered an era of Big Data, but this may be a bit of a misnomer. Human beings have always been surrounded by data. What makes the current era unique is that we have easy data. Larger and more interesting data sets are increasingly accessible through the tips of our fingers, only a web search away. We now live in a period with vast quantities of data that can be directly processed by machines. Much of this information has the potential to inform decision making, if only there was a systematic way of making sense from it all.

The field of study interested in the development of computer algorithms for transforming data into intelligent action is known as machine learning. This field originated in an environment where the available data, statistical methods, and computing power rapidly and simultaneously evolved. Growth in data necessitated additional computing power, which in turn spurred the development of statistical methods for analyzing large datasets. This created a cycle of advancement allowing even larger and more interesting data to be collected.

A closely related sibling of machine learning, data mining, is concerned with the generation of novel insight from large databases (not to be confused with the pejorative term data mining, describing the practice of cherry-picking data to support a theory). Although there is some disagreement over how widely the two fields overlap, a potential point of distinction is that machine learning tends to be focused on performing a known task, whereas data mining is about the search for hidden nuggets of information. For instance, you might use machine learning to teach a robot to drive a car, whereas you would utilize data mining to learn what type of cars are the safest.

Tip

Machine learning algorithms are virtually a prerequisite for data mining but the opposite is not true. In other words, you can apply machine learning to tasks that do not involve data mining, but if you are using data mining methods, you are almost certainly using machine learning.

Uses and abuses of machine learning

At its core, machine learning is primarily interested in making sense of complex data. This is a broadly applicable mission, and largely application agnostic. As you might expect, machine learning is used widely. For instance, it has been used to:

Predict the outcomes of elections

Identify and filter spam messages from e-mail

Foresee criminal activity

Automate traffic signals according to road conditions

Produce financial estimates of storms and natural disasters

Examine customer churn

Create auto-piloting planes and auto-driving cars

Identify individuals with the capacity to donate

Target advertising to specific types of consumers

For now, don't worry about exactly how the machines learn to perform these tasks; we will get into the specifics later. But across each of these contexts, the process is the same. A machine learning algorithm takes data and identifies patterns that can be used for action. In some cases, the results are so successful that they seem to reach near-legendary status.

One possibly apocryphal tale is of a large retailer in the United States, which employed machine learning to identify expectant mothers for targeted coupon mailings. If mothers-to-be were targeted with substantial discounts, the retailer hoped they would become loyal customers who would then continue to purchase profitable items like diapers, formula, and toys.

By applying machine learning methods to purchase data, the retailer believed it had learned some useful patterns. Certain items, such as prenatal vitamins, lotions, and washcloths could be used to identify with a high degree of certainty not only whether a woman was pregnant, but also when the baby was due.

After using this data for a promotional mailing, an angry man contacted the retailer and demanded to know why his teenage daughter was receiving coupons for maternity items. He was furious that the merchant seemed to be encouraging teenage pregnancy. Later on, as a manager called to offer an apology, it was the father that ultimately apologized; after confronting his daughter, he had discovered that she was indeed pregnant.

Whether completely true or not, there is certainly an element of truth to the preceding tale. Retailers, do in fact, routinely analyze their customers' transaction data. If you've ever used a shopper's loyalty card at your grocer, coffee shop, or another retailer, it is likely that your purchase data is being used for machine learning.

Retailers use machine learning methods for advertising, targeted promotions, inventory management, or the layout of the items in the store. Some retailers have even equipped checkout lanes with devices that print coupons for promotions based on the items in the current transaction. Websites also routinely do this to serve advertisements based on your web browsing history. Given the data from many individuals, a machine learning algorithm learns typical patterns of behavior that can then be used to make recommendations.

Despite being familiar with the machine learning methods working behind the scenes, it still feels a bit like magic when a retailer or website seems to know me better than I know myself. Others may be less thrilled to discover that their data is being used in this manner. Therefore, any person wishing to utilize machine learning or data mining would be remiss not to at least briefly consider the ethical implications of the art.

Ethical considerations

Due to the relative youth of machine learning as a discipline and the speed at which it is progressing, the associated legal issues and social norms are often quite uncertain and constantly in flux. Caution should be exercised when obtaining or analyzing data in order to avoid breaking laws, violating terms of service or data use agreements, abusing the trust, or violating privacy of the customers or the public.

Tip

The informal corporate motto of Google, an organization, which collects perhaps more data on individuals than any other, is don't be evil. This may serve as a reasonable starting point for forming your own ethical guidelines, but it may not be sufficient.

Certain jurisdictions may prevent you from using racial, ethnic, religious, or other protected class data for business reasons, but keep in mind that excluding this data from your analysis may not be enough—machine learning algorithms might inadvertently learn this information independently. For instance, if a certain segment of people generally live in a certain region, buy a certain product, or otherwise behave in a way that uniquely identifies them as a group, some machine learning algorithms can infer the protected information from seemingly innocuous data. In such cases, you may need to fully de-identify these people by excluding any potentially identifying data in addition to the protected information.

Apart from the legal consequences, using data inappropriately may hurt your bottom line. Customers may feel uncomfortable or become spooked if aspects of their lives they consider private are made public. Recently, several high-profile web applications have experienced a mass exodus of users who felt exploited when the applications' terms of service agreements changed and their data was used for purposes beyond what the users had originally agreed upon. The fact that privacy expectations differ by context, by age cohort, and by locale, adds complexity to deciding the appropriate use of personal data. It would be wise to consider the cultural implications of your work before you begin on your project.

Tip

The fact that you can use data for a particular end does not always mean that you should.

How do machines learn?

A commonly cited formal definition of machine learning, proposed by computer scientist Tom M. Mitchell, says that a machine is said to learn if it is able to take experience and utilize it such that its performance improves up on similar experiences in the future. This definition is fairly exact, yet says little about how machine learning techniques actually learn to transform data into actionable knowledge.

Tip

Although it is not strictly necessary to understand the theoretical basis of machine learning prior to using it, this foundation provides an insight into the distinctions among machine learning algorithms. Because machine learning algorithms are modeled in many ways on human minds, you may even discover yourself examining your own mind in a different light.

Regardless of whether the learner is a human or a machine, the basic learning process is similar. It can be divided into three components as follows:

Data input: It utilizes observation, memory storage, and recall to provide a factual basis for further reasoning.

Abstraction: It involves the translation of data into broader representations.

Generalization: It uses abstracted data to form a basis for action.

To better understand the learning process, think about the last time you studied for a difficult test, perhaps for a university final exam or a career certification. Did you wish for an eidetic (that is, photographic) memory? If so, you may be disappointed to learn that perfect recall is unlikely to save you much effort. Without a higher understanding, your knowledge is limited exactly to the data input, meaning only what you had seen before and nothing more. Therefore, without knowledge of all the questions that could appear on the exam, you would be stuck attempting to memorize answers to every question that could conceivably be asked. Obviously, this is an unsustainable strategy.

Instead, a better strategy is to spend time selectively managing only a smaller set of key ideas. The commonly used learning strategies of

Enjoying the preview?

Page 1 of 1

Machine Learning with R

About this ebook

Brett Lantz

Read more from Brett Lantz

Related authors

Related to Machine Learning with R

Related ebooks

Computers For You

Related podcast episodes

Related articles

Related categories

Reviews for Machine Learning with R

What did you think?

Book preview

Machine Learning with R - Brett Lantz

Table of Contents

Machine Learning with R

Machine Learning with R

Credits

About the Author

About the Reviewers

Support files, eBooks, discount offers and more

Why Subscribe?

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Note

Tip

Reader feedback

Customer support

Downloading the example code

Errata

Piracy

Questions

Chapter 1. Introducing Machine Learning

The origins of machine learning

Tip

Tip

Uses and abuses of machine learning

Ethical considerations

Tip

Tip

How do machines learn?

Tip