Ebook477 pages2 hours

Mastering Python for Data Science

Name: Mastering Python for Data Science
Brand: Packt Publishing
Rating: 3.0 (1 reviews)

By Samir Madhavan

Rating: 3 out of 5 stars

3/5

()

Read preview

About this ebook

Explore the world of data science through Python and learn how to make sense of data

About This Book

Master data science methods using Python and its libraries
Create data visualizations and mine for patterns
Advanced techniques for the four fundamentals of Data Science with Python - data mining, data analysis, data visualization, and machine learning

Who This Book Is For

If you are a Python developer who wants to master the world of data science then this book is for you. Some knowledge of data science is assumed.

What You Will Learn

Manage data and perform linear algebra in Python
Derive inferences from the analysis by performing inferential statistics
Solve data science problems in Python
Create high-end visualizations using Python
Evaluate and apply the linear regression technique to estimate the relationships among variables.
Build recommendation engines with the various collaborative filtering algorithms
Apply the ensemble methods to improve your predictions
Work with big data technologies to handle data at scale

In Detail

Data science is a relatively new knowledge domain which is used by various organizations to make data driven decisions. Data scientists have to wear various hats to work with data and to derive value from it. The Python programming language, beyond having conquered the scientific community in the last decade, is now an indispensable tool for the data science practitioner and a must-know tool for every aspiring data scientist. Using Python will offer you a fast, reliable, cross-platform, and mature environment for data analysis, machine learning, and algorithmic problem solving.

This comprehensive guide helps you move beyond the hype and transcend the theory by providing you with a hands-on, advanced study of data science.

Beginning with the essentials of Python in data science, you will learn to manage data and perform linear algebra in Python. You will move on to deriving inferences from the analysis by performing inferential statistics, and mining data to reveal hidden patterns and trends. You will use the matplot library to create high-end visualizations in Python and uncover the fundamentals of machine learning. Next, you will apply the linear regression technique and also learn to apply the logistic regression technique to your applications, before creating recommendation engines with various collaborative filtering algorithms and improving your predictions by applying the ensemble methods.

Finally, you will perform K-means clustering, along with an analysis of unstructured data with different text mining techniques and leveraging the power of Python in big data analytics.

Style and approach

This book is an easy-to-follow, comprehensive guide on data science using Python. The topics covered in the book can all be used in real world scenarios.

Skip carousel

LanguageEnglish

PublisherPackt Publishing

Release dateAug 31, 2015

ISBN9781784392628

Author

Samir Madhavan

Samir Madhavan has been working in the field of data science since 2010. He is an industry expert on machine learning and big data. He has also reviewed R Machine Learning Essentials by Packt Publishing. He was part of the ubiquitous Aadhar project of the Unique Identification Authority of India, which is in the process of helping every Indian get a unique number that is similar to a social security number in the United States. He was also the first employee of Flutura Decision Sciences and Analytics and is a part of the core team that has helped scale the number of employees in the company to 50. His company is now recognized as one of the most promising Internet of Things―Decision Sciences companies in the world.

Related authors

Skip carousel

Related to Mastering Python for Data Science

Related ebooks

Skip carousel

Python Data Science Essentials - Second Edition
Ebook
Python Data Science Essentials - Second Edition
byBoschetti Alberto
Rating: 4 out of 5 stars
4/5
Python Data Science Essentials
Ebook
Python Data Science Essentials
byBoschetti Alberto
Rating: 0 out of 5 stars
0 ratings
Python Data Analysis
Ebook
Python Data Analysis
byIvan Idris
Rating: 4 out of 5 stars
4/5
Learning Predictive Analytics with Python
Ebook
Learning Predictive Analytics with Python
byKumar Ashish
Rating: 0 out of 5 stars
0 ratings
Python Data Analysis - Second Edition
Ebook
Python Data Analysis - Second Edition
byArmando Fandango
Rating: 0 out of 5 stars
0 ratings
Learning pandas - Second Edition
Ebook
Learning pandas - Second Edition
byHeydt Michael
Rating: 4 out of 5 stars
4/5
Learning pandas
Ebook
Learning pandas
byHeydt Michael
Rating: 4 out of 5 stars
4/5
Regression Analysis with Python
Ebook
Regression Analysis with Python
byBoschetti Alberto
Rating: 0 out of 5 stars
0 ratings
R for Data Science
Ebook
R for Data Science
byDan Toomey
Rating: 5 out of 5 stars
5/5
Getting Started with Python Data Analysis
Ebook
Getting Started with Python Data Analysis
byVo.T.H Phuong
Rating: 0 out of 5 stars
0 ratings
Principles of Data Science
Ebook
Principles of Data Science
bySinan Ozdemir
Rating: 4 out of 5 stars
4/5
Hands-On Data Analysis with Pandas: Efficiently perform data collection, wrangling, analysis, and visualization using Python
Ebook
Hands-On Data Analysis with Pandas: Efficiently perform data collection, wrangling, analysis, and visualization using Python
byStefanie Molin
Rating: 0 out of 5 stars
0 ratings
Building Machine Learning Systems with Python
Ebook
Building Machine Learning Systems with Python
byWilli Richert
Rating: 4 out of 5 stars
4/5
Python Machine Learning By Example
Ebook
Python Machine Learning By Example
byYuxi (Hayden) Liu
Rating: 4 out of 5 stars
4/5
Practical Data Science Cookbook - Second Edition
Ebook
Practical Data Science Cookbook - Second Edition
byTony Ojeda
Rating: 0 out of 5 stars
0 ratings
Learning Data Mining with Python
Ebook
Learning Data Mining with Python
byRobert Layton
Rating: 0 out of 5 stars
0 ratings
Mastering Social Media Mining with Python
Ebook
Mastering Social Media Mining with Python
byMarco Bonzanini
Rating: 5 out of 5 stars
5/5
Learning Data Mining with Python - Second Edition
Ebook
Learning Data Mining with Python - Second Edition
byRobert Layton
Rating: 0 out of 5 stars
0 ratings
Web Scraping with Python
Ebook
Web Scraping with Python
byRichard Lawson
Rating: 4 out of 5 stars
4/5
Mastering Data Mining with Python – Find patterns hidden in your data
Ebook
Mastering Data Mining with Python – Find patterns hidden in your data
byMegan Squire
Rating: 0 out of 5 stars
0 ratings
Python Data Structures and Algorithms
Ebook
Python Data Structures and Algorithms
byBenjamin Baka
Rating: 5 out of 5 stars
5/5
Python Deep Learning
Ebook
Python Deep Learning
byValentino Zocca
Rating: 5 out of 5 stars
5/5
Pandas 1.x Cookbook - Second Edition: Practical recipes for scientific computing, time series analysis, and exploratory data analysis using Python, 2nd Edition
Ebook
Pandas 1.x Cookbook - Second Edition: Practical recipes for scientific computing, time series analysis, and exploratory data analysis using Python, 2nd Edition
byMatt Harrison
Rating: 5 out of 5 stars
5/5
Practical Machine Learning
Ebook
Practical Machine Learning
byGollapudi Sunila
Rating: 2 out of 5 stars
2/5
Artificial Intelligence with Python
Ebook
Artificial Intelligence with Python
byPrateek Joshi
Rating: 4 out of 5 stars
4/5
Mastering Python Data Analysis
Ebook
Mastering Python Data Analysis
byMagnus Vilhelm Persson
Rating: 0 out of 5 stars
0 ratings
Advanced Machine Learning with Python
Ebook
Advanced Machine Learning with Python
byJohn Hearty
Rating: 0 out of 5 stars
0 ratings
Practical Data Analysis
Ebook
Practical Data Analysis
byHector Cuesta
Rating: 4 out of 5 stars
4/5
Designing Machine Learning Systems with Python
Ebook
Designing Machine Learning Systems with Python
byDavid Julian
Rating: 0 out of 5 stars
0 ratings
R Machine Learning By Example
Ebook
R Machine Learning By Example
byDipanjan Sarkar
Rating: 0 out of 5 stars
0 ratings

Data Visualization For You

Skip carousel

How to be Clear and Compelling with Data: Principles, Practice and Getting Beyond the Basics
Ebook
How to be Clear and Compelling with Data: Principles, Practice and Getting Beyond the Basics
byJohn J Burrett
Rating: 0 out of 5 stars
0 ratings
Data Analytics for Beginners: Introduction to Data Analytics
Ebook
Data Analytics for Beginners: Introduction to Data Analytics
byAnthony S. Williams
Rating: 4 out of 5 stars
4/5
Functional Aesthetics for Data Visualization
Ebook
Functional Aesthetics for Data Visualization
byVidya Setlur
Rating: 0 out of 5 stars
0 ratings
DAX Patterns: Second Edition
Ebook
DAX Patterns: Second Edition
byMarco Russo
Rating: 5 out of 5 stars
5/5
Machine Learning for Beginners: An Introduction for Beginners, Why Machine Learning Matters Today and How Machine Learning Networks, Algorithms, Concepts and Neural Networks Really Work
Ebook
Machine Learning for Beginners: An Introduction for Beginners, Why Machine Learning Matters Today and How Machine Learning Networks, Algorithms, Concepts and Neural Networks Really Work
bySteven Cooper
Rating: 4 out of 5 stars
4/5
Winning The Room: Creating and Delivering an Effective Data-Driven Presentation
Ebook
Winning The Room: Creating and Delivering an Effective Data-Driven Presentation
byBill Franks
Rating: 0 out of 5 stars
0 ratings
The Big Book of Dashboards: Visualizing Your Data Using Real-World Business Scenarios
Ebook
The Big Book of Dashboards: Visualizing Your Data Using Real-World Business Scenarios
bySteve Wexler
Rating: 4 out of 5 stars
4/5
D3.js in Action: Data visualization with JavaScript
Ebook
D3.js in Action: Data visualization with JavaScript
byElijah Meeks
Rating: 0 out of 5 stars
0 ratings
Cool Infographics: Effective Communication with Data Visualization and Design
Ebook
Cool Infographics: Effective Communication with Data Visualization and Design
byRandy Krum
Rating: 4 out of 5 stars
4/5
Excel Essentials: A Step-by-Step Guide with Pictures for Absolute Beginners to Master the Basics and Start Using Excel with Confidence
Ebook
Excel Essentials: A Step-by-Step Guide with Pictures for Absolute Beginners to Master the Basics and Start Using Excel with Confidence
byNigel Tillery
Rating: 0 out of 5 stars
0 ratings
Effective Data Storytelling: How to Drive Change with Data, Narrative and Visuals
Ebook
Effective Data Storytelling: How to Drive Change with Data, Narrative and Visuals
byBrent Dykes
Rating: 4 out of 5 stars
4/5
Learning Tableau 2019 - Third Edition: Tools for Business Intelligence, data prep, and visual analytics, 3rd Edition
Ebook
Learning Tableau 2019 - Third Edition: Tools for Business Intelligence, data prep, and visual analytics, 3rd Edition
byJoshua N. Milligan
Rating: 0 out of 5 stars
0 ratings
Teach Yourself VISUALLY Power BI
Ebook
Teach Yourself VISUALLY Power BI
byAlexander Loth
Rating: 0 out of 5 stars
0 ratings
Learning PySpark
Ebook
Learning PySpark
byTomasz Drabas
Rating: 0 out of 5 stars
0 ratings
Getting to Know ArcGIS Desktop 10.8
Ebook
Getting to Know ArcGIS Desktop 10.8
byMichael Law
Rating: 4 out of 5 stars
4/5
#MakeoverMonday: Improving How We Visualize and Analyze Data, One Chart at a Time
Ebook
#MakeoverMonday: Improving How We Visualize and Analyze Data, One Chart at a Time
byEva Murray
Rating: 0 out of 5 stars
0 ratings
Python For Beginners.Learn Data Science in 5 Days the Smart Way and Remember it Longer. With Easy Step by Step Guidance & Hands on Examples. (Python Crash Course-Programming for Beginners): Python for Beginners
Ebook
Python For Beginners.Learn Data Science in 5 Days the Smart Way and Remember it Longer. With Easy Step by Step Guidance & Hands on Examples. (Python Crash Course-Programming for Beginners): Python for Beginners
byArthur T. Brooks
Rating: 0 out of 5 stars
0 ratings
Mastering Excel: Excel Apps
Ebook
Mastering Excel: Excel Apps
byMark Moore
Rating: 3 out of 5 stars
3/5
Creating Data Stories with Tableau Public
Ebook
Creating Data Stories with Tableau Public
byOhmann Ashley
Rating: 0 out of 5 stars
0 ratings
Practical Data Analysis Cookbook
Ebook
Practical Data Analysis Cookbook
byTomasz Drabas
Rating: 0 out of 5 stars
0 ratings
Advanced Analytics with Power BI and Excel: Learn powerful visualization and data analysis techniques using Microsoft BI tools along with Python and R
Ebook
Advanced Analytics with Power BI and Excel: Learn powerful visualization and data analysis techniques using Microsoft BI tools along with Python and R
byDejan Sarka
Rating: 0 out of 5 stars
0 ratings
The Chicago Guide to Writing About Numbers
Ebook
The Chicago Guide to Writing About Numbers
byJane E. Miller
Rating: 0 out of 5 stars
0 ratings
Learning pandas - Second Edition
Ebook
Learning pandas - Second Edition
byHeydt Michael
Rating: 4 out of 5 stars
4/5
Google Analytics 4 Migration Quick Guide 2022: Universal Analytics disappears in July 2023 - are you ready?
Ebook
Google Analytics 4 Migration Quick Guide 2022: Universal Analytics disappears in July 2023 - are you ready?
byWynne Pirini
Rating: 0 out of 5 stars
0 ratings
Excel for Beginners 2023: A Step-by-Step and Comprehensive Guide to Master the Basics of Excel, with Formulas, Functions, & Charts
Ebook
Excel for Beginners 2023: A Step-by-Step and Comprehensive Guide to Master the Basics of Excel, with Formulas, Functions, & Charts
byGerald Stroud
Rating: 0 out of 5 stars
0 ratings
The Applied SQL Data Analytics Workshop - Second Edition: Develop your practical skills and prepare to become a professional data analyst, 2nd Edition
Ebook
The Applied SQL Data Analytics Workshop - Second Edition: Develop your practical skills and prepare to become a professional data analyst, 2nd Edition
byMatt Goldwasser
Rating: 0 out of 5 stars
0 ratings
Smart Data Discovery Using SAS Viya: Powerful Techniques for Deeper Insights
Ebook
Smart Data Discovery Using SAS Viya: Powerful Techniques for Deeper Insights
byFelix Liao
Rating: 0 out of 5 stars
0 ratings
Learning Tableau
Ebook
Learning Tableau
byJoshua N. Milligan
Rating: 0 out of 5 stars
0 ratings
Mastering Text Mining with R
Ebook
Mastering Text Mining with R
byAvinash Paul
Rating: 0 out of 5 stars
0 ratings
GIS Tutorial for ArcGIS Pro 2.8
Ebook
GIS Tutorial for ArcGIS Pro 2.8
byWilpen L. Gorr
Rating: 5 out of 5 stars
5/5

Related podcast episodes

Skip carousel

[DataFramed Careers Series #2] What Makes a Great Data Science Portfolio
Podcast episode
[DataFramed Careers Series #2] What Makes a Great Data Science Portfolio
byDataFramed
0 ratings
0% found this document useful
Episode 19 (Python for Data Science - Python Files - Scripts and Modules)
Podcast episode
Episode 19 (Python for Data Science - Python Files - Scripts and Modules)
byHow to Data (Joshiverse- Journey of a Budding Data Scientist)
0 ratings
0% found this document useful
Advantages of Completing Small Python Projects
Podcast episode
Advantages of Completing Small Python Projects
byThe Real Python Podcast
0 ratings
0% found this document useful
#1 Data Science, Past, Present and Future: Hilary Mason talks about the past, present, and future of data science with Hugo. Hilary is the VP of Research at Cloudera Fast Forward, a machine intelligence research company, and the data scientist in residence at Accel. If you want to hear about wh...
Podcast episode
#1 Data Science, Past, Present and Future: Hilary Mason talks about the past, present, and future of data science with Hugo. Hilary is the VP of Research at Cloudera Fast Forward, a machine intelligence research company, and the data scientist in residence at Accel. If you want to hear about wh...
byDataFramed
100%
100% found this document useful
Unraveling Python's Syntax to Its Core With Brett Cannon
Podcast episode
Unraveling Python's Syntax to Its Core With Brett Cannon
byThe Real Python Podcast
100%
100% found this document useful
Improving the Learning Experience on Real Python
Podcast episode
Improving the Learning Experience on Real Python
byThe Real Python Podcast
0 ratings
0% found this document useful
#63 The Past and Present of Data Science
Podcast episode
#63 The Past and Present of Data Science
byDataFramed
0 ratings
0% found this document useful
Build Better Machine Learning Models With Confidence By Adding Validation With Deepchecks: A cross-over episode from The Machine Learning Podcast with the team from Deepchecks, exploring the challenges of testing and validating machine learning applications and their work to make it easier.
Podcast episode
Build Better Machine Learning Models With Confidence By Adding Validation With Deepchecks: A cross-over episode from The Machine Learning Podcast with the team from Deepchecks, exploring the challenges of testing and validating machine learning applications and their work to make it easier.
byThe Python Podcast.__init__
0 ratings
0% found this document useful
Getting Technical about the Data Center Revolution with Jonathan Friedmann, CEO of Speedata
Podcast episode
Getting Technical about the Data Center Revolution with Jonathan Friedmann, CEO of Speedata
byMaking Data Simple
0 ratings
0% found this document useful
Anaconda + Pyston and more: with Peter Wang, CEO of Anaconda
Podcast episode
Anaconda + Pyston and more: with Peter Wang, CEO of Anaconda
byPractical AI: Machine Learning, Data Science
0 ratings
0% found this document useful
#059 - 10 Python clean code tips drawn from code reviews
Podcast episode
#059 - 10 Python clean code tips drawn from code reviews
byPybites Podcast
0 ratings
0% found this document useful
#40 Becoming a Data Scientist
Podcast episode
#40 Becoming a Data Scientist
byDataFramed
100%
100% found this document useful
Exploring the Zen of Python & pandas Features for Finance
Podcast episode
Exploring the Zen of Python & pandas Features for Finance
byThe Real Python Podcast
0 ratings
0% found this document useful
78: Mindset of a Rockstar Data Analyst w/ Trevor Tapscott: Our focus for this inspiring episode of AOF is mindset, especially if you want to be a standout data analyst! I have brought one of my first ever followers and day ones! Trevor Tapscott is a VP and Analytics Consultant at Wells Fargo and has been in...
Podcast episode
78: Mindset of a Rockstar Data Analyst w/ Trevor Tapscott: Our focus for this inspiring episode of AOF is mindset, especially if you want to be a standout data analyst! I have brought one of my first ever followers and day ones! Trevor Tapscott is a VP and Analytics Consultant at Wells Fargo and has been in...
byAnalytics on Fire
0 ratings
0% found this document useful
#10 Data Science, the Environment and MOOCs: Air pollution, the environment and data science: where do these intersect? Find out in this episode of DataFramed, in which Hugo speaks with Roger Peng, Professor in the Department of Biostatistics at the Johns Hopkins Bloomberg School of Public Health...
Podcast episode
#10 Data Science, the Environment and MOOCs: Air pollution, the environment and data science: where do these intersect? Find out in this episode of DataFramed, in which Hugo speaks with Roger Peng, Professor in the Department of Biostatistics at the Johns Hopkins Bloomberg School of Public Health...
byDataFramed
0 ratings
0% found this document useful
#70 Beyond the Language Wars: R & Python for the Modern Data Scientist
Podcast episode
#70 Beyond the Language Wars: R & Python for the Modern Data Scientist
byDataFramed
0 ratings
0% found this document useful
Composable Data Analytics
Podcast episode
Composable Data Analytics
byThe Cloudcast
0 ratings
0% found this document useful
Ep. 145 - Laura Anne Edwards, DATA OASIS founder, NASA Datanaut, TED Resident & SheCanHackIT on Sustainable Innovation and Big Data: Laura Anne Edwards is founder of DATA OASIS and serves as a NASA Datanaut, TED Resident and with SheCanHackIT. Brian Ardinger, Inside Outside Innovation founder, talks with Laura Anne about sustainable innovation and big data. Important Take Aways: Su
Podcast episode
Ep. 145 - Laura Anne Edwards, DATA OASIS founder, NASA Datanaut, TED Resident & SheCanHackIT on Sustainable Innovation and Big Data: Laura Anne Edwards is founder of DATA OASIS and serves as a NASA Datanaut, TED Resident and with SheCanHackIT. Brian Ardinger, Inside Outside Innovation founder, talks with Laura Anne about sustainable innovation and big data. Important Take Aways: Su
byInside Outside Innovation
0 ratings
0% found this document useful
Agile Applied AI Research with Parvez Ahammad - #492: Today we’re joined by Parvez Ahammad, head of data science applied research at LinkedIn. In our conversation, Parvez shares his interesting take on organizing principles for his organization, starting with how data science teams are broadly...
Podcast episode
Agile Applied AI Research with Parvez Ahammad - #492: Today we’re joined by Parvez Ahammad, head of data science applied research at LinkedIn. In our conversation, Parvez shares his interesting take on organizing principles for his organization, starting with how data science teams are broadly...
byThe TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
0 ratings
0% found this document useful
Four Most Commonly Asked Questions About AI with Dr. Jerry Smith: Dr. Jerry Smith welcomes you to another episode of AI Live and Unbiased to explore the breadth and depth of Artificial Intelligence and to encourage you to change the world, not just observe it! Dr. Jerry is talking today about questions and...
Podcast episode
Four Most Commonly Asked Questions About AI with Dr. Jerry Smith: Dr. Jerry Smith welcomes you to another episode of AI Live and Unbiased to explore the breadth and depth of Artificial Intelligence and to encourage you to change the world, not just observe it! Dr. Jerry is talking today about questions and...
byAI Live & Unbiased
0 ratings
0% found this document useful
Building A Data Mesh Platform At PayPal: There has been a lot of discussion about the practical application of data mesh and how to implement it in an organization. Jean-Georges Perrin was tasked with designing a new data platform implementation at PayPal and wound up building a data mesh. In this episode he shares that journey and the combination of technical and organizational challenges that he encountered in the process.
Podcast episode
Building A Data Mesh Platform At PayPal: There has been a lot of discussion about the practical application of data mesh and how to implement it in an organization. Jean-Georges Perrin was tasked with designing a new data platform implementation at PayPal and wound up building a data mesh. In this episode he shares that journey and the combination of technical and organizational challenges that he encountered in the process.
byData Engineering Podcast
0 ratings
0% found this document useful
How ChatGPT Can Supercharge Your L&D With Ross Stevenson
Podcast episode
How ChatGPT Can Supercharge Your L&D With Ross Stevenson
byThe Learning & Development Podcast
0 ratings
0% found this document useful
Analytics for a Better World - Parvathy Krishnan
Podcast episode
Analytics for a Better World - Parvathy Krishnan
byDataTalks.Club
0 ratings
0% found this document useful
Product Owners in Data Science - Anna Hannemann
Podcast episode
Product Owners in Data Science - Anna Hannemann
byDataTalks.Club
0 ratings
0% found this document useful
1425: Exasol - How Data Visualization Is Highlighting Gender Inequality: Exasol, Technology Evangelist, Eva Murray
Podcast episode
1425: Exasol - How Data Visualization Is Highlighting Gender Inequality: Exasol, Technology Evangelist, Eva Murray
byThe Tech Talks Daily Podcast
0 ratings
0% found this document useful
Quantifying The Return On Investment For Your Data Team: As businesses increasingly invest in technology and talent focused on data engineering and analytics, they want to know whether they are benefiting. So how do you calculate the return on investment for data? In this episode Barr Moses and Anna Filippova explore that question and provide useful exercises to start answering that in your company.
Podcast episode
Quantifying The Return On Investment For Your Data Team: As businesses increasingly invest in technology and talent focused on data engineering and analytics, they want to know whether they are benefiting. So how do you calculate the return on investment for data? In this episode Barr Moses and Anna Filippova explore that question and provide useful exercises to start answering that in your company.
byData Engineering Podcast
0 ratings
0% found this document useful
Let The Whole Team Participate In Data With The Quilt Versioned Data Hub: Data is a team sport, but it's often difficult for everyone on the team to participate. For a long time the mantra of data tools has been "by developers, for developers", which automatically excludes a large portion of the business members who play a crucial role in the success of any data project. Quilt Data was created as an answer to make it easier for everyone to contribute to the data being used by an organization and collaborate on its application. In this episode Aneesh Karve shares the journey that Quilt has taken to provide an approachable interface for working with versioned data in S3 that empowers everyone to collaborate.
Podcast episode
Let The Whole Team Participate In Data With The Quilt Versioned Data Hub: Data is a team sport, but it's often difficult for everyone on the team to participate. For a long time the mantra of data tools has been "by developers, for developers", which automatically excludes a large portion of the business members who play a crucial role in the success of any data project. Quilt Data was created as an answer to make it easier for everyone to contribute to the data being used by an organization and collaborate on its application. In this episode Aneesh Karve shares the journey that Quilt has taken to provide an approachable interface for working with versioned data in S3 that empowers everyone to collaborate.
byData Engineering Podcast
0 ratings
0% found this document useful
Exploring The Nuances Of Building An Intential Data Culture: The ecosystem for data professionals has matured to the point that there are a large and growing number of distinct roles. With the scope and importance of data steadily increasing it is important for organizations to ensure that everyone is aligned and operating in a positive environment. To help facilitate the nascent conversation about what constitutes an effective and productive data culture, the team at Data Council have dedicated an entire conference track to the subject. In this episode Pete Soderling and Maggie Hays join the show to explore this topic and their experience preparing for the upcoming conference.
Podcast episode
Exploring The Nuances Of Building An Intential Data Culture: The ecosystem for data professionals has matured to the point that there are a large and growing number of distinct roles. With the scope and importance of data steadily increasing it is important for organizations to ensure that everyone is aligned and operating in a positive environment. To help facilitate the nascent conversation about what constitutes an effective and productive data culture, the team at Data Council have dedicated an entire conference track to the subject. In this episode Pete Soderling and Maggie Hays join the show to explore this topic and their experience preparing for the upcoming conference.
byData Engineering Podcast
0 ratings
0% found this document useful
Pushing The Limits Of Scalability And User Experience For Data Processing WIth Jignesh Patel: Data processing technologies have dramatically improved in their sophistication and raw throughput. Unfortunately, the volumes of data that are being generated continue to double, requiring further advancements in the platform capabilities to keep up. As the sophistication increases, so does the complexity, leading to challenges for user experience. Jignesh Patel has been researching these areas for several years in his work as a professor at Carnegie Mellon University. In this episode he illuminates the landscape of problems that we are faced with and how his research is aimed at helping to solve these problems.
Podcast episode
Pushing The Limits Of Scalability And User Experience For Data Processing WIth Jignesh Patel: Data processing technologies have dramatically improved in their sophistication and raw throughput. Unfortunately, the volumes of data that are being generated continue to double, requiring further advancements in the platform capabilities to keep up. As the sophistication increases, so does the complexity, leading to challenges for user experience. Jignesh Patel has been researching these areas for several years in his work as a professor at Carnegie Mellon University. In this episode he illuminates the landscape of problems that we are faced with and how his research is aimed at helping to solve these problems.
byData Engineering Podcast
0 ratings
0% found this document useful
Harnessing Generative AI For Creating Educational Content With Illumidesk: Generative AI has unlocked a massive opportunity for content creation. There is also an unfulfilled need for experts to be able to share their knowledge and build communities. Illumidesk was built to take advantage of this intersection. In this episode Greg Werner explains how they are using generative AI as an assistive tool for creating educational material, as well as building a data driven experience for learners.
Podcast episode
Harnessing Generative AI For Creating Educational Content With Illumidesk: Generative AI has unlocked a massive opportunity for content creation. There is also an unfulfilled need for experts to be able to share their knowledge and build communities. Illumidesk was built to take advantage of this intersection. In this episode Greg Werner explains how they are using generative AI as an assistive tool for creating educational material, as well as building a data driven experience for learners.
byData Engineering Podcast
0 ratings
0% found this document useful

Skip carousel

Scikit-Learn: The Ultimate Python Library
APC
Article
Scikit-Learn: The Ultimate Python Library
Jul 15, 2019
4 min read
2 The Use of Python in AI and ML
Techfastly
Article
2 The Use of Python in AI and ML
Nov 30, 2020
3 min read
Manipulate Data Like A Pro With Pandas
Linux Format
Article
Manipulate Data Like A Pro With Pandas
Jul 27, 2021
7 min read
How Image Recognition Works
APC
Article
How Image Recognition Works
Nov 4, 2019
4 min read
Want A Job In Data Science? You Might Have To Take A Standardized Test When Applying
Chicago Tribune
Article
Want A Job In Data Science? You Might Have To Take A Standardized Test When Applying
Jul 10, 2018
3 min read
DJANGO Create A Database-driven Website
Linux Format
Article
DJANGO Create A Database-driven Website
Jun 4, 2019
The Django web framework was named after the famous guitarist Django Reinhardt and was first created by web developers at a small newspaper in Kansas. The main goals of Django is to enable fast development of complex websites with database needs. It
7 min read
Why We Need To Fear The Risk Of AI Model Collapse
Evening Standard
Article
Why We Need To Fear The Risk Of AI Model Collapse
Dec 17, 2023
4 min read
How And Where You Use Machine-learning
APC
Article
How And Where You Use Machine-learning
Oct 7, 2019
4 min read
Q&A: OPENAI CTO MIRA MURATI ON SHEPHERDING CHATGPT
AppleMagazine
Article
Q&A: OPENAI CTO MIRA MURATI ON SHEPHERDING CHATGPT
Apr 28, 2023
4 min read
Q&A: OPENAI CTO MIRA MURATI ON SHEPHERDING CHATGPT
TechLife News
Article
Q&A: OPENAI CTO MIRA MURATI ON SHEPHERDING CHATGPT
Apr 29, 2023
4 min read
Questions for Angela Zutavern, Machine Intelligence Expert, Booz Allen Hamilton
Rotman Management
Article
Questions for Angela Zutavern, Machine Intelligence Expert, Booz Allen Hamilton
Jan 1, 2018
You believe that the world of leadership has hit an inflection point. How so? As useful as popular mental models and heuristics are, machine models now outstrip human performance in about half of the portfolio of cognitive tasks. Going forward, we wi
6 min read
Google Answer Box Strategy
Techfastly
Article
Google Answer Box Strategy
Sep 21, 2020
Leveraging the Google PAA (People Also Ask) element on a Search Results Page for Targeted Content Creation with a Python Scraper All businesses that are online today are creating content at a furious pace. According to Technavio, a research firm, con
7 min read
Inform And Enhance Your Business With Open Data
PC Pro Magazine
Article
Inform And Enhance Your Business With Open Data
Jun 10, 2021
7 min read
Things Get Strange When AI Starts Training Itself
The Atlantic
Article
Things Get Strange When AI Starts Training Itself
Feb 16, 2024
7 min read
Q&A
Rotman Management
Article
Q&A
May 1, 2023
Describe the capability that companies like Netflix, UPS, Amazon and Caesars Entertainment have in common. These are all leading firms in their industries with respect to leveraging analytics as a source of competitive advantage. We now have so much
7 min read
Getting The edge
The European Business Review
Article
Getting The edge
Feb 25, 2021
7 min read
How To Make Sense From And With AI ?
The European Business Review
Article
How To Make Sense From And With AI ?
Sep 25, 2021
4 min read
Family History In The AI Era
Family Tree UK
Article
Family History In The AI Era
Apr 12, 2024
7 min read
01 Ready Or Not, AI Is Here To Assist You
HWM Singapore
Article
01 Ready Or Not, AI Is Here To Assist You
Jul 11, 2023
4 min read
The Infrastructure of an AI Factory
Techfastly
Article
The Infrastructure of an AI Factory
Mar 3, 2021
Data is a crucial element for machine learning algorithms. It can be considered as a fuel of AI factories. Collection of useful data and feeding it into frameworks and models is the foremost step. Data acts as a case or example that the algorithms re
1 min read
The Big Tech Boost
Business Today
Article
The Big Tech Boost
Jan 5, 2024
5 min read
The Deep Learning Revolution For Artificial Intelligence
Facility Management
Article
The Deep Learning Revolution For Artificial Intelligence
Mar 28, 2019
3 min read
The Art Of AI Maturity: Five Success Factors
Rotman Management
Article
The Art Of AI Maturity: Five Success Factors
Jan 1, 2023
TODAY, MUCH OF WHAT WE TAKE FOR GRANTED in our daily lives stems from machine learning. Every time you use a wayfinding app to get from point A to point B, use dictation to convert speech to text, or unlock your phone using face ID, you’re relying on
10 min read
Leadership Forum: Investing in Disruption
Rotman Management
Article
Leadership Forum: Investing in Disruption
Jan 1, 2019
10 min read
Machine Learning – With Zero Programming
APC
Article
Machine Learning – With Zero Programming
Aug 12, 2019
6 min read
CULTURE SHIFT – An Indispensable Shift To Building An AI-Powered Organisation
Techfastly
Article
CULTURE SHIFT – An Indispensable Shift To Building An AI-Powered Organisation
May 3, 2021
5 min read
Finding Your Data
APC
Article
Finding Your Data
Sep 9, 2019
4 min read
Why Your Organisation Needs To Lift Its Data Game
NZBusiness and Management
Article
Why Your Organisation Needs To Lift Its Data Game
Oct 22, 2019
From problems stemming from the recent New Zealand census to data collected by Facebook, data has been in the news a lot lately. It may seem obvious that large organisations such as Statistics New Zealand and Facebook need to continually improve thei
3 min read
Generative AI: What Leaders Need To Know
Rotman Management
Article
Generative AI: What Leaders Need To Know
Jan 1, 2024
12 min read
Adoption of Cognitive Computing Across Various Industries
Techfastly
Article
Adoption of Cognitive Computing Across Various Industries
Dec 1, 2021
5 min read

Related categories

Skip carousel

Reviews for Mastering Python for Data Science

Rating: 3 out of 5 stars

3/5

1 rating0 reviews

Book preview

Mastering Python for Data Science - Samir Madhavan

Mastering Python for Data Science

Credits

About the Author

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers, and more

Why subscribe?

Free access for Packt account holders

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Downloading the color images of this book

Errata

Piracy

Questions

1. Getting Started with Raw Data

The world of arrays with NumPy

Creating an array

Mathematical operations

Array subtraction

Squaring an array

A trigonometric function performed on the array

Conditional operations

Matrix multiplication

Indexing and slicing

Shape manipulation

Empowering data analysis with pandas

The data structure of pandas

Series

DataFrame

Panel

Inserting and exporting data

CSV

XLS

JSON

Database

Data cleansing

Checking the missing data

Filling the missing data

String operations

Merging data

Data operations

Aggregation operations

Joins

The inner join

The left outer join

The full outer join

The groupby function

Summary

2. Inferential Statistics

Various forms of distribution

A normal distribution

A normal distribution from a binomial distribution

A Poisson distribution

A Bernoulli distribution

A z-score

A p-value

One-tailed and two-tailed tests

Type 1 and Type 2 errors

A confidence interval

Correlation

Z-test vs T-test

The F distribution

The chi-square distribution

Chi-square for the goodness of fit

The chi-square test of independence

ANOVA

Summary

3. Finding a Needle in a Haystack

What is data mining?

Presenting an analysis

Studying the Titanic

Which passenger class has the maximum number of survivors?

What is the distribution of survivors based on gender among the various classes?

What is the distribution of nonsurvivors among the various classes who have family aboard the ship?

What was the survival percentage among different age groups?

Summary

4. Making Sense of Data through Advanced Visualization

Controlling the line properties of a chart

Using keyword arguments

Using the setter methods

Using the setp() command

Creating multiple plots

Playing with text

Styling your plots

Box plots

Heatmaps

Scatter plots with histograms

A scatter plot matrix

Area plots

Bubble charts

Hexagon bin plots

Trellis plots

A 3D plot of a surface

Summary

5. Uncovering Machine Learning

Different types of machine learning

Supervised learning

Unsupervised learning

Reinforcement learning

Decision trees

Linear regression

Logistic regression

The naive Bayes classifier

The k-means clustering

Hierarchical clustering

Summary

6. Performing Predictions with a Linear Regression

Simple linear regression

Multiple regression

Training and testing a model

Summary

7. Estimating the Likelihood of Events

Logistic regression

Data preparation

Creating training and testing sets

Building a model

Model evaluation

Evaluating a model based on test data

Model building and evaluation with SciKit

Summary

8. Generating Recommendations with Collaborative Filtering

Recommendation data

User-based collaborative filtering

Finding similar users

The Euclidean distance score

The Pearson correlation score

Ranking the users

Recommending items

Item-based collaborative filtering

Summary

9. Pushing Boundaries with Ensemble Models

The census income dataset

Exploring the census data

Hypothesis 1: People who are older earn more

Hypothesis 2: Income bias based on working class

Hypothesis 3: People with more education earn more

Hypothesis 4: Married people tend to earn more

Hypothesis 5: There is a bias in income based on race

Hypothesis 6: There is a bias in the income based on occupation

Hypothesis 7: Men earn more

Hypothesis 8: People who clock in more hours earn more

Hypothesis 9: There is a bias in income based on the country of origin

Decision trees

Random forests

Summary

10. Applying Segmentation with k-means Clustering

The k-means algorithm and its working

A simple example

The k-means clustering with countries

Determining the number of clusters

Clustering the countries

Summary

11. Analyzing Unstructured Data with Text Mining

Preprocessing data

Creating a wordcloud

Word and sentence tokenization

Parts of speech tagging

Stemming and lemmatization

Stemming

Lemmatization

The Stanford Named Entity Recognizer

Performing sentiment analysis on world leaders using Twitter

Summary

12. Leveraging Python in the World of Big Data

What is Hadoop?

The programming model

The MapReduce architecture

The Hadoop DFS

Hadoop's DFS architecture

Python MapReduce

The basic word count

A sentiment score for each review

The overall sentiment score

Deploying the MapReduce code on Hadoop

File handling with Hadoopy

Pig

Python with Apache Spark

Scoring the sentiment

The overall sentiment

Summary

Index

Mastering Python for Data Science

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: August 2015

Production reference: 1260815

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-78439-015-0

www.packtpub.com

Credits

Author

Samir Madhavan

Reviewers

Sébastien Celles

Robert Dempsey

Maurice HT Ling

Ratanlal Mahanta

Yingssu Tsai

Commissioning Editor

Pramila Balan

Acquisition Editor

Sonali Vernekar

Content Development Editor

Arun Nadar

Technical Editor

Chinmay S. Puranik

Copy Editor

Sonia Michelle Cheema

Project Coordinator

Neha Bhatnagar

Proofreader

Safis Editing

Indexer

Monica Ajmera Mehta

Graphics

Disha Haria

Jason Monteiro

Production Coordinator

Arvindkumar Gupta

Cover Work

Arvindkumar Gupta

About the Author

Samir Madhavan has been working in the field of data science since 2010. He is an industry expert on machine learning and big data. He has also reviewed R Machine Learning Essentials by Packt Publishing. He was part of the ubiquitous Aadhar project of the Unique Identification Authority of India, which is in the process of helping every Indian get a unique number that is similar to a social security number in the United States. He was also the first employee of Flutura Decision Sciences and Analytics and is a part of the core team that has helped scale the number of employees in the company to 50. His company is now recognized as one of the most promising Internet of Things—Decision Sciences companies in the world.

I would like to thank my mom, Rajasree Madhavan, and dad, P Madhavan, for all their support. I would also like to thank Srikanth Muralidhara, Krishnan Raman, and Derick Jose, who gave me the opportunity to start my career in the world of data science.

About the Reviewers

Sébastien Celles is a professor of applied physics at Universite de Poitiers (working in the thermal science department). He has used Python for numerical simulations, data plotting, data predictions, and various other tasks since the early 2000s. He is a member of PyData and was granted commit rights to the pandas DataReader project. He is also involved in several open source projects in the scientific Python ecosystem.

Sebastien is also the author of some Python packages available on PyPi, which are as follows:

openweathermap_requests: This is a package used to fetch data from OpenWeatherMap.org using Requests and Requests-cache and to get pandas DataFrame with weather history

pandas_degreedays: This is a package used to calculate degree days (a measure of heating or cooling) from the pandas time series of temperature

pandas_confusion: This is a package used to manage confusion matrices, plot and binarize them, and calculate overall and class statistics

There are some other packages authored by him, such as pyade, pandas_datareaders_unofficial, and more

He also has a personal interest in data mining, machine learning techniques, forecasting, and so on. You can find more information about him at http://www.celles.net/wiki/Contact or https://www.linkedin.com/in/sebastiencelles.

Robert Dempsey is a leader and technology professional, specializing in delivering solutions and products to solve tough business challenges. His experience of forming and leading agile teams combined with more than 15 years of technology experience enables him to solve complex problems while always keeping the bottom line in mind.

Robert founded and built three start-ups in the tech and marketing fields, developed and sold two online applications, consulted for Fortune 500 and Inc. 500 companies, and has spoken nationally and internationally on software development and agile project management.

He's the founder of Data Wranglers DC, a group dedicated to improving the craft of data wrangling, as well as a board member of Data Community DC. He is currently the team leader of data operations at ARPC, an econometrics firm based in Washington, DC.

In addition to spending time with his growing family, Robert geeks out on Raspberry Pi's, Arduinos, and automating more of his life through hardware and software.

Maurice HT Ling has been programming in Python since 2003. Having completed his PhD in bioinformatics and BSc (Hons) in molecular and cell biology from The University of Melbourne, he is currently a research fellow at Nanyang Technological University, Singapore. He is also an honorary fellow of The University of Melbourne, Australia. Maurice is the chief editor of Computational and Mathematical Biology and coeditor of The Python Papers. Recently, he cofounded the first synthetic biology start-up in Singapore, called AdvanceSyn Pte. Ltd., as the director and chief technology officer. His research interests lie in life itself, such as biological life and artificial life, and artificial intelligence, which use computer science and statistics as tools to understand life and its numerous aspects. In his free time, Maurice likes to read, enjoy a cup of coffee, write his personal journal, or philosophize on various aspects of life. His website and LinkedIn profile are http://maurice.vodien.com and http://www.linkedin.com/in/mauriceling, respectively.

Ratanlal Mahanta is a senior quantitative analyst. He holds an MSc degree in computational finance and is currently working at GPSK Investment Group as a senior quantitative analyst. He has 4 years of experience in quantitative trading and strategy development for sell-side and risk consultation firms. He is an expert in high frequency and algorithmic trading.

He has expertise in the following areas:

Quantitative trading: This includes FX, equities, futures, options, and engineering on derivatives

Algorithms: This includes Partial Differential Equations, Stochastic Differential Equations, Finite Difference Method, Monte-Carlo, and Machine Learning

Code: This includes R Programming, C++, Python, MATLAB, HPC, and scientific computing

Data analysis: This includes big data analytics (EOD to TBT), Bloomberg, Quandl, and Quantopian

Strategies: This includes Vol Arbitrage, Vanilla and Exotic Options Modeling, trend following, Mean reversion, Co-integration, Monte-Carlo Simulations, ValueatRisk, Stress Testing, Buy side trading strategies with high Sharpe ratio, Credit Risk Modeling, and Credit Rating

He has already reviewed Mastering Scientific Computing with R, Mastering R for Quantitative Finance, and Machine Learning with R Cookbook, all by Packt Publishing.

You can find out more about him at https://twitter.com/mahantaratan.

Yingssu Tsai is a data scientist. She holds degrees from the University of California, Berkeley, and the University of California, Los Angeles.

www.PacktPub.com

Support files, eBooks, discount offers, and more

For support files and downloads related to your book, please visit www.PacktPub.com.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

https://www2.packtpub.com/books/subscription/packtlib

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.

Why subscribe?

Fully searchable across every book published by Packt

Copy and paste, print, and bookmark content

On demand and accessible via a web browser

Free access for Packt account holders

If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.

Preface

Data science is an exciting new field that is used by various organizations to perform data-driven decisions. It is a combination of technical knowledge, mathematics, and business. Data scientists have to wear various hats to work with data and derive some value out of it. Python is one of the most popular languages among all the languages used by data scientists. It is a simple language to learn and is used for purposes, such as web development, scripting, and application development to name a few.

The ability to perform data science using Python is very powerful as it helps clean data at a raw level to create advanced machine learning algorithms that predict customer churns for a retail company. This book explains various concepts of data science in a structured manner with the application of these concepts on data to see how to interpret results. The book provides a good base for understanding the advanced topics of data science and how to apply them in a real-world scenario.

What this book covers

Chapter 1, Getting Started with Raw Data, teaches you the techniques of handling unorganized data. You'll also learn how to extract data from different sources, as well as how to clean and manipulate it.

Chapter 2, Inferential Statistics, goes beyond descriptive statistics, where you'll learn about inferential statistics concepts, such as distributions, different statistical tests, the errors in statistical tests, and confidence intervals.

Chapter 3, Finding a Needle in a Haystack, explains what data mining is and how it can be utilized. There is a lot of information in data but finding meaningful information is an art.

Chapter 4, Making Sense of Data through Advanced Visualization, teaches you how to create different visualizations of data. Visualization is an integral part of data science; it helps communicate a pattern or relationship that cannot be seen by looking at raw data.

Chapter 5, Uncovering Machine Learning, introduces you to the different techniques of machine learning and how to apply them. Machine learning is the new buzzword in the industry. It's used in activities, such as Google's driverless cars and predicting the effectiveness of marketing campaigns.

Chapter 6, Performing Predictions with a Linear Regression, helps you build a simple regression model followed by multiple regression models along with methods to test the effectiveness of the models. Linear regression is

Enjoying the preview?

Page 1 of 1

Mastering Python for Data Science

About this ebook

Samir Madhavan

Related authors

Related to Mastering Python for Data Science

Related ebooks

Data Visualization For You

Related podcast episodes

Related articles

Related categories

Reviews for Mastering Python for Data Science

What did you think?

Book preview

Mastering Python for Data Science - Samir Madhavan

Table of Contents

Mastering Python for Data Science

Mastering Python for Data Science

Credits

About the Author

About the Reviewers

Support files, eBooks, discount offers, and more

Why subscribe?

Preface

What this book covers