Preparing Data for Analysis with JMP
Ebook, 439 pages, 3 hours


About this ebook

Access and clean up data easily using JMP®!

Data acquisition and preparation commonly consume approximately 75% of the effort and time of total data analysis. JMP provides many visual, intuitive, and even innovative data-preparation capabilities that enable you to make the most of your organization's data.

Preparing Data for Analysis with JMP® is organized within a framework of statistical investigation and model building, and it illustrates the new data-handling features in JMP, such as the Query Builder. Useful to students and programmers with little or no JMP experience, as well as to those looking to learn the newer data-management features and techniques, it takes a practical approach to getting started, with plenty of examples. Using step-by-step demonstrations and screenshots, this book walks you through the most commonly used data-management techniques and includes plenty of tips on how to avoid common problems.

With this book, you will learn how to:

  • Manage database operations using the JMP Query Builder
  • Get data into JMP from other formats, such as Excel, CSV, SAS, HTML, JSON, and the web
  • Identify and avoid problems with the help of JMP’s visual and automated data-exploration tools
  • Consolidate data from multiple sources with Query Builder for tables
  • Deal with common issues and repairs that include the following tasks:
    • reshaping tables (stack/unstack)
    • managing missing data with techniques such as imputation and Principal Components Analysis
    • cleaning and correcting dirty data
    • computing new variables
    • transforming variables for modeling
    • reconciling time and date
  • Subset and filter your data
  • Save data tables for exchange with other platforms
Language: English
Publisher: SAS Institute
Release date: May 1, 2017
ISBN: 9781635261486
Author

Robert Carver


    Book preview

    Preparing Data for Analysis with JMP - Robert Carver

    Chapter 1: Data Management in the Analytics Process

    Introduction

    A Continuous Process

    Asking Questions that Data Can Help to Answer

    Sourcing Relevant Data

    Reproducibility

    Combining and Reconciling Multiple Sources

    Identifying and Addressing Data Issues

    Data Requirements Shaped by Modeling Strategies

    Plan of the Book

    Conclusion

    References

    Introduction

    Although reliable estimates are difficult to come by, there seems to be consensus that data preparation (locating, assembling, reconciling, merging, cleaning, and so on) consumes something like 80% of the time required for a statistical project (Press 2016). In comparison to the literature on building statistical models and performing analysis, relatively few books have been written on the topic of data preparation. (Some good examples include McCallum 2013, Osborne 2013, and Svolba 2006.) This book addresses just that: the unglamorous, time-consuming, laborious, and sometimes dreaded dirty work of statistical investigations. This work is variously known as data wrangling, data cleaning, janitorial work, or simply data management. Equally important, this book is about data management using JMP, so that JMP users can do nearly all of the wrangling tasks in one software environment, without having to hop off and onto different platforms.

    This first chapter places data management and preparation in the context of a larger investigative process. In addition, it introduces some over-arching themes and assumptions that run through the eleven chapters to follow. As a starting point, let’s understand that the work of data management occurs within a process and often within an organizational context.

    A Continuous Process

    Successful analyses require a disciplined process similar to the one shown here in Figure 1.1 (SAS 2016). In such a process, a question gives rise to fact-gathering, analysis, and implementation of a solution. The solution, in turn, is monitored and gives rise to further questions. Though the process is often portrayed as circular, this particular depiction has the added appeal of connoting an infinite loop with data playing a central role. However, as we'll see in the pages that follow, the step identified as Prepare is not drawn remotely to scale. That's the step that reportedly can require 80% of the time in a project, and it will consume an even larger proportion of the pages in this book.

    Figure 1.1: One Model of the Analytics Life Cycle (SAS)

    For the most part, this book breaks down that single step into component parts, explaining obstacles that arise and offering techniques to address them. JMP is particularly well suited to expediting the smaller steps that make up the work of preparation. In addition, we’ll step into the exploratory stage in later chapters, cycling back as exploration reveals the need for additional pre-processing and preparation. In any event, though, the process properly starts with questions that can be addressed with data.

    Asking Questions That Data Can Help to Answer

    One underlying assumption in this book is that we undertake statistical investigations because someone has questions (a) that might be answered via empirical investigation and (b) whose answer potentially has measurable value to an organization, society, or an individual. In other words, we’ll assume that the questions matter to someone. That is, there are benefits to be reaped from finding answers. This also implies that those asking the questions are likely to be attuned to the costs of securing the data and performing relevant analysis.

    If you are reading this book, you probably have questions that are important to your work or areas of interest. Which environmental hazards have the greatest impact on respiratory health? Which customers will most likely prefer product A over product B? Do tax cuts encourage employers to create new jobs?

    Although the questions driving a study often arise within an organization, the data most germane to the question might reside inside or outside the organization. In this discussion, let's refer to the organization asking the questions as the client. The client might have some of the needed data within its own files and databases (you hope the data are in digital form). Other data might be freely available in the public domain, while still other data might be available for a price. Yet other data could be proprietary, belonging to competitors, and some information might not yet have been gathered or curated by anyone.

    Early in the process, the analyst or analysis team needs to determine the specific data that will serve to address the questions posed or build the models that will have value. This requires domain knowledge, an understanding of what is organizationally feasible, and an awareness of what data are available. Perhaps hardest of all, it can require knowing what the team does not know. For some problems, the point of the analysis is feature selection: identifying the variables that matter most, those that explain or predict a dependent variable or outcome. These problems cannot be resolved through software, but the search for relevant data certainly can be either facilitated or impeded by the software.

    Sourcing Relevant Data

    Once the analysts have initial ideas about the types of data to obtain, the challenges of locating pertinent data can be considerable. As noted, the organization’s own data stores might not have exactly what is needed. Sometimes, organizational dynamics or legal or ethical considerations can impede access.

    If we’re lucky, external data reside in the public domain and are easily accessible. In other cases, the data are proprietary and belong to a competitor or to an entity that will make them available at a price. Worst of all, sometimes the data simply do not exist in any accessible form, so the analytics team will either need to gather data or use proxy variables, near enough, if you will, to represent the constructs of interest in a study.

    This book has little to say about the many hurdles of data procurement. Chapter 3 discusses some of them, but in general this is an area where domain knowledge is key. The analytics team and the client organization need to know where to look for relevant data.

    Reproducibility

    As you begin to extract data, it is quite important to document the process in detail for the sake of reproducibility. Some projects are one-time, unique tasks, but for others there is value in being able to reproduce all of the steps taken. Where there are considerations of intellectual property rights, this is critical. If you simply want the ability to audit the task for completeness and critical evaluation, a full record and audit trail is indispensable.

    The individual most likely to want to reproduce the process in the future is you. Whether or not others will one day retrace your footsteps, it is altogether likely that you or your team will go back to fetch additional variables or to build another project on the foundation of this one. So, as the saying goes, be kind to your future self and document (Gandrud 2015, p. 7).

    As we’ll see throughout the book, documentation is quite straightforward within a JMP session. What’s more, we can document selectively, preserving the results and scripts that we ultimately judge worth saving, without having to enshrine every error and false start.

    Combining and Reconciling Multiple Sources

    Many data modeling projects call for data from multiple sources. In addition to locating all of the variables that you might use, analysts often confront the need to reconcile disparities across data sources. Nonstandard abbreviations or coding schemes, different representations of times and dates, and varying units of measurement abound. Before all of the data can be assembled into a single, well-organized table suitable for analysis, the differences need to be ironed out.
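JMP's Query Builder and table-join tools handle this reconciliation interactively. Purely as an illustration of the idea, here is a minimal Python sketch in which the two sources, their field names, and the unit conversion are all invented for the example:

```python
from datetime import datetime

# Two hypothetical extracts describing similar records, but with
# different date formats and different units of measurement.
source_us = [{"id": "A1", "shipped": "03/21/2016", "weight_lb": 11.0}]
source_eu = [{"id": "B2", "shipped": "2016-03-22", "weight_kg": 5.0}]

def normalize_us(rec):
    """Reconcile to ISO-8601 dates and kilograms before combining."""
    return {
        "id": rec["id"],
        "shipped": datetime.strptime(rec["shipped"], "%m/%d/%Y").date().isoformat(),
        "weight_kg": round(rec["weight_lb"] * 0.45359237, 3),
    }

# Only after the conventions agree can the rows share one table.
combined = [normalize_us(r) for r in source_us] + source_eu
```

The point is not the code but the discipline: every disparity in format, unit, or coding scheme must be resolved to a single convention before the sources are merged.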

    Because the irregularities can present themselves at different stages of the preparation phase, they are discussed in several chapters. Chapters 5 through 8 cover many aspects of combining and reconciling data.

    Identifying and Addressing Data Issues

    Once data sources have been identified and targeted, you must consider issues of data integrity. It is the rare data source that supplies complete, accurate, timely raw tabulated data. For many analytical purposes, we’ll want (or require) data tables that are in third normal form (variables in columns and observations in rows), but some data sources won’t be organized that way. Tables will have missing cells, sparse arrays (mostly zeros), or erroneous data values. There will be outliers, skewed distributions, non-linear relationships, and so on.

    The later chapters devote considerable attention to (a) detecting such data problems and (b) addressing them. Here again, JMP has extensive functionality to ease the way forward. Note also that our goal is to address the issues; we cannot always resolve or eliminate them. There are generally ways to mitigate data issues when they cannot be directly cured. These are among the matters taken up in Chapters 9 through 11.
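JMP offers visual and automated tools (distributions, outlier exploration, and the like) for this detection work. To make two of the screens mentioned above concrete, here is a small Python sketch using an invented column of values:

```python
import statistics

# A hypothetical column: one missing value (None) and one gross outlier.
ages = [34, 29, None, 41, 38, 350, 27]

present = [v for v in ages if v is not None]
missing_count = len(ages) - len(present)

# Robust outlier screen: flag values more than 3 median absolute
# deviations (MADs) from the median.
med = statistics.median(present)
mad = statistics.median(abs(v - med) for v in present)
outliers = [v for v in present if abs(v - med) > 3 * mad]
```

Detection is only half the job; whether to correct, impute, or exclude a flagged value is the judgment call the later chapters take up.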

    Data Requirements Shaped by Modeling Strategies

    The analysis plan for a project influences, or should influence, the data management and preparation activities. Modeling methodologies have their own requirements for the organization of a data table, for units of analysis, and for data types. As a simple and familiar example, models built on paired observations will expect to find data pairs in separate columns.

    Hence, you need to be constantly mindful of the stages to follow when preparing a set of data for analysis. Data preparation happens within the context of the full investigative cycle for a reason, and that reason goes beyond variable selection. Chapter 7 and Chapters 9 through 11 touch on these issues.
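The paired-observations example can be made concrete by sketching the two layouts. The wide form below is what a paired comparison expects; the stacked (long) form is what many grouped summaries and plots prefer. The subjects and values are invented, and JMP's Stack and Split commands perform the same reshaping interactively:

```python
# Wide form: one row per subject, the paired measurements in two columns.
wide = [
    {"subject": "S1", "before": 120, "after": 112},
    {"subject": "S2", "before": 135, "after": 131},
]

# Long (stacked) form: one row per individual measurement.
long_form = [
    {"subject": r["subject"], "phase": phase, "value": r[phase]}
    for r in wide
    for phase in ("before", "after")
]
```

Neither layout is "right" in the abstract; the intended model dictates which one the data table must take.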

    Plan of the Book

    The investigative process is neither linear nor purely sequential, though we depict it as a logical sequence. That notwithstanding, analysts regularly need to loop back to an earlier phase along the way to a successful conclusion.

    In a similar way, books do need to present information in sequence. But readers are free to depart from the plan, to iterate, and to bypass materials that are already familiar. Still, it’s helpful to have a mental map of the plan before deviating from it.

    I’ve developed the book in three main sections. Part I consists of Chapters 1 through 3. This part builds a foundation that explains the investigative cycle, introduces common methods used to organize raw data, and reviews the array of common challenges arising in different data sources.

    Part II gets down to the how of acquiring and structuring a collection of variables for building models. Chapters 4 through 8 present some background concepts and theory, and work through several examples of importing data into JMP from foreign sources and then combining disparate elements into single JMP data tables. Running through Part II is a comprehensive illustrative example about the competitive success of world nations in the summer Olympic Games.

    Lastly, Part III (Chapters 9 through 12) covers approaches to some of the thorniest data preparation problems: how to detect problematic data, how to deal with missing data, and how to transform and otherwise modify variables for analysis. Chapter 12 reverses, in a sense, the work of data acquisition by demonstrating ways to share or export data and results to platforms other than JMP.

    Conclusion

    The main take-away from this chapter is that data management is a large and complex piece of a larger and more complex process. It might not be the most glamorous part of modeling and data analytics, but it is as essential to success as proper soil preparation is to abundant crop yields in agriculture. This chapter has described the process with broad strokes and outlined the tasks that lie ahead.

    Before we begin the search for data to wrangle and manage, it’s useful to understand some things about the underlying structure of data that we might find. The next chapter reviews and explains the most common ways of organizing and representing raw data.

    References

    Asay, Matt. 2016. NoSQL keeps rising, but relational databases still dominate big data. TechRepublic.com, April 5, 2016. Available at http://www.techrepublic.com/article/nosql-keeps-rising-but-relational-databases-still-dominate-big-data/.

    Carver, Robert, Michelle Everson, John Gabrosek, Nicholas Horton, Robin Lock, Megan Mocko, Allan Rossman, Ginger Holmes Rowell, Paul Velleman, Jeffrey Witmer, and Beverly Wood. 2016. Guidelines for Assessment and Instruction in Statistics Education (GAISE) College Report 2016. Alexandria, VA: American Statistical Association.

    Gandrud, Christopher. 2015. Reproducible Research with R and RStudio, 2nd Edition. Boca Raton, FL: CRC Press.

    McCallum, Q. Ethan, ed. 2013. Bad Data Handbook. Sebastopol, CA: O’Reilly Media.

    Osborne, Jason W. 2013. Best Practices in Data Cleaning: A Complete Guide to Everything You Need to Do Before and After Collecting Your Data. Thousand Oaks, CA: Sage.

    Press, Gil. 2016. Cleaning Big Data: Most Time-Consuming, Least Enjoyable Data Science Task, Survey Says. Forbes, March 23, 2016. Available online at http://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-time-consuming-least-enjoyable-data-science-task-survey-says/#4fe4cf1e7f75.

    SAS Institute Inc. 2016. SAS Institute white paper. Managing the Analytical Life Cycle for Decisions at Scale: How to Go From Data to Decisions as Quickly as Possible. Cary, NC: SAS Institute Inc.

    Svolba, Gerhard. 2006. Data Preparation for Analytics Using SAS. Cary, NC: SAS Institute Inc.

    Tintle, Nathan, Beth L. Chance, George W. Cobb, Allan J. Rossman, Soma Roy, and Jill VanderStoep. 2014. Introduction to Statistical Investigations. Hoboken, NJ: Wiley.

    Wild, C. J., and M. Pfannkuch. 1999. Statistical Thinking in Empirical Enquiry. International Statistical Review, 67(3), 223-265.

    Chapter 2: Data Management Foundations

    Introduction

    Matching Form to Function

    JMP Data Tables

    Data Types and Modeling Types

    Data Types

    Modeling Types

    Basics of Relational Databases

    Conclusion

    References

    Introduction

    Data analysis and modeling require raw data in digital form. At some point in the investigative process, a person or an automated sensor captured data from a real-world event or entity and recorded it digitally. People make decisions about what to capture, when to capture it, how much to capture, and how best to organize the data, and these decisions can rest on numerous considerations. This chapter introduces some of the decision alternatives as well as some of the basic building blocks of data representation; we will treat these topics in greater depth in Part II of the book. The fundamental concept here is that the transition from real-world events and entities to digital form requires choices and data management.

    For most analytic purposes, we organize data into a two-dimensional tabular format in which each row is an individual observation or case, and each column contains a measurement or attribute pertaining to that individual. There are times, though, when this is not the only option or even the preferred option. Hence, decisions about the structure and content of a data table are driven by the information needs of a particular investigation as well as the intended analytic purposes.
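As a minimal illustration of that convention, here is a tiny table read in Python, with invented names and values; each row is one observation and each column one attribute:

```python
import csv
import io

# A miniature tabular extract: rows are cases, columns are attributes.
raw = """name,state,party
A. Smith,NC,D
B. Jones,AL,R
"""

rows = list(csv.DictReader(io.StringIO(raw)))

# Because the layout is regular, grouping by any attribute is trivial.
by_party = {}
for r in rows:
    by_party.setdefault(r["party"], []).append(r["name"])
```

Data that arrive in some other shape (nested pages, cross-tabulations, one file per entity) must eventually be coerced into a layout like this before most analyses can proceed.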

    Some investigators are in a position to develop, gather, and structure data from scratch, while others might avail themselves of digital data gathered and stored by others. For some studies, the necessary data might be sitting neatly in a single accessible table, while for other studies the investigator must combine and cull data from multiple sources. The tasks begin with an understanding of your investigative goals.

    Matching Form to Function

    As an organizing example (to be revisited in this and later chapters) we will consider data related to members of the U.S. House of Representatives. As a starting point, visit http://www.house.gov/representatives/ for a list of the 435 individuals currently serving in the House. Figure 2.1 shows a portion of a directory in June 2016 (U.S. House 2016).

    Figure 2.1 Directory of Members of the U.S. House of Representatives

    For each state, the website identifies the representative for each congressional district, listing party affiliation, office location, phone number, and committee assignments. Within the list, the names are hyperlinked to individual web pages for each member, and these in turn contain varying information about pending legislation, voting records, news articles, videos of speeches and appearances, and links for constituent services and social media, among other items. Some of the information might be standard for all members (for example, office locations and contact information), but representatives customize their sites as they see fit.

    The variety and quantity of data behind this directory page are rich, and the table layouts vary as we explore the links. Some of the data is relatively stable over time, and some can change frequently. Some changes occur on a regular schedule, such as those related to election outcomes; other information changes unpredictably, as news events unfold.

    The designers of the directory page in Figure 2.1 made choices about which data elements to display, in what sequence, what to abbreviate and what to spell out. Their purpose was presumably to allow for flexible access to individual members of the House, but a prospective analyst or modeler might want all or part of the related data for another well- or loosely defined purpose. A static or interactive report will have one set of data requirements and challenges, while a statistical modeling project will likely call for different data requirements. Data mining projects or business intelligence applications have yet different goals and therefore different data needs.

    In this chapter we’ll introduce some basic ways of classifying and organizing data. Later chapters revisit these options in greater depth and provide relevant JMP instructions.

    JMP Data Tables

    In the JMP environment, the fundamental framework for data organization is the data table. A JMP data table is a two-dimensional grid in which each column is typically a variable and each row an observation or case. JMP recognizes that data elements have unique properties and can play particular roles within an analysis. Some distinctions are rooted in considerations of efficient data storage, and others constrain or define
