
Centre for Development of Advanced Computing

INDUSTRIAL TRAINING IN BIG DATA & DATA ANALYTICS

Course Name: Big Data & Data Analytics

Duration: 6 Months (4 hours/day)


Prerequisites: Basic familiarity with operating system concepts and Linux
usage.
Candidates should know the fundamentals of programming.
Knowledge of Java, Python, C or C++ is a plus, but not
strictly required.

Who Should Attend? Engineering undergraduate students, or students pursuing
MCA or M.Sc. (CS/IT)

What to Expect? Candidates will get hands-on experience with data science
basics and with the concepts and application of data analysis.
They will work with various data analytics algorithms on big
data infrastructure that can be applied to real-world
problems.
By the end of the course, candidates should be able to
conceptualize and design an approach to data analysis
problems.
Course Focus: The course mixes theory and practice in Data Analytics and Big
Data, with the following objectives:
To learn the fundamental concepts of statistics
To learn the basics of programming in R
To use advanced analytical tools, decision-making tools and
operations research techniques (R, Weka, etc.) to
analyse complex problems, and to prepare for
developing such technologies
To learn the fundamentals of preparing a data set
for analysis
To learn and understand the concepts of machine
learning and data mining algorithms
To explore the fundamental concepts of big data
analytics
To develop in-depth knowledge and
understanding of the big data analytics domain
To learn to analyse big data using intelligent techniques
To understand applications built on MapReduce
concepts
To analyse and solve problems conceptually and
practically from diverse industries, such as
government, manufacturing, retail, education,
banking/finance, healthcare and pharmaceuticals
To undertake consulting projects with a significant
data analysis component, for a better
understanding of theoretical concepts from
statistics, economics and related disciplines
Hardware Requirements: Multi-core 64-bit CPU system,
Minimum 4 GB RAM,
Minimum 20 GB hard disk space.
Software Requirements: Red Hat Linux, Ubuntu or Windows
Java 1.7 or later
Virtual machine software
Course Coordinator: Mr. Sanjay Madan
Faculty Members: Mr. Rakesh Kumar Sehgal
Mr. Sanjay Madan
Mr. Saurabh Chamotra
Mr. Sanjeev Kumar
Ms. Krati Paliwal
Ms. Tamanna Goyal
INDUSTRIAL TRAINING IN BIG DATA & DATA ANALYTICS


(6 Months Training Program)
Course Contents:

1. Descriptive & Inferential Statistics


Basic Statistics: Measures of Central Tendency and Variance
Probability Distribution: Normal Distribution, Central Limit Theorem
Inferential Statistics: Sampling, Concept of Hypothesis Testing
Statistical Methods:
i. Z/t-tests (one sample, independent, paired),
ii. Regression, ANOVA (Analysis of Variance),
iii. Correlation and Chi-square.
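
As an illustration of the measures and the one-sample t-test listed in this unit, here is a minimal sketch in Python using only the standard library. The course itself lists R for this material. The function name `one_sample_t` and the sample values are hypothetical, chosen only for the example.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return the t statistic for testing whether the mean of `sample` equals mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    return (mean - mu0) / (sd / math.sqrt(n))

# Hypothetical measurements, invented for illustration only.
scores = [52, 48, 55, 50, 47, 53, 51, 49]
print("mean:", statistics.mean(scores))
print("variance:", statistics.variance(scores))
print("t statistic vs mu0 = 50:", one_sample_t(scores, 50))
```

The t statistic would then be compared against a t distribution with n - 1 degrees of freedom to obtain a p-value. That lookup is left out here because it needs a statistics library beyond the standard one.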

2. Introduction to R
Data types, Subsetting, Writing Data, Reading Tabular Data files, Creating a
Vector & Vector operations, Initializing Data frame, Control Structures &
Functions, Loop functions & Debugging
Statistics in R: Computing basic Statistics, Comparing means of two samples,
Testing a correlation for significance, Classical Tests (t, z, F), ANOVA
Data Visualization in R: Creating a bar chart, dot plot, scatter plot, pie
chart, histogram and box plot
Statistical Modelling in R & Data mining in R

3. Data Mining & Processing Data


Introduction to Data Mining, Data Mining Techniques
Data Cleaning, Data Transformation, Data Reduction
Task Relevant Data & Visualization Techniques
Decision Trees: Introduction & Applications
Types of Decision Tree Algorithms

4. Machine Learning
Introduction to machine learning
Regression: Least Squares, Ridge Regression, Lasso Regression, k-Nearest
Neighbors Regression & Classification
Supervised Learning: Discriminative Algorithms (Linear & Quadratic), Generative
Algorithms, Support Vector Machines, Learning Theory, Regularization & Model
Selection, Perceptron Algorithm
Ensemble Methods: Random Forest, Neural Networks, Deep Learning
Unsupervised Learning: k-means Clustering, Association Rule Mining, The EM
Algorithm, Factor Analysis, Principal Components Analysis

5. Introduction to Big Data


Big Data Ecosystem
Industries using Big data
Big Data Applications
Challenges when Managing & Analyzing Big Data
Key Components in Big Data Analytic Environment
Introduction to NoSQL Databases

6. Introduction to Hadoop & HDFS


Big Data Hadoop Stack
The Apache Hadoop Framework: Basic Modules
Overview of Hadoop based Applications and Services
HDFS Architecture, Configuration, Design & Role in Hadoop, performance &
tuning, HDFS Access, Commands, APIs and Applications
Hadoop Hardware and Software Requirements
Hadoop Installation

7. Hadoop MapReduce Framework


Introduction to MapReduce
Map/Reduce Framework
Data flow in MapReduce
Introduction to Apache Hive, Apache Pig and HBase
Analyzing data with Pig: Pig architecture, program structure & execution
process, Joins & filtering using Pig, Group & co-group, Schema merging &
redefining functions.
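
The MapReduce data flow listed in this unit (map, then shuffle, then reduce) can be sketched with a word count in plain Python. This is a single-process illustration of the idea, not Hadoop code. The function names and input lines are hypothetical.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine the values collected for one key into a single result."""
    return key, sum(values)

def word_count(lines):
    """Run the three phases in sequence over an iterable of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

# Hypothetical input, invented for illustration only.
lines = ["big data big analytics", "data analytics with hadoop"]
print(word_count(lines))
```

In a real cluster the map and reduce calls run in parallel on different nodes and the framework performs the shuffle. Here all three steps run in sequence so the flow of key-value pairs is easy to follow.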

8. Introduction to Spark
Introduction to Spark, Components of Spark Unified Stack
Resilient Distributed Dataset (RDD)
Creation of Parallelized Collection & External Datasets, RDD operations
Usage of SparkContext, submission of application to the cluster

9. Tools & Techniques for Analyzing Big Data


Analyzing Big data using MapReduce BI Tools
Exploratory graph Analysis and visualizations
Analyzing Big Data using Self-service BI Tools, e.g. Impala, Hive, Stinger etc.
Big Data Analytics query performance enablers
Managing stream computing in a Big Data environment
Various techniques for streaming analytics

10. Business Analytics


Consumption of Analytics, From Creation to Consumption & How to make it
Consumable.
Understanding of Business pain points with different types of Analytic
applications.
Financial Services, Healthcare, Telecom, Manufacturing: Demand forecasting,
steps in hypothesis creation, identifying reports and deliverables, data privacy and
security

11. Industry-Relevant Project
