About Data Science Training in Bangalore
Data Science Course Content & Curriculum
Module 1 – Getting started with Data Science and Recommender Systems
What is Data Science?
Reasons to use Data Science
Evaluation of Input Data
Statistical and analytical methods to work with data
Machine Learning basics
Introduction to Recommender systems
Apache Mahout Overview
Module 2 – Reasons to Use, Project Lifecycle
What is Data Science?
What Kind of Problems can you solve?
Data Science Project Life Cycle
Data Science: Basic Principles
Understanding Data: Attributes in a Dataset, Different Types of Variables
Build the Variable Type Hierarchy
Two Dimensional Problem
Correlation between the Variables, Explained Using the Paint Tool
Outliers, Outlier Treatment
Boxplot, How to Draw a Boxplot
Module 3 – Acquiring Data
Discussion and Explanation of the Boxplot
Example to understand variable Distributions
What is a Percentile? – Example Using the RStudio Tool
How do we identify outliers?
How do we handle outliers?
Outlier Treatment: The General Capping/Flooring Method
Distribution: What is a Normal Distribution?
Why is the Normal Distribution so Popular?
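The percentile and capping/flooring topics in this module can be illustrated with a minimal Python sketch. The 5th and 95th percentile thresholds, the linear-interpolation percentile rule, and the sample `sales` values are assumptions chosen for illustration only; the course itself demonstrates percentiles in RStudio and may use different cut-offs.

```python
def percentile(values, p):
    """Return the p-th percentile (0-100) using linear interpolation."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    # Fractional position of the requested percentile in the sorted data.
    rank = (len(ordered) - 1) * p / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def cap_and_floor(values, low_pct=5, high_pct=95):
    """Replace values below the low percentile or above the high one."""
    floor_value = percentile(values, low_pct)   # assumed lower threshold
    cap_value = percentile(values, high_pct)    # assumed upper threshold
    return [min(max(v, floor_value), cap_value) for v in values]


# Hypothetical daily sales figures with one very high and one very low value.
sales = [12, 15, 14, 13, 16, 15, 14, 250, 13, 1]
treated = cap_and_floor(sales)
print(treated)
```

Values inside the two thresholds are left unchanged; only the extremes are pulled in to the threshold values, so the number of observations stays the same.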
Module 4 – Machine Learning in Data Science
Discussion about Boxplot and Outlier
Goal: Increase Profits of a Store
Areas for Increasing Efficiency
Business Problem: To maximize shop Profits
What are Interlinked Variables?
What is a Strategy?
Interaction between the Variables
Relation between Variables
What is a Hypothesis?
Interpret the Correlation
Module 5 – Statistical and Analytical Methods for Dealing with Data, Implementation of Recommenders Using Apache Mahout and Transforming Data
Correlation between Nominal Variables
What is Expected Value?
What is Mean?
How Expected Value Differs from the Mean
Experiment – Controlled Experiment, Uncontrolled Experiment
Degrees of Freedom
Dependency between a Nominal Variable & a Continuous Variable
Extrapolation and Interpolation
Univariate Analysis for Linear Regression
Building Model for Linear Regression
What Does Pattern of Data Mean?
Data Processing Operation
What is sampling?
Stratified Sampling Technique
Disproportionate Sampling Technique
Balanced Allocation: Part of Disproportionate Sampling
Two Angles of Data Science: Statistical Learning, Machine Learning
Module 6 – Testing and Assessment, Production Deployment and More
Multivariable Analysis
Simple Linear Regression
Speculation vs. Claim (Query)
Steps to Test Your Hypothesis
Generate the Null Hypothesis
Testing the hypothesis
Hypothesis testing explanation by example
Histogram of mean value
Revisit the Chi-Square Independence Test
Correlation between Nominal Variables
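The chi-square independence test listed in this module compares observed counts for two nominal variables with the counts expected if the variables were independent. The sketch below computes only the test statistic and the degrees of freedom. The function name `chi_square_statistic` and the sample table are hypothetical, and every row and column total is assumed to be non-zero. Turning the statistic into a p-value needs a chi-square distribution table or a statistics library, which is left out here.

```python
def chi_square_statistic(table):
    """Return (statistic, degrees_of_freedom) for a contingency table.

    `table` is a list of rows of observed counts; every row and column
    total is assumed to be greater than zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under the null hypothesis of independence.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical counts: rows are two customer groups, columns are two products.
observed = [[20, 10],
            [10, 20]]
stat, dof = chi_square_statistic(observed)
print(stat, dof)
```

A larger statistic for the same degrees of freedom means the observed counts sit further from what independence would predict.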
Module 7 – Business Algorithms, Simple approaches to Prediction, Building model, Model deployment
Importance of Algorithms
Supervised and Unsupervised Learning
Various Algorithms in Business
Simple approaches to Prediction
Steps in Model Building
Sample the data
What is K?
Find the accuracy
Deploy the model
Module 8 – Getting started with Segmentation of Prediction and Analysis
Cluster and Clustering with Example
Data Points, Grouping Data Points
Horizontal & Vertical Slicing
Criteria to Take into Consideration before Clustering
Clustering & Classification: Exclusive Clustering, Overlapping Clustering, Hierarchical Clustering
Simple Approaches to Prediction
Different Types of Distances: 1. Manhattan, 2. Euclidean, 3. Cosine Similarity
Clustering Algorithm in Mahout
Nearest Neighbor Prediction
Nearest Neighbor Analysis
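The three distance measures named in this module, and the nearest-neighbor idea that builds on them, can be sketched in plain Python. This is an illustrative sketch only: the function names and sample points are hypothetical, and it does not represent how Apache Mahout implements these measures.

```python
import math


def manhattan_distance(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean_distance(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def nearest_neighbor(query, points, distance=euclidean_distance):
    """Return the point in `points` closest to `query`."""
    return min(points, key=lambda p: distance(query, p))


# Hypothetical two-dimensional data points.
points = [(3, 4), (1, 1), (5, 5)]
print(manhattan_distance((0, 0), (3, 4)))
print(euclidean_distance((0, 0), (3, 4)))
print(cosine_similarity((1, 0), (0, 1)))
print(nearest_neighbor((0, 0), points))
```

Manhattan and Euclidean are distances (smaller means closer), while cosine similarity is a similarity (larger means closer), so it would need to be converted, for example to `1 - similarity`, before being passed as the `distance` argument.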
Module 9 – Integration of R and Hadoop
How R is typically used
Features of R
Introduction to Big data
Ways to Connect R and Hadoop
Steps for Installing RImpala
How to create IMPALA packages