About Data Science Training in Bangalore
Curriculum
1. Introduction to Probability and Statistics for Data Science
This module prepares you for an essential skill: thinking like a statistician. It will change your analytical thinking process, and you will start looking at data and numbers from a different perspective. This is a foundational module, and a strong grasp of its concepts will help you stand out as a data scientist.
This module covers:
• Probability theory and related algorithms
• Descriptive statistical methods
• Inferential statistical methods
From a tools perspective, you will gain confidence with tools such as R and Excel.
Fundamentals of Probability
• Introduction to random variables
• Probability theory
• Conditional probability
• Bayes' theorem
The Concept of a Data Set
• Understanding the properties of an attribute: central tendencies (mean, median, mode)
• Measures of spread (range, variance, standard deviation)
• Basics of probability distributions; expectation and variance of a variable
Probability Distributions and the Differences Between Discrete and Continuous Distributions
• Discrete probability distributions: binomial, Poisson
• Continuous probability distributions: normal distribution, t-distribution
Procedure for Gaining Inference About Populations from Samples
• Understand the data attributes, distributions, sample vs. population
Procedure for Statistical Testing
• Extend the understanding to analyze relationships between variables
• How to conduct statistical hypothesis testing, and an introduction to methods such as the chi-square test, t-test, z-test, F-test and ANOVA
• Covariance and correlation as a precursor to regression
• Hands-on implementation in R
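The descriptive measures, Bayes' theorem and correlation listed in this module can each be sketched in a few lines. This is a minimal illustration in Python, whereas the module itself uses R and Excel. The function names `describe`, `bayes_posterior` and `pearson_r` are hypothetical, and the variance uses the sample (n - 1) definition.

```python
import math
import statistics

def describe(values):
    """Central tendency and spread of a single numeric attribute."""
    variance = statistics.variance(values)          # sample variance, n - 1 denominator
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        "variance": variance,
        "std_dev": math.sqrt(variance),
    }

def bayes_posterior(prior, p_e_given_h, p_e_given_not_h):
    """P(H | E) from P(H), P(E | H) and P(E | not H) using Bayes' theorem."""
    # Law of total probability gives the denominator P(E)
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h * prior / p_e

def pearson_r(xs, ys):
    """Sample Pearson correlation: covariance divided by both standard deviations."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return cov / (statistics.stdev(xs) * statistics.stdev(ys))
```

Here is an example. Take a prior of 0.01, P(E | H) = 0.99 and P(E | not H) = 0.05. With these inputs `bayes_posterior` returns 1/6. A positive result therefore raises the probability of H only to about 17%, because false positives from the much larger "not H" group dominate.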
2. Essential Mathematical Concepts
• Vectors, matrices, eigenvalues, eigenvectors, orthogonality, etc.
• Kernel tricks, kernel functions, PCA, SVD, LSA
• Hands-on implementation in R
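PCA and SVD connect to the eigenvalues, eigenvectors and orthogonality in this module. The right singular vectors of the mean-centred data matrix are the eigenvectors of its covariance matrix. Each squared singular value, divided by n - 1, is the matching eigenvalue. What follows is a minimal sketch that assumes NumPy is available; the module itself uses R. `pca_svd` is a hypothetical helper name.

```python
import numpy as np

def pca_svd(X, n_components):
    """Project the rows of X onto the top principal components via SVD."""
    X_centered = X - X.mean(axis=0)                    # PCA works on mean-centred columns
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]                     # orthonormal principal directions (rows)
    explained_variance = S**2 / (X.shape[0] - 1)       # eigenvalues of the sample covariance matrix
    scores = X_centered @ components.T                 # coordinates in the reduced space
    return scores, components, explained_variance[:n_components]
```

The rows of `components` are orthonormal. `explained_variance` matches the covariance eigenvalues up to floating-point error, so either route gives the same principal components.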
3. Essential Engineering Skills for Data Science
• Data preprocessing techniques
• Python and R basics
• Database concepts
• String and list objects
• Exception handling
• Understanding of data structures, functions, control structures, data manipulation, and date and string manipulation
• Preprocessing techniques: binning, filling missing values, standardization and normalization, type conversions, train-test data split, ROCR
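Several of the preprocessing steps in this module are sketched below in standard-library Python. The specific choices are illustrative assumptions, not the course's prescribed methods: median imputation, equal-width bins, sample standard deviation and a fixed shuffle seed. All helper names are hypothetical.

```python
import random
import statistics

def fill_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values]

def standardize(values):
    """z-score: subtract the mean, divide by the sample standard deviation."""
    mu, sigma = statistics.mean(values), statistics.stdev(values)
    return [(v - mu) / sigma for v in values]

def min_max_normalize(values):
    """Rescale linearly into [0, 1] (assumes not all values are equal)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def equal_width_bins(values, n_bins):
    """Assign each value a 0-based label among n_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows; the first test_fraction become the test set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]       # (train, test)
```

In practice, compute the mean, standard deviation or min/max on the training split only. Then reuse those values on the test split so that no information leaks from the test data.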
4. Data Exploration, Data Visualization and Data Storytelling
• Need for visualizations
• How to tell a data story
• Communicating with data: issues and guiding principles; primary ingredients of data visualization; how to pick visual encodings such as color, shape and size; which chart to use when; how to accommodate more than two dimensions
• A case highlighting the transition from a simple chart to a powerful visualization, complete with storytelling
• Using ggplot2 in R and Qlik Sense for visualizations
5. Introduction to Planning and Architecting Data Science Solutions
• Frameworks to analyze a data science problem
• How to choose an error metric
• Efficient ways to present the results of data science and data analytics
• The different forms in which data is available
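Choosing an error metric matters because different metrics rank the same predictions differently. The minimal Python sketch below covers common regression metrics (MAE, RMSE) and classification metrics (precision, recall). The helper names are hypothetical, and these are only a few of the metrics in use.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error: average error size, in the target's own units."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: squaring first makes large misses weigh more."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def precision_recall(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Take targets `[10, 10, 10, 10]`. The predictions `[9, 11, 9, 11]` and `[10, 10, 10, 14]` both have an MAE of 1.0. Their RMSE differs, however: 1.0 versus 2.0, because the second set makes one large miss. RMSE suits problems where large errors are disproportionately costly. MAE suits problems where every unit of error costs roughly the same.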
6. Introduction to Machine Learning: Methods and Algorithms
Fundamentals of Linear Regression
• Linear regression: relationships between multiple variables; regression (linear and multiple linear regression) in prediction
• Understanding the summary output of linear regression
• Residual analysis
• Identifying significant features, feature reduction using AIC, multicollinearity checks, observing influential points
• Non-normality and heteroscedasticity
• Hypothesis testing of the regression model
• Confidence intervals of the slope
• R-squared and goodness of fit
• Influential observations: leverage in multiple linear regression
• Polynomial regression
• Categorical variables in regression
• Hands-on linear regression
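The core computation behind linear regression can be sketched with NumPy's least-squares solver. That computation is the coefficient estimates, residuals and R-squared. This is a minimal illustration that assumes NumPy; the course uses R, whose summary output adds standard errors and tests not shown here. `fit_ols` is a hypothetical helper name.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns coefficients, residuals, R-squared."""
    X_design = np.column_stack([np.ones(len(X)), X])    # prepend the intercept column
    beta, *_ = np.linalg.lstsq(X_design, y, rcond=None) # minimises ||y - X_design @ beta||^2
    residuals = y - X_design @ beta                      # input to residual analysis
    ss_res = residuals @ residuals
    ss_tot = ((y - y.mean()) ** 2).sum()
    r_squared = 1.0 - ss_res / ss_tot
    return beta, residuals, r_squared
```

You can plot `residuals` against fitted values to look for heteroscedasticity. The same function also fits a polynomial regression if you pass `x` and `x**2` as separate columns.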
Introduction and deep dive into logistic regression and the important concept of ROC curves
• Logistic regression
• ROC curves
• Logistic regression in classification; output interpretation
• Hands-on logistic regression
Time Series Analysis
• Decomposition of time series
• Trend and seasonality detection and forecasting
• Smoothing techniques
• Understanding ACF and PACF plots
• ARIMA modeling
• Holt-Winters method
Principles and Ideas in the Field of Data Mining
• Rule patterns, construction of a rule-based classifier from data, turning trees into rules, rule-growing strategy, rule evaluation and stopping criteria, and business metrics such as actionability and explicability, before turning to association rules in detail
• Indirect rule extraction: from decision trees
• Direct rule extraction: sequential covering
• Market basket analysis, Apriori, recommendation engines, association rules
• How to combine clustering and classification
• How to measure the quality of clustering; outlier analysis
• Association analysis
• FP-trees
• Hands-on with R
Decision Trees
• Top-Down Induction of Decision Trees (TDIDT)
• Attribute selection based on an information-theory approach
• Recursive partitioning (binary splits)
• ID3, C4.5 and C5.0 for pattern-recognition problems; avoiding overfitting; converting trees to rules
• Hands-on with R
Distance-Based Classifiers
• K-nearest neighbor algorithm
• Aspects to consider while designing k-nearest neighbor
• Hands-on example of k-nearest neighbor using R
• Collaborative filtering
Neural Networks
• Perceptron and single-layer neural network
• Back-propagation algorithm and a typical feed-forward neural net
• Hands-on with R using a case study
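Of the classifiers in this module, k-nearest neighbor is the simplest to sketch. The minimal Python version below uses Euclidean distance and a majority vote, and ties are broken by first occurrence. The course's own examples are in R. `knn_predict` and the toy data in the test are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify query by majority vote among its k nearest training points (Euclidean)."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]
```

The prediction depends on raw distances, so features on larger scales dominate. Common design considerations are therefore to standardize features first, to pick an odd k for two-class problems, and to choose the distance measure deliberately.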
Support Vector Machines (SVM)
• Linear learning machines and kernel space; making kernels and working in feature space
• SVM algorithm and comparison with neural nets
• Demonstrating SVM on classification problems using a business case in R
Ensemble Methods
• Bagging and boosting and their impact on bias and variance
• C5.0 boosting
• Random forest
• AdaBoost
• Gradient boosting machines
Unsupervised Learning Algorithms: Clustering
• Different clustering methods; review of several distance measures
• Iterative distance-based clustering
• Dealing with continuous and categorical values in k-means
• Constructing a hierarchical cluster; k-medoids, k-modes and density-based clustering to handle different data types in practice
• Test for stability of clusters
• Hands-on implementation of each of these methods in R
Bayesian Belief Nets, Naïve Bayes and Techniques to Handle Overfitting and Underfitting
• Introduction to generative techniques
• Bayesian belief nets (BBN)
• Naïve Bayes: a special case of BBN
• Hands-on Naïve Bayes in R
• How to avoid overfitting and underfitting
• Refresher on all the machine learning algorithms
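The iterative distance-based clustering in this module is typified by k-means (Lloyd's algorithm). The algorithm alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its points. The minimal Python sketch below uses random initialization from the data and a fixed seed, both of which are assumptions. The helper names are hypothetical.

```python
import math
import random

def nearest_centroid(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))

def k_means(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)                 # initialise with k distinct data points
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        labels = [nearest_centroid(p, centroids) for p in points]
        # Update step: each centroid moves to the mean of its members
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])    # leave an empty cluster's centroid in place
        if new_centroids == centroids:                # converged: assignments will not change
            break
        centroids = new_centroids
    return centroids, labels
```

The result can depend on the initial centroids. A common remedy is to run several seeds and compare the clusterings, which is close in spirit to the cluster stability check listed in the module.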
7. Text Mining and Natural Language Processing
Text processing algorithms
Basics of Search Engines
• Introduction to the fundamentals of information retrieval
• Language modeling
• N-gram models of language
Smoothing and Probabilistic Language Models
• Query likelihood model
• Two-stage smoothing
Text Indexing and Crawling
• Inverted indexes
• Boolean query processing
• Handling phrase queries
• Proximity queries
• Crawling
Relevance Ranking
• Need for relevance ranking
• TF and IDF
• Thinking about the math behind the text
• Properties of words; vector space model
• Evaluation metrics for ranking
Link Analysis Algorithms
• PageRank
• HITS
• Topic-sensitive PageRank
• Spam detection algorithms
Natural Language Processing
• Stemming, phrase identification, word sense disambiguation
• POS tagging
• Parsing and semantic structures
• Coreference resolution
Named Entity Recognition
• What is NER?
• Possible applications of NER
• Evaluation and testing
• NER methods
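Inverted indexes, Boolean query processing, TF-IDF and the vector space model fit together in a small standard-library Python sketch. The sketch makes three assumptions: whitespace tokenization, lowercasing, and TF-IDF weights of raw count × log(N / document frequency). That weighting is one of several common variants, and all helper names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def build_inverted_index(docs):
    """Map each term to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def boolean_and(index, *terms):
    """Boolean AND query: intersect the posting sets of all query terms."""
    postings = [index.get(t, set()) for t in terms]
    return set.intersection(*postings) if postings else set()

def tf_idf_vectors(docs):
    """Sparse TF-IDF vector (term -> weight) for every document."""
    tokenized = [text.lower().split() for text in docs]
    n_docs = len(docs)
    df = Counter(term for tokens in tokenized for term in set(tokens))  # document frequency
    return [
        {term: count * math.log(n_docs / df[term]) for term, count in Counter(tokens).items()}
        for tokens in tokenized
    ]

def cosine(u, v):
    """Cosine similarity between two sparse vectors (vector space model)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0
```

A term that occurs in every document, such as "the" in a small collection, gets an IDF of log(1) = 0. It therefore carries no weight, which is the intuition behind IDF.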
8. Deep Learning Using TensorFlow
• Basics of neural networks
• Linear algebra
• Implementation of a neural network from scratch ("vanilla", without a framework)
• Basics of TensorFlow
• Convolutional neural networks (CNNs)
• Recurrent neural networks (RNNs)
• Generative models
• Semi-supervised learning using GANs
• Seq-to-seq models
• Encoder and decoder
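The sketch below shows what a framework-free ("vanilla") neural network involves. It is a small feed-forward network with a tanh hidden layer and a sigmoid output, trained by backpropagation and full-batch gradient descent on binary cross-entropy. It assumes NumPy and is not TensorFlow code; TensorFlow computes these gradients automatically. The architecture, initialization and hyperparameters are illustrative assumptions, and the helper names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(n_in, n_hidden, n_out, seed=0):
    """Small random weights and zero biases (one common choice among several)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.5, size=(n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.5, size=(n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

def forward(params, X):
    """Feed-forward pass: tanh hidden layer, sigmoid output layer."""
    h = np.tanh(X @ params["W1"] + params["b1"])
    p = sigmoid(h @ params["W2"] + params["b2"])
    return h, p

def loss_and_grads(params, X, y):
    """Mean binary cross-entropy and its gradients, computed by backpropagation."""
    h, p = forward(params, X)
    p_safe = np.clip(p, 1e-12, 1 - 1e-12)              # guard against log(0)
    loss = -np.mean(y * np.log(p_safe) + (1 - y) * np.log(1 - p_safe))
    d_out = (p - y) / y.size                            # dLoss/d(output pre-activation)
    d_hidden = (d_out @ params["W2"].T) * (1 - h**2)    # chain rule: d tanh(a)/da = 1 - tanh(a)^2
    grads = {
        "W2": h.T @ d_out,
        "b2": d_out.sum(axis=0),
        "W1": X.T @ d_hidden,
        "b1": d_hidden.sum(axis=0),
    }
    return loss, grads

def train(X, y, n_hidden=4, lr=0.1, epochs=5000, seed=0):
    """Full-batch gradient descent; returns trained parameters and the loss history."""
    params = init_params(X.shape[1], n_hidden, y.shape[1], seed)
    losses = []
    for _ in range(epochs):
        loss, grads = loss_and_grads(params, X, y)
        losses.append(loss)
        for name in params:
            params[name] -= lr * grads[name]            # gradient descent step
    return params, losses
```

Try it on the four XOR examples and the loss history decreases. Whether the network reaches a perfect XOR fit depends on the initialization and the number of steps, so treat the hyperparameters as starting points. A finite-difference gradient check is a quick way to confirm the backpropagation code.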