Sunday 19 June 2016

Data Science Course Content

Please find the attached data science course content.  

Course Name: Data Science 

Duration: 75 to 95 hours, one hour every day, over 3 months.

Fee: $1200. Instalments will be accepted. 

Domain KT: Banking, Retail, Telecom and Insurance

Training Mode: Online through GoToMeeting

Projects: Sample projects will be assigned along with R, Python and Spark code support

Payment Mode: Credit/Debit Cards/Online Transfers will be accepted. 

Job, interview and resume-building support will be provided.


Data Science by B N Reddy
About this Course
In this course you will learn statistics from the basics to advanced topics, how to program in R and Python, and how to use both languages for effective data analysis. You will learn how to install and configure the software needed for a statistical programming environment, and how generic programming language concepts are implemented in a high-level statistical language. The course covers practical issues in statistical computing, including programming in R and Python, reading data into R and Python, accessing R packages and Python data science libraries and frameworks, writing R and Python functions, and debugging and profiling R and Python code. Topics in statistical data analysis are illustrated with working examples.
What are the pre-requisites?
There are no pre-requisites. No prior knowledge of statistics, the R language, Python or analytic techniques is required.
This course covers statistics and machine learning techniques from the basics to an advanced level.
Duration
75 to 95 Hours, An Hour Session Every Day

1.             Introduction to Data Science

·      Introduction to Data Science, Tables, Database, ETL, EDW and Data Mining
·      What is Data Science?
·      Popular Tools
·      Role of Data Scientist
·      Analytics Methodology






Data Science Techniques of Math and Statistics


2.             Descriptive and Inferential Statistics

Statistics is concerned with the scientific method by which information is collected, organized, analyzed and interpreted for the purpose of description and decision-making.
There are two subdivisions of statistical method.
(a) Descriptive Statistics - It deals with the presentation of numerical facts, or data, in either table or graph form, and with the methodology of analyzing the data.

(b) Inferential Statistics - It involves techniques for making inferences about the whole population on the basis of observations obtained from samples. 
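
The two subdivisions above can be illustrated with a minimal sketch using only the Python standard library. The sample values are hypothetical, and the confidence interval uses a normal approximation for brevity; for a sample this small a t-based interval would usually be preferred.

```python
import math
import statistics

# Hypothetical sample: daily transaction counts observed at one branch.
sample = [12, 15, 11, 18, 14, 16, 13, 17, 15, 19]

# Descriptive statistics: summarize the observed data itself.
mean = statistics.mean(sample)
median = statistics.median(sample)
std_dev = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)

# Inferential statistics: estimate the population mean from the sample.
# Assumption: normal approximation for an approximate 95% confidence interval.
z = statistics.NormalDist().inv_cdf(0.975)
margin = z * std_dev / math.sqrt(len(sample))
ci_low, ci_high = mean - margin, mean + margin

print(f"mean={mean}, median={median}, sd={std_dev:.3f}")
print(f"approx. 95% CI for the population mean: ({ci_low:.2f}, {ci_high:.2f})")
```

The first three values describe the sample only; the interval is a statement about the wider population the sample was drawn from.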


Ø Samples and Populations
§  Sample Statistics
§  Estimations of Population Parameters
§  Random and Non-random Sampling
§  Sampling Distributions
§  The Central Limit Theorem
§  Degrees of Freedom
Ø Percentiles and Quartiles
Ø Measures of Central Tendency
§  Mean
§  Median
§  Mode
Ø Measures of Variability/Dispersion
§  Range
§  IQR
§  Variance
§  Standard Deviation
Ø Skewness and Kurtosis
Ø Probability Distributions
§  Events, Sample Space and Probabilities
§  Conditional Probabilities
§  Independence of Events
§  Bayes’ Theorem
Ø Random Variable
Ø The Normal Distribution
Ø Confidence Intervals
Ø Hypothesis Testing
§  Null Hypothesis
§  The Significance Level
§  p-value
§  Type I and Type II Errors
Ø Inferential Test Metrics
§  t test (Student's t test)
§  F test
§  Z test
§  Chi-square test

Ø The Comparison of Two Populations
Ø Analysis of Variance
§  ANOVA Computations
§  Two-way ANOVA
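
As an illustration of the hypothesis-testing topics listed above (null hypothesis, significance level, p-value, Z test), here is a minimal sketch of a two-sided one-sample Z test. The function name and the numbers are hypothetical, and the population standard deviation is assumed to be known.

```python
import math
from statistics import NormalDist


def one_sample_z_test(sample_mean, pop_mean, pop_sd, n):
    """Two-sided one-sample Z test; returns (z statistic, p-value)."""
    z = (sample_mean - pop_mean) / (pop_sd / math.sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Null hypothesis: the population mean is 50 (hypothetical figures).
z, p = one_sample_z_test(sample_mean=52.0, pop_mean=50.0, pop_sd=5.0, n=25)

alpha = 0.05              # significance level
reject_null = p < alpha   # rejecting a true null would be a Type I error

print(f"z={z:.2f}, p-value={p:.4f}, reject null: {reject_null}")
```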

3.             Data Exploration and Dimension Reduction
Ø Data Summaries
Ø Covariance, Correlation, and Distances
Ø Missing Values Handling
Ø Outliers Handling
Ø Principal Component Analysis
Ø Exploratory Factor Analysis

4.             Machine Learning: Introduction and Concepts
Ø Differentiating algorithmic and model-based frameworks
Ø Regression
§  Ordinary Least Squares
§  Ridge Regression
§  Lasso Regression
§  K Nearest Neighbours Regression & Classification

5.             Supervised Learning with Regression and Classification
Ø Bias-Variance Trade-off
Ø Model Validation Approaches
§  Training Set
§  Validation Set
§  Test Set
§  Cross-Validation
Ø Logistic Regression
Ø Linear Discriminant Analysis
Ø Quadratic Discriminant Analysis
Ø Regression and Classification Trees
§  Recursive Partitioning
§  Impurity Measures (Entropy and Gini Index)
§  Pruning the Tree
Ø Support Vector Machines
Ø Ensemble Methods
§  Bagging (Parallel Ensemble) – Random Forest
§  Boosting (Sequential Ensemble) – Gradient Boosting
Ø Neural Networks
§  Structure of Neural Network
§  Hidden Layers and Neurons
§  Weights and Transfer Function
Ø Deep learning
§  Integrating the best features of both Machine Learning and Neural Networks
Ø Forecasting (Time-Series Modelling)
§  Trend and Seasonal Analysis
§  Different Smoothing Techniques
§  ARIMA Modelling
§  ETS Modelling
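
The impurity measures named under Regression and Classification Trees above (Entropy and Gini Index) can be sketched in a few lines. This is a minimal illustration with hypothetical labels, not the implementation used by any particular library.

```python
import math
from collections import Counter


def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: -sum(p * log2(p)) over class proportions."""
    n = len(labels)
    return -sum((count / n) * math.log2(count / n)
                for count in Counter(labels).values())


# Hypothetical node contents: a perfectly mixed node and a pure node.
mixed = ["yes", "yes", "no", "no"]
pure = ["yes", "yes", "yes", "yes"]

print(gini(mixed), entropy(mixed))  # both measures are highest for an even split
print(gini(pure), entropy(pure))    # both measures are zero for a pure node
```

A tree-growing procedure would typically pick the split that reduces one of these measures the most.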

6.             Unsupervised Learning
Ø Clustering
§  Hierarchical (Agglomerative) Clustering
§  Non-Hierarchical Clustering: The k-Means Algorithm
Ø Association Rule Mining
§  Apriori Algorithm
§  Frequent Item-sets
§  Support
§  Confidence
§  Lift Ratio
§  Discovering Association Rules
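
The support, confidence and lift measures listed above can be computed directly from a set of transactions. The sketch below uses hypothetical market-basket data and helper names; it shows the three measures only and is not an implementation of the Apriori algorithm itself.

```python
# Hypothetical market-basket transactions, one set of items per basket.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]


def support(itemset):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)


def lift(antecedent, consequent):
    """Confidence divided by the consequent's support; 1.0 suggests independence."""
    return confidence(antecedent, consequent) / support(consequent)


print(support({"bread", "milk"}))       # share of baskets with both items
print(confidence({"bread"}, {"milk"}))  # how often milk appears given bread
print(lift({"bread"}, {"milk"}))        # below 1 here: slightly negative association
```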

7.             Text Mining
Ø Sentiment Analysis
Ø User Behaviour Analysis
Ø Topic Categorization
Ø Topic Ranking

8.             Recommender Engines
Ø Collaborative Filtering Recommenders
Ø Content Based Recommenders

Data Science Techniques Implementation in R


9.             Introduction to R Foundation
§  Software Installation on Various Operating Systems
§  Introduction to Real Time Applications
§  Introduction to Popular Packages

10.         R-Analytical Tool (Data Mining / Machine Learning)
Ø Basic Data Types
Ø R Data Structures
§  Vectors
§  Matrices
§  Data Frames
§  Lists
Ø R Functions
Ø Predictive Modelling Project based on R
Ø Classification Modelling Project based on R
Ø Clustering Project based on R
Ø Association Mining Project based on R
Ø R Visualization Packages
Ø Machine Learning Packages in R

Data Science Techniques Implementation in Python


11.         Python - Getting Started

Ø Installing Python on Windows
Ø Installing Python on Mac and Linux
Ø Introduction to Editors
Ø Installing PyCharm and Sublime Editors

12.         Python Basics

Ø Numbers and Math in Python
Ø Variable and Inputs
Ø Built in Modules and Functions
Ø Save and Run Python Files
Ø Strings
Ø Python List
Ø Python slices and slicing
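
A brief sketch of the list and slicing topics above. The list contents are hypothetical and chosen only for illustration.

```python
# Hypothetical list of module names used only for illustration.
modules = ["statistics", "r", "python", "numpy", "pandas", "spark"]

first_three = modules[:3]      # items at index 0, 1 and 2
last_two = modules[-2:]        # the final two items
every_other = modules[::2]     # step of 2, starting at index 0
reversed_copy = modules[::-1]  # a new list in reverse order

print(first_three)
print(last_two)
print(every_other)
print(reversed_copy)
```

Slicing returns a new list each time, so the original `modules` list is left unchanged.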

13.         Python Scientific Libraries for Machine Learning
Ø scikit-learn
Ø NumPy
Ø SciPy
Ø pandas
Ø Matplotlib



14.         Introduction to Data Visualization

Ø Introduction to Data Science and Visualization Tools in Python
Ø Installing and Setting up IPython Notebook
Ø Installing Anaconda and pandas
Ø Setting Up Environment

15.         Learning NumPy

Ø Creating Arrays
Ø Using Arrays and Scalars
Ø Indexing Arrays
Ø Array Transposition
Ø Universal Array Functions
Ø Array Processing
Ø Array Input and Output

16.         Working with pandas

Ø Series
Ø Data Frames
Ø Index Objects
Ø Reindex
Ø Drop Entry
Ø Selecting Entries
Ø Data Alignment
Ø Rank and Sort
Ø Summary Statistics
Ø Missing Data
Ø Index Hierarchy



17.         Working with Data Part 1

Ø Reading and Writing Text Files
Ø JSON with Python
Ø HTML with Python
Ø Microsoft Excel Files with Python

18.         Working with Data Part 2

Ø Merge, Merge on Index and Concatenate
Ø Combining Data Frames
Ø Reshaping and Pivoting
Ø Duplicates in Data Frames
Ø Mapping, Replacing, Rename Index and Binning
Ø Outliers and Permutations

19.         Working with Data Part 3

Ø Group by on Data Frames
Ø Group by on Dict and Series
Ø Aggregation
Ø Splitting, Applying and Combining
Ø Cross Tabulation




20.         Working with Visualization

Ø Installing Seaborn
Ø Histograms
Ø Kernel Density Estimate Plots
Ø Combining Plot Styles
Ø Box and Violin Plots
Ø Regression Plots
Ø Heat Maps and Clustered Matrices
Ø Example Projects -15


21.         Machine Learning Language

Ø Introduction
Ø Linear Regression
Ø Logistic Regression
Ø Multi Class Classification – Logistic Regression
Ø Multi Class Classification – Nearest Neighbor
Ø Support Vector Machines
Ø Naïve Bayes Theory

22.         Prescriptive Analytics (Optimization Techniques)
Ø Analytics through designed experiments
Ø Analytics through Active learning
Ø Analytics through Reinforcement learning

23.         Data Science based Projects
Ø A couple of real-time analytics projects based on R scripts and Python scientific libraries will be covered.

24.         Spark MLlib (Scalable Machine Learning)
Ø RDD Concept
Ø Spark MLlib: Data Types, Algorithms, and Utilities
