**What is Data Science?**

In this **Data Science online training**, you will cover statistics from the basics to advanced topics, learn to program in R and Python, and use both languages for effective data analysis. You will learn how to install and configure the software needed for a statistical programming environment, and how generic programming language concepts are implemented in a high-level statistical language.

The **Data Science online training in Hyderabad** covers practical issues in statistical computing, including programming in R and Python, reading data into R and Python, accessing R packages and Python data science libraries and frameworks, writing R and Python functions, debugging, and profiling R and Python code. The topics in statistical data analysis are accompanied by working examples.

**Who can take this Data Science course?**

- Non-IT Professionals
- Developers
- Non-BI Professionals
- Data Analysts
- Project Managers
- Job seekers
- Graduates

**How do I execute the practicals?**

Career IT Trainings provides the software and tools required for the **Data Science online training**.

**What are the prerequisites for the Data Science online training?**

The **Data Science online training** course has no prerequisites. No prior knowledge of statistics, the R language, Python, or analytic techniques is required. The course progresses from basic to advanced statistics and machine learning techniques.

## Introduction to **Data Science**

- Introduction to Data Science, Tables, Databases, ETL, EDW and Data Mining
- What is Data Science?
- Popular Tools
- Role of Data Scientist
- Analytics Methodology

## Descriptive and Inferential Statistics

Statistics is concerned with the scientific method by which information is collected, organized, analyzed and interpreted for the purpose of description and decision-making.

### There are two subdivisions of statistical method

**Descriptive Statistics** – Deals with the presentation of numerical facts, or data, in either table or graph form, and with the methodology of analyzing the data.

**Inferential Statistics** – Involves techniques for making inferences about a whole population on the basis of observations obtained from samples.
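
The difference between the two subdivisions can be illustrated with a minimal Python sketch. The sample values are hypothetical, and the interval uses a normal approximation chosen here for simplicity (a t-based interval would be more appropriate for a sample this small):

```python
import math
import statistics

# Hypothetical sample: heights (cm) of 10 people drawn from a larger population.
sample = [162, 170, 168, 175, 181, 159, 173, 177, 166, 169]

# Descriptive statistics: summarize the sample itself.
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)

# Inferential statistics: estimate the population mean from the sample.
# Assumption: a normal approximation is used for simplicity; with only
# 10 observations a t-based interval would be somewhat wider.
z = statistics.NormalDist().inv_cdf(0.975)  # roughly 1.96 for a 95% interval
standard_error = sample_sd / math.sqrt(len(sample))
ci_low = sample_mean - z * standard_error
ci_high = sample_mean + z * standard_error

print(f"Sample mean: {sample_mean:.1f}, sample SD: {sample_sd:.2f}")
print(f"Approximate 95% CI for the population mean: ({ci_low:.1f}, {ci_high:.1f})")
```

The first two values describe only the observed data; the interval is a statement about the unobserved population.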

### Samples and Populations

- Sample Statistics
- Estimations of Population Parameters
- Random and Non-random Sampling
- Sampling Distributions
- Degrees of Freedom
- The **Central Limit Theorem**

## Percentiles and Quartiles

### Measures of Central Tendency

- Mean
- Median
- Mode

### Measures of Variability/Dispersion

- Range
- IQR
- Variance
- Standard Deviation
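
As a rough illustration of the measures listed above, the following sketch computes each one with Python's standard `statistics` module. The data set is hypothetical, and quartile values depend on the interpolation method (the module's default "exclusive" method is assumed here), so other tools may report a slightly different IQR:

```python
import statistics

# Hypothetical data set: daily sales counts for nine days.
data = [2, 4, 4, 5, 7, 8, 9, 10, 14]

# Measures of central tendency
mean = statistics.mean(data)      # arithmetic average
median = statistics.median(data)  # middle value of the sorted data
mode = statistics.mode(data)      # most frequent value

# Measures of variability
value_range = max(data) - min(data)
q1, q2, q3 = statistics.quantiles(data, n=4)  # quartile cut points
iqr = q3 - q1                                  # interquartile range
variance = statistics.variance(data)           # sample variance (n - 1 denominator)
std_dev = statistics.stdev(data)               # square root of the sample variance

print(f"mean={mean}, median={median}, mode={mode}")
print(f"range={value_range}, IQR={iqr}, variance={variance}, std dev={std_dev:.3f}")
```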

### Skewness and Kurtosis

### Probability Distributions

- Events, Sample Space and Probabilities
- Conditional Probabilities
- Independence of Events
- **Bayes' Theorem**
- Random Variable
- The Normal Distributions
- Confidence Intervals
- Hypothesis Testing
  - Null Hypothesis
  - The Significance Level
  - p-value
  - Type I and Type II Errors

### Inferential Test Metrics

- t-test (Student's t-test)
- F-test
- Z-test
- Chi-square test

### The Comparison of Two Populations

### Analysis of Variance

- ANOVA Computations
- Two-way ANOVA

## Data Exploration and Dimension Reduction

- Data Summaries
- Covariance, Correlation, and Distances
- Missing Values Handling
- Outliers Handling
- Principal Component Analysis
- Exploratory Factor Analysis

## Machine Learning:

**Introduction and Concepts: Differentiating Algorithmic and Model-Based Frameworks**

### Regression

- Ordinary Least Squares
- Ridge Regression
- Lasso Regression
- K-Nearest Neighbours Regression & Classification

## Supervised Learning with Regression and Classification

**Bias-Variance Dichotomy**

**Model Validation Approaches**

- Training Set
- Validation Set
- Test Set
- Cross-Validation
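
To make the cross-validation idea concrete, here is a minimal sketch of how k-fold splitting might be done by hand. The helper name `k_fold_indices` is hypothetical; in practice a library routine would normally be used instead:

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, validation_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once; the seed keeps it reproducible
    # Spread any remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

# Usage: 10 samples, 3 folds. Each sample is used for validation exactly once.
for fold, (train_idx, val_idx) in enumerate(k_fold_indices(10, 3), start=1):
    print(f"Fold {fold}: train={sorted(train_idx)} validation={sorted(val_idx)}")
```

A separate test set would still be held out before this step, so that the final model is evaluated on data never used for training or validation.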

**Logistic Regression**

- Linear Discriminant Analysis
- Quadratic Discriminant Analysis

### Regression and Classification Trees

- Recursive Partitioning
- Impurity Measures (Entropy and Gini Index)
- Pruning the Tree
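
The two impurity measures named above can be sketched in a few lines of Python. The function names and label lists are illustrative assumptions; entropy is measured in bits here:

```python
import math
from collections import Counter

def class_proportions(labels):
    """Return the fraction of samples belonging to each class."""
    counts = Counter(labels)
    total = len(labels)
    return [count / total for count in counts.values()]

def entropy(labels):
    """Shannon entropy in bits: sum of p * log2(1 / p) over the classes."""
    return sum(p * math.log2(1 / p) for p in class_proportions(labels))

def gini_index(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    return 1.0 - sum(p ** 2 for p in class_proportions(labels))

mixed = ["yes", "yes", "no", "no"]   # evenly split node: maximally impure for two classes
pure = ["yes", "yes", "yes", "yes"]  # single-class node: no impurity

print(entropy(mixed), gini_index(mixed))
print(entropy(pure), gini_index(pure))
```

A tree-growing algorithm would typically pick the split that most reduces one of these measures across the resulting child nodes.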

### Support Vector Machines

### Ensemble Methods

- Bagging (Parallel Ensemble) – Random Forest
- Boosting (Sequential Ensemble) – Gradient Boosting

### Neural Networks

- Structure of Neural Network
- Hidden Layers and Neurons
- Weights and Transfer Function

### Deep learning

- Integrates the best features of both machine learning and neural networks

### Forecasting (Time-Series Modelling)

- Trend and Seasonal Analysis
- Different Smoothing Techniques
- ARIMA Modelling
- ETS Modelling

## Unsupervised Learning

### Clustering

- Hierarchical (Agglomerative) Clustering
- Non-Hierarchical Clustering: The k-Means Algorithm

### Association Rule Mining

- Apriori Algorithm
- Frequent Item-sets
- Support
- Confidence
- Lift Ratio
- Discovering Association Rules
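
The three rule metrics listed above can be computed directly from their definitions. The sketch below uses hypothetical market-basket transactions and does not implement the Apriori candidate-generation step itself:

```python
# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]

def support(itemset):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Of the transactions containing the antecedent, the share that also contain the consequent."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)

def lift(antecedent, consequent):
    """Confidence relative to the consequent's baseline support; 1.0 means independence."""
    return confidence(antecedent, consequent) / support(consequent)

# Evaluate the rule {bread} -> {milk}.
print(f"support    = {support({'bread', 'milk'}):.2f}")
print(f"confidence = {confidence({'bread'}, {'milk'}):.2f}")
print(f"lift       = {lift({'bread'}, {'milk'}):.4f}")
```

A lift below 1 for this toy data suggests that buying bread makes milk slightly less likely than its baseline rate, so the rule would not be considered interesting.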

## Text Mining

- Sentiment Analysis
- User Behaviour Analysis
- Topic Categorization
- Topic Ranking

## Recommender Engines:

- Collaborative Filtering Recommenders
- Content Based Recommenders

## Data Science Techniques Implementation in the **R Language**

## Introduction to R Foundation

- Software Installation on Various Operating Systems
- Introduction to Real Time Applications
- Introduction to Popular Packages

## R-Analytical Tool (Data Mining / Machine Learning)

- Basic Data Types
- R Data Structures
- Vectors
- Matrix
- List
- Data Frames
- R Functions
- Predictive Modelling Project based on R
- Classification Modelling Project based on R
- Clustering Project based on R
- Association Mining Project based on R
- R Visualization Packages
- Machine Learning Packages in R

## Python – Getting Started

- Installing Python on Windows
- Installing Python on Mac and Linux
- Introduction to Editors
- Installing PyCharm and Sublime Editors

## Python Basics

- Numbers and Math in Python
- Variables and Inputs
- Built-in Modules and Functions
- Save and Run Python Files
- Strings
- Python Lists
- Python Slices and Slicing

## Python Scientific Libraries for Machine Learning

- Scikit-Learn
- NumPy
- SciPy
- Pandas
- Matplotlib

## Introduction to Data Visualization

- Introduction to Data Science and Visualization Tools in Python
- Installing and Setting up IPython Notebook
- Installing Anaconda and pandas
- Setting Up Environment

## Learning NumPy

- Creating Arrays
- Using Arrays and Scalars
- Indexing Arrays
- Array Transposition
- Universal Array Functions
- Array Processing
- Array Input and Output

## Working with pandas

- Series
- Data Frames
- Index Objects
- Reindex
- Drop Entry
- Selecting Entries
- Data Alignment
- Rank and Sort
- Summary Statistics
- Missing Data
- Index Hierarchy

## Working with Data Part 1

- Reading and Writing Text Files
- JSON with Python
- HTML with Python
- Microsoft Excel Files with Python

## Working with Data Part 2

- Merge, Merge on Index and Concatenate
- Combining Data Frames
- Reshaping and Pivoting
- Duplicates in Data Frames
- Mapping, Replacing, Rename Index and Binning
- Outliers and Permutations

## Working with Data Part 3

- Group by on Data Frames
- Group by on Dict and Series
- Aggregation
- Splitting, Applying and Combining
- Cross Tabulation

## Working with Visualization

- Installing Seaborn
- Histograms
- Kernel Density Estimate Plots
- Combining Plot Styles
- Box and Violin Plots
- Regression Plots
- Heat Maps and Clustered Matrices
- Example Projects-15

## Machine Learning Language

- Introduction
- Linear Regression
- Logistic Regression
- Multi Class Classification – Logistic Regression
- Multi Class Classification – Nearest Neighbor
- Support Vector Machines
- Naïve Bayes Theory

## Prescriptive Analytics (Optimization Techniques)

- Introduction
- Analytics through Designed Experiments
- Analytics through Active Learning
- Analytics through Reinforcement Learning

## Data Science based Projects

- Covers a couple of real-time analytics projects based on R scripts and Python scientific libraries.

## Spark MLlib (Scalable Machine Learning)

- RDD Concept
- Spark MLlib: Data Types, Algorithms, and Utilities