What is Data Science?
In this Data Science online training, you will cover statistics from the basics to advanced topics, learn how to program in R and Python, and use both languages for effective data analysis. You will learn how to install and configure the software necessary for a statistical programming environment and see how generic programming language concepts are implemented in a high-level statistical language.
The Data Science online training in Hyderabad covers practical issues in statistical computing, including programming in R and Python, reading data into R and Python, accessing R packages and Python data science libraries and frameworks, writing R and Python functions, debugging, and profiling R and Python code. The topics in statistical data analysis are accompanied by working examples.
Who should take this Data Science course?
- Non-IT Professionals
- Developers
- Non-BI Professionals
- Data Analysts
- Project Managers
- Job seekers
- Graduates
How do I execute the practicals?
Career IT Trainings provides the software and tools needed for the Data Science online training.
What are the prerequisites for the Data Science online training?
The Data Science online training course has no prerequisites. No prior knowledge of statistics, the R language, Python, or analytic techniques is required. The course covers statistics and machine learning techniques from basic to advanced.
Introduction to Data Science
- Introduction to Data Science, Tables, Databases, ETL, EDW and Data Mining
- What is Data Science?
- Popular Tools
- Role of Data Scientist
- Analytics Methodology
Descriptive and Inferential Statistics
Statistics is concerned with the scientific method by which information is collected, organized, analyzed and interpreted for the purpose of description and decision-making.
There are two subdivisions of the statistical method:
Descriptive Statistics – It deals with the presentation of numerical facts, or data, in either table or graph form, and with the methodology of analyzing the data.
Inferential Statistics – It involves techniques for making inferences about the whole population on the basis of observations obtained from samples.
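The difference between the two subdivisions can be sketched with Python's standard `statistics` module. This is a minimal illustration, not course material: the sample values are hypothetical, and the interval uses a normal approximation as a simplifying assumption.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical sample: heights (cm) of 10 people drawn from a larger population.
sample = [162.0, 171.5, 168.0, 175.2, 159.8, 180.1, 166.4, 172.3, 169.9, 174.8]

# Descriptive statistics: summarize the observed sample itself.
sample_mean = mean(sample)
sample_sd = stdev(sample)  # sample standard deviation (n - 1 denominator)

# Inferential statistics: estimate the population mean from the sample.
# Assumption: a normal approximation is used for the 95% interval; for a
# sample this small, a t-distribution would give a slightly wider interval.
z = NormalDist().inv_cdf(0.975)
margin = z * sample_sd / len(sample) ** 0.5
ci_low, ci_high = sample_mean - margin, sample_mean + margin

print(f"sample mean = {sample_mean:.2f}, sample sd = {sample_sd:.2f}")
print(f"approx. 95% CI for the population mean: ({ci_low:.2f}, {ci_high:.2f})")
```

The first two values describe only the ten observations; the interval is a statement about the whole population, which is what makes it inferential.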
Samples and Populations
- Sample Statistics
- Estimations of Population Parameters
- Random and Non-random Sampling
- Sampling Distributions
- Degrees of Freedom
- The Central Limit Theorem
Percentiles and Quartiles
Measures of Central Tendency
- Mean
- Median
- Mode
Measures of Variability/Dispersions
- Range
- IQR
- Variance
- Standard Deviation
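As a rough sketch of the measures of central tendency and dispersion listed above, the following uses only Python's standard `statistics` module. The data values are hypothetical, and the quartiles rely on the module's default "exclusive" method, which is one of several conventions.

```python
import statistics as st

data = [2, 4, 4, 5, 7, 9, 11]  # hypothetical observations

# Measures of central tendency
mean = st.mean(data)      # arithmetic average
median = st.median(data)  # middle value of the sorted data
mode = st.mode(data)      # most frequent value

# Measures of variability / dispersion
value_range = max(data) - min(data)
q1, q2, q3 = st.quantiles(data, n=4)  # quartiles (default 'exclusive' method)
iqr = q3 - q1                         # interquartile range
variance = st.variance(data)          # sample variance (n - 1 denominator)
std_dev = st.stdev(data)              # square root of the sample variance

print(mean, median, mode)
print(value_range, iqr, variance, round(std_dev, 3))
```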
Skewness and Kurtosis
Probability Distributions
- Events, Sample Space and Probabilities
- Conditional Probabilities
- Independence of Events
- Bayes' Theorem
- Random Variable
- The Normal Distributions
- Confidence Intervals
- Hypothesis Testing
- Null Hypothesis
- The Significance Level
- p-value
- Type I and Type II Errors
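One topic from this list, Bayes' Theorem, can be illustrated with a short worked example. This is a minimal sketch: the function name `bayes_posterior` and the screening-test numbers are hypothetical and chosen only for illustration.

```python
def bayes_posterior(prior, sensitivity, false_positive_rate):
    """Return P(condition | positive test) using Bayes' Theorem."""
    # Law of total probability: overall chance of a positive result.
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    # Bayes' Theorem: P(A | B) = P(B | A) * P(A) / P(B)
    return sensitivity * prior / p_positive


# Hypothetical numbers: 1% prevalence, 95% sensitivity, 5% false positives.
posterior = bayes_posterior(prior=0.01, sensitivity=0.95, false_positive_rate=0.05)
print(f"P(condition | positive test) = {posterior:.3f}")
```

Even with an accurate test, the posterior stays well below the sensitivity because the condition is rare, which is the usual point of this kind of example.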
Inferential Test Metrics
- t test
- F test
- Z test
- Chi-square test
- Student's t test
The Comparison of Two Populations
Analysis of Variance
- ANOVA Computations
- Two-way ANOVA
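The ANOVA computations can be sketched for the one-way case in plain Python. This is an illustrative outline under simple assumptions (hypothetical data, a hypothetical helper name `one_way_anova_f`); it returns only the F statistic and does not compute a p-value.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the F statistic for a one-way ANOVA over a list of groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)

    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)  # k - 1 degrees of freedom
    ms_within = ss_within / (n - k)    # n - k degrees of freedom
    return ms_between / ms_within


# Hypothetical measurements for three groups.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat = one_way_anova_f(groups)
print(f"F = {f_stat:.2f}")
```

A large F value indicates that the variation between group means is large relative to the variation within groups.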
Data Exploration and Dimension Reduction
- Data Summaries
- Covariance, Correlation, and Distances
- Missing Values Handling
- Outliers Handling
- Principal Component Analysis
- Exploratory Factor Analysis
Machine Learning:
Introduction and Concepts: Differentiating algorithmic and model-based frameworks
Regression
- Ordinary Least Squares
- Ridge Regression
- Lasso Regression
- K Nearest Neighbours Regression & Classification
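As a small illustration of the first item in this list, ordinary least squares with a single predictor has a closed-form solution. The sketch below is written in plain Python with hypothetical data and a hypothetical helper name (`fit_simple_ols`); in practice a library such as Scikit-Learn would normally be used.

```python
from statistics import mean


def fit_simple_ols(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical data lying exactly on the line y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
slope, intercept = fit_simple_ols(xs, ys)
print(f"slope = {slope}, intercept = {intercept}")
```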
Supervised Learning with Regression and Classification
- Bias-Variance Dichotomy
- Model Validation Approaches
- Training Set
- Validation Set
- Test Set
- Cross-Validation
- Logistic Regression
- Linear Discriminant Analysis
- Quadratic Discriminant Analysis
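Among the model validation approaches in this list, k-fold cross-validation is easy to sketch without any library. The helper name `k_fold_indices` is hypothetical, and this version assumes the data is already in random order, so it does not shuffle.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


folds = list(k_fold_indices(n_samples=10, k=3))
for train, validation in folds:
    print("train:", train, "validation:", validation)
```

Each observation appears in exactly one validation fold, so every data point is used for both training and validation across the k rounds.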
Regression and Classification Trees
- Recursive Partitioning
- Impurity Measures (Entropy and Gini Index)
- Pruning the Tree
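The two impurity measures named above can be written in a few lines of plain Python. This is a minimal sketch with hypothetical class labels; tree libraries compute the same quantities internally when choosing splits.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: sum of p * log2(1 / p) over classes."""
    n = len(labels)
    return sum((count / n) * log2(n / count) for count in Counter(labels).values())


mixed = ["yes", "yes", "no", "no"]   # hypothetical node with a 50/50 class split
pure = ["yes", "yes", "yes", "yes"]  # hypothetical node with a single class

print(gini(mixed), entropy(mixed))
print(gini(pure), entropy(pure))
```

Both measures are zero for a pure node and reach their maximum for an even class split, which is why a tree prefers splits that reduce them.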
Support Vector Machines
Ensemble Methods
- Bagging (Parallel Ensemble) – Random Forest
- Boosting (Sequential Ensemble) – Gradient Boosting
Neural Networks
- Structure of Neural Network
- Hidden Layers and Neurons
- Weights and Transfer Function
Deep learning
- Integrating the best features of both Machine Learning and Neural Networks
Forecasting (Time-Series Modelling)
- Trend and Seasonal Analysis
- Different Smoothing Techniques
- ARIMA Modelling
- ETS Modelling
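One of the simplest smoothing techniques, simple exponential smoothing, can be sketched as follows. The series and the smoothing factor `alpha` are hypothetical, and initializing with the first observation is an assumption; other initializations are also common.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the exponentially smoothed series for 0 < alpha <= 1."""
    smoothed = [series[0]]  # assumption: start from the first observation
    for value in series[1:]:
        # New level = weighted average of the new value and the previous level.
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


series = [10, 12, 14, 16]  # hypothetical observations with an upward trend
smoothed = simple_exponential_smoothing(series, alpha=0.5)
print(smoothed)
```

A larger `alpha` makes the smoothed series follow recent observations more closely, while a smaller one gives a smoother but slower-reacting line.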
Unsupervised Learning
Clustering
- Hierarchical (Agglomerative) Clustering
- Non-Hierarchical Clustering: The k-Means Algorithm
Association Rule Mining
- Apriori Algorithm
- Frequent Item-sets
- Support
- Confidence
- Lift Ratio
- Discovering Association Rules
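The support, confidence, and lift measures listed above can be computed directly from a set of transactions. The sketch below uses a small hypothetical basket dataset and hypothetical helper names; it evaluates a single rule and is not an implementation of the Apriori algorithm.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]


def support(itemset):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent):
    """P(consequent | antecedent), estimated from the transactions."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)


def lift(antecedent, consequent):
    """Confidence relative to the baseline frequency of the consequent."""
    return confidence(antecedent, consequent) / support(consequent)


rule = ({"bread"}, {"milk"})
print(f"support    = {support(rule[0] | rule[1]):.2f}")
print(f"confidence = {confidence(*rule):.2f}")
print(f"lift       = {lift(*rule):.4f}")
```

A lift below 1, as in this toy data, suggests the two items appear together slightly less often than they would if they were independent.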
Text Mining
- Sentiment Analysis
- User Behaviour Analysis
- Topic Categorization
- Topic Ranking
Recommender Engines:
- Collaborative Filtering Recommenders
- Content Based Recommenders
Implementing Data Science Techniques in the R Language
Introduction to R Foundation
- Software Installation on Various Operating Systems
- Introduction to Real Time Applications
- Introduction to Popular Packages
R-Analytical Tool (Data Mining / Machine Learning)
- Basic Data Types
- R Data Structures
- Vectors
- Matrix
- List
- Data Frames
- R Functions
- Predictive Modelling Project based on R
- Classification Modelling Project based on R
- Clustering Project based on R
- Association Mining Project based on R
- R Visualization Packages
- Machine Learning Packages in R
Python – Getting Started
- Installing Python on Windows
- Installing Python on Mac and Linux
- Introduction to Editors
- Installing PyCharm and Sublime Editors
Python Basics
- Numbers and Math in Python
- Variable and Inputs
- Built in Modules and Functions
- Save and Run Python Files
- Strings
- Python List
- Python slices and slicing
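As a quick illustration of the last two topics, list slicing uses the `start:stop:step` form, where `stop` is exclusive and any part may be omitted. The list contents here are hypothetical.

```python
letters = ["a", "b", "c", "d", "e", "f"]

first_three = letters[:3]     # from the start up to, not including, index 3
middle = letters[2:5]         # indices 2, 3 and 4
last_two = letters[-2:]       # negative indices count from the end
every_second = letters[::2]   # a step of 2 skips every other element
reversed_copy = letters[::-1] # a step of -1 walks the list backwards

print(first_three, middle, last_two, every_second, reversed_copy)
```

Slicing always returns a new list, so `letters` itself is left unchanged.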
Python Scientific Libraries for Machine Learning
- Scikit-Learn
- NumPy
- SciPy
- Pandas
- Matplotlib
Introduction to Data Visualization
- Introduction to Data Science and Visualization Tools in Python
- Installing and Setting up IPython Notebook
- Installing Anaconda and Pandas
- Setting Up Environment
Learning NumPy
- Creating Arrays
- Using Arrays and Scalars
- Indexing Arrays
- Array Transposition
- Universal Array Functions
- Array Processing
- Array Input and Output
Working with Pandas
- Series
- Data Frames
- Index Objects
- Reindex
- Drop Entry
- Selecting Entries
- Data Alignment
- Rank and Sort
- Summary Statistics
- Missing Data
- Index Hierarchy
Working with Data Part 1
- Reading and Writing Text Files
- JSON with Python
- HTML with Python
- Microsoft Excel Files with Python
Working with Data Part 2
- Merge, Merge on Index and Concatenate
- Combining Data Frames
- Reshaping and Pivoting
- Duplicating Data Frames
- Mapping, Replacing, Rename Index and Binning
- Outliers and Permutations
Working with Data Part 3
- Group by on Data Frames
- Group by on Dict and Series
- Aggregation
- Splitting, Applying and Combining
- Cross Tabulation
Working with Visualization
- Installing Seaborn
- Histograms
- Kernel Density Estimate Plots
- Combining Plot Styles
- Box and Violin Plots
- Regression Plots
- Heat Maps and Clustered Matrices
- Example Projects-15
Machine Learning Language
- Introduction
- Linear Regression
- Logistic Regression
- Multi Class Classification – Logistic Regression
- Multi Class Classification – Nearest Neighbor
- Support Vector Machines
- Naïve Bayes Theory
Prescriptive Analytics (Optimization Techniques)
- Introduction
- Analytics through designed experiments
- Analytics through Active learning
- Analytics through Reinforcement learning
Data Science based Projects
- Covers a couple of real-time analytics projects based on R scripts and Python scientific libraries.
SPARK MLlib (Scalable Machine Learning)
- RDD Concept
- Spark MLlib: Data Types, Algorithms, and Utilities