Data Science with R and SQL Server (R) – Details

Detailed course content

Introduction
Introducing data science and R
  • What are statistics, data mining, machine learning…
  • Data science projects and their lifetime
  • Introducing R
  • R tools
  • R data structures
  • Lab 1
Data overview
  • Datasets, cases and variables
  • Types of variables
  • Introductory statistics for discrete variables
  • Descriptive statistics for continuous variables
  • Basic graphs
  • Sampling, confidence level, confidence interval
  • Lab 2
Data preparation
  • Derived variables
  • Missing values and outliers
  • Smoothing and normalization
  • Time series
  • Training and test sets
  • Lab 3
Associations between two variables and visualizations of associations
  • Covariance and correlation
  • Contingency tables and chi-squared test
  • T-test and analysis of variance
  • Bayesian inference
  • Linear models
  • Lab 4
Feature selection and matrix operations
  • Feature selection in linear models
  • Basic matrix algebra
  • Principal component analysis
  • Exploratory factor analysis
  • Lab 5
Unsupervised learning
  • Hierarchical clustering
  • K-means clustering
  • Association rules
  • Lab 6
Supervised learning
  • Neural networks
  • Logistic regression
  • Decision and regression trees
  • Random forests
  • Gradient-boosted trees
  • K-nearest neighbors
  • Lab 7
Modern topics
  • Support vector machines
  • Time series
  • Text mining
  • Deep learning
  • Reinforcement learning
  • Lab 8
R in SQL Server and MS BI
  • ML Services (In-Database) structure
  • Executing external scripts in SQL Server
  • Storing a model and performing native predictions
  • R in Azure ML and Power BI
  • Lab 9