Machine Learning with R

Artificial Intelligence (AI)

COURSE OVERVIEW


This five-day intensive program is designed for data scientists and analysts who prefer the statistical precision and rich visualization capabilities of R. Moving beyond basic scripts, this course immerses you in the tidymodels ecosystem—the modern standard for modeling in R. You will learn to build rigorous, reproducible pipelines that handle everything from data "recipes" to automated hyperparameter tuning. By the end of the week, you will be able to deploy publication-quality predictive models and explain their inner workings with statistical confidence.


COURSE OBJECTIVES


By the end of this course, participants will be able to:

  • Master Tidymodels: Navigate the core packages (rsample, parsnip, recipes, workflows) for a seamless ML lifecycle.
  • Engineer Robust Features: Use "recipes" to automate scaling, encoding, and dimensionality reduction.
  • Execute Complex Modeling: Deploy Random Forests, XGBoost, and Support Vector Machines with a unified interface.
  • Evaluate with Precision: Use yardstick to calculate AUC-ROC, F1-score, and RMSE across cross-validated folds.
  • Tune for Performance: Implement grid search and Bayesian optimization to find the best model parameters. 
  • Explain Model Logic: Utilize DALEX or lime for model interpretability and feature importance.


Duration: 5 Days / 40 Hours

Delivery Method: Classroom-based or Virtual Instructor-Led Training

COURSE OUTLINE


Day 1: The Modern R Modeling Framework

Focus: Introduction to Tidymodels and data preparation.

  • R for ML in 2026: Why R remains the gold standard for statistical learning and research.
  • Data Splitting with rsample: Creating training/testing sets and handling class imbalance with stratification. 
  • Pre-processing with recipes: Building a blueprint for data cleaning (imputation, normalization, and dummy variables).
  • Hands-on: Preparing a "dirty" marketing dataset for a predictive classification task.
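A minimal sketch of the kind of splitting and pre-processing pipeline covered on Day 1; the marketing data frame and its purchased outcome column are illustrative placeholders:

    library(tidymodels)

    set.seed(123)
    split <- initial_split(marketing, prop = 0.8, strata = purchased)  # stratified split
    train <- training(split)
    test  <- testing(split)

    rec <- recipe(purchased ~ ., data = train) |>
      step_impute_median(all_numeric_predictors()) |>   # fill in missing numeric values
      step_normalize(all_numeric_predictors()) |>       # centre and scale
      step_dummy(all_nominal_predictors())              # encode categorical predictors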


Day 2: Linear & Logistic Modeling

Focus: Statistical baselines and regression. 

  • Linear Regression: Modeling continuous outcomes and checking for heteroscedasticity. 
  • Logistic Regression: Binary classification and interpreting odds ratios.
  • Penalized Models: Using Lasso and Ridge (glmnet) to handle high-dimensional data.
  • Model Workflows: Bundling pre-processing and modeling into a single, clean workflow object.
  • Hands-on: Predicting insurance costs using multiple regression and interaction terms.
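A minimal sketch of bundling pre-processing and a penalized (lasso) logistic model into a single workflow object; rec and train are assumed from the Day 1 sketch, and the penalty value is illustrative:

    library(tidymodels)

    lasso_spec <- logistic_reg(penalty = 0.01, mixture = 1) |>  # mixture = 1 gives the lasso
      set_engine("glmnet")

    wf <- workflow() |>
      add_recipe(rec) |>
      add_model(lasso_spec)

    fitted_wf <- fit(wf, data = train)  # recipe and model fitted in one call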


Day 3: Tree-Based Models & Ensembles

Focus: Moving to non-linear, high-performance algorithms.

  • Decision Trees: Visualizing logic splits and managing tree depth.
  • Random Forests: Using ranger to build robust ensembles that reduce variance.
  • Boosted Trees (XGBoost): Sequential learning for top-tier predictive accuracy.
  • Hands-on: Building a "Credit Default" predictor that identifies high-risk applications.
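A minimal sketch of specifying a random forest (ranger) and a boosted-tree (xgboost) model through parsnip's unified interface; the credit data frame and its default outcome are illustrative placeholders:

    library(tidymodels)

    rf_spec <- rand_forest(trees = 500) |>
      set_engine("ranger", importance = "impurity") |>
      set_mode("classification")

    xgb_spec <- boost_tree(trees = 500, learn_rate = 0.05) |>
      set_engine("xgboost") |>
      set_mode("classification")

    rf_fit <- workflow() |>
      add_formula(default ~ .) |>
      add_model(rf_spec) |>
      fit(data = credit)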


Day 4: Resampling & Hyperparameter Tuning

Focus: Finding the "Best" version of your model. 

  • Cross-Validation: Using k-fold CV and the bootstrap to obtain stable estimates of model performance.
  • Grid Search with tune: Defining parameter grids (e.g., number of trees, learning rate).
  • Racing Methods: Using finetune to efficiently discard underperforming parameter combinations early.
  • Hands-on: Tuning a Gradient Boosting model to maximize its AUC score.
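A minimal sketch of k-fold cross-validation and grid tuning with tune; train, rec, and purchased are assumed from the earlier sketches, and tune() marks the parameters to be searched:

    library(tidymodels)

    xgb_spec <- boost_tree(trees = tune(), learn_rate = tune()) |>
      set_engine("xgboost") |>
      set_mode("classification")

    folds <- vfold_cv(train, v = 5, strata = purchased)   # 5-fold stratified CV

    tuned <- tune_grid(
      workflow() |> add_recipe(rec) |> add_model(xgb_spec),
      resamples = folds,
      grid      = 20,                      # 20 candidate parameter combinations
      metrics   = metric_set(roc_auc)
    )

    show_best(tuned, metric = "roc_auc")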


Day 5: Model Evaluation, Interpretation & Deployment

Focus: Communicating results and moving to production.

  • Performance Evaluation: Deep dive into yardstick for multi-class classification and regression metrics.
  • Model-Agnostic Explanations: Using variable importance plots (vip) and DALEX to explain which features drive the model's predictions.
  • Deployment with plumber: Turning your R model into a REST API for integration with other apps.
  • Final Project: Building an end-to-end "Customer Churn" analysis from raw data to an interpreted API endpoint.
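A minimal sketch of scoring test-set predictions with yardstick and exposing the fitted workflow through plumber; fitted_wf, test, and purchased are assumed from the earlier sketches, and the file name and "yes" event level are illustrative:

    library(tidymodels)

    preds <- augment(fitted_wf, new_data = test)           # adds .pred_class and .pred_* columns
    preds |> roc_auc(truth = purchased, .pred_yes)         # assumes "yes" is the event level
    preds |> accuracy(truth = purchased, estimate = .pred_class)

    # --- plumber.R (serve the saved model as a REST API) ---
    # Run with: plumber::pr("plumber.R") |> plumber::pr_run(port = 8000)
    fitted_wf <- readRDS("churn_workflow.rds")             # model saved earlier with saveRDS()

    #* Return churn probabilities for posted customer records
    #* @post /predict
    function(req) {
      new_obs <- as.data.frame(jsonlite::fromJSON(req$postBody))
      predict(fitted_wf, new_data = new_obs, type = "prob")
    }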


REGISTER NOW
