R and the tidyverse are very popular for data mining. This repository contains slides and documented R examples to accompany several chapters of the popular data mining textbook:
Pang-Ning Tan, Michael Steinbach, Anuj Karpatne and Vipin Kumar, Introduction to Data Mining, Addison Wesley, 1st or 2nd edition.
The slides and examples are used in my course CS 5/7331 Data Mining taught at SMU and will be regularly updated and improved. The code examples are now compiled into the free online book An R Companion for Introduction to Data Mining, which is published under the Creative Commons Attribution-ShareAlike license, so you can share and adapt them freely. Please open an issue for corrections or to suggest improvements.
Companion Chapter | Lecture Slides | Free Textbook Chapter |
---|---|---|
1. Introduction | PDF, PowerPoint | - |
2. Data | PDF, PowerPoint | - |
2.5. Exploring Data | PDF, PowerPoint | Web Chapter Exploring Data |
3. Classification: Basic Concepts | PDF, PowerPoint | 3. Classification |
4. Classification: Alternative Techniques | PDF, PowerPoint | - |
5. Association Analysis: Basic Concepts | PDF, PowerPoint | 5. Association Analysis |
6. Association Analysis: Advanced Concepts | - | - |
7. Cluster Analysis: Basic Concepts | PDF, PowerPoint | 7. Cluster Analysis |
8. Regression | - | - |
9. Logistic Regression | - | - |
- Ask the R Wizard (GPT) to explain R code and help with writing code.
You need to install:
Each book chapter will use a set of packages that must be installed. The installation is done directly in R and the installation code can be found at the beginning of each chapter.
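As a rough illustration of the installation step described above, the following sketch installs only those packages that are not yet present. The helper name `missing_packages` and the package list are assumptions made for this example (`tidyverse` is used because it is mentioned in this README); the actual list for each chapter is given at the beginning of that chapter.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# Example package list (an assumption); each chapter states its own packages.
pkgs <- c("tidyverse")

# Install only what is missing, so repeated runs do not reinstall anything.
to_install <- missing_packages(pkgs)
if (length(to_install) > 0) {
  install.packages(to_install)
}
```

Checking first keeps the code safe to rerun at the top of a script. In a non-interactive session, `install.packages()` may additionally need a CRAN mirror to be configured.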
The textbook Introduction to Data Mining has been one of the most popular choices for learning and teaching data mining concepts. Some of the most important chapters have been made available for free by the authors on the book's website. One of the authors also provides Python Jupyter notebooks with examples, but complete R code examples were still needed. Given the R community's interest in data analysis, data science, and machine learning, and the broad support of R packages for data mining, there was a noticeable gap, which this learning resource fills. This resource targets advanced undergraduate and graduate students and can be used as a component for a first introduction to data mining.
- PowerPoint presentation files for a data mining course can be found in the repository directory slides. The slides have an R symbol at the bottom whenever there are R code examples available.
- Datasets for projects can be found at https://www.kaggle.com/datasets
Michael Hahsler (2024). An R Companion for Introduction to Data Mining. figshare. DOI: 10.6084/m9.figshare.26750404, URL: https://mhahsler.github.io/Introduction_to_Data_Mining_R_Examples/book/
All code and documents in this repository are licensed under the Creative Commons Attribution-ShareAlike 4.0 International license.
For questions, please contact Michael Hahsler.