KhrisTheLearner/R_Dummy_Dataset

Project Title: Customized Dummy Dataset with R

This project demonstrates how to create a dummy dataset in R. The goal is to generate a dataset that imitates the structure of real-world data but contains no sensitive information. This is especially useful when you need to share your R code and the steps you took, but cannot reveal the actual data because of privacy concerns.

Objectives

1. Setting the work environment:

The first step involves defining the working directory, followed by saving the workspace and history. This is a crucial step that ensures all your work is correctly saved and can be accessed later.
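A minimal sketch of this step, not the repository's exact code. The folder under `tempdir()` is a hypothetical stand-in for your own project directory, and the file names follow the ones listed in the Script section.

```r
# Hypothetical project folder; a temporary directory is used so the
# sketch runs anywhere. Replace it with your own project path.
project_dir <- file.path(tempdir(), "R_Dummy_Dataset")
dir.create(project_dir, showWarnings = FALSE, recursive = TRUE)

# setwd() returns the previous working directory, which we keep for later.
old_dir <- setwd(project_dir)

# Save every object in the global workspace to an .RData file.
save.image(file = "R_Dummy_Dataset.RData")

# A command history only exists in interactive sessions, so guard the call.
if (interactive()) {
  try(savehistory(file = "R_Dummy_Dataset.RHistory"), silent = TRUE)
}

workspace_saved <- file.exists("R_Dummy_Dataset.RData")

# Restore the previous working directory (only needed for this sketch).
setwd(old_dir)
```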

2. Creating an empty data frame:

This step involves creating placeholders for the data that will be populated later. This helps in structuring your dataset right from the start.
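One possible way to build the placeholders, assuming four illustrative columns (`id`, `name`, `sex`, `nationality`) and a hypothetical row count. The real column names should mirror your original dataset.

```r
# Hypothetical number of rows; use the row count of your original data.
n_rows <- 10

# Each column is pre-filled with typed NA values, so the structure
# (names and types) is fixed before any data is added.
dummy <- data.frame(
  id          = rep(NA_integer_, n_rows),
  name        = rep(NA_character_, n_rows),
  sex         = rep(NA_character_, n_rows),
  nationality = rep(NA_character_, n_rows),
  stringsAsFactors = FALSE
)

str(dummy)
```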

3. Populating columns:

Here, we fill in sequential numbers and names into the data frame. This is the initial step towards populating our dummy dataset.
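A short sketch of filling the two columns. The `Person_` prefix is an assumption chosen for illustration; any naming scheme that does not reveal real identities would work.

```r
# Hypothetical number of rows.
n_rows <- 10

dummy <- data.frame(
  id   = rep(NA_integer_, n_rows),
  name = rep(NA_character_, n_rows),
  stringsAsFactors = FALSE
)

# Sequential numbers 1..n_rows.
dummy$id <- seq_len(n_rows)

# Placeholder names built from the id, zero-padded to three digits.
dummy$name <- paste0("Person_", sprintf("%03d", dummy$id))

head(dummy, 3)
```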

4. Assigning sex to names:

Here we assign a sex to each entry at random, but in a controlled way that reflects the proportion found in the original data. This step ensures that the sex distribution in the dummy dataset matches the original one.
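One way to make the assignment "random but controlled" is to fix the counts first and then shuffle them, as sketched below. The 60% share and the labels `"F"` and `"M"` are assumptions for illustration, not values from the original data.

```r
set.seed(42)  # makes the shuffle reproducible

n_rows <- 10
dummy <- data.frame(id = seq_len(n_rows))

# Hypothetical share of female entries in the original data.
prop_female <- 0.6
n_female <- round(n_rows * prop_female)

# Build a vector with the exact counts, then shuffle its order.
sex_values <- c(rep("F", n_female), rep("M", n_rows - n_female))
dummy$sex <- sample(sex_values)

table(dummy$sex)
```

Fixing the counts guarantees the target proportion exactly (up to rounding), whereas drawing each row independently would only match it on average.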

5. Assigning nationalities:

In this step, we allocate nationalities based on the distribution in the original dataset. This helps in maintaining the integrity of the original dataset's demographic distribution.
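A sketch using weighted sampling, assuming the original distribution is available as a named vector of shares. The country labels and shares below are placeholders, not figures from the original dataset.

```r
set.seed(42)  # reproducible draws

n_rows <- 100
dummy <- data.frame(id = seq_len(n_rows))

# Hypothetical shares per nationality; they should sum to 1.
nationality_share <- c(Country_A = 0.5, Country_B = 0.3, Country_C = 0.2)

# Draw one nationality per row, weighted by the shares above.
dummy$nationality <- sample(
  names(nationality_share),
  size    = n_rows,
  replace = TRUE,
  prob    = nationality_share
)

# Compare the observed shares with the target shares.
observed_share <- prop.table(table(dummy$nationality))
observed_share
```

With weighted sampling the observed shares only approximate the targets, and the match improves as the number of rows grows.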

6. Reviewing the dataset:

After all the assignments and allocations, it's time to check the data frame to ensure everything is in order. This step is crucial in catching any errors or inconsistencies.
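A few base R checks that could be used for this review. The small data frame is made up so that the sketch runs on its own.

```r
# Small made-up data frame standing in for the dummy dataset.
dummy <- data.frame(
  id          = 1:6,
  name        = paste0("Person_", 1:6),
  sex         = c("F", "M", "F", "F", "M", "F"),
  nationality = c("Country_A", "Country_B", "Country_A",
                  "Country_C", "Country_A", "Country_B"),
  stringsAsFactors = FALSE
)

str(dummy)       # column names and types
head(dummy)      # first rows
summary(dummy)   # per-column summary

# Frequency tables, including missing values if there are any.
table(dummy$sex, useNA = "ifany")
table(dummy$nationality, useNA = "ifany")

# Number of missing values per column.
missing_per_column <- colSums(is.na(dummy))
missing_per_column
```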

7. Removing mistakenly added columns:

Despite our best efforts, mistakes can happen. This step involves checking for and removing any erroneously added columns that might have slipped through.
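Two common ways to drop unwanted columns, shown on a made-up data frame. The column names `sex_typo` and `tmp` are hypothetical examples of mistakes.

```r
dummy <- data.frame(
  id       = 1:3,
  name     = paste0("Person_", 1:3),
  sex      = c("F", "M", "F"),
  sex_typo = c("F", "M", "F"),  # added by mistake
  tmp      = c(0, 0, 0),        # added by mistake
  stringsAsFactors = FALSE
)

# Option 1: remove a single column by assigning NULL.
dummy$sex_typo <- NULL

# Option 2: remove several columns by name in one go.
unwanted <- c("tmp", "sex_typo")
dummy <- dummy[, !(names(dummy) %in% unwanted), drop = FALSE]

names(dummy)
```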

8. Reordering columns:

To maintain coherence with the original dataset, we reorder the columns to match it. This ensures that anyone referring to the original dataset can easily navigate through our dummy dataset.
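Reordering can be done by indexing with a vector of column names, as in this sketch. The order in `original_order` is an assumption standing in for the column order of your original dataset.

```r
# Hypothetical column order of the original dataset.
original_order <- c("id", "name", "sex", "nationality")

# Made-up data frame whose columns are out of order.
dummy <- data.frame(
  sex         = c("F", "M"),
  id          = 1:2,
  nationality = c("Country_A", "Country_B"),
  name        = c("Person_1", "Person_2"),
  stringsAsFactors = FALSE
)

# Guard against missing or extra columns before reordering.
stopifnot(setequal(names(dummy), original_order))

dummy <- dummy[, original_order]
names(dummy)
```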

9. Exporting the dataset:

The final step involves exporting the dataset as a CSV or Excel file for further use. This makes the dataset easily accessible and usable.
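A sketch of both export routes. The CSV export uses base R only. The Excel export assumes the `writexl` package is installed, which is one of several options and may not be what the repository script uses. The output file names are hypothetical.

```r
dummy <- data.frame(
  id   = 1:3,
  name = paste0("Person_", 1:3),
  sex  = c("F", "M", "F"),
  stringsAsFactors = FALSE
)

# Hypothetical output location; a temporary directory keeps the sketch tidy.
csv_path <- file.path(tempdir(), "dummy_dataset.csv")

# CSV export without the row-name column.
write.csv(dummy, csv_path, row.names = FALSE)

# Excel export, only if the optional writexl package is available.
if (requireNamespace("writexl", quietly = TRUE)) {
  writexl::write_xlsx(dummy, file.path(tempdir(), "dummy_dataset.xlsx"))
}

# Read the CSV back to confirm the export worked.
round_trip <- read.csv(csv_path, stringsAsFactors = FALSE)
```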

Script

The complete R script for this project is included in the repository. This script provides a detailed walkthrough of each step of the process, from setting up your work environment to exporting the completed dataset.

  • Access R_Dummy_Dataset.R for the script
  • Access R_Dummy_Dataset.RData to check the workspace
  • Access R_Dummy_Dataset.RHistory to track the history of how this project was built

Outcome

The outcome of this project is a dummy dataset that keeps the structure and distribution of your original data while preserving privacy. This lets you share or demonstrate your R work without exposing sensitive information. Following the steps in this project also gives you practice in data handling with R.

How to Exit

Instructions are provided on how to clear the workspace at the end of your session. Following these instructions will ensure that your workspace is clean and ready for your next project.
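A common way to clear the workspace is sketched below. It is an assumption about what such instructions usually look like, not a copy of the repository script. Note that `rm(list = ls())` deletes every object in the current environment, so save anything you still need first.

```r
# Two throwaway objects so there is something to remove.
dummy <- data.frame(id = 1:3)
n_rows <- 3

# Remove all objects from the current environment.
rm(list = ls())

# ls() now returns an empty character vector.
remaining <- ls()
remaining

# To end the session without saving the workspace, you could then run:
# q(save = "no")
```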
