- What is Data Visualization
- Creating a Basic Graph
- Using ggplot2 Package
- Understand what is Data Visualization.
- Know basic command to create graph with ggplot2 package.
- Open progress form Class 06 and checked on Study Check-In
- Open R Studio Desktop and setting working directory
- Create Script file name
06_63130500xxx.R
and save in your working directory - Install new packages:
ggplot2
. - Loading package:
dplyr
,readr
- Checked on 0 prerequisite in progress form
Data Visualization (Data Viz) is an interdisciplinary field that deals with the graphic representation of data. It has its roots in the field of Statistics and is therefore generally considered a branch of Descriptive Statistics. However, because both design skills and statistical and computing skills are required to visualize effectively, it is argued by some authors that it is both an Art and a Science. You can learn Fundamentals of Data Visualization in https://clauswilke.com/dataviz/
Before to creating a graph in R you need to know that in this class, I want you to learn basic for creating a graphs only. Understand the data more. Don't focus or serious about design in R. After this part,I will let introduce you to using BI Tools with more easily to create and visualize the data.
In R, graphs are typically created interactively. You can create several graphs. I have summary basic command to create graph with using data starwars
from dplyr package:
Graph | Command |
---|---|
Histogram | hist(starwars$height) |
Density Plot | plot(density(starwars$height,na.rm = T)) |
Scatter Plot | plot(x=starwars$height, y=starwars$mass) |
Bar Chart | barplot(table(starwars$sex)) |
Pie Chart | pie(table(starwars$sex)) |
Box Plot | boxplot(starwars$mass,data=starwars) or boxplot(starwars$height,data=starwars) |
This dataset is about's film and television in Marvel Cinematic Universe. It includes name of movie and years of film that released. Let's using this dataset to explore and create a graph to shown "Number of Released Marvel Movies".
# Load data
Marvels <- read_csv("https://raw.githubusercontent.com/safesit23/INT214-Statistics/main/datasets/Marvels.csv")
Step 1: Creating a basic Bar Graph
# Set data
count_marvel <-table(Marvels$years)
# Create graph
barplot(count_marvel)
Step 2: Adding component of graph
barplot(count_marvel,
main = "Number of Released Marvel Movies", # Title of graphs
xlab = "Year Released", # X-Axis Title
ylab = "Number of Movies", # Y-Axis Title
)
ggplot2 is a system for declaratively creating graphics, based on The Grammar of Graphics. You provide the data, tell ggplot2 how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details. (See more details: ggplot2 Package, ggplot2: elegant graphics for data analysis)
- Data: The dataset being plotted.
- Aesthetics: The scales onto which we map our data.
- Geometries: The visual elements ued for our data.
All ggplot2 plots begin with a call to ggplot(), supplying default data and aesthethic mappings, specified by aes(). You can add layers, scales, coords and facets with symbol +
after ggplot().See more details: Layers
ggplot(dataset,aes(x=colName,y=colName)) + geom_xxxx()
Step 1: Creating a basic Bar Graph
ggplot(Marvels,aes(x=years)) + geom_bar()
Step 2: Adding component of graph
# Save to object
marvel_plot <- ggplot(Marvels,aes(x=years)) + geom_bar()
# Adding component
marvel_plot + ggtitle("Number of Released Marvel Movies") +
xlab("Year Released") + ylab("Number of Movies")
Step 1: Creating a scatter plot
starwars %>% ggplot(aes(x=height,y=mass))+geom_point()
Step 2: Using filter data to expand more details
starwars %>% filter(mass<500) %>% ggplot(aes(x=height,y=mass)) + geom_point()
Step 3: Label gender with color
scat_plot <- starwars %>% filter(mass<500) %>% ggplot(aes(x=height,y=mass))+
geom_point(aes(color=gender))
scat_plot
Step 4: Adding a line of best fit to a plot
scat_plot+geom_smooth() #default value - loess
scat_plot+geom_smooth(method="lm") #linear model
Try another graphs with yourself more: https://www.r-graph-gallery.com/
- Try to Create histogram graph to see the height **Try to use
binwidth = 10
in geometries. (Read about's binwidth more)
// Try with yourself
It has many free online course about Data Visualization. You can study in these course to more understand by learning with yourself.
- Datacamp: Data Visualization in R
- Datacamp: Introduction to Data Visualization with ggplot2
- Datacamp: Intermediate Data Visualization with ggplot2
- codecademy: Fundamentals of Data Visualization with ggplot2
- (Easy) Using datasets that used in class (e.g. survey, cats, mtcars, products) and visualize the data.
- (Advanced) Download full dataset from Stack Overflow Annual Developer Survey and try to transform data and visualize data same as result