Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extract y position of stat_dots? #208

Closed
bjedwards opened this issue Jan 16, 2024 · 5 comments
Closed

Extract y position of stat_dots? #208

bjedwards opened this issue Jan 16, 2024 · 5 comments

Comments

@bjedwards
Copy link

bjedwards commented Jan 16, 2024

I am trying to use stat_dots along with the very good ggrepel::geom_text_repel to create a dot plot as well as label the points with a name. Here is a reprex

library(ggplot2)
library(ggdist)
library(ggrepel)

ggplot(mtcars, aes(x=hp/wt, y=factor(cyl))) +
  stat_dots(aes(fill=factor(cyl)), layout='swarm', side='both') +
  geom_text_repel(aes(label=rownames(mtcars)), max.overlaps = Inf, segment.alpha=0.5, min.segment.length = 0, size=8/.pt, seed = 314159)

Which produces this:

mtcars_repel_dots

where the anchor point for ggrepel::geom_text_repel is pointing to the baseline y for the factors rather than the true position of the dots.

I tried to dig around in layer_data and layer_grob to see if I could find the position to extract, as well as trying to pick apart and recalculate the bin_dots function call, but I couldn't quite make it work.

Looking around it seems similar or related to #26 in that "it'd be cool but really hard".

Cheers!

sessionInfo()
#> R version 4.3.2 (2023-10-31)
#> Platform: aarch64-apple-darwin20 (64-bit)
#> Running under: macOS Sonoma 14.2.1
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> time zone: America/Chicago
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] ggrepel_0.9.4 ggdist_3.3.0  ggplot2_3.4.4
#> 
#> loaded via a namespace (and not attached):
#>  [1] gtable_0.3.4         dplyr_1.1.3          compiler_4.3.2      
#>  [4] highr_0.10           reprex_2.1.0         tidyselect_1.2.0    
#>  [7] Rcpp_1.0.11          xml2_1.3.5           scales_1.2.1        
#> [10] yaml_2.3.7           fastmap_1.1.1        R6_2.5.1            
#> [13] labeling_0.4.3       generics_0.1.3       curl_5.1.0          
#> [16] distributional_0.3.2 knitr_1.45           tibble_3.2.1        
#> [19] munsell_0.5.0        pillar_1.9.0         rlang_1.1.1         
#> [22] utf8_1.2.4           xfun_0.41            fs_1.6.3            
#> [25] cli_3.6.1            withr_2.5.2          magrittr_2.0.3      
#> [28] digest_0.6.33        grid_4.3.2           rstudioapi_0.15.0   
#> [31] beeswarm_0.4.0       lifecycle_1.0.3      vctrs_0.6.4         
#> [34] evaluate_0.23        glue_1.6.2           farver_2.1.1        
#> [37] fansi_1.0.5          colorspace_2.1-0     rmarkdown_2.25      
#> [40] tools_4.3.2          pkgconfig_2.0.3      htmltools_0.5.7
@mjskay
Copy link
Owner

mjskay commented Jan 17, 2024

Yeah, dot positions can't be known until draw time because they are determined dynamically based on the plot dimensions. To do this you have to pre-determine the binwidth and use a fixed coordinate system for the chart axes.

Here's an example based on this comment: #182 (comment)

library(dplyr)
library(ggplot2)
library(ggdist)

# determine the binwidth
# you could also skip this step and manually specify a binwidth... maxheight 
# here is the max height of the chart assuming y units and x units are square,
# and is intended to get a chart with around a 3/2 aspect ratio 
hp_over_wt = with(mtcars, hp/wt)
binwidth = find_dotplot_binwidth(hp_over_wt, maxheight = 2/3*diff(range(hp_over_wt)), heightratio = 1)

bin_df = bin_dots(x = hp_over_wt, y = 0, binwidth = binwidth, heightratio = 1)

# bin the dots
cbind(mtcars, bin_df) |>
  ggplot(aes(x0 = x, y0 = y / binwidth, a = binwidth/2, b = 1/2, angle = 0)) +
  ggforce::geom_ellipse() +
  ggrepel::geom_text_repel(aes(label = rownames(mtcars), x = x, y = y/binwidth), min.segment.length = 0) +
  coord_fixed(ratio = binwidth)

image

To do multiple groups like in your plot you'd have to do multiple calls to bin_dots, one for each group, and manually offset their y positions.

@mjskay
Copy link
Owner

mjskay commented Jan 17, 2024

Forgot to point out you can supply the layout and side params to bin_dots(), so this is a little closer:

library(dplyr)
library(ggplot2)
library(ggdist)


# determine the binwidth
# you could also skip this step and manually specify a binwidth... maxheight 
# here is the max height of the chart assuming y units and x units are square,
# and is intended to get a chart with around a 3/2 aspect ratio 
binwidth = find_dotplot_binwidth(with(mtcars, hp/wt), maxheight = 2/3*diff(range(hp_over_wt)), heightratio = 1)

bin_df = bin_dots(x = with(mtcars, hp/wt), y = 0, binwidth = binwidth, heightratio = 1, layout = "swarm", side = "both")

# bin the dots
cbind(mtcars, bin_df) |>
  ggplot(aes(x0 = x, y0 = y / binwidth, a = binwidth/2, b = 1/2, angle = 0)) +
  ggforce::geom_ellipse() +
  ggrepel::geom_text_repel(aes(label = rownames(mtcars), x = x, y = y/binwidth), min.segment.length = 0) +
  coord_fixed(ratio = binwidth)

image

@mjskay
Copy link
Owner

mjskay commented Jan 17, 2024

And here's a different version that keeps things stacked within a height of 1 on the y axis if that's easier to use for stacking groups:

library(dplyr)
library(ggplot2)
library(ggdist)


# determine the binwidth
# you could also skip this step and manually specify a binwidth... maxheight 
# here is the max height of the chart assuming y units and x units are square,
# and is intended to get a chart with around a 3/2 aspect ratio 
hr = 3/2/diff(range(hp_over_wt))
binwidth = find_dotplot_binwidth(with(mtcars, hp/wt), maxheight = 1, heightratio = hr)

bin_df = bin_dots(x = with(mtcars, hp/wt), y = 0, binwidth = binwidth, heightratio = hr, side = "both", layout = "swarm")

# bin the dots
cbind(mtcars, bin_df) |>
  ggplot(aes(x0 = x, y0 = y, a = binwidth/2, b = binwidth*hr/2, angle = 0)) +
  ggforce::geom_ellipse() +
  ggrepel::geom_text_repel(aes(label = rownames(mtcars), x = x, y = y), min.segment.length = 0) +
  coord_fixed(ratio = 1/hr)

image

@bjedwards
Copy link
Author

Incredible @mjskay!

For posterity here is where I landed for the original use case.

# Bring in dplyr ggplot tidyr, cause I am lazy
library(tidyverse)
library(ggdist)
ilbrary(ggrepel)
library(ggforce)

# Set up the data frame for factors so we can fix the y axis later
to_calc_bw <- mtcars |>
  mutate(car_name =rownames(mtcars)) |>
  mutate(cyl = factor(cyl))


# Grab the number of factors so we can redo the height
n_factors <- length(levels(to_calc_bw$cyl))


# A nice 16x9 ratio for the final output
w <- 6.5
h <- 6.5/(16/9)

# Calculate the binwidth with the height calculation including how many factors we'll need
hr = (w*n_factors)/h/diff(range(mtcars$hp/mtcars$wt))
binwidth = find_dotplot_binwidth(with(mtcars, hp/wt), maxheight = 1, heightratio = hr)


# Map over `cyl` to get and add the factor level heights
toplot <- to_calc_bw |>
  group_by(cyl) |>
  group_modify(\(df, ...) {
    bin_df <- bin_dots(x = df$hp/df$wt, y = 0, binwidth = binwidth, heightratio = hr, side = "both", layout = "weave") |>
      mutate(y = as.numeric(first(df$cyl)) + y)
    bind_cols(df, bin_df) |>
      select(-cyl)
  }, .keep=TRUE)

# Fudge factor for my own aesthetic
scale_down <- 0.9

# PLOT!
gg <- ggplot(toplot, aes(x0 = x, y0 = y, a = scale_down*binwidth/2, b = scale_down*binwidth*hr/2, angle = 0)) +
  ggforce::geom_ellipse(aes(fill=cyl), color='white', linewidth=0.1) +
  ggrepel::geom_text_repel(aes(label = car_name, x = x, y = y),
                           min.segment.length = 0, size=8/.pt,
                           max.overlaps = Inf, segment.alpha=0.25, force=60) +
  scale_y_continuous(breaks=seq(n_factors), labels=levels(to_calc_bw$cyl)) +
  coord_fixed(ratio = 1/hr) +
  guides(fill='none') +
  labs(x='hp/wt', y = 'cyl') +
  theme_minimal()

# Output at same dimentions
ggsave('test.png', gg, width = w, height = h, units = 'in', bg='white')

test

That reference to #183 helps with another use case I've had as well. Thank you so much for this project it is crazy useful. Cheers, and close this if you'd like, or keep it open in case there is future work to be done

@mjskay
Copy link
Owner

mjskay commented Jan 17, 2024

Awesome, glad this worked!!

I'll close this since I don't think this kind of thing is likely to become easier anytime soon unless I decide it's worth adding a specialized geom for it. It's really down to a limitation of ggplot's drawing model that we can't know dot positions until it's too late.

@mjskay mjskay closed this as completed Jan 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants