Converting pie charts into stacked bar charts #83

Closed
swastis10 opened this issue Apr 20, 2023 · 15 comments
Comments

@swastis10

As we plan to convert pie charts into bar charts, I have a few suggestions:

Existing pie chart example:

image

1. Bar Chart:

A standard bar chart compares numeric values between levels of a categorical variable which in our case is "Confirmed trip". One bar is plotted for each level of the categorical variable, with each bar’s length indicating its numeric value. As we are looking at numeric values across one categorical variable("Confirmed mode"), I feel using a bar chart is a better option than using a stacked bar chart. We can have something like -
image
I will also add proportions on top of each bar or change the y-axis to percentages.

2. Stacked Bar Chart:

In a stacked bar chart, we will have only one bar (representing confirmed mode) divided into a number of sub-bars stacked end to end, each one corresponding to a level of the second categorical variable (the different confirmed modes, e.g. bus, walk, e-bike).

It will also not be straightforward to compare the values of the divisions within the one bar we have (confirmed trip), except for the one division plotted against the baseline.

It will look something like this -
image
or
image

@shankari any thoughts?

@shankari
Contributor

shankari commented Apr 20, 2023

which in our case is "Confirmed trip".

It is actually "Confirmed mode" or "Confirmed purpose" as the case may be.

As we are looking at numeric values across one categorical variable("Confirmed mode")

As I said in #82 (comment), I believe that:

  • The general rule of thumb for replacing pie charts is stacked bar graphs, not bar graphs directly
  • what is important for the pie charts and their replacements is the proportions, not the actual numbers
  • using bar graphs directly focuses more on the numbers, not on the proportions

I know what stacked bar charts look like. There is an implementation of a visualization using stacked bar charts in #78 and the related paper.

As we are looking at numeric values across one categorical variable("Confirmed mode"),

In our case, however, the numeric values are not as important as the proportions, i.e. the parts of a whole.

I look forward to your recommended design, along with backing references, on how best to proceed.

@shankari
Contributor

e.g. search for "how should I replace pie charts?", read through all the results and make a recommendation that matches our specific use case and is well regarded in the literature, along with the justification.

If you find strong evidence that regular bar charts are the best way to show proportions of a whole, I am willing to see it, but I don't think that is the case.

@shankari
Contributor

While planning out the replacement, you should also consider that we want to eventually include error bars for these values - e.g. instead of having 20% e-bike trips and 30% car trips, we will have 15-25% e-bike trips and 25-35% car trips.
That is the "count every trip" project that Michael is working on.

That is one of the reasons that we are switching away from pie charts - I am not sure how to put error bars on pie charts.
Or if there is a way to have error bars (error wedges) in pie charts, we can stick with them.
Maybe https://www.physicsforums.com/threads/pie-chart-error-bars.909288/

@shankari
Contributor

Visualization is about telling a story. You might want to read some of the material we have published on this to see the story that we want to tell, and then figure out how to represent that story in the graphs. Note again that the story is typically not "participants took 2000 e-bike trips" but rather that the program achieved a 30% e-bike mode share.

e.g. https://www.osti.gov/biblio/1778194
https://www.osti.gov/biblio/1841348

I am also sending you the in-progress version via email.
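To make the counts-versus-shares distinction concrete, here is a minimal sketch in Python. The trip counts are invented for illustration and are not taken from any OpenPATH deployment; `mode_shares` is a hypothetical helper name, not a function from the dashboard code.

```python
def mode_shares(trip_counts):
    """Convert raw trip counts per mode into percentage shares of the whole."""
    total = sum(trip_counts.values())
    if total == 0:
        raise ValueError("no trips to summarize")
    return {mode: 100.0 * count / total for mode, count in trip_counts.items()}


# Hypothetical counts for two deployments of very different size.
small_program = {"e-bike": 300, "car": 500, "walk": 200}
large_program = {"e-bike": 2000, "car": 7000, "walk": 1000}

for name, counts in [("small", small_program), ("large", large_program)]:
    shares = mode_shares(counts)
    print(name, {mode: f"{share:.0f}%" for mode, share in shares.items()})
```

In this made-up data the larger deployment logs far more e-bike trips (2000 versus 300) yet has the smaller e-bike share (20% versus 30%), which is the kind of story that raw counts hide and proportions show.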

@swastis10
Author

Pie charts vs Bar Charts

One of the major problems with pie charts is that it is almost impossible to directly compare the sizes of two slices unless the differences are very large, whereas with bar charts our eyes compare the end points. Because the bars are aligned at a common baseline, it’s very easy to assess relative size. This makes it easy to see not only which segment is the largest but also how much larger it is than the other segments.

Many articles do suggest using a bar chart as a replacement for pie charts, but because our use case shows proportions of a whole, stacked bar charts are a better choice. For example - We want to see 'The number of trips for each replaced transport mode for e-bike only'. In this case, we want to know which transport mode is most replaced by e-bike, which the biggest stack in a stacked bar chart will tell us at a glance.

After going through the materials provided above I have come up with a few alternatives -

Percentage, or stacked bar charts:

It will be a single bar with each category having a coloured section within the bar accounting for a proportion of the total. The labels can sit inside the chart itself, providing an overall cleaner visual. There is also room for more categories than the 3 or 4 recommended for pie charts. Something like this -
image
image

If our eventual goal is to add error bars to the stacked bar chart, then we can do something like this -
image
Implementation link : https://towardsdatascience.com/the-quick-and-easy-way-to-plot-error-bars-in-python-using-pandas-a4d5cca2695d

Implementation :

I have also experimented with the implementation using the Python libraries NumPy, pandas and matplotlib, as mentioned in the article:
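The following is not the code from this comment. It is a minimal, standard-library-only sketch of the layout arithmetic behind a single stacked bar with error bars. The e-bike and car shares and ranges reuse the numbers quoted earlier in the thread (20% with 15-25%, 30% with 25-35%); the "other" segment and its range are invented so the bar sums to 100%, and `stacked_segments_with_errors` is a hypothetical helper name.

```python
def stacked_segments_with_errors(shares, ranges):
    """Lay out one 100% stacked bar and its per-segment error bars.

    shares: {label: percent}, expected to sum to 100.
    ranges: {label: (low, high)} bounds on each share, in percent.
    Returns one dict per segment with its left edge and width, plus the
    (lower, upper) error offsets measured from the segment's own share.
    """
    segments = []
    left = 0.0
    for label, share in shares.items():
        low, high = ranges[label]
        if not low <= share <= high:
            raise ValueError(f"range for {label!r} does not contain its share")
        segments.append({
            "label": label,
            "left": left,
            "width": share,
            "err_lower": share - low,
            "err_upper": high - share,
        })
        left += share  # the next segment starts where this one ends
    return segments


# "other" and its range are assumptions added to complete the bar.
shares = {"e-bike": 20.0, "car": 30.0, "other": 50.0}
ranges = {"e-bike": (15.0, 25.0), "car": (25.0, 35.0), "other": (45.0, 55.0)}

for seg in stacked_segments_with_errors(shares, ranges):
    print(seg)
```

If matplotlib is used for the drawing, the `left` and `width` values map onto the `left` and `width` arguments of `Axes.barh`, and the two error offsets onto an asymmetric `xerr`.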

@shankari
Contributor

@swastis10

All that sounds good wrt stacked bar charts.

But:

@swastis10
Author

I would like to take an example and explain the alternatives to pie charts and their pros and cons:

image

Here is a pie chart displaying the distribution of world population in the year 2021. Now we have 3 drawbacks while displaying this information in a pie chart -

  1. We cannot compare the sizes of 2 areas in a pie chart; say, we cannot tell by what magnitude China's population is greater than India's.
  2. As this example shows, pie charts are not well suited for displaying a large number of categories. Not all labels fit in the legend ('223 others'), so we either have to increase the size of the legend or hover over the chart with the mouse.
  3. It gives a static overview of the distribution. For example, we can see the world population distribution for 2021, but if we want to see trends over time then we will have to create 2 pie charts for 2 different years, say 2020 and 2021. Now, if we want to compare the population of India in 2020 and 2021, we cannot readily do it by looking at 2 different pie charts.

For the human eye it is easier to compare two rectangular surfaces. Therefore, bar charts and stacked bar charts are 2 options.

Horizontal bar charts:

image

These charts essentially solve issues 1 and 2. Each feature can be displayed in a horizontal bar. The reader can readily understand the population of one country and compare it to another country. If a large number of countries is to be displayed, the number of horizontal bars will simply be more without affecting the reader's ability to read the data.

Tree Maps:

They are good for representing hierarchical data. Our use case does not have dependent data, e.g. work, home, shopping, etc. Also they are not user friendly.

Donut Chart:

It has drawbacks very similar to those of a pie chart.

Stacked bar chart

image

These charts solve issues 1 and 3. Because of the rectangular shapes, it is easier to compare surfaces between two countries or between two time frames. In our use case we do not have a lot of categories to display, so issue 2 is not really a concern. But there are scenarios where we want to compare 2 pie charts like this, and stacked bar charts are a better choice there.
image

As our use case shows proportions of a whole, stacked bar charts are a better choice. We can easily get an understanding of how each category contributes to the overall picture. For example - We want to see 'The number of trips for each replaced transport mode for e-bike only'. In this case, we want to know which transport mode is most replaced by e-bike, which the biggest stack in a stacked bar chart will tell us at a glance.

The stacked bar chart will just be a single bar showing each trip purpose stacked onto one another.
image

  • Another example of a stacked bar chart with error bars would be -

image

I will use the implementation in #78

@shankari
Contributor

shankari commented May 1, 2023

@swastis10 this still does not address all my questions and raises new ones

New questions:

  1. wrt treemaps "Also they are not user friendly." why? Where is this a quote from?
  2. from the three points that you listed, do we have a lot of (3, comparisons over time)? In which graphs?

existing questions:

  1. I still see only four alternatives discussed (bar, stacked bar, tree and donut), where donut is essentially the same as pie. Are those the main alternatives? please cite your source.
  2. you had an initial concern (#83 (comment)) that a stacked bar chart will have only one bar. Is that still a concern? How does your design address that?

Concretely, I think you can come up with a design that addresses both new question #2 and existing question number #2 if you think hard enough. LMK if you can't come up with this design and I can suggest it.

@swastis10
Author

new questions

  1. wrt treemaps "Also they are not user friendly." why? Where is this a quote from?

As stated in https://www.nngroup.com/articles/treemaps/, bar charts should be preferred over Treemaps if possible.

A treemap is a complex, area-based data visualization for hierarchical data that can be hard to interpret precisely. In many cases, simpler visualizations such as bar charts are preferable.

  2. from the three points that you listed, do we have a lot of (3, comparisons over time)? In which graphs?

We have done some comparisons over time. For example -

Trends in mode share shifted over time. E-bike mode share dropped as the weather grew colder. But even in the height of winter in Colorado, the e-bike mode share was a respectable 25%

image

Or number of trips for each mode over a time period for different programs.
image

existing questions

  1. I still see only four alternatives discussed (bar, stacked bar, tree and donut), where donut is essentially the same as pie. Are those the main alternatives? please cite your source.

There were other alternatives which did not seem valid in our case. For example :
Packed bubble chart: it is difficult to get exact values or a ranking from the unordered bubble sizes.
Waffle chart: they aren't very common, so we would need to educate users about them.

Source: https://slidebazaar.com/blog/alternatives-to-pie-charts/
https://owllytics.com/3-pie-chart-alternatives-guarantee-to-capture-attention-better/
https://www.storytellingwithdata.com/blog/2014/06/alternatives-to-pies

  2. you had an initial concern (#83 (comment)) that a stacked bar chart will have only one bar. Is that still a concern? How does your design address that?

The stacked bar chart will have a single bar. Something like this -

image

Here, we can see the number of trips for each purpose and also compare it to the specific mode (say e-bike) and the purposes for it being used.

@shankari
Contributor

shankari commented May 2, 2023

@swastis10 for this:

Trends in mode share shifted over time. E-bike mode share dropped as the weather grew colder. But even in the height of winter in Colorado, the e-bike mode share was a respectable 25%

we have timeseries visualizations in the public dashboard.
Are you suggesting that we have stacked bar graphs in addition to the timeseries or instead of the timeseries?

Or number of trips for each mode over a time period for different programs.

Do we have multiple programs in the dashboard now? How would we handle this with the current metrics?

Note that the public dashboard is updated daily.
What concretely do you have in mind? (maybe a quick sketch using powerpoint or figma would be helpful).

Here, we can see the number of trips for each purpose and also compare it to the specific mode (say e-bike) and the purposes for it being used.

This is roughly what I had in mind when I said "Concretely, I think you can come up with a design that addresses both new question #2 and existing question number #2 if you think hard enough. "

But not exactly. What would you do for studies, for example?

and the purposes for it being used.

What about non-purpose pie charts?

@swastis10
Author

Are you suggesting that we have stacked bar graphs in addition to the timeseries or instead of the timeseries?

@shankari, because we are working with a time series of correlated data points, it is good practice to plot the data as a line plot. Had the data been uncorrelated, bar charts would have been a better option than line plots. So the timeseries is good.

Do we have multiple programs in the dashboard now? How would we handle this with the current metrics?

We do not have multiple programs in the dashboard (default only). Are we testing studies only?

What concretely do you have in mind? (maybe a quick sketch using powerpoint or figma would be helpful).

I have a design roughly like this on my mind-
image

@shankari
Contributor

shankari commented May 2, 2023

@swastis10

Do we have multiple programs in the dashboard now? How would we handle this with the current metrics?

This was a rhetorical question that was designed to guide you towards a better design, as are almost all my questions from the comment. I know that we don't have multiple programs now - I wrote the dashboard code (both in the CanBikeCO and NREL OpenPATH versions).

The goal was to make you think about the second part of that question: "How would we handle this with the current metrics?" We don't have program specific metrics any more.

Rhetorical question: What metrics do we have? How would you want to handle them particularly as it comes to comparisons?

More concretely:

You had said "Or number of trips for each mode over a time period for different programs." as a reason for using stacked bar instead of regular bar graphs.

But we don't have multiple programs in the dashboard now. So we will not want to show the number of trips for each mode over a time period for different programs.

Rhetorical question: What are some metrics that we might be able to compare?

Are we testing studies only?

Read the code. Rhetorical question: Does the code only work with studies?

I have a design roughly like this on my mind-

But this has only one bar. In prior comments, you have said that the advantage of stacked bars over regular bar graphs is that it supports comparisons. But if you don't do any comparisons, then what is the advantage?

I am going to wait for one more response and then just give you a design to implement because I feel like we are just discussing in circles and this is taking too much time.

@swastis10
Author

One of the use cases can be -
What kind of trips are e-bikes replacing? For example, e-bikes are mostly replacing cars for work trips. This will also help to examine the potential use of e-bikes as an alternative to delivery vehicles for delivery services.
image

or
A comparison between total number of trips vs Mode choice for trips under 10 miles

@shankari
Contributor

shankari commented May 3, 2023

@swastis10

much better, but still not a very well-researched or well-understood plan. At this point, I will take over and provide a detailed, step-by-step design and you can focus on the implementation.

At a high level, we need to support both studies and programs. All the e-bike comparisons that you have outlined are only relevant for programs. "What does an e-bike replace" is not relevant unless you are providing e-bikes to the people who are collecting the data.

So the answer to:

Are we testing studies only?

is that OpenPATH is used for both studies and programs. Programs have a mode of interest and studies do not. So you cannot just copy-paste analysis results from a program-oriented paper for your design.

A comparison between total number of trips vs Mode choice for trips under 10 miles

This is much closer to what we are looking for, since it is relevant for both programs and studies.

Proposed design

The current list of metrics for studies is defined in https://github.com/e-mission/em-public-dashboard/blob/main/frontend/metrics_study.html
The current list of metrics for programs is defined in https://github.com/e-mission/em-public-dashboard/blob/main/frontend/metrics_program.html

Common pie charts between studies and programs, grouped into similar bins for comparison:

Group #1 (number of trips):

  • number of trips for each mode
  • number of commute trips for each mode
  • number of trips under 10 miles for each mode

Group #2 (trip mileage):

  • number of miles for each mode
  • number of commute miles for each mode (new metric)
  • number of miles for each mode in trips under 10 miles (new metric)

Group #3 (trip purpose):

  • number of trips for each purpose
  • number of miles for each purpose (new metric)

Additional pie charts only for programs (Group #4):

  • number of trips for each purpose for the mode of interest
  • number of miles for each purpose for the mode of interest (new metric)

So a basic new design would be:

  • for all deployments, create group 1, group 2 and group 3
  • for programs, also create group 4

A better new design would be:

  • for all deployments, create group 1, group 2 and group 3
  • for programs, create group 4 and also group 5a and 5b that mix and match the purposes

Group 5a:

  • number of trips by purpose for all trips
  • number of trips by purpose for "mode of interest" trips

Group 5b:

  • number of miles by purpose for all trips
  • number of miles by purpose for "mode of interest" trips
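The grouping above can be sketched as plain data, with one figure per group and one stacked bar per metric. This is only an illustration of the design: the dictionary names, the group labels, and `groups_for` are hypothetical and do not come from the em-public-dashboard code, and the program-only charts are assumed to be "group 4".

```python
# Groups shared by studies and programs.
COMMON_GROUPS = {
    "group 1 (number of trips)": [
        "number of trips for each mode",
        "number of commute trips for each mode",
        "number of trips under 10 miles for each mode",
    ],
    "group 2 (trip mileage)": [
        "number of miles for each mode",
        "number of commute miles for each mode",
        "number of miles for each mode in trips under 10 miles",
    ],
    "group 3 (trip purpose)": [
        "number of trips for each purpose",
        "number of miles for each purpose",
    ],
}

# Groups that only make sense when there is a mode of interest.
PROGRAM_ONLY_GROUPS = {
    "group 4 (mode of interest)": [
        "number of trips for each purpose for the mode of interest",
        "number of miles for each purpose for the mode of interest",
    ],
    "group 5a (trips by purpose)": [
        "number of trips by purpose for all trips",
        'number of trips by purpose for "mode of interest" trips',
    ],
    "group 5b (miles by purpose)": [
        "number of miles by purpose for all trips",
        'number of miles by purpose for "mode of interest" trips',
    ],
}


def groups_for(deployment_type):
    """Return the metric groups to render for a 'study' or a 'program'."""
    if deployment_type == "study":
        return dict(COMMON_GROUPS)
    if deployment_type == "program":
        return {**COMMON_GROUPS, **PROGRAM_ONLY_GROUPS}
    raise ValueError(f"unknown deployment type: {deployment_type!r}")


for kind in ("study", "program"):
    print(kind, "->", list(groups_for(kind)))
```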

This design meets all of the requirements:

  • switches from pie charts to bar charts so that we can include error bars
  • uses stacked bar charts instead of regular bar charts so that we can see parts of a whole
  • includes multiple stacked bar charts in the same graph so that we can easily compare related metrics
  • bonus: compacts the number of metrics so that we can see more of them at a glance and don't have to select others from the dropdown

You will also need to change the frontend to display the new metrics, update the dropdowns and make sure that the proportions of the boxes match the proportions of the new figures.
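As a rough illustration of why several stacked bars in one figure make related metrics comparable, the sketch below normalizes each metric in group 1 to 100% and renders each as a fixed-width row of characters. All counts are invented, and `normalize`, `render_bar`, and the one-letter symbols are hypothetical names; a real implementation would draw the same shares with a plotting library.

```python
def normalize(counts):
    """Scale raw counts so that the values sum to 100 (percent)."""
    total = sum(counts.values())
    return {label: 100.0 * value / total for label, value in counts.items()}


def render_bar(shares, symbols, width=50):
    """Render one 100% stacked bar as a row of `width` characters."""
    row, drawn, cumulative = "", 0, 0.0
    for label, share in shares.items():
        cumulative += share
        # Round the cumulative edge, not each segment, so the row is
        # always exactly `width` characters long.
        end = round(width * cumulative / 100.0)
        row += symbols[label] * (end - drawn)
        drawn = end
    return row


# Invented counts for the three metrics in group 1.
group_1 = {
    "all trips": {"e-bike": 300, "car": 500, "walk": 200},
    "commute trips": {"e-bike": 120, "car": 60, "walk": 20},
    "trips under 10 miles": {"e-bike": 250, "car": 150, "walk": 100},
}
symbols = {"e-bike": "E", "car": "C", "walk": "W"}

for metric, counts in group_1.items():
    print(f"{metric:>22} |{render_bar(normalize(counts), symbols)}|")
```

Because every row has the same length, the e-bike segment can be compared across the three metrics directly, even though the underlying trip totals differ.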

@shankari
Contributor

Now tracked in #86
