A list of datasets, resources and tools related to food production in Ghana. Let's begin...
Below is the summary table of the datasets available so far. For each dataset, the table lists its name, the R code used to clean it, the cleaned data, the original "dirty" data and the source.
The goal of this section is to produce clean datasets that match the "tidy data" definition below:
- Each variable (or feature) you measure should be in one column
- Each different observation of that variable should be in a different row
- There should be one table for each “kind” of variable
- If you have multiple tables, each should include a column that allows them to be linked
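To make these rules concrete, here is a minimal sketch in Python (standard library only, rather than the R used in this repository) that reshapes a "wide" table, with one column per year, into tidy form, where year becomes its own variable column and each crop-year pair becomes its own row. The crop names and numbers are made-up placeholders, not real data, and `melt` is a hypothetical helper named after the similar operation in pandas.

```python
# Minimal sketch: reshape a "wide" table (one column per year) into tidy long form.
# The crop names, years and numbers below are made-up placeholders, not real Ghana data.

def melt(rows, id_col, var_name, value_name):
    """Turn every non-id column of each row into its own (id, variable, value) row."""
    tidy = []
    for row in rows:
        for key, value in row.items():
            if key == id_col:
                continue  # the id column is repeated on every output row instead
            tidy.append({id_col: row[id_col], var_name: key, value_name: value})
    return tidy

# Untidy: the variable "year" is spread across column headers.
wide = [
    {"crop": "maize", "2015": 100, "2016": 110},
    {"crop": "cassava", "2015": 200, "2016": 210},
]

tidy = melt(wide, id_col="crop", var_name="year", value_name="production")
for row in tidy:
    print(row)
```

Each printed row now holds one observation (a crop in a year) with one value per column, which is the shape most analysis tools expect.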
Hadley Wickham provides a detailed definition and explanation of tidy data and its benefits in this paper. A few of these benefits:
- a user can quickly load a dataset into many standard analysis and visualisation tools, such as Microsoft Excel, R, Pandas and Tableau.
- a user can easily combine multiple datasets for a more in-depth analysis.
- a user can easily share her data with other collaborators as explained by Jeff Leek here.
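Combining datasets works because tidy tables share a linking column. The sketch below, again in standard-library Python, performs a simple inner join of two hypothetical tables on a shared `region` column. The region names are real Ghanaian regions, but every number is a made-up placeholder, and `inner_join` is a hypothetical helper (in R or pandas you would reach for a built-in join or merge instead).

```python
# Minimal sketch: combine two tidy tables that share a linking column ("region").
# Region names are real Ghanaian regions; all numbers are made-up placeholders.

def inner_join(left, right, key):
    """Return a merged row for every left/right pair whose `key` values match."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)  # group right-hand rows by key
    joined = []
    for l in left:
        for r in index.get(l[key], []):
            joined.append({**l, **r})  # merge the two rows' columns
    return joined

production = [
    {"region": "Ashanti", "crop": "maize", "tonnes": 100},
    {"region": "Volta", "crop": "maize", "tonnes": 80},
]
population = [
    {"region": "Ashanti", "population": 1000},
    {"region": "Northern", "population": 500},
]

combined = inner_join(production, population, key="region")
print(combined)
```

Only Ashanti appears in both tables, so the result has a single row carrying both the production and population columns; rows without a match are dropped, which is the defining behaviour of an inner join.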
Since not everyone can figure out what's going on inside my head unless I explicitly show how these datasets can be used, here is a short list of steps to follow if you want to use this resource:
- Name Column: Identify a dataset that is of interest to you.
- Cleaned Data: Check out the corresponding cleaned dataset to see if it is still of interest to you. Decide if it contains the variables/columns you're interested in using. You can always change your mind.
- Details Page: Visit the details.md file in the cleaned data folder to read additional notes about the dataset you are interested in.
- Source Document: If you still have questions about the dataset, check out the original source listed in the Source column of the summary table.
- Dirty Data: Take a look at the dirty data file and do a quick dance because you don't have to work with that format.
- R Code: If you're still bored, check out the R code used in tidying the dataset.
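For the "Cleaned Data" step, a quick programmatic check of which columns a file contains can save some scrolling. This is a minimal sketch assuming the cleaned datasets are CSV files (an assumption; check the cleaned data folder for the actual format). An in-memory string stands in for a real file, and its column names and values are hypothetical.

```python
import csv
import io

# Stand-in for a cleaned file; with a real file you would use
# open("path/to/cleaned_file.csv", newline="") instead. Contents are hypothetical.
sample = io.StringIO("crop,year,production\nmaize,2015,100\ncassava,2015,200\n")

reader = csv.DictReader(sample)
columns = reader.fieldnames            # header row, read on first access
wanted = {"crop", "production"}        # the variables you care about
missing = wanted - set(columns)        # empty set means everything is there
rows = list(reader)                    # remaining lines as dicts (values are strings)

print(columns)   # ['crop', 'year', 'production']
print(missing)   # set()
```

Note that `csv.DictReader` returns every value as a string, so numeric columns need converting before any arithmetic.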
This project is a labour of love stemming from a love for data, food and Ghana. There are four main things I hope to achieve with this:
- Create a repository of open food-related datasets specific to Ghana.
- Document and share the process for collecting, cleaning and releasing these datasets. I hope others can improve the process or simply join in the effort.
- Identify gaps in specific datasets that would be worth covering.
- Produce interesting stories, reports, music, dances, business plans and policies around food in Ghana. After all, we are talking about Ghana, and food is a major part of our culture, so why not learn more about it?
With that said, here's how you can help:
- Review datasets and send in a pull request if you notice any issues. A pull request is just GitHub's fancy way of sharing edits to a document. Learn more about pull requests here. If this is too technical for you, simply send me a tweet.
- Use datasets for your analysis, visualisations, television and radio discussions, school reports, journalism stories, business pitches etc. You get the idea. Once you do, let me know what is missing and what you found useful.
- Share dirty data you would like to see cleaned. The only catch for now is the data must be food-related and about Ghana.