Southampton-RSG/openrefine-data-cleaning

OpenRefine Lesson

This is a lesson on the OpenRefine data cleaning tool, derived from Data Carpentry's Data Refine for Ecology.

Dataset

  • The data set used in this lesson is derived from the Portal Project long-term desert ecology project data. The data file was downloaded and then modified specifically for use with OpenRefine.
    • Taxon names were put back into the file.
    • Globally Unique Identifiers (in the form of UUIDs) were added.
  • These modifications were made in order to illustrate some features of OpenRefine.
    • Errors were added to the taxon names (scientificName field), to demonstrate OpenRefine's ability to find likely mis-entered data.
    • These errors can be found by running OpenRefine's clustering algorithms on the scientificName column, which surface discrepancies quickly and make it simple to fix every issue found.
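
To give a feel for how clustering can surface mis-entered taxon names, here is a minimal Python sketch of a key-collision approach, loosely modelled on the idea behind OpenRefine's fingerprint method. It is a simplified approximation, not OpenRefine's actual implementation. The function names `fingerprint` and `cluster` are hypothetical, and the sample names are made up for illustration rather than taken from the lesson's data file.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a simplified key: lowercase, drop accents and punctuation,
    then sort the unique whitespace-separated tokens."""
    text = unicodedata.normalize("NFKD", value.strip().lower())
    text = text.encode("ascii", "ignore").decode("ascii")  # drop non-ASCII marks
    text = re.sub(r"[^\w\s]", "", text)                    # drop punctuation
    tokens = sorted(set(text.split()))
    return " ".join(tokens)


def cluster(values):
    """Group distinct values that share a key; keep only groups
    containing more than one variant."""
    groups = defaultdict(set)
    for v in values:
        groups[fingerprint(v)].add(v)
    return {k: sorted(vs) for k, vs in groups.items() if len(vs) > 1}


# Hypothetical scientificName entries (assumed for illustration only).
names = [
    "Dipodomys merriami",
    "dipodomys  merriami",
    "Dipodomys merriami.",
    "Chaetodipus baileyi",
]
clusters = cluster(names)
for key, variants in clusters.items():
    print(key, "->", variants)
```

A key of this kind only catches differences in case, spacing, punctuation, and word order. Genuine misspellings generally need a distance-based or phonetic method, which OpenRefine also offers in its clustering dialog.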

Maintainer(s)

Current maintainers of this lesson are:

Authors

A list of contributors to the lesson can be found in AUTHORS.

Citation

To cite this lesson, please consult CITATION.