BeetlePalooza Dataset Code

This repository hosts the code and notebooks used to explore and process the BeetlePalooza dataset: 2018 NEON Ethanol-preserved Ground Beetles.

Data Exploration and Analysis

Getting Started

In a fresh Python environment, run: pip install -r requirements.txt

CSVs explored in the notebooks are pulled directly from Hugging Face by URL (each URL points to the specific commit for that version). Adjusted CSVs are saved to a data/ folder, which is ignored by git because the files are too large (versioning them would require git lfs, so they are stored on Hugging Face instead).
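
As a rough illustration of reading a CSV pinned to a specific commit, the sketch below builds a Hugging Face "resolve" URL and passes it to pandas. The helper names (hf_csv_url, load_pinned_csv) and the repository ID and revision in the usage note are hypothetical placeholders, not values taken from the notebooks; substitute the ones the notebooks actually use.

```python
from urllib.parse import quote

# Base for dataset repositories on the Hugging Face Hub.
HF_BASE = "https://huggingface.co/datasets"


def hf_csv_url(repo_id: str, revision: str, filename: str) -> str:
    """Build a URL to one file in a dataset repo at a fixed revision.

    `revision` can be a commit hash, so the file contents do not change
    even if the dataset is updated later.
    """
    return f"{HF_BASE}/{repo_id}/resolve/{quote(revision, safe='')}/{quote(filename)}"


def load_pinned_csv(repo_id: str, revision: str, filename: str):
    """Read the pinned CSV into a DataFrame (needs pandas and network access)."""
    import pandas as pd  # imported lazily so the URL helper works without pandas

    return pd.read_csv(hf_csv_url(repo_id, revision, filename))
```

For example, load_pinned_csv("example-org/example-dataset", "abc123", "BeetleMeasurements.csv") would read that file as it existed at the (hypothetical) commit abc123.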

Notebooks

EDA-0-1 gives an initial exploration of the data. It adds and renames some columns in the metadata file for the dataset.
EDA-0-2 explores the variation in the measurements of individuals (provides graphs). It also checks the potential outliers and creates a measurement ID, providing a unique ID for the beetle measurement CSV.
EDA-0-3 fixes the outliers that were mislabeled, then generates individual-based CSVs for segmentation and connection to the individual images to be created from the segmentation process.
EDA-0-4 adds "scientificName", "genus", "species", "NEON_sampleID", and "siteID" columns to the resized beetle metadata file to display alongside the resized images in the dataset viewer on HF.

Metadata

all_measurements is a CSV with all the measurements done by each annotator (each row is a pair of measurements for a single beetle).
individual_metadata_full is a CSV with all the measurements done by Isadora Fluck (each row represents an individual beetle with its pair of elytra measurements). This was created for the segmentation process.
multi_annotator_count is a CSV with counts of annotations per image, the expected number (based on the number of rows and annotators associated with that image), and the maximum individual number provided for that image (if max_individual is less than 99, that is the number of individuals in that image; if it's 99 or greater, then there may be more individuals based on the individual count and numeric export from Zooniverse).
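
The max_individual rule described above can be sketched as a small helper. This is only an illustration of the stated rule; the function name and return shape are assumptions, not code from the notebooks.

```python
def interpret_max_individual(max_individual: int, threshold: int = 99):
    """Interpret the max_individual value for one image.

    Returns (count, is_exact):
      - below the threshold, the value is the number of individuals;
      - at or above it, the value is only a lower bound, since the
        image may contain more individuals than were numbered.
    """
    if max_individual < threshold:
        return max_individual, True
    return max_individual, False
```

For instance, a value of 12 is read as exactly 12 individuals, while a value of 99 is read as "at least 99".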

Note that all_measurements.csv and individual_metadata_full.csv are supersets of the individual_metadata.csv in 2018 NEON Ethanol-preserved Ground Beetles (they contributed to its creation from BeetleMeasurements.csv). They are therefore reproduced here under the CC BY-SA 4.0 license and should be cited appropriately if reused.

Segmentation

The segmentation folder contains scripts to leverage the elytra length and width coordinates and Meta's Segment-Anything model to segment beetles out.

To configure your environment using conda, run:

cd segmentation
conda env create --file environment.yaml
conda activate beetles

To predict segmentation masks for the imaged beetles, run:

python3 predict_masks.py --images <path to images> --csv <path to image metadata csv> --results <optional; name for csv of segmentation results>

To remove the background of beetle images using their segmentation masks, run:

python3 remove_background.py --images <path to images> --masks <path to segmentation masks>

To crop out individual beetles from images, run:

python3 individual_beetles.py --images <path to group_images> --csv <path to metadata/individual_metadata_full.csv>

FYI: The script to crop out individual beetles works well for images whose coords_pix_length and coords_pix_width information is correctly aligned to the beetles. However, in a couple of images this is not the case, so the segmentation will not produce a clean crop of the individual beetles.

To remove the background from the individual images, run:

python3 remove_individual_background.py --images <path to group_images> --result <path to folder where results will be saved>

To crop out elytra from the individual images, run:

python3 segment_elytra.py --images <path to images> --result <path to folder where results will be saved>
