GitHub - UTD-CRSS/exploreapollo-import-audio

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 93 Commits
src		src
.gitignore		.gitignore
README		README

Repository files navigation

This project contains various scripts for uploading data and files.

To use, copy src/config-sample.py to src/config.py, and fill in the
dummy values. Note that values relating to Flickr are only necessary
for ImageUpload.
To run,
python TransferS3Data.py [channel met_start met_end]
python TransferS3Metrics.py
python AudioUpload.py <local base folder> <S3 base folder>
python MetricsUpload.py <local base folder> <S3 base folder>
python ImageUpload.py flickr <Flickr album id> <S3 folder> <Mission name> [media attachable csv]
python ImageUpload.py local <local base folder> <S3 folder> <Mission name> [media attachable csv]
python ImageUpload.py <media attachable csv>
python storyUpload.py <.csv file with extension>

TransferS3Data.py - Finds transcript and wave files in S3 and puts
audio segment and transcript items in the database. TransferS3Data
takes three optional arguments - channel, met_min, and met_max.
Providing these arguments limits the processed files to those referring
to the given channel, and labeled as starting sometime between met_min
and met_max. Note that this will skip any files with data between
met_min and met_max, but begin before met_min!

TransferS3Metrics.py - Finds .json files in S3 and puts Metric items in
the database. TransferS3Metrics takes three optional arguments - channel,
met_min, and met_max. Providing these arguments limits the processed files to those referring
to the given channel, and labeled as starting sometime between met_min
and met_max. Note that this will skip any files with data between
met_min and met_max, but begin before met_min!

AudioUpload.py - Finds transcript and wave files on the local machine,
uploads them to S3, and puts audio segment and transcript items in the
database. AudioUpload uploads all wav and txt files found in the local base folder,
and preserves the directory structure as much as possible.
If the call looks like
python AudioUpload.py basefolder/ audio/
then a file basefolder/subfolder/file.txt is uploaded as
audio/subfolder/file.txt in S3.

MetricsUpload.py - Finds metric files (.json) on the local machine,
uploads them to S3, and puts metric entries into the database..
MetricsUpload finds all .json files found in the local base folder,
and preserves the directory structure as much as possible.
If the call looks like
python MetricsUpload.py basefolder/ metrics/
then a file basefolder/subfolder/file.txt is uploaded as
metrics/subfolder/file.txt in S3.

ImageUpload.py - When given 'flickr' as the first argument, copies a Flickr
album over to S3 and fills in database
information. Providing the optional csv file will also attach the images
to moments and channels specified in the csv file. For an example of such
a csv file, see src/examples/mediaAssociation. The image titles referenced
there must match those found in Flickr. The script will also upload local
files, by changing the first argument to 'local'. Finally, providing only
a csv file as an argument attaches images already in the database to moments
and channels that are already in the database.

storyUpload.py - Takes a csv file (check out template in examples/Splashdown.csv) and
uploads corresponding moments & story. The story name is the name of the .csv file
(Splashdown.csv -> "Splashdown" story). The moments are listed as rows, and there is one
row used for the description of the story itself. You can have empty rows to separate
the moments for readability. The script will not upload the story and the moments
unless each moment has audio/transcript data somewhere in its (met_start,met_end) interval.
Make sure that the .csv file is in the same directory as the storyUpload.py file. So to run
the Splashdown example first you would have to 'cp examples/Splashdown.csv Splashdown.csv'
then 'python3 storyUpload.py Splashdown'

FileManager.py - no longer used.
ExtractMET.py - used ONLY to create a11-timestamps.csv, an attempt to
extract MET times from a webpage. Read the file for more info.

audio-upload.py - meant to iterate through a directory of audio clips and combine two with
corresponding names before taking a clip from a specified time frame and outputting it as two
5-minute-long files. This is done because multi-channel audio only supports the upload of audio
files 5 minutes or less in duration.

Before running the script, use a .env file to set the appropriate AWS access key and secret key.
After writing those into the .env file call 'source .env' to make them accessible in the python
script.

The script takes command line arguments to run. These arguments are as follows: startTime, endTime,
readDirectory, writeDirectory. Start time and end time are ints and read and write directories are strings
that correspond to directories. Read directory is referring to the local directory the audio is stored
in and write directory is the directory in s3 that the audio is being uploaded to. An example command
could look like this: 'python audio-upload.py 1500 2100 audio Lunar_Sighting' Save the output
of the script for the next step.
This script was made very specifically to work with the audio as we were given it in the fall of 2021,
but components of it can be repurposed if this changes.

For more detailed information, check the wiki page on GitHub at https://github.com/UTD-CRSS/exploreapollo-import-audio/wiki