edit argovis section
song-sangmin committed Jun 13, 2024
1 parent 1ff11e7 commit 6ea2013
Showing 1 changed file with 233 additions and 27 deletions.
260 changes: 233 additions & 27 deletions notebooks/argo-access.ipynb
@@ -35,11 +35,11 @@
"\n",
"Building upon the previous notebook, [Introduction to Argo](notebooks/argo-introduction.ipynb), we next explore how to access Argo data using various methods.\n",
"\n",
"These methods are described in more detail on their respective websites, linked below. Our goal here is to provide a brief overview of some of the different tools available. \n",
"\n",
"1. [GO-BGC Toolbox](https://github.com/go-bgc/workshop-python) \n",
"2. [Argopy](https://argopy.readthedocs.io/en/latest/user-guide/fetching-argo-data/index.html), a dedicated Python package\n",
"3. [Argovis](https://argovis.colorado.edu/argo) for API-based queries \n",
"\n",
"<!-- 2. Downloading [monthly snapshots](http://www.argodatamgt.org/Access-to-data/Argo-DOI-Digital-Object-Identifier) using Argo DOI's -->\n",
"<!-- 4. Using the [GO-BGC Toolbox](https://github.com/go-bgc/workshop-python) -->\n",
@@ -82,7 +82,7 @@
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {},
"outputs": [],
"source": [
@@ -95,6 +95,11 @@
"import xarray as xr\n",
"from datetime import datetime, timedelta\n",
"\n",
"import requests\n",
"import time\n",
"import urllib3\n",
"import shutil\n",
"\n",
"import matplotlib.pyplot as plt\n",
"import matplotlib.colors as mcolors\n",
"import seaborn as sns\n",
@@ -107,44 +112,252 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Downloading with the GO-BGC Toolbox\n",
"\n",
"In the previous notebook, [Introduction to Argo](notebooks/argo-introduction.ipynb), we saw how Argo synthetic profile ('[Sprof](https://archimer.ifremer.fr/doc/00445/55637/)') data is stored in NetCDF format.\n",
"\n",
"The GDAC functions below let you subset and download Sprof files for multiple floats. \n",
"We recommend this tool for users who only need a few profiles in a specific area of interest. \n",
"Considerations: \n",
"- Easy to use and understand\n",
"- Downloads float data as individual .nc files to your local machine (takes up storage space)\n",
"- Must download all variables available (cannot subset only variables of interest)\n",
"\n",
"The two major functions below are courtesy of the [GO-BGC Toolbox](https://github.com/go-bgc/workshop-python) (Ethan Campbell). A full tutorial is available in the Toolbox.\n"
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {},
"outputs": [],
"source": [
"# # Base filepath. Needed for the Argo GDAC function.\n",
"# root = '/Users/sangminsong/Library/CloudStorage/OneDrive-UW/Code/2024_Pythia/'\n",
"# profile_dir = root + 'SOCCOM_GO-BGC_LoResQC_LIAR_28Aug2023_netcdf/'\n",
"\n",
"# # Base filepath. Needed for the Argo GDAC function.\n",
"root = '../data/'\n",
"profile_dir = root + 'bgc-argo/'"
]
},
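One practical note: `download_file` (defined in 1.0 below) writes with a plain `open(save_to + filename, 'wb')`, so the target directory must already exist. A minimal guard, sketched here with a temporary directory standing in for the notebook's `profile_dir`:

```python
import os
import tempfile

# Stand-in for the notebook's root / profile_dir paths
root = tempfile.mkdtemp()
profile_dir = os.path.join(root, "bgc-argo") + os.sep

# Create the folder (no error if it already exists)
os.makedirs(profile_dir, exist_ok=True)
print(os.path.isdir(profile_dir))
```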
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1.0 GO-BGC Toolbox Functions"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {},
"outputs": [],
"source": [
"# Function to download a single file (From GO-BGC Toolbox)\n",
"def download_file(url_path,filename,save_to=None,overwrite=False,verbose=True):\n",
" \"\"\" Downloads and saves a file from a given URL using HTTP protocol.\n",
"\n",
" Note: if a '404 file not found' error is returned, the function will return without downloading anything.\n",
" \n",
" Arguments:\n",
" url_path: root URL to download from including trailing slash ('/')\n",
" filename: filename to download including suffix\n",
" save_to: None (to download to the root directory defined in this notebook)\n",
" or directory path\n",
" overwrite: False to leave existing files in place\n",
" or True to overwrite existing files\n",
" verbose: True to announce progress\n",
" or False to stay silent\n",
" \n",
" \"\"\"\n",
" urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)\n",
"\n",
" if save_to is None:\n",
" save_to = root\n",
"\n",
" try:\n",
" if filename in os.listdir(save_to):\n",
" if not overwrite:\n",
" if verbose: print('>>> File ' + filename + ' already exists. Leaving current version.')\n",
" return\n",
" else:\n",
" if verbose: print('>>> File ' + filename + ' already exists. Overwriting with new version.')\n",
"\n",
" def get_func(url,stream=True):\n",
" try:\n",
" return requests.get(url,stream=stream,auth=None,verify=False)\n",
" except requests.exceptions.ConnectionError as error_tag:\n",
" print('Error connecting:',error_tag)\n",
" time.sleep(1)\n",
" return get_func(url,stream=stream)\n",
"\n",
" response = get_func(url_path + filename,stream=True)\n",
"\n",
" if response.status_code == 404:\n",
" if verbose: print('>>> File ' + filename + ' returned 404 error during download.')\n",
" return\n",
" with open(save_to + filename,'wb') as out_file:\n",
" shutil.copyfileobj(response.raw,out_file)\n",
" del response\n",
" if verbose: print('>>> Successfully downloaded ' + filename + '.')\n",
"\n",
" except Exception:\n",
" if verbose: print('>>> An error occurred while trying to download ' + filename + '.')"
]
},
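One detail worth noting: the nested `get_func` above retries a failed connection by calling itself recursively, so a long outage could in principle hit Python's recursion limit. The same retry behavior can be written as a bounded loop; the helper below is a hypothetical sketch, not part of the Toolbox:

```python
import time

def get_with_retry(fetch, url, retries=3, delay=1.0):
    """Call fetch(url), retrying a bounded number of times on connection errors."""
    for attempt in range(retries):
        try:
            return fetch(url)
        except ConnectionError as err:
            print('Error connecting:', err)
            time.sleep(delay)
    raise ConnectionError(f'giving up on {url} after {retries} attempts')

# Simulated flaky endpoint: fails twice, then succeeds
calls = []
def flaky(url):
    calls.append(url)
    if len(calls) < 3:
        raise ConnectionError('simulated outage')
    return 'payload'

result = get_with_retry(flaky, 'https://example.org/file.nc', retries=5, delay=0.0)
print(result, len(calls))
```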
{
"cell_type": "code",
"execution_count": 64,
"metadata": {},
"outputs": [],
"source": [
"# Function to download and parse GDAC synthetic profile index file (GO-BGC Toolbox)\n",
"def argo_gdac(lat_range=None,lon_range=None,start_date=None,end_date=None,sensors=None,floats=None,\n",
" overwrite_index=False,overwrite_profiles=False,skip_download=False,\n",
" download_individual_profs=False,save_to=None,verbose=True):\n",
" \"\"\" Downloads GDAC Sprof index file, then selects float profiles based on criteria.\n",
" Either returns information on profiles and floats (if skip_download=True) or downloads them (if False).\n",
"\n",
" Arguments:\n",
" lat_range: None, to select all latitudes\n",
" or [lower, upper] within -90 to 90 (selection is inclusive)\n",
" lon_range: None, to select all longitudes\n",
" or [lower, upper] within either -180 to 180 or 0 to 360 (selection is inclusive)\n",
" NOTE: longitude range is allowed to cross -180/180 or 0/360\n",
" start_date: None or datetime object\n",
" end_date: None or datetime object\n",
" sensors: None, to select profiles with any combination of sensors\n",
" or string or list of strings to specify required sensors\n",
" > note that common options include PRES, TEMP, PSAL, DOXY, CHLA, BBP700,\n",
" PH_IN_SITU_TOTAL, and NITRATE\n",
" floats: None, to select any floats matching other criteria\n",
" or int or list of ints specifying floats' WMOID numbers\n",
" overwrite_index: False to keep existing downloaded GDAC index file, or True to download new index\n",
" overwrite_profiles: False to keep existing downloaded profile files, or True to download new files\n",
" skip_download: True to skip download and return (wmoids, gdac_index_subset)\n",
" or False to download those profiles\n",
" download_individual_profs: False to download single Sprof file containing all profiles for each float\n",
" or True to download individual profile files for each float\n",
" save_to: None to download to the root directory defined in this notebook\n",
" or string to specify directory path for profile downloads\n",
" verbose: True to announce progress, or False to stay silent\n",
"\n",
" \"\"\"\n",
" # Paths\n",
" url_root = 'https://www.usgodae.org/ftp/outgoing/argo/'\n",
" dac_url_root = url_root + 'dac/'\n",
" index_filename = 'argo_synthetic-profile_index.txt'\n",
" if save_to is None: save_to = root\n",
"\n",
" # Download GDAC synthetic profile index file\n",
" download_file(url_root,index_filename,overwrite=overwrite_index)\n",
"\n",
" # Load index file into Pandas DataFrame\n",
" gdac_index = pd.read_csv(root + index_filename,delimiter=',',header=8,parse_dates=['date','date_update'],\n",
" date_parser=lambda x: pd.to_datetime(x,format='%Y%m%d%H%M%S'))\n",
"\n",
" # Establish time and space criteria\n",
" if lat_range is None: lat_range = [-90.0,90.0]\n",
" if lon_range is None: lon_range = [-180.0,180.0]\n",
" elif lon_range[0] > 180 or lon_range[1] > 180:\n",
" if lon_range[0] > 180: lon_range[0] -= 360\n",
" if lon_range[1] > 180: lon_range[1] -= 360\n",
" if start_date is None: start_date = datetime(1900,1,1)\n",
" if end_date is None: end_date = datetime(2200,1,1)\n",
"\n",
" float_wmoid_regexp = r'[a-z]*/[0-9]*/profiles/[A-Z]*([0-9]*)_[0-9]*[A-Z]*.nc'\n",
" gdac_index['wmoid'] = gdac_index['file'].str.extract(float_wmoid_regexp).astype(int)\n",
" filepath_main_regexp = '([a-z]*/[0-9]*/)profiles/[A-Z]*[0-9]*_[0-9]*[A-Z]*.nc'\n",
" gdac_index['filepath_main'] = gdac_index['file'].str.extract(filepath_main_regexp)\n",
" filepath_regexp = '([a-z]*/[0-9]*/profiles/)[A-Z]*[0-9]*_[0-9]*[A-Z]*.nc'\n",
" gdac_index['filepath'] = gdac_index['file'].str.extract(filepath_regexp)\n",
" filename_regexp = '[a-z]*/[0-9]*/profiles/([A-Z]*[0-9]*_[0-9]*[A-Z]*.nc)'\n",
" gdac_index['filename'] = gdac_index['file'].str.extract(filename_regexp)\n",
"\n",
" # Subset profiles based on time and space criteria\n",
" gdac_index_subset = gdac_index.loc[np.logical_and.reduce([gdac_index['latitude'] >= lat_range[0],\n",
" gdac_index['latitude'] <= lat_range[1],\n",
" gdac_index['date'] >= start_date,\n",
" gdac_index['date'] <= end_date]),:]\n",
" if lon_range[1] >= lon_range[0]: # range does not cross -180/180 or 0/360\n",
" gdac_index_subset = gdac_index_subset.loc[np.logical_and(gdac_index_subset['longitude'] >= lon_range[0],\n",
" gdac_index_subset['longitude'] <= lon_range[1])]\n",
" elif lon_range[1] < lon_range[0]: # range crosses -180/180 or 0/360\n",
" gdac_index_subset = gdac_index_subset.loc[np.logical_or(gdac_index_subset['longitude'] >= lon_range[0],\n",
" gdac_index_subset['longitude'] <= lon_range[1])]\n",
"\n",
" # If requested, subset profiles using float WMOID criteria\n",
" if floats is not None:\n",
" if type(floats) is not list: floats = [floats]\n",
" gdac_index_subset = gdac_index_subset.loc[gdac_index_subset['wmoid'].isin(floats),:]\n",
"\n",
" # If requested, subset profiles using sensor criteria\n",
" if sensors is not None:\n",
" if type(sensors) is not list: sensors = [sensors]\n",
" for sensor in sensors:\n",
" gdac_index_subset = gdac_index_subset.loc[gdac_index_subset['parameters'].str.contains(sensor),:]\n",
"\n",
" # Examine subsetted profiles\n",
" wmoids = gdac_index_subset['wmoid'].unique()\n",
" wmoid_filepaths = gdac_index_subset['filepath_main'].unique()\n",
"\n",
" # Just return list of floats and DataFrame with subset of index file, or download each profile\n",
" if not skip_download:\n",
" downloaded_filenames = []\n",
" if download_individual_profs:\n",
" for p_idx in gdac_index_subset.index:\n",
" download_file(dac_url_root + gdac_index_subset.loc[p_idx]['filepath'],\n",
" gdac_index_subset.loc[p_idx]['filename'],\n",
" save_to=save_to,overwrite=overwrite_profiles,verbose=verbose)\n",
" downloaded_filenames.append(gdac_index_subset.loc[p_idx]['filename'])\n",
" else:\n",
" for f_idx, wmoid_filepath in enumerate(wmoid_filepaths):\n",
" download_file(dac_url_root + wmoid_filepath,str(wmoids[f_idx]) + '_Sprof.nc',\n",
" save_to=save_to,overwrite=overwrite_profiles,verbose=verbose)\n",
" downloaded_filenames.append(str(wmoids[f_idx]) + '_Sprof.nc')\n",
" return wmoids, gdac_index_subset, downloaded_filenames\n",
" else:\n",
" return wmoids, gdac_index_subset"
]
},
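The longitude subsetting in `argo_gdac` accepts ranges that cross the -180/180 (or 0/360) seam by switching from an AND to an OR condition. That core logic, isolated into a small hypothetical helper:

```python
import numpy as np

def lon_in_range(lon, lon_range):
    """Boolean mask for longitudes in [lower, upper], allowing the
    range to wrap across the -180/180 discontinuity."""
    lon = np.asarray(lon)
    lower, upper = lon_range
    if upper >= lower:                      # ordinary range
        return (lon >= lower) & (lon <= upper)
    return (lon >= lower) | (lon <= upper)  # range wraps around the seam

# A range from 160E across the date line to 160W (-160)
mask = lon_in_range([170.0, -170.0, 0.0], [160.0, -160.0])
print(mask)
```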
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1.1 Using GDAC function to access Argo subsets"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# don't download, just get WMOIDs\n",
"# wmoids, gdac_index = argo_gdac(lat_range=lat_bounds,lon_range=lon_bounds,\n",
"# start_date=start_yd,end_date=end_yd,\n",
"# sensors=None,floats=None,\n",
"# overwrite_index=True,overwrite_profiles=False,\n",
"# skip_download=True,download_individual_profs=False,\n",
"# save_to=profile_dir,verbose=True)\n",
"\n",
"# download a specific float, WMOID 5906030\n",
"wmoids, gdac_index, downloaded_filenames \\\n",
" = argo_gdac(lat_range=None,lon_range=None,\n",
" start_date=None,end_date=None,\n",
" sensors=None,floats=5906030,\n",
" overwrite_index=True,overwrite_profiles=False,\n",
" skip_download=False,download_individual_profs=False,\n",
" save_to=profile_dir,verbose=True)"
]
},
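The WMOID extraction in `argo_gdac` relies on a regular expression applied to the index file's `file` column. A toy demonstration of that same pattern, using invented file paths in the style of `argo_synthetic-profile_index.txt`:

```python
import pandas as pd

# Two made-up rows in the style of the GDAC index file's 'file' column
files = pd.Series([
    "aoml/5906030/profiles/SD5906030_001.nc",
    "coriolis/6902746/profiles/SR6902746_045D.nc",
])

# Same regexp as in argo_gdac: capture the float's WMOID from the path
wmoid = files.str.extract(r'[a-z]*/[0-9]*/profiles/[A-Z]*([0-9]*)_[0-9]*[A-Z]*.nc').astype(int)
print(wmoid[0].tolist())
```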
{
@@ -162,18 +375,11 @@
"# # DSdict['5906030']"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Using the Argopy Python Package"
]
},
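The cells for this section are collapsed in the diff above, but a minimal argopy query typically follows the pattern sketched below. The box values are arbitrary, and the guard keeps the sketch from failing where argopy (or network access to the Argo index) is unavailable:

```python
# argopy's region accessor takes a box of
# [lon_min, lon_max, lat_min, lat_max, pres_min, pres_max, date_start, date_end]
box = [-75, -45, 20, 30, 0, 100, "2021-01", "2021-06"]

try:
    from argopy import DataFetcher       # requires `pip install argopy`
    fetcher = DataFetcher().region(box)  # lazy: nothing downloaded yet
    # ds = fetcher.to_xarray()           # uncomment to trigger the actual fetch
except Exception:                        # argopy missing or index unreachable
    fetcher = None

print(len(box))
```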
{
@@ -187,7 +393,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Querying Data with Argovis"
]
},
{
