Use temp .netrc file for integration tests
Fixes nsidc#806
Fixes nsidc#743
Fixes nsidc#480
chuckwondo committed Sep 21, 2024
1 parent 2939f80 commit be0da17
Showing 15 changed files with 270 additions and 294 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -9,6 +9,7 @@ htmlcov
dist
site
.coverage
.coverage.*
coverage.xml
.netlify
test.db
9 changes: 9 additions & 0 deletions CHANGELOG.md
@@ -13,6 +13,9 @@
instead ([#766](https://github.com/nsidc/earthaccess/issues/766))
([**@Sherwin-14**](https://github.com/Sherwin-14),
[**@chuckwondo**](https://github.com/chuckwondo))
- Use built-in `assert` statement in integration tests
([#743](https://github.com/nsidc/earthaccess/issues/743))
([**@chuckwondo**](https://github.com/chuckwondo))

### Added

@@ -25,6 +28,9 @@
[**@chuckwondo**](https://github.com/chuckwondo),
[**@mfisher87**](https://github.com/mfisher87),
[**@betolink**](https://github.com/betolink))
- Support use of `NETRC` environment variable to override default `.netrc` file
location ([#480](https://github.com/nsidc/earthaccess/issues/480))
([**@chuckwondo**](https://github.com/chuckwondo))

- Added example PR links to pull request template
([#756](https://github.com/nsidc/earthaccess/issues/756))
@@ -38,6 +44,9 @@
- Removed Broken Link "Introduction to NASA earthaccess"
([#779](https://github.com/nsidc/earthaccess/issues/779))
([**@Sherwin-14**](https://github.com/Sherwin-14))
- Integration tests no longer clobber existing `.netrc` file
([#806](https://github.com/nsidc/earthaccess/issues/806))
([**@chuckwondo**](https://github.com/chuckwondo))

## [0.10.0] 2024-07-19

62 changes: 39 additions & 23 deletions docs/howto/authenticate.md
@@ -1,74 +1,90 @@
## Authenticate with Earthdata Login
# Authenticate with Earthdata Login

The first step to use NASA Earthdata is to create an account with Earthdata Login, please follow the instructions at [NASA EDL](https://urs.earthdata.nasa.gov/)
The first step to use NASA Earthdata is to create an account with Earthdata
Login, please follow the instructions at
[NASA EDL](https://urs.earthdata.nasa.gov/)

Once registered, earthaccess can use environment variables, a `.netrc` file or interactive input from a user to login with NASA EDL.
Once registered, earthaccess can use environment variables, a `.netrc` file or
interactive input from a user to login with NASA EDL.

If a strategy is not especified, env vars will be used first, then netrc and finally user's input.
If a strategy is not specified, environment variables will be used first, then
a `.netrc` (if found, see below), and finally a user's input.

```py
import earthaccess

auth = earthaccess.login()
```

If you have a .netrc file with your Earthdata Login credentials
If you have a `.netrc` file (see below) with your Earthdata Login credentials,
you can explicitly specify its use:

```py
auth = earthaccess.login(strategy="netrc")
```

If your Earthdata Login credentials are set as environment variables: EARTHDATA_USERNAME, EARTHDATA_PASSWORD
If your Earthdata Login credentials are set as the environment variables
`EARTHDATA_USERNAME` and `EARTHDATA_PASSWORD`, you can explicitly specify their
use:

```py
auth = earthaccess.login(strategy="environment")
```

If you wish to enter your Earthdata Login credentials when prompted with optional persistence to .netrc
If you wish to enter your Earthdata Login credentials when prompted, with
optional persistence to your `.netrc` file (see below), specify the interactive
strategy:

```py
auth = earthaccess.login(strategy="interactive", persist=True)
```

## Authentication

By default, `earthaccess` will automatically look for your EDL account
credentials in two locations:

### **Authentication**
1. A `.netrc` file: By default, this is either `~/_netrc` (on a Windows system)
or `~/.netrc` (on a non-Windows system). On *any* system, you may override
the default location by setting the `NETRC` environment variable to the path
of your desired `.netrc` file.

By default, `earthaccess` with automatically look for your EDL account credentials in two locations:

1. A `~/.netrc` file
**NOTE**: When setting the `NETRC` environment variable, there is no
requirement to use a specific filename. The name `.netrc` is common, but
used throughout documentation primarily for convenience. The only
requirement is that the *contents* of the file adhere to the
[`.netrc` file format](https://www.gnu.org/software/inetutils/manual/html_node/The-_002enetrc-file.html).
2. `EARTHDATA_USERNAME` and `EARTHDATA_PASSWORD` environment variables

If neither of these options are configured, you can authenticate by calling the `earthaccess.login()` method
and manually entering your EDL account credentials.
If neither of these options are configured, you can authenticate by calling the
`earthaccess.login()` method and manually entering your EDL account credentials.

```python
import earthaccess

earthaccess.login()
```

Note you can pass `persist=True` to `earthaccess.login()` to have the EDL account credentials you enter
automatically saved to a `~/.netrc` file for future use.

Note you can pass `persist=True` to `earthaccess.login()` to have the EDL
account credentials you enter automatically saved to your `.netrc` file (see
above) for future use.

Once you are authenticated with NASA EDL you can:

* Get a file from a DAAC using a `fsspec` session.
* Request temporary S3 credentials from a particular DAAC (needed to download or stream data from an S3 bucket in the cloud).
* Request temporary S3 credentials from a particular DAAC (needed to download or
stream data from an S3 bucket in the cloud).
* Use the library to download or stream data directly from S3.
* Regenerate CMR tokens (used for restricted datasets).

## Earthdata User Acceptance Testing (UAT) environment

### Earthdata User Acceptance Testing (UAT) environment

If your EDL account is authorized to access the User Acceptance Testing (UAT) system,
you can set earthaccess to work with its EDL and CMR endpoints
by setting the `system` argument at login, as follows:
If your EDL account is authorized to access the User Acceptance Testing (UAT)
system, you can set earthaccess to work with its EDL and CMR endpoints by
setting the `system` argument at login, as follows:

```python
import earthaccess

earthaccess.login(system=earthaccess.UAT)

```
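
To illustrate the `NETRC` override described in the updated `authenticate.md`, here is a minimal sketch that uses only the Python standard library. The file name `edl-credentials.txt` and the credentials `demo_user` / `demo_pass` are placeholders invented for this example. The host name is taken from the EDL URL linked in the document. The sketch shows that the file name is arbitrary as long as the contents follow the `.netrc` format.

```python
import netrc
import os
import tempfile
from pathlib import Path

# Host taken from the EDL link in the document; the credentials are placeholders.
EDL_HOST = "urs.earthdata.nasa.gov"

with tempfile.TemporaryDirectory() as tmp:
    # Any file name works; only the contents must follow the .netrc format.
    custom_file = Path(tmp) / "edl-credentials.txt"
    custom_file.write_text(
        f"machine {EDL_HOST} login demo_user password demo_pass\n"
    )
    custom_file.chmod(0o600)

    # Point NETRC at the custom file, remembering any previous value.
    previous = os.environ.get("NETRC")
    os.environ["NETRC"] = str(custom_file)
    try:
        # Parse the file the same way any .netrc-aware tool would.
        entry = netrc.netrc(os.environ["NETRC"]).authenticators(EDL_HOST)
        login, password = entry[0], entry[2]
    finally:
        # Restore the environment so the override does not leak.
        if previous is None:
            os.environ.pop("NETRC", None)
        else:
            os.environ["NETRC"] = previous

print(login)
```

Judging by the diff to `earthaccess/auth.py` in this commit, `earthaccess.login(strategy="netrc")` should read credentials from the file named by `NETRC` while the variable is set. That call is left out here because it needs real credentials and network access.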
4 changes: 3 additions & 1 deletion earthaccess/__init__.py
@@ -21,7 +21,7 @@
)
from .auth import Auth
from .kerchunk import consolidate_metadata
from .search import DataCollections, DataGranules
from .search import DataCollection, DataCollections, DataGranule, DataGranules
from .services import DataServices
from .store import Store
from .system import PROD, UAT
@@ -46,7 +46,9 @@
"download",
"auth_environ",
# search.py
"DataGranule",
"DataGranules",
"DataCollection",
"DataCollections",
"DataServices",
# auth.py
53 changes: 42 additions & 11 deletions earthaccess/auth.py
@@ -25,6 +25,24 @@
logger = logging.getLogger(__name__)


def netrc_path() -> Path:
    """Return the path of the `.netrc` file.

    The path may or may not exist.
    See [the `.netrc` file](https://www.gnu.org/software/inetutils/manual/html_node/The-_002enetrc-file.html).

    Returns:
        `Path` of the `NETRC` environment variable, if the value is non-empty;
        otherwise, the path of the platform-specific default location:
        `~/_netrc` on Windows systems, `~/.netrc` on non-Windows systems.
    """
    sys_netrc_name = "_netrc" if platform.system() == "Windows" else ".netrc"
    env_netrc = os.environ.get("NETRC")

    return Path(env_netrc) if env_netrc else Path.home() / sys_netrc_name


class SessionWithHeaderRedirection(requests.Session):
"""Requests removes auth headers if the redirect happens outside the
original req domain.
@@ -104,11 +122,12 @@ def login(
if self.authenticated and (system == self.system):
logger.debug("We are already authenticated with NASA EDL")
return self

if strategy == "interactive":
self._interactive(persist)
if strategy == "netrc":
elif strategy == "netrc":
self._netrc()
if strategy == "environment":
elif strategy == "environment":
self._environment()

return self
@@ -222,25 +241,29 @@ def _interactive(self, persist_credentials: bool = False) -> bool:
if authenticated:
logger.debug("Using user provided credentials for EDL")
if persist_credentials:
logger.info("Persisting credentials to .netrc")
self._persist_user_credentials(username, password)
return authenticated

def _netrc(self) -> bool:
netrc_loc = netrc_path()

try:
my_netrc = Netrc()
my_netrc = Netrc(str(netrc_loc))
except FileNotFoundError as err:
raise FileNotFoundError(f"No .netrc found in {Path.home()}") from err
raise FileNotFoundError(f"No .netrc found at {netrc_loc}") from err
except NetrcParseError as err:
raise NetrcParseError("Unable to parse .netrc") from err
raise NetrcParseError(f"Unable to parse .netrc file {netrc_loc}") from err

if (creds := my_netrc[self.system.edl_hostname]) is None:
return False

username = creds["login"]
password = creds["password"]
authenticated = self._get_credentials(username, password)

if authenticated:
logger.debug("Using .netrc file for EDL")

return authenticated

def _environment(self) -> bool:
@@ -293,33 +316,41 @@ def _find_or_create_token(self, username: str, password: str) -> Any:

def _persist_user_credentials(self, username: str, password: str) -> bool:
# See: https://github.com/sloria/tinynetrc/issues/34

netrc_loc = netrc_path()
logger.info(f"Persisting credentials to {netrc_loc}")

try:
netrc_path = Path().home().joinpath(".netrc")
netrc_path.touch(exist_ok=True)
netrc_path.chmod(0o600)
netrc_loc.touch(exist_ok=True)
netrc_loc.chmod(0o600)
except Exception as e:
logger.error(e)
return False
my_netrc = Netrc(str(netrc_path))

my_netrc = Netrc(str(netrc_loc))
my_netrc[self.system.edl_hostname] = {
"login": username,
"password": password,
}
my_netrc.save()

urs_cookies_path = Path.home() / ".urs_cookies"

if not urs_cookies_path.exists():
urs_cookies_path.write_text("")

# Create and write to .dodsrc file
dodsrc_path = Path.home() / ".dodsrc"

if not dodsrc_path.exists():
dodsrc_contents = (
f"HTTP.COOKIEJAR={urs_cookies_path}\nHTTP.NETRC={netrc_path}"
f"HTTP.COOKIEJAR={urs_cookies_path}\nHTTP.NETRC={netrc_loc}"
)
dodsrc_path.write_text(dodsrc_contents)

if platform.system() == "Windows":
local_dodsrc_path = Path.cwd() / dodsrc_path.name

if not local_dodsrc_path.exists():
shutil.copy2(dodsrc_path, local_dodsrc_path)
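
The commit title mentions using a temporary `.netrc` file for integration tests, but the test changes themselves are not shown on this page. The following is therefore only a sketch of how such isolation might look, not the project's actual test code. `netrc_path` is copied from the diff above so the example runs without `earthaccess` installed. `temporary_netrc` is a hypothetical helper introduced for this illustration.

```python
import os
import platform
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


def netrc_path() -> Path:
    """Standalone copy of the resolution rule shown in the diff above."""
    sys_netrc_name = "_netrc" if platform.system() == "Windows" else ".netrc"
    env_netrc = os.environ.get("NETRC")
    return Path(env_netrc) if env_netrc else Path.home() / sys_netrc_name


@contextmanager
def temporary_netrc() -> Iterator[Path]:
    """Hypothetical helper: point NETRC at a throwaway file, then restore it."""
    previous = os.environ.get("NETRC")
    with tempfile.TemporaryDirectory() as tmp:
        tmp_netrc = Path(tmp) / "netrc"
        os.environ["NETRC"] = str(tmp_netrc)
        try:
            yield tmp_netrc
        finally:
            # Put the environment back exactly as it was found.
            if previous is None:
                os.environ.pop("NETRC", None)
            else:
                os.environ["NETRC"] = previous


with temporary_netrc() as tmp_netrc:
    # Inside the block, credentials would be read from and written to tmp_netrc.
    inside = netrc_path()

# Outside the block, resolution falls back to the user's own setting or default.
after = netrc_path()
```

Because `_netrc()` and `_persist_user_credentials()` both resolve the file through `netrc_path()` in this diff, redirecting `NETRC` this way would keep a test run from touching the user's real `.netrc` file.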

34 changes: 20 additions & 14 deletions earthaccess/kerchunk.py
@@ -1,34 +1,39 @@
from __future__ import annotations

from typing import Optional, Union

import fsspec
import fsspec.utils
import s3fs

import earthaccess


def _get_chunk_metadata(
granule: earthaccess.results.DataGranule,
fs: fsspec.AbstractFileSystem | s3fs.S3FileSystem,
granule: earthaccess.DataGranule,
fs: fsspec.AbstractFileSystem,
) -> list[dict]:
from kerchunk.hdf import SingleHdf5ToZarr

metadata = []
access = "direct" if isinstance(fs, s3fs.S3FileSystem) else "indirect"

for url in granule.data_links(access=access):
with fs.open(url) as inf:
h5chunks = SingleHdf5ToZarr(inf, url)
m = h5chunks.translate()
metadata.append(m)

return metadata


def consolidate_metadata(
granules: list[earthaccess.results.DataGranule],
kerchunk_options: dict | None = None,
granules: list[earthaccess.DataGranule],
kerchunk_options: Optional[dict] = None,
access: str = "direct",
outfile: str | None = None,
storage_options: dict | None = None,
) -> str | dict:
outfile: Optional[str] = None,
storage_options: Optional[dict] = None,
) -> Union[str, dict]:
try:
import dask

@@ -44,15 +49,16 @@ def consolidate_metadata(
fs = earthaccess.get_fsspec_https_session()

# Get metadata for each granule
get_chunk_metadata = dask.delayed(_get_chunk_metadata)
chunks = dask.compute(*[get_chunk_metadata(g, fs) for g in granules])
get_chunk_metadata = dask.delayed(_get_chunk_metadata) # type: ignore
chunks = dask.compute(*[get_chunk_metadata(g, fs) for g in granules]) # type: ignore
chunks = sum(chunks, start=[])

# Get combined metadata object
mzz = MultiZarrToZarr(chunks, **(kerchunk_options or {}))
if outfile is not None:
output = fsspec.utils.stringify_path(outfile)
mzz.translate(outfile, storage_options=storage_options or {})
return output
else:

if outfile is None:
return mzz.translate()

output = fsspec.utils.stringify_path(outfile)
mzz.translate(outfile, storage_options=storage_options or {})
return output
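
The `sum(chunks, start=[])` line in `consolidate_metadata` may look unusual, so here is a small standalone sketch of what it does. The input data is made up for illustration: a tuple of per-granule lists stands in for the result of `dask.compute(...)`, which returns one list of metadata dicts per granule.

```python
# Stand-in for the dask.compute(...) result: one list per granule (made-up data).
per_granule = (
    [{"url": "granule-a/file-1"}],
    [{"url": "granule-b/file-1"}, {"url": "granule-b/file-2"}],
)

# With a list as the start value, sum() concatenates the lists in order,
# giving the single flat list of references that MultiZarrToZarr receives.
flattened = sum(per_granule, start=[])

print(len(flattened))  # 3
```

Passing `start` as a keyword argument requires Python 3.8 or later. For a large number of granules, `itertools.chain.from_iterable` would avoid repeated list copying, though for typical granule counts the difference is unlikely to matter.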
2 changes: 1 addition & 1 deletion scripts/integration-test.sh
@@ -1,7 +1,7 @@
#!/usr/bin/env bash

set -x
pytest tests/integration --cov=earthaccess --cov=tests/integration --cov-report=term-missing ${@} --capture=no --tb=native --log-cli-level=INFO
pytest tests/integration --cov=earthaccess --cov=tests/integration --cov-report=term-missing "${@}" --capture=no --tb=native --log-cli-level=INFO
RET=$?
set +x
