Reorganized the command-line interface of the Segy module. Split the CLI interfaces into independent commands making it easier to manage. These changes involved creating new files for command-specific functions and deleting obsolete ones. Updated references to the new CLI structure in documentation as well.
Altay Sansal committed Feb 29, 2024
1 parent c21e319 commit a55e3c9
Showing 10 changed files with 250 additions and 220 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,167 +1,71 @@ | ||
# Command-Line Usage | ||
|
||
## Cloud Connection Strings | ||
## Introduction | ||
|
||
`segy` supports I/O on major cloud service providers. Cloud I/O is | ||
provided by [fsspec](https://filesystem-spec.readthedocs.io/) and its specialized | ||
implementations for: | ||
`segy` comes with a CLI tool to interrogate SEG-Y files, either on disk | ||
or in any remote store. | ||
|
||
- Amazon Web Services (AWS S3) - [s3fs](https://s3fs.readthedocs.io) | ||
- Google Cloud Platform (GCP GCS) - [gcsfs](https://gcsfs.readthedocs.io) | ||
- Microsoft Azure (Datalake Gen2) - [adlfs](https://github.com/fsspec/adlfs) | ||
In the [cli reference] section, you can see all the options. | ||
|
||
Any other file-system supported by `fsspec` (like HTTP or FTP) will also be supported | ||
by `segy`. However, we will focus on the major providers here. | ||
## Command Line Usage | ||
|
||
The protocols that help choose a backend (i.e. `s3://`, `gs://`, or `az://`) can be passed | ||
prepended to the `segy` path. | ||
|
||
The connection string can be passed to the command-line-interface (CLI) using the | ||
`-storage, --storage-options` flag as a JSON string or the Python API with the `storage_options` | ||
keyword argument as a Python dictionary. | ||
|
||
````{warning} | ||
On Windows clients, JSON strings are passed to the CLI with a special escape character. | ||
For instance a JSON string: | ||
```json | ||
{"key": "my_super_private_key", "secret": "my_super_private_secret"} | ||
``` | ||
must be passed with an escape character `\` for inner quotes as: | ||
```shell | ||
"{\"key\": \"my_super_private_key\", \"secret\": \"my_super_private_secret\"}" | ||
``` | ||
whereas, on Linux bash this works just fine: | ||
```shell | ||
'{"key": "my_super_private_key", "secret": "my_super_private_secret"}' | ||
``` | ||
If this is done incorrectly, you will get an invalid JSON string error from the CLI. | ||
```` | ||
|
||
### Amazon Web Services | ||
|
||
Credentials can be automatically fetched from pre-authenticated AWS CLI. | ||
See [here](https://s3fs.readthedocs.io/en/latest/index.html#credentials) for the order `s3fs` | ||
checks them. If it is not pre-authenticated, you need to pass `--storage-options`. | ||
|
||
**Prefix:** | ||
`s3://` | ||
|
||
**Storage Options:** | ||
`key`: The auth key from AWS | ||
`secret`: The auth secret from AWS | ||
|
||
Using UNIX: | ||
|
||
```shell | ||
$ segy \ | ||
--uri s3://bucket/prefix/my.segy \ | ||
--storage-options '{"key": "my_super_private_key", "secret": "my_super_private_secret"}' | ||
``` | ||
|
||
Using Windows (note the extra escape characters `\`): | ||
|
||
```shell | ||
$ segy \ | ||
--uri s3://bucket/prefix/my.segy \ | ||
--storage-options "{\"key\": \"my_super_private_key\", \"secret\": \"my_super_private_secret\"}" | ||
``` | ||
|
||
### Google Cloud Platform | ||
|
||
Credentials can be automatically fetched from pre-authenticated `gcloud` CLI. | ||
See [here](https://gcsfs.readthedocs.io/en/latest/#credentials) for the order `gcsfs` | ||
checks them. If it is not pre-authenticated, you need to pass `--storage-options`. | ||
|
||
GCP uses [service accounts](https://cloud.google.com/iam/docs/service-accounts) to pass | ||
authentication information to APIs. | ||
|
||
**Prefix:** | ||
`gs://` or `gcs://` | ||
|
||
**Storage Options:** | ||
`token`: The service account JSON value as string, or local path to JSON | ||
|
||
Using a service account: | ||
`segy` provides a convenient command-line interface (CLI) for | ||
various tasks. | ||
|
||
```shell | ||
$ segy \ | ||
--uri gs://bucket/prefix/my.segy \ | ||
--storage-options '{"token": "~/.config/gcloud/application_default_credentials.json"}' | ||
``` | ||
For each command or subcommand, you can pass the `--help` argument to | ||
get information about usage. | ||
|
||
Using browser to populate authentication: | ||
At the highest level, the `segy` command line offers various options | ||
to choose from. Below you can see the usage for the main entry point. | ||
|
||
```shell | ||
$ segy \ | ||
--uri gs://bucket/prefix/my.segy \ | ||
--storage-options '{"token": "browser"}' | ||
```{eval-rst} | ||
.. typer:: segy.cli.segy:app | ||
   :prog: segy | ||
   :width: 90 | ||
   :theme: dark | ||
   :preferred: svg | ||
``` | ||
|
||
### Microsoft Azure | ||
|
||
There are various ways to authenticate with Azure Data Lake (ADL). | ||
See [here](https://github.com/fsspec/adlfs#details) for some details. | ||
If ADL is not pre-authenticated, you need to pass `--storage-options`. | ||
### Dumping Data | ||
|
||
**Prefix:** | ||
`az://` or `abfs://` | ||
When we use the `segy dump` subcommand, we have some options to choose from. | ||
As usual, the `uri` (local or remote paths) will allow us to use the same | ||
toolkit for local and cloud / web files. | ||
|
||
**Storage Options:** | ||
`account_name`: Azure Data Lake storage account name | ||
`account_key`: Azure Data Lake storage account access key | ||
|
||
```shell | ||
$ segy \ | ||
--uri az://bucket/prefix/my.segy \ | ||
--storage-options '{"account_name": "myaccount", "account_key": "my_super_private_key"}' | ||
```{eval-rst} | ||
.. typer:: segy.cli.segy:app:dump | ||
   :width: 90 | ||
   :theme: dark | ||
   :preferred: svg | ||
``` | ||
|
||
### Advanced Cloud Features | ||
For instance, we can output a basic summary of the file using the `info` | ||
command. | ||
|
||
There are additional functions provided by `fsspec`. These are advanced features and we refer | ||
the user to read `fsspec` [documentation](https://filesystem-spec.readthedocs.io/en/latest/features.html). | ||
Some useful examples are: | ||
```console | ||
$ segy dump info path/to/seismic.segy | ||
|
||
- Caching Files Locally | ||
- Remote Write Caching | ||
- File Buffering and random access | ||
- Mount anything with FUSE | ||
|
||
````{note} | ||
When combining advanced protocols like `simplecache` and using a remote store like `s3` the | ||
URL can be chained like `simplecache::s3://bucket/prefix/file.segy`. When doing this the | ||
`--storage-options` argument must explicitly state parameters for the cloud backend and the | ||
extra protocol. For the above example it would look like this: | ||
```json | ||
{ | ||
"s3": { | ||
"key": "my_super_private_key", | ||
"secret": "my_super_private_secret" | ||
}, | ||
"simplecache": { | ||
"cache_storage": "/custom/temp/storage/path" | ||
} | ||
"uri": "path/to/seismic.segy", | ||
"segyStandard": 0.0, | ||
"numTraces": 17367161, | ||
"samplesPerTrace": 1501, | ||
"sampleInterval": 4000, | ||
"fileSize": 103416.97395706177 | ||
} | ||
``` | ||
|
||
In one line: | ||
```json | ||
{"s3": {"key": "my_super_private_key", "secret": "my_super_private_secret"}, "simplecache": {"cache_storage": "/custom/temp/storage/path"}} | ||
``` | ||
```` | ||
|
||
## CLI Reference | ||
|
||
`segy` provides a convenient command-line interface (CLI) for | ||
various tasks. | ||
This is how we can get three header fields for a few traces. | ||
|
||
For each command or subcommand, you can pass the `--help` argument to | ||
get information about usage. | ||
```console | ||
$ segy dump trace-header "path/to/seismic.segy" \ | ||
--index 0 --index 5 --index 101 --index 12001 \ | ||
--field trace_seq_line --field src_x --field src_y | ||
trace_seq_line src_x src_y | ||
|
||
```{eval-rst} | ||
.. click:: segy.__main__:main | ||
:prog: segy | ||
:nested: full | ||
trace_index | ||
0 1 41613223 844759437 | ||
5 6 41608435 844763454 | ||
101 102 41516509 844840591 | ||
12001 1896 39801062 846284951 | ||
``` |
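As a rough cross-check of the sample `segy dump info` output in the diff above, the reported `fileSize` can be reproduced from `numTraces` and `samplesPerTrace`. The sketch below is not part of the commit. It assumes 4-byte samples and that `fileSize` is reported in MiB, neither of which is stated in the source; the header sizes are the standard SEG-Y ones (3200-byte textual header, 400-byte binary header, 240-byte trace headers). The function name `expected_size_mib` is hypothetical.

```python
# Values copied from the sample `segy dump info` output in the docs diff.
info = {
    "numTraces": 17367161,
    "samplesPerTrace": 1501,
    "fileSize": 103416.97395706177,
}

# Standard SEG-Y layout sizes in bytes.
TEXT_HEADER_BYTES = 3200
BINARY_HEADER_BYTES = 400
TRACE_HEADER_BYTES = 240

# ASSUMPTION: every sample is stored in 4 bytes (e.g. a 32-bit float).
BYTES_PER_SAMPLE = 4


def expected_size_mib(num_traces: int, samples_per_trace: int) -> float:
    """Estimate the size of a fixed-length-trace SEG-Y file in MiB."""
    trace_bytes = TRACE_HEADER_BYTES + samples_per_trace * BYTES_PER_SAMPLE
    total_bytes = TEXT_HEADER_BYTES + BINARY_HEADER_BYTES + num_traces * trace_bytes
    # ASSUMPTION: the CLI reports fileSize in MiB (bytes / 1024**2).
    return total_bytes / 1024**2


estimate = expected_size_mib(info["numTraces"], info["samplesPerTrace"])
print(f"estimated: {estimate:.5f} MiB, reported: {info['fileSize']:.5f} MiB")
```

Under these assumptions the estimate agrees with the reported value, which suggests (but does not prove) that the sample file uses 4-byte samples and that the summary is expressed in MiB.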
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
"""Command line interface components.""" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,47 @@ | ||
"""Common components for the CLI.""" | ||
|
||
|
||
from __future__ import annotations | ||
|
||
from pathlib import Path | ||
from typing import Annotated | ||
from typing import Optional | ||
from typing import TypeAlias | ||
|
||
import typer | ||
|
||
UriArgument: TypeAlias = Annotated[ | ||
str, typer.Argument(help="Valid URI for loading the SEG-Y file.") | ||
] | ||
|
||
ListOfIntegersOption: TypeAlias = Annotated[ | ||
list[int], typer.Option(help="List of integers.") | ||
] | ||
|
||
ListOfFieldNamesOption: TypeAlias = Annotated[ | ||
Optional[list[str]], typer.Option(default_factory=list, help="List of field names.") | ||
] | ||
|
||
JsonFileOutOption: TypeAlias = Annotated[ | ||
Optional[Path], typer.Option(help="Path for JSON output.") | ||
] | ||
|
||
TextFileOutOption: TypeAlias = Annotated[ | ||
Optional[Path], typer.Option(help="Path for text output.") | ||
] | ||
|
||
|
||
def modify_path( | ||
    path: Path, suffix: str, default_extension: str, delimiter: str = "_" | ||
) -> Path: | ||
    """Modify a path with a suffix appended and ensure default extension is honored.""" | ||
    new_stem = f"{path.stem}{delimiter}{suffix}" | ||
|
||
    if path.suffix:  # If there's an existing extension | ||
        extension = path.suffix | ||
        extension = default_extension if extension != default_extension else extension | ||
        new_name = f"{new_stem}{extension}" | ||
    else:  # If there's no extension | ||
        new_name = f"{new_stem}{default_extension}" | ||
|
||
    return path.with_name(new_name) |
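To see what `modify_path` does in practice, here is a minimal, self-contained sketch that copies the function body from the diff above and runs it on a few inputs. The file names and suffixes (`out/report.txt`, `info`, `hdr`) are hypothetical and chosen only for illustration.

```python
from pathlib import Path


def modify_path(
    path: Path, suffix: str, default_extension: str, delimiter: str = "_"
) -> Path:
    """Copy of the helper from the diff: append a suffix, honor the default extension."""
    new_stem = f"{path.stem}{delimiter}{suffix}"

    if path.suffix:  # If there's an existing extension
        extension = path.suffix
        extension = default_extension if extension != default_extension else extension
        new_name = f"{new_stem}{extension}"
    else:  # If there's no extension
        new_name = f"{new_stem}{default_extension}"

    return path.with_name(new_name)


# A different extension is replaced by the default one.
replaced = modify_path(Path("out/report.txt"), "info", ".json")
# A missing extension is filled in with the default one.
filled = modify_path(Path("out/report"), "info", ".json")
# A matching extension is kept; the delimiter can be customized.
kept = modify_path(Path("report.json"), "hdr", ".json", delimiter="-")

print(replaced, filled, kept, sep="\n")
```

As written, both branches end up using `default_extension`, so the result always carries the default extension regardless of what the input path had. Only the parent directory and the stem of the input path are preserved.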