Add awesome cli to segy
Reorganized the command-line interface of the Segy module. Split the CLI interfaces into independent commands making it easier to manage. These changes involved creating new files for command-specific functions and deleting obsolete ones. Updated references to the new CLI structure in documentation as well.
Altay Sansal committed Feb 29, 2024
1 parent c21e319 commit a55e3c9
Showing 10 changed files with 250 additions and 220 deletions.
190 changes: 47 additions & 143 deletions docs/cli_usage.md
@@ -1,167 +1,71 @@
# Command-Line Usage

## Cloud Connection Strings
## Introduction

`segy` supports I/O on major cloud service providers. Cloud I/O is provided by
[fsspec](https://filesystem-spec.readthedocs.io/) and its specialized
implementations for:
`segy` comes with a CLI tool to interrogate SEG-Y files, either on disk
or in any remote store.

- Amazon Web Services (AWS S3) - [s3fs](https://s3fs.readthedocs.io)
- Google Cloud Platform (GCP GCS) - [gcsfs](https://gcsfs.readthedocs.io)
- Microsoft Azure (Datalake Gen2) - [adlfs](https://github.com/fsspec/adlfs)
In the [cli reference] section, you can see all the options.

Any other file-system supported by `fsspec` (like HTTP or FTP) will also be supported
by `segy`. However, we will focus on the major providers here.
## Command Line Usage

The protocols that help choose a backend (i.e. `s3://`, `gs://`, or `az://`) can be passed
prepended to the `segy` path.

The connection string can be passed to the command-line-interface (CLI) using the
`-storage, --storage-options` flag as a JSON string or the Python API with the `storage_options`
keyword argument as a Python dictionary.

````{warning}
On Windows clients, JSON strings are passed to the CLI with a special escape character.
For instance a JSON string:
```json
{"key": "my_super_private_key", "secret": "my_super_private_secret"}
```
must be passed with an escape character `\` for inner quotes as:
```shell
"{\"key\": \"my_super_private_key\", \"secret\": \"my_super_private_secret\"}"
```
whereas, on Linux bash this works just fine:
```shell
'{"key": "my_super_private_key", "secret": "my_super_private_secret"}'
```
If this is done incorrectly, you will get an invalid JSON string error from the CLI.
````
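The quoting rules in the warning above can also be reproduced programmatically. The following is a minimal sketch, not part of `segy`: it serializes a credentials dictionary with `json.dumps` and quotes it for a POSIX shell with `shlex.quote`. The `quote_for_windows` helper is a hypothetical name that applies the backslash escaping described above.

```python
import json
import shlex

# Placeholder credentials, matching the examples in this document.
storage_options = {
    "key": "my_super_private_key",
    "secret": "my_super_private_secret",
}

# The CLI expects the options as a JSON string.
json_string = json.dumps(storage_options)


def quote_for_windows(text: str) -> str:
    """Hypothetical helper: escape inner double quotes, then wrap in double quotes."""
    return '"' + text.replace('"', '\\"') + '"'


# Single-quoted form for POSIX shells such as bash.
posix_argument = shlex.quote(json_string)
# Backslash-escaped form as described in the warning for Windows clients.
windows_argument = quote_for_windows(json_string)

print(posix_argument)
print(windows_argument)
```

Building the string from a dictionary avoids hand-written JSON, which is the usual source of the invalid JSON error.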

### Amazon Web Services

Credentials can be fetched automatically from a pre-authenticated AWS CLI.
See [here](https://s3fs.readthedocs.io/en/latest/index.html#credentials) for the order in which `s3fs`
checks them. If the CLI is not pre-authenticated, you need to pass `--storage-options`.

**Prefix:**
`s3://`

**Storage Options:**
`key`: The auth key from AWS
`secret`: The auth secret from AWS

Using UNIX:

```shell
$ segy \
--uri s3://bucket/prefix/my.segy \
--storage-options '{"key": "my_super_private_key", "secret": "my_super_private_secret"}'
```

Using Windows (note the extra escape characters `\`):

```shell
$ segy \
--uri s3://bucket/prefix/my.segy \
--storage-options "{\"key\": \"my_super_private_key\", \"secret\": \"my_super_private_secret\"}"
```

### Google Cloud Platform

Credentials can be fetched automatically from a pre-authenticated `gcloud` CLI.
See [here](https://gcsfs.readthedocs.io/en/latest/#credentials) for the order in which `gcsfs`
checks them. If the CLI is not pre-authenticated, you need to pass `--storage-options`.

GCP uses [service accounts](https://cloud.google.com/iam/docs/service-accounts) to pass
authentication information to APIs.

**Prefix:**
`gs://` or `gcs://`

**Storage Options:**
`token`: The service account JSON value as string, or local path to JSON

Using a service account:
`segy` provides a convenient command-line interface (CLI) for
various tasks.

```shell
$ segy \
--uri gs://bucket/prefix/my.segy \
--storage-options '{"token": "~/.config/gcloud/application_default_credentials.json"}'
```
For each command / subcommand you can pass the `--help` argument to
see usage information.

Using browser to populate authentication:
At the highest level, the `segy` command line offers various options
to choose from. Below you can see the usage for the main entry point.

```shell
$ segy \
--uri gs://bucket/prefix/my.segy \
--storage-options '{"token": "browser"}'
```{eval-rst}
.. typer:: segy.cli.segy:app
:prog: segy
:width: 90
:theme: dark
:preferred: svg
```

### Microsoft Azure

There are various ways to authenticate with Azure Data Lake (ADL).
See [here](https://github.com/fsspec/adlfs#details) for some details.
If ADL is not pre-authenticated, you need to pass `--storage-options`.
### Dumping Data

**Prefix:**
`az://` or `abfs://`
When we use the `segy dump` subcommand, we have several options to choose from.
As usual, the `uri` (a local or remote path) lets us use the same
toolkit for local and cloud / web files.

**Storage Options:**
`account_name`: Azure Data Lake storage account name
`account_key`: Azure Data Lake storage account access key

```shell
$ segy \
--uri az://bucket/prefix/my.segy \
--storage-options '{"account_name": "myaccount", "account_key": "my_super_private_key"}'
```{eval-rst}
.. typer:: segy.cli.segy:app:dump
:width: 90
:theme: dark
:preferred: svg
```

### Advanced Cloud Features
For instance, we can output a basic summary of the file using the `info`
command.

There are additional functions provided by `fsspec`. These are advanced features and we refer
the user to read `fsspec` [documentation](https://filesystem-spec.readthedocs.io/en/latest/features.html).
Some useful examples are:
```console
$ segy dump info path/to/seismic.segy

- Caching Files Locally
- Remote Write Caching
- File Buffering and random access
- Mount anything with FUSE

````{note}
When combining advanced protocols like `simplecache` and using a remote store like `s3` the
URL can be chained like `simplecache::s3://bucket/prefix/file.segy`. When doing this the
`--storage-options` argument must explicitly state parameters for the cloud backend and the
extra protocol. For the above example it would look like this:
```json
{
"s3": {
"key": "my_super_private_key",
"secret": "my_super_private_secret"
},
"simplecache": {
"cache_storage": "/custom/temp/storage/path"
}
"uri": "path/to/seismic.segy",
"segyStandard": 0.0,
"numTraces": 17367161,
"samplesPerTrace": 1501,
"sampleInterval": 4000,
"fileSize": 103416.97395706177
}
```

In one line:
```json
{"s3": {"key": "my_super_private_key", "secret": "my_super_private_secret"}, "simplecache": {"cache_storage": "/custom/temp/storage/path"}}
```
````
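The nested options for a chained URL are easy to get wrong by hand. The sketch below is one way to build them, assuming the values from the note above: it creates one entry per protocol in the chain, checks that none is missing, and emits the one-line JSON form. The check itself is an illustration, not something `segy` or `fsspec` is known to perform this way.

```python
import json

# Chained URL and placeholder values taken from the note above.
chained_uri = "simplecache::s3://bucket/prefix/file.segy"
storage_options = {
    "s3": {
        "key": "my_super_private_key",
        "secret": "my_super_private_secret",
    },
    "simplecache": {"cache_storage": "/custom/temp/storage/path"},
}

# Extract the protocol names from the chain, e.g. ["simplecache", "s3"].
protocols = [part.split("://")[0] for part in chained_uri.split("::")]

# Illustrative check: every protocol in the chain has its own options entry.
missing = [name for name in protocols if name not in storage_options]
if missing:
    raise ValueError(f"No storage options given for: {missing}")

# One-line JSON suitable for passing as the --storage-options value.
one_line = json.dumps(storage_options)
print(one_line)
```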

## CLI Reference

`segy` provides a convenient command-line interface (CLI) for
various tasks.
This is how we can get three header fields for a few traces.

For each command / subcommand you can pass the `--help` argument to
see usage information.
```console
$ segy dump trace-header "path/to/seismic.segy" \
--index 0 --index 5 --index 101 --index 12001 \
--field trace_seq_line --field trace_no_field_rec
trace_seq_line src_x src_y

```{eval-rst}
.. click:: segy.__main__:main
:prog: segy
:nested: full
trace_index
0 1 41613223 844759437
5 6 41608435 844763454
101 102 41516509 844840591
12001 1896 39801062 846284951
```
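Because `--index` and `--field` are repeated once per value, long invocations are easier to assemble in code. The following is a minimal sketch, assuming only the subcommand and flag names shown in the example above; `build_trace_header_command` is a hypothetical helper, not part of `segy`.

```python
import shlex


def build_trace_header_command(uri, indices, fields):
    """Hypothetical helper: assemble the argument list for `segy dump trace-header`."""
    command = ["segy", "dump", "trace-header", uri]
    for index in indices:
        # One --index flag per requested trace.
        command += ["--index", str(index)]
    for field in fields:
        # One --field flag per requested header field.
        command += ["--field", field]
    return command


command = build_trace_header_command(
    "path/to/seismic.segy",
    indices=[0, 5, 101, 12001],
    fields=["trace_seq_line", "trace_no_field_rec"],
)

# Print a shell-safe version of the command for inspection.
print(shlex.join(command))
```

The resulting list can be passed directly to `subprocess.run`, which avoids shell quoting altogether.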
2 changes: 1 addition & 1 deletion docs/conf.py
@@ -19,7 +19,7 @@
"sphinx.ext.autosummary",
"sphinxcontrib.autodoc_pydantic",
"sphinx.ext.autosectionlabel",
"sphinx_click",
"sphinxcontrib.typer",
"sphinx_copybutton",
"myst_nb",
"sphinx_design",
2 changes: 1 addition & 1 deletion docs/requirements.txt
@@ -1,6 +1,6 @@
sphinx==7.2.6
sphinx-design==0.5.0
sphinx-click==5.1.0
sphinxcontrib-typer==0.1.11
sphinx-copybutton==0.5.2
furo==2024.1.29
myst-nb==1.0.0
4 changes: 2 additions & 2 deletions noxfile.py
@@ -198,7 +198,7 @@ def docs_build(session: Session) -> None:
session.install(
"sphinx",
"sphinx-design",
"sphinx-click",
"sphinxcontrib-typer",
"sphinx-copybutton",
"furo",
"myst-nb",
@@ -222,7 +222,7 @@ def docs(session: Session) -> None:
"sphinx",
"sphinx-design",
"sphinx-autobuild",
"sphinx-click",
"sphinxcontrib-typer",
"sphinx-copybutton",
"furo",
"myst-nb",
4 changes: 2 additions & 2 deletions pyproject.toml
@@ -48,7 +48,7 @@ sphinxcontrib-typer = "^0.1.11"
cloud = ["s3fs", "gcsfs", "adlfs"]

[tool.poetry.scripts]
segy = "segy.__main__:app"
segy = "segy.cli.segy:app"

[tool.ruff]
target-version = "py39"
@@ -127,5 +127,5 @@ exclude_also = [
]

[build-system]
requires = ["poetry-core"]
requires = ["poetry-core", "fastentrypoints"]
build-backend = "poetry.core.masonry.api"
71 changes: 0 additions & 71 deletions src/segy/__main__.py

This file was deleted.

1 change: 1 addition & 0 deletions src/segy/cli/__init__.py
@@ -0,0 +1 @@
"""Command line interface components."""
47 changes: 47 additions & 0 deletions src/segy/cli/common.py
@@ -0,0 +1,47 @@
"""Common components for the CLI."""


from __future__ import annotations

from pathlib import Path
from typing import Annotated
from typing import Optional
from typing import TypeAlias

import typer

UriArgument: TypeAlias = Annotated[
    str, typer.Argument(help="Valid URI for loading the SEG-Y file.")
]

ListOfIntegersOption: TypeAlias = Annotated[
    list[int], typer.Option(help="List of integers.")
]

ListOfFieldNamesOption: TypeAlias = Annotated[
    Optional[list[str]], typer.Option(default_factory=list, help="List of field names.")
]

JsonFileOutOption: TypeAlias = Annotated[
    Optional[Path], typer.Option(help="Path for JSON output.")
]

TextFileOutOption: TypeAlias = Annotated[
    Optional[Path], typer.Option(help="Path for text output.")
]
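The aliases above work because `Annotated` attaches metadata to a type without changing the type itself. The sketch below illustrates that mechanism using only the standard library. `HelpText` is a hypothetical stand-in for the `typer.Argument` / `typer.Option` objects, and `info` is a hypothetical command, so this is not how Typer is implemented internally.

```python
from dataclasses import dataclass
from typing import Annotated, get_args, get_type_hints


@dataclass(frozen=True)
class HelpText:
    """Hypothetical stand-in for typer.Argument / typer.Option metadata."""

    help: str


# Same shape as the UriArgument alias, with the stand-in metadata.
UriArgument = Annotated[str, HelpText(help="Valid URI for loading the SEG-Y file.")]


def info(uri: UriArgument) -> None:
    """Hypothetical command that reuses the shared alias."""


# include_extras=True keeps the Annotated wrapper so the metadata can be read.
hints = get_type_hints(info, include_extras=True)
base_type, metadata = get_args(hints["uri"])

print(base_type.__name__, "-", metadata.help)
```

Defining each alias once keeps the help text and types consistent across every command that reuses it.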


def modify_path(
    path: Path, suffix: str, default_extension: str, delimiter: str = "_"
) -> Path:
    """Modify a path with a suffix appended and ensure default extension is honored."""
    new_stem = f"{path.stem}{delimiter}{suffix}"

    if path.suffix:  # If there's an existing extension
        extension = path.suffix
        extension = default_extension if extension != default_extension else extension
        new_name = f"{new_stem}{extension}"
    else:  # If there's no extension
        new_name = f"{new_stem}{default_extension}"

    return path.with_name(new_name)
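A short usage sketch shows what `modify_path` returns for a few inputs. The function is repeated so the example runs on its own, and the file names and suffixes are hypothetical. Note that both branches end up using `default_extension`, so an existing extension is always replaced.

```python
from pathlib import Path


def modify_path(
    path: Path, suffix: str, default_extension: str, delimiter: str = "_"
) -> Path:
    """Copy of the function above so this example is self-contained."""
    new_stem = f"{path.stem}{delimiter}{suffix}"

    if path.suffix:  # Existing extension: replaced by the default one
        extension = path.suffix
        extension = default_extension if extension != default_extension else extension
        new_name = f"{new_stem}{extension}"
    else:  # No extension: the default one is appended
        new_name = f"{new_stem}{default_extension}"

    return path.with_name(new_name)


# Hypothetical inputs covering the three cases.
replaced = modify_path(Path("out/headers.txt"), "trace", ".json")
appended = modify_path(Path("out/headers"), "trace", ".json")
dashed = modify_path(Path("report.json"), "binary", ".json", delimiter="-")

print(replaced.as_posix())  # out/headers_trace.json
print(appended.as_posix())  # out/headers_trace.json
print(dashed.as_posix())    # report-binary.json
```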