Sage-Bionetworks/fs-synapse

 
 

fs-synapse

A Synapse implementation of the PyFileSystem2 interface.

fs-synapse allows us to leverage the PyFileSystem API to interface with Synapse files, folders, and projects. By learning this API, you can write code that is agnostic to where your files are physically located. This is achieved by referring to Synapse entities using URLs. Commented examples are included below, but more details can be found here.

syn://syn50545516               # Synapse project

syn://syn50557597               # Folder in the above Synapse project
syn://syn50545516/syn50557597   # Same folder, but using a full path
syn://syn50545516/TestSubDir    # Same folder, but referenced by name

syn://syn50555279               # File in the above Synapse project
syn://syn50545516/syn50555279   # Same file, but using a full path
syn://syn50545516/test.txt      # Same file, but referenced by name

syn://syn50545516/ExploratoryTests/report.json      # Nested file
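
To make the URL forms above concrete, here is a minimal sketch that splits a syn:// URL into its segments and labels each one as a Synapse ID or a name. The helper `describe_syn_url` and the `syn<digits>` pattern are assumptions made for illustration, inferred from the examples above; this is not how fs-synapse itself parses URLs.

```python
import re
from urllib.parse import urlsplit

# Assumption: Synapse IDs look like "syn" followed by digits, as in the examples above.
SYNAPSE_ID = re.compile(r"^syn\d+$")


def describe_syn_url(url):
    """Return (segment, kind) pairs for a syn:// URL, where kind is 'id' or 'name'."""
    parts = urlsplit(url)
    if parts.scheme != "syn":
        raise ValueError(f"Not a syn:// URL: {url!r}")
    # urlsplit places the first segment in netloc and the remainder in path.
    segments = [parts.netloc] + parts.path.split("/")
    return [
        (segment, "id" if SYNAPSE_ID.match(segment) else "name")
        for segment in segments
        if segment  # skip empty pieces from leading or doubled slashes
    ]


print(describe_syn_url("syn://syn50545516/TestSubDir"))
# [('syn50545516', 'id'), ('TestSubDir', 'name')]
```

The sketch only classifies segments by their shape; resolving a name to an actual entity still requires a lookup against Synapse.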

Benefits

There are several benefits to using the fs-synapse API over synapseclient.

from fs import open_fs

fs = open_fs("syn://")

Interact with Synapse using a Pythonic interface

This guide provides several code examples for various use cases.

file_url = "syn://syn50555279"

with fs.open(file_url, "a") as fp:
    fp.write("Appending some text to a Synapse file")

Access to several convenience functions

The full list of available functions is listed here.

folder_url = "syn://syn50696438"

fs.makedirs(f"{folder_url}/creating/nested/folders/with/one/operation")

Refer to Synapse files and folders by name

You don't have to track as many Synapse IDs. You only need to care about the top-level projects or folders and refer to subfolders and files by name.

project_url = "syn://syn50545516"

data_url = f"{project_url}/data/raw.csv"
output_url = f"{project_url}/outputs/processed.csv"

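with fs.open(data_url, "r") as data_fp, fs.open(output_url, "a") as output_fp:
    # number_cruncher is a placeholder for your own function (assumed to take and return text)
    results = number_cruncher(data_fp.read())
    output_fp.write(results)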

Write Synapse-agnostic code

Every time you use synapseclient for file and folder operations, you hard-code a dependency on Synapse into your project. Using fs-synapse avoids this hard dependency and makes your code more portable to other file backends (e.g. S3). You can swap in any other file system by using its URL scheme (e.g. s3://). Here's an index of available file systems that you can swap for.
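
As a rough illustration of backend-agnostic code, the sketch below writes a function against nothing more than an `open(path, mode)` method, which PyFilesystem objects provide. The names `SupportsOpen`, `uppercase_copy`, and `DictFS` are hypothetical and introduced only for this example. `DictFS` is a tiny in-memory stand-in so the sketch runs without credentials or extra packages; in real code you would pass the object returned by `open_fs` instead.

```python
import io
from typing import Protocol, TextIO


class SupportsOpen(Protocol):
    """The single method this sketch relies on."""

    def open(self, path: str, mode: str = "r") -> TextIO: ...


def uppercase_copy(filesystem: SupportsOpen, src: str, dst: str) -> None:
    """Read text from src, upper-case it, and write it to dst on the same backend."""
    with filesystem.open(src, "r") as src_fp:
        text = src_fp.read()
    with filesystem.open(dst, "w") as dst_fp:
        dst_fp.write(text.upper())


class _DictFile(io.StringIO):
    """In-memory text file that stores its contents in a dict when closed."""

    def __init__(self, store, path):
        super().__init__()
        self._store = store
        self._path = path

    def close(self):
        if not self.closed:
            self._store[self._path] = self.getvalue()
        super().close()


class DictFS:
    """Hypothetical stand-in backend for demonstration (not part of any library)."""

    def __init__(self):
        self.files = {}

    def open(self, path, mode="r"):
        if mode == "r":
            return io.StringIO(self.files[path])
        if mode == "w":
            return _DictFile(self.files, path)
        raise ValueError(f"Unsupported mode in this sketch: {mode!r}")


backend = DictFS()
backend.files["raw.txt"] = "hello"
uppercase_copy(backend, "raw.txt", "out.txt")
print(backend.files["out.txt"])  # HELLO
```

Because `uppercase_copy` never imports synapseclient or names a specific backend, switching storage becomes a matter of which filesystem object the caller passes in.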

Rely on code covered by integration tests

fs-synapse is covered by its own Synapse integration tests, so you don't have to write them yourself. These tests tend to be slow, so delegating that responsibility to an externally managed package like fs-synapse keeps your test suite fast and focused on what you care about.

In your test code, you can use mem:// or temp:// URLs for faster I/O instead of storing and retrieving files on Synapse (MemoryFS and TempFS).

def test_some_feature_of_your_code():
    # NumberCruncher stands in for your own class that saves output to a file system URL
    output_url = "mem://report.json"
    cruncher = NumberCruncher()
    cruncher.save(output_url)
    assert cruncher.fs.exists(output_url)

PyScaffold

This project has been set up using PyScaffold 4.3. For details and usage information on PyScaffold see https://pyscaffold.org/.

putup --name fs-synapse --markdown --github-actions --pre-commit --license Apache-2.0 fs-synapse