#1140 - List Files #1626

devclinton · 2021-09-07T15:33:13Z

#1140
#1625

TODO add pytest-lazxfixture

TODO add ticket for support PathLink on download to path

TODO Add support for exclude filters to filter assets/files

TODO add ticket to add to local platform. Can be tackled outside of this PR

Depends on https://github.com/InstituteforDiseaseModeling/pyCOMPS/issues/39 for tests

#1652 - COMPS Asset Collection return

Update old comps download command

devclinton · 2021-09-07T15:36:55Z

@shchen-idmod, @ZDu-IDM this is core functionality to mainly replace the download command. It adds new APIs to the platform to allow fetching files. This should be useful in many other situations as well since we get nice pythonic code to iterate files like this

for asset in experiment.list_assets(filters=["*.txt"])
     asset.download_to_path(output_path)
  
for simulation in experiment.simulations:
    for asset in simulation.list_files(filters=["*.txt"])
       asset.download_to_path(output_path)

and

files = experiment.list_children(filters=[".txt"]
for simulation, files in files.items():
    for file in files:
       file.download_to_path(f"{experiment.id}/{simulation.id}/")

Another thing we could do is have users have alternates to analyzers. For example

# get csvs
files = experiment.list_children(filters=[".csv"]
content = []
for simulation, files in files.items():
    for file in files:
       #create data frames(select data)
       content.append(pd.read_csv(file.download_stream().decode('utf-8')))
# aggregate into one file
df = pd.concat(content)
df.to_csv('all.csv')

shchen-idmod · 2021-09-07T17:18:38Z

idmtools_platform_comps/tests/test_experiment_operations.py, test_list_assets used to return 5 files for the experiment, but now only return 1 file. is this new?

devclinton · 2021-09-07T17:47:15Z

idmtools_platform_comps/tests/test_experiment_operations.py, test_list_assets used to return 5 files for the experiment, but now only return 1 file. is this new?

Yes. Before, the assets would be duplicates. For example, an experiment with 5 simulations containing using the asset model1.py would contain 5 copies of model1.py

devclinton · 2021-09-07T17:49:20Z

I still need to update tests for this. Also, a note for you sharon. I tried using fixtures here to see how we could leverage these on COMPS specifically to reuse test data more often. This has some advantages of making some tests cleaner when we want to verify on multiple COMPS environments. I am not tied to it for this, but thought it would be worth exploring as I developed this

shchen-idmod · 2021-09-07T17:50:58Z

idmtools_platform_comps/tests/test_experiment_operations.py, test_list_assets used to return 5 files for the experiment, but now only return 1 file. is this new?

Yes. Before, the assets would be duplicates. For example, an experiment with 5 simulations containing using the asset model1.py would contain 5 copies of model1.py

before it was not 5 copies of model1.py. they are one model1.py and 4 different config.json from 4 simulations.

devclinton · 2021-09-07T18:26:37Z

idmtools_platform_comps/tests/test_experiment_operations.py, test_list_assets used to return 5 files for the experiment, but now only return 1 file. is this new?

Yes. Before, the assets would be duplicates. For example, an experiment with 5 simulations containing using the asset model1.py would contain 5 copies of model1.py

before it was not 5 copies of model1.py. they are one model1.py and 4 different config.json from 4 simulations.

Ah yes, those were incorrectly being listed under list_assets since they are not technically assets, but instead input files. They now show under list_files instead.

shchen-idmod · 2021-09-07T22:57:40Z

idmtools_platform_comps/tests/test_file_listings.py

+    assert EXPECTED_ASSETS_PER_MODEL1_EXPERIMENT == len(experiment_assets)
+    assert experiment_assets[0].filename == "model1.py"
+    assert experiment_assets[0].absolute_path is None
+    assert experiment_assets[0].checksum == UUID('926ee7aa-4f29-bc94-ff99-8c5f83a6d85a')


I got different UUID:
test_file_listings.py:91 (test_list_files_experiment[python_files_and_assets_slurm2])
UUID('8e02e44c-5754-c79c-2b5c-9ffc5278c59c') != UUID('926ee7aa-4f29-bc94-ff99-8c5f83a6d85a')
I am using windows

With files, there is no true UUID. Uuid is a checksum of the object

shchen-idmod · 2021-09-07T23:37:40Z

idmtools_platform_comps/tests/test_experiment_operations.py, test_list_assets used to return 5 files for the experiment, but now only return 1 file. is this new?

Yes. Before, the assets would be duplicates. For example, an experiment with 5 simulations containing using the asset model1.py would contain 5 copies of model1.py

before it was not 5 copies of model1.py. they are one model1.py and 4 different config.json from 4 simulations.

Ah yes, those were incorrectly being listed under list_assets since they are not technically assets, but instead input files. They now show under list_files instead.

So we need to update this testcase

shchen-idmod · 2021-09-07T23:56:53Z

examples/cookbook/download_files_from_comps.py download output.zip file now(used to unziped file) which is OK. but I can not unzip output.zip file(always throw some error)

shchen-idmod · 2021-09-08T00:19:25Z

idmtools_platform_comps/tests/test_download_workitem.py has 2 tests failed:
test_comps_download()
test_inputs_as_id_files()
we can add extract_after_download=True, to DownloadWorkItem call to fix these tests

shchen-idmod · 2021-09-08T19:21:02Z

simulation_operations.py list_files with filters fail as
linked_files = self.platform._simulations.list_files(sim, filters=['*.json']):

        eid = "e046967b-d110-ec11-92df-f0921c167864"
        e_p: Experiment = Experiment.from_id(eid)
        with self.subTest("test_list_files_simulations"):
            for sim in e_p.simulations:
                linked_files = self.platform._simulations.list_files(sim, filters=['*.json'])

with error:

..\idmtools_platform_comps\comps_operations\simulation_operations.py:653: in list_files
    return apply_file_filters(simulation._comps_metadata['output_files'], filters)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

assets = [<Asset: output\result.json from None>, <Asset: config.json from None>, <Asset: stderr.txt from None>, <Asset: stdout.txt from None>]
filters = ['*.json']

    def apply_file_filters(assets: List['Asset'], filters: TFILE_FILTER_TYPE) -> List['Asset']:
        """
        Apply a list of filter functions to a list of assets.
    
        Args:
            assets: Assets to apply functions to
            filters: List of functions
    
        Returns:
            List of filtered assets
        """
        if filters:
            result = []
            for file in assets:
                for file_filter in filters:
>                   if file_filter(file.short_remote_path()):
E                   TypeError: 'str' object is not callable

..\..\idmtools_core\idmtools\utils\filters\asset_filters.py:151: TypeError

shchen-idmod · 2021-09-08T19:45:49Z

experiment_operations.py list_files return wrong files. following 2 cases should return same number of files: but second case return 0 file for all_linked_files = self.platform._experiments.list_files(e_p)

            eid = "e046967b-d110-ec11-92df-f0921c167864"
            e_p: Experiment = Experiment.from_id(eid)       
            with self.subTest("test_list_files_and_filters_experiment"):
                all_linked_files = self.platform._experiments.list_files(e_p, children=True)
                self.assertEqual(16, len(all_linked_files))

                all_linked_files = self.platform._experiments.list_files(e_p)
               self.assertEqual(16, len(all_linked_files))

devclinton · 2021-09-08T21:11:54Z

experiment_operations.py list_files return wrong files. following 2 cases should return same number of files: but second case return 0 file for all_linked_files = self.platform._experiments.list_files(e_p)

            eid = "e046967b-d110-ec11-92df-f0921c167864"
            e_p: Experiment = Experiment.from_id(eid)       
            with self.subTest("test_list_files_and_filters_experiment"):
                all_linked_files = self.platform._experiments.list_files(e_p, children=True)
                self.assertEqual(16, len(all_linked_files))

                all_linked_files = self.platform._experiments.list_files(e_p)
               self.assertEqual(16, len(all_linked_files))

I see a difference in the two calls. One is called to list files for experiment and the other is listing files for experiment with children. The second(without children) will always return 0 files for COMPs since it does not support files(non shared assets) on experiments.

ZDu-IDM · 2021-09-09T16:42:49Z

@devclinton
Some comments for your reference.

1. docstr: simulation or experiment?

File: idmtools_core/idmtools/entities/experiment.py

    def __hash__(self):
        """
        Hash of simulation(id).

        Returns:
            Hash of simulation
        """
        return id(self.uid)

simulation --> experiment?

2. docstr: true --> True

File: idmtools_core/idmtools/entities/experiment.py

    def list_assets(self, children: bool = False, platform: 'IPlatform' = None, filters: TFILE_FILTER_TYPE = None, **kwargs) -> List[Asset]:
        """
        List assets(shared files) for the experiment.

        Args:
            children: When set to true, simulation assets will be loaded as well
            platform: Optional platform to load assets list from
            filters: Filters to apply. These should be a function that takes a str and return true or false
            **kwargs:
        Returns:
            List of assets
        """

children: When set to true, simulation assets will be loaded as well

--->

children: When set to True, simulation assets will be loaded as well

3. DownloadCommand is not used in anywhere

Better to have a test for it.

File: idmtools_core/idmtools/utils/download.py

@dataclass
class DownloadCommand:
    """
    DownloadCommand defines an interface to make downloading and replicating remote experiments locally easy.
    """
    #: List of glob patterns. See https://docs.python.org/3.7/library/glob.html for details on the patterns
    file_patterns: List[str] = field(default_factory=list)
    ......

4. Need to call normalize_filters(...)?

File: idmtools_core/idmtools/utils/filters/asset_filters.py

By definition:

TFILE_FILTER_TYPE = Union[str, List[str], List[Callable[[str], bool]], Callable[[str], bool]]

If filter is type of str or List[str], the following line will fail:

file_filter(file.short_remote_path())

in

def apply_file_filters(assets: List['Asset'], filters: TFILE_FILTER_TYPE) -> List['Asset']:
    if filters:
        result = []
        for file in assets:
            for file_filter in filters:
                if file_filter(file.short_remote_path()):
                    result.append(file)
        return result
    else:
        return assets

Q: maybe we need to do the following first?

filters = normalize_filters(filters)

5. asset pass on all filters or just any one?

File: idmtools_core/idmtools/utils/filters/asset_filters.py

def apply_file_filters(assets: List['Asset'], filters: TFILE_FILTER_TYPE) -> List['Asset']:
    if filters:
        result = []
        for file in assets:
            for file_filter in filters:
                if file_filter(file.short_remote_path()):
                    result.append(file)
        return result
    else:
        return assets

Q: Pass all filters or just one?

If we follow 'any' logic, then

            for file_filter in filters:
                if file_filter(file.short_remote_path()):
                    result.append(file)

-->

            for file_filter in filters:
                if file_filter(file.short_remote_path()):
                    result.append(file)
                    break

Note: adding break to avoid duplicated files added to result.

If we follow 'all' logic, then

            for file_filter in filters:
                if file_filter(file.short_remote_path()):
                    result.append(file)

-->

            if all([file_filter(file.short_remote_path()) for file_filter in filters]):
                result.append(file)

6. What about the case: a.relative_path is None?

File: idmtools_platform_comps/idmtools_platform_comps/comps_operations/asset_collection_operations.py

    def to_entity(self, asset_collection: Union[COMPSAssetCollection, SimulationFile, List[SimulationFile], OutputFileMetadata, List[WorkItemFile]], **kwargs) \
            -> AssetCollection:
         
                ......
                if a.relative_path and a.relative_path == ".":
                    a.relative_path = ""

For the case a.relative_path is None, should we do the same?

           a.relative_path = ""

7. Need to update comment

There is a left over comment: # Send the assets for the experiment

File: idmtools_platform_comps/idmtools_platform_comps/comps_operations/experiment_operations.py

        # Send the assets for the experiment
        ac_id = self.send_assets(experiment)
        e.configuration._asset_collection_id = ac_id
        return e

    # Send the assets for the experiment ---> ? (or just remove the comment)

8. Need to update docstr: Returns

File: idmtools_platform_comps/idmtools_platform_comps/comps_operations/experiment_operations.py

    def send_assets(self, experiment: Experiment, **kwargs) -> UUID:
        """
        Send assets related to the experiment.

        Args:
            experiment: Experiment to send assets for

            **kwargs:

        Returns:
            None
        """

Need to update

        Returns:
            None

9. Missing parent in docstr Args

File: idmtools_platform_comps/idmtools_platform_comps/comps_operations/simulation_operations.py

    def get_asset_collection_from_comps_simulation(self, simulation: COMPSSimulation, parent: Experiment) -> Optional[AssetCollection]:
        """
        Get assets for a comps simulation.

        Args:
            simulation: Experiment to get asset collection for.

        Returns:
            AssetCollection if configuration is set and configuration.asset_collection_id is set.
        """

10. get_experiment_link is not called anywhere

Better to have a usage/test case.

File: idmtools_platform_comps/idmtools_platform_comps/comps_platform.py

    def get_experiment_link(self, experiment: Experiment):
        """
        Get a link to an experiment.

        Args:
            experiment: Experiment to generate link for

        Returns:
            Link to experiment
        """
        return f"{self.endpoint}/#explore/Experiments?filters=Id={experiment.uid}"

shchen-idmod · 2021-09-10T17:08:32Z

this line always fail in debug mode or run 2 platforms
https://github.com/shchen-idmod/idmtools/blob/2e5db55b645d28b06f48d7b4c1670e8effd16209/idmtools_platform_comps/tests/test_experiment_operations.py#L37


..\idmtools_core\idmtools\assets\asset.py:388: in download_to_path
    self.__write_download_generator_to_stream(out)
C:\idmtools_filelist_37\lib\site-packages\backoff\_sync.py:94: in retry
    ret = target(*args, **kwargs)
..\idmtools_core\idmtools\assets\asset.py:350: in __write_download_generator_to_stream
    gen = self.download_generator()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <Asset: model1.py from c:\github\idmtools\idmtools_test\idmtools_test\inputs\python\model1.py>

    def download_generator(self) -> Generator[bytearray, None, None]:
        """
        A Download Generator that returns chunks of bytes from the file.
    
        Returns:
            Generator of bytearray
    
        Raises:
            ValueError - When there is not a download generator hook defined
    
        Notes:
            TODO - Add a custom error with doclink.
        """
        if not self.download_generator_hook:
>           raise ValueError("To be able to download, the Asset needs to be fetched from a platform object")
E           ValueError: To be able to download, the Asset needs to be fetched from a platform object

..\idmtools_core\idmtools\assets\asset.py:321: ValueError

shchen-idmod · 2021-09-16T17:59:52Z

pycomps 2.5.0 fix suite id issue and it is already in prod artifactory. please also update pycomps version.

Update/fix few tests in platform_comps

devclinton added 3 commits September 6, 2021 16:17

#1140 - File list operations implemented for COMPS

1d0442d

#1652 - COMPS Asset Collection return

Commit some slurm diagrams

56058f8

Scaffold of the new download command

934a7e6

Update old comps download command

devclinton requested review from shchen-idmod and ZDu-IDM September 7, 2021 15:33

devclinton added the Core label Sep 7, 2021

shchen-idmod requested changes Sep 7, 2021

View reviewed changes

shchen-idmod added 2 commits September 9, 2021 17:52

fix tests

a575650

fix test_cli.py

2e5db55

fix failed tests

f9a79d7

shchen-idmod and others added 4 commits September 17, 2021 11:10

update test

37645df

change platfrom from Slurm to Slurm2 in comps tests

d1db39c

Merge pull request #1628 from shchen-idmod/fix_test

ef11cde

Update/fix few tests in platform_comps

Slurm updates

9c7079a

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

#1140 - List Files #1626

#1140 - List Files #1626

devclinton commented Sep 7, 2021 •

edited

Loading

devclinton commented Sep 7, 2021 •

edited

Loading

shchen-idmod commented Sep 7, 2021

devclinton commented Sep 7, 2021

devclinton commented Sep 7, 2021 •

edited

Loading

shchen-idmod commented Sep 7, 2021

devclinton commented Sep 7, 2021

shchen-idmod Sep 7, 2021

devclinton Sep 8, 2021

shchen-idmod commented Sep 7, 2021

shchen-idmod commented Sep 7, 2021

shchen-idmod commented Sep 8, 2021

shchen-idmod commented Sep 8, 2021 •

edited

Loading

shchen-idmod commented Sep 8, 2021

devclinton commented Sep 8, 2021

ZDu-IDM commented Sep 9, 2021 •

edited

Loading

shchen-idmod commented Sep 10, 2021 •

edited

Loading

shchen-idmod commented Sep 16, 2021

#1140 - List Files #1626

Are you sure you want to change the base?

#1140 - List Files #1626

Conversation

devclinton commented Sep 7, 2021 • edited Loading

TODO add pytest-lazxfixture

TODO add ticket for support PathLink on download to path

TODO Add support for exclude filters to filter assets/files

TODO add ticket to add to local platform. Can be tackled outside of this PR

devclinton commented Sep 7, 2021 • edited Loading

shchen-idmod commented Sep 7, 2021

devclinton commented Sep 7, 2021

devclinton commented Sep 7, 2021 • edited Loading

shchen-idmod commented Sep 7, 2021

devclinton commented Sep 7, 2021

shchen-idmod Sep 7, 2021

Choose a reason for hiding this comment

devclinton Sep 8, 2021

Choose a reason for hiding this comment

shchen-idmod commented Sep 7, 2021

shchen-idmod commented Sep 7, 2021

shchen-idmod commented Sep 8, 2021

shchen-idmod commented Sep 8, 2021 • edited Loading

shchen-idmod commented Sep 8, 2021

devclinton commented Sep 8, 2021

ZDu-IDM commented Sep 9, 2021 • edited Loading

shchen-idmod commented Sep 10, 2021 • edited Loading

shchen-idmod commented Sep 16, 2021

devclinton commented Sep 7, 2021 •

edited

Loading

devclinton commented Sep 7, 2021 •

edited

Loading

devclinton commented Sep 7, 2021 •

edited

Loading

shchen-idmod commented Sep 8, 2021 •

edited

Loading

ZDu-IDM commented Sep 9, 2021 •

edited

Loading

shchen-idmod commented Sep 10, 2021 •

edited

Loading