mltable produces key error when trying to consume sdk v1 dataset type data with provided microsoft consume code #38944

Bartcardi · 2024-12-19T13:18:43Z

Package Name: mltable
Package Version: 1.6.1
Operating System: Ubuntu 20.04
Python Version: 3.10.14

Describe the bug
We are trying to consume a data asset from Azure Machine Learning studio whose type is Table but whose underlying dataset type is Tabular (see the first screenshot under Screenshots). Using the Microsoft-supplied example code for reading this asset into a pandas DataFrame via an MLTable object, we get a KeyError for the missing key 'paths', as shown in the error trace below.

Full error trace

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[3], line 8
      5 ml_client = MLClient.from_config(credential=DefaultAzureCredential())
      6 data_asset = ml_client.data.get("Energie_Aansluitingen_Current_1000", version="1")
----> 8 tbl = mltable.load(f'azureml:/{data_asset.id}')
     10 df = tbl.to_pandas_dataframe()
     11 df

File /anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/azureml/dataprep/api/_loggerfactory.py:279, in track.<locals>.monitor.<locals>.wrapper(*args, **kwargs)
    277 with _LoggerFactory.track_activity(logger, func.__name__, activity_type, custom_dimensions) as activityLogger:
    278     try:
--> 279         return func(*args, **kwargs)
    280     except Exception as e:
    281         if hasattr(activityLogger, ACTIVITY_INFO_KEY) and hasattr(e, ERROR_CODE_KEY):

File /anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/mltable/mltable.py:600, in load(uri, storage_options, ml_client)
    547 @track(_get_logger,activity_type=_PUBLIC_API, custom_dimensions={'app_name': _APP_NAME})
    548 def load(uri, storage_options: dict = None, ml_client= None):
    549     """
    550     Loads the MLTable file (YAML) present at the given uri.
    551 
   (...)
    598     :rtype: mltable.MLTable
    599     """
--> 600     return _load(uri, storage_options, True, ml_client)

File /anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/azureml/dataprep/api/_loggerfactory.py:279, in track.<locals>.monitor.<locals>.wrapper(*args, **kwargs)
    277 with _LoggerFactory.track_activity(logger, func.__name__, activity_type, custom_dimensions) as activityLogger:
    278     try:
--> 279         return func(*args, **kwargs)
    280     except Exception as e:
    281         if hasattr(activityLogger, ACTIVITY_INFO_KEY) and hasattr(e, ERROR_CODE_KEY):

File /anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/mltable/mltable.py:706, in _load(uri, storage_options, enable_validate, ml_client)
    704     return mltable_loaded
    705 except Exception as ex:
--> 706     _reclassify_rslex_error(ex)

File /anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/azureml/dataprep/api/mltable/_validation_and_error_handler.py:90, in _reclassify_rslex_error(err)
     87 if 'ExecutionError(StreamError(PermissionDenied' in err_msg:
     88     raise UserErrorException(
     89         f'Getting permission error please make sure proper access is configured on storage: {err_msg}')
---> 90 raise err

File /anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/mltable/mltable.py:698, in _load(uri, storage_options, enable_validate, ml_client)
    696 # v1 sql dataset doesnt have paths
    697 if og_path_pairs is None:  # may have been set in _load_mltable_from_data_asset_uri
--> 698     mltable_dict, og_path_pairs = _make_all_paths_absolute(mltable_dict, base_path)
    699 mltable_loaded = MLTable._create_from_dict(mltable_yaml_dict=mltable_dict,
    700                                             path_pairs=og_path_pairs,
    701                                             load_uri=load_uri)
    702 mltable_loaded._workspace_context = _parse_workspace_context_from_longform_uri(load_uri)

File /anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/mltable/_utils.py:74, in _make_all_paths_absolute(mltable_yaml_dict, base_path)
     72     mltable_yaml_dict[_PATHS_KEY] = list(map(lambda x: x[1], path_pairs))
     73 else:
---> 74     path_pairs = list(tuple(zip(mltable_yaml_dict[_PATHS_KEY], mltable_yaml_dict[_PATHS_KEY])))
     75 return mltable_yaml_dict, path_pairs

KeyError: 'paths'

To Reproduce

Set up an Azure SQL database datastore in Azure ML studio.
Create a data asset from the datastore using a SQL statement, and make sure it can connect and returns data.
Try to consume the data asset for interactive development using the Microsoft-supplied snippet in the data asset section of Azure ML studio (see the second screenshot).

Expected behavior
I expected to end up with a pandas dataframe.

Screenshots

Screenshot of the data asset in azure ml

Screenshot of the consume code snippet

Screenshot of the error trace

Additional context
We run this code on a Standard_DS12_v2 (4 cores, 28 GB RAM, 56 GB disk) compute instance with:

azure-ai-ml==1.23.0
azure-identity==1.18.0


github-actions · 2024-12-19T21:08:16Z

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @Azure/azure-ml-sdk @azureml-github.

Bartcardi · 2024-12-24T08:10:06Z

By the way, this is what mltable_yaml_dict looks like in our case:

mltable_yaml_dict = {
    "query_source": {
        "handler": "AmlDatastore",
        "query": "SELECT TOP (1000) [Extern_Energie_Aansluitingen_HashKey] ,[EAN] ,[Product] ,[Status] ,[Locatie] ,[Bouwdeel] ,[Adres] ,[Postcode] ,[Plaats] ,[Segment] ,[GTV] ,[GeldigVan] ,[GeldigTm] ,[ETLLoaddate] ,[ETLRecordValidFrom] FROM [history].[tbl_Extern_Energie_Aansluitingen_current]",
        "handler_arguments": {
            "subscription": "<SUBSCRIPTION_ID>",
            "resource_group": "<RESOURCE_GROUP>",
            "workspace_name": "<WORKSPACE_NAME>",
            "datastore_name": "zorgcontrol",
        },
    },
    "transformations": [
        {
            "convert_column_types": [
                {
                    "columns": "Extern_Energie_Aansluitingen_HashKey",
                    "column_type": "string",
                },
                {"columns": "EAN", "column_type": "string"},
                {"columns": "Product", "column_type": "string"},
                {"columns": "Status", "column_type": "string"},
                {"columns": "Locatie", "column_type": "string"},
                {"columns": "Bouwdeel", "column_type": "string"},
                {"columns": "Adres", "column_type": "string"},
                {"columns": "Postcode", "column_type": "string"},
                {"columns": "Plaats", "column_type": "string"},
                {"columns": "Segment", "column_type": "string"},
                {"columns": "GTV", "column_type": "string"},
            ]
        }
    ],
}
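
To make the failure mode concrete, here is a minimal sketch that reproduces the same KeyError without any Azure dependency. The function make_path_pairs is a hypothetical stand-in, not the library's code: it only mirrors the indexing visible at line 74 of _utils.py in the traceback, and the value of _PATHS_KEY is assumed from the error message. The dict is a trimmed copy of the one above.

```python
# Simplified stand-in for the failing branch of mltable._utils._make_all_paths_absolute.
# This is NOT the library's code; it only mirrors the indexing shown in the traceback.
_PATHS_KEY = "paths"  # assumed value, inferred from "KeyError: 'paths'"


def make_path_pairs(mltable_yaml_dict):
    """Pair each entry under 'paths' with itself, as the traceback's line 74 does."""
    paths = mltable_yaml_dict[_PATHS_KEY]  # raises KeyError when 'paths' is absent
    return list(zip(paths, paths))


# Trimmed version of the dict above: it has 'query_source' but no 'paths' key.
sql_asset_dict = {
    "query_source": {"handler": "AmlDatastore", "query": "SELECT ..."},
    "transformations": [],
}

try:
    make_path_pairs(sql_asset_dict)
    error_key = None
except KeyError as err:
    error_key = err.args[0]

print(error_key)  # paths
```

So the SQL-backed asset reaches a code path that assumes a file-based MLTable definition, even though the comment at line 696 of mltable.py notes that a v1 SQL dataset has no paths.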

bhathiya-pilanawithana · 2024-12-25T03:30:24Z

I encountered the same problem. In addition to the points already mentioned, the code snippet given for consuming data with SDK v1 works fine for the same scenario; only the SDK v2 code snippet has the issue (at least in my case).
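
For anyone who needs a workaround in the meantime, a rough sketch of the SDK v1 route looks like the following. It is not the exact snippet from the studio page, and it assumes azureml-core is installed and that a workspace config.json is discoverable (as it usually is on a compute instance). The helper name load_v1_tabular_dataset is made up for illustration.

```python
def load_v1_tabular_dataset(name, version="latest"):
    """Load a registered v1 tabular dataset into pandas via azureml-core (SDK v1)."""
    # Imported lazily so the function can be defined without azureml-core installed.
    from azureml.core import Dataset, Workspace

    # Assumes a workspace config.json can be found from the working directory.
    workspace = Workspace.from_config()
    dataset = Dataset.get_by_name(workspace, name=name, version=version)
    return dataset.to_pandas_dataframe()
```

Calling load_v1_tabular_dataset("Energie_Aansluitingen_Current_1000", version=1) would then correspond to the asset from the original report, assuming it is registered under that name and version in the workspace.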

github-actions bot added the needs-team-attention label (Workflow: This issue needs attention from Azure service team or SDK team) on Dec 19, 2024

achauhan-scc assigned vivram Dec 24, 2024
