dbconnection.md

DbConnection class

The DbConnection class abstracts connectivity to SAP Datasphere, including data query, data fetch, and data load, thereby reducing complexity and efficiently sourcing data for hyperscaler machine learning environments with just a couple of lines of code.

Prerequisites:

A config.json file present in the main path.

If using fedml-databricks: the SAP Datasphere config stored securely as a Databricks secret.

SAP Datasphere config.json

{

    "address":  <The IP address or host name of the database instance. Required. String>,
    "port": <The port number of the database instance. Required>,
    "user": <The database user. Required>,
    "password": <The database user's password. Required>,
    "schema": <The SAP Datasphere cloud Space Schema. Optional>,
    "encrypt": <"true". Denotes an encrypted connection>,
    "sslValidateCertificate": <"false". Specifies whether to validate the server's certificate>,
    "disableCloudRedirect": <"true". Specifies if there should be a tenant redirection for a cloud instance>,
    "communicationTimeout": <"0". A value of 0 disables any communication timeouts>,
    "autocommit": <"true". Sets auto commit to true for the database connection>,
    "sslUseDefaultTrustStore": <"true". Denotes the use of the client's default trust store>
}
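As a sketch, the config.json file can also be generated programmatically. All values below are placeholders; replace them with the connection details of your own SAP Datasphere database user:

```python
import json

# Placeholder values -- substitute the details from your SAP Datasphere
# database user (see "Setting up your SAP Datasphere config.json" below).
config = {
    "address": "somehost.hanacloud.ondemand.com",
    "port": "443",
    "user": "FEDMLTEST#DBUSER",
    "password": "<your-password>",
    "schema": "FEDMLTEST",
    "encrypt": "true",
    "sslValidateCertificate": "false",
    "disableCloudRedirect": "true",
    "communicationTimeout": "0",
    "autocommit": "true",
    "sslUseDefaultTrustStore": "true",
}

# Write the config to the main path so DbConnection can find it.
with open("config.json", "w") as f:
    json.dump(config, f, indent=4)
```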

Setting up your SAP Datasphere config.json

1. Navigate to Space Management inside SAP Datasphere.

2. Navigate to Database Users.

3. Create a user with Read and Write privileges if you don't already have one.

4. Click on the i (info) icon.

Here you will find the following information for your SAP Datasphere config.json:

  • Database User Name --> user
  • Space Schema --> schema
  • Host Name --> address
  • Port --> port
  • Password --> password

5. If you're using fedml-databricks: store the SAP Datasphere config as a Databricks secret, then use that secret in your notebook to connect to SAP Datasphere.
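A minimal sketch of this flow, assuming the secret stores the config as a JSON string. The scope and key names below are placeholders for your own setup, and dbutils is only available inside a Databricks notebook:

```python
import json

def config_from_secret(secret_value: str) -> dict:
    """Parse a Databricks secret that stores the SAP Datasphere config as JSON."""
    return json.loads(secret_value)

# Inside a Databricks notebook you would fetch the secret like this
# (scope and key names are placeholders):
#   secret_value = dbutils.secrets.get(scope="fedml", key="datasphere-config")
#   db = DbConnection(dict_obj=config_from_secret(secret_value))

# Illustrative stand-in for the secret's value:
sample_secret = '{"address": "somehost", "port": "443", "user": "U", "password": "P"}'
config = config_from_secret(sample_secret)
```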

Constructor

DbConnection(url=None, dict_obj=None)
Parameters:
'url' (str): The path to the SAP Datasphere config file (config.json).
'dict_obj' (dict): The SAP Datasphere config in the form of a dictionary object.
Examples:
db = DbConnection(url='/dbfs/FileStore/config.json')
db = DbConnection(dict_obj=config_object)

Class Methods

  1. get_schema_views():
    Returns the list of all view names in the space schema.
    Parameters:
    None
    Example:
    db.get_schema_views()

  2. get_table_size('table_name'):
    Returns the number of rows in the given schema object (table or view).
    Parameter:
    'table_name' (str): The name of the table.
    Example:
    db.get_table_size('TITANIC_VIEW')

  3. get_user_tables():
    Returns the list of all the tables in the user schema.
    Parameters:
    None
    Example:
    db.get_user_tables()

  4. get_data_with_headers('table_name','size=1'):
    Returns the data fetched from schema view as a list <list of rows, list of column headers>.
    Parameters:
    'table_name' (str): The name of the table.
    'size' (float): The fraction of rows to fetch from the schema view. For example, size=1 fetches all the rows of the view; size=0.2 fetches 20% of the rows.
    Example:
    db.get_data_with_headers(table_name='IRIS_VIEW', size=1)
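Since the method returns the rows and the column headers as two lists, a common next step is to wrap them in a pandas DataFrame. The values below are illustrative stand-ins for the returned data:

```python
import pandas as pd

# Assuming the two returned lists unpack as shown:
#   rows, headers = db.get_data_with_headers(table_name='IRIS_VIEW', size=1)
# Illustrative stand-ins for the returned values:
rows = [(5.1, 3.5, 'setosa'), (6.2, 2.8, 'virginica')]
headers = ['sepal_length', 'sepal_width', 'species']

# Build a DataFrame with the fetched rows and column names.
df = pd.DataFrame(rows, columns=headers)
```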

  5. get_data_with_headers_pyspark('table_name','size=1'):
    This function is only supported for FedML Databricks and FedML DSP. Returns the data fetched from schema view as a PySpark DataFrame.
    Parameters:
    'table_name' (str): The name of the table.
    'size' (float): The fraction of rows to fetch from the schema view. For example, size=1 fetches all the rows of the view; size=0.2 fetches 20% of the rows.
    Example:
    db.get_data_with_headers_pyspark(table_name='IRIS_VIEW', size=1)

  6. get_view_metadata('view_name'):
    Returns the metadata of the view.
    Parameter:
    'view_name' (str): The view name.
    Example:
    db.get_view_metadata('test_view')

  7. get_table_metadata('table_name'):
    Returns the metadata of the table.
    Parameter:
    'table_name' (str): The table name.
    Example:
    db.get_table_metadata('test_table')

  8. get_view_by_name('view_name'):
    Searches for views whose names are similar to the parameter 'view_name' and returns them.
    Parameter:
    'view_name' (str): The view name.
    Example:
    db.get_view_by_name('test_view')

  9. execute_query('query'):
    Executes the SQL Query and returns the data fetched as a list <list of rows, list of column headers>.
    Parameter:
    'query' (str): The SQL query to execute
    Example:
    db.execute_query('SELECT * FROM \"FEDMLTEST\".\"iris_view\"')
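Because SAP HANA folds unquoted identifiers to uppercase, mixed-case schema or view names must be double-quoted in the query. A small sketch of building such a query with an f-string, using the schema and view names from the example above:

```python
schema, view = "FEDMLTEST", "iris_view"

# Double-quote the identifiers so the mixed-case view name is preserved;
# an f-string with single quotes avoids the backslash escapes shown above.
query = f'SELECT * FROM "{schema}"."{view}"'

# rows, headers = db.execute_query(query)
```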

  10. execute_query_pyspark('query'):
    This function is only supported for FedML Databricks and FedML DSP. Executes the SQL Query and returns the data fetched as a PySpark DataFrame.
    Parameter:
    'query' (str): The SQL query to execute
    Example:
    db.execute_query_pyspark('SELECT * FROM \"FEDMLTEST\".\"iris_view\"')

  11. create_table('query'):
    Creates a table in SAP Datasphere.
    Please note this function will create a default column called INSERTED_AT in the table specified. This column keeps track of the timestamp at which you first inserted data into the table.
    Please note this will create the table in SAP Datasphere in all caps (i.e., providing table1 will result in TABLE1 in SAP Datasphere), so any references to the table after creating it with this function must also be in all caps.
    Parameter:
    'query' (str): The SQL query to create a table.
    Example:
    db.create_table("CREATE TABLE T6 (ID INTEGER PRIMARY KEY, C2 VARCHAR(255))")
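As an illustration, a hypothetical helper (not part of the library) can derive a simple CREATE TABLE statement from a pandas DataFrame, so the table's columns match data you later pass to insert_into_table. The dtype-to-SQL mapping below is a minimal assumption:

```python
import pandas as pd

# Hypothetical dtype-to-SQL mapping; extend as needed for your data.
_SQL_TYPES = {'int64': 'INTEGER', 'float64': 'DOUBLE', 'object': 'VARCHAR(255)'}

def create_table_ddl(table_name: str, df: pd.DataFrame) -> str:
    """Sketch a CREATE TABLE statement whose columns mirror the DataFrame."""
    cols = ', '.join(f'{c} {_SQL_TYPES.get(str(t), "VARCHAR(255)")}'
                     for c, t in df.dtypes.items())
    # Unquoted names end up stored in all caps, matching the note above.
    return f'CREATE TABLE {table_name} ({cols})'

sample_df = pd.DataFrame([[6, 'hey6']], columns=['ID', 'C2'])
ddl = create_table_ddl('T6', sample_df)
# db.create_table(ddl)
```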

  12. drop_table('table_name'):
    Drops table specified.
    Please note this function only deletes the database user's table. If this table was deployed in SAP Datasphere and/or has a view attached, you will need to delete those manually in SAP Datasphere.
    Parameter:
    'table_name' (str): The table name.
    Example:
    db.drop_table('T6')

  13. alter_table('table_name','clause'):
    Alters the table in user schema.
    Parameter:
    'table_name' (str): The table name.
    'clause' (str): The condition for alter.
    Example:
    db.alter_table('test_table','ADD (COL1 VARCHAR(20))')

  14. insert_into_table('table_name', 'table_values'):
    Inserts data into the table specified.
    Please note this function will insert the current timestamp into a column called INSERTED_AT in the table specified.
    Parameter:
    'table_name' (str): The table name.
    'table_values' (Pandas DataFrame): The data to insert.
    Example:
    import pandas as pd
    sample_df = pd.DataFrame([[6,'hey6'],[7,'bye7']], columns=['id', 'c2'])
    db.insert_into_table('T6', sample_df)

  15. delete_from_table('table_name','where_clause=None'):
    Deletes the selected rows from the table.
    Note: Deletes all the rows if 'where_clause' is not specified.
    Parameter:
    'table_name' (str): The table name.
    'where_clause' (str): The where condition for delete.
    Example:
    db.delete_from_table('test_table','col_a=10')

  16. update_table('table_name','set_clause','where_clause=None'):
    Updates the selected rows in the table.
    Note: Updates all the rows if 'where_clause' is not specified.
    Parameter:
    'table_name' (str): The table name.
    'set_clause' (str): The set condition for update.
    'where_clause' (str): The where condition for update.
    Example:
    db.update_table('test_table','col_a=10','col_b=20')
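For intuition, the clause arguments plausibly combine into SQL along these lines (a hypothetical sketch, not the library's actual implementation):

```python
def compose_update(table_name, set_clause, where_clause=None):
    """Sketch of how set_clause/where_clause might combine into an UPDATE."""
    sql = f'UPDATE {table_name} SET {set_clause}'
    if where_clause:
        # Without a where_clause, every row in the table is updated.
        sql += f' WHERE {where_clause}'
    return sql

print(compose_update('test_table', 'col_a=10', 'col_b=20'))
# → UPDATE test_table SET col_a=10 WHERE col_b=20
```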

  17. get_data_with_headers_cudf('table_name','size=1'):
    This function is only supported for FedML DSP. Returns the data fetched from schema view as a GPU DataFrame.
    Parameters:
    'table_name' (str): The name of the table.
    'size' (float): The fraction of rows to fetch from the schema view. For example, size=1 fetches all the rows of the view; size=0.2 fetches 20% of the rows.
    Example:
    db.get_data_with_headers_cudf(table_name='IRIS_VIEW', size=1)

  18. execute_query_cudf('query'):
    This function is only supported for FedML DSP. Executes the SQL Query and returns the data fetched as a GPU DataFrame.
    Parameter:
    'query' (str): The SQL query to execute.
    Example:
    db.execute_query_cudf('SELECT * FROM \"FEDMLTEST\".\"iris_view\"')