
Adding XGBoost on GPU quickstart materials #45

Open · wants to merge 4 commits into main

Conversation

sfc-gh-ebotwick

No description provided.

"name": "Intro",
"collapsed": false
},
"source": "# GPU Based XGBoost Training\n## In the following notebook we will leverage Snowpark Container Services (SPCS) to run a notebook within Snowflake on a series of GPUs\n\n### * Workflow* \n- Inspect GPU resources available - for this exercise we will use four NVIDIA A10G GPUs\n- Load in data from Snowflake table\n- Set up data for modeling\n- Train two XGBoost models - one trained with CPUs and one leveraging our GPU cluster\n- Compare runtimes and results of our models\n\n\n### * Key Takeaways* \n- SPCS allows users to run notebook workloads that execute on containers, rather than virtual warehouses in Snowflake\n- GPUs can greatly speed up model training jobs 🔥\n- Bringing in third party python libraries offers flexibility to leverage great contirbutions to the OSS ecosystem\n\n\n### Note - In order to successfully run !pip installs make sure you have enabled the external access integration with pypi\n- Do so by clicking on the drop down of the 🟢 Active kernel settings button, clicking Edit Compute Settings, then turning on the PYPI_ACCESS_INTEGRATION radio button in the external access tab"


PYPI_ACCESS_INTEGRATION is not a preset network integration; it has to be created by an account admin. So it would be better to link to instructions on how that's done.

Author

Done so in setup.sql

"codeCollapsed": false,
"collapsed": false
},
"source": "#Load in data from Snowflake table into a Snowpark dataframe\ntable = \"XGB_GPU_DATABASE.XGB_GPU_SCHEMA.VEHICLES_TABLE\"\ndf = session.table(table)\ndf.count(), len(df.columns)",


XGB_GPU_DATABASE.XGB_GPU_SCHEMA.VEHICLES_TABLE would not exist in a customer account by default.
Could we make the notebook self-contained by adding a cell that downloads the dataset and generates the Snowflake table? It could go in an appendix so it doesn't disrupt the flow.

Author

Done in setup.sql


@sfc-gh-halu sfc-gh-halu left a comment


Looks good overall.
Left two comments.

Comment on lines +11 to +16
CREATE OR REPLACE DATABASE XGB_GPU_DATABASE;
CREATE OR REPLACE SCHEMA XGB_GPU_SCHEMA;

-- create external stage with the csv format to stage the dataset
CREATE STAGE IF NOT EXISTS XGB_GPU_DATABASE.XGB_GPU_SCHEMA.VEHICLES
URL = 's3://sfquickstarts/misc/demos/vehicles.csv';


There's a mix of dataset setup SQL and SPCS/network integration setup here.

Could we consolidate the data preparation steps so they run sequentially (create db/schema, create stage, create table, COPY INTO)? That would make it easier for people to follow and to selectively copy and apply the setup (e.g., someone who already has everything else set up only needs to apply the data prep SQL).
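The consolidation suggested here could be sketched as an ordered list of statements executed one by one via Snowpark's `session.sql(...).collect()`. The list below reuses the SQL shown in the diff; the CREATE TABLE and COPY INTO steps are noted only in a comment since their definitions aren't shown here, and `run_setup` is illustrative rather than part of setup.sql.

```python
# Data-prep statements in dependency order (taken from the diff above;
# the CREATE TABLE / COPY INTO steps would follow but aren't shown here):
SETUP_SQL = [
    "CREATE OR REPLACE DATABASE XGB_GPU_DATABASE",
    "CREATE OR REPLACE SCHEMA XGB_GPU_SCHEMA",
    "CREATE STAGE IF NOT EXISTS XGB_GPU_DATABASE.XGB_GPU_SCHEMA.VEHICLES "
    "URL = 's3://sfquickstarts/misc/demos/vehicles.csv'",
]

def run_setup(session, statements):
    """Execute each statement in order; Snowpark's session.sql returns a
    DataFrame, and collect() triggers execution on the server."""
    for stmt in statements:
        session.sql(stmt).collect()
```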

"name": "model_training_takeaways",
"collapsed": false
},
"source": "## While results aren't entirely determinstic, you should have seen a 3-4x speedup in model training from CPU to GPU training. \n### Investigate in the logs from the two above cells where you see the message *[RayXGBoost] Finished XGBoost training* and look to the end of the line to see the pure training time for that model"


Do we want to keep "Investigate in the logs from the two above cells where you see the message [RayXGBoost] Finished XGBoost training and look to the end of the line to see the pure training time for that model"?
