Backport paperspace notebook changes #473

Merged 5 commits on Aug 3, 2023
22 changes: 20 additions & 2 deletions notebooks/deberta-blog-notebook.ipynb
@@ -16,7 +16,7 @@
"\n",
"Hardware requirements: This notebook runs each DeBERTa Base model on two IPUs. If correctly configured, both models could be served simultaneously on an IPU POD4.\n",
"\n",
"[![Run on Gradient](images/gradient-badge.svg)](https://console.paperspace.com/github/<runtime-repo>?machine=Free-IPU-POD4&container=<dockerhub-image>&file=<path-to-file-in-repo>) [![Join our Slack Community](https://img.shields.io/badge/Slack-Join%20Graphcore's%20Community-blue?style=flat-square&logo=slack)](https://www.graphcore.ai/join-community)"
"[![Join our Slack Community](https://img.shields.io/badge/Slack-Join%20Graphcore's%20Community-blue?style=flat-square&logo=slack)](https://www.graphcore.ai/join-community)"
]
},
{
@@ -52,6 +52,22 @@
"This method is demonstrated in this notebook, as Hugging Face does not natively support the MNLI inference task."
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "f62fc6b6",
"metadata": {},
"source": [
"In order to improve usability and support for future users, Graphcore would like to collect information about the\n",
"applications and code being run in this notebook. The following information will be anonymised before being sent to Graphcore:\n",
"\n",
"- User progression through the notebook\n",
"- Notebook details: number of cells, code being run and the output of the cells\n",
"- Environment details\n",
"\n",
"You can disable logging at any time by running `%unload_ext graphcore_cloud_tools.notebook_logging.gc_logger` from any cell."
]
},
{
"attachments": {},
"cell_type": "markdown",
@@ -69,7 +85,9 @@
"metadata": {},
"outputs": [],
"source": [
"%pip install \"optimum-graphcore>=0.6, <0.7\""
"%pip install \"optimum-graphcore==0.7\"\n",
"%pip install graphcore-cloud-tools[logger]@git+https://github.com/graphcore/graphcore-cloud-tools\n",
"%load_ext graphcore_cloud_tools.notebook_logging.gc_logger"
]
},
{
102 changes: 70 additions & 32 deletions notebooks/external_model.ipynb
@@ -25,6 +25,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -41,7 +42,7 @@
"\n",
"The best way to run this demo is on Paperspace Gradient's cloud IPUs because everything is already set up for you.\n",
"\n",
"[![Run on Gradient](images/gradient-badge.svg)](https://ipu.dev/3xwTmHM)\n",
"[![Run on Gradient](https://assets.paperspace.io/img/gradient-badge.svg)](https://ipu.dev/3xwTmHM)\n",
"\n",
"To run the demo using other IPU hardware, you need to have the Poplar SDK enabled. Refer to the [Getting Started guide](https://docs.graphcore.ai/en/latest/getting-started.html#getting-started) for your system for details on how to enable the Poplar SDK. Also refer to the [Jupyter Quick Start guide](https://docs.graphcore.ai/projects/jupyter-notebook-quick-start/en/latest/index.html) for how to set up Jupyter to be able to run this notebook on a remote IPU machine."
]
@@ -51,10 +52,20 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Dependencies and configuration"
"## Dependencies and configuration\n",
"\n",
"In order to improve usability and support for future users, Graphcore would like to collect information about the\n",
"applications and code being run in this notebook. The following information will be anonymised before being sent to Graphcore:\n",
"\n",
"- User progression through the notebook\n",
"- Notebook details: number of cells, code being run and the output of the cells\n",
"- Environment details\n",
"\n",
"You can disable logging at any time by running `%unload_ext graphcore_cloud_tools.notebook_logging.gc_logger` from any cell."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -64,33 +75,39 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"%pip install \"optimum-graphcore>=0.5, <0.6\""
"%pip install \"optimum-graphcore==0.7\"\n",
"%pip install graphcore-cloud-tools[logger]@git+https://github.com/graphcore/graphcore-cloud-tools\n",
"%load_ext graphcore_cloud_tools.notebook_logging.gc_logger"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Values for machine size and cache directories can be configured through environment variables or directly in the notebook:"
"The cache directories can be configured through environment variables or directly in the notebook:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"import os\n",
"\n",
"n_ipu = int(os.getenv(\"NUM_AVAILABLE_IPU\", 4))\n",
"executable_cache_dir = os.getenv(\"POPLAR_EXECUTABLE_CACHE_DIR\", \"/tmp/exe_cache/\") + \"/external_model\""
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "1r_n9OWV3l-Q"
Expand All @@ -114,7 +131,8 @@
"execution_count": null,
"metadata": {
"id": "n2ZRs1cL3l-R",
"outputId": "11151c56-be90-4d11-e7df-db85e745ca5c"
"outputId": "11151c56-be90-4d11-e7df-db85e745ca5c",
"tags": []
},
"outputs": [],
"source": [
@@ -146,7 +164,8 @@
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "iAYlS40Z3l-v"
"id": "iAYlS40Z3l-v",
"tags": []
},
"outputs": [],
"source": [
@@ -170,7 +189,8 @@
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "lS2m25YM3l-z"
"id": "lS2m25YM3l-z",
"tags": []
},
"outputs": [],
"source": [
@@ -194,7 +214,8 @@
"metadata": {
"id": "NVAO0H8u3l-3",
"outputId": "30d88b8a-e353-4e13-f709-8e5e06ef747b",
"scrolled": true
"scrolled": true,
"tags": []
},
"outputs": [],
"source": [
@@ -215,7 +236,8 @@
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "DVHs5aCA3l-_"
"id": "DVHs5aCA3l-_",
"tags": []
},
"outputs": [],
"source": [
@@ -236,7 +258,8 @@
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "iaAJy5Hu3l_B"
"id": "iaAJy5Hu3l_B",
"tags": []
},
"outputs": [],
"source": [
@@ -272,7 +295,8 @@
"metadata": {
"id": "gXUSfBrq3l_C",
"outputId": "34e55885-3d8f-4f05-cbdb-706ce56a25f8",
"scrolled": true
"scrolled": true,
"tags": []
},
"outputs": [],
"source": [
@@ -285,6 +309,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -294,7 +319,9 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"import torch\n",
@@ -333,7 +360,7 @@
"\n",
" def _generate_square_subsequent_mask(self, sz):\n",
" mask = (torch.triu(torch.ones(sz, sz)) == 1).transpose(0, 1)\n",
" mask = mask.float().masked_fill(mask == 0, -10000.0).masked_fill(mask == 1, float(0.0))\n",
" mask = mask.half().masked_fill(mask == 0, -10000.0).masked_fill(mask == 1, float(0.0))\n",
" return mask\n",
"\n",
" def init_weights(self):\n",
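The hunk above switches the additive attention mask from `float()` to `half()`, keeping `-10000.0` rather than `-inf` as the masked value: -10000 is exactly representable in float16 (whose maximum is about 65504) and is still negative enough to drive the softmax weight to zero. A pure-Python sketch of the causal mask values this produces:

```python
# Pure-Python sketch of the causal ("square subsequent") additive mask
# built by _generate_square_subsequent_mask above. Position i may
# attend to positions j <= i; later positions get a large negative
# bias. -10000.0 is used instead of -inf because it is exactly
# representable in float16, which matters once the model runs in half
# precision on the IPU.

def square_subsequent_mask(sz):
    return [[0.0 if j <= i else -10000.0 for j in range(sz)]
            for i in range(sz)]

square_subsequent_mask(3)
# [[0.0, -10000.0, -10000.0],
#  [0.0,     0.0,  -10000.0],
#  [0.0,     0.0,      0.0]]
```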
@@ -373,16 +400,17 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"import poptorch\n",
"from optimum.graphcore.modeling_utils import PipelineMixin, get_layer_ipu, recomputation_checkpoint, register, tied_weight_model\n",
"from optimum.graphcore.modeling_utils import PipelineMixin, get_layer_ipu, recomputation_checkpoint, register\n",
"from optimum.utils import logging\n",
"logger = logging.get_logger(__name__)\n",
"\n",
"\n",
"@tied_weight_model()\n",
"class IPUTransformerModel(TransformerModel, PipelineMixin):\n",
" def parallelize(self):\n",
" super().parallelize()\n",
@@ -391,7 +419,7 @@
" self.word_embeddings = poptorch.BeginBlock(self.word_embeddings, \"word_embeddings\", ipu_id=0)\n",
" self.position_embeddings = poptorch.BeginBlock(self.position_embeddings, \"position_embeddings\", ipu_id=0)\n",
"\n",
" layer_ipu = get_layer_ipu(self.ipu_config.layers_per_ipu, self.transformer_encoder.layers)\n",
" layer_ipu = get_layer_ipu(self.ipu_config, self.transformer_encoder.layers)\n",
" for index, layer in enumerate(self.transformer_encoder.layers):\n",
" if self.ipu_config.recompute_checkpoint_every_layer:\n",
" # Put checkpoints on every encoder layer\n",
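The call above changes from `get_layer_ipu(self.ipu_config.layers_per_ipu, ...)` to `get_layer_ipu(self.ipu_config, ...)`, i.e. the helper now takes the whole config plus the layer list. The mapping it returns is, in essence, an expansion of per-IPU layer counts into a per-layer IPU id. The helper below is an illustrative re-implementation of that idea, not the real optimum-graphcore function:

```python
# Illustrative expansion of a `layers_per_ipu` list (e.g. [2, 2, 2, 2])
# into a per-layer IPU id, the kind of mapping get_layer_ipu computes.
# Not the real optimum-graphcore implementation.

def layer_to_ipu(layers_per_ipu, num_layers):
    mapping = []
    for ipu_id, count in enumerate(layers_per_ipu):
        mapping.extend([ipu_id] * count)
    if len(mapping) < num_layers:
        raise ValueError("layers_per_ipu does not cover all layers")
    return mapping[:num_layers]

layer_to_ipu([2, 2, 2, 2], 8)  # -> [0, 0, 1, 1, 2, 2, 3, 3]
```

With this mapping in hand, `parallelize()` can wrap each encoder layer in a `poptorch.BeginBlock` pinned to its assigned IPU, as the loop above does.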
@@ -433,7 +461,9 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"model = IPUTransformerModel(\n",
@@ -459,7 +489,9 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from optimum.graphcore import IPUConfig, IPUTrainer, IPUTrainingArguments\n",
@@ -473,8 +505,6 @@
" \"enable_half_partials\": True,\n",
" \"device_iterations\": 1, \n",
" \"inference_device_iterations\": 5,\n",
" \"replication_factor\": {\"pod4\": 1, \"pod8\": 2, \"pod16\": 4, \"pod32\": 8, \"pod64\": 16, \"default\": 1},\n",
" \"inference_replication_factor\": {\"pod4\": 1, \"pod8\": 2, \"pod16\": 4, \"pod32\": 8, \"pod64\": 16, \"default\": 1},\n",
" \"gradient_accumulation_steps\": 512,\n",
" \"executable_cache_dir\": executable_cache_dir,\n",
" \"ipus_per_replica\": 4,\n",
@@ -496,7 +526,8 @@
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "YbSwEhQ63l_L"
"id": "YbSwEhQ63l_L",
"tags": []
},
"outputs": [],
"source": [
@@ -510,7 +541,7 @@
" per_device_train_batch_size=micro_batch_size,\n",
" per_device_eval_batch_size=micro_batch_size,\n",
" gradient_accumulation_steps=gradient_accumulation_steps,\n",
" n_ipu=n_ipu,\n",
" n_ipu=4,\n",
" num_train_epochs=10,\n",
" loss_scaling=16384,\n",
" warmup_ratio=0.1,\n",
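With `n_ipu` hard-coded to 4 and `ipus_per_replica: 4` in the IPUConfig, the pod-size-keyed `replication_factor` dictionaries removed earlier in this diff become unnecessary: the replica count can be derived directly from the IPU count. A back-of-envelope sketch of that arithmetic (the helper names and the micro batch size of 1 are illustrative, not optimum-graphcore APIs):

```python
# Back-of-envelope data-parallel arithmetic implied by the config:
# replicas = available IPUs / IPUs per pipeline replica, and one
# optimizer step consumes micro_batch * grad_accum * replicas samples.
# Helper names are illustrative, not part of optimum-graphcore.

def num_replicas(n_ipu, ipus_per_replica):
    if n_ipu % ipus_per_replica:
        raise ValueError("n_ipu must be a multiple of ipus_per_replica")
    return n_ipu // ipus_per_replica

def samples_per_weight_update(micro_batch, grad_accum, n_ipu, ipus_per_replica):
    return micro_batch * grad_accum * num_replicas(n_ipu, ipus_per_replica)

num_replicas(4, 4)                       # -> 1 replica on a POD4
samples_per_weight_update(1, 512, 4, 4)  # -> 512
```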
@@ -535,7 +566,8 @@
"execution_count": null,
"metadata": {
"id": "OEuqwIra3l_N",
"scrolled": true
"scrolled": true,
"tags": []
},
"outputs": [],
"source": [
@@ -549,6 +581,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "6Vvz34Td3l_O"
@@ -562,14 +595,16 @@
"execution_count": null,
"metadata": {
"id": "NyZvu_MF3l_P",
"outputId": "b69d0931-7f1f-4f2d-fdb8-09d37c7418bb"
"outputId": "b69d0931-7f1f-4f2d-fdb8-09d37c7418bb",
"tags": []
},
"outputs": [],
"source": [
"trainer.train()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "3APq-vUc3l_R"
@@ -583,7 +618,8 @@
"execution_count": null,
"metadata": {
"id": "diKZnB1I3l_R",
"outputId": "9b3ac725-0117-4830-f380-a555ee57c8cf"
"outputId": "9b3ac725-0117-4830-f380-a555ee57c8cf",
"tags": []
},
"outputs": [],
"source": [
@@ -605,7 +641,9 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"trainer.train(resume_from_checkpoint='mymodel-wikitext2/checkpoint-500')"
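The call above hard-codes `checkpoint-500`, which assumes you know the step number of the last save. A small helper that scans the output directory for the newest `checkpoint-<step>` folder instead (illustrative; `transformers.trainer_utils.get_last_checkpoint` provides similar behaviour in the real library):

```python
# Illustrative helper: find the newest checkpoint-<step> directory
# under a Trainer output directory, rather than hard-coding the step.
import os
import re

_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")

def latest_checkpoint(output_dir):
    steps = []
    for name in os.listdir(output_dir):
        m = _CHECKPOINT_RE.match(name)
        if m and os.path.isdir(os.path.join(output_dir, name)):
            steps.append(int(m.group(1)))
    if not steps:
        return None  # no checkpoints saved yet
    return os.path.join(output_dir, f"checkpoint-{max(steps)}")
```

Usage would then be `trainer.train(resume_from_checkpoint=latest_checkpoint('mymodel-wikitext2'))`.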
@@ -628,7 +666,7 @@
"provenance": []
},
"kernelspec": {
"display_name": "Python 3.8.10 64-bit",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -651,5 +689,5 @@
}
},
"nbformat": 4,
"nbformat_minor": 1
"nbformat_minor": 4
}