Changed notebooks to support backup use case also
kesarwam committed Sep 5, 2024
1 parent dbdd10d commit 85b9c17
Showing 9 changed files with 123 additions and 35 deletions.
2 changes: 1 addition & 1 deletion 00_notebooks/00_index.ipynb
Expand Up @@ -91,7 +91,7 @@
"* [*Labelbox* integration](https://github.com/treeverse/lakeFS-samples/blob/main/01_standalone_examples/labelbox-integration/)\n",
"* [*Kafka* integration](https://github.com/treeverse/lakeFS-samples/blob/main/01_standalone_examples/kafka/)\n",
"* [*Flink* integration](https://github.com/treeverse/lakeFS-samples/blob/main/01_standalone_examples/flink/)\n",
"* [How to **migrate or clone** a repo](https://github.com/treeverse/lakeFS-samples/blob/main/01_standalone_examples/migrate-or-clone-repo/)"
"* [How to **backup, migrate or clone** a repo](https://github.com/treeverse/lakeFS-samples/blob/main/01_standalone_examples/backup-migrate-or-clone-repo/)"
]
},
{
Expand Down
Expand Up @@ -5,9 +5,9 @@
"id": "c4e81663-80e6-4d27-b5d8-788660b20453",
"metadata": {},
"source": [
"# Migrate or clone a lakeFS repository on AWS\n",
"# Backup, migrate or clone a lakeFS repository on AWS\n",
"\n",
"#### Use this notebook if you want to migrate/clone a source repository to a target repository within the same lakeFS environment or in different lakeFS environments"
"#### Use this notebook if you want to backup & restore/migrate/clone a source repository to a target repository within the same lakeFS environment or in different lakeFS environments"
]
},
{
Expand Down Expand Up @@ -323,7 +323,8 @@
"id": "d174ba40-7a8b-428a-94f8-868a2cb5fecc",
"metadata": {},
"source": [
"# Step 2 - Dump Metadata of Source Repository"
"# Step 2 - Dump Metadata of Source Repository\n",
"### IMPORTANT: Shutdown lakeFS services immediately after dumping the metadata so nobody can make any changes in the source repository"
]
},
{
Expand All @@ -341,7 +342,8 @@
"id": "e789b515-2333-4d45-879c-130afbd8ef85",
"metadata": {},
"source": [
"# Step 3 - Copy Data from Source to Target"
"# Step 3 - Copy Data from Source to Target\n",
"### You can restart lakeFS services after copying the data from source to target"
]
},
{
Expand All @@ -362,7 +364,9 @@
"id": "20314a68-533e-45f3-bd09-ac6be17a34cc",
"metadata": {},
"source": [
"## Step 4 - Create Target Bare Repository"
"## Step 4 - Create Target Bare Repository\n",
"\n",
"#### IMPORTANT: For Backup & Restore process, run this step only when you want to restore the repository"
]
},
{
Expand All @@ -380,7 +384,9 @@
"id": "95570b12-b39d-41d7-852c-f09dc4b05bdf",
"metadata": {},
"source": [
"## Step 5 - Restore Metadata to Target Repository"
"## Step 5 - Restore Metadata to Target Repository\n",
"\n",
"#### IMPORTANT: For Backup & Restore process, run this step only when you want to restore the repository"
]
},
{
Expand All @@ -398,7 +404,7 @@
"metadata": {},
"outputs": [],
"source": [
"s3DownloadRefsManifestFileCommand = 'aws s3 cp ' + target_storage_namespace + '/' + source_repo_name + '/_lakefs/refs_manifest.json .'\n",
"s3DownloadRefsManifestFileCommand = 'aws s3 cp ' + target_storage_namespace + '/_lakefs/refs_manifest.json .'\n",
"! $s3DownloadRefsManifestFileCommand"
]
},
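The hunk above moves `refs_manifest.json` from `<target_storage_namespace>/<source_repo_name>/_lakefs/` to `<target_storage_namespace>/_lakefs/`. A minimal sketch of building the same `aws s3 cp` command as an argument list, assuming the storage namespace may or may not end in a slash; the helper names and the bucket name are hypothetical and are not part of the notebook:

```python
import shlex
from typing import List


def refs_manifest_uri(storage_namespace: str) -> str:
    """Return the refs_manifest.json location under a storage namespace."""
    # Strip a trailing slash so the result never contains '//' in the path.
    return storage_namespace.rstrip('/') + '/_lakefs/refs_manifest.json'


def s3_download_command(storage_namespace: str, dest: str = '.') -> List[str]:
    """Build an 'aws s3 cp' argument list that downloads the manifest."""
    return ['aws', 's3', 'cp', refs_manifest_uri(storage_namespace), dest]


# Hypothetical namespace, used only for illustration.
cmd = s3_download_command('s3://example-bucket/target-repo/')
print(shlex.join(cmd))
```

An argument list can be passed to `subprocess.run(cmd, check=True)` directly, which avoids shell quoting problems if a namespace ever contains spaces.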
Expand Down
Expand Up @@ -5,9 +5,9 @@
"id": "c4e81663-80e6-4d27-b5d8-788660b20453",
"metadata": {},
"source": [
"# Migrate or clone a lakeFS repository on Azure\n",
"# Backup, migrate or clone a lakeFS repository on AWS\n",
"\n",
"#### Use this notebook if you want to migrate/clone a source repository to a target repository within the same lakeFS environment or in different lakeFS environments"
"#### Use this notebook if you want to backup & restore/migrate/clone a source repository to a target repository within the same lakeFS environment or in different lakeFS environments"
]
},
{
Expand Down Expand Up @@ -44,7 +44,8 @@
"from lakefs_sdk.client import LakeFSClient\n",
"import random\n",
"import os\n",
"import datetime"
"import datetime\n",
"import json"
]
},
{
Expand Down Expand Up @@ -281,7 +282,8 @@
"source": [
"for branchList in sourceRepo.branches():\n",
" for diff in sourceRepo.branch(branchList.id).uncommitted():\n",
" print('Branch with uncommitted data: ' + branchList.id)"
" print('Branch with uncommitted data: ' + branchList.id)\n",
" break"
]
},
{
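The added `break` makes the loop report each branch once instead of once per uncommitted object. The same "does at least one item exist" check can be sketched with `any()`, which also stops at the first item. `FakeBranch` below is a hypothetical stand-in so the example runs without a lakeFS server; it only mimics the `id` attribute and `uncommitted()` call used in the notebook:

```python
class FakeBranch:
    """Hypothetical stand-in for a branch object (not the lakeFS SDK class)."""

    def __init__(self, branch_id, changes):
        self.id = branch_id
        self._changes = changes

    def uncommitted(self):
        # Yield changes lazily, the way a paginated listing would.
        yield from self._changes


def branches_with_uncommitted(branches):
    """Return the ids of branches that have at least one uncommitted change."""
    # any() consumes only the first item of each generator, like the 'break'.
    return [b.id for b in branches if any(True for _ in b.uncommitted())]


branches = [
    FakeBranch('main', []),
    FakeBranch('dev', ['a.parquet', 'b.parquet', 'c.parquet']),
]
for branch_id in branches_with_uncommitted(branches):
    print('Branch with uncommitted data: ' + branch_id)
```

Either form avoids walking the full list of uncommitted objects on a branch that has many changes.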
Expand All @@ -305,15 +307,17 @@
"for branchList in sourceRepo.branches():\n",
" for diff in sourceRepo.branch(branchList.id).uncommitted():\n",
" ref = sourceRepo.branch(branchList.id).commit(message='Committed changes during the migration of the repository')\n",
" print(ref.get_commit())"
" print(ref.get_commit())\n",
" break"
]
},
{
"cell_type": "markdown",
"id": "998b31bd-87d8-4e5d-9fa9-42c3f2bf920e",
"metadata": {},
"source": [
"# Step 2 - Dump Metadata of Source Repository"
"# Step 2 - Dump Metadata of Source Repository\n",
"### IMPORTANT: Shutdown lakeFS services immediately after dumping the metadata so nobody can make any changes in the source repository"
]
},
{
Expand All @@ -331,7 +335,8 @@
"id": "e789b515-2333-4d45-879c-130afbd8ef85",
"metadata": {},
"source": [
"# Step 3 - Copy Data from Source to Target"
"# Step 3 - Copy Data from Source to Target\n",
"### You can restart lakeFS services after copying the data from source to target"
]
},
{
Expand All @@ -352,7 +357,9 @@
"id": "20314a68-533e-45f3-bd09-ac6be17a34cc",
"metadata": {},
"source": [
"## Step 4 - Create Target Bare Repository"
"## Step 4 - Create Target Bare Repository\n",
"\n",
"#### IMPORTANT: For Backup & Restore process, run this step only when you want to restore the repository"
]
},
{
Expand All @@ -370,17 +377,51 @@
"id": "95570b12-b39d-41d7-852c-f09dc4b05bdf",
"metadata": {},
"source": [
"## Step 5 - Restore Metadata to Target Repository"
"## Step 5 - Restore Metadata to Target Repository\n",
"\n",
"#### IMPORTANT: For Backup & Restore process, run this step only when you want to restore the repository"
]
},
{
"cell_type": "markdown",
"id": "bbd2617d-8605-43f6-b8cd-b9d6c5cb039c",
"metadata": {},
"source": [
"### Download metadata(refs_manifest.json) file created by \"Step 2\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d61ff81e-3809-4351-b695-0d42a7de74dd",
"metadata": {},
"outputs": [],
"source": [
"azureDownloadRefsManifestFileCommand = \"azcopy copy '\" + target_storage_namespace + \"/_lakefs/refs_manifest.json?\" + target_container_SAS_token + \"' .\"\n",
"\n",
"! $azureDownloadRefsManifestFileCommand"
]
},
{
"cell_type": "markdown",
"id": "b520223e-8c7b-495b-b611-5a2ecd0f32a4",
"metadata": {},
"source": [
"### Read refs_manifest.json file and restore metadata to new repository"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cef6e90a-fd2d-42ff-b1ae-053a619ec560",
"id": "44e850ef-3126-4aec-9000-3a915892c329",
"metadata": {},
"outputs": [],
"source": [
"target_lakefs_sdk_client.internal_api.restore_refs(target_repo_name, source_lakefs_sdk_client.internal_api.dump_refs(source_repo_name))"
"with open('./refs_manifest.json') as file:\n",
" refs_manifest_json = json.load(file)\n",
" print(refs_manifest_json)\n",
" \n",
"target_lakefs_sdk_client.internal_api.restore_refs(target_repo_name, refs_manifest_json)"
]
},
{
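The new restore cell reads `refs_manifest.json` from disk instead of calling `dump_refs` on the source, so a restore no longer needs the source lakeFS to be reachable. A minimal sketch of loading the file with a basic sanity check before handing it to `restore_refs`; the helper name is hypothetical, and the key written to the temporary file is a placeholder, not the real manifest schema:

```python
import json
import os
import tempfile


def load_refs_manifest(path):
    """Load a refs manifest and fail early if it is not a non-empty JSON object."""
    with open(path) as file:
        manifest = json.load(file)
    if not isinstance(manifest, dict) or not manifest:
        raise ValueError('Unexpected refs manifest content in ' + path)
    return manifest


# Write a placeholder manifest to a temporary directory, for illustration only.
with tempfile.TemporaryDirectory() as tmp_dir:
    manifest_path = os.path.join(tmp_dir, 'refs_manifest.json')
    with open(manifest_path, 'w') as file:
        json.dump({'example_key': 'example_value'}, file)
    manifest = load_refs_manifest(manifest_path)

print(manifest)
```

Failing before the API call gives a clearer error when the download step produced an empty or truncated file.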
Expand Down
Expand Up @@ -321,7 +321,8 @@
"id": "89dcf103-090d-43a2-bb60-080a0dbe828d",
"metadata": {},
"source": [
"# Step 2 - Dump Metadata of Source Repository"
"# Step 2 - Dump Metadata of Source Repository\n",
"### IMPORTANT: Shutdown lakeFS services immediately after dumping the metadata so nobody can make any changes in the source repository"
]
},
{
Expand Down Expand Up @@ -349,7 +350,9 @@
"source": [
"#### You can directly copy data from local storage to target storage on your own\n",
"#### or you can run following printed command on your local machine to copy data from local Docker container to local machine first\n",
"#### (change the Docker container name for lakeFS and go to the folder where you cloned lakefs-samples Git repo before running the command)"
"#### (change the Docker container name for lakeFS and go to the folder where you cloned lakefs-samples Git repo before running the command)\n",
"\n",
"#### You can restart lakeFS services after copying the data from source to target"
]
},
{
Expand Down
Expand Up @@ -44,7 +44,8 @@
"from lakefs_sdk.client import LakeFSClient\n",
"import random\n",
"import os\n",
"import datetime"
"import datetime\n",
"import json"
]
},
{
Expand Down Expand Up @@ -267,7 +268,8 @@
"source": [
"for branchList in sourceRepo.branches():\n",
" for diff in sourceRepo.branch(branchList.id).uncommitted():\n",
" print('Branch with uncommitted data: ' + branchList.id)"
" print('Branch with uncommitted data: ' + branchList.id)\n",
" break"
]
},
{
Expand All @@ -291,15 +293,17 @@
"for branchList in sourceRepo.branches():\n",
" for diff in sourceRepo.branch(branchList.id).uncommitted():\n",
" ref = sourceRepo.branch(branchList.id).commit(message='Committed changes during the migration of the repository')\n",
" print(ref.get_commit())"
" print(ref.get_commit())\n",
" break"
]
},
{
"cell_type": "markdown",
"id": "dc52914e-6ee2-4bf6-99b4-4e254ad38133",
"metadata": {},
"source": [
"# Step 2 - Dump Metadata of Source Repository"
"# Step 2 - Dump Metadata of Source Repository\n",
"### IMPORTANT: Shutdown lakeFS services immediately after dumping the metadata so nobody can make any changes in the source repository"
]
},
{
Expand Down Expand Up @@ -327,7 +331,9 @@
"source": [
"#### You can directly copy data from local storage to target storage container on your own\n",
"#### or you can run following printed command on your local machine to copy data from local Docker container to local machine first\n",
"#### (change the Docker container name for lakeFS and go to the folder where you cloned lakefs-samples Git repo before running the command)"
"#### (change the Docker container name for lakeFS and go to the folder where you cloned lakefs-samples Git repo before running the command)\n",
"\n",
"#### You can restart lakeFS services after copying the data from source to target"
]
},
{
Expand Down Expand Up @@ -389,14 +395,46 @@
"## Step 5 - Restore Metadata to Target Repository"
]
},
{
"cell_type": "markdown",
"id": "b2ae630f-9440-4bba-9977-1d26f147fdb1",
"metadata": {},
"source": [
"### Download metadata(refs_manifest.json) file created by \"Step 2\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cef6e90a-fd2d-42ff-b1ae-053a619ec560",
"id": "af086889-2e99-4c55-98b4-16d564e74fc4",
"metadata": {},
"outputs": [],
"source": [
"target_lakefs_sdk_client.internal_api.restore_refs(target_repo_name, source_lakefs_sdk_client.internal_api.dump_refs(source_repo_name))"
"azureDownloadRefsManifestFileCommand = \"azcopy copy '\" + target_storage_namespace + \"/_lakefs/refs_manifest.json?\" + target_container_SAS_token + \"' .\"\n",
"\n",
"! $azureDownloadRefsManifestFileCommand"
]
},
{
"cell_type": "markdown",
"id": "0b18c28a-3f62-4a20-a6a7-57c514e654b7",
"metadata": {},
"source": [
"### Read refs_manifest.json file and restore metadata to new repository"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9bf070b2-47a4-4c7e-95b8-5d83e36d674d",
"metadata": {},
"outputs": [],
"source": [
"with open('./refs_manifest.json') as file:\n",
" refs_manifest_json = json.load(file)\n",
" print(refs_manifest_json)\n",
" \n",
"target_lakefs_sdk_client.internal_api.restore_refs(target_repo_name, refs_manifest_json)"
]
},
{
Expand Down
@@ -1,8 +1,8 @@
# Migrate or clone a lakeFS repository
# Backup, migrate or clone a lakeFS repository

Start by ⭐️ starring [lakeFS open source](https://go.lakefs.io/oreilly-course) project.

This repository includes a Jupyter Notebook which you can run on your local machine.
This repository includes Jupyter Notebooks which you can run on your local machine.

## Prerequisites
* Docker installed on your local machine
Expand All @@ -15,19 +15,19 @@ This repository includes a Jupyter Notebook which you can run on your local mach
1. Start by cloning this repository:

```bash
git clone https://github.com/treeverse/lakeFS-samples && cd lakeFS-samples/01_standalone_examples/migrate-or-clone-repo
git clone https://github.com/treeverse/lakeFS-samples && cd lakeFS-samples/01_standalone_examples/backup-migrate-or-clone-repo
```

2. Run following commands to download and run Docker container which includes Python, Spark, Jupyter Notebook, JDK, Hadoop binaries and lakeFS Python client (Docker image size is around 4GB):

```bash
docker build -t lakefs-migrate-or-clone-repo .
docker build -t lakefs-backup-migrate-or-clone-repo .

docker run -d -p 48888:8888 -p 44040:4040 --user root -e GRANT_SUDO=yes -v $PWD:/home/jovyan -v $PWD/jupyter_notebook_config.py:/home/jovyan/.jupyter/jupyter_notebook_config.py --name lakefs-migrate-or-clone-repo lakefs-migrate-or-clone-repo
docker run -d -p 48888:8888 -p 44040:4040 --user root -e GRANT_SUDO=yes -v $PWD:/home/jovyan -v $PWD/jupyter_notebook_config.py:/home/jovyan/.jupyter/jupyter_notebook_config.py --name lakefs-backup-migrate-or-clone-repo lakefs-backup-migrate-or-clone-repo
```

3. Open JupyterLab UI [http://127.0.0.1:48888/](http://127.0.0.1:48888/) in your web browser.

## Demo Instructions

Once you have successfully completed setup then open either "Migrate or Clone AWS Repo", "Migrate or Clone Azure Repo", "Migrate or Clone Local Repo to AWS" or "Migrate or Clone Local Repo to Azure" notebook (depending upon your requirement) from JupyterLab UI and follow the instructions.
Once you have successfully completed setup then open either "Backup Migrate or Clone AWS Repo", "Backup Migrate or Clone Azure Repo", "Migrate or Clone Local Repo to AWS" or "Migrate or Clone Local Repo to Azure" notebook (depending upon your requirement) from JupyterLab UI and follow the instructions.
2 changes: 1 addition & 1 deletion README.md
Expand Up @@ -95,7 +95,7 @@ Under the [standalone_examples](./01_standalone_examples/) folder are a set of e
* [Labelbox integration](./01_standalone_examples/labelbox-integration/)
* [Kafka integration](./01_standalone_examples/kafka/)
* [Flink integration](./01_standalone_examples/flink/)
* [How to migrate or clone a repo](./01_standalone_examples/migrate-or-clone-repo/)
* [How to backup, migrate or clone a repo](./01_standalone_examples/backup-migrate-or-clone-repo/)
* [Running lakeFS with PostgreSQL as K/V store](./01_standalone_examples/docker-compose-with-postgres/)

## Got Questions or Want to Chat?