docs2 (#995)
* docs2

* version update

* docs update

* docs update
Scitator authored Nov 12, 2020
1 parent dd7be23 commit 8b1e3f9
Showing 7 changed files with 380 additions and 31 deletions.
2 changes: 1 addition & 1 deletion catalyst/__version__.py
@@ -1 +1 @@
__version__ = "20.10.1"
__version__ = "20.11"
151 changes: 149 additions & 2 deletions docs/faq/amp.rst
@@ -1,8 +1,155 @@
Mixed precision training
==============================================================================
Catalyst supports a variety of backends for mixed precision training.
For PyTorch versions below 1.6, it's better to use the ``Nvidia Apex`` extension.
Since the PyTorch 1.6 release, it's also possible to use AMP natively inside the ``torch`` package.

- How to use Nvidia Apex?
- How to use torch.amp?
Suppose you have the following pipeline with Linear Regression:

.. code-block:: python
import torch
from torch.utils.data import DataLoader, TensorDataset
from catalyst.dl import SupervisedRunner
# data
num_samples, num_features = int(1e4), int(1e1)
X, y = torch.rand(num_samples, num_features), torch.rand(num_samples)
dataset = TensorDataset(X, y)
loader = DataLoader(dataset, batch_size=32, num_workers=1)
loaders = {"train": loader, "valid": loader}
# model, criterion, optimizer, scheduler
model = torch.nn.Linear(num_features, 1)
criterion = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters())
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, [3, 6])
# model training
runner = SupervisedRunner()
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    scheduler=scheduler,
    loaders=loaders,
    logdir="./logdir",
    num_epochs=8,
    verbose=True,
)
Nvidia Apex
----------------------------------------------------
To use Nvidia Apex FP16 support, you first need to install it:

.. code-block:: bash
!git clone https://github.com/NVIDIA/apex
!pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./apex
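After installation, a quick sanity check (a minimal sketch, assuming a CUDA-capable environment) is to import the AMP module directly:

.. code-block:: python
# this import fails if the Apex extensions were not built correctly
from apex import amp
print("Nvidia Apex AMP is available")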
After that, you can extend the current pipeline with just one line of code:

.. code-block:: python
import torch
from torch.utils.data import DataLoader, TensorDataset
from catalyst.dl import SupervisedRunner
# data
num_samples, num_features = int(1e4), int(1e1)
X, y = torch.rand(num_samples, num_features), torch.rand(num_samples)
dataset = TensorDataset(X, y)
loader = DataLoader(dataset, batch_size=32, num_workers=1)
loaders = {"train": loader, "valid": loader}
# model, criterion, optimizer, scheduler
model = torch.nn.Linear(num_features, 1)
criterion = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters())
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, [3, 6])
# model training
runner = SupervisedRunner()
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    scheduler=scheduler,
    loaders=loaders,
    logdir="./logdir",
    num_epochs=8,
    verbose=True,
    fp16=dict(apex=True, opt_level="O1"),  # <-- Nvidia Apex FP16 params -->
)
You could also check out the example above in `this Google Colab notebook`_
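For reference, ``fp16=dict(apex=True, opt_level="O1")`` roughly corresponds to what plain Apex does by hand; a minimal sketch of the underlying Apex API, reusing the names from the pipeline above and assuming everything lives on GPU (shown for orientation only, not Catalyst code):

.. code-block:: python
from apex import amp
# wrap the model and optimizer once, before the training loop
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")
for features, targets in loader:
    optimizer.zero_grad()
    loss = criterion(model(features), targets)
    # scale the loss so that fp16 gradients do not underflow
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
    optimizer.step()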

Torch AMP
----------------------------------------------------
If you would like to use native AMP support, you could do the following:

.. code-block:: python
import torch
from torch.utils.data import DataLoader, TensorDataset
from catalyst.dl import SupervisedRunner
# data
num_samples, num_features = int(1e4), int(1e1)
X, y = torch.rand(num_samples, num_features), torch.rand(num_samples)
dataset = TensorDataset(X, y)
loader = DataLoader(dataset, batch_size=32, num_workers=1)
loaders = {"train": loader, "valid": loader}
# model, criterion, optimizer, scheduler
model = torch.nn.Linear(num_features, 1)
criterion = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters())
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, [3, 6])
# model training
runner = SupervisedRunner()
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    scheduler=scheduler,
    loaders=loaders,
    logdir="./logdir",
    num_epochs=8,
    verbose=True,
    fp16=dict(amp=True),  # <-- PyTorch AMP FP16 params -->
)
You could also check out the example above in `this Google Colab notebook`_
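Similarly, ``fp16=dict(amp=True)`` relies on the native ``torch.cuda.amp`` primitives; a minimal sketch of the equivalent manual loop (plain PyTorch 1.6+, reusing the names from the pipeline above):

.. code-block:: python
from torch.cuda.amp import GradScaler, autocast
scaler = GradScaler()
for features, targets in loader:
    optimizer.zero_grad()
    # run the forward pass and loss computation in mixed precision
    with autocast():
        loss = criterion(model(features), targets)
    # scale the loss, backpropagate, then unscale and update
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()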

.. _`this Google Colab notebook`: https://colab.research.google.com/drive/12ONaj4sMPiOT_64wh2bpH_AvRCuNFxLx?usp=sharing

Nvidia Apex (Config API)
----------------------------------------------------

First, prepare the config, for example:

.. code-block:: yaml
distributed_params:
    opt_level: "O1"
    ...
After that, you can easily run:

.. code-block:: bash
catalyst-dl run -C=/path/to/configs --apex
Torch AMP (Config API)
----------------------------------------------------

For native AMP support, you only need to pass the required flag to the ``run`` command:

.. code-block:: bash
catalyst-dl run -C=/path/to/configs --amp
If you haven't found the answer to your question, feel free to `join our slack`_ for a discussion.

75 changes: 69 additions & 6 deletions docs/faq/checkpointing.rst
@@ -1,11 +1,74 @@
[WIP] Model checkpointing
Model checkpointing
==============================================================================

- how to load the best model?
- notebook and config api
- how to save model?
- how to load model?
- what's the difference between checkpoint and checkpoint_full?
Experiment checkpoints
----------------------------------------------------
With the help of ``CheckpointCallback``,
Catalyst creates the following checkpoint structure under the selected ``logdir``:

.. code-block:: bash
logdir/
    code/  <-- your experiment and catalyst code, saved for reproducibility -->
    checkpoints/  <-- the checkpoints this section is about -->
        {stage_name}.{epoch_index}.pth  <-- top-K checkpoints, based on the model selection logic -->
        best.pth  <-- best model, based on the specified model selection logic -->
        last.pth  <-- last model checkpoint of the whole experiment run -->
        <-- the same checkpoints with the ``_full`` suffix -->
        ...
These checkpoints are pure PyTorch checkpoints without any mixins, with the following structure:

.. code-block:: python
checkpoint.pth = {
    "model_state_dict": model.state_dict(),
    "criterion_state_dict": criterion.state_dict(),
    "optimizer_state_dict": optimizer.state_dict(),
    "scheduler_state_dict": scheduler.state_dict(),
}
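Because these are plain ``torch`` checkpoints, you can inspect them with ``torch.load`` alone; a small sketch (the path below is just an example from the layout above):

.. code-block:: python
import torch
checkpoint = torch.load("logdir/checkpoints/best.pth", map_location="cpu")
# prints the stored state dict names, e.g. "model_state_dict"
print(list(checkpoint.keys()))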
Full checkpoints
----------------------------------------------------
Catalyst saves 2 types of checkpoints:

- ``{checkpoint}.pth`` - stores only the model state dict and can easily be used for production purposes.
- ``{checkpoint}_full.pth`` - stores all the state dicts for the model(s), criterion(s), optimizer(s) and scheduler(s) and is handy for experiment analysis purposes; see the sketch below.
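In practice, a production-style load only needs the plain checkpoint, while resuming or analysing an experiment benefits from the ``_full`` one; a minimal sketch, assuming the ``logdir`` layout above and an already-constructed ``model`` and ``optimizer``:

.. code-block:: python
import torch
# inference: the plain checkpoint is enough
checkpoint = torch.load("logdir/checkpoints/best.pth", map_location="cpu")
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()
# analysis / resuming: the full checkpoint also carries optimizer and scheduler state
checkpoint_full = torch.load("logdir/checkpoints/best_full.pth", map_location="cpu")
optimizer.load_state_dict(checkpoint_full["optimizer_state_dict"])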

Save model
----------------------------------------------------
Catalyst has user-friendly utils to save the model:

.. code-block:: python
from catalyst import utils
model = Net()
checkpoint = utils.pack_checkpoint(model=model)
utils.save_checkpoint(checkpoint, logdir="/path/to/logdir", suffix="my_checkpoint")
# now you can find your checkpoint at "/path/to/logdir/my_checkpoint.pth"
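If you also want to store the criterion, optimizer, or scheduler state, the same utils accept them as keyword arguments (a sketch mirroring the loading example below):

.. code-block:: python
from catalyst import utils
checkpoint = utils.pack_checkpoint(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    scheduler=scheduler,
)
utils.save_checkpoint(checkpoint, logdir="/path/to/logdir", suffix="my_full_checkpoint")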
Load model
----------------------------------------------------
With Catalyst utils it's very easy to load models after an experiment run:

.. code-block:: python
from catalyst import utils
model = Net()
optimizer = ...
criterion = ...
checkpoint = utils.load_checkpoint(path="/path/to/checkpoint")
utils.unpack_checkpoint(
    checkpoint=checkpoint,
    model=model,
    optimizer=optimizer,
    criterion=criterion,
)
In this case, Catalyst will try to unpack the requested state dicts from the checkpoint.
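If you only need the model for inference, a lighter variant (a sketch based on the same utils) unpacks just the model state and switches it to eval mode:

.. code-block:: python
from catalyst import utils
model = Net()
checkpoint = utils.load_checkpoint(path="/path/to/checkpoint")
utils.unpack_checkpoint(checkpoint=checkpoint, model=model)
model.eval()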


If you haven't found the answer to your question, feel free to `join our slack`_ for a discussion.

143 changes: 140 additions & 3 deletions docs/faq/ddp.rst
@@ -1,8 +1,145 @@
[WIP] Distributed training
Distributed training
==============================================================================
Catalyst supports automatic experiment scaling through distributed training.

Notebook API
----------------------------------------------------

Suppose you have the following pipeline with Linear Regression:

.. code-block:: python
import torch
from torch.utils.data import DataLoader, TensorDataset
from catalyst.dl import SupervisedRunner
# experiment setup
logdir = "./logdir"
num_epochs = 8
# data
num_samples, num_features = int(1e4), int(1e1)
X, y = torch.rand(num_samples, num_features), torch.rand(num_samples)
dataset = TensorDataset(X, y)
loader = DataLoader(dataset, batch_size=32, num_workers=1)
loaders = {"train": loader, "valid": loader}
# model, criterion, optimizer, scheduler
model = torch.nn.Linear(num_features, 1)
criterion = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters())
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, [3, 6])
# model training
runner = SupervisedRunner()
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    scheduler=scheduler,
    loaders=loaders,
    logdir=logdir,
    num_epochs=num_epochs,
    verbose=True,
)
For correct DDP training, you need to split dataset creation from the main training code.
This way, Catalyst can transfer your datasets to distributed mode,
and there will be no data re-creation on each worker.

A best-practice scenario for this case:

.. code-block:: python
import torch
from torch.utils.data import TensorDataset
from catalyst.dl import SupervisedRunner, utils
def datasets_fn(num_features: int):
    # datasets are created inside the function,
    # so each distributed worker builds them on its own
    X = torch.rand(int(1e4), num_features)
    y = torch.rand(X.shape[0])
    dataset = TensorDataset(X, y)
    return {"train": dataset, "valid": dataset}

def train():
    num_features = int(1e1)
    # model, criterion, optimizer, scheduler
    model = torch.nn.Linear(num_features, 1)
    criterion = torch.nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters())
    scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, [3, 6])
    runner = SupervisedRunner()
    runner.train(
        model=model,
        datasets={
            "batch_size": 32,
            "num_workers": 1,
            "get_datasets_fn": datasets_fn,
            "num_features": num_features,
        },
        criterion=criterion,
        optimizer=optimizer,
        scheduler=scheduler,
        logdir="./logs/example_3",
        num_epochs=8,
        verbose=True,
        distributed=False,
    )

utils.distributed_cmd_run(train)
Config API
----------------------------------------------------
To run Catalyst experiments in DDP mode,
the only thing you need to do with the Config API is pass the required flag to the ``run`` command:

.. code-block:: bash
catalyst-dl run -C=/path/to/configs --distributed
Launch your training
----------------------------------------------------

In your terminal,
type the following line (adapt ``script_name`` to your script name ending with ``.py``).

.. code-block:: bash
python {script_name}
You can restrict the visible GPUs with the ``CUDA_VISIBLE_DEVICES`` variable, for example,

.. code-block:: bash
# run only on GPUs 1 and 2
CUDA_VISIBLE_DEVICES="1,2" python {script_name}
.. code-block:: bash
# run only on GPUs 0, 1 and 3
CUDA_VISIBLE_DEVICES="0,1,3" python {script_name}
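To double-check how many devices a process will actually see with such a setting, you can query PyTorch directly (a small sanity check, not required for training):

.. code-block:: python
import torch
# with CUDA_VISIBLE_DEVICES="0,1,3" set, this prints 3
print(torch.cuda.device_count())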
What will happen is that the same model will be copied onto all of your available GPUs.
During training, the full dataset will be randomly split between the GPUs
(and the split will change at each epoch).
Each GPU will grab a batch (from its fraction of the dataset),
pass it through the model, compute the loss, and back-propagate the gradients.
Then the GPUs will share their results and average them,
which means your training is equivalent to training
with a batch size of ``batch_size x num_gpus``
(where ``batch_size`` is what you used in your script);
for example, with ``batch_size=32`` and 4 GPUs the effective batch size is 128.

Since they all have the same gradients at this stage,
they will all perform the same update,
so the models will still be identical after this step.
Training then continues with the next batch,
until the desired number of iterations is reached.

During training, Catalyst will automatically average all metrics
and log them on the ``Master`` node only. The same logic is used for model checkpointing.

- How to run experiments in distributed mode?
- (?) How to collect metrics in distributed mode in the right way?

If you haven't found the answer to your question, feel free to `join our slack`_ for a discussion.
