remove github submodule #4

Merged: 2 commits, Jul 17, 2024
5 changes: 1 addition & 4 deletions .github/workflows/push-docker-image.yml
@@ -20,10 +20,7 @@ jobs:
# Link to discussion: https://github.com/orgs/community/discussions/25678

- name: Checkout
uses: actions/checkout@v3
with:
submodules: true

uses: actions/checkout@v3
- name: Docker meta
id: meta
uses: crazy-max/ghaction-docker-meta@v2
3 changes: 1 addition & 2 deletions Dockerfile
@@ -35,8 +35,7 @@ RUN echo "export PATH=\"/opt/conda/bin:/root/.cargo/bin:\$PATH\"" >> /root/.bash
# Install Python dependencies (The gradual copies help with caching)
WORKDIR open_diloco
RUN pip install --pre torchdata --index-url https://download.pytorch.org/whl/nightly/cpu
COPY hivemind_source hivemind_source
RUN pip install --no-cache-dir ./hivemind_source
RUN pip install "flash-attn>=2.5.8"
COPY requirements.txt requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
COPY requirements-dev.txt requirements-dev.txt
30 changes: 5 additions & 25 deletions README.md
@@ -30,26 +30,16 @@ source .venv/bin/activate

Install python dependencies:
```bash
# Hivemind
cd hivemind_source
pip install .
cp build/lib/hivemind/proto/* hivemind/proto/.
pip install -e ".[all]"
cd ..
# Requirements
pip install -r requirements.txt
# Others
pip install --pre torchdata --index-url https://download.pytorch.org/whl/nightly/cpu
pip install -e ./pydantic_config
# OpenDiLoCo
pip install .
```

Optionally, you can install flash-attn to use Flash Attention 2.
This requires a CUDA compiler (`nvcc`) to be set up on your system.
```

```bash
# (Optional) flash-attn
pip install flash-attn==2.5.8
pip install "flash-attn>=2.5.8"
```

## Docker container
@@ -305,20 +295,10 @@ We recommend using `bf16` to avoid scaling and desynchronization issues with hiv


# Debugging Issues
1. `hivemind` or `pydantic_config`
If you are having issues with `hivemind` or `pydantic_config`, the issue could be related to submodules.
You can clean and reinitialize the submodules from the root of the repository with the following commands:

```
git submodule deinit -f .
git clean -xdf
git submodule update --init --recursive
```

2. `RuntimeError: CUDA error: invalid device ordinal`
1. `RuntimeError: CUDA error: invalid device ordinal`
A possible culprit is that your `--nproc-per-node` argument for the torchrun launcher is set incorrectly.
Please set it to an integer less than or equal to the number of GPUs on your machine.

3. `torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate...`
2. `torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate...`
A possible culprit is that your `--per-device-train-batch-size` is too high.
Try a smaller value.
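
The `invalid device ordinal` tip above can be checked before launching. The following is a minimal sketch, assuming a hypothetical helper named `safe_nproc_per_node` (it is not part of OpenDiLoCo or torchrun) and assuming PyTorch is installed whenever the GPU count is not passed in explicitly:

```python
from typing import Optional


def safe_nproc_per_node(requested: int, available: Optional[int] = None) -> int:
    """Clamp a requested --nproc-per-node value to the number of visible GPUs.

    `available` is the GPU count; when omitted it is read from PyTorch.
    Hypothetical helper for illustration only, not part of OpenDiLoCo.
    """
    if available is None:
        # Imported lazily so the clamping logic can be used without PyTorch.
        import torch

        available = torch.cuda.device_count()
    if requested < 1:
        raise ValueError("--nproc-per-node must be a positive integer")
    if available < 1:
        raise RuntimeError("no CUDA devices are visible")
    # Never start more worker processes than there are GPUs.
    return min(requested, available)
```

For example, `safe_nproc_per_node(8, available=4)` returns `4`, which could then be passed to the launcher as `--nproc-per-node`.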
1 change: 0 additions & 1 deletion hivemind_source
Submodule hivemind_source deleted from ad080e
1 change: 0 additions & 1 deletion pydantic_config
Submodule pydantic_config deleted from 8e19e0
7 changes: 5 additions & 2 deletions requirements.txt
Original file line number Diff line number Diff line change
@@ -1,6 +1,9 @@
transformers~=4.40
datasets>=2.19.1
wandb==0.16.4
wandb>=0.16.4
cyclopts>=2.6.1
fsspec[gcs]>=2024.3.1
torch==2.3.1
torch>=2.3.1
hivemind @ git+https://github.com/learning-at-home/hivemind.git@213bff9
pydantic_config @ git+https://github.com/samsja/pydantic_config.git@8e19e05
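
The `hivemind` and `pydantic_config` entries replace the removed submodules with pip direct references of the form `name @ git+URL@ref`, each pinned to a commit. Below is a minimal sketch of how such a line breaks down, using a hypothetical regex-based helper `parse_git_requirement`. It assumes plain `https` URLs with no extras, environment markers, or credentials in the URL, and is not how pip itself parses requirements:

```python
import re
from typing import NamedTuple, Optional


class GitRequirement(NamedTuple):
    name: str           # distribution name, e.g. "hivemind"
    url: str            # repository URL without the "git+" prefix
    ref: Optional[str]  # commit, tag, or branch after the final "@", if any


# Simplified pattern for "name @ git+URL[@ref]" (illustrative assumption).
_PATTERN = re.compile(
    r"^\s*(?P<name>[A-Za-z0-9._-]+)\s*@\s*git\+(?P<url>\S+?)(?:@(?P<ref>[^@\s]+))?\s*$"
)


def parse_git_requirement(line: str) -> Optional[GitRequirement]:
    """Return the parts of a Git direct reference, or None for other lines."""
    match = _PATTERN.match(line)
    if match is None:
        return None
    return GitRequirement(match["name"], match["url"], match["ref"])
```

Ordinary version specifiers such as `transformers~=4.40` do not match and yield `None`, so the helper can be run over every line of `requirements.txt` to list only the Git-pinned dependencies.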
