Commit

Fix #18 (#34)
* make pinning not the default, add setup guide Ref #18

* Update README
chainsawriot authored Mar 25, 2024
1 parent 8a45df9 commit 230cb33
Showing 7 changed files with 152 additions and 24 deletions.
3 changes: 3 additions & 0 deletions DESCRIPTION
@@ -12,7 +12,9 @@ RoxygenNote: 7.3.1
URL: https://github.com/gesistsa/grafzahl
BugReports: https://github.com/gesistsa/grafzahl/issues
Suggests:
knitr,
quanteda.textmodels,
rmarkdown,
testthat (>= 3.0.0),
withr
Config/testthat/edition: 3
@@ -26,3 +28,4 @@ Imports:
LazyData: true
Depends:
R (>= 3.5)
VignetteBuilder: knitr
24 changes: 15 additions & 9 deletions README.Rmd
@@ -27,15 +27,9 @@ Please cite this software as:

Chan, C., (2023). [grafzahl: fine-tuning Transformers for text data from within R](paper/grafzahl_sp.pdf). *Computational Communication Research* 5(1): 76-84. [https://doi.org/10.5117/CCR2023.1.003.CHAN](https://doi.org/10.5117/CCR2023.1.003.CHAN)

## Installation
## Installation: Local environment

You can install the development version of grafzahl like so:

``` r
remotes::install_github("chainsawriot/grafzahl")
```

Or, you can install the CRAN version
Install the CRAN version

```r
install.packages("grafzahl")
@@ -48,9 +42,21 @@ require(grafzahl)
setup_grafzahl(cuda = TRUE) ## if you have GPU(s)
```

## On remote environments, e.g. Google Colab

On Google Colab, you need to enable non-Conda mode

```r
install.packages("grafzahl")
require(grafzahl)
use_nonconda()
```

Please refer to the vignette.

## Usage

Suppose you have a bunch of tweets in quanteda corpus format. And the corpus has exactly one docvar that denotes the labels you want to predict. The data is from [this repository](https://github.com/pablobarbera/incivility-sage-open) (Theocharis et al., 2020).
Suppose you have a bunch of tweets in the quanteda corpus format. And the corpus has exactly one docvar that denotes the labels you want to predict. The data is from [this repository](https://github.com/pablobarbera/incivility-sage-open) (Theocharis et al., 2020).

```{r, echo = FALSE, message = FALSE}
devtools::load_all()
24 changes: 15 additions & 9 deletions README.md
@@ -28,31 +28,37 @@ Chan, C., (2023). [grafzahl: fine-tuning Transformers for text data from
within R](paper/grafzahl_sp.pdf). *Computational Communication Research*
5(1): 76-84. <https://doi.org/10.5117/CCR2023.1.003.CHAN>

## Installation
## Installation: Local environment

You can install the development version of grafzahl like so:
Install the CRAN version

``` r
remotes::install_github("chainsawriot/grafzahl")
install.packages("grafzahl")
```

Or, you can install the CRAN version
After that, you need to setup your conda environment

``` r
install.packages("grafzahl")
require(grafzahl)
setup_grafzahl(cuda = TRUE) ## if you have GPU(s)
```

After that, you need to setup your conda environment
## On remote environments, e.g. Google Colab

On Google Colab, you need to enable non-Conda mode

``` r
install.packages("grafzahl")
require(grafzahl)
setup_grafzahl(cuda = TRUE) ## if you have GPU(s)
use_nonconda()
```

Please refer to the vignette.

## Usage

Suppose you have a bunch of tweets in quanteda corpus format. And the
corpus has exactly one docvar that denotes the labels you want to
Suppose you have a bunch of tweets in the quanteda corpus format. And
the corpus has exactly one docvar that denotes the labels you want to
predict. The data is from [this
repository](https://github.com/pablobarbera/incivility-sage-open)
(Theocharis et al., 2020).
10 changes: 5 additions & 5 deletions inst/grafzahl.yml
@@ -5,13 +5,13 @@ channels:
- anaconda
- defaults
dependencies:
- python=3.10
- python
- pip
- pytorch>=1.6+cpuonly
- pip:
- pandas
- tqdm
- simpletransformers==0.63.11
- emoji==0.6.0
- transformers==4.30.2
- scipy==1.10.1
- simpletransformers
- emoji
- transformers
- scipy
2 changes: 1 addition & 1 deletion inst/grafzahl_gpu.yml
@@ -9,4 +9,4 @@ dependencies:
- pip:
- pandas
- tqdm
- emoji==0.6.0
- emoji
2 changes: 2 additions & 0 deletions vignettes/.gitignore
@@ -0,0 +1,2 @@
*.html
*.R
111 changes: 111 additions & 0 deletions vignettes/grafzahl.Rmd
@@ -0,0 +1,111 @@
---
title: "Setup Guide"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Setup Guide}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```

This is a quick setup guide for different situations.

`grafzahl` requires a Python environment. By default, `grafzahl` assumes you would like to use a miniconda-based Python environment. It can be installed by using the provided `setup_grafzahl()` function.

```r
require(grafzahl)
setup_grafzahl(cuda = TRUE) # FALSE if you don't have CUDA compatible GPUs

## Use grafzahl right away, an example
model <- grafzahl(unciviltweets, model_type = "bertweet", model_name = "vinai/bertweet-base")

```

There are other setup options.

# Google Colab and similar services

In order to use `grafzahl` on Google Colab, please choose the R-based Runtime (Runtime > Change Runtime Type > Runtime Type: R). You might also want to choose a hardware accelerator, e.g. T4 GPU.

In this case, you need to enable the non-Conda mode, i.e. `use_nonconda()`. By default, it will also install the required Python packages.

```r
install.packages("grafzahl")
use_nonconda(install = TRUE, check = TRUE) # default

## Use grafzahl right away, an example
model <- grafzahl(unciviltweets, model_type = "bertweet", model_name = "vinai/bertweet-base")
```

# Default Python

If you don't want to use any conda configuration on your local machine, you can just install the Python packages `simpletransformers` and `emoji`.

```bash
python3 -m pip install simpletransformers emoji
```

And then

```r
require(grafzahl)
use_nonconda(install = FALSE, check = TRUE) ## what it does is just: options("grafzahl.nonconda" = TRUE)

## Use grafzahl right away, an example
model <- grafzahl(unciviltweets, model_type = "bertweet", model_name = "vinai/bertweet-base")
```
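
Before switching to the non-conda mode, it can be useful to confirm that both Python packages are importable. The following is a minimal sketch using only base R; `python_has_modules()` is a hypothetical helper, not a grafzahl function, and it assumes that the `python3` found on the `PATH` is the interpreter that will be used at run time, which may not hold on every system.

```r
## Hypothetical helper (not part of grafzahl): returns TRUE only if the given
## Python executable exists and can import all listed modules.
python_has_modules <- function(modules, python = "python3") {
    if (!nzchar(Sys.which(python))) {
        return(FALSE) # no such executable on the PATH
    }
    code <- paste("import", paste(modules, collapse = ", "))
    ## system2() returns the exit status when output is not captured
    status <- system2(python, c("-c", shQuote(code)), stdout = FALSE, stderr = FALSE)
    identical(status, 0L)
}

python_has_modules(c("simpletransformers", "emoji"))
```

If this returns `FALSE`, rerun the `pip` command above with the same interpreter before calling `use_nonconda()`.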

# Use conda, but not grafzahl's default

Suppose you have installed conda elsewhere. Take note of the `base` path of that installation.

```bash
conda env list
```

Create a new conda environment with the default grafzahl environment name (`grafzahl_condaenv_cuda` or `grafzahl_condaenv`).

## With Cuda

```bash
conda create -n grafzahl_condaenv_cuda
conda activate grafzahl_condaenv_cuda
conda install -n grafzahl_condaenv_cuda python pip pytorch pytorch-cuda cudatoolkit -c pytorch -c nvidia
python -m pip install simpletransformers emoji
conda deactivate

## Test the CUDA installation with

Rscript -e "grafzahl::detect_cuda()"
```

## Without Cuda

```bash
conda create -n grafzahl_condaenv
conda activate grafzahl_condaenv
conda install -n grafzahl_condaenv python pip pytorch -c pytorch
python -m pip install simpletransformers emoji
conda deactivate
```

In R, you have to change the default conda path.

```r
## suppose /home/yourname/miniconda is the base path of your conda installation
require(grafzahl)
Sys.setenv(GRAFZAHL_MINICONDA_PATH = "/home/yourname/miniconda")

## Use grafzahl right away, an example
model <- grafzahl(unciviltweets, model_type = "bertweet", model_name = "vinai/bertweet-base")
```
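
Before relying on the envvar, it can help to confirm that the environment actually exists under that base path. The following is a minimal sketch, assuming conda's default layout in which named environments live in `<base>/envs/<name>`; `has_conda_env()` is a hypothetical helper, not a grafzahl function.

```r
## Hypothetical helper (not part of grafzahl): checks for a named environment
## directory under the given conda base path.
has_conda_env <- function(base, envname = "grafzahl_condaenv") {
    dir.exists(file.path(base, "envs", envname))
}

## /home/yourname/miniconda is a placeholder, as in the example above
has_conda_env("/home/yourname/miniconda")
has_conda_env("/home/yourname/miniconda", "grafzahl_condaenv_cuda")
```

This only checks that the directory is present; it does not verify that the required Python packages are installed inside the environment.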

# Explanation: Important options and envvars

There is one important option and one important envvar. `options("grafzahl.nonconda")` controls whether to use the non-conda mode. Envvar `GRAFZAHL_MINICONDA_PATH` controls the base path of the conda installation. If it is `""` (the default), `reticulate::miniconda_path()` is used as the base path.
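
The lookup described above can be sketched in a few lines. `resolve_conda_base()` is a hypothetical helper written for illustration; grafzahl's internal implementation may differ, and treating an unset `grafzahl.nonconda` option as `FALSE` is an assumption.

```r
## Hypothetical helper (not part of grafzahl): mirrors the described fallback.
resolve_conda_base <- function() {
    path <- Sys.getenv("GRAFZAHL_MINICONDA_PATH", unset = "")
    if (path != "") {
        return(path)
    }
    ## Envvar is empty: fall back to reticulate's default miniconda location
    reticulate::miniconda_path()
}

## Inspect the current settings
getOption("grafzahl.nonconda", default = FALSE) # assumed default when unset
Sys.getenv("GRAFZAHL_MINICONDA_PATH")

## Point to a custom conda installation for this session (placeholder path)
Sys.setenv(GRAFZAHL_MINICONDA_PATH = "/home/yourname/miniconda")
resolve_conda_base()
```

Both settings only last for the current R session; to make them persistent, they would have to be set in a startup file such as `.Rprofile` or `.Renviron`.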
