diff --git a/DESCRIPTION b/DESCRIPTION
index ecff5f4..3166a39 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -12,7 +12,9 @@ RoxygenNote: 7.3.1
 URL: https://github.com/gesistsa/grafzahl
 BugReports: https://github.com/gesistsa/grafzahl/issues
 Suggests:
+    knitr,
     quanteda.textmodels,
+    rmarkdown,
     testthat (>= 3.0.0),
     withr
 Config/testthat/edition: 3
@@ -26,3 +28,4 @@ Imports:
 LazyData: true
 Depends:
     R (>= 3.5)
+VignetteBuilder: knitr
diff --git a/README.Rmd b/README.Rmd
index 5a4ba84..429c6d8 100644
--- a/README.Rmd
+++ b/README.Rmd
@@ -27,15 +27,9 @@ Please cite this software as:
 
 Chan, C., (2023). [grafzahl: fine-tuning Transformers for text data from within R](paper/grafzahl_sp.pdf). *Computational Communication Research* 5(1): 76-84. [https://doi.org/10.5117/CCR2023.1.003.CHAN](https://doi.org/10.5117/CCR2023.1.003.CHAN)
 
-## Installation
+## Installation: Local environment
 
-You can install the development version of grafzahl like so:
-
-``` r
-remotes::install_github("chainsawriot/grafzahl")
-```
-
-Or, you can install the CRAN version
+Install the CRAN version
 
 ```r
 install.packages("grafzahl")
@@ -48,9 +42,21 @@ require(grafzahl)
 setup_grafzahl(cuda = TRUE) ## if you have GPU(s)
 ```
 
+## On remote environments, e.g. Google Colab
+
+On Google Colab, you need to enable non-Conda mode
+
+```r
+install.packages("grafzahl")
+require(grafzahl)
+use_nonconda()
+```
+
+Please refer to the vignette.
+
 ## Usage
 
-Suppose you have a bunch of tweets in quanteda corpus format. And the corpus has exactly one docvar that denotes the labels you want to predict. The data is from [this repository](https://github.com/pablobarbera/incivility-sage-open) (Theocharis et al., 2020).
+Suppose you have a bunch of tweets in the quanteda corpus format. And the corpus has exactly one docvar that denotes the labels you want to predict. The data is from [this repository](https://github.com/pablobarbera/incivility-sage-open) (Theocharis et al., 2020).
 
 ```{r, echo = FALSE, message = FALSE}
 devtools::load_all()
diff --git a/README.md b/README.md
index 9c3e83c..ca5325f 100644
--- a/README.md
+++ b/README.md
@@ -28,31 +28,37 @@ Chan, C., (2023). [grafzahl: fine-tuning Transformers for text data from
 within R](paper/grafzahl_sp.pdf). *Computational Communication Research*
 5(1): 76-84.
 
-## Installation
+## Installation: Local environment
 
-You can install the development version of grafzahl like so:
+Install the CRAN version
 
 ``` r
-remotes::install_github("chainsawriot/grafzahl")
+install.packages("grafzahl")
 ```
 
-Or, you can install the CRAN version
+After that, you need to setup your conda environment
 
 ``` r
-install.packages("grafzahl")
+require(grafzahl)
+setup_grafzahl(cuda = TRUE) ## if you have GPU(s)
 ```
 
-After that, you need to setup your conda environment
+## On remote environments, e.g. Google Colab
+
+On Google Colab, you need to enable non-Conda mode
 
 ``` r
+install.packages("grafzahl")
 require(grafzahl)
-setup_grafzahl(cuda = TRUE) ## if you have GPU(s)
+use_nonconda()
 ```
 
+Please refer to the vignette.
+
 ## Usage
 
-Suppose you have a bunch of tweets in quanteda corpus format. And the
-corpus has exactly one docvar that denotes the labels you want to
+Suppose you have a bunch of tweets in the quanteda corpus format. And
+the corpus has exactly one docvar that denotes the labels you want to
 predict. The data is from [this
 repository](https://github.com/pablobarbera/incivility-sage-open)
 (Theocharis et al., 2020).
diff --git a/inst/grafzahl.yml b/inst/grafzahl.yml
index 3e1f067..33a0b4d 100644
--- a/inst/grafzahl.yml
+++ b/inst/grafzahl.yml
@@ -5,13 +5,13 @@ channels:
   - anaconda
   - defaults
 dependencies:
-  - python=3.10
+  - python
   - pip
   - pytorch>=1.6+cpuonly
   - pip:
     - pandas
     - tqdm
-    - simpletransformers==0.63.11
-    - emoji==0.6.0
-    - transformers==4.30.2
-    - scipy==1.10.1
+    - simpletransformers
+    - emoji
+    - transformers
+    - scipy
diff --git a/inst/grafzahl_gpu.yml b/inst/grafzahl_gpu.yml
index 1695fd4..423ef2d 100644
--- a/inst/grafzahl_gpu.yml
+++ b/inst/grafzahl_gpu.yml
@@ -9,4 +9,4 @@ dependencies:
   - pip:
     - pandas
     - tqdm
-    - emoji==0.6.0
+    - emoji
diff --git a/vignettes/.gitignore b/vignettes/.gitignore
new file mode 100644
index 0000000..097b241
--- /dev/null
+++ b/vignettes/.gitignore
@@ -0,0 +1,2 @@
+*.html
+*.R
diff --git a/vignettes/grafzahl.Rmd b/vignettes/grafzahl.Rmd
new file mode 100644
index 0000000..327639a
--- /dev/null
+++ b/vignettes/grafzahl.Rmd
@@ -0,0 +1,111 @@
+---
+title: "Setup Guide"
+output: rmarkdown::html_vignette
+vignette: >
+  %\VignetteIndexEntry{Setup Guide}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
+---
+
+```{r, include = FALSE}
+knitr::opts_chunk$set(
+  collapse = TRUE,
+  comment = "#>"
+)
+```
+
+This is a quick setup guide for different situations.
+
+`grafzahl` requires a Python environment. By default, `grafzahl` assumes you would like to use a miniconda-based Python environment. It can be installed by using the provided `setup_grafzahl()` function.
+
+```r
+require(grafzahl)
+setup_grafzahl(cuda = TRUE) # FALSE if you don't have CUDA compatible GPUs
+
+## Use grafzahl right away, an example
+model <- grafzahl(unciviltweets, model_type = "bertweet", model_name = "vinai/bertweet-base")
+
+```
+
+There are other setup options.
+
+# Google Colab and similar services
+
+In order to use `grafzahl` on Google Colab, please choose the R-based Runtime (Runtime > Change Runtime Type > Runtime Type: R). You might also want to choose a hardware accelerator, e.g. T4 GPU.
+
+In this case, you need to enable the non-Conda mode, i.e. `use_nonconda()`. By default, it will also install the required Python packages.
+
+```r
+install.packages("grafzahl")
+use_nonconda(install = TRUE, check = TRUE) # default
+
+## Use grafzahl right away, an example
+model <- grafzahl(unciviltweets, model_type = "bertweet", model_name = "vinai/bertweet-base")
+```
+
+# Default Python
+
+If you don't want to use any conda configuration on your local machine, you can just install the Python packages `simpletransformers` and `emoji`.
+
+```bash
+python3 -m pip install simpletransformers emoji
+```
+
+And then
+
+```r
+require(grafzahl)
+use_nonconda(install = FALSE, check = TRUE) ## what it does is just: options("grafzahl.nonconda" = TRUE)
+
+## Use grafzahl right away, an example
+model <- grafzahl(unciviltweets, model_type = "bertweet", model_name = "vinai/bertweet-base")
+```
+
+# Use conda, but not grafzahl's default
+
+Suppose you have installed conda elsewhere. Please note the `base` path of your conda installation.
+
+```bash
+conda env list
+```
+
+Create a new conda environment with the default grafzahl environment name
+
+## With CUDA
+
+```bash
+conda create -n grafzahl_condaenv_cuda
+conda activate grafzahl_condaenv_cuda
+conda install -n grafzahl_condaenv_cuda python pip pytorch pytorch-cuda cudatoolkit -c pytorch -c nvidia
+python -m pip install simpletransformers emoji
+conda deactivate
+
+## Test the CUDA installation with
+
+Rscript -e "grafzahl::detect_cuda()"
+```
+
+## Without CUDA
+
+```bash
+conda create -n grafzahl_condaenv
+conda activate grafzahl_condaenv
+conda install -n grafzahl_condaenv python pip pytorch -c pytorch
+python -m pip install simpletransformers emoji
+conda deactivate
+```
+
+In R, you have to change the default conda path
+
+```r
+## suppose /home/yourname/miniconda is the base path of your conda installation
+require(grafzahl)
+Sys.setenv(GRAFZAHL_MINICONDA_PATH = "/home/yourname/miniconda")
+
+## Use grafzahl right away, an example
+model <- grafzahl(unciviltweets, model_type = "bertweet", model_name = "vinai/bertweet-base")
+```
+
+# Explanation: Important options and envvars
+
+There are two important options and envvars. `options("grafzahl.nonconda")` controls whether to use the non-conda mode. Envvar `GRAFZAHL_MINICONDA_PATH` controls the base path of the conda installation. If it is `""` (the default), `reticulate::miniconda_path()` is used as the base path.
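
Reviewer note (not part of the patch): the closing vignette section describes how the option and the envvar are read. The lookup could be sketched in base R roughly as follows. Both `resolve_conda_base()` and `is_nonconda()` are hypothetical helpers written for illustration, not grafzahl functions, and the `fallback` argument is an assumed stand-in for `reticulate::miniconda_path()` so that the sketch runs without reticulate installed.

```r
## Hypothetical helper, NOT part of grafzahl: mirrors the lookup the vignette describes.
## `fallback` stands in for reticulate::miniconda_path() (an assumption for this sketch).
resolve_conda_base <- function(fallback = "~/r-miniconda") {
    path <- Sys.getenv("GRAFZAHL_MINICONDA_PATH", unset = "")
    if (identical(path, "")) {
        ## envvar is "" (the default): fall back to the reticulate-style default path
        return(fallback)
    }
    path
}

## Hypothetical helper: non-conda mode is a plain R option.
## Per the vignette, use_nonconda() sets options("grafzahl.nonconda" = TRUE).
is_nonconda <- function() {
    isTRUE(getOption("grafzahl.nonconda", default = FALSE))
}

## Point the lookup at a conda installation elsewhere
Sys.setenv(GRAFZAHL_MINICONDA_PATH = "/home/yourname/miniconda")
resolve_conda_base()

## Unset the envvar to get the fallback again
Sys.unsetenv("GRAFZAHL_MINICONDA_PATH")
resolve_conda_base(fallback = "/tmp/r-miniconda")

## Switch to non-conda mode by hand
options(grafzahl.nonconda = TRUE)
is_nonconda()
```

Because both settings are process-wide, they only affect the current R session; setting them in `.Rprofile` or `.Renviron` would be one way to make them persistent.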