Text Processing for Small or Big Data Files in R


textTinyR


The textTinyR package consists of text pre-processing functions for small or big data files. More details on the functionality of textTinyR can be found in the blog post and in the package vignette. The package can be installed on Linux, Mac and Windows, with the following limitations:

  • there is no support for Chinese, Japanese, Korean, Thai or other languages with ambiguous word boundaries.
  • there are no support functions for UTF locales on Windows, meaning that only English character strings or files can be input and pre-processed.

System Requirements (for Unix OS's)


Debian/Ubuntu

sudo apt-get update

sudo apt-get install libboost-all-dev

sudo apt-get install libboost-locale-dev


Fedora

yum install boost-devel
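
On either distribution, it can be useful to confirm that the Boost.Locale header is visible before installing the package. The following is only a sketch: the function name `has_boost_locale` is hypothetical, and the include directories `/usr/include` and `/usr/local/include` are assumed defaults that may differ on a given system.

```shell
#!/bin/sh
# Hypothetical helper (not part of textTinyR): succeeds if boost/locale.hpp
# exists under any of the include directories passed as arguments.
has_boost_locale() {
  for dir in "$@"; do
    if [ -f "$dir/boost/locale.hpp" ]; then
      echo "found: $dir/boost/locale.hpp"
      return 0
    fi
  done
  echo "boost/locale.hpp not found" >&2
  return 1
}

# Assumed default include locations; adjust them for your system.
has_boost_locale /usr/include /usr/local/include || true
```

If the header is reported as missing, re-running the install commands for the relevant distribution is the first thing to try.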


Macintosh OSX/brew

On Macintosh OSX the boost library is installed using the Homebrew package manager.

If the boost library has already been installed with brew install boost, it must first be removed using the following command,


brew uninstall boost


Then the formula for the boost library should be modified using a text editor (TextEdit, TextMate, etc.). The formula is saved in:


/usr/local/Homebrew/Library/Taps/homebrew/homebrew-core/Formula/boost.rb


The user should open the boost.rb formula and replace the following code chunk, which begins at (approx.) line 71,


# layout should be synchronized with boost-python
args = ["--prefix=#{prefix}",
        "--libdir=#{lib}",
        "-d2",
        "-j#{ENV.make_jobs}",
        "--layout=tagged",
        "--user-config=user-config.jam",
        "install"]

if build.with? "single"
  args << "threading=multi,single"
else
  args << "threading=multi"
end

with the following code chunk,


# layout should be synchronized with boost-python
args = ["--prefix=#{prefix}",
        "--libdir=#{lib}",
        "-d2",
        "-j#{ENV.make_jobs}",
        "--layout=system", 
        "--user-config=user-config.jam",
        "threading=multi",
        "install"]

#if build.with? "single"
#  args << "threading=multi,single"
#else
#  args << "threading=multi"
#end

Then the user should save the changes, close the file and run,


brew update


to apply the changes.
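
Before rebuilding, it may help to confirm that the edit was actually saved. The check below is a minimal sketch, not part of Homebrew or textTinyR: the function name `formula_is_patched` is hypothetical, and it only looks at the `--layout` setting, not at the threading lines.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the formula file contains
# --layout=system and has no active (uncommented) --layout=tagged line.
# Returns 2 if the file does not exist, 1 if the edit is missing.
formula_is_patched() {
  formula_file="$1"
  [ -f "$formula_file" ] || return 2
  grep -q -e '--layout=system' "$formula_file" || return 1
  # Ignore Ruby comment lines, then look for the old layout setting
  if grep -v '^[[:space:]]*#' "$formula_file" | grep -q -e '--layout=tagged'; then
    return 1
  fi
  return 0
}

# Path taken from the instructions above
boost_formula=/usr/local/Homebrew/Library/Taps/homebrew/homebrew-core/Formula/boost.rb
if formula_is_patched "$boost_formula"; then
  echo "boost.rb uses --layout=system"
else
  echo "boost.rb is not patched (or was not found)"
fi
```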


Then the user should open a new terminal (console) and type the following command, which builds and installs the boost library from source using the modified formula (warning: there are two dashes before build-from-source),


brew install /usr/local/Homebrew/Library/Taps/homebrew/homebrew-core/Formula/boost.rb --build-from-source


That's it.


Installation of the textTinyR package (CRAN, Github)


To install the package from CRAN use,

install.packages('textTinyR', clean = TRUE)


and to download the latest version from GitHub use the install_github function of the devtools package,

devtools::install_github(repo = 'mlampros/textTinyR', clean = TRUE)


Use the following link to report bugs/issues,

https://github.com/mlampros/textTinyR/issues
