alexandreyc edited this page Jun 6, 2012 · 6 revisions

R: Notes

Some collected notes for using R.

Warning: I don't know much R, so this might not be the best way to do it.

The SUR example and the new helper functions are currently in my branch

Skipper's original file is in tools/R2nparray

The files can be sourced in R to make the helper functions available, for example with Windows path separators:

source("E:\\path_to_repo\\tools\\R2nparray\\R\\R2nparray.R")
source("E:\\path_to_repo\\tools\\topy.R")

Introspection

Assuming we have already called systemfit and assigned the result to SUR:

> names(SUR)
 [1] "eq"           "call"         "coefficients" "coefCov"      "residCovEst"  "residCov"     "method"       "rank"
 [9] "df.residual"  "iter"         "control"      "panelLike"
> attributes(SUR)
$names
 [1] "eq"           "call"         "coefficients" "coefCov"      "residCovEst"  "residCov"     "method"       "rank"
 [9] "df.residual"  "iter"         "control"      "panelLike"

$class
[1] "systemfit"

> cc = SUR$coefCov
> is.numeric(cc)
[1] TRUE
> class(cc)
[1] "matrix"
> is.matrix(cc)
[1] TRUE
> class(SUR$eq)
[1] "list"

Looping over names - mkarray

A for loop that prints all numeric elements as Python code that creates NumPy arrays.
  • SUR[ [name]] or getElement(SUR, name) accesses the named element of the list SUR. (I'm adding an extra space between [ [ to keep the wiki from converting it to a link. It must be written without the space to be valid R code.)
  • mkarray is one of our helper functions in tools; it prints the data as an np.array call.
  • It's a one-liner because that was easier to work with in the R shell.
> for (name in names(SUR)) {if (is.numeric(SUR[ [name]])) {mkarray(SUR[ [name]], name)}}; cat("\n")
coefficients = np.array([0.9979991848420328,0.06886083327936214,...,0.0429020916196108])

coefCov = np.array([157.3943509170185,-0.2165142902938106,...,0.002035467551712387]).reshape(15,15, order='F')

residCovEst = np.array([176.3202565715889,-25.14782439226425,...,104.3078782568039]).reshape(5,5, order='F')

residCov = np.array([180.2786473970981,3.703259980763286,...,111.6549965340746]).reshape(5,5, order='F')

rank = np.array([15])

df.residual = np.array([85])

iter = np.array([1])
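
The order='F' argument in the generated lines matters because R stores matrices column by column. A minimal sketch with made-up numbers (not taken from the SUR fit) of how the flat values map back to a matrix on the Python side:

```python
import numpy as np

# R's matrix(1:6, nrow=2) is stored column by column as 1, 2, 3, 4, 5, 6.
flat = np.array([1, 2, 3, 4, 5, 6])

# order='F' (Fortran, column-major) fills columns first, matching R's layout.
as_in_r = flat.reshape(2, 3, order='F')

# NumPy's default order='C' fills rows first and would scramble the matrix.
row_major = flat.reshape(2, 3)

print(as_in_r)    # rows are [1, 3, 5] and [2, 4, 6]
print(row_major)  # rows are [1, 2, 3] and [4, 5, 6]
```

For a symmetric matrix such as coefCov both orders give the same result, but for a general matrix only order='F' reproduces what R holds.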

Create named list and save to python module - R2nparray

> aa = list(covparams=SUR$coefCov, rank=SUR$rank)
> R2nparray(aa, fname="temp3.py")

The content of the temp3.py module is then:

------------ temp3.py ----------
import numpy as np

covparams = np.array([157.3943509170185,-0.2165142902938106,...,0.002035467551712387]).reshape(15,15, order='F')

rank = np.array([15])
--------------------------------
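
To make the file format concrete, here is a rough Python imitation of the line format shown above. format_array is a hypothetical name used only for illustration; it is not the R helper, and its number formatting may differ slightly from what the R code writes.

```python
import numpy as np

def format_array(name, values):
    """Return one line of Python source in the style shown above.

    Hypothetical helper for illustration; it is not part of tools/R2nparray.
    """
    arr = np.asarray(values, dtype=float)
    # Flatten column by column, the way R stores a matrix.
    flat = ",".join(repr(float(v)) for v in arr.ravel(order='F'))
    line = "%s = np.array([%s])" % (name, flat)
    if arr.ndim == 2:
        line += ".reshape(%d,%d, order='F')" % arr.shape
    return line

# Placeholder matrix, not data from the SUR fit.
source = "import numpy as np\n\n" + format_array("m", [[1.5, 2.0], [3.0, 4.25]]) + "\n"
print(source)

# Round trip: executing the generated source recovers the original matrix.
namespace = {}
exec(source, namespace)
```

The round trip at the end is the property that matters: whatever is written out column-major must come back with the same shape and values.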

Saving a dataframe

f is a data frame with fitted values from the SUR model

> class(f)
[1] "data.frame"
> f
                Chrysler   General.Electric     General.Motors          US.Steel      Westinghouse
X1935  32.98546930516650  34.82254735597956  208.2453286635445 247.5131792455174 12.27690563625844
X1936  61.83516118316266  66.98918588257341  420.2793547553419 300.2827737683187 30.52156144761057
...

Calling another helper function, R2nparray, writes each data series (column) of the data frame into a Python module as a separate array:

> R2nparray(f, fname="temp4.py")

------------ temp4.py ----------
import numpy as np

Chrysler = np.array([32.9854693051665,...,177.371048256085])

General_Electric = np.array([34.82254735597956,...,195.5150518056073])

General_Motors = np.array([208.2453286635445,...,1364.599470457204])

US_Steel = np.array([247.5131792455174,...,566.277048536767])

Westinghouse = np.array([12.27690563625844,...,77.5688631853628])
--------------------------------
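
Because each column becomes its own 1-D array, the table has to be reassembled on the Python side if a 2-D array is wanted. A small sketch using only the first two rows (1935 and 1936) printed in the data frame above; with the real module the arrays would come from an import of temp4 instead.

```python
import numpy as np

# First two fitted values per firm (1935 and 1936), copied from the printout of f.
Chrysler = np.array([32.98546930516650, 61.83516118316266])
General_Electric = np.array([34.82254735597956, 66.98918588257341])
General_Motors = np.array([208.2453286635445, 420.2793547553419])
US_Steel = np.array([247.5131792455174, 300.2827737683187])
Westinghouse = np.array([12.27690563625844, 30.52156144761057])

# One column per firm, one row per year, like the R data frame.
fitted = np.column_stack([Chrysler, General_Electric, General_Motors,
                          US_Steel, Westinghouse])
print(fitted.shape)
```

Note that the row labels (X1935, X1936, ...) are not part of the generated module, so the year information has to be tracked separately.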

We can also combine the two, the named list aa and the data frame f, and save them at the same time:

> R2nparray(c(aa, f), fname="temp5.py")

The resulting python module contains the merged content

>>> import temp5
>>> dir(temp5)
['Chrysler', 'General_Electric', 'General_Motors', 'US_Steel',
'Westinghouse', '__builtins__', '__doc__', '__file__', '__name__',
'__package__', 'covparams', 'np', 'rank']
>>> temp5.covparams.shape
(15, 15)
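
Instead of reading through dir() by eye, the arrays in such a module can be collected programmatically. A minimal sketch; arrays_in is a hypothetical helper, and the demo module built here is only a stand-in for temp5 with placeholder values.

```python
import types
import numpy as np

def arrays_in(module):
    """Return {name: array} for every NumPy array defined in a module."""
    return {name: value for name, value in vars(module).items()
            if isinstance(value, np.ndarray)}

# Stand-in for a generated module such as temp5; the values are placeholders.
demo = types.ModuleType("demo")
demo.np = np
demo.rank = np.array([15])
demo.covparams = np.zeros((15, 15))

found = arrays_in(demo)
print(sorted(found))
```

With the real file this would be called as arrays_in(temp5) after import temp5; the np entry and the dunder attributes are skipped because they are not arrays.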

Save all - cat_items

cat_items is a new version that writes out every item that is not blacklisted, although currently mainly the numerical types are useful. (TODO: not committed to statsmodels/tools yet, and no name cleaning.)

> cat_items(SUR, prefix="sur.", blacklist=c("eq", "control"))
sur.call = '''systemfit(formula = formula, method = "SUR", data = panel)'''
sur.coefficients = np.array([0.9979991848420328,...,0.0429020916196108]).reshape(15,1, order='F')


sur.coefCov = np.array([157.3943509170185,...,0.002035467551712387]).reshape(15,15, order='F')


sur.residCovEst = np.array([176.3202565715889,...,104.3078782568039]).reshape(5,5, order='F')


sur.residCov = np.array([180.2786473970981,...,111.6549965340746]).reshape(5,5, order='F')


sur.method = SUR
sur.rank = 15
sur.df.residual = 85
sur.iter = 1
sur.panelLike = '''TRUE'''
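
Names such as sur.df.residual are not usable as Python names as written, which is what the missing name cleaning refers to. One possible cleaning rule, sketched in Python to show the intended result; clean_name is a hypothetical function, not something that exists in the tools directory, and the actual fix would presumably live in the R helper.

```python
import keyword
import re

def clean_name(name):
    """Turn an R name into a valid Python identifier (illustrative sketch)."""
    # Replace every character that is not a letter, digit or underscore.
    cleaned = re.sub(r"\W", "_", name)
    # Identifiers cannot start with a digit.
    if cleaned[:1].isdigit():
        cleaned = "_" + cleaned
    # Avoid clashing with reserved words such as "lambda" or "for".
    if keyword.iskeyword(cleaned):
        cleaned += "_"
    return cleaned

for r_name in ["df.residual", "General.Electric", "sur.coefCov"]:
    print(r_name, "->", clean_name(r_name))
```

This gives the same result as the data frame example, where General.Electric became General_Electric.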

Redirecting output to file - sink

Our helper functions use cat to write the output. cat prints the strings to the standard output. The output can be redirected to a file using sink, for example:

fname = "tmp_sur.py"
append = TRUE

sink(file=fname, append=append)
mkarray(SUR$coefficients, "params")
mkarray(SUR$coefCov, "cov_params")
mkarray(SUR$residCovEst, "resid_cov_est")
mkarray(SUR$residCov, "resid_cov")
mkarray(SUR$df.residual, "df_resid")
sink()

sink() with no arguments ends the redirection of the output. If the code raises an error before that call is reached, sink() is never called and the interpreter shell no longer prints any output. Typing sink() once or several times will bring the standard output back to the shell.
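
A file written through mkarray and sink presumably contains only the assignment lines, without the import numpy as np header that R2nparray adds (an assumption based on the mkarray output shown earlier). A minimal Python sketch for loading such a file by supplying np up front; load_arrays is a hypothetical helper and the demo file content is a placeholder, not real output.

```python
import os
import tempfile
import numpy as np

def load_arrays(path):
    """Execute a file of `name = np.array(...)` lines and return the arrays.

    Hypothetical helper; only use it on files you generated yourself,
    because exec runs whatever code the file contains.
    """
    namespace = {"np": np}
    with open(path) as fh:
        exec(fh.read(), namespace)
    return {k: v for k, v in namespace.items() if isinstance(v, np.ndarray)}

# Demo with placeholder content standing in for tmp_sur.py.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "tmp_demo.py")
    with open(path, "w") as fh:
        fh.write("params = np.array([1.0,2.0])\n\n"
                 "df_resid = np.array([85])\n")
    arrays = load_arrays(path)

print(sorted(arrays))
```

If the file does start with its own import line, the same function still works, since the import simply rebinds np.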
