
Installation Issues #339

Open
tfurmston opened this issue Oct 30, 2020 · 4 comments

Comments

tfurmston commented Oct 30, 2020

Hi,

I am trying to do a local installation of the project so that I can play around with it, but am having some issues with the installation.

There is a fair amount going on with the install, so I have decided to do it through docker. I couldn't find the image referenced in the documentation, so am writing my own. Here it is thus far:

```dockerfile
FROM python:3.7-slim-stretch

ENV PROJECT_LOCATION /srv/reagent
RUN mkdir -p $PROJECT_LOCATION
WORKDIR $PROJECT_LOCATION

RUN apt-get update -qq \
  && apt-get install --no-install-recommends -y \
    build-essential \
    openssh-client \
    git \
    software-properties-common \
    libblas-dev \
    libffi-dev \
    liblapack-dev \
    libopenblas-base \
    libsasl2-dev \
    libssl-dev \
    libsasl2-modules \
    python3-dev \
    libpq-dev \
    ffmpeg \
    libsm6 \
    libxext6 \
    curl \
    unzip \
    zip \
  && apt-get clean \
  && rm -rf /var/lib/apt/lists/*

RUN git clone https://github.com/facebookresearch/ReAgent.git $PROJECT_LOCATION
RUN python -m pip install ".[gym]"
RUN python -m pip install --pre torch torchvision -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html

RUN curl -s "https://get.sdkman.io" | bash
SHELL ["/bin/bash", "-c", "source $HOME/.sdkman/bin/sdkman-init.sh"]
RUN sdk version
RUN sdk install java 8.0.272.hs-adpt
RUN sdk install scala
RUN sdk install maven
RUN sdk install spark 2.4.6
RUN apt-get update
RUN apt-get install bc
```

This all goes through fine and builds successfully. (I got some of the configuration from the CI as the documentation seemed a bit out of date.)

However, when I try to run through the offline RL training (batch) tutorial in the documentation, I run into some issues.

In particular, when I get to the line:

./reagent/workflow/cli.py run reagent.workflow.gym_batch_rl.timeline_operator $CONFIG

I get the following error:

```
Building with config:
{'spark.app.name': 'ReAgent',
 'spark.driver.extraClassPath': '/usr/local/lib/python3.7/site-packages/reagent/../preprocessing/target/rl-preprocessing-1.1.jar',
 'spark.driver.host': '127.0.0.1',
 'spark.master': 'local[*]',
 'spark.sql.catalogImplementation': 'hive',
 'spark.sql.execution.arrow.enabled': 'true',
 'spark.sql.session.timeZone': 'UTC',
 'spark.sql.shuffle.partitions': '12',
 'spark.sql.warehouse.dir': '/srv/reagent/spark-warehouse'}
JAVA_HOME is not set
Traceback (most recent call last):
  File "./reagent/workflow/cli.py", line 89, in <module>
    reagent()
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "./reagent/workflow/cli.py", line 77, in run
    func(**config.asdict())
  File "/usr/local/lib/python3.7/site-packages/reagent/workflow/gym_batch_rl.py", line 75, in timeline_operator
    spark = get_spark_session()
  File "/usr/local/lib/python3.7/site-packages/reagent/workflow/spark_utils.py", line 62, in get_spark_session
    spark = spark.getOrCreate()
  File "/usr/local/lib/python3.7/site-packages/pyspark/sql/session.py", line 173, in getOrCreate
    sc = SparkContext.getOrCreate(sparkConf)
  File "/usr/local/lib/python3.7/site-packages/pyspark/context.py", line 367, in getOrCreate
    SparkContext(conf=conf or SparkConf())
  File "/usr/local/lib/python3.7/site-packages/pyspark/context.py", line 133, in __init__
    SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
  File "/usr/local/lib/python3.7/site-packages/pyspark/context.py", line 316, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway(conf)
  File "/usr/local/lib/python3.7/site-packages/pyspark/java_gateway.py", line 46, in launch_gateway
    return _launch_gateway(conf)
  File "/usr/local/lib/python3.7/site-packages/pyspark/java_gateway.py", line 108, in _launch_gateway
    raise Exception("Java gateway process exited before sending its port number")
Exception: Java gateway process exited before sending its port number
> /usr/local/lib/python3.7/site-packages/pyspark/java_gateway.py(108)_launch_gateway()
-> raise Exception("Java gateway process exited before sending its port number")
```

Am I missing something in my install? Any help would be much appreciated.

Also, more generally, I think it would be helpful if you provided a Dockerfile for people. Happy to contribute mine once I have it working, if that helps.
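(Editor's note for readers hitting the same trace: the `JAVA_HOME is not set` line printed just before the traceback is the likely culprit. The `SHELL` instruction in the Dockerfile above only changes the shell used for subsequent build steps; it does not persist sdkman's environment variables into the image, so pyspark cannot locate a JVM at runtime. One possible fix, sketched below; the paths are assumptions based on sdkman's default install location under `/root/.sdkman`:)

```dockerfile
# Hypothetical fix: export the sdkman-managed Java and Spark locations
# explicitly, so they survive into the final image and pyspark can
# launch its Java gateway. Paths assume sdkman's defaults for root.
ENV JAVA_HOME=/root/.sdkman/candidates/java/current
ENV SPARK_HOME=/root/.sdkman/candidates/spark/current
ENV PATH=$JAVA_HOME/bin:$SPARK_HOME/bin:$PATH
```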

@roelbertens
Contributor

I am having the same problem. Also, the CI seems to be failing on the master branch.
Is there any update on adding a Dockerfile? I agree it would be very helpful, and including it in the CI would also help keep it up to date.


stats2ml commented Mar 7, 2021

I was also looking for a Docker install option, but couldn't find the image referenced anywhere in the documentation.

@MisterTea
Contributor

We don't use Docker anymore, since the installation is all done with pip. But you can use a stock Ubuntu image and pip install it in there.
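(A minimal, untested sketch of that approach; the base image tag and apt package names are assumptions, and the `[gym]` extra comes from the install command earlier in this thread:)

```shell
# Sketch: installing ReAgent into a stock Ubuntu container via pip,
# per the suggestion above.
docker run -it ubuntu:20.04 bash

# Then, inside the container:
apt-get update && apt-get install -y python3 python3-pip git
git clone https://github.com/facebookresearch/ReAgent.git
cd ReAgent
pip3 install ".[gym]"
```

Note that this only covers the Python side; as discussed below, the Spark-based preprocessing still needs a JVM available.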

@tfurmston
Author

Sorry, maybe I am missing something, but don't we also have non-Python dependencies? For example, I thought part of the project used Spark.

From the error message above, my impression was that the failure came from the Spark pipeline used to pre-process the data. Did I misunderstand something?
