
How to run in Spark on YARN? #133

Open
SungMinHong opened this issue Oct 23, 2017 · 3 comments

Comments

@SungMinHong

SungMinHong commented Oct 23, 2017

Hi, everyone. TensorFrames looks interesting to me, so I want to try it on my Spark cluster, but I have a couple of questions.

  1. Does TensorFrames only need to be installed on the master node, or does every worker need it too?
  2. This command works fine: "pyspark --packages databricks:tensorframes:0.2.9-s_2.11"
    But I want to use my Spark cluster, so I tried "spark-submit --packages databricks:tensorframes:0.2.9-s_2.11", which fails with this error:
Exception in thread "main" java.lang.IllegalArgumentException: Missing application resource.
       at org.apache.spark.launcher.CommandBuilderUtils.checkArgument(CommandBuilderUtils.java:241)
       at org.apache.spark.launcher.SparkSubmitCommandBuilder.buildSparkSubmitArgs(SparkSubmitCommandBuilder.java:160)
       at org.apache.spark.launcher.SparkSubmitCommandBuilder.buildSparkSubmitCommand(SparkSubmitCommandBuilder.java:274)
       at org.apache.spark.launcher.SparkSubmitCommandBuilder.buildCommand(SparkSubmitCommandBuilder.java:151)
       at org.apache.spark.launcher.Main.main(Main.java:86)

Has anyone else seen this error?

  • My Spark cluster has 3 nodes (YARN on a Hadoop cluster: 1 master, 2 workers)
  • I used virtualenv for TensorFlow (CPU version)
  • I installed pandas 0.20.3

I would be grateful if you could answer my questions.

@SungMinHong SungMinHong changed the title How many times install(include) Tesorframe in my Spark cluster? How can I install(include) Tesorframe in my Spark cluster? Oct 23, 2017
@SungMinHong SungMinHong changed the title How can I install(include) Tesorframe in my Spark cluster? How can I install(include) Tesorframe in my Spark's cluster? Oct 23, 2017
@SungMinHong
Author

SungMinHong commented Oct 23, 2017

I solved that problem. I had just forgotten the Python script file name in the command. :(

"spark-submit --packages databricks:tensorframes:0.2.9-s_2.11 --master yarn --deploy-mode cluster <python_script_name>"

But I also ran into a new problem: the workers (data nodes) can't import tensorflow, probably because all of my cluster's nodes use virtualenv for the tensorflow install.

Does anyone know a solution to that problem? If you know the answer, please let me know. :)
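One possible workaround (a sketch only, not something I have verified on this cluster; "tf_env" and all paths are placeholder names) is to zip up the virtualenv and ship it to every YARN container with --archives, then point PYSPARK_PYTHON at the unpacked interpreter:

```shell
# Sketch: package a virtualenv that already has tensorflow installed.
cd /path/to/tf_env && zip -r ../tf_env.zip . && cd -

# Ship the archive to the application master and executors; the "#tf_env"
# suffix is the directory name it is unpacked under in each container.
spark-submit \
  --packages databricks:tensorframes:0.2.9-s_2.11 \
  --master yarn --deploy-mode cluster \
  --archives tf_env.zip#tf_env \
  --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./tf_env/bin/python \
  --conf spark.executorEnv.PYSPARK_PYTHON=./tf_env/bin/python \
  test_tfs.py
```

Whether the unpacked virtualenv's interpreter actually runs depends on how relocatable the environment is, so this may still need tweaking per cluster.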

@SungMinHong SungMinHong changed the title How can I install(include) Tesorframe in my Spark's cluster? How to run Tesorframe in Spark cluster(YARN on a Hadoop cluster)? Oct 23, 2017
@SungMinHong SungMinHong changed the title How to run Tesorframe in Spark cluster(YARN on a Hadoop cluster)? How to run in Spark cluster(YARN on a Hadoop cluster)? Oct 23, 2017
@SungMinHong
Author

SungMinHong commented Oct 24, 2017

Spark on YARN does not support virtualenv mode yet.

So I reinstalled tensorflow using pip, and now I'm testing tensorframes on a single local node like this:

$ spark-submit --packages databricks:tensorframes:0.2.9-s_2.11 --master local --deploy-mode client test_tfs.py > output

test_tfs.py

import tensorflow as tf
import tensorframes as tfs
from pyspark.sql import Row
from pyspark.sql.functions import *
from pyspark.sql import SQLContext
from pyspark import SparkContext

sc = SparkContext("local", "tfs single node mode test")
sc.setLogLevel("ERROR")
sqlContext = SQLContext(sc)

#tensorframe's example
data = [Row(x=float(x)) for x in range(5)]
df = sqlContext.createDataFrame(data)
with tf.Graph().as_default() as g:
    # The placeholder that corresponds to column 'x'
    x = tf.placeholder(tf.double, shape=[None], name="x")
    # The output that adds 3 to x
    z = tf.add(x, 3, name='z')
    # The resulting dataframe
    df2 = tfs.map_blocks(z, df)

df2.show()

But I ran into an AttributeError:

Traceback (most recent call last):
  File "/home/hong/test_tfs.py", line 19, in <module>
    df2 = tfs.map_blocks(z, df)
  File "/home/hong/.ivy2/jars/databricks_tensorframes-0.2.9-s_2.11.jar/tensorframes/core.py", line 312, in map_blocks
  File "/home/hong/.ivy2/jars/databricks_tensorframes-0.2.9-s_2.11.jar/tensorframes/core.py", line 146, in _map
  File "/home/hong/.ivy2/jars/databricks_tensorframes-0.2.9-s_2.11.jar/tensorframes/core.py", line 101, in _get_graph
  File "/home/hong/.ivy2/jars/databricks_tensorframes-0.2.9-s_2.11.jar/tensorframes/core.py", line 43, in _initialize_variables
AttributeError: 'module' object has no attribute 'global_variables'

If anyone else has this error, please let me know.

=> Update: it was caused by an old version of tensorflow. Solution: upgrade tensorflow.
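For anyone hitting the same trace: as far as I know, tf.global_variables replaced tf.all_variables in TensorFlow 0.12, so any older build raises exactly this AttributeError. A tiny sketch of the version check (the helper name is made up, and it only handles plain "x.y.z" strings):

```python
# Hypothetical helper: tf.global_variables first appeared in TensorFlow 0.12,
# so builds older than that fail with the AttributeError above.
def needs_upgrade(version, minimum=(0, 12, 0)):
    """Return True if a dotted "x.y.z" version string predates `minimum`."""
    parts = tuple(int(p) for p in version.split(".")[:3])
    return parts < minimum

print(needs_upgrade("0.11.0"))  # -> True: predates the global_variables rename
print(needs_upgrade("1.3.0"))   # -> False: new enough
```

In practice you would pass tf.__version__ to the helper; upgrading with "pip install --upgrade tensorflow" is the actual fix.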

@SungMinHong SungMinHong changed the title How to run in Spark cluster(YARN on a Hadoop cluster)? How to run in Spark on YARN? Nov 3, 2017
@UlionTse

UlionTse commented Jun 6, 2019

@SungMinHong Same question!
