PySpark library working in Glue Spark version 3.0 not working anymore in Glue Spark 4.0 #204

Open
lorenzo-necto opened this issue Mar 21, 2024 · 1 comment

Comments

@lorenzo-necto

from awsglueml.transform import EntityDetector no longer works in Glue version 4.0. What is the replacement? For PII detection outside of Glue Studio and via libraries, the AWS docs only cover Scala, not PySpark.

@awongCM

awongCM commented Mar 30, 2024

In addition to this, I followed the setup guide here - https://github.com/awslabs/aws-glue-libs?tab=readme-ov-file#setup-guide, using the latest master branch. When I tried to run a simple Glue script as below, it failed with the following error:

gluesparksubmit main.py --JOB_NAME=test1

Traceback (most recent call last):
  File "/Users/andywongcheeming/Projects/poc/local-aws-glue-jobs/aws-glue-local/main.py", line 8, in <module>
    sc = SparkContext.getOrCreate()
  File "/Users/andywongcheeming/Projects/spark/python/lib/pyspark.zip/pyspark/context.py", line 491, in getOrCreate
  File "/Users/andywongcheeming/Projects/spark/python/lib/pyspark.zip/pyspark/context.py", line 197, in __init__
  File "/Users/andywongcheeming/Projects/spark/python/lib/pyspark.zip/pyspark/context.py", line 282, in _do_init
  File "/Users/andywongcheeming/Projects/spark/python/lib/pyspark.zip/pyspark/context.py", line 410, in _initialize_context
  File "/Users/andywongcheeming/Projects/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1585, in __call__
  File "/Users/andywongcheeming/Projects/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.ExceptionInInitializerError
        at org.apache.spark.unsafe.array.ByteArrayMethods.<clinit>(ByteArrayMethods.java:56)
        at org.apache.spark.memory.MemoryManager$.getPageSizeBytes(MemoryManager.scala:287)
        at org.apache.spark.memory.MemoryManager.<init>(MemoryManager.scala:250)
        at org.apache.spark.memory.UnifiedMemoryManager.<init>(UnifiedMemoryManager.scala:58)
        at org.apache.spark.memory.UnifiedMemoryManager$.apply(UnifiedMemoryManager.scala:207)
        at org.apache.spark.SparkEnv$.create(SparkEnv.scala:324)
        at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:198)
        at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:280)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:465)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
        at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:62)
        at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:502)
        at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:486)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:238)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
        at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
        at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
        at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: java.lang.IllegalStateException: java.lang.NoSuchMethodException: java.nio.DirectByteBuffer.<init>(long,int)
        at org.apache.spark.unsafe.Platform.<clinit>(Platform.java:113)
        ... 21 more
Caused by: java.lang.NoSuchMethodException: java.nio.DirectByteBuffer.<init>(long,int)
        at java.base/java.lang.Class.getConstructor0(Class.java:3761)
        at java.base/java.lang.Class.getDeclaredConstructor(Class.java:2930)
        at org.apache.spark.unsafe.Platform.<clinit>(Platform.java:71)
        ... 21 more

It's saying that the sc = SparkContext.getOrCreate() method does not exist. I'm confused because that function has been there throughout the last two major versions of the AWS Glue libs.

I'm running on an Apple MacBook Pro (M2) using Spark version spark-3.3.0-amzn-1-bin-3.3.3-amzn-0, btw.

So I'm wondering whether it's not working as expected because I'm running on an arm64-based machine rather than a non-arm64 one.

PS: I'm using the binary distribution of the library, not the Docker-based image. Just want to clarify this upfront.
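
For what it's worth, the innermost cause in the trace is java.lang.NoSuchMethodException: java.nio.DirectByteBuffer.<init>(long,int), which points at a missing JDK constructor rather than a missing getOrCreate(), and frames such as Thread.java:1583 suggest a fairly recent JDK. Assuming Spark 3.3.x is only documented for Java 8, 11 and 17, checking which Java version is being picked up may help narrow this down before suspecting arm64. The sketch below is only a diagnostic aid; the helper names are hypothetical and not part of aws-glue-libs, and it only inspects the java found on PATH (Spark may use JAVA_HOME instead if that is set).

```python
import re
import shutil
import subprocess

# Assumption: Spark 3.3.x documents support for Java 8, 11 and 17 only.
SUPPORTED_MAJORS = {8, 11, 17}


def parse_java_major(version_output):
    """Extract the major Java version from `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("could not find a Java version string")
    major = int(match.group(1))
    # Legacy scheme: "1.8.0_392" means Java 8.
    if major == 1 and match.group(2) is not None:
        major = int(match.group(2))
    return major


def detect_java_major():
    """Return the major version of the `java` on PATH, or None if unknown."""
    java = shutil.which("java")
    if java is None:
        return None
    # `java -version` writes its banner to stderr, not stdout.
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    try:
        return parse_java_major(result.stderr)
    except ValueError:
        return None


if __name__ == "__main__":
    major = detect_java_major()
    if major is None:
        print("No usable java found on PATH")
    elif major in SUPPORTED_MAJORS:
        print(f"Java {major} is within the assumed supported set")
    else:
        print(f"Java {major} is outside the assumed supported set {sorted(SUPPORTED_MAJORS)}")
```

If this reports a version outside the assumed set, pointing JAVA_HOME at a Java 8, 11 or 17 installation before running gluesparksubmit would be the first thing to try.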
