Replies: 1 comment
-
So finally figured out that the JOB_NAME needs to reference an existing glue job. |
Beta Was this translation helpful? Give feedback.
0 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
-
I am using the code fragment below and the job.init call is failing with an error while getting security configuration. (see log below). I'm not suprised by the error since I can't find documentation as to the args that need to be supplied. I did find https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html but the essential parameters are missing: (IAM_ROLE, connections, etc.) . When I look at the AWS Glue Studio job editor there is a lot of Job context established in the "Job Details" tab. I expect that I need to pass similar information to job.init.
I'm trying to do local debugging of glue jobs as suggested by https://aws.amazon.com/blogs/big-data/developing-aws-glue-etl-jobs-locally-using-a-container/
Any help would be appreciated.
Code:
args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
LOG:
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Traceback (most recent call last):
File "/home/glue_user/spark/python/pyspark/sql/utils.py", line 111, in deco
return f(*a, **kw)
File "/home/glue_user/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:com.amazonaws.services.glue.util.Job.init.
: java.lang.RuntimeException: error while getting security configuration for None
at com.amazonaws.services.glue.util.SecurityConfig$.get(SecurityConfig.scala:29)
at com.amazonaws.services.glue.util.Job$.init(Job.scala:67)
at com.amazonaws.services.glue.util.Job.init(Job.scala)
I should say I'm trying to use the docker method described in https://aws.amazon.com/blogs/big-data/developing-aws-glue-etl-jobs-locally-using-a-container/
I have gotten a bit futher, better documentation would be most helpful. I had to actually make the glue job from the glue studio and provide the job name as a pycharm program argument using: --JOB_NAME=dynamoToRDS-Carriers-dci.
I also provided the credentials as environment variables.
Now I'm able to get further but when I get to my transform statement:
//Convert glue DynamicFrame to regular pySpark dataframe
df = dfc.select(list(dfc.keys())[0]).toDF()
It fails with the error below. Has anyone actually gotten this to work? I suspect I need a glue debug endpoint.
It fails with
22/01/11 12:51:29 WARN JobBookmark$: Disabling Job Bookmarks for developer environment
22/01/11 12:51:31 WARN DataCatalogWrapper: Cell Filtering is not supported in local development.
ANTLR Tool version 4.3 used for code generation does not match the current runtime version 4.8ANTLR Runtime version 4.7.2 used for parser compilation does not match the current runtime version 4.8ANTLR Tool version 4.3 used for code generation does not match the current runtime version 4.8ANTLR Runtime version 4.7.2 used for parser compilation does not match the current runtime version 4.822/01/11 12:51:46 WARN MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-jobtracker.properties,hadoop-metrics2.properties
22/01/11 12:51:46 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.RuntimeException: Could not lookup table Carriers in DynamoDB.
at org.apache.hadoop.dynamodb.DynamoDBClient.describeTable(DynamoDBClient.java:143)
at org.apache.hadoop.dynamodb.read.ReadIopsCalculator.getThroughput(ReadIopsCalculator.java:67)
at org.apache.hadoop.dynamodb.read.ReadIopsCalculator.calculateTargetIops(ReadIopsCalculator.java:58)
at org.apache.hadoop.dynamodb.read.AbstractDynamoDBRecordReader.initReadManager(AbstractDynamoDBRecordReader.java:152)
at org.apache.hadoop.dynamodb.read.AbstractDynamoDBRecordReader.(AbstractDynamoDBRecordReader.java:84)
at org.apache.hadoop.dynamodb.read.DefaultDynamoDBRecordReader.(DefaultDynamoDBRecordReader.java:24)
at org.apache.hadoop.dynamodb.read.DynamoDBInputFormat.getRecordReader(DynamoDBInputFormat.java:32)
at com.amazonaws.services.glue.connections.DynamoConnection.getReader(DynamoConnection.scala:136)
at com.amazonaws.services.glue.DynamicRecordRDD.compute(DataSource.scala:610)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: com.amazonaws.services.dynamodbv2.model.AmazonDynamoDBException: The security token included in the request is invalid. (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: UnrecognizedClientException; Request ID: 6VF395GT2JUI0HE7O8538B4V3FVV4KQNSO5AEMVJF66Q9ASUAAJG; Proxy: null)
at org.apache.hadoop.dynamodb.DynamoDBFibonacciRetryer.handleException(DynamoDBFibonacciRetryer.java:108)
at org.apache.hadoop.dynamodb.DynamoDBFibonacciRetryer.runWithRetry(DynamoDBFibonacciRetryer.java:83)
at org.apache.hadoop.dynamodb.DynamoDBClient.describeTable(DynamoDBClient.java:132)
... 23 more
Caused by: com.amazonaws.services.dynamodbv2.model.AmazonDynamoDBException: The security token included in the request is invalid. (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: UnrecognizedClientException; Request ID: 6VF395GT2JUI0HE7O8538B4V3FVV4KQNSO5AEMVJF66Q9ASUAAJG; Proxy: null)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1819)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1403)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1372)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1145)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:802)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:770)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:744)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:704)
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:686)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:550)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:530)
at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.doInvoke(AmazonDynamoDBClient.java:6214)
at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.invoke(AmazonDynamoDBClient.java:6181)
at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.executeDescribeTable(AmazonDynamoDBClient.java:2247)
at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.describeTable(AmazonDynamoDBClient.java:2211)
at org.apache.hadoop.dynamodb.DynamoDBClient$1.call(DynamoDBClient.java:136)
at org.apache.hadoop.dynamodb.DynamoDBClient$1.call(DynamoDBClient.java:133)
at org.apache.hadoop.dynamodb.DynamoDBFibonacciRetryer.runWithRetry(DynamoDBFibonacciRetryer.java:80)
... 24 more
22/01/11 12:51:46 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0) (defa526a1b86 executor driver): java.lang.RuntimeException: Could not lookup table Carriers in DynamoDB.
Beta Was this translation helpful? Give feedback.
All reactions