
Error when running on AWS Lambda #56

Open
DaaS-20xx opened this issue Sep 12, 2021 · 3 comments

@DaaS-20xx

Hi all,
everything is working fine locally, but when deployed on AWS Lambda, the simple_file.py application doesn't work.
Here are the steps I executed:

  • I created a new Lambda function, assigning a role with a policy that grants permission to access the Transcribe service.
  • I changed the default handler to "simple_file.basic_transcribe" (simple_file.py is the application in the example folder, and basic_transcribe is the main function inside it).
  • Locally I created a "deploy-transcribe" folder and placed simple_file.py in it; I also modified the basic_transcribe function to accept the event and context parameters, so that simple_file.py can serve as the Lambda handler (a sketch of this adaptation follows the list). I also placed the sample file.wav in an S3 bucket and updated the address in the aiofile call in simple_file.py.
  • In the same local folder, I installed the dependencies amazon-transcribe and aiofile via "pip install -t ."; I also installed awscrt manually by downloading it from https://pypi.org/project/awscrt/#files and unzipping it into the same folder (as explained in issue Unable to deploy to AWS Lambda #18).
  • Finally, I zipped the contents of the entire folder, uploaded the archive to Lambda, and it deployed successfully.
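For reference, a minimal sketch of that handler adaptation (the wrapper name lambda_handler and the exact split are assumptions, not the author's verbatim code; the body of basic_transcribe is the original coroutine from simple_file.py):

import asyncio

async def basic_transcribe():
    ...  # original streaming-transcription coroutine from the simple_file.py example

def lambda_handler(event, context):
    # Lambda invokes a plain synchronous function, so drive the coroutine here
    # instead of the module-level loop.run_until_complete() call in the example.
    loop = asyncio.get_event_loop()
    return loop.run_until_complete(basic_transcribe())

With this layout, the Lambda handler setting would point at simple_file.lambda_handler rather than simple_file.basic_transcribe.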

But when testing, I get an error: "[Errno 38] Function not implemented"; below (*) is the log from CloudWatch.

Any idea what the reason is? Am I missing something, or should the simple_file.py script be adapted/modified in some way?

Thanks a lot!!
Bye!

(*)
[ERROR] OSError: [Errno 38] Function not implemented
Traceback (most recent call last):
  File "/var/lang/lib/python3.9/importlib/init.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "", line 1030, in _gcd_import
  File "", line 1007, in _find_and_load
  File "", line 986, in _find_and_load_unlocked
  File "", line 680, in _load_unlocked
  File "", line 850, in exec_module
  File "", line 228, in _call_with_frames_removed
  File "/var/task/simple_file.py", line 53, in
    loop.run_until_complete(basic_transcribe())
  File "/var/lang/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/var/task/simple_file.py", line 50, in basic_transcribe
    await asyncio.gather(write_chunks(), handler.handle_events())
  File "/var/task/simple_file.py", line 42, in write_chunks
    async with aiofile.AIOFile('s3://transcr12092021/test/test.wav', 'rb') as afp:
  File "/var/task/aiofile/aio.py", line 124, in init
    self.__context = context or get_default_context()
  File "/var/task/aiofile/aio.py", line 307, in get_default_context
    return create_context()
  File "/var/task/aiofile/aio.py", line 294, in create_context
    context = caio.AsyncioContext(max_requests, loop=loop)
  File "/var/task/caio/asyncio_base.py", line 22, in init
    self.context = self._create_context(max_requests, **kwargs)
  File "/var/task/caio/asyncio_base.py", line 25, in _create_context
    return self.CONTEXT_CLASS(max_requests=max_requests, **kwargs)
  File "/var/task/caio/python_aio.py", line 34, in init
    self.pool = ThreadPool(pool_size)
  File "/var/lang/lib/python3.9/multiprocessing/pool.py", line 927, in init
    Pool.init(self, processes, initializer, initargs)
  File "/var/lang/lib/python3.9/multiprocessing/pool.py", line 196, in init
    self._change_notifier = self._ctx.SimpleQueue()
  File "/var/lang/lib/python3.9/multiprocessing/context.py", line 113, in SimpleQueue
    return SimpleQueue(ctx=self.get_context())
  File "/var/lang/lib/python3.9/multiprocessing/queues.py", line 342, in init
    self._rlock = ctx.Lock()
  File "/var/lang/lib/python3.9/multiprocessing/context.py", line 68, in Lock
    return Lock(ctx=self.get_context())
  File "/var/lang/lib/python3.9/multiprocessing/synchronize.py", line 162, in init
    SemLock.init(self, SEMAPHORE, 1, 1, ctx=ctx)
  File "/var/lang/lib/python3.9/multiprocessing/synchronize.py", line 57, in init
    sl = self._semlock = _multiprocessing.SemLock(

@DaaS-20xx
Author

I figured out that, according to the details provided here: https://aws.amazon.com/blogs/compute/parallel-processing-in-python-with-aws-lambda/, it is not possible to implement parallelism on Lambda using multiprocessing.Pool or multiprocessing.Queue (of the multiprocessing module, only Pipe can be used). And the stack trace shows that multiprocessing.pool is invoked when aiofile.AIOFile is created.
Is there an alternative to that?
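For illustration (a minimal sketch, not code from this report), the limitation described above can be reproduced directly: multiprocessing.Pool and multiprocessing.Queue rely on POSIX semaphores backed by /dev/shm, which the Lambda execution environment does not provide, whereas multiprocessing.Pipe does not need them:

import multiprocessing

def lambda_handler(event, context):
    # Pool (and Queue) need SemLock primitives backed by /dev/shm, which Lambda
    # does not expose, so this raises OSError: [Errno 38] Function not implemented.
    try:
        with multiprocessing.Pool(processes=2):
            pass
    except OSError as exc:
        print(f"Pool is unavailable on Lambda: {exc}")

    # Pipe only needs a connected pair of file descriptors, so it works.
    parent_conn, child_conn = multiprocessing.Pipe()
    child_conn.send("hello")
    print(parent_conn.recv())
    return "done"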

@DaaS-20xx
Author

Hi everyone,
I just figured out how to fix the issue. As explained here: https://aws.amazon.com/blogs/compute/parallel-processing-in-python-with-aws-lambda/, AWS Lambda does not support multiprocessing.Pool, which is used by the aiofile library (specifically, multiprocessing.pool.ThreadPool is invoked in caio/python_aio.py), and this causes the error I reported above.
I then found the same issue in this other discussion about the Snowflake connector not working on AWS Lambda: snowflakedb/snowflake-connector-python#287. In that case the issue was fixed with the release of a new version of the snowflake-connector-python library, simply by replacing multiprocessing.pool.ThreadPool with concurrent.futures.thread.ThreadPoolExecutor. I applied exactly the same fix manually to python_aio.py in the caio folder, i.e.:

from concurrent.futures.thread import ThreadPoolExecutor
..
self.pool = ThreadPoolExecutor(pool_size)

Then I zipped everything, redeployed the Lambda function, and the issue was fixed!

Anyway, I assume this fix should be addressed officially, e.g. with a specific add-on for AWS Lambda.

Thanks everybody!

@DaaS-20xx DaaS-20xx reopened this Sep 14, 2021
@DaaS-20xx
Author

Unfortunately, moving forward, another issue showed up when running the Python script in Lambda while opening the file, this time on the ThreadPoolExecutor that I had put in place of ThreadPool in caio/python_aio.py:

[ERROR] AttributeError: 'ThreadPoolExecutor' object has no attribute 'apply_async'
Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 61, in lambda_handler
    return loop.run_until_complete(basic_transcribe())
  File "/var/lang/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/var/task/lambda_function.py", line 51, in basic_transcribe
    await asyncio.gather(write_chunks(), handler.handle_events())
  File "/var/task/lambda_function.py", line 45, in write_chunks
    async for chunk in reader:
  File "/var/task/aiofile/utils.py", line 77, in __anext__
    chunk = await self.read_chunk()
  File "/var/task/aiofile/utils.py", line 64, in read_chunk
    chunk = await self.file.read_bytes(
  File "/var/task/aiofile/aio.py", line 220, in read_bytes
    return await self.__context.read(size, self.fileno(), offset)
  File "/var/task/caio/asyncio_base.py", line 47, in submit
    assert self.context.submit(op) == 1, "Operation was not submitted"
  File "/var/task/caio/python_aio.py", line 128, in submit
    self._execute(operation)
  File "/var/task/caio/python_aio.py", line 69, in _execute
    self.pool.apply_async(

So it seems the issue is with opening the file in async/non-blocking mode in AWS Lambda, apparently on this piece of code:

async with aiofile.AIOFile('/tmp/test.wav', 'rb') as afp:

Any suggestion or hint on how it can be addressed? Or is it not feasible in AWS Lambda?

Thanks all!
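For what it's worth, a hedged sketch of the API mismatch behind that last traceback (an illustration of the general adaptation, not an official caio patch): concurrent.futures.ThreadPoolExecutor exposes submit() rather than ThreadPool's apply_async(), so swapping the pool class alone is not enough; the call sites in caio/python_aio.py would also need a small adapter along these lines:

from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=2)

def apply_async(func, args=(), callback=None, error_callback=None):
    # Rough ThreadPoolExecutor-based equivalent of
    # multiprocessing.pool.ThreadPool.apply_async().
    future = pool.submit(func, *args)

    def _done(fut):
        exc = fut.exception()
        if exc is not None:
            if error_callback is not None:
                error_callback(exc)
        elif callback is not None:
            callback(fut.result())

    future.add_done_callback(_done)
    return future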
