
Replica Function support (Lambda@Edge) #364

Open
iDVB opened this issue Oct 26, 2018 · 13 comments

Comments

@iDVB
Contributor

iDVB commented Oct 26, 2018

I'm still in the process of testing out what exactly the issue might be, but I have a good hunch that the service does not know how to deal with edge function logs.

Likely this is because they're located in the region closest to the end user (the edge location). This means that when an end user makes a request near Oregon, the logs for that Lambda will end up in the Oregon region.

Additionally, the log group prefix is different as well, in that it's:
/aws/lambda/us-east-1.demo-stack-proxy-dev-originRequest
vs. the traditional:
/aws/lambda/demo-stack-proxy-dev-originRequest

I thought I might be able to shim around the issue by putting MONITORED=0 on all the replica functions... but it appears (from the console, anyway) that you can't tag replica functions.

I'm not sure how you can even programmatically detect whether they are replica functions... looking into that now.

UPDATE:
You can't see the function via aws lambda list-functions, but you can see it with aws lambda get-function directly. However, there does not seem to be any replica-identifying trait other than that all replica function names start with us-east-1., regardless of which region a developer usually uses. Edge function masters always get created in us-east-1.

I'm going to start looking into your source to see if I can just add a check for us-east-1. as a function name prefix and exclude those for now. Later, maybe there is even a way to get these logging by adding subscription filters to the region-local log groups.
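A prefix check along those lines could be sketched as follows (helper names and the regex are my own, not from the codebase; a broad region pattern like this could in principle false-positive on an ordinary function whose name happens to look like a region prefix):

```javascript
// Edge replica functions seem identifiable only by the "<region>." prefix on
// the function name, e.g. "us-east-1.demo-stack-proxy-dev-originRequest".
const EDGE_NAME_RE = /^([a-z]{2}-[a-z]+-\d)\.(.+)$/;

// True when the function name carries a region prefix (i.e. is a replica).
const isEdgeReplica = (functionName) => EDGE_NAME_RE.test(functionName);

// Strip the region prefix to recover the master function name.
const masterFunctionName = (functionName) => {
  const match = EDGE_NAME_RE.exec(functionName);
  return match === null ? functionName : match[2];
};
```

With helpers like these, the monitor could simply skip any function where isEdgeReplica returns true as a first step.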

@simlu
Collaborator

simlu commented Oct 26, 2018

So I take it deploying this monitor to all regions that could have edge function executions doesn't work (yet)? Targeting regional function log groups shouldn't be hard.

Why do you want to disable monitoring for those functions? Is it more broken than simply not working?

What issues are you seeing? As I said, this service works great for us, but we are not currently using edge Lambda functions.

@iDVB
Contributor Author

iDVB commented Oct 26, 2018

No; as-is, it seems to be fully broken. I think the dummy custom resource that initially calls those subscribe functions is failing, causing the deploy to fail and roll back. Maybe this has to do with it trying to locate a replica's associated log group and failing?

Deploying to all regions might work, but it's a bit overkill in my mind, especially since the set of regions "likely to have edge functions" is all of them. That's how they work: they get replicated to all edge locations and sit there, ready for whenever an end user visits from that area. So it would be impossible to predict.

I'm going to dive into the code a bit more and see if I can see exactly what the issue might be. But let me know if you have any ideas. I'm very happy to test them.

@simlu
Collaborator

simlu commented Oct 26, 2018

I'd love to see what the CloudFormation template / resource Lambda function logs look like (you need to be quick to find them before everything is torn down).

Also, I'm sure @MacMcIrish has some input here on what would be interesting to look at!

@iDVB
Contributor Author

iDVB commented Oct 26, 2018

AFK for 1 hr, then I'll look for you.
Also, I think I can alter these log references and have it check for replica log groups across regions:

logGroupNamePrefix: `/aws/lambda/${f.FunctionName}`
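Building on that prefix, a cross-region lookup could derive the replica log group name from the master function name (a sketch; the helper name is my own, and it assumes the us-east-1. naming observed above holds for all replicas):

```javascript
// Replica log groups live in each execution region but keep the master
// region's "us-east-1." prefix in their name (per the naming observed above).
const edgeLogGroupName = (functionName) =>
  `/aws/lambda/us-east-1.${functionName}`;

// A cross-region scan would then construct one CloudWatchLogs client per
// region and query this prefix in each, along the lines of:
//   new AWS.CloudWatchLogs({ region }).describeLogGroups({
//     logGroupNamePrefix: edgeLogGroupName(f.FunctionName)
//   }).promise()
```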

@MacMcIrish
Contributor

@iDVB Failing on the post-deploy hook via a dummy resource is a frustrating issue that I plan to make much easier to understand :) Up until now I was the only one suffering from this, so I'm sorry to put you through my pain. Fortunately this is only an issue on the very first deploy of the stack, since when the initial deploy fails, the teardown of the stack also deletes the log group.

To examine the log group before the teardown happens:

  1. Open the Cloudwatch Management Console in the targeted region
  2. Deploy the stack
  3. As the stack is deploying, refresh the page until /aws/lambda/lambda-monitor-${YOUR_STAGE_HERE}-subscribe and /aws/lambda/lambda-monitor-${YOUR_STAGE_HERE}-set-retention appear. Open the log groups immediately.
  4. Once logs are attached to this log group, you can find the error output explaining why the custom resource failed to deploy.

Once you have this info, it'll be a lot easier to track down the reason that this is failing to stabilise on deploy. Looking forward to hearing back from you!

@iDVB
Contributor Author

iDVB commented Oct 26, 2018

Thanks! Actually, it seems to successfully send the error to Rollbar... so I CAN see that it's due to

ResourceNotFoundException: The specified log group does not exist.

which led me to hypothesize that it's unable to find the replica log groups.

Testing your steps to catch the logs though...
OUTPUT:

START RequestId: NOTTELLING Version: $LATEST
2018-10-26T17:36:58.132Z	NOTTELLING	Response body:
{
    "Status": "FAILED",
    "Reason": "See the details in CloudWatch Log Stream: 2018/10/26/[$LATEST]NOTTELLING",
    "PhysicalResourceId": "2018/10/26/[$LATEST]NOTTELLING",
    "StackId": "arn:aws:cloudformation:us-east-1:NOTTELLING:stack/lambda-monitor-dev/NOTTELLING",
    "RequestId": "NOTTELLING",
    "LogicalResourceId": "TriggerLambdaSubscribeResource",
    "Data": {}
}

2018-10-26T17:36:58.370Z	NOTTELLING	Status code: 200
2018-10-26T17:36:58.370Z	NOTTELLING	Status message: OK
2018-10-26T17:36:58.607Z	NOTTELLING
{
    "errorMessage": "The specified log group does not exist.",
    "errorType": "ResourceNotFoundException",
    "stackTrace": [
        "Request.extractError (/var/task/node_modules/aws-sdk/lib/protocol/json.js:48:27)",
        "Request.callListeners (/var/task/node_modules/aws-sdk/lib/sequential_executor.js:106:20)",
        "Request.emit (/var/task/node_modules/aws-sdk/lib/sequential_executor.js:78:10)",
        "Request.emit (/var/task/node_modules/aws-sdk/lib/request.js:683:14)",
        "Request.transition (/var/task/node_modules/aws-sdk/lib/request.js:22:10)",
        "AcceptorStateMachine.runTo (/var/task/node_modules/aws-sdk/lib/state_machine.js:14:12)",
        "/var/task/node_modules/aws-sdk/lib/state_machine.js:26:10",
        "Request.<anonymous> (/var/task/node_modules/aws-sdk/lib/request.js:38:9)",
        "Request.<anonymous> (/var/task/node_modules/aws-sdk/lib/request.js:685:12)",
        "Request.callListeners (/var/task/node_modules/aws-sdk/lib/sequential_executor.js:116:18)",
        "Request.emit (/var/task/node_modules/aws-sdk/lib/sequential_executor.js:78:10)",
        "Request.emit (/var/task/node_modules/aws-sdk/lib/request.js:683:14)",
        "Request.transition (/var/task/node_modules/aws-sdk/lib/request.js:22:10)",
        "AcceptorStateMachine.runTo (/var/task/node_modules/aws-sdk/lib/state_machine.js:14:12)",
        "/var/task/node_modules/aws-sdk/lib/state_machine.js:26:10",
        "Request.<anonymous> (/var/task/node_modules/aws-sdk/lib/request.js:38:9)",
        "Request.<anonymous> (/var/task/node_modules/aws-sdk/lib/request.js:685:12)",
        "Request.callListeners (/var/task/node_modules/aws-sdk/lib/sequential_executor.js:116:18)",
        "callNextListener (/var/task/node_modules/aws-sdk/lib/sequential_executor.js:96:12)",
        "IncomingMessage.onEnd (/var/task/node_modules/aws-sdk/lib/event_listeners.js:294:13)",
        "emitNone (events.js:111:20)",
        "IncomingMessage.emit (events.js:208:7)"
    ]
}

END RequestId: NOTTELLING
REPORT RequestId: NOTTELLING	Duration: 2351.20 ms	Billed Duration: 2400 ms Memory Size: 256 MB	Max Memory Used: 57 MB	
START RequestId: NOTTELLING Version: $LATEST
2018-10-26T17:37:13.071Z	NOTTELLING	Response body:
{
    "Status": "FAILED",
    "Reason": "See the details in CloudWatch Log Stream: 2018/10/26/[$LATEST]NOTTELLING",
    "PhysicalResourceId": "2018/10/26/[$LATEST]NOTTELLING",
    "StackId": "arn:aws:cloudformation:us-east-1:NOTTELLING:stack/lambda-monitor-dev/NOTTELLING",
    "RequestId": "NOTTELLING",
    "LogicalResourceId": "TriggerLambdaSubscribeResource",
    "Data": {}
}

2018-10-26T17:37:13.529Z	NOTTELLING	Status code: 200
2018-10-26T17:37:13.529Z	NOTTELLING	Status message: OK
2018-10-26T17:37:13.711Z	NOTTELLING
{
    "errorMessage": "The specified log group does not exist.",
    "errorType": "ResourceNotFoundException",
    "stackTrace": [
        "Request.extractError (/var/task/node_modules/aws-sdk/lib/protocol/json.js:48:27)",
        "Request.callListeners (/var/task/node_modules/aws-sdk/lib/sequential_executor.js:106:20)",
        "Request.emit (/var/task/node_modules/aws-sdk/lib/sequential_executor.js:78:10)",
        "Request.emit (/var/task/node_modules/aws-sdk/lib/request.js:683:14)",
        "Request.transition (/var/task/node_modules/aws-sdk/lib/request.js:22:10)",
        "AcceptorStateMachine.runTo (/var/task/node_modules/aws-sdk/lib/state_machine.js:14:12)",
        "/var/task/node_modules/aws-sdk/lib/state_machine.js:26:10",
        "Request.<anonymous> (/var/task/node_modules/aws-sdk/lib/request.js:38:9)",
        "Request.<anonymous> (/var/task/node_modules/aws-sdk/lib/request.js:685:12)",
        "Request.callListeners (/var/task/node_modules/aws-sdk/lib/sequential_executor.js:116:18)",
        "Request.emit (/var/task/node_modules/aws-sdk/lib/sequential_executor.js:78:10)",
        "Request.emit (/var/task/node_modules/aws-sdk/lib/request.js:683:14)",
        "Request.transition (/var/task/node_modules/aws-sdk/lib/request.js:22:10)",
        "AcceptorStateMachine.runTo (/var/task/node_modules/aws-sdk/lib/state_machine.js:14:12)",
        "/var/task/node_modules/aws-sdk/lib/state_machine.js:26:10",
        "Request.<anonymous> (/var/task/node_modules/aws-sdk/lib/request.js:38:9)",
        "Request.<anonymous> (/var/task/node_modules/aws-sdk/lib/request.js:685:12)",
        "Request.callListeners (/var/task/node_modules/aws-sdk/lib/sequential_executor.js:116:18)",
        "callNextListener (/var/task/node_modules/aws-sdk/lib/sequential_executor.js:96:12)",
        "IncomingMessage.onEnd (/var/task/node_modules/aws-sdk/lib/event_listeners.js:294:13)",
        "emitNone (events.js:111:20)",
        "IncomingMessage.emit (events.js:208:7)"
    ]
}

@iDVB
Contributor Author

iDVB commented Oct 26, 2018

Looks like I can see pretty much the same thing in the Rollbar logs.

@simlu
Collaborator

simlu commented Oct 26, 2018

@iDVB Looks like this is exactly the problem that you anticipated. There are two things to do here:
(1) Fix the deploy: if the log group doesn't exist, it shouldn't subscribe. It should also look for the us-east-1.* log group and subscribe to that one if it exists.
(2) Make it so that CloudTrail-triggered subscription happens for log groups that are prefixed with a region.

Note: As part of (2) it might be desirable to also prevent subscription if already subscribed (that's #361), but that could also happen later.
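An idempotency check for the "already subscribed" case could be as small as this (a sketch; the helper name is my own, and it assumes the handler already has the subscriptionFilters array from CloudWatch Logs' describeSubscriptionFilters):

```javascript
// Given the subscriptionFilters array returned by describeSubscriptionFilters,
// decide whether our filter is already attached so we can skip re-subscribing.
const alreadySubscribed = (subscriptionFilters, ourFilterName) =>
  subscriptionFilters.some((f) => f.filterName === ourFilterName);
```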

@iDVB
Contributor Author

iDVB commented Oct 26, 2018

@MacMcIrish @simlu Any reason lambda.js isn't using AWS.Request.promise().then() rather than creating your own promises? https://aws.amazon.com/blogs/developer/support-for-promises-in-the-sdk/
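For reference, the two styles side by side (a sketch; the function names are my own, and the lambda argument is assumed to be an AWS.Lambda client from the v2 SDK):

```javascript
// Current pattern: wrap the callback API in a hand-rolled Promise.
const listFunctionsManual = (lambda) => new Promise((resolve, reject) => {
  lambda.listFunctions({}, (err, data) => (err ? reject(err) : resolve(data)));
});

// With built-in SDK support: every AWS.Request exposes .promise().
const listFunctionsBuiltIn = (lambda) => lambda.listFunctions({}).promise();
```

Both return a promise for the same response object, so callers shouldn't notice the difference.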

@simlu
Collaborator

simlu commented Oct 26, 2018

@iDVB Not really. That should probably be changed, provided the tests keep passing. I've already done that in other places last night.

@iDVB
Contributor Author

iDVB commented Oct 29, 2018

Working on a PR for this now...
#371

@simlu
Collaborator

simlu commented Nov 5, 2018

Task outline:

(1)
CloudTrail needs to be receiving events from all regions.
This is necessary so we can take action based on edge Lambda function log group creation.

(2)
The subscribe and set-retention functions need to be updated to handle "CloudTrail log group events" and "non-CloudTrail log group events" differently. For CloudTrail events, only the log group that triggered the invocation should be handled; otherwise we need to do an "expensive" check-all call.
This is necessary so we do not overload the system with requests; otherwise this could get expensive and we would run into throttling.

(3)
Handle edge log group subscription correctly: when an edge log group gets created, we need to check whether it is an edge log group and then check the master function for tags etc. This should be relatively easy.


I would start by figuring out what the Lambda event for the CloudTrail log group creation looks like (i.e. how much meta information is available).
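For the routing in (2), a first sketch of distinguishing the two invocation kinds might look like this (the helper name is my own, and the event shape is assumed to be the CloudWatch Events "AWS API Call via CloudTrail" format; this is exactly what should be verified by logging the real event):

```javascript
// Returns the created log group's name for a CloudTrail-delivered
// CreateLogGroup invocation, or null when the handler should fall back to
// the "expensive" check-all scan (e.g. the custom-resource trigger).
const extractCreatedLogGroup = (event) => {
  const detail = event.detail;
  if (detail !== undefined
    && detail.eventName === 'CreateLogGroup'
    && detail.requestParameters !== undefined) {
    return detail.requestParameters.logGroupName;
  }
  return null;
};
```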

@iDVB
Contributor Author

iDVB commented Nov 5, 2018

Going to test what we can see in the CT event by letting it trigger a function with
console.log(JSON.stringify(event, null, 2)) in it.
