Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added enable flag for chef and set it to true by default #20

Open
wants to merge 14 commits into
base: master
Choose a base branch
from
31 changes: 29 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,7 @@ You will need to create a CloudWatch Events rule that's configured to send termi
1. Goto CloudWatch Events in the AWS Console
1. Click *Create rule*
1. Select event source of *EC2 instance state change notification*
1. Select specific state of *Terminated*
1. Select specific state of *Terminated* (optional will ignore events that are not *Terminated*)
1. Add a target of *SQS Queue* and set queue to the cloudwatch-events queue created in step one
1. Give the rule a name/description and click *Create rule*

Expand Down Expand Up @@ -87,9 +87,36 @@ To enable webhooks, add a `:webhooks:` section to the config:
Chat notifications can be sent when the webhook successfully executes. See
config.yml.sample for an example of the config.

### Client Lookup

By default aws-cleaner assumes that the lookup will be done against chef as this was one of its core intentions. This can be configured for sensu via config:
```
:general:
:lookup_type: 'sensu'
```

### States

By default aws-cleaner assumes that the only state you will be looking for is `terminated` you can change this via config:
```
:general:
:states:
- 'terminated'
```

### Chef

If switching the client lookup to sensu you will probably want to disable chef via the following config:
```
:chef:
:enable: false
```

### Sensu

By default aws-cleaner assumes that removing from sensu is desired as this was one of its core intentions. To allow people to leverage this without sensu you can disable it via config:
By default aws-cleaner assumes that the lookup will be done against chef as this was one of its core intentions. This can be configured for sensu via config:

You can disable sensu via config as well to skip removing the client from sensu:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not indicate that you can disable removing from chef.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the read me a little, but the section of chef is a little further down in the read me, but I have clarified the README a bit in the next commit.

```
:sensu:
:enable: false
Expand Down
80 changes: 54 additions & 26 deletions bin/aws_cleaner.rb
Original file line number Diff line number Diff line change
Expand Up @@ -63,7 +63,7 @@ def logger(config)
logger
end

def webhook(id, instance_id)
def webhook(instance_id)
if @config[:webhooks]
@config[:webhooks].each do |hook, hook_config|
if AwsCleaner::Webhooks.fire_webhook(hook_config, @config, instance_id)
Expand All @@ -72,31 +72,26 @@ def webhook(id, instance_id)
@logger.info("Failed to run webhook #{hook}")
end
end
AwsCleaner.new.delete_message(id, @config)
end
end

def chef(id, instance_id, chef_node)
if chef_node
if AwsCleaner::Chef.remove_from_chef(chef_node, @chef_client, @config)
@logger.info("Removed #{chef_node} from Chef")
AwsCleaner.new.delete_message(id, @config)
end
def chef(instance_id, chef_node)
return unless @config[:chef][:enable]

if AwsCleaner::Chef.remove_from_chef(chef_node, @chef_client, @config)
@logger.info("Removed #{chef_node} from Chef")
else
@logger.info("Instance #{instance_id} does not exist in Chef, deleting message")
AwsCleaner.new.delete_message(id, @config)
@logger.info("Instance #{instance_id} does not exist in Chef")
end
end

def sensu(id, instance_id, chef_node)
def sensu(instance_id, sensu_node)
return unless @config[:sensu][:enable]
if AwsCleaner::Sensu.in_sensu?(chef_node, @config)
if AwsCleaner::Sensu.remove_from_sensu(chef_node, @config)
@logger.info("Removed #{chef_node} from Sensu")
else
@logger.info("Instance #{instance_id} does not exist in Sensu, deleting message")
end
AwsCleaner.new.delete_message(id, @config)

if AwsCleaner::Sensu.remove_from_sensu(sensu_node, @config)
@logger.info("Removed #{sensu_node} from Sensu")
else
@logger.info("Instance #{instance_id} does not exist in Sensu")
end
end

Expand All @@ -115,12 +110,30 @@ def closelog(message)
@sqs_client = AwsCleaner::SQS.client(@config)
@chef_client = AwsCleaner::Chef.client(@config)

# to provide backwards compatibility as this key did not exist previously
# to provide backwards compatibility as these keys did not exist previously
@config[:chef][:enable] = if @config[:chef][:enable].nil?
true
else
@config[:chef][:enable]
end
@config[:sensu][:enable] = if @config[:sensu][:enable].nil?
true
else
@config[:sensu][:enable]
end
@config[:general] = if @config[:general].nil?
''
end
@config[:general][:lookup_type] = if @config[:general][:lookup_type].nil?
'chef'
else
@config[:general][:lookup_type]
end
@config[:general][:states] = if @config[:general][:states].nil?
['terminated']
else
@config[:general][:states]
end

# main loop
loop do
Expand All @@ -144,16 +157,31 @@ def closelog(message)
next
end

instance_id = AwsCleaner.new.process_message(body)
instance_id = AwsCleaner.new.process_message(body, @config)

if instance_id
chef_node = AwsCleaner::Chef.get_chef_node_name(instance_id, @config)
webhook(id, instance_id)
chef(id, instance_id, chef_node)
sensu(id, instance_id, chef_node)
else
@logger.info('Message not relevant, deleting')
node = case @config[:general][:lookup_type]
when 'sensu'
AwsCleaner::Sensu.get_sensu_node_name(instance_id, @config)
else
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this not be when 'chef'?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No this should be when chef, since the variable set to "chef" plus any invalid value would default to use chef.

AwsCleaner::Chef.get_chef_node_name(instance_id, @config)
end

if node
@logger.info("Instance #{instance_id} found, node name is #{node}.")

webhook(instance_id)
chef(instance_id, node)
sensu(instance_id, node)

@logger.info("All done with #{node}, deleting message.")
else
@logger.info("Instance #{instance_id} not found, deleting message.")
end

AwsCleaner.new.delete_message(id, @config)
else
@logger.info('Message not relevant, ignoring...')
end
end

Expand Down
5 changes: 5 additions & 0 deletions config.yml.sample
Original file line number Diff line number Diff line change
@@ -1,3 +1,7 @@
:general:
:lookup_type: 'chef'
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thinking this variable should node_name_provider or something like that to be clear what exactly is being looked up.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in the last commit.

:states:
- 'terminated'
:aws:
# only region is required when using iam which is recommended
:access_key_id: 'AWS Access Key'
Expand All @@ -12,6 +16,7 @@
:url: 'https://chef.example.com/organizations/example'
:client: 'somebody'
:key: '/path/to/client.pem'
:enable: true
:hipchat:
:api_token: 'Hipchat API token'
:room: 'Notifications'
Expand Down
41 changes: 36 additions & 5 deletions lib/aws-cleaner.rb
Original file line number Diff line number Diff line change
Expand Up @@ -30,15 +30,15 @@ def self.client(config)
# call the Chef API to get the node name of the instance
def self.get_chef_node_name(instance_id, config)
chef = client(config)
results = chef.search.query(:node, "ec2_instance_id:#{instance_id} OR chef_provisioning_reference_server_id:#{instance_id}")
results = chef.search.query(:node, "ec2_instance_id:#{instance_id[2..-1]} OR chef_provisioning_reference_server_id:#{instance_id[2..-1]}")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a chef server currently to test this against can you show me the output of this or explain this change?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This removes the "i-" part from the instance ID in case you don't have that. I do the same on the sensu side since we don't have that part in our client names.

I tested it on a quick chef install and it seemed to work but don't have a full chef install to test against. I'm ok omitting this change.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to address this separately and will be starting a new job next week so I will have a chef and sensu server to test with.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that should be removed since it's going to break lookups for the instance. Ohai populates that attribute with the actual instance-id and it will always include i-.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have reverted that, I figured since it was the "search" api that was being called it would also search partial strings it seemed to work on my quick test with chef but since I don't use it I'm fine with reverting it.

return false if results.rows.empty?
results.rows.first['name']
end

# call the Chef API to get the FQDN of the instance
def self.get_chef_fqdn(instance_id, config)
chef = client(config)
results = chef.search.query(:node, "ec2_instance_id:#{instance_id} OR chef_provisioning_reference_server_id:#{instance_id}")
results = chef.search.query(:node, "ec2_instance_id:#{instance_id[2..-1]} OR chef_provisioning_reference_server_id:#{instance_id[2..-1]}")
return false if results.rows.empty?
results.rows.first['automatic']['fqdn']
end
Expand All @@ -58,6 +58,37 @@ def self.remove_from_chef(node_name, chef, config)
end

module Sensu
# call the Chef API to get the node name of the instance
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update comment to Sensu.

def self.get_sensu_node_name(instance_id, config)
offset = 0
returned_clients = 100

while returned_clients == 100
response = RestClient::Request.execute(
url: "#{config[:sensu][:url]}/clients?limit=100&offset=#{offset}",
method: :get,
timeout: 5,
open_timeout: 5
)

clients = JSON.parse(response.body)
client = clients.find { |child| child['name'].include?(instance_id[2..-1]) }
client_name = if client.nil?
false
else
client['name']
end

break if client_name

returned_clients = clients.count
offset += 100
end

return false unless client_name
client_name
end

# check if the node exists in Sensu
def self.in_sensu?(node_name, config)
RestClient::Request.execute(
Expand Down Expand Up @@ -102,9 +133,9 @@ def parse(body)
end

# return the instance_id of the terminated instance
def process_message(message_body)
return false if message_body['detail']['instance-id'].nil? &&
message_body['detail']['state'] != 'terminated'
def process_message(message_body, config)
return false if message_body['detail']['instance-id'].nil? ||
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets use the && to keep the existing behavior.

Copy link
Author

@ghost ghost Apr 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is my only problem, since we use one queue to monitor the "running" state notifications with aws-watcher and prefer not to add multiple queues since it's more to manage. We already have a queue/process for every AWS region.

Would it be ok to add a config parameter to delete irelevant messages that is set to true by default, that way by default the existing behavior is kept but it can be overwritten. I was thinking I can add that either in the main loop or the process_message function.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome to see others using that project and the use case makes sense to me.

I'd still prefer it to be in 2 separate PRs though personally @eheydrick what do you think?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have separated the changes. I will submit another pull request for the message handling.

!config[:general][:states].include?(message_body['detail']['state'])

instance_id = message_body['detail']['instance-id']
instance_id
Expand Down