This repository has been archived by the owner on Aug 30, 2024. It is now read-only.

Calyptos: Running midonet-api on non-CLC hosts should fail validation #70

Open
dmccue opened this issue Sep 9, 2015 · 15 comments

Comments

@dmccue

dmccue commented Sep 9, 2015

Successful install logs: https://eucalyptus.atlassian.net/secure/attachment/25710/calyptos-1441818819.tgz

[root@odc-f-09 ~]# euca-describe-instances i-b6698673
RESERVATION r-689b39b6  000251786737    default
INSTANCE    i-b6698673  emi-990a5431    10.116.156.1    172.31.1.94 pending admin   0       m1.medium   2015-09-09T17:51:29.152Z    az-01               monitoring-enabled  10.116.156.1    172.31.1.94 vpc-7e0cb490    subnet-7aea6bd4 instance-store                  hvm         sg-91fc5339             x86_64
NETWORKINTERFACE    eni-33419845    subnet-7aea6bd4 vpc-7e0cb490    000251786737    in-use  172.31.1.94 euca-172-31-1-94.eucalyptus.internal    true
ATTACHMENT      0   attached    2015-09-09T17:51:29.157Z    true
ASSOCIATION 10.116.156.1        172.31.1.94
GROUP   sg-91fc5339 default
PRIVATEIPADDRESS    172.31.1.94 euca-172-31-1-94.eucalyptus.internal    primary
TAG instance    i-b6698673  Name    test1
TAG instance    i-b6698673  euca:node   10.105.1.209

Useful reference: http://jeevanullas.in/blog/aws-vpc-eucalyptus-midonet-2/

This is most likely VPC-related. Debugging will be required to check whether the configuration is set; routes may be missing, since this is a non-BGP setup.

@viglesiasce
Collaborator

@dmccue this is caused by the midonet-api not being colocated with the CLC/eucanetd. In 4.2, that is a requirement. We definitely need to add a validator for this.

@viglesiasce viglesiasce changed the title Calyptos: After successful install, instances hang at pending - VPCMIDO Calyptos: Running midonet-api on non-CLC hosts should fail validation Sep 9, 2015
@dmccue
Author

dmccue commented Sep 10, 2015

This is what happens when the midonet-api is set to the CLC IP:
https://eucalyptus.atlassian.net/secure/attachment/25720/calyptos-1441889948.tgz

midokura.midonet-api-url changed from http://10.105.10.70:8080/midonet-api to http://10.105.10.51:8080/midonet-api

[10.105.10.70] out:   * execute[Create TunnelZone] action run[2015-09-10T05:55:41-07:00] INFO: Processing execute[Create TunnelZone] action run (midokura::create-first-resources line 8)
[10.105.10.70] out: [2015-09-10T05:55:41-07:00] INFO: Retrying execution of execute[Create TunnelZone], 19 attempt(s) left
[10.105.10.70] out: [2015-09-10T05:55:52-07:00] INFO: Retrying execution of execute[Create TunnelZone], 18 attempt(s) left
[10.105.10.70] out: [2015-09-10T05:56:02-07:00] INFO: Retrying execution of execute[Create TunnelZone], 17 attempt(s) left
[10.105.10.70] out: [2015-09-10T05:56:12-07:00] INFO: Retrying execution of execute[Create TunnelZone], 16 attempt(s) left
[10.105.10.70] out: [2015-09-10T05:56:23-07:00] INFO: Retrying execution of execute[Create TunnelZone], 15 attempt(s) left
[10.105.10.70] out: [2015-09-10T05:56:33-07:00] INFO: Retrying execution of execute[Create TunnelZone], 14 attempt(s) left
[10.105.10.70] out: [2015-09-10T05:56:43-07:00] INFO: Retrying execution of execute[Create TunnelZone], 13 attempt(s) left
[10.105.10.70] out: [2015-09-10T05:56:54-07:00] INFO: Retrying execution of execute[Create TunnelZone], 12 attempt(s) left
[10.105.10.70] out: [2015-09-10T05:57:04-07:00] INFO: Retrying execution of execute[Create TunnelZone], 11 attempt(s) left
[10.105.10.70] out: [2015-09-10T05:57:14-07:00] INFO: Retrying execution of execute[Create TunnelZone], 10 attempt(s) left
[10.105.10.70] out: [2015-09-10T05:57:25-07:00] INFO: Retrying execution of execute[Create TunnelZone], 9 attempt(s) left
[10.105.10.70] out: [2015-09-10T05:57:35-07:00] INFO: Retrying execution of execute[Create TunnelZone], 8 attempt(s) left
[10.105.10.70] out: [2015-09-10T05:57:45-07:00] INFO: Retrying execution of execute[Create TunnelZone], 7 attempt(s) left
[10.105.10.70] out: [2015-09-10T05:57:56-07:00] INFO: Retrying execution of execute[Create TunnelZone], 6 attempt(s) left
[10.105.10.70] out: [2015-09-10T05:58:06-07:00] INFO: Retrying execution of execute[Create TunnelZone], 5 attempt(s) left
[10.105.10.70] out: [2015-09-10T05:58:16-07:00] INFO: Retrying execution of execute[Create TunnelZone], 4 attempt(s) left
[10.105.10.70] out: [2015-09-10T05:58:27-07:00] INFO: Retrying execution of execute[Create TunnelZone], 3 attempt(s) left
[10.105.10.70] out: [2015-09-10T05:58:37-07:00] INFO: Retrying execution of execute[Create TunnelZone], 2 attempt(s) left
[10.105.10.70] out: [2015-09-10T05:58:47-07:00] INFO: Retrying execution of execute[Create TunnelZone], 1 attempt(s) left
[10.105.10.70] out: [2015-09-10T05:58:58-07:00] INFO: Retrying execution of execute[Create TunnelZone], 0 attempt(s) left
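The countdown above is Chef retrying the `execute[Create TunnelZone]` resource while the midonet-api was unreachable; the timestamps suggest roughly 20 retries about 10 seconds apart. As a rough illustration of that retry pattern only (this is not Chef's implementation, and `with_retries` is a hypothetical helper), the loop looks something like this in plain Ruby:

```ruby
# Hypothetical helper: run the block; on failure, retry up to `retries`
# more times, sleeping `delay` seconds between attempts and logging how
# many attempts remain (loosely mirroring the log lines above).
def with_retries(retries:, delay:, sleeper: ->(secs) { sleep(secs) }, log: ->(msg) { warn(msg) })
  attempts_left = retries
  loop do
    begin
      return yield
    rescue StandardError => e
      raise if attempts_left.zero? # out of retries: surface the last error
      attempts_left -= 1
      log.call("Retrying after #{e.message}, #{attempts_left} attempt(s) left")
      sleeper.call(delay)
    end
  end
end

# Usage (check_midonet_api! is hypothetical):
#   with_retries(retries: 20, delay: 10) { check_midonet_api! }
```

Injecting `sleeper` and `log` keeps the helper testable without real delays.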

@dmccue
Author

dmccue commented Sep 11, 2015

I have unset midokura.midonet-api-url so that it defaults to http://localhost:8080/midonet-api, which seems to have worked.

However, there is now a fatal issue with Cassandra:

ERROR 06:57:52,308 Exception encountered during startup
java.lang.RuntimeException: Unable to gossip with any seeds
    at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1296)
    at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:457)
    at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:671)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:623)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:515)
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:424)
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:554)
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:643)
Exception encountered during startup: Unable to gossip with any seeds

Which leads to this Stack Overflow question:
https://stackoverflow.com/questions/20690987/apache-cassandra-unable-to-gossip-with-any-seeds

@dmccue
Author

dmccue commented Sep 11, 2015

[root@odc-f-28 ~]# grep 'listen_address\|broadcast_address' /etc/cassandra/conf/cassandra.yaml
listen_address: odc-f-28.prc.eucalyptus-systems.com
# Leaving this blank will set it to the same value as listen_address
# broadcast_address: 1.2.3.4
#    Uses public IPs as broadcast_address to allow cross-region

This doesn't work because odc-f-28.prc.eucalyptus-systems.com resolves to the public interface rather than the private one. What is the best way to address this: change the /etc/hosts file, or modify listen_address in cassandra.yaml to use the private interface IP?

This would need to be changed from node['fqdn'] to something that allows overriding it with the private IP address:
https://github.com/eucalyptus/midokura-cookbook/blob/master/recipes/cassandra.rb#L13

execute "CASSANDRA: set listening address" do
  command "sed -i -e 's/localhost/#{node['fqdn']}/g' /etc/cassandra/conf/cassandra.yaml"
end
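One way to allow that override, sketched in plain Ruby: prefer an explicitly configured address and fall back to `node['fqdn']` as the recipe does today. The attribute `cassandra_listen_address` and both helper names are hypothetical, and the node is modelled as a plain Hash rather than a real Chef node object.

```ruby
# Hypothetical attribute lookup: `node` is modelled as a plain Hash here.
# An explicit private address wins; otherwise keep today's node['fqdn'].
def cassandra_listen_address(node)
  override = node.dig("midokura", "cassandra_listen_address")
  override.nil? || override.to_s.strip.empty? ? node["fqdn"] : override
end

# Same sed invocation as the recipe, parameterised on the chosen address.
def listen_address_sed_command(node)
  addr = cassandra_listen_address(node)
  "sed -i -e 's/localhost/#{addr}/g' /etc/cassandra/conf/cassandra.yaml"
end

node = { "fqdn" => "odc-f-28.prc.eucalyptus-systems.com",
         "midokura" => { "cassandra_listen_address" => "10.105.10.70" } }
puts listen_address_sed_command(node)
# prints: sed -i -e 's/localhost/10.105.10.70/g' /etc/cassandra/conf/cassandra.yaml
```

Like the original recipe, the generated `sed` replaces every occurrence of `localhost` in `cassandra.yaml`, not only the `listen_address` line.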

@viglesiasce
Collaborator

@dmccue glad to hear the localhost change fixed up the mido side.

I'm going to move the Cassandra problem to a separate issue so we don't cross wires on this one.

@viglesiasce
Collaborator

I opened #72 to continue the Cassandra work.

@viglesiasce
Collaborator

Looking more closely at that cloud, instances are now reaching the running state but are not able to get their addresses via DHCP. This needs further investigation.

@viglesiasce
Collaborator

Eucanetd was not running on the CLC, which is another requirement and also needs a validator. Once that was sorted out, we had issues because eucanetd could not work out which Mido hosts were running the instances. This is caused by the lack of a reverse mapping from the nodes' hostnames to their registered IP addresses (in both Mido and Euca). To work around this, I added the following to /etc/hosts on the CLC/eucanetd host, and instances then began to get their IPs properly:

10.105.10.51 odc-f-09.prc.eucalyptus-systems.com
10.105.10.73 odc-f-31.prc.eucalyptus-systems.com
10.105.10.78 odc-f-36.prc.eucalyptus-systems.com
10.105.1.209 odc-d-30.prc.eucalyptus-systems.com

Instances are now booting and getting their IP addresses/metadata as expected.
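A quick way to sanity-check such a workaround is to render the hosts entries from the same hostname-to-IP mapping used in the environment file and confirm each hostname appears next to its registered IP. This hypothetical Ruby sketch only inspects `/etc/hosts`-style text; it does not query DNS or the system resolver.

```ruby
# Render "IP hostname" lines from a hostname => IP mapping.
def hosts_lines(mapping)
  mapping.map { |host, ip| "#{ip} #{host}" }
end

# Parse /etc/hosts-style text into an IP => [names] hash, ignoring comments.
def parse_hosts(text)
  text.each_line.with_object({}) do |line, acc|
    fields = line.sub(/#.*/, "").split
    next if fields.size < 2
    (acc[fields.first] ||= []).concat(fields.drop(1))
  end
end

# Hostnames whose registered IP is not listed with that name in the text.
def missing_reverse_mappings(mapping, hosts_text)
  table = parse_hosts(hosts_text)
  mapping.reject { |host, ip| table.fetch(ip, []).include?(host) }.keys
end

nodes = {
  "odc-f-09.prc.eucalyptus-systems.com" => "10.105.10.51",
  "odc-d-30.prc.eucalyptus-systems.com" => "10.105.1.209"
}
puts hosts_lines(nodes)
```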

The diff for the env file is as follows:

[root@odc-f-09 calyptos-deploy]# diff environment.yml environment-vic.yml
46,47c46
<     # Mappings for only NCs and CCs
<       odc-f-28.prc.eucalyptus-systems.com: 10.105.10.70
---
>     # Mappings for only NCs and CLC
54a54,55
>   - &EUCANETD_HOST
>     odc-f-09.prc.eucalyptus-systems.com
83c84
<           EucanetdHost: *MIDO_GATEWAY_HOST
---
>           EucanetdHost: *EUCANETD_HOST
[root@odc-f-09 calyptos-deploy]#
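The change works through YAML anchors and aliases: `&EUCANETD_HOST` names the scalar and `*EUCANETD_HOST` reuses it, so `EucanetdHost` resolves to the CLC hostname. A trimmed, hypothetical fragment can be checked with Ruby's Psych. Only the anchor, the alias, the hostname and the `eucalyptus.network.config-json.Mido.EucanetdHost` path come from this thread; the `hosts` key is invented, and the anchor is written on the same line as its value.

```ruby
require "yaml"

# Trimmed, hypothetical environment fragment (the "hosts" key is invented).
ENV_FRAGMENT = <<~EOS
  hosts:
    - &EUCANETD_HOST odc-f-09.prc.eucalyptus-systems.com
  eucalyptus:
    network:
      config-json:
        Mido:
          EucanetdHost: *EUCANETD_HOST
EOS

# safe_load rejects aliases unless aliases: true is passed.
def eucanetd_host(yaml_text)
  YAML.safe_load(yaml_text, aliases: true)
      .dig("eucalyptus", "network", "config-json", "Mido", "EucanetdHost")
end

puts eucanetd_host(ENV_FRAGMENT) # prints odc-f-09.prc.eucalyptus-systems.com
```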

@dmccue
Author

dmccue commented Sep 14, 2015

Made those changes: https://eucalyptus.atlassian.net/secure/attachment/25801/calyptos-1442240272.tgz
I am not able to connect to the midonet-api; will investigate.

@dmccue
Author

dmccue commented Sep 14, 2015

https://eucalyptus.atlassian.net/secure/attachment/25802/calyptos-1442248064.tgz

(on clc)
[root@odc-f-09 ~]# netstat -antp | grep 8080
tcp 0 0 :::8080 :::* LISTEN 26516/java
[root@odc-f-09 ~]# midonet-cli --midonet-url=http://localhost:8080/midonet-api -A -e add tunnel-zone name mido-tz type gre
The API server failed to respond normally. The network DB is possibly down. Bye.
[root@odc-f-09 ~]# tail -1 /var/log/eucalyptus/eucanetd.log
2015-09-14 09:24:53 FATAL 000022010 mido_check_state | midonet-api is not reachable after 120 retries: eucanetd shutting down

The midonet-api is clearly installed on the CLC (10.105.10.51) and listening on port 8080; however, the REST API returns 404 for all calls. This is likely a Tomcat configuration issue, or a backend issue where Tomcat cannot communicate with ZooKeeper.
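Because netstat shows a listener on port 8080 while API calls fail, it helps to separate "nothing is listening" from "listening but returning errors". A minimal, hypothetical Ruby probe (`probe_midonet_api` is not part of any Eucalyptus or MidoNet tooling) returns `nil` when no HTTP response can be obtained and the HTTP status code otherwise:

```ruby
require "net/http"
require "uri"

# Hypothetical probe: returns the HTTP status code if the server answers,
# or nil if no HTTP response could be obtained (refused, unreachable,
# unresolvable, or timed out).
def probe_midonet_api(url, open_timeout: 2, read_timeout: 5)
  uri = URI.parse(url)
  response = Net::HTTP.start(uri.host, uri.port,
                             open_timeout: open_timeout,
                             read_timeout: read_timeout) do |http|
    http.get(uri.request_uri)
  end
  response.code.to_i
rescue Errno::ECONNREFUSED, Errno::EHOSTUNREACH, SocketError,
       Net::OpenTimeout, Net::ReadTimeout
  nil
end

# Example (on the CLC):
#   probe_midonet_api("http://localhost:8080/midonet-api")
#   nil => nothing answered on 8080
#   404 => a server answered but reported the resource as not found
```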

@viglesiasce
Collaborator

@dmccue it looks like the midonet-api is pointing at 10.105.10.70, but ZooKeeper is running on 10.104.10.5. Can you rerun with 10.105.10.70 as your ZooKeeper host? The cookbook currently installs ZooKeeper only on the midonet-api host.

@viglesiasce
Collaborator

Sorry @dmccue, I meant rerunning with 10.105.10.51.

@dmccue
Author

dmccue commented Sep 14, 2015

@viglesiasce The build now completes with exit code 0, but there are still ingress connectivity issues over both private and public IPs.

Validators required:

  1. midokura.zookeepers contains a minimum of one array item pointing to eucalyptus.topology.clc-1
  2. midokura.midonet-api-url contains ip address of eucalyptus.topology.clc-1
  3. eucalyptus.network.config-json.Mido.EucanetdHost contains hostname of eucalyptus.topology.clc-1
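A hypothetical Ruby sketch of those three checks, run against the parsed environment file as nested Hashes. The key paths follow the list above. Passing `clc_hostname`/`clc_ip` explicitly, treating `config-json` as an already-parsed Hash, treating `zookeepers` entries as `host` or `host:port` strings, and accepting `localhost` or an unset `midonet-api-url` (which the thread found defaults to `http://localhost:8080/midonet-api`) are all assumptions.

```ruby
require "uri"

# Hosts that count as "the CLC itself" when the API URL is local (assumption).
LOCAL_API_HOSTS = %w[localhost 127.0.0.1].freeze

# Hypothetical validator: returns human-readable errors, empty if the
# environment satisfies all three rules from the thread.
def validate_midonet_colocation(env, clc_hostname:, clc_ip:)
  errors = []
  clc_names = [clc_hostname, clc_ip]

  # 1. At least one zookeeper entry points at clc-1 ("host" or "host:port").
  zookeepers = Array(env.dig("midokura", "zookeepers"))
  unless zookeepers.any? { |zk| clc_names.include?(zk.to_s.split(":").first) }
    errors << "midokura.zookeepers must include clc-1 (#{clc_ip})"
  end

  # 2. midonet-api-url points at clc-1; an unset value falls back to localhost.
  url = env.dig("midokura", "midonet-api-url")
  api_host = url.nil? ? "localhost" : URI.parse(url).host
  unless clc_names.include?(api_host) || LOCAL_API_HOSTS.include?(api_host)
    errors << "midokura.midonet-api-url must point at clc-1 (#{clc_ip}), got #{api_host}"
  end

  # 3. eucanetd runs on clc-1.
  eucanetd = env.dig("eucalyptus", "network", "config-json", "Mido", "EucanetdHost")
  unless eucanetd == clc_hostname
    errors << "Mido.EucanetdHost must be clc-1 (#{clc_hostname}), got #{eucanetd.inspect}"
  end

  errors
end
```

Returning a list of messages rather than raising on the first failure lets a deployment tool report every misconfiguration in one pass.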

@viglesiasce
Copy link
Collaborator

Thanks @dmccue! You saved me the work of going back through this journey to figure out the right validators 👍

@dmccue
Copy link
Author

dmccue commented Sep 21, 2015

I have switched over to running midonet on the CLC and specifying localhost as the midonet-api endpoint.

@dmccue dmccue closed this as completed Sep 21, 2015
@dmccue dmccue reopened this Sep 21, 2015

2 participants