Deployment example doesn't work #37

mdavis-xyz · 2019-01-17T02:30:17Z

I am trying to follow this example on this wiki page.

It fails at the first ansible playbook.

TASK [Gathering Facts] ****************************************************************************************************************
fatal: [10.87.64.23]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: connect to host 10.87.64.23 port 22: Connection timed out\r\n", "unreachable": true}
        to retry, use: --limit @/home/opentelco/contrail-ansible-deployer/playbooks/provision_instances.retry

Am I supposed to change one of the IP addresses in that yaml from the previous step? The guide says that the scripts set up OpenStack for me on a single bare metal server.
I interpreted that as meaning that I don't have to spin up the VMs myself, therefore there are no other IP addresses I could possibly put into that yaml.

How is this wiki example supposed to be run?


Andrey-mp · 2019-01-17T05:02:15Z

In instances.yaml you need to specify the real IPs of the servers/services that will be used for your deployment.

10.87.64.23 is the IP of the KVM host, which must be reachable over SSH by Ansible.

mdavis-xyz · 2019-01-17T05:38:59Z

That's what I initially thought. However, the text of the wiki says the IP of the server is 172.16.10.1. So why doesn't the yaml say 172.16.10.1?

I tried changing the IP address to localhost. Now I get a permission error because Ansible can't save the image to /var/lib/libvirt/images/kvm103.qcow2. That's strange because I already changed the file system permissions so my user can write to that directory.

When I run ansible-playbook with sudo I get a different error:

Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory

Are the ssh credentials in the provider->kvm section supposed to be how I log into my machine, or is that what the playbook uses to set up access to the VMs it creates?

mdavis-xyz · 2019-01-17T06:43:18Z

Ah, yes. Those ssh credentials are supposed to be for the host you're on.

Once I changed the SSH credentials, I got a virsh error because the "default" virsh network bridge did not exist.
I tried to create and activate it, but that didn't go well. Now virsh net-list --all shows something different from sudo virsh net-list --all. They both show the default bridge, but with different states.

Anyway, after setting up 2 default bridges, I now get a virsh error:

ERROR Cannot get interface MTU on 'virbr0': No such device
Domain installation does not appear to have been successful.
If it was, you can restart your domain by running:
virsh --connect qemu:///system start kvm103
otherwise, please restart your installation.

Does Ansible know about virsh bridges? Is this playbook supposed to set up the bridge for me?

Andrey-mp · 2019-01-17T08:50:32Z

As far as I know, Ansible doesn't set up networking with virsh; it can only run the machines for you.

The different results from running virsh are caused by different connection strings to QEMU.
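To make the connection-string difference concrete, here is a simplified sketch of how virsh picks a URI when no `--connect` flag is given. It assumes there is no `uri_default` entry in libvirt.conf and that QEMU/KVM is the only hypervisor driver; `default_libvirt_uri` is a hypothetical helper, not part of libvirt.

```shell
#!/usr/bin/env bash
# Simplified model of libvirt's default-URI choice (assumption: no
# uri_default in libvirt.conf, QEMU/KVM is the only driver in play).
default_libvirt_uri() {
  # Optional first argument: uid to evaluate; defaults to the current user.
  local uid="${1:-$(id -u)}"
  if [ -n "${LIBVIRT_DEFAULT_URI:-}" ]; then
    # An explicit environment override always wins.
    printf '%s\n' "$LIBVIRT_DEFAULT_URI"
  elif [ "$uid" -eq 0 ]; then
    # root (e.g. under sudo) talks to the system-wide daemon.
    printf '%s\n' "qemu:///system"
  else
    # Ordinary users get a private per-user session with its own networks.
    printf '%s\n' "qemu:///session"
  fi
}
```

Comparing `virsh --connect qemu:///session net-list --all` with `virsh --connect qemu:///system net-list --all` should reproduce the two different network lists without involving sudo, although the system URI may require libvirt group membership or a polkit prompt for a non-root user.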

mdavis-xyz · 2019-01-17T22:56:30Z

Ok, well I think the wiki should be modified to specify what virsh commands are needed as prep.

It says:

You don't need to create the VMs. Instead, the deployment scripts will do it for you

and also includes basic setup like installing epel, pip and git. So I think it is reasonable for a reader to assume that they don't need to do any VM networking setup other than what the wiki tells them to do.

I'm happy to modify the wiki myself, once I figure out what the solution is.

I tried using the steps here, making sure that I run all virsh commands with sudo.

sudo virsh net-define /usr/share/libvirt/networks/default.xml
sudo virsh net-autostart default
sudo virsh net-start default

But I still get the same error when I run the playbook.

ERROR Network not found: no network with matching name 'default'

Even though sudo virsh net-list --all clearly shows an active network named default.
virsh net-list --all (without sudo) shows no networks.

So then I ran:

sudo virsh net-undefine default
sudo virsh net-destroy default
virsh net-define /usr/share/libvirt/networks/default.xml # no sudo
virsh net-autostart default # no sudo
virsh net-start default # no sudo

That last command fails.

error: Failed to start network default
error: error creating bridge interface virbr0: Operation not permitted

I cannot simply add sudo to get those permissions, since sudo virsh and virsh operate on different sets of network objects.

While running the playbook, I noticed two errors which appear but are ignored by Ansible. They both say:

TASK [kvm : get container vm status] **************************************************************************************************
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: libvirtError: authentication unavailable: no polkit agent available to authenticate action 'org.libvirt.unix.manage'
fatal: [localhost]: FAILED! => {"changed": false, "msg": "authentication unavailable: no polkit agent available to authenticate action 'org.libvirt.unix.manage'"}

Is that something which can be ignored? If yes, then I'll go change the wiki page to mention that this error message is ok.

mdavis-xyz · 2019-01-18T04:33:09Z

Ok, it turns out I needed to add LIBVIRT_DEFAULT_URI=qemu:///system to my environment, so that virsh and sudo virsh control the same things.

That fixed the bridge error.
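An `export` only lasts for the current shell. A minimal sketch for persisting the setting, assuming a Bash login that reads the rc file you pass in (typically `~/.bashrc`); `pin_libvirt_uri` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Persist LIBVIRT_DEFAULT_URI so plain `virsh` uses the same system daemon
# as `sudo virsh`. Assumption: your shell sources the given rc file.
pin_libvirt_uri() {
  local rc_file="$1"
  local line='export LIBVIRT_DEFAULT_URI=qemu:///system'
  touch "$rc_file"
  # Append only once, so re-running the setup doesn't duplicate the line.
  if ! grep -qxF "$line" "$rc_file"; then
    printf '%s\n' "$line" >> "$rc_file"
  fi
  # Apply it to the current shell as well.
  export LIBVIRT_DEFAULT_URI=qemu:///system
}
```

After running `pin_libvirt_uri ~/.bashrc` once, `virsh net-list --all` and `sudo virsh net-list --all` should list the same networks, since root already defaults to `qemu:///system`.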

Then I got another error about permissions. I discovered that I must run the playbooks with sudo. (Confusingly, having keys, username root and hostname localhost in instances.yaml doesn't mean that Ansible will ssh to root@localhost. inventory/hosts tells Ansible to run commands in a local shell directly, as the current user.) I modified the wiki to add a link to a page that explains instances.yaml.
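Because the playbooks act on localhost as whichever user launched them, a guard that fails fast without root can save a confusing permission error midway through a run. This is a hypothetical sketch, not part of contrail-ansible-deployer.

```shell
#!/usr/bin/env bash
# Hypothetical guard: refuse to continue unless running as root.
# Optional first argument: uid to check; defaults to the current user.
require_root() {
  local uid="${1:-$(id -u)}"
  if [ "$uid" -ne 0 ]; then
    echo "run the playbook with sudo (current uid: $uid)" >&2
    return 1
  fi
}
```

For example, a wrapper script launched with `sudo` could run `require_root && ansible-playbook -i inventory/ playbooks/provision_instances.yml`, assuming it is run from the repository root; the playbook path matches the one in the error output, while `-i inventory/` is an assumption about the repo layout.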

When I run with sudo, that step runs successfully, and then I got a different error. I fixed that with PR #38.

Now I get a new error:

TASK [kvm : add container vm kvm103 to inventory when private key is defined] *********************************************************
changed: [localhost]
ERROR! an undefined variable was found when attempting to template the vars_files item '{{ hostvars['localhost'].config_file }}'

The error appears to have been in '/home/opentelco/contrail-ansible-deployer/playbooks/provision_instances.yml': line 39, column 5, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

  vars_files:
  - "{{ hostvars['localhost'].config_file }}"
    ^ here
We could be wrong, but this one looks like it might be an issue with
missing quotes.  Always quote template expression brackets when they
start a value. For instance:

    with_items:
      - {{ foo }}

Should be written as:

    with_items:
      - "{{ foo }}"

As you can probably tell, I'm new to Ansible.
I find this error message confusing because the task after the last successful task is not the one described by the error message.

The last successful task was add container vm kvm103 to inventory when private key is defined in playbooks/roles/kvm/tasks/build_and_start_container_hosts.yml
The next task should therefore be Wait for the VM be available in the same file (which has no 'when' statement, so should therefore always run)
The task pointed to by that error message is Provision KVM instances in playbooks/provision_instances.yml

Is there a reason why the last task in provision_instances.yml has vars_files set to hostvars['localhost'].config_file, when all other tasks in that playbook use just config_file?

neoliupassccie · 2019-09-19T15:47:27Z

Was this problem ever resolved?

mdavis-xyz · 2019-09-20T00:15:50Z

No, this problem still exists because PR #38 was rejected.

I'm not working on Contrail stuff any more. (I changed teams at work.) So unfortunately I don't have the time required to climb the steep learning curve necessary to figure out how to resubmit #38 through gerrit.

mdavis-xyz mentioned this issue Jan 18, 2019

"is defined" added #38

Closed
