Deployment example doesn't work #37

Open
mdavis-xyz opened this issue Jan 17, 2019 · 8 comments

Comments

@mdavis-xyz

I am trying to follow this example on this wiki page.

It fails at the first ansible playbook.

TASK [Gathering Facts] ****************************************************************************************************************
fatal: [10.87.64.23]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: connect to host 10.87.64.23 port 22: Connection timed out\r\n", "unreachable": true}
        to retry, use: --limit @/home/opentelco/contrail-ansible-deployer/playbooks/provision_instances.retry

Am I supposed to change one of the IP addresses in that YAML from the previous step? The guide says that the scripts set up OpenStack for me on a single bare-metal server.
I interpreted that as meaning that I don't have to spin up the VMs myself, so there are no other IP addresses I could possibly put into that YAML.

How is this wiki example supposed to be run?

@Andrey-mp
Member

In instances.yaml you need to specify the real IPs of the servers/services that will be used for your deployment.

10.87.64.23 is the IP of the KVM host, which must be reachable over SSH by Ansible.
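
Before running the playbook, it can help to confirm that the address placed in instances.yaml actually accepts TCP connections on port 22. The following is a minimal sketch, not part of the deployer: the function name `check_ssh_port` is hypothetical, and it assumes bash (for its /dev/tcp pseudo-device) and the coreutils `timeout` command are available.

```shell
#!/usr/bin/env bash
# check_ssh_port HOST [PORT]: succeed if a TCP connection opens within 5 seconds.
# Hypothetical helper; relies on bash's /dev/tcp pseudo-device and coreutils timeout.
check_ssh_port() {
    local host="$1" port="${2:-22}"
    # Inside "bash -c", the host arrives as $0 and the port as $1.
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

For example, `check_ssh_port 10.87.64.23 && echo reachable` uses the wiki's example address; substitute the address of your own KVM host. A non-zero exit status corresponds to the "Connection timed out" failure Ansible reports as UNREACHABLE.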

@mdavis-xyz
Author

That's what I initially thought. However, the text of the wiki says the IP of the server is 172.16.10.1. So why doesn't the YAML say 172.16.10.1?

I tried changing the IP address to localhost. Now I get a permission error because Ansible can't save the image to /var/lib/libvirt/images/kvm103.qcow2. That's strange because I already changed the file system permissions so my user can write to that directory.

When I run ansible-playbook with sudo I get a different error:

Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory

Are the ssh credentials in the provider->kvm section supposed to be how I log into my machine, or is that what the playbook uses to set up access to the VMs it creates?

@mdavis-xyz
Author

Ah, yes. Those ssh credentials are supposed to be for the host you're on.

Once I changed the SSH credentials, I then got a virsh error because the "default" virsh network bridge did not exist.
I tried to create and activate it, but that didn't go well. Now virsh net-list --all shows something different from sudo virsh net-list --all. Both show the default bridge, but in different states.

Anyway, after setting up 2 default bridges, I now get a virsh error:

ERROR Cannot get interface MTU on 'virbr0': No such device
Domain installation does not appear to have been successful.
If it was, you can restart your domain by running:
virsh --connect qemu:///system start kvm103
otherwise, please restart your installation.

Does Ansible know about virsh bridges? Is this playbook supposed to set up the bridge for me?
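
The "Cannot get interface MTU on 'virbr0': No such device" message suggests that the bridge interface itself is absent from the kernel, even though a libvirt network that refers to it is defined. A minimal sketch for checking this on Linux follows; the helper name `iface_exists` is hypothetical and the /sys/class/net path is Linux-specific.

```shell
# iface_exists NAME: succeed if the kernel currently has a network interface called NAME.
# Hypothetical helper; /sys/class/net is Linux-specific.
iface_exists() {
    [ -e "/sys/class/net/$1" ]
}

if iface_exists virbr0; then
    echo "virbr0 exists"
else
    # A defined but inactive libvirt network does not create its bridge.
    echo "virbr0 missing: the libvirt network that owns it is probably not started"
fi

# With libvirt installed, the network state on the system connection can be listed with:
#   virsh --connect qemu:///system net-list --all
```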

@Andrey-mp
Member

As far as I know, Ansible doesn't set up networking with virsh. It can only run machines for you.

The different results from virsh and sudo virsh are caused by different connection strings (URIs) to QEMU.
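
To illustrate the connection-string point: when no URI is given, an unprivileged virsh normally connects to the per-user instance (qemu:///session), while root connects to the system instance (qemu:///system), so each sees its own set of networks. The sketch below models that default; `libvirt_uri_for_user` is a hypothetical helper, and it is a simplification that ignores `uri_default` in libvirt's client configuration and libvirt's driver probing.

```shell
# libvirt_uri_for_user UID: print the URI virsh would most likely default to.
# Simplified assumption: ignores uri_default in libvirt.conf and driver probing.
libvirt_uri_for_user() {
    if [ -n "${LIBVIRT_DEFAULT_URI:-}" ]; then
        # An explicit environment override wins.
        echo "$LIBVIRT_DEFAULT_URI"
    elif [ "$1" -eq 0 ]; then
        echo "qemu:///system"
    else
        echo "qemu:///session"
    fi
}

echo "as $(id -un): $(libvirt_uri_for_user "$(id -u)")"
echo "as root: $(libvirt_uri_for_user 0)"

# With libvirt installed, the real values can be checked with:
#   virsh uri
#   sudo virsh uri
#   virsh --connect qemu:///system net-list --all
```

Passing `--connect` explicitly, as in the last comment, makes both invocations operate on the same set of network objects regardless of which user runs them.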

@mdavis-xyz
Author

mdavis-xyz commented Jan 17, 2019

Ok, well I think the wiki should be modified to specify what virsh commands are needed as prep.

It says:

You don't need to create the VMs. Instead, the deployment scripts will do it for you

and also includes basic setup like installing epel, pip and git. So I think it is reasonable for a reader to assume that they don't need to do any VM networking setup other than what the wiki tells them to do.

I'm happy to modify the wiki myself, once I figure out what the solution is.

I tried using the steps here, making sure that I run all virsh commands with sudo.

sudo virsh net-define /usr/share/libvirt/networks/default.xml
sudo virsh net-autostart default
sudo virsh net-start default

But I still get the same error when I run the playbook.

ERROR Network not found: no network with matching name 'default'

Even though sudo virsh net-list --all clearly shows an active network named default.
virsh net-list --all (without sudo) shows no networks.

So then I ran:

sudo virsh net-undefine default
sudo virsh net-destroy default
virsh net-define /usr/share/libvirt/networks/default.xml # no sudo
virsh net-autostart default # no sudo
virsh net-start default # no sudo

That last command fails.

error: Failed to start network default
error: error creating bridge interface virbr0: Operation not permitted

I cannot simply add sudo to get those permissions, since sudo virsh and virsh operate on different sets of network objects.

While running the playbook, I noticed two errors which appear but are ignored by Ansible. They both say:

TASK [kvm : get container vm status] **************************************************************************************************
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: libvirtError: authentication unavailable: no polkit agent available to authenticate action 'org.libvirt.unix.manage'
fatal: [localhost]: FAILED! => {"changed": false, "msg": "authentication unavailable: no polkit agent available to authenticate action 'org.libvirt.unix.manage'"}

Is that something which can be ignored? If yes, then I'll go change the wiki page to mention that this error message is ok.

@mdavis-xyz
Author

Ok, it turns out I needed to add LIBVIRT_DEFAULT_URI=qemu:///system to my environment, so that virsh and sudo virsh control the same things.
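
For anyone following along, a minimal sketch of applying that setting in the current shell is shown here. Persisting it in ~/.bashrc is an assumption about the login shell and is left commented out. Note that an unprivileged user may still need polkit authorisation or suitable group membership to use the system connection.

```shell
# Point unprivileged virsh at the system-wide libvirt instance for this shell session.
export LIBVIRT_DEFAULT_URI=qemu:///system

# Optionally persist it for future logins (assumes a bash login shell).
# echo 'export LIBVIRT_DEFAULT_URI=qemu:///system' >> ~/.bashrc

# If virsh is installed, show which URI it now resolves to.
if command -v virsh >/dev/null 2>&1; then
    virsh uri || echo "virsh could not connect to $LIBVIRT_DEFAULT_URI"
fi
```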

That fixed the bridge error.

Then I got another error about permissions. I discovered that I must run the playbooks with sudo. (Confusingly, having keys, username root and hostname localhost in instances.yaml doesn't mean that Ansible will ssh to root@localhost; inventory/hosts tells Ansible to connect to a local shell directly, as the current user.) I modified the wiki to add a link which talks about instances.yaml.

When I ran with sudo, that step ran successfully, and then I got a different error. I fixed that with PR #38.

Now I get a new error:

TASK [kvm : add container vm kvm103 to inventory when private key is defined] *********************************************************
changed: [localhost]
ERROR! an undefined variable was found when attempting to template the vars_files item '{{ hostvars['localhost'].config_file }}'

The error appears to have been in '/home/opentelco/contrail-ansible-deployer/playbooks/provision_instances.yml': line 39, column 5, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

  vars_files:
  - "{{ hostvars['localhost'].config_file }}"
    ^ here
We could be wrong, but this one looks like it might be an issue with
missing quotes.  Always quote template expression brackets when they
start a value. For instance:

    with_items:
      - {{ foo }}

Should be written as:

    with_items:
      - "{{ foo }}"

As you can probably tell, I'm new to Ansible.
I find this error message confusing because the task after the last successful task is not the one described by the error message.

  • The last successful task was add container vm kvm103 to inventory when private key is defined in playbooks/roles/kvm/tasks/build_and_start_container_hosts.yml
  • The next task should therefore be Wait for the VM be available in the same file (which has no 'when' statement, so should therefore always run)
  • The task pointed to by that error message is Provision KVM instances in playbooks/provision_instances.yml

Is there a reason why the last task in provision_instances.yml has vars_files equal to hostvars['localhost'].config_file when all other tasks in that playbook have just config_file?

@neoliupassccie

Has this problem been resolved?

@mdavis-xyz
Author

mdavis-xyz commented Sep 20, 2019

No, this problem still exists because PR #38 was rejected.

I'm no longer working on Contrail (I changed teams at work), so unfortunately I don't have the time to climb the steep learning curve needed to resubmit #38 through Gerrit.


3 participants