This repository has been archived by the owner on Feb 23, 2023. It is now read-only.

RandomDesign has a bug when there is a variable after a bandit variable #319

Open

samath117 opened this issue May 1, 2020 · 5 comments

@samath117 commented May 1, 2020

Take a look at this definition in RandomDesign:

    def fill_noncontinous_variables(self, samples):
        """
        Fill sample values to non-continuous variables in place
        """
        init_points_count = samples.shape[0]

        for (idx, var) in enumerate(self.space.space_expanded):
            if isinstance(var, DiscreteVariable) or isinstance(var, CategoricalVariable) :
                sample_var = np.atleast_2d(np.random.choice(var.domain, init_points_count))
                samples[:,idx] = sample_var.flatten()

            # sample in the case of bandit variables
            elif isinstance(var, BanditVariable):
                # Bandit variable is represented by a several adjacent columns in the samples array
                idx_samples = np.random.randint(var.domain.shape[0], size=init_points_count)
                bandit_idx = np.arange(idx, idx + var.domain.shape[1])
                samples[:, bandit_idx] = var.domain[idx_samples,:]

This function fills in random values for the non-continuous variables by sampling each of them and writing the samples into the appropriate columns, idx or bandit_idx, where idx comes from enumerate and therefore advances by one per variable. However, a bandit variable occupies several columns, so for every variable that follows it, idx lags behind the true column index: later samples overwrite all but the first column of the bandit variable, and the trailing columns of the array are never filled. As a result, the function fails whenever a bandit variable is not the last variable in the list.
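
To make the misalignment concrete, here is a minimal numpy-only sketch (no GPyOpt; the layout of one two-column bandit variable followed by one discrete variable is chosen purely for illustration) of what the enumerate-based indexing ends up doing:

    import numpy as np

    # Suppose the expanded space is [bandit (2 columns), discrete (1 column)],
    # so each sample row should have 3 columns.
    samples = np.empty((4, 3))

    # Bandit variable: enumerate gives idx = 0, and columns 0-1 are filled correctly.
    samples[:, [0, 1]] = np.array([1, 1])

    # Discrete variable: enumerate gives idx = 1, but its real column is 2.
    # Writing to column 1 overwrites half of the bandit sample, and column 2
    # keeps whatever garbage np.empty allocated.
    samples[:, 1] = 7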

Here is the fix:

    def fill_noncontinous_variables(self, samples):
        """
        Fill sample values to non-continuous variables in place
        """
        init_points_count = samples.shape[0]

        idx = 0
        for var in self.space.space_expanded:
            if isinstance(var, DiscreteVariable) or isinstance(var, CategoricalVariable) :
                sample_var = np.atleast_2d(np.random.choice(var.domain, init_points_count))
                samples[:,idx] = sample_var.flatten()
                idx = idx + 1

            # sample in the case of bandit variables
            elif isinstance(var, BanditVariable):
                # Bandit variable is represented by a several adjacent columns in the samples array
                idx_samples = np.random.randint(var.domain.shape[0], size=init_points_count)
                bandit_idx = np.arange(idx, idx + var.domain.shape[1])
                samples[:, bandit_idx] = var.domain[idx_samples,:]
                idx = idx + var.domain.shape[1]

            # skip the continuous variables
            else:
                idx += 1

Instead of letting enumerate advance idx by one per variable, we track the column index manually and advance it by the number of columns each variable actually occupies, so bandit variables no longer throw the indexing off.

Should I submit a PR?

@apaleyes (Collaborator) commented May 1, 2020

Looks reasonable. To confirm, can you create a clear unit test that fails with current code, and that your implementation fixes?

@samath117 (Author)

Here's a simple test case:

    import numpy as np
    import GPyOpt

    # Test that the random design works with multiple bandit variables.
    # Both bandit variables have only one option, [1, 1], so the only possible sample from the whole space is [1, 1, 1, 1].
    region = GPyOpt.Design_space(space=[{'name': str(n), 'type': 'bandit', 'domain': np.array([[1, 1]])} for n in range(2)])
    assert np.array_equal(GPyOpt.experiment_design.initial_design('random', region, 1), np.array([[1, 1, 1, 1]]))

With the current implementation, the last column is never written and keeps whatever value np.empty happened to allocate, so only the first three entries are guaranteed to be 1.

I'm not sure exactly how to incorporate this into the test suite. If you want to take this from here, I'd appreciate it.

@ekalosak (Contributor) commented May 4, 2020

If you submit a PR, @samath117, I'd try my hand at it, fwiw.

@apaleyes (Collaborator) commented May 4, 2020

@samath117 to add it to the existing tests, just follow the pattern! Go to https://github.com/SheffieldML/GPyOpt/blob/master/GPyOpt/testing/util_tests/test_general_utils.py and add a new test there - should look very similar to existing methods in this class. Run the tests and make sure it fails as expected. Then implement the fix, run the tests again and make sure the test is green now. That's it, push and submit a PR.
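
For illustration, a unittest-style sketch of such a test (the class and method names here are hypothetical, not taken from test_general_utils.py) could look like this:

    import unittest
    import numpy as np
    import GPyOpt

    class TestRandomDesignWithBanditVariables(unittest.TestCase):
        # Hypothetical test class; the real suite may organize its tests differently.
        def test_random_design_with_multiple_bandit_variables(self):
            # Two bandit variables, each with the single option [1, 1], so the
            # only valid sample from the whole space is [1, 1, 1, 1].
            space = GPyOpt.Design_space(space=[
                {'name': str(n), 'type': 'bandit', 'domain': np.array([[1, 1]])}
                for n in range(2)
            ])
            design = GPyOpt.experiment_design.initial_design('random', space, 1)
            self.assertTrue(np.array_equal(design, np.array([[1, 1, 1, 1]])))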

@samath117 (Author)

Unfortunately, this isn't the only issue with using a bandit variable alongside other variables.

In the round_optimum method of Design_space in GPyOpt/core/task/space.py, the indices are managed correctly, but there is another problem:

    def round_optimum(self, x):
        """
        Rounds some value x to a feasible value in the design space.
        x is expected to be a vector or an array with a single row
        """
        x = np.array(x)
        if not ((x.ndim == 1) or (x.ndim == 2 and x.shape[0] == 1)):
            raise ValueError("Unexpected dimentionality of x. Got {}, expected (1, N) or (N,)".format(x.ndim))

        if x.ndim == 2:
            x = x[0]

        x_rounded = []
        value_index = 0
        for variable in self.space_expanded:
            var_value = x[value_index : value_index + variable.dimensionality_in_model]
            var_value_rounded = variable.round(var_value)

            x_rounded.append(var_value_rounded)
            value_index += variable.dimensionality_in_model

        return np.atleast_2d(np.concatenate(x_rounded))

The problem is that var_value_rounded is two-dimensional for bandit variables, so concatenating the pieces does not produce the expected flat row. If the space contains several bandit variables of the same width, the result is a rectangular array (shape[0] > 1); if their widths differ, the concatenation itself fails with:

    ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, [...]

The key is to flatten all the arrays before concatenating, replacing the last line with:

    return np.atleast_2d(np.concatenate(x_rounded, axis=None))
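
As a quick sketch of why axis=None helps (the values below are illustrative, not taken from GPyOpt):

    import numpy as np

    # What variable.round() returns: 2-D rows for bandit variables.
    bandit_a = np.array([[1, 1]])      # a 2-column bandit variable
    bandit_b = np.array([[2, 2, 2]])   # a 3-column bandit variable

    # np.concatenate([bandit_a, bandit_b]) stacks along axis 0 and raises the
    # ValueError quoted above, because the column counts differ.
    # With axis=None every input is flattened first, giving a single flat row:
    flat = np.concatenate([bandit_a, bandit_b], axis=None)
    print(np.atleast_2d(flat))  # [[1 1 2 2 2]]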

Is it safe to say that GPyOpt was never tested on a problem with a bandit variable and at least one other variable?
