This repository has been archived by the owner on Feb 23, 2023. It is now read-only.

RandomDesign has a bug when there is a variable after a bandit variable #319

Open

samath117 opened this issue May 1, 2020 · 5 comments

@samath117 commented May 1, 2020

Take a look at this definition in RandomDesign:

    def fill_noncontinous_variables(self, samples):
        """
        Fill sample values to non-continuous variables in place
        """
        init_points_count = samples.shape[0]

        for (idx, var) in enumerate(self.space.space_expanded):
            if isinstance(var, DiscreteVariable) or isinstance(var, CategoricalVariable) :
                sample_var = np.atleast_2d(np.random.choice(var.domain, init_points_count))
                samples[:,idx] = sample_var.flatten()

            # sample in the case of bandit variables
            elif isinstance(var, BanditVariable):
                # Bandit variable is represented by a several adjacent columns in the samples array
                idx_samples = np.random.randint(var.domain.shape[0], size=init_points_count)
                bandit_idx = np.arange(idx, idx + var.domain.shape[1])
                samples[:, bandit_idx] = var.domain[idx_samples,:]

This function fills in random values for the non-continuous variables by sampling each of them and writing the samples into the appropriate columns, idx or bandit_idx, where idx comes from enumerate and therefore advances by one per variable. However, a bandit variable occupies several columns, so for every variable that follows it, idx lags behind the true column index: later samples overwrite all but the first column of the bandit variable, and the trailing columns of the array are never filled. As a result, the function fails whenever a bandit variable is not the last variable in the list.
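
To make the misalignment concrete, here is a minimal numpy-only sketch (no GPyOpt; the layout of one two-column bandit variable followed by one discrete variable is chosen purely for illustration) of what the enumerate-based indexing ends up doing:

    import numpy as np

    # Suppose the expanded space is [bandit (2 columns), discrete (1 column)],
    # so each sample row should have 3 columns.
    samples = np.empty((4, 3))

    # Bandit variable: enumerate gives idx = 0, and columns 0-1 are filled correctly.
    samples[:, [0, 1]] = np.array([1, 1])

    # Discrete variable: enumerate gives idx = 1, but its real column is 2.
    # Writing to column 1 overwrites half of the bandit sample, and column 2
    # keeps whatever garbage np.empty allocated.
    samples[:, 1] = 7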

Here is the fix:

    def fill_noncontinous_variables(self, samples):
        """
        Fill sample values to non-continuous variables in place
        """
        init_points_count = samples.shape[0]

        idx = 0
        for var in self.space.space_expanded:
            if isinstance(var, DiscreteVariable) or isinstance(var, CategoricalVariable) :
                sample_var = np.atleast_2d(np.random.choice(var.domain, init_points_count))
                samples[:,idx] = sample_var.flatten()
                idx = idx + 1

            # sample in the case of bandit variables
            elif isinstance(var, BanditVariable):
                # Bandit variable is represented by a several adjacent columns in the samples array
                idx_samples = np.random.randint(var.domain.shape[0], size=init_points_count)
                bandit_idx = np.arange(idx, idx + var.domain.shape[1])
                samples[:, bandit_idx] = var.domain[idx_samples,:]
                idx = idx + var.domain.shape[1]

            # skip the continuous variables
            else:
                idx += 1

Instead of letting enumerate advance idx by one per variable, we track the column index manually and advance it by the number of columns each variable actually occupies, so bandit variables no longer throw the indexing off.

Should I submit a PR?

@apaleyes (Collaborator) commented May 1, 2020

Looks reasonable. To confirm, can you create a clear unit test that fails with current code, and that your implementation fixes?

@samath117 (Author)

Here's a simple test case:

    import numpy as np
    import GPyOpt

    # Test that the random design works with multiple bandit variables.
    # Both bandit variables have only one option, [1, 1], so the only possible sample from the whole space is [1, 1, 1, 1].
    region = GPyOpt.Design_space(space=[{'name': str(n), 'type': 'bandit', 'domain': np.array([[1, 1]])} for n in range(2)])
    assert np.array_equal(GPyOpt.experiment_design.initial_design('random', region, 1), np.array([[1, 1, 1, 1]]))

With the current implementation, the last column is never written and keeps whatever value np.empty happened to allocate, so only the first three entries are guaranteed to be 1.

I'm not sure exactly how to incorporate this into the test suite. If you want to take this from here, I'd appreciate it.

@ekalosak (Contributor) commented May 4, 2020

If you submit a PR, @samath117, I'd try my hand at it, fwiw.

@apaleyes (Collaborator) commented May 4, 2020

@samath117 to add it to the existing tests, just follow the pattern! Go to https://github.com/SheffieldML/GPyOpt/blob/master/GPyOpt/testing/util_tests/test_general_utils.py and add a new test there - should look very similar to existing methods in this class. Run the tests and make sure it fails as expected. Then implement the fix, run the tests again and make sure the test is green now. That's it, push and submit a PR.
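
For illustration, a unittest-style sketch of such a test (the class and method names here are hypothetical, not taken from test_general_utils.py) could look like this:

    import unittest
    import numpy as np
    import GPyOpt

    class TestRandomDesignWithBanditVariables(unittest.TestCase):
        # Hypothetical test class; the real suite may organize its tests differently.
        def test_random_design_with_multiple_bandit_variables(self):
            # Two bandit variables, each with the single option [1, 1], so the
            # only valid sample from the whole space is [1, 1, 1, 1].
            space = GPyOpt.Design_space(space=[
                {'name': str(n), 'type': 'bandit', 'domain': np.array([[1, 1]])}
                for n in range(2)
            ])
            design = GPyOpt.experiment_design.initial_design('random', space, 1)
            self.assertTrue(np.array_equal(design, np.array([[1, 1, 1, 1]])))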

@samath117 (Author)

Unfortunately, this isn't the only issue with using a bandit variable alongside other variables.

In the round_optimum method of Design_space in GPyOpt/core/task/space.py, the indices are managed correctly, but there is another problem:

    def round_optimum(self, x):
        """
        Rounds some value x to a feasible value in the design space.
        x is expected to be a vector or an array with a single row
        """
        x = np.array(x)
        if not ((x.ndim == 1) or (x.ndim == 2 and x.shape[0] == 1)):
            raise ValueError("Unexpected dimentionality of x. Got {}, expected (1, N) or (N,)".format(x.ndim))

        if x.ndim == 2:
            x = x[0]

        x_rounded = []
        value_index = 0
        for variable in self.space_expanded:
            var_value = x[value_index : value_index + variable.dimensionality_in_model]
            var_value_rounded = variable.round(var_value)

            x_rounded.append(var_value_rounded)
            value_index += variable.dimensionality_in_model

        return np.atleast_2d(np.concatenate(x_rounded))

The problem is that var_value_rounded is two-dimensional for bandit variables, so concatenating the pieces does not produce the expected flat row. If the space contains several bandit variables of the same width, the result is a rectangular array (shape[0] > 1); if their widths differ, the concatenation itself fails with:

    ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, [...]

The key is to flatten all the arrays before concatenating, replacing the last line with:

    return np.atleast_2d(np.concatenate(x_rounded, axis=None))
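
As a quick sketch of why axis=None helps (the values below are illustrative, not taken from GPyOpt):

    import numpy as np

    # What variable.round() returns: 2-D rows for bandit variables.
    bandit_a = np.array([[1, 1]])      # a 2-column bandit variable
    bandit_b = np.array([[2, 2, 2]])   # a 3-column bandit variable

    # np.concatenate([bandit_a, bandit_b]) stacks along axis 0 and raises the
    # ValueError quoted above, because the column counts differ.
    # With axis=None every input is flattened first, giving a single flat row:
    flat = np.concatenate([bandit_a, bandit_b], axis=None)
    print(np.atleast_2d(flat))  # [[1 1 2 2 2]]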

Is it safe to say that GPyOpt was never tested on a problem with a bandit variable and at least one other variable?
