Calculating interactions between chooser and alternative #4

smmaurer · 2017-06-29T18:51:04Z

We need a way to generate columns of data that represent interactions between choosers and alternatives. Examples include distances between locations and weights that vary with the category of chooser.

I'm proposing an InteractionGenerator() class for storing such relationships and calculating them on demand. This approach provides computational and memory efficiencies when there are very large numbers of choosers and alternatives.

InteractionGenerator() would be a template class. We'll provide a couple of implementations, like DistanceGenerator() for calculating distances, and advanced users can write their own.
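A minimal sketch of how the template class and one implementation might fit together. This is illustrative only and is not the code in the branch: the constructor arguments, the `get_data()` return type, and the column names `lat` and `lng` are assumptions, the `type` argument is left out, and "distance" here is plain Euclidean distance in coordinate units.

```python
import numpy as np
import pandas as pd


class InteractionGenerator:
    """Hypothetical base class: keeps both tables, computes values on demand."""

    def __init__(self, choosers, alternatives):
        self.choosers = choosers
        self.alternatives = alternatives

    def get_data(self, chooser_ids, alternative_ids):
        """Return one value per (chooser_id, alternative_id) pair."""
        raise NotImplementedError


class DistanceGenerator(InteractionGenerator):
    """Straight-line distance between each chooser and its paired alternative."""

    def get_data(self, chooser_ids, alternative_ids):
        # Look up coordinates only for the requested pairs
        c = self.choosers.loc[chooser_ids, ['lat', 'lng']].to_numpy()
        a = self.alternatives.loc[alternative_ids, ['lat', 'lng']].to_numpy()
        dist = np.sqrt(((c - a) ** 2).sum(axis=1))
        index = pd.MultiIndex.from_arrays(
            [chooser_ids, alternative_ids],
            names=['chooser_id', 'alternative_id'])
        return pd.Series(dist, index=index, name='distance')


# Tiny made-up tables, just to exercise the classes
choosers = pd.DataFrame({'lat': [0.0, 1.0], 'lng': [0.0, 1.0]}, index=[10, 11])
alternatives = pd.DataFrame({'lat': [3.0, 1.0], 'lng': [4.0, 2.0]}, index=[20, 21])

dg = DistanceGenerator(choosers, alternatives)
distances = dg.get_data(chooser_ids=[10, 11], alternative_ids=[20, 21])
```

Nothing is precomputed: values exist only for the pairs passed to `get_data()`, which is where the memory savings would come from.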

Usage example:

choosers  # pd.DataFrame with index, lat, lng
alternatives  # pd.DataFrame with index, lat, lng

dg = DistanceGenerator(choosers, alternatives, type='straight_line')
print(dg.get_data(chooser_ids=[...], alternative_ids=[...]))

# include the column in a merged & sampled table
merged_table = MergedChoiceTable(choosers, alternatives, sample_size=10, interactions=[dg])

Another common use case will be providing an InteractionGenerator() to specify sampling weights.

There is a rough sketch of these classes in my branch of the code: interaction.py#L21-L85


smmaurer · 2017-06-29T19:54:29Z

Digging into it a bit more, I think the clearest justification for this implementation is in calculating sampling weights.

For J choosers (maybe millions) and K alternatives (maybe millions), we would need to generate J x K sampling weights, but only K of them would need to be in memory at any given time (for passing to np.random.choice).
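The memory argument can be sketched as follows, assuming a made-up weighting rule (closer alternatives are more likely to be drawn) and random coordinates. It uses `Generator.choice` from `np.random.default_rng`, the newer counterpart of `np.random.choice`; `weights_for` is a hypothetical helper, not part of the library.

```python
import numpy as np

rng = np.random.default_rng(0)

n_choosers, n_alts, sample_size = 5, 1000, 10
chooser_xy = rng.random((n_choosers, 2))  # made-up chooser locations
alt_xy = rng.random((n_alts, 2))          # made-up alternative locations


def weights_for(chooser_idx):
    """Hypothetical rule: weight falls off with distance from the chooser."""
    dist = np.sqrt(((alt_xy - chooser_xy[chooser_idx]) ** 2).sum(axis=1))
    w = 1.0 / (1.0 + dist)
    return w / w.sum()  # normalize so the weights sum to 1


samples = np.empty((n_choosers, sample_size), dtype=int)
for j in range(n_choosers):
    # Only this chooser's K weights exist at this point; they are
    # discarded before the next iteration, so J x K is never held at once
    p = weights_for(j)
    samples[j] = rng.choice(n_alts, size=sample_size, replace=False, p=p)
```

The loop trades speed for memory: each pass allocates K floats instead of a full J x K matrix.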

Interaction data columns can be generated after the sampling, which would be easier in most cases than writing a subclass of InteractionGenerator(). For example:

mct = MergedChoiceTable(choosers, alternatives, sample_size=10)

# relative price = alternative's price / chooser's income
df = mct.to_frame()
df['relative_price'] = df.price / df.income

I think that would add the column directly to the object's underlying dataframe, provided to_frame() returns a reference to it rather than a copy, but we should probably write explicit methods for this.
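One possible shape for such an explicit method, as a sketch only. `ChoiceTableSketch` and `add_column()` are hypothetical names standing in for MergedChoiceTable, and the sketch assumes `to_frame()` returns a copy, which is exactly the case where in-place assignment would silently fail to reach the table.

```python
import pandas as pd


class ChoiceTableSketch:
    """Hypothetical stand-in for MergedChoiceTable, holding one merged dataframe."""

    def __init__(self, df):
        self._df = df.copy()

    def to_frame(self):
        # Returning a copy: callers cannot change the table by accident
        return self._df.copy()

    def add_column(self, name, func):
        """Compute a column from the merged data and store it on the table."""
        self._df[name] = func(self._df)


merged = pd.DataFrame({'price': [100.0, 300.0], 'income': [50.0, 100.0]})
table = ChoiceTableSketch(merged)

# Assigning to the returned frame changes only the caller's copy
df = table.to_frame()
df['relative_price'] = df.price / df.income
reached_table = 'relative_price' in table.to_frame().columns

# The explicit method stores the column regardless of how to_frame() behaves
table.add_column('relative_price', lambda d: d.price / d.income)
```

With an explicit method, the behavior no longer depends on whether `to_frame()` hands back the stored dataframe or a copy.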

smmaurer · 2018-09-07T18:07:30Z

Most of this is implemented in PR #37. Moving discussion to Issues #39, #40.

smmaurer mentioned this issue Jun 29, 2017

Sampling weights for MergedChoiceTable #5

Closed

smmaurer mentioned this issue Aug 16, 2018

[0.2.dev1] Better sampling support for MergedChoiceTable utility #37

Merged

This was referenced Sep 5, 2018

Sampling of alternatives: performance optimization #39

Open

Sampling of alternatives: additional features #40

Open

smmaurer closed this as completed Sep 7, 2018
