Calculating interactions between chooser and alternative #4

Closed
smmaurer opened this issue Jun 29, 2017 · 2 comments

Comments

@smmaurer
Member

We need a way to generate columns of data that represent interactions between chooser and alternative. This could be for distances between locations, for weights that vary depending on the category of chooser, and so on.

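I'm proposing an InteractionGenerator() class for storing such relationships and calculating them on demand. When there are very large numbers of choosers and alternatives, this saves computation and memory, because values are computed only for the chooser-alternative pairs that are actually requested instead of being stored for every possible pair.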

InteractionGenerator() would be a template (base) class. We'll provide a couple of implementations, such as DistanceGenerator() for calculating distances, and advanced users can write their own.
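A minimal sketch of how such a template class might look, assuming an abstract `get_data(chooser_ids, alternative_ids)` method that returns a pandas Series indexed by (chooser, alternative) pairs. The class bodies, the haversine formula used for straight-line distance, and the kilometer units are assumptions for illustration, not the code in the author's branch. The `type` argument from the usage example is left out because only one distance type is sketched.

```python
from abc import ABC, abstractmethod

import numpy as np
import pandas as pd


class InteractionGenerator(ABC):
    """Hypothetical base class: holds the chooser and alternative tables and
    computes interaction values only for the pairs that are requested."""

    def __init__(self, choosers, alternatives):
        self.choosers = choosers
        self.alternatives = alternatives

    @abstractmethod
    def get_data(self, chooser_ids, alternative_ids):
        """Return a Series indexed by (chooser_id, alternative_id)."""


class DistanceGenerator(InteractionGenerator):
    """Straight-line (haversine) distance in km between lat/lng points."""

    EARTH_RADIUS_KM = 6371.0

    def get_data(self, chooser_ids, alternative_ids):
        c = self.choosers.loc[chooser_ids]
        a = self.alternatives.loc[alternative_ids]
        # Radians; choosers broadcast along rows, alternatives along columns
        lat1 = np.radians(c['lat'].to_numpy())[:, None]
        lng1 = np.radians(c['lng'].to_numpy())[:, None]
        lat2 = np.radians(a['lat'].to_numpy())[None, :]
        lng2 = np.radians(a['lng'].to_numpy())[None, :]
        h = (np.sin((lat2 - lat1) / 2) ** 2
             + np.cos(lat1) * np.cos(lat2) * np.sin((lng2 - lng1) / 2) ** 2)
        dist = 2 * self.EARTH_RADIUS_KM * np.arcsin(np.sqrt(h))
        # from_product order (chooser outer, alternative inner) matches ravel()
        index = pd.MultiIndex.from_product(
            [c.index, a.index], names=['chooser_id', 'alternative_id'])
        return pd.Series(dist.ravel(), index=index, name='distance')
```

Because `get_data` only touches the rows it is asked for, a caller can request distances for a handful of sampled alternatives per chooser without ever building the full chooser-by-alternative matrix.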

Usage example:

```python
choosers  # pd.DataFrame with index, lat, lng
alternatives  # pd.DataFrame with index, lat, lng

dg = DistanceGenerator(choosers, alternatives, type='straight_line')
print(dg.get_data(chooser_ids=[...], alternative_ids=[...]))

# include the column in a merged & sampled table
merged_table = MergedChoiceTable(choosers, alternatives, sample_size=10, interactions=[dg])
```
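To make the `interactions=` idea concrete, here is a hypothetical function-based sketch (not the proposed `MergedChoiceTable` class) that samples alternatives uniformly for each chooser, joins chooser and alternative attributes, and only then evaluates each interaction on the sampled rows, so the work scales with J × sample_size rather than J × K. The dict-of-callables interface, the `_chooser`/`_alt` column suffixes, and the equirectangular distance approximation are all assumptions.

```python
import numpy as np
import pandas as pd


def merged_choice_table(choosers, alternatives, sample_size,
                        interactions=None, seed=None):
    """Hypothetical sketch: draw `sample_size` alternatives per chooser
    (uniform, without replacement), attach both sets of attributes, then
    evaluate each interaction on the sampled rows only.

    `interactions` maps a column name to func(merged_df) -> array-like.
    """
    rng = np.random.default_rng(seed)
    alt_ids = alternatives.index.to_numpy()
    pairs = pd.DataFrame(
        [(cid, aid)
         for cid in choosers.index
         for aid in rng.choice(alt_ids, size=sample_size, replace=False)],
        columns=['chooser_id', 'alternative_id'],
    )
    # Overlapping attribute names (e.g. lat, lng) get _chooser / _alt suffixes
    merged = (pairs
              .join(choosers, on='chooser_id')
              .join(alternatives, on='alternative_id',
                    lsuffix='_chooser', rsuffix='_alt'))
    for name, func in (interactions or {}).items():
        merged[name] = func(merged)
    return merged


def straight_line_km(df):
    """Equirectangular approximation of distance; adequate for short trips."""
    lat1, lat2 = np.radians(df['lat_chooser']), np.radians(df['lat_alt'])
    dlng = np.radians(df['lng_alt'] - df['lng_chooser'])
    x = dlng * np.cos((lat1 + lat2) / 2)
    return 6371.0 * np.hypot(x, lat2 - lat1)
```

Usage would look like `merged_choice_table(choosers, alternatives, sample_size=10, interactions={'dist_km': straight_line_km})`.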

Another common use case will be providing an InteractionGenerator() to specify sampling weights.

There is a rough sketch of these classes in my branch of the code: interaction.py#L21-L85

@smmaurer
Member Author

Digging into it a bit more, I think the clearest justification for this implementation is in calculating sampling weights.

For J choosers (maybe millions) and K alternatives (maybe millions), we would need to generate J × K sampling weights, but only the K weights for a single chooser would need to be in memory at any given time (to pass to np.random.choice).
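A minimal sketch of that per-chooser loop, assuming a hypothetical callback `weight_fn(j)` that returns the K non-negative weights for chooser j on demand. It uses NumPy's `Generator.choice`, the newer counterpart of `np.random.choice` with the same `size`, `replace`, and `p` arguments. Weight memory stays at K values at a time, and the output is J × sample_size integer positions, never a J × K matrix.

```python
import numpy as np


def sample_alternatives(n_choosers, n_alternatives, weight_fn, sample_size,
                        seed=None):
    """Sample alternative positions for each chooser without materializing
    the full n_choosers x n_alternatives weight matrix.

    weight_fn(j) is a hypothetical callback returning the unnormalized,
    non-negative weights (length n_alternatives) for chooser j.
    """
    rng = np.random.default_rng(seed)
    samples = np.empty((n_choosers, sample_size), dtype=np.int64)
    for j in range(n_choosers):
        w = np.asarray(weight_fn(j), dtype=float)  # only K weights in memory
        p = w / w.sum()                            # probabilities must sum to 1
        samples[j] = rng.choice(n_alternatives, size=sample_size,
                                replace=False, p=p)
    return samples
```

With `replace=False`, each chooser needs at least `sample_size` alternatives with non-zero weight; NumPy raises a `ValueError` otherwise.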

Interaction data columns can be generated after the sampling, which would be easier in most cases than writing a subclass of InteractionGenerator(). For example:

```python
mct = MergedChoiceTable(choosers, alternatives, sample_size=10)

# relative price = alternative's price / chooser's income
df = mct.to_frame()
df['relative_price'] = df.price / df.income
```

I think that would add the column directly to the object's underlying DataFrame, provided to_frame() returns a reference to it rather than a copy, but we should probably write explicit methods for this.
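One possible shape for such an explicit method, sketched with a hypothetical `SampledTable` stand-in (not the real `MergedChoiceTable`) whose rows already hold both chooser columns such as `income` and alternative columns such as `price`. Whether the real `to_frame()` returns a reference or a copy isn't settled in this thread; the sketch deliberately returns a copy, so the hypothetical `add_interaction()` is the only way to change the stored table.

```python
import pandas as pd


class SampledTable:
    """Hypothetical stand-in for a merged, sampled choice table: one row
    per (chooser, sampled alternative) pair."""

    def __init__(self, merged):
        self._df = merged.copy()  # own the data; callers can't alias it

    def to_frame(self):
        # Return a copy so edits to the result never leak back in silently
        return self._df.copy()

    def add_interaction(self, name, func):
        """Compute a column from the merged rows and store it on the table."""
        self._df[name] = func(self._df)
        return self


merged = pd.DataFrame({
    'chooser_id': [1, 1, 2, 2],
    'alternative_id': [10, 11, 10, 12],
    'income': [50000.0, 50000.0, 80000.0, 80000.0],
    'price': [250000.0, 400000.0, 250000.0, 320000.0],
})
table = SampledTable(merged)
table.add_interaction('relative_price', lambda df: df.price / df.income)
# relative_price: 5.0, 8.0, 3.125, 4.0
```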

@smmaurer
Member Author

smmaurer commented Sep 7, 2018

Most of this is implemented in PR #37. Moving discussion to Issues #39, #40.

smmaurer closed this as completed Sep 7, 2018