Cookbook
Asif Tamuri edited this page Oct 30, 2018 · 52 revisions
Add requests in this document
- Date arithmetic
- A regular event for individual in population
- Assign values to population with specified probability
Pandas timeseries documentation
Dates should be 'Timestamp' objects and intervals should be 'Timedelta' objects.
Note: Pandas does not know how to handle partial years and months. Convert the interval to days first, e.g.
# pandas can handle partial days
>>> pd.to_timedelta([0.25, 0.5, 1, 1.5, 2], unit='d')
TimedeltaIndex(['0 days 06:00:00', '0 days 12:00:00', '1 days 00:00:00',
'1 days 12:00:00', '2 days 00:00:00'],
dtype='timedelta64[ns]', freq=None)
# pandas cannot handle partial months
>>> pd.to_timedelta([0.25, 0.5, 1, 1.5, 2], unit='M')
TimedeltaIndex([ '0 days 00:00:00', '0 days 00:00:00', '30 days 10:29:06',
'30 days 10:29:06', '60 days 20:58:12'],
dtype='timedelta64[ns]', freq=None)
# pandas cannot handle partial years
>>> pd.to_timedelta([0.25, 0.5, 1, 1.5, 2], unit='Y')
TimedeltaIndex([ '0 days 00:00:00', '0 days 00:00:00', '365 days 05:49:12',
'365 days 05:49:12', '730 days 11:38:24'],
dtype='timedelta64[ns]', freq=None)
The way to handle this is to multiply by the average number of days in a month (30.44) or in a year (365.25). For example:
partial_interval = pd.Series([0.25, 0.5, 1, 1.5, 2])
# we want timedelta for 0.25, 0.5, 1, 1.5 etc months, we need to convert to days
interval = pd.to_timedelta(partial_interval * 30.44, unit='d')
print(interval)
TimedeltaIndex([ '7 days 14:38:24', '15 days 05:16:48', '30 days 10:33:36',
'45 days 15:50:24', '60 days 21:07:12'],
dtype='timedelta64[ns]', freq=None)
# we want timedelta for 0.25, 0.5, 1, 1.5 etc years, we need to convert to days
interval = pd.to_timedelta(partial_interval * 365.25, unit='d')
print(interval)
TimedeltaIndex([ '91 days 07:30:00', '182 days 15:00:00', '365 days 06:00:00',
'547 days 21:00:00', '730 days 12:00:00'],
dtype='timedelta64[ns]', freq=None)
current_date = self.sim.date
# sample a list of numbers from an exponential distribution
# (remember to use self.rng in TLO code)
random_draw = np.random.exponential(scale=5, size=10)
# convert these numbers (in years) into a timedelta
# REMEMBER: Pandas cannot handle fractions of months or years,
# so convert to days using the average number of days in a year
random_years = pd.to_timedelta(random_draw * 365.25, unit='d')
# add to current date
future_dates = current_date + random_years
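The date arithmetic above can be tried outside a simulation. The following is a minimal sketch, assuming a seeded `np.random.RandomState` as a stand-in for `self.rng` and a fixed `pd.Timestamp` as a stand-in for `self.sim.date`; the constant name `DAYS_PER_YEAR` is illustrative and not part of the TLO code.

```python
import numpy as np
import pandas as pd

DAYS_PER_YEAR = 365.25  # average year length, as in the conversion above

rng = np.random.RandomState(seed=0)        # stand-in for self.rng (assumption)
current_date = pd.Timestamp("2010-01-01")  # stand-in for self.sim.date (assumption)

# Draw waiting times in (fractional) years
years_ahead = rng.exponential(scale=5, size=10)

# Convert fractional years to days before building the timedeltas
offsets = pd.to_timedelta(years_ahead * DAYS_PER_YEAR, unit="D")

# Adding a TimedeltaIndex to a Timestamp gives one future date per draw
future_dates = current_date + offsets
print(future_dates)
```

Converting to days first keeps the fractional part of each draw, which is the part the year and month units lose.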
An event scheduled to run every day on a given person. Note the order of the mixin & superclass:
class MyRegularEventOnIndividual(IndividualScopeEventMixin, RegularEvent):
    def __init__(self, module, person):
        super().__init__(module=module, person=person, frequency=DateOffset(days=1))

    def apply(self, person):
        print('do something on person', person.index, 'on', self.sim.date)
Add to simulation e.g. in initialise_simulation():
sim.schedule_event(MyRegularEventOnIndividual(module=self, person=an_individual),
                   sim.date + DateOffset(days=1))
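Why the order of the mixin and the superclass matters can be illustrated with plain Python. This is only a sketch: `BaseRegularEvent`, `PersonScopeMixin`, `GoodEvent` and `BadEvent` are hypothetical stand-ins, and the assumption that the mixin consumes the `person` argument before passing the rest up the chain is not taken from the TLO source.

```python
class BaseRegularEvent:
    """Hypothetical stand-in for a framework event base class."""

    def __init__(self, module, frequency):
        self.module = module
        self.frequency = frequency


class PersonScopeMixin:
    """Hypothetical mixin: consumes 'person' and forwards everything else."""

    def __init__(self, *args, person, **kwargs):
        super().__init__(*args, **kwargs)
        self.target = person


class GoodEvent(PersonScopeMixin, BaseRegularEvent):  # mixin listed first
    def __init__(self, module, person):
        super().__init__(module=module, person=person, frequency="1 day")


class BadEvent(BaseRegularEvent, PersonScopeMixin):  # base class listed first
    def __init__(self, module, person):
        super().__init__(module=module, person=person, frequency="1 day")


good = GoodEvent(module="my_module", person="person-1")
print([cls.__name__ for cls in GoodEvent.__mro__])

try:
    BadEvent(module="my_module", person="person-1")
    wrong_order_failed = False
except TypeError:
    # The base class __init__ is reached first and does not accept 'person'
    wrong_order_failed = True
print("wrong order raised TypeError:", wrong_order_failed)
```

With the mixin first, Python's method resolution order calls the mixin's `__init__` before the base class, so the extra keyword is removed before the base class sees it.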
Assign True to all individuals at probability p_true (otherwise False):
df = population.props
random_draw = self.rng.random_sample(size=len(df))  # random sample for each person between 0 and 1
df['my_property'] = (random_draw < p_true)
Or randomly sample a fixed number of rows, in proportion to the given probability:
df = population.props
df['my_property'] = False
# sample without replacement so that no individual is picked twice
sampled_indices = self.rng.choice(df.index.values, size=int(len(df) * p_true), replace=False)
df.loc[sampled_indices, 'my_property'] = True
You can also sample a proportion of the index and set those:
df = population.props
df['my_property'] = False
df.loc[df.index.to_series().sample(frac=p_true).index, 'my_property'] = True
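The approaches above are not interchangeable: an independent draw per person gives a number of True values that varies around `p_true`, while sampling rows fixes that number exactly. The sketch below compares the two, assuming a seeded `np.random.RandomState` in place of `self.rng` and an empty 10,000-row DataFrame in place of `population.props`; the column names `by_draw` and `by_sample` are illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(seed=1)     # stand-in for self.rng (assumption)
p_true = 0.3
df = pd.DataFrame(index=range(10_000))  # stand-in for population.props (assumption)

# Approach 1: independent draw per person; True when the draw falls below p_true
df["by_draw"] = rng.random_sample(size=len(df)) < p_true

# Approach 2: pick a fixed number of rows without replacement
df["by_sample"] = False
n_true = int(len(df) * p_true)
chosen = rng.choice(df.index.values, size=n_true, replace=False)
df.loc[chosen, "by_sample"] = True

print("by_draw proportion:  ", df["by_draw"].mean())
print("by_sample proportion:", df["by_sample"].mean())
```

Which behaviour is wanted depends on whether `p_true` is meant as an individual-level probability or as a population-level proportion.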
Imagine we have a different rate of my_property being true based on sex.
df = population.props
# create a dataframe to hold the probabilities (or read from an Excel workbook)
prob_by_sex = pd.DataFrame(data=[('M', 0.46), ('F', 0.62)], columns=['sex', 'p_true'])
# merge with the population dataframe
df_with_prob = df[['sex']].merge(prob_by_sex, left_on=['sex'], right_on=['sex'], how='left')
# randomly sample numbers between 0 and 1
random_draw = self.rng.random_sample(size=len(df))
# assign true or false based on draw and individual's p_true
df['my_property'] = (random_draw < df_with_prob.p_true.values)
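When the probability depends on a single column, a lookup with `Series.map` is a possible alternative to the merge. This is a minimal sketch, assuming a seeded `np.random.RandomState` in place of `self.rng` and a synthetic `sex` column in place of the population dataframe; the dictionary name `p_true_by_sex` is illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(seed=2)  # stand-in for self.rng (assumption)

# Synthetic population with a 'sex' column (stand-in for population.props)
df = pd.DataFrame({"sex": rng.choice(["M", "F"], size=20_000)})

# Probabilities keyed by sex, same values as in the example above
p_true_by_sex = {"M": 0.46, "F": 0.62}

# Look up each individual's probability; the result keeps df's index and order
p_true = df["sex"].map(p_true_by_sex)

# True when the individual's draw falls below their own probability
random_draw = rng.random_sample(size=len(df))
df["my_property"] = random_draw < p_true.values

observed = df.groupby("sex")["my_property"].mean()
print(observed)
```

The observed proportions per sex should sit close to 0.46 and 0.62, which is a quick way to check that the comparison points in the intended direction.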
df = population.props
# get the categories and probabilities (read from Excel file/in the code etc)
categories = [1, 2, 3, 4] # or categories = ['A', 'B', 'C', 'D']
probabilities = [0.1, 0.2, 0.3, 0.4]
random_choice = self.rng.choice(categories, size=len(df), p=probabilities)
# if 'categories' should be treated as a plain old number or string
df['my_category'] = random_choice
# else if 'categories' should be treated as a real Pandas Categorical
# i.e. property was set up using Types.CATEGORICAL
df['my_category'].values[:] = random_choice
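Outside the framework, the categorical case can be sketched by building a `pd.Categorical` with an explicit category list and assigning it as the column. This is an alternative to the in-place assignment above, not a replacement for it; whether replacing the column is appropriate for a property declared with Types.CATEGORICAL is an assumption to check against the TLO code. The seeded `np.random.RandomState` and the empty DataFrame are stand-ins.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(seed=3)     # stand-in for self.rng (assumption)
df = pd.DataFrame(index=range(10_000))  # stand-in for population.props (assumption)

categories = ["A", "B", "C", "D"]
probabilities = [0.1, 0.2, 0.3, 0.4]

# One category per individual, drawn with the given probabilities
random_choice = rng.choice(categories, size=len(df), p=probabilities)

# Passing categories explicitly keeps the full category list,
# even if some category happens not to be drawn
df["my_category"] = pd.Categorical(random_choice, categories=categories)

observed = df["my_category"].value_counts(normalize=True)
print(observed.sort_index())
```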