Frequency variance #26

Open
joeflack4 opened this issue Aug 5, 2022 · 0 comments

joeflack4 commented Aug 5, 2022

Overview

For each of the count categories (1-4) above, determine when these topics occur: when are they more frequent, and when are they less frequent?

Additional thoughts

I think that the first dataset to generate would have the following fields / types:

    1. year: string, YYYY format
    2. month: string, MM format
    3. day: string, DD format
    4. keyword: string
    5. n_messages: integer

Dates dataframe
I think the first thing to do would be to create a dataframe with just columns 1-3 (year, month, day). Leap years make this a little annoying. For our initial implementation, we could either pretend every year has a leap day or leave Feb 29 out entirely. There are a number of different ways we could generate this initial dataframe.
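
One possible way to generate it, as a minimal sketch assuming pandas: `pd.date_range` follows the real calendar, so Feb 29 only shows up in actual leap years. That is a third option beyond the two above. The function name `build_dates_frame` and the example date range are hypothetical.

```python
import pandas as pd


def build_dates_frame(start: str, end: str) -> pd.DataFrame:
    """Return one row per calendar day from start to end (inclusive).

    Hypothetical helper: the columns match fields 1-3 of the proposed
    dataset, all stored as zero-padded strings.
    """
    days = pd.date_range(start=start, end=end, freq="D")
    return pd.DataFrame(
        {
            "year": days.strftime("%Y"),
            "month": days.strftime("%m"),
            "day": days.strftime("%d"),
        }
    )


# Example range chosen only for illustration; 2020 is a leap year.
dates_df = build_dates_frame("2020-01-01", "2020-12-31")
print(dates_df.head())
```

If we did want every year to have a leap day, or none, we could filter or pad the Feb 29 rows after building the frame.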

Messages
Once we have that, then we can iterate over our cached messages.

We would look at the timestamp column and extract the year, month, and day. We could then find the matching row in the dates dataframe and increment its n_messages.
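
A rough sketch of that loop, assuming pandas. The sample `messages` frame is made up (only the `timestamp` column name comes from the description above), and the sketch ignores `keyword` to keep it short.

```python
import pandas as pd

# Hypothetical stand-in for the cached messages.
messages = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            ["2022-08-01 09:15", "2022-08-01 17:40", "2022-08-03 11:05"]
        )
    }
)

# Dates dataframe covering the same (illustrative) period.
days = pd.date_range("2022-08-01", "2022-08-03", freq="D")
dates_df = pd.DataFrame(
    {
        "year": days.strftime("%Y"),
        "month": days.strftime("%m"),
        "day": days.strftime("%d"),
    }
)
dates_df["n_messages"] = 0

# Index by (year, month, day) so each message can locate its row directly.
dates_df = dates_df.set_index(["year", "month", "day"])

for ts in messages["timestamp"]:
    key = (ts.strftime("%Y"), ts.strftime("%m"), ts.strftime("%d"))
    # Raises KeyError if a message falls outside the generated date range.
    dates_df.loc[key, "n_messages"] += 1

dates_df = dates_df.reset_index()
print(dates_df)
```

Row-by-row updates like this are easy to follow but slow on large message caches.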

Or, we could simplify the messages dataframe into a format similar to the dates dataframe, and then do something like a JOIN.
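
The JOIN variant might look something like the following, again assuming pandas. The sample messages and their `keyword` values are invented for illustration; a left merge keeps days with no messages, which then get a count of 0.

```python
import pandas as pd

# Hypothetical stand-in for the cached messages.
messages = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            ["2022-08-01 09:15", "2022-08-01 17:40", "2022-08-03 11:05"]
        ),
        "keyword": ["alpha", "alpha", "beta"],
    }
)

# Reduce messages to the same year/month/day string columns as the
# dates dataframe, then count messages per day and keyword.
counts = (
    messages.assign(
        year=messages["timestamp"].dt.strftime("%Y"),
        month=messages["timestamp"].dt.strftime("%m"),
        day=messages["timestamp"].dt.strftime("%d"),
    )
    .groupby(["year", "month", "day", "keyword"])
    .size()
    .reset_index(name="n_messages")
)

days = pd.date_range("2022-08-01", "2022-08-03", freq="D")
dates_df = pd.DataFrame(
    {
        "year": days.strftime("%Y"),
        "month": days.strftime("%m"),
        "day": days.strftime("%d"),
    }
)

# Left join: every date is kept, even if no message matched it.
joined = dates_df.merge(counts, on=["year", "month", "day"], how="left")
joined["n_messages"] = joined["n_messages"].fillna(0).astype(int)
print(joined)
```

Days without messages end up with a missing `keyword`; whether to keep those rows or expand them to one row per keyword is an open design choice.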

@joeflack4 joeflack4 mentioned this issue Aug 5, 2022
14 tasks
@joeflack4 joeflack4 changed the title from "6c. Frequency variance" to "Frequency variance" Aug 5, 2022