WIP: Abstract type for summary statistics (Mean) #277

Closed
wants to merge 4 commits into from
71 changes: 71 additions & 0 deletions src/SummaryStats.jl
@@ -0,0 +1,71 @@
"""
SummaryStats

Supertype for every type of summary statistic.

"""

abstract type SummaryStats end
Contributor

Yes, this abstract type looks good.

Contributor

I am wondering whether SummaryStats should be an AbstractFunction instead?

Collaborator Author

Didn't know about this, will look into it!
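
For context on the suggestion above: Julia's Base provides the abstract type `Function` (as far as I know there is no `AbstractFunction` in Base), and a struct may subtype it and be made callable. The following is a minimal sketch of that pattern, not the approach taken in this PR; `WeightedMean` and its field are hypothetical names chosen for illustration.

```julia
# Hypothetical type for illustration only; not part of Survey.jl.
struct WeightedMean <: Function
    weights::Vector{Float64}
end

# Attach a call method to the type so that instances behave like functions.
(f::WeightedMean)(x::AbstractVector{<:Real}) = sum(x .* f.weights) / sum(f.weights)

wm = WeightedMean([1.0, 1.0, 2.0])
result = wm([1.0, 2.0, 3.0])   # weighted mean of the three values
```

A callable struct keeps its configuration (here the weights) as fields while still being usable wherever a function is expected, which may be what the reviewer had in mind.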


"""
Calculate confidence intervals for given estimates.

"""

function confint(estimate::Float64, std_dev::Float64; type::String="normal", alpha::Float64=0.05)
Contributor

EDIT: After reviewing, I think you can just copy the entire _ci function from #190 and rename it here.

    # Parse the CI type and compute the critical value
    if type == "normal"
        critical_value = quantile(Normal(), 1 - alpha / 2)
    else
        # Without this branch, critical_value would be undefined below
        throw(ArgumentError("unsupported CI type: $type"))
    end
    # Compute the lower and upper bounds
    lower = estimate .- critical_value .* std_dev
    upper = estimate .+ critical_value .* std_dev
    return (lower=lower, upper=upper)
end
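
As a standalone check of the normal-approximation interval computed above, here is a minimal sketch that assumes the Distributions.jl package is installed. The helper name `normal_ci` is hypothetical; the inputs are the estimate and standard deviation quoted in the `Mean` docstring.

```julia
using Distributions  # provides Normal and quantile

# Hypothetical helper, written only to illustrate the formula
#   estimate ± z_{1 - alpha/2} * spread
function normal_ci(estimate::Real, spread::Real; alpha::Real=0.05)
    0 < alpha < 1 || throw(ArgumentError("alpha must be in (0, 1)"))
    z = quantile(Normal(), 1 - alpha / 2)   # two-sided critical value
    return (lower = estimate - z * spread, upper = estimate + z * spread)
end

ci = normal_ci(644.1693989071047, 105.74886663549471)
```

Note that the spread passed in here is the standard deviation of the column, as in the PR; the docstring's TODO indicates that a standard error of the mean is meant to replace it, which would give a much narrower interval.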

"""

Mean <: SummaryStats

Population mean estimate for a column of a SurveyDesign object.

# Arguments:
- `x::Symbol`: the column to compute population mean statistics for.
- `design::SurveyDesign`: a SurveyDesign object.

```jldoctest
julia> using Survey;
julia> apiclus1 = load_data("apiclus1");
julia> dclus1 = SurveyDesign(apiclus1; clusters=:dnum, strata=:stype, weights=:pw);

julia> api00_mean = Mean(:api00, dclus1)
(x = :api00, estimate = 644.1693989071047, std_dev = 105.74886663549471, CI = (lower = 436.90542889560527, upper = 851.4333689186041))

julia> api00_mean.estimate
644.1693989071047

julia> api00_mean.std_dev
105.74886663549471

julia> api00_mean.CI
(lower = 436.90542889560527, upper = 851.4333689186041)

```

# TODO add :
#std_err = stderror(design, x) # standard error of the estimate of the mean

"""

struct Mean <: SummaryStats
Contributor

  • This should be done as part of the edits inside mean.jl.
  • I'm not sure whether the struct would just need two things, x and design.
  • I have a gut feeling that Mean should be a Function, not a struct. Let's ask @ayushpatnaikgit about this.

Collaborator Author

I went with the idea that it's easier to have all summary statistics as structs in a single file (SummaryStats.jl), since that is easier to manage: if we want to add another statistic later on, we would just add it to this one file.

The reason for having Mean as a struct (as well as a function) is the same: if we want to add other statistics, and we want all statistics to share a set of basic elements (e.g. an estimate), we would write this into SummaryStats, the "parent abstract type", and it would automatically be passed down to all the other statistics (since Mean <: SummaryStats). At least this is my understanding.

Also, mean.jl (like quantile, ratio, and total) has a lot of dispatched functions, so I thought it would be more efficient and maintainable to have a single file that implements all summary stats, instead of one file per stat (mean.jl, quantile.jl, etc.), each with multiple dispatched functions inside.

I agree with having confint.jl separately though.

Thoughts?
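
One caveat on the "passed down" idea above: in Julia an abstract type cannot declare fields, so subtypes do not inherit data from it. What can be shared is behavior, through methods written against the supertype. The sketch below shows that pattern under the assumption that every subtype follows the same field-naming convention; all type and function names are hypothetical and not part of this PR.

```julia
# All names below are hypothetical and exist only for this sketch.
abstract type AbstractSummary end

struct MeanSummary <: AbstractSummary
    estimate::Float64
    std_dev::Float64
end

struct TotalSummary <: AbstractSummary
    estimate::Float64
    std_dev::Float64
end

# Methods written once against the supertype. They rely on the convention
# that every subtype stores `estimate` and `std_dev` fields.
estimate(s::AbstractSummary) = s.estimate
spread(s::AbstractSummary) = s.std_dev
describe(s::AbstractSummary) =
    string(nameof(typeof(s)), ": ", estimate(s), " (sd ", spread(s), ")")

m = MeanSummary(644.17, 105.75)
t = TotalSummary(1000.0, 50.0)
```

Adding a new statistic then means declaring one more struct with the agreed fields; `estimate`, `spread`, and `describe` work for it without further changes.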

    x::Symbol
    design::SurveyDesign
end

function Mean(x::Symbol, design::SurveyDesign)
Contributor

  • This should be an addition to the existing mean.jl file. With multiple dispatch you can define mean(x::Symbol, design::ReplicateDesign, ci_type=normal ...); see lines 169-170 of "WIP: Add general CI and integration with mean" #190 for what the function definition could look like.
  • Build for ReplicateDesign first, since SurveyDesign doesn't have a standard error yet.

Collaborator Author

  • To your first point: see my reply to your comment on line 60.
  • To your second point: yes, that makes sense.
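
The multiple-dispatch route suggested in this thread could look roughly like the sketch below. It uses toy stand-in types rather than Survey.jl's `SurveyDesign` and `ReplicateDesign`, and the spread across replicates is a plain `std`, not a survey variance estimator; every name here is an assumption made for illustration.

```julia
using Statistics  # stdlib; only `std` is used below

# Toy stand-ins for illustration only; these are NOT Survey.jl's types.
abstract type ToyDesign end

struct ToySurveyDesign <: ToyDesign
    values::Vector{Float64}
    weights::Vector{Float64}
end

struct ToyReplicateDesign <: ToyDesign
    values::Vector{Float64}
    weights::Vector{Float64}
    replicate_weights::Vector{Vector{Float64}}
end

weighted_mean(x, w) = sum(x .* w) / sum(w)

# Method for a plain design: point estimate only.
toy_mean(d::ToySurveyDesign) = (estimate = weighted_mean(d.values, d.weights),)

# Method for a replicate design: the same estimate plus the spread of the
# estimates recomputed under each set of replicate weights.
function toy_mean(d::ToyReplicateDesign)
    est = weighted_mean(d.values, d.weights)
    reps = [weighted_mean(d.values, w) for w in d.replicate_weights]
    return (estimate = est, replicate_sd = std(reps))
end

plain = ToySurveyDesign([1.0, 2.0, 3.0], [1.0, 1.0, 2.0])
rep = ToyReplicateDesign(plain.values, plain.weights,
                         [[1.0, 1.0, 1.0], [2.0, 1.0, 1.0]])
```

Calling `toy_mean(plain)` and `toy_mean(rep)` selects the method by design type, which is why the reviewer's suggestion of starting with the replicate design does not require touching the plain-design method.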

    column = design.data[!, x]
    estimate = mean(column, weights(design.data[!, design.weights]))
    std_dev = std(column)
    CI = confint(estimate, std_dev)
    # Return the column name (as in the docstring example), not the column data
    return (x=x, estimate=estimate, std_dev=std_dev, CI=CI)
end