RFC: Overhaul Statistics #112
Comments
A big thank you to @reallyyy for reporting that the Descriptive Statistics course was no longer available, prompting this investigation.
For individuals that would like to get a head start on identifying a suitable Introduction to Statistics course, below is a list of resources that you may start with. Remember:
MIT OCW Statistics For Applications
For individuals that would like to get a head start on identifying a suitable Theory of Statistics course, below is a list of resources that you may start with. The notes about analysis in the comment above apply here as well.
University of Arizona Theory of Statistics: Includes lectures and assignments, no solutions
There's a conflation among the assertions put forward: that descriptive statistics = "basic statistics", and that OSSU therefore shouldn't spend time on it because it's prerequisite material. In short, no. In long, noooooooooooooooooo.
Mean, median, and mode, stem & leaf, and scatterplots together represent the entirety of statistics encountered in high school. But this is Day 1 material in a university-level descriptive statistics course (though it is also encountered in probability, which is why these courses are typically taught jointly as an introductory probability-and-statistics course).
After they spend roughly 60% of their time just cleaning their data, practicing data scientists spend roughly the next 20% of their time doing exploratory data analysis -- which leans heavily on descriptive statistics to characterize a dataset's distribution. The importance of mean, median, and mode cannot be overstated -- but other values like variance, IQR, mean absolute deviation, central moments, kurtosis, scedasticity, Kolmogorov–Smirnov test scores, etc. identify key descriptive signatures of a distribution. No, we need a descriptive statistics course.
The OSSU data science curriculum goes up through multivariate calculus. I propose as a benchmark course Georgia Tech's ISYE 6739 (co-listed as ISYE 4739 for undergraduates). This combined probability/statistics course builds on a multivariate calculus foundation at a level appropriate for motivated undergraduates without prior exposure to probability or statistics. It is rigorous yet effective, and covers the basics to a point sufficient for further study, even graduate study. Prof. Goldsman really hits the Goldilocks Zone here -- none too esoteric, none too powderpuff. This course includes everything you need to set up further study in data analytics or operations research.
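To make the list of descriptive measures above concrete, here is a minimal sketch using only Python's standard `statistics` module. The `describe` helper and the sample data are made up for illustration and are not taken from any of the courses discussed; the quartile method (`"inclusive"`) and the population-moment definition of excess kurtosis are assumptions, since other conventions exist.

```python
import statistics


def describe(data):
    """Return a few descriptive statistics for a list of numbers (illustrative helper)."""
    n = len(data)
    mean = statistics.fmean(data)

    # Quartiles; "inclusive" interpolates between the observed min and max.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")

    # Second and fourth central moments (population form, dividing by n).
    m2 = sum((x - mean) ** 2 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n

    return {
        "mean": mean,
        "median": statistics.median(data),
        "sample_variance": statistics.variance(data),  # divides by n - 1
        "iqr": q3 - q1,
        "mean_abs_deviation": sum(abs(x - mean) for x in data) / n,
        "excess_kurtosis": m4 / m2 ** 2 - 3,  # 0 for a normal distribution
    }


# Made-up sample, for illustration only.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
summary = describe(sample)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

Even on eight points, the mean (5) and median (4.5) disagree and the excess kurtosis is slightly negative, which is the kind of distributional signature that mean, median, and mode alone would not reveal.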
To be clear, the descriptive stats course did not cover the advanced topics you list. But you are correct that I conflated all descriptive stats with basic stats.
Assertion: OSSU Data Science curriculum should not recommend a basic stats course. This is prerequisite material; OSSU's focus is requisite material for undergraduate learners.
Edited for clarity
Hello everyone,
I'd like to recommend two courses for candidates in our Data Science Statistics program:
Statistical Learning with Python by Stanford University on edX
Both courses are based on the same content, differing only in the programming language used. They're aligned with a free book available at www.statlearning.com. These courses offer an extensive introduction to statistical learning methods, crucial for anyone pursuing a career in data science. The authors are renowned figures in the data science community, and this book is frequently recommended on various Data Science, Machine Learning, and AI subreddits.
Why These Courses Are Beneficial:
These courses are an invaluable resource for anyone aspiring to deeply understand and apply data science principles.
@Smcgb The course describes itself as an "introductory-level course in supervised learning", so it would follow an introduction to statistics. Can you open a separate RFC to recommend the addition of this course to the curriculum? We'll leave the RFC open for 1 month for others to comment. The change looks like a positive one to me. After the one-month comment period we can include the course in the curriculum.
One optional edit that you can make to the RFC is to link to some of the recommendations for the book that you mention.
Thanks for looking for ways to improve the curriculum!
Summary
OSSU should undertake a search for a number of new courses in statistics.
Background
OSSU currently recommends 2 courses on statistics:
The first of these is no longer offered.
Guidelines
OSSU Data Science uses the report Curriculum Guidelines for Undergraduate Programs in Data Science as our guide for course recommendation.
Section 6 "Transitioning To A Data Science Major Using Typical Existing Courses" states:
Subsection 6.3 "Courses in Statistics" states:
GAISE
For reference, the K-12 GAISE report uses a framework of 3 levels of statistical sophistication expected of K-12 students. This can be found on page 24.
The GAISE College Report includes goals, recommendations, and suggestions for topics that might be omitted.
Goals (summarized)
Recommendations
These are largely recommendations for how statistics courses should be taught.
Suggestions for Topics that Might be Omitted from Introductory Statistics Courses
Of note, the basic statistics section reads:
Assertions
Request for Comments
This RFC is asking specifically for comments on the assertions above. Are these the right steps? Are there other implications for OSSU's curriculum that are not identified?
There will be other RFCs for carrying out the individual steps (e.g. there will be a separate RFC for Identify an Introduction to Statistics course).