ada511_detailed_topics.txt
## (14 weeks to be allotted)
* Statements and questions. The importance of unambiguous statements & questions.
- Examples like "which algorithm is better?".
- Historical example: Einstein and "simultaneity".
Emphasize this point constantly throughout the course, so that the students learn it as a habit: every time a central statement or question appears, spend 60 seconds discussing whether it is unambiguous/well-posed.
* Statements about "data" and statements about "models"
* Truth-calculus.
No premises -> no conclusions
* Generalization of truth-calculus to probability-calculus.
* The three basic laws. Consequences: Bayes's theorem, law of "extension of discourse"
- Example: Monty Hall problem
- Example: Clinical diagnosis
- Example: ...
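
A possible sketch for the Monty Hall example, using only the product rule, the law of extension of discourse (read here as the law of total probability), and Bayes's theorem. The setup is an assumption: the player picks door 1, the host then opens door 3, and a host with a free choice picks at random. The names `prior`, `host_opens_3`, `joint`, `evidence`, `posterior` are illustrative only.

```python
from fractions import Fraction

# Assumed setup: the player picks door 1, the host then opens door 3.
doors = (1, 2, 3)
prior = {car: Fraction(1, 3) for car in doors}  # car equally probable behind each door


def host_opens_3(car):
    """P(host opens door 3 | car location), given that the player picked door 1."""
    if car == 1:
        return Fraction(1, 2)  # host chooses at random between doors 2 and 3
    if car == 2:
        return Fraction(1)     # host is forced to open door 3
    return Fraction(0)         # host never reveals the car


# Product rule: P(car, host opens 3) = P(host opens 3 | car) * P(car)
joint = {car: host_opens_3(car) * prior[car] for car in doors}
# Extension of discourse: P(host opens 3) = sum over car locations
evidence = sum(joint.values())
# Bayes's theorem: P(car | host opens 3)
posterior = {car: joint[car] / evidence for car in doors}

print(posterior)
```

Under these assumptions, switching to door 2 has probability 2/3 of winning. Changing `host_opens_3` (for instance, a host who opens a door without knowing where the car is) changes the answer, which ties back to the point about unambiguous premises.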
* Kinds of data:
** binary
** nominal
** ordinal (discrete)
** continuous - unbounded, bounded ("location quantities" and "scale quantities")
** censored
** 2D and 3D data: images
* Information that is not "data":
** orders of magnitude
** physical bounds
* Variate transformations:
** log
** probit
** logit
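
A minimal sketch of the three transformations, assuming the usual definitions: log maps a positive quantity to the whole real line, while logit and probit do the same for a quantity bounded in (0, 1). The function names are illustrative; `NormalDist` is from the Python standard library.

```python
import math
from statistics import NormalDist


def logit(p):
    """Map p in (0, 1) to the real line via the log-odds."""
    return math.log(p / (1 - p))


def inverse_logit(y):
    """Map a real number back to (0, 1)."""
    return 1 / (1 + math.exp(-y))


def probit(p):
    """Map p in (0, 1) to the real line via the standard normal quantile function."""
    return NormalDist().inv_cdf(p)


for p in (0.01, 0.5, 0.99):
    print(p, round(logit(p), 3), round(probit(p), 3))
print(math.log(1000.0))  # log: a positive quantity mapped to the real line
```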
* Location, range/dispersion, resolution of data [maybe move below to "Summaries of distributions"]
* Distributions of probability
** Continuous distributions
** Difference between probability theory and statistics
* Representation of distributions
** density function
** difference between function and density function
** jacobian
** histogram
** scatter plot
** Their behaviour under variate transformations
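
A numerical sketch of why a density is not an ordinary function: under a variate transformation it picks up a Jacobian factor, while probabilities of corresponding intervals stay the same. The example density (uniform on (0, 1)) and the transformation (logit) are assumptions chosen for illustration.

```python
import math


def logit(x):
    return math.log(x / (1 - x))


def sigmoid(y):
    return 1 / (1 + math.exp(-y))


def density_x(x):
    """Assumed example density: uniform on (0, 1)."""
    return 1.0 if 0 < x < 1 else 0.0


def density_y(y):
    """Density of y = logit(x): density of x times the Jacobian |dx/dy|."""
    x = sigmoid(y)
    jacobian = x * (1 - x)  # derivative of sigmoid(y) with respect to y
    return density_x(x) * jacobian


def integrate(f, a, b, n=100_000):
    """Midpoint-rule integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


# The same event, described in the two coordinates, has the same probability.
prob_x = integrate(density_x, 0.2, 0.7)
prob_y = integrate(density_y, logit(0.2), logit(0.7))
print(prob_x, prob_y)
# The density values at corresponding points differ.
print(density_x(0.5), density_y(logit(0.5)))
```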
* Relations between probability and frequency [connections with relative entropy]
* Summaries of distributions
** median, quantiles & quartiles, interquartile range, median absolute deviation
** mean, standard deviation
** robust vs non-robust summaries [mainly through discover-yourself examples]
** behaviour of summaries under variate transformations
Examples: Cauchy distribution
*** location: median, mean
*** range/dispersion: interquartile range, MAD, standard deviation, half-range
*** resolution: differential entropy
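
A possible discover-yourself exercise, sketched on a small made-up dataset rather than on Cauchy samples: the median commutes with a monotonic transformation and ignores a single wild value, while the mean does neither.

```python
import math
from statistics import mean, median

data = [1, 2, 4, 8, 16]                   # made-up values, odd sample size
log_data = [math.log2(x) for x in data]   # [0, 1, 2, 3, 4]

# Median: transforming then summarizing equals summarizing then transforming.
median_then_log = math.log2(median(data))
log_then_median = median(log_data)

# Mean: the two orders give different results.
mean_then_log = math.log2(mean(data))
log_then_mean = mean(log_data)

# Robustness: replace one value by a wild one.
contaminated = [1, 2, 4, 8, 16000]

print(median_then_log, log_then_median)
print(mean_then_log, log_then_mean)
print(median(data), median(contaminated))
print(mean(data), mean(contaminated))
```

With an even sample size the conventional median (average of the two middle values) commutes only approximately, which could be a further discussion point.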
* Outliers and out-of-population data
Emphasize the difference
Warn against "tail cutting" and similar mindless practices
* Marginal and conditional distributions
Warning about different distributions with identical marginals
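
A small sketch of the warning above: two made-up joint distributions for a pair of binary variates with identical marginals but very different conditionals. The helper names are illustrative.

```python
from fractions import Fraction

half, quarter, zero = Fraction(1, 2), Fraction(1, 4), Fraction(0)

# Two hypothetical joint distributions for binary variates (x, y).
independent = {(0, 0): quarter, (0, 1): quarter, (1, 0): quarter, (1, 1): quarter}
coupled = {(0, 0): half, (0, 1): zero, (1, 0): zero, (1, 1): half}


def marginal(joint, axis):
    """Sum the joint probabilities over the other variate."""
    out = {}
    for values, p in joint.items():
        out[values[axis]] = out.get(values[axis], zero) + p
    return out


def conditional_y_given_x(joint, x):
    """P(y | x) = P(x, y) / P(x)."""
    px = marginal(joint, 0)[x]
    return {y: joint[(x, y)] / px for y in (0, 1)}


print(marginal(independent, 0), marginal(coupled, 0))
print(marginal(independent, 1), marginal(coupled, 1))
print(conditional_y_given_x(independent, 1), conditional_y_given_x(coupled, 1))
```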
* Quirks of data and distributions in high dimensions
[here we can have some fun with the examples]
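
One candidate example, offered as a sketch: in a unit hypercube, the fraction of volume lying within a small distance of the boundary tends to 1 as the dimension grows, so almost every point is "near the edge". The function name and the value of `eps` are assumptions.

```python
def shell_fraction(dim, eps=0.05):
    """Fraction of a unit hypercube's volume lying within eps of its boundary."""
    inner_side = 1 - 2 * eps          # side of the cube left after peeling off the shell
    return 1 - inner_side ** dim


for dim in (1, 2, 10, 100):
    print(dim, shell_fraction(dim))
```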
* Sampling, subsampling
** Minimal representative sample:
*** How sampling often introduces bias
- Example: data with 14 binary variates, 10000 samples
*** Size of minimal representative sample = (2^entropy)/precision
*** Warning: in high dimensions, all datasets are outliers.
*** Warning: data splits and cross-validation cannot correct sampling biases
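
A rough sketch of the 14-binary-variate example, using the formula above with entropy measured in bits. Two things are assumptions here: the population is taken as uniform over all combinations (the maximum-entropy case), and `precision` is treated as a number in (0, 1], with 1 as a placeholder value, since the outline does not define it.

```python
import math


def entropy_bits(probabilities):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def minimal_sample_size(entropy, precision):
    """Formula from the outline: (2^entropy) / precision."""
    return 2 ** entropy / precision


n_variates = 14
n_cells = 2 ** n_variates               # number of distinct variate combinations
uniform = [1 / n_cells] * n_cells       # assumed population distribution
h = entropy_bits(uniform)

n_samples = 10_000
print(n_cells, h, minimal_sample_size(h, precision=1.0), n_samples)
```

Under these assumptions the 10000 samples cannot even cover every combination once.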
* Decisions, consequences, utilities
Basic concepts of utility theory
- Example: production line
- Example: medical diagnosis
- Check example at https://mariateresaherrerozamorano.medium.com/the-maths-of-covid-19-part-1-real-world-is-not-normal-616ba9e0d51b
* Maximization of expected utility
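
A minimal sketch of the principle for a medical-diagnosis style decision. All probabilities and utilities are made-up numbers, and the names are illustrative.

```python
# Hypothetical probabilities of the unknown condition, given the available data.
probabilities = {"disease": 0.2, "healthy": 0.8}

# Hypothetical utilities of each decision under each condition.
utilities = {
    "treat":      {"disease": 10, "healthy": -1},
    "do nothing": {"disease": -20, "healthy": 0},
}


def expected_utility(decision):
    """Average of the decision's utilities, weighted by the probabilities."""
    return sum(probabilities[c] * utilities[decision][c] for c in probabilities)


best = max(utilities, key=expected_utility)
for decision in utilities:
    print(decision, expected_utility(decision))
print("best:", best)
```

With these numbers the best decision is to treat, even though "healthy" is the more probable condition: the decision depends on the utilities as well as on the probabilities.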
* The basic inference problem: units, predictors, predictands
Two main kinds of questions: Y given X; Y and X jointly
connection with "supervised" and "unsupervised" learning
* The idea/device of a "full population" (past, present, future)
* Exchangeability
vs time series
* Basic solution of the inference problem through frequency of full population:
** the "Omni-Predictor Machine"
** data fit vs prior "reasonableness"
* Possible questions and answers about data
* Sources of uncertainty
Uncertainty about population frequency
Uncertainty about next outcome
Uncertainty about long-run outcomes
Uncertainty about data
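
A sketch that separates two of these uncertainties for a binary variate. It assumes exchangeability and, purely for illustration, a uniform prior over the population frequency, so that the posterior is a Beta distribution. Neither the prior nor the function names come from the outline.

```python
import math
from fractions import Fraction


def predictive_next(successes, n):
    """P(next unit is a success | data), under the assumed uniform prior."""
    return Fraction(successes + 1, n + 2)


def frequency_sd(successes, n):
    """Posterior standard deviation of the population frequency (Beta posterior)."""
    a, b = successes + 1, n - successes + 1
    return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))


for successes, n in ((7, 10), (70, 100), (7000, 10000)):
    print(n, float(predictive_next(successes, n)), frequency_sd(successes, n))
```

More data shrinks the uncertainty about the population frequency, while the probability for the next outcome stays near 0.7: that uncertainty does not go away.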
* Discriminative algorithms
unknown Y given known X
* Generative algorithms
unknown Y,X
* Functional regression
Y assumed to be function of X
E(Y|X) from p(Y|X)
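
A tiny sketch of the last point: the expected value E(Y|X) is computed from the full distribution p(Y|X), and when Y is not actually a function of X it can be a value that Y never takes. The conditional distribution below is made up.

```python
# Hypothetical conditional distribution p(Y | X = x) for one fixed x: two equally probable values.
p_y_given_x = {-1.0: 0.5, 1.0: 0.5}

# E(Y | X = x): average of the Y values weighted by their probabilities.
expected_y = sum(y * p for y, p in p_y_given_x.items())

# Probability that Y actually equals that expected value.
prob_of_expected = p_y_given_x.get(expected_y, 0.0)

print(expected_y, prob_of_expected)
```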
* Example of signal analysis (Bretthorst)?
* Examples of translation of machine-learning problems into this general exchangeable framework
** Approximation: replace average with value at mode
** Neural networks
Assumption: Y is function of X
** Random forests
Assumption: probability density has a crossword-like profile
** Support vector machines
* Algorithm comparison & performance from a decision-theoretic perspective
Base this on https://doi.org/10.31219/osf.io/7rz8t
** Discussion of popular performance metrics, with a warning about inconsistent ones:
*** Accuracy: OK
*** True-positive & False-positive rates: OK
*** Precision: avoid, inconsistent!
*** Matthews Correlation Coefficient: avoid, inconsistent!
*** F1-measure: avoid, inconsistent!
*** AUC: avoid, inconsistent!
** How to construct the problem-dependent appropriate performance metric
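
One way such a metric might be sketched, assuming it is the average utility obtained by combining a confusion matrix with a problem-specific utility matrix. The confusion matrices and utilities below are made up, and this is an illustration of the decision-theoretic idea rather than a reproduction of the method in the reference above.

```python
def average_utility(confusion, utility):
    """Average utility per unit: sum of counts times utilities, divided by the total count."""
    total = sum(sum(row) for row in confusion)
    gained = sum(
        count * value
        for count_row, value_row in zip(confusion, utility)
        for count, value in zip(count_row, value_row)
    )
    return gained / total


# Rows: true class (positive, negative); columns: predicted class (positive, negative).
confusion_a = [[40, 10], [20, 30]]   # hypothetical algorithm A
confusion_b = [[30, 20], [5, 45]]    # hypothetical algorithm B

accuracy_utility = [[1, 0], [0, 1]]  # every correct prediction worth 1: this is accuracy
costly_misses = [[1, -5], [0, 1]]    # assumed problem where a missed positive costs 5

for name, confusion in (("A", confusion_a), ("B", confusion_b)):
    print(name, average_utility(confusion, accuracy_utility),
          average_utility(confusion, costly_misses))
```

With these numbers B wins on accuracy while A wins when missed positives are costly, so the ranking of algorithms depends on the utilities of the problem at hand.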