ada511_detailed_topics.txt

## (14 weeks to be allotted)


* Statements and questions. The importance of unambiguous statements & questions.
- Examples like "which algorithm is better?".
- Historical example: Einstein and "simultaneity".
Emphasize this point constantly throughout the course, so that students learn it as a habit: every time a central statement or question appears, spend 60 seconds discussing whether it is unambiguous and well-posed.

* Statements about "data" and statements about "models"

* Truth-calculus.
No premises -> no conclusions

* Generalization of truth-calculus to probability-calculus.

* The three basic laws. Consequences: Bayes's theorem, law of "extension of discourse"
- Example: Monty Hall problem
- Example: Clinical diagnosis
- Example: ...
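
A minimal sketch of the clinical-diagnosis example, assuming a single binary test and made-up numbers (the prior, sensitivity and false-positive rate below are hypothetical, as is the function name). It applies Bayes's theorem, with the denominator obtained by extension of discourse:

```python
from fractions import Fraction

def posterior_disease(prior, sensitivity, false_positive_rate):
    """P(disease | positive test) via Bayes's theorem."""
    # Extension of discourse:
    # P(pos) = P(pos | disease) P(disease) + P(pos | healthy) P(healthy)
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive

# Hypothetical numbers, chosen only for illustration.
prior = Fraction(1, 100)                 # P(disease) before the test
sensitivity = Fraction(95, 100)          # P(positive | disease)
false_positive_rate = Fraction(5, 100)   # P(positive | healthy)

posterior = posterior_disease(prior, sensitivity, false_positive_rate)
print(posterior, float(posterior))
```

With these assumed numbers the probability of disease after a positive test stays well below 1/2, which makes a good discussion point about the role of the prior.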

* Kinds of data:
** binary
** nominal
** ordinal (discrete)
** continuous - unbounded, bounded ("location quantities" and "scale quantities")
** censored
** 2D and 3D data: images

* Information that is not "data":
** orders of magnitude
** physical bounds

* Variate transformations:
** log
** probit
** logit
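
The three transformations above can be sketched with the Python standard library only. The function names are illustrative; `NormalDist().inv_cdf` is used here as the probit (inverse of the standard normal cumulative distribution):

```python
import math
from statistics import NormalDist

def log_transform(x):
    """Map a positive quantity (0, inf) to the whole real line."""
    return math.log(x)

def logit(p):
    """Map a proportion in (0, 1) to the real line via log-odds."""
    return math.log(p / (1 - p))

def inverse_logit(z):
    """Map a real number back to (0, 1)."""
    return 1 / (1 + math.exp(-z))

def probit(p):
    """Map a proportion in (0, 1) to the real line via the normal quantile."""
    return NormalDist().inv_cdf(p)

for p in (0.1, 0.5, 0.9):
    print(p, round(logit(p), 4), round(probit(p), 4))
```

Both logit and probit send 0.5 to 0 and are antisymmetric around it; they differ mainly in how fast they grow near 0 and 1.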

* Location, range/dispersion, resolution of data [maybe move below to "Summaries of distributions"]

* Distributions of probability
** Continuous distributions
** Difference between probability theory and statistics

* Representation of distributions
** density function
** difference between function and density function
** jacobian
** histogram
** scatter plot
** Their behaviour under variate transformations
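
A worked statement of how a density behaves under a variate transformation, assuming a one-dimensional, invertible and differentiable transformation y = f(x) (the symbols p_X, p_Y and f are notation introduced here). In several dimensions the derivative is replaced by the absolute value of the determinant of the Jacobian matrix:

```latex
% Probability is conserved, densities are not:
p_Y(y)\,\mathrm{d}y = p_X(x)\,\mathrm{d}x
\quad\Longrightarrow\quad
p_Y(y) = p_X\bigl(f^{-1}(y)\bigr)\,
         \left|\frac{\mathrm{d}f^{-1}(y)}{\mathrm{d}y}\right|

% Example: y = \ln x with x > 0, so x = e^{y}
p_Y(y) = p_X(e^{y})\, e^{y}
```

An ordinary function of x would simply be re-expressed as a function of y without the extra factor; the Jacobian factor is what distinguishes a density from a function.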

* Relations between probability and frequency [connections with relative entropy]

* Summaries of distributions
** median, quantiles & quartiles, interquartile range, median absolute deviation
** mean, standard deviation
** robust vs non-robust summaries [mainly through discover-yourself examples]
** behaviour of summaries under variate transformations
Examples: Cauchy distribution
*** location: median, mean
*** range/dispersion: interquartile range, MAD, standard deviation, half-range
*** resolution: differential entropy
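
A possible discover-yourself exercise for the Cauchy example, sketched with the standard library. The sample size, seed and variable names are arbitrary choices. The point to observe is that the median, interquartile range and MAD settle near fixed values, while the mean and standard deviation change wildly from one run to another (try different seeds):

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so that one run is reproducible

def sample_cauchy(n):
    """Draw n standard-Cauchy values by inverse-CDF sampling."""
    return [math.tan(math.pi * (random.random() - 0.5)) for _ in range(n)]

data = sample_cauchy(10_000)

# Robust summaries
median = statistics.median(data)
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1
mad = statistics.median([abs(x - median) for x in data])

# Non-robust summaries
mean = statistics.fmean(data)
stdev = statistics.stdev(data)

print("median", median, "IQR", iqr, "MAD", mad)
print("mean", mean, "standard deviation", stdev)
```

For the standard Cauchy distribution the median is 0, the interquartile range is 2 and the MAD is 1, whereas the mean and standard deviation do not exist.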


* Outliers and out-of-population data
Emphasize the difference
Warn against "tail cutting" and similar mindless practices

* Marginal and conditional distributions
Warning about different distributions with identical marginals
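
The warning can be illustrated with two binary variates X and Y. The two joint distributions below are made up for illustration: they have exactly the same marginals, yet their conditional distributions are completely different:

```python
from fractions import Fraction

# Joint probabilities keyed by (x, y). Hypothetical values.
joint_a = {(0, 0): Fraction(1, 4), (0, 1): Fraction(1, 4),
           (1, 0): Fraction(1, 4), (1, 1): Fraction(1, 4)}
joint_b = {(0, 0): Fraction(1, 2), (0, 1): Fraction(0),
           (1, 0): Fraction(0),    (1, 1): Fraction(1, 2)}

def marginal(joint, axis):
    """Sum the joint distribution over the other variate."""
    result = {}
    for outcome, p in joint.items():
        result[outcome[axis]] = result.get(outcome[axis], 0) + p
    return result

def conditional_y_given_x(joint, x):
    """p(Y | X = x), obtained by dividing the joint by the marginal of X."""
    p_x = marginal(joint, 0)[x]
    return {y: p / p_x for (xi, y), p in joint.items() if xi == x}

print(marginal(joint_a, 0), marginal(joint_b, 0))   # identical
print(marginal(joint_a, 1), marginal(joint_b, 1))   # identical
print(conditional_y_given_x(joint_a, 0))            # Y still uncertain
print(conditional_y_given_x(joint_b, 0))            # Y fully determined
```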

* Quirks of data and distributions in high dimensions
[here we can have some fun with the examples]
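
One candidate example, offered as a sketch: in a unit hypercube, the fraction of volume lying within a small distance of the boundary approaches 1 as the number of dimensions grows. The function name and the margin of 0.05 are illustrative choices:

```python
def shell_fraction(dimensions, margin):
    """Fraction of the unit hypercube lying within `margin` of its boundary."""
    inner_side = 1 - 2 * margin          # side of the inner cube
    return 1 - inner_side ** dimensions  # everything outside the inner cube

for d in (1, 2, 10, 50, 100):
    print(d, round(shell_fraction(d, 0.05), 4))
```

In one dimension only 10% of the interval is within 0.05 of an end point; in 50 dimensions more than 99% of the volume is that close to some face, so almost every point is "extreme" in at least one variate.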

* Sampling, subsampling

* Minimal representative sample:
** How sampling often introduces bias
- Example: data with 14 binary variates, 10000 samples
** Size of minimal representative sample = (2^entropy)/precision
** Warning: in high dimensions, all datasets are outliers.
** Warning: data splits and cross-validation cannot correct sampling biases
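
A rough numerical sketch of the example with 14 binary variates and 10000 samples. It assumes that the entropy in the size formula is measured in bits and that "precision" is a number between 0 and 1; the value 0.1 and the uniform distribution are hypothetical choices, since the outline does not fix them:

```python
import math

def entropy_bits(probabilities):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

def minimal_sample_size(entropy, precision):
    """Size formula from the outline: (2^entropy) / precision."""
    return 2 ** entropy / precision

n_variates = 14
n_combinations = 2 ** n_variates      # possible joint values: 16384
n_samples = 10_000

# Even if every sample were distinct, this many combinations
# could never appear in the dataset.
never_observed_at_least = n_combinations - n_samples

# Assumption: all combinations equally probable (maximal entropy).
uniform = [1 / n_combinations] * n_combinations
h = entropy_bits(uniform)

print(n_combinations, never_observed_at_least, h)
print(minimal_sample_size(h, precision=0.1))  # precision value is hypothetical
```

A distribution with lower entropy than the uniform one would need a smaller sample according to the same formula.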

* Decisions, consequences, utilities
Basic concepts of utility theory
- Example: production line
- Example: medical diagnosis
- Check example at https://mariateresaherrerozamorano.medium.com/the-maths-of-covid-19-part-1-real-world-is-not-normal-616ba9e0d51b

* Maximization of expected utility
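
A minimal sketch of expected-utility maximization for the production-line example. The decisions, outcomes, probabilities and utilities are all made up for illustration; the helper names are hypothetical:

```python
def expected_utilities(utilities, probabilities):
    """Expected utility of each decision, averaging over the outcomes."""
    return {
        decision: sum(probabilities[outcome] * u for outcome, u in row.items())
        for decision, row in utilities.items()
    }

def best_decision(utilities, probabilities):
    """Decision with maximal expected utility."""
    expected = expected_utilities(utilities, probabilities)
    return max(expected, key=expected.get)

# Hypothetical numbers: a component probably works, but selling
# a faulty one is much more costly than discarding a good one.
probabilities = {"works": 0.9, "fails": 0.1}
utilities = {
    "accept":  {"works": 1, "fails": -11},
    "discard": {"works": 0, "fails": 0},
}

print(expected_utilities(utilities, probabilities))
print(best_decision(utilities, probabilities))
```

With these assumed numbers the best decision is to discard, even though "works" is the more probable outcome: the decision depends on probabilities and utilities together.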


* The basic inference problem: units, predictors, predictands
Two main kinds of questions: "Y given X", and "Y and X jointly"
Connection with "supervised" and "unsupervised" learning

* The idea/device of a "full population" (past, present, future)

* Exchangeability
vs time series

* Basic solution of the inference problem through frequency of full population:
** the "Omni-Predictor Machine"
** data fit vs prior "reasonableness"

* Possible questions and answers about data

* Sources of uncertainty
Uncertainty about population frequency
Uncertainty about next outcome
Uncertainty about long-run outcomes
Uncertainty about data

* Discriminative algorithms
unknown Y given known X

* Generative algorithms
unknown Y,X

* Functional regression
Y assumed to be function of X
E(Y|X) from p(Y|X)
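
For reference, the regression function is a summary of the full conditional distribution, written here for a continuous Y with conditional density p(y | x) (notation assumed for this sketch):

```latex
\mathrm{E}(Y \mid X = x) = \int y \, p(y \mid x) \,\mathrm{d}y
```

The distribution p(Y | X) determines E(Y | X), but not the other way around: the expectation discards the spread and shape of the distribution.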

* Example of signal analysis (Bretthorst)?

* Examples of translation of machine-learning problems into this general exchangeable framework

** Approximation: replace average with value at mode

** Neural networks
Assumption: Y is function of X

** Random forests
Assumption: probability density has a crossword-like profile

** Support vector machines

* Algorithm comparison & performance: from a decision-theoretic perspective
base this on https://doi.org/10.31219/osf.io/7rz8t
** Discussion of popular performance metrics, with a warning about inconsistent ones:
*** Accuracy: OK
*** True-positive & False-positive rates: OK
*** Precision: avoid, inconsistent!
*** Matthews Correlation Coefficient: avoid, inconsistent!
*** F1-measure: avoid, inconsistent!
*** AUC: avoid, inconsistent!
** How to construct the problem-dependent appropriate performance metric
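
One possible construction, sketched under the decision-theoretic view taken in this part of the course: score a classifier by the average utility it yields on test data, using a utility matrix specific to the problem. Whether this matches the cited reference in every detail is not checked here; the confusion counts, utilities and function names below are hypothetical:

```python
def utility_yield(confusion, utilities):
    """Average utility per test case.

    Both arguments are dicts keyed by (true_class, predicted_class):
    `confusion` holds counts, `utilities` holds the utility of each pair.
    """
    total = sum(confusion.values())
    return sum(utilities[pair] * n for pair, n in confusion.items()) / total

# Hypothetical confusion counts for two classifiers on 100 test cases.
confusion_a = {("pos", "pos"): 40, ("pos", "neg"): 10,
               ("neg", "pos"): 5,  ("neg", "neg"): 45}
confusion_b = {("pos", "pos"): 48, ("pos", "neg"): 2,
               ("neg", "pos"): 18, ("neg", "neg"): 32}

# Utility 1 for a correct answer and 0 otherwise reproduces accuracy.
identity = {("pos", "pos"): 1, ("pos", "neg"): 0,
            ("neg", "pos"): 0, ("neg", "neg"): 1}

# Hypothetical problem in which missing a positive case is very costly.
costly_miss = {("pos", "pos"): 1, ("pos", "neg"): -10,
               ("neg", "pos"): -1, ("neg", "neg"): 1}

for name, confusion in (("A", confusion_a), ("B", confusion_b)):
    print(name,
          utility_yield(confusion, identity),
          utility_yield(confusion, costly_miss))
```

With these assumed numbers classifier A scores higher under the identity utilities, while classifier B scores higher under the second utility matrix, so the ranking of algorithms depends on the utilities of the problem at hand.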