Add to chp 2 part 1
FabsOliveira committed Jun 3, 2024
1 parent bbb018a commit 1b2d492
Showing 10 changed files with 898 additions and 4 deletions.
12 changes: 9 additions & 3 deletions course/content/chapter_1/1-the_farmers_problem.md
@@ -16,7 +16,7 @@ kernelspec:
(section:the_farmers_problem)=
# The farmer's problem

We start with the classic example from Birge and Louveaux: the farmer's problem. This is perhaps the most classical example used to discuss the notion of recourse and the interplay between information and decision. Before formalising all these concepts, let's have a close look at the example itself.
We start with the classic example from {cite}`birge2011introduction`: the farmer's problem. This is perhaps the most classical example used to discuss the notion of recourse and the interplay between information and decision. Before formalising all these concepts, let's have a close look at the example itself.

## The deterministic farmers problem

@@ -113,7 +113,7 @@ Let us now consider the other scenario, in which the yields are instead 20% lowe

The results in {numref}`farmers_optimal_20-` show that, in this case, our optimal strategy changes. Essentially, we are still following steps 1-3, but we never really reach step 3, as we are not left with enough land to satisfy our cattle feed constraints. As corn is cheaper to buy than wheat, we focus on fulfilling the need for wheat and plant the remainder of the land with corn, complementing it with 180 tons bought from the market.

## Considering multiple sceanrios at once
## Considering multiple scenarios at once

Although we can extract a logic on how to proceed, one may notice that our strategy has a fundamental flaw: it depends on *knowing what the yields will be*, so that we can plan the exact number of acres that will yield 6000 tons of sugar beets. Clearly, if the yields are truly uncertain, we must design a strategy that can perform well *regardless* of the observed yield.

@@ -183,4 +183,10 @@ Moreover, the farmer's land allocation decisions are such that they are *hedging

Effectively, this encoding of the dynamics between decision-making and uncertainty observations is one of the main focuses of *stochastic programming*, i.e., how to incorporate within the model the notion of sequential decisions that are made before or after information about the uncertainty becomes available.

%TODO: Include diagram with the farmers's first and second stge decisions.
%TODO: Include diagram with the farmers's first and second stge decisions.


## References

```{bibliography}
```
2 changes: 1 addition & 1 deletion course/content/chapter_1/5-multi-stage_problems.md
@@ -12,7 +12,7 @@ kernelspec:
language: julia
name: julia-1.10
---

(sec:MSSP)=
# Multi-stage stochastic programming (MSSP) problems

Now that we have a clear understanding of the structure of two-stage stochastic programming problems, we can generalise the idea to an arbitrary number of stages.
110 changes: 110 additions & 0 deletions course/content/chapter_2/1-scenario_trees.md
@@ -0,0 +1,110 @@

# Using scenarios for representing uncertainty

As we have seen so far, stochastic programming models are mathematical programming models that encode the additional assumption that some of their parameters behave as *random variables*.

Therefore, the components that form our stochastic programming problem are:

1. a mathematical programming model representing a deterministic version of the problem;
2. the values of the deterministic parameters; and
3. a description of the stochasticity, which can be formed from:
   1. a known (or assumed) probability distribution;
   2. historical data;
   3. properties of the probability distribution (e.g., average and standard deviation, i.e., statistical moments).

We have also seen that, to obtain computationally tractable models, we typically rely on discrete representations of the uncertainty, which are referred to as *scenarios*. However, as one may suspect, when these scenarios are meant to be a discrete representation of a stochastic phenomenon, they become an *approximation* of it and, as such, care must be exercised in how they are defined.

## Scenario trees

A *scenario tree* provides a structure for the sequentially observed realisations of the random variable $\xi^t$, with $t \in \braces{1,\dots,H}$, where $H$ represents the number of stages.

Let us define $\xi = (\xi^t)_{t \in [H]}$, where $[H] = \braces{1,\dots,H}$ and $(\cdot)$ denotes a sequence. Also, we have that $\xi^t \in \Xi_t$, that is, at each stage, the stochastic process has its own support $\Xi_t$, while the support of $\xi$ is the Cartesian product $\prod_{t \in [H]}\Xi_t$.

A *scenario* is denoted $(\xi_s^t)_{t \in [H]}$, forming a *path* through $\xi$. Thus, we can think of the scenario tree $\xi$ as being a set of paths, i.e., $\xi = \braces{\xi_s}_{s \in [S]}$, where $S$ is the number of scenarios and $[S] = \braces{1,\dots,S}$.
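To make the "set of paths" view concrete, the following is a minimal sketch in Python. The per-stage supports, their values, and the equal scenario probabilities are all hypothetical and chosen only for illustration. The sketch enumerates every path through the Cartesian product of the stage supports; a real scenario tree need not contain all such paths.

```python
from itertools import product

# Hypothetical per-stage supports Xi_t for H = 3 stages (illustrative values only).
supports = [
    [1.0],            # stage 1: a single known value at the root
    [0.8, 1.2],       # stage 2: two possible realisations
    [0.8, 1.0, 1.2],  # stage 3: three possible realisations
]

# Each scenario is one path (xi_s^1, ..., xi_s^H) through the Cartesian product.
scenarios = list(product(*supports))

# Equal probabilities are an assumption made purely for illustration.
probabilities = [1.0 / len(scenarios)] * len(scenarios)

for s, (path, p) in enumerate(zip(scenarios, probabilities), start=1):
    print(f"scenario {s}: path={path}, probability={p:.3f}")
```

Storing scenarios as tuples keeps each path immutable and hashable, which is convenient when paths later need to be grouped by their shared history.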

%TODO: Diagram/ figure with scenario tree


## Taxonomy of scenario trees

Scenario trees come with a particular terminology that is a convention not only in stochastic programming, but also in other fields that use tree-based representations of sequential decision making under uncertainty.

Consider figure X again. In it, each node represents a known *state* that one is in, meaning that the uncertainty at that point has been revealed and a decision is to be made. Each vertical line marks a *stage*, which is a point in time where a decision is made without "knowing" how the tree will branch forward, but with the knowledge (at each state in that stage) of how the tree has branched (or how the uncertainty has unfolded) up to that point.

The scenario tree is a representation of the stochastic process and, as such, does not explicitly show at which points decisions are made. The first-stage decision is made at the root of the tree. At each of the nodes between the root and the leaves, there is a decision before observing the uncertainty associated with the node (state) and one immediately after, once the state is observed. The same holds for the leaf nodes, i.e., in addition to the decision that precedes the revelation of the last state, there is one after the leaf state is observed.

Stages are often associated with time steps in the decision process, as they denote the points in the sequence of decisions at which information is obtained (or uncertainty is observed). However, it is common to pose multi-period problems with fewer stages than the time periods represented in the model. A typical example is multi-period lot-sizing problems, where first-stage decisions represent capacity sizing or production commitments, whilst second-stage decisions represent the operation over all time periods considered.

So, what does it mean for a multi-period problem to have only two stages? Essentially, having no further states means that all of the uncertainty is revealed at once, after the first-stage decision is made. This leads to a tree that is shaped like a *fan*, as depicted in figure Y.

% TODO: Add diagram/ figure of a fan tree

```{admonition} 2SSP v. MSSP for modelling multi-period problems
:class: note
This relates to the discussion about the need for MSSPs in {ref}`sec:MSSP`. Fan trees are essentially approximations of their multi-stage counterparts, as they do not correctly convey the nonanticipativity of the uncertainty.
```
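The difference between a fan and a multi-stage tree can be sketched by counting the distinct histories at each stage. The snippet below is a minimal illustration in Python; the function name `nodes_per_stage` and the demand values are hypothetical. Two scenarios are taken to share a node at stage $t$ when their realisations coincide up to and including that stage.

```python
def nodes_per_stage(scenarios):
    """Count the distinct histories (tree nodes) at each stage.

    Two scenarios share a node at stage t when their realisations
    coincide up to and including stage t.
    """
    horizon = len(scenarios[0])
    return [len({path[:t] for path in scenarios}) for t in range(1, horizon + 1)]


# Hypothetical demand paths over three periods (illustrative values only).
multi_stage_tree = [
    (10, 8, 6),
    (10, 8, 9),
    (10, 12, 11),
    (10, 12, 14),
]
fan_tree = [
    (10, 7, 6),
    (10, 9, 9),
    (10, 11, 11),
    (10, 13, 14),
]

tree_nodes = nodes_per_stage(multi_stage_tree)
fan_nodes = nodes_per_stage(fan_tree)
print("multi-stage tree:", tree_nodes)
print("fan tree:", fan_nodes)
```

In the fan, every scenario separates immediately after the root, so observing the second-period value already identifies the entire path. In the multi-stage tree, pairs of scenarios still share a history at the second period, so a decision made there cannot depend on which of them will eventually occur.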

## Generating scenario trees

A scenario tree is essentially a discrete approximation of a (typically continuous) stochastic process. As such, there are decisions that must be made that influence how well the scenario tree approximates the original stochastic process.

### Scenario tree shape

Two key parameters govern the geometry of a scenario tree:

1. Its **depth**, which is a consequence of the number of stages it possesses;
2. Its **breadth** (or width), which is due to the number of realisations per stage ($|\Xi_t|$, $t \in [H]$).

The decision on the *number of stages* $H$ must reflect the need for adaptability to revealed information, which is connected to how gradually the uncertainty is revealed. On the other hand, a larger number of scenarios $S$ conveys a more precise description of the uncertainty and, in general, the more the better.

The relationship between the two determines the total number of scenarios the model will have. In particular, the number of scenarios will be $O(N^H)$, where $|\Xi_t| \le N$ for $t \in [H]$. This quantity must be kept in mind, as the more scenarios the stochastic programming model has, the more computationally challenging it will be. Indeed, most scenario generation methods seek scenario trees with minimal $|\xi|$ such that *representation quality* requirements are met.
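The exponential growth in the number of scenarios is easy to check numerically. The sketch below, in Python, assumes a balanced tree in which every node at a given stage branches into the same number of realisations; the helper name `scenario_count` and the branching factor of 3 are hypothetical choices for illustration.

```python
from math import prod


def scenario_count(realisations_per_stage):
    """Number of root-to-leaf paths in a balanced tree.

    Assumes every node at stage t branches into the same number of
    realisations, so the count is the product over the stages.
    """
    return prod(realisations_per_stage)


# With N = 3 realisations per stage, the count grows as 3^H.
for horizon in (2, 4, 6, 8):
    count = scenario_count([3] * horizon)
    print(f"H = {horizon}: {count} scenarios")
```

Even with only three realisations per stage, eight stages already require thousands of scenarios, which is why the shape of the tree is usually decided together with the computational budget.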

### Data source

In practical settings, we can often rely on pre-existing models of the stochastic process, or on some data on past observations of the uncertainty. Some common sources that can be used for generating scenarios are:

1. **Historical data:** can be used directly as surrogates for possible realisations of the uncertainty. This has a built-in premise of stability of the stochastic process, as it assumes that past realisations are good representations of possible future observations.
2. **Sampling from (simulation) models:** once a stochastic model is available, one can repeatedly sample from it to generate possible realisations via Monte Carlo sampling. This includes not only classical stochastic models (e.g., time series models) but also simulation models such as system dynamics, agent-based and discrete event simulation.
3. **Expert elicitation:** typically involves a small number of "handcrafted" scenarios, with their likelihoods being defined according to the expectation of one specialist or a group of specialists. One drawback is that it does not allow for (out-of-sample) testing.

Often, the process of generating scenarios involves a combination of the above. In particular, provided that enough data is available, it is common to define some parametric (e.g., statistical machine learning) model from which observations, or samples, are then generated.
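As a minimal sketch of sampling from a parametric model, the Python snippet below draws scenario paths from a first-order autoregressive process. The choice of model, the function name `sample_scenarios`, and all parameter values are assumptions made for illustration; in practice, the model and its parameters would be fitted to the available data. Note that independently sampled paths form a fan rather than a multi-stage tree.

```python
import random


def sample_scenarios(num_scenarios, horizon, x0=100.0, mean=100.0,
                     phi=0.8, sigma=5.0, seed=42):
    """Sample equiprobable scenario paths from an AR(1) process.

    x_t = mean + phi * (x_{t-1} - mean) + noise, with Gaussian noise.
    All default parameter values are hypothetical.
    """
    rng = random.Random(seed)  # a fixed seed keeps the sample reproducible
    scenarios = []
    for _ in range(num_scenarios):
        path, x = [], x0
        for _ in range(horizon):
            x = mean + phi * (x - mean) + rng.gauss(0.0, sigma)
            path.append(x)
        scenarios.append(tuple(path))
    # Monte Carlo samples are given equal probability.
    probability = 1.0 / num_scenarios
    return scenarios, probability


scenarios, probability = sample_scenarios(num_scenarios=5, horizon=3)
for path in scenarios:
    print([round(value, 1) for value in path], probability)
```

Fixing the seed makes the generated scenarios reproducible, which helps when comparing solutions obtained from different samples.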

Clearly, this involves considerable care regarding modelling premises, statistical analyses, and experiment design. Which model best represents the stochastic phenomenon, how to sample scenarios, and how many scenarios are necessary are only some of the questions that must be answered *before* we even obtain a stochastic programming model.

## Quality measures for scenario trees

One crucial aspect of stochastic programming is that scenario generation plays a significant part in the modelling process. This point is often overlooked in the literature on stochastic programming applications, which has nonnegligible consequences for the quality of the model obtained.

Figure {ref}`SP_flowchart` illustrates how one should think about the process of developing stochastic programming models. The diagram highlights the central role that scenario generation has in the process.

(SP_flowchart)=
```{mermaid}
:align: center
:caption: A flowchart representing the modelling process using stochastic programming
%%{ init: { 'flowchart': { 'curve': 'stepAfter' } } }%%
flowchart TB
id1[Decision process]
id2[Stochastic process]
id3[Scenario tree]
id4[Stochastic programming model]
id1 --> id3
id2 --> id3
id3 --> id4
classDef default fill:white, stroke:black, stroke-width:2px;
```

As such, one common saying related to stochastic programming models is "garbage in, garbage out". This refers to the fact that having a sophisticated stochastic programming model, perhaps including many of the features we will discuss in the next chapters, is not enough for one to have a reliable model for analyses. One must consider, just as carefully, the quality of the uncertainty representation, as it strongly influences the quality of the solutions obtained.

### Measuring error and stability of scenario trees

There are two measures that one must consider when generating scenario trees:

1. Error: scenario trees naturally encode an inherent amount of error, as they are *approximations* of a stochastic process. On the other hand,
2. Stability
2 changes: 2 additions & 0 deletions course/figures/.texpadtmp/test.aux
@@ -0,0 +1,2 @@
\relax
\gdef \@abspage@last{1}