forked from torproject/torspec
-
Notifications
You must be signed in to change notification settings - Fork 0
/
140-consensus-diffs.txt
233 lines (175 loc) · 8.52 KB
/
140-consensus-diffs.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
Filename: 140-consensus-diffs.txt
Title: Provide diffs between consensuses
Author: Peter Palfrader
Created: 13-Jun-2008
Implemented-In: 0.3.1.1-alpha
Status: Closed
Ticket: https://bugs.torproject.org/13339
0. History
22-May-2009: Restricted the ed format even more strictly for ease of
implementation. -nickm
25-May-2014: Adapted to the new dir-spec version 3 and made the diff urls
backwards-compatible. -mvdan
1-Mar-2017: Update to new stats, note newer proposals, note flavors,
diffs, add parameters, restore diff-only URLs, say what "Digest"
means. -nickm
3-May-2017: Add a notion of "digest-as-signed" vs "full digest", since
otherwise the fact that there are multiple encodings of the same valid
consensus signatures would make clients identify which encodings they
had been given as they asked for diffs.
4-May-2017: Remove support for truncated digest prefixes.
1. Overview.
Tor clients and servers need a list of which relays are on the
network. This list, the consensus, is created by authorities
hourly and clients fetch a copy of it, with some delay, hourly.
This proposal suggests that clients download diffs of consensuses
once they have a consensus instead of hourly downloading a full
consensus.
This does not only apply to ordinary directory consensuses, but to the
newer microdescriptor consensuses added in the third version of the
directory specification.
2. Numbers
After implementing proposal 138, which removed nodes that are not
running from the list, a consensus document was about 92 kilobytes
in size after compression... back in 2008 when this proposal was first
written.
But now in March 2017, that figure is more like 625 kilobytes.
The diff between two consecutive consensuses, in ed format, is on
average 37 kilobytes compressed. So by making this change, we could
save something like 94% of our consensus download bandwidth.
3. Proposal
3.0. Preliminaries.
Unless otherwise specified, all digests in this document are SHA3-256
digests, encoded in base64. This document also uses "hash" as
synonymous with "digest".
A "full digest" of a consensus document covers the entire document,
from the "network-status-version" through the newline after the final
"-----END SIGNATURE-----".
A "digest as signed" of a consensus document covers the same part that
the signatures cover: the "network-status-version" through the space
immediately after the "directory-signature" keyword on the first
"directory-signature" line.
3.1 Clients
If a client has a consensus that is recent enough it SHOULD
try to download a diff to get the latest consensus rather than
fetching a full one.
[XXX: what is recent enough?
time delta in hours / size of compressed diff
1: 38177
2: 66955
3: 93502
4: 118959
5: 143450
6: 167136
12: 291354
18: 404008
24: 416663
30: 431240
36: 443858
42: 454849
48: 464677
54: 476716
60: 487755
66: 497502
72: 506421
Data suggests that for the first few hours' diffs are very useful,
saving at least 50% for the first 12 hours. After that, returns seem to
be more marginal. But note the savings from proposals like 274-276, which
make diffs smaller over a much longer timeframe. ]
3.2 Servers
Directory authorities and servers need to keep a number of old consensus
documents so they can build diffs. (See section 5 below ). They should
offer a diff to the most recent consensus at the following request:
HTTP/1.0 GET /tor/status-vote/current/consensus{-Flavor}/<FPRLIST>.z
X-Or-Diff-From-Consensus: HASH1 HASH2...
where the hashes are the digests-as-signed of the consensuses the client
currently has, and FPRLIST is a list of (abbreviated) fingerprints of
authorities the client trusts.
Servers will only return a consensus if more than half of the requested
authorities have signed the document. Otherwise, a 404 error will be sent
back.
The advantage of using the same URL that is currently used for
consensuses is that the client doesn't need to know whether a server
supports consensus diffs. If it doesn't, it will simply ignore the
extra header and return the full consensus.
If a server cannot offer a diff from one of the consensuses identified
by one of the hashes but has a current consensus it MUST return the
full consensus.
[XXX: what should we do when the client already has the latest
consensus? I can think of the following options:
- send back 3xx not modified
- send back 200 ok and an empty diff
- send back 404 nothing newer here.
I currently lean towards the empty diff.]
Additionally, specific diff for a given consensus digest-as-signed
should be available a URL of the form:
/tor/status-vote/current/consensus{-Flavor}/diff/<HASH>/<FPRLIST>.z
This differs from the previous request type in that it should never
return a whole consensus: if a diff is not available, it should return
404.
4. Diff Format
Diffs start with the token "network-status-diff-version" followed by a
space and the version number, currently "1".
If a document does not start with network-status-diff it is assumed
to be a full consensus download and would therefore currently start
with "network-status-version 3".
Following the network-status-diff line is another header line,
starting with the token "hash" followed by the digest-as-signed of the
consensus that this diff applies to, and the full digest that the
resulting consensus should have.
Following the network-status-diff header lines is a diff, or patch, in
limited ed format. We choose this format because it is easy to create
and process with standard tools (patch, diff -e, ed). This will help
us in developing and testing this proposal and it should make future
debugging easier.
[ If at one point in the future we decide that the space benefits from
a custom diff format outweighs these benefits we can always
introduce a new diff format and offer it at for instance
../diff2/... ]
We support the following ed commands, each on a line by itself:
- "<n1>d" Delete line n1
- "<n1>,<n2>d" Delete lines n1 through n2, inclusive
- "<n1>,$d" Delete line n1 through the end of the file, inclusive.
- "<n1>c" Replace line n1 with the following block
- "<n1>,<n2>c" Replace lines n1 through n2, inclusive, with the
following block.
- "<n1>a" Append the following block after line n1.
- "a" Append the following block after the current line.
Note that line numbers always apply to the file after all previous
commands have already been applied. Note also that line numbers
are 1-indexed.
The commands MUST apply to the file from back to front, such that
lines are only ever referred to by their position in the original
file.
If there are any directory signatures on the original document, the
first command MUST be a "<n1>,$d" form to remove all of the directory
signatures. Using this format ensures that the client will
successfully apply the diff even if they have an unusual encoding for
the signatures.
The "current line" is either the first line of the file, if this is
the first command, the last line of a block we added in an append or
change command, or the line immediate following a set of lines we just
deleted (or the last line of the file if there are no lines after
that).
The replace and append command take blocks. These blocks are simply
appended to the diff after the line with the command. A line with
just a period (".") ends the block (and is not part of the lines
to add). Note that it is impossible to insert a line with just
a single dot.
4.1. Concatenating multiple diffs
Directory caches may, at their discretion, return the concatenation of
multiple diffs using the format above. Such diffs are to be applied from
first to last. This allows the caches to cache a smaller number of
compressed diffs, at the expense of some loss in bandwidth efficiency.
5. Networkstatus parameters
The following parameters govern how relays and clients use this protocol.
min-consensuses-age-to-cache-for-diff
(min 0, max 744, default 6)
max-consensuses-age-to-cache-for-diff
(min 0, max 8192, default 72)
These two parameters determine how much consensus history (in
hours) relays should try to cache in order to serve diffs.
try-diff-for-consensus-newer-than
(min 0, max 8192, default 72)
This parameter determines how old a consensus can be (in hours)
before a client should no longer try to find a diff for it.