forked from moj-analytical-services/splink
mkdocs.yml
site_name: Splink
use_directory_urls: false
repo_url: https://github.com/moj-analytical-services/splink
edit_uri: edit/master/docs/
theme:
  icon:
    repo: fontawesome/brands/github
  name: "material"
  features:
    - content.code.annotate
    - content.code.copy
    - content.tabs.link
    - content.tooltips
    - header.autohide
    - search.highlight
    - search.share
    - search.suggest
    - navigation.indexes
    - navigation.footer
    - content.action.edit
    - navigation.tabs
    - navigation.tabs.sticky
    - navigation.top
    - toc.follow
  logo: "img/favicon.ico"
  favicon: "img/favicon.ico"
  palette:
    - scheme: default
      primary: indigo
      accent: indigo
      toggle:
        icon: material/toggle-switch
        name: Switch to dark mode
    - scheme: slate
      primary: purple
      accent: red
      toggle:
        icon: material/toggle-switch-off-outline
        name: Switch to light mode
  custom_dir: docs/overrides
plugins:
  - search
  - semiliterate
  - mknotebooks
  - tags
  - mkdocstrings:
      default_handler: python
      handlers:
        python:
          rendering:
            show_source: false
      custom_templates: templates
  - git-revision-date-localized:
      enable_creation_date: true
      type: timeago
      fallback_to_build_date: true
markdown_extensions:
  - abbr
  - attr_list
  - meta
  - admonition
  - pymdownx.details
  - pymdownx.arithmatex:
      generic: true
  - pymdownx.superfences:
      custom_fences:
        - name: mermaid
          class: mermaid
          format: !!python/name:pymdownx.superfences.fence_code_format
  - pymdownx.snippets:
      auto_append:
        - includes/abbreviations.md
  - pymdownx.tabbed:
      alternate_style: true
  - toc:
      permalink: True
      toc_depth: 3
  - pymdownx.emoji:
      emoji_index: !!python/name:materialx.emoji.twemoji
      emoji_generator: !!python/name:materialx.emoji.to_svg
  - footnotes
nav:
  - Home: "index.md"
  - Getting Started: "getting_started.md"
  - Tutorial:
      - Introduction: "demos/00_Tutorial_Introduction.ipynb"
      - 1. Data prep prerequisites: "demos/01_Prerequisites.ipynb"
      - 2. Exploratory analysis: "demos/02_Exploratory_analysis.ipynb"
      - 3. Blocking: "demos/03_Blocking.ipynb"
      - 4. Estimating model parameters: "demos/04_Estimating_model_parameters.ipynb"
      - 5. Predicting results: "demos/05_Predicting_results.ipynb"
      - 6. Visualising predictions: "demos/06_Visualising_predictions.ipynb"
      - 7. Quality assurance: "demos/07_Quality_assurance.ipynb"
  - Examples:
      - Introduction: "examples_index.md"
      - DuckDB:
          - Deduplicate 50k rows historical persons: "demos/example_deduplicate_50k_synthetic.ipynb"
          - Linking financial transactions: "demos/example_transactions.ipynb"
          - Linking two tables of persons: "demos/example_link_only.ipynb"
          - Real time record linkage: "demos/example_real_time_record_linkage.ipynb"
          - QA from ground truth column: "demos/example_accuracy_analysis_from_labels_column.ipynb"
          - Estimating m probabilities from labels: "demos/example_pairwise_labels.ipynb"
          - Quick and dirty persons model: "demos/example_quick_and_dirty_persons.ipynb"
          - Febrl3 Dedupe: "demos/example_febrl3.ipynb"
          - Febrl4 link-only: "demos/example_febrl4.ipynb"
      - PySpark:
          - Deduplication using PySpark: "demos/example_simple_pyspark.ipynb"
      - Athena:
          - Deduplicate 50k rows historical persons: "demos/athena_deduplicate_50k_synthetic.ipynb"
  - Topic Guides:
      - Introduction: "topic_guides/topic_guides_index.md"
      - Record Linkage Theory:
          - Why do we need record linkage?: "topic_guides/record_linkage.md"
          - Probabilistic vs Deterministic linkage: "topic_guides/probabilistic_vs_deterministic.md"
          - The Fellegi-Sunter Model: "topic_guides/fellegi_sunter.md"
      - Linkage Models in Splink:
          - Splink's SQL backends - Spark, DuckDB etc:
              - Backends overview: "topic_guides/backends/backends.md"
              - PostgreSQL: "topic_guides/backends/postgres.md"
          - Link type - linking vs deduping: "topic_guides/link_type.md"
          - Defining Splink models: "topic_guides/settings.md"
          - Retrieving and querying Splink results: "topic_guides/querying_splink_results.md"
      - Data Preparation:
          - Feature Engineering: "topic_guides/feature_engineering.md"
      - Blocking:
          - Run times, performance and linking large data: "topic_guides/drivers_of_performance.md"
          - Blocking rules for prediction vs estimation: "topic_guides/blocking_rules.md"
      - Comparing Records:
          - Defining and customising comparisons: "topic_guides/customising_comparisons.ipynb"
          - Out-of-the-box comparisons: "topic_guides/comparison_templates.ipynb"
          - Comparing strings:
              - Choosing comparators and thresholds: "topic_guides/choosing_comparators.ipynb"
              - String comparators: "topic_guides/comparators.md"
              - Phonetic transformations: "topic_guides/phonetic.md"
          - Term-Frequency adjustments: "topic_guides/term-frequency.md"
      - Performance:
          - Optimising Spark performance: "topic_guides/optimising_spark.md"
          - Salting blocking rules: "topic_guides/salting.md"
  - Documentation:
      - Introduction: "documentation_index.md"
      - API:
          - Linker API:
              - Full API: "linker.md"
              - Exploratory analysis: "linkerexp.md"
              - Estimating model parameters: "linkerest.md"
              - Predicting results: "linkerpred.md"
              - Visualisation and quality assurance: "linkerqa.md"
          - Comparisons Library API:
              - Comparison Template Library: "comparison_template_library.md"
              - Comparison Library: "comparison_library.md"
              - Comparison Level Library: "comparison_level_library.md"
              - Comparison Composition: "comparison_level_composition.md"
          - EM Training Session API: "em_training_session.md"
          - SplinkDataFrame API: "SplinkDataFrame.md"
          - Comparisons API:
              - Comparison: "comparison.md"
              - Comparison Level: "comparison_level.md"
      - In-built datasets: "datasets.md"
      - Settings:
          - Settings Dictionary Reference: "settings_dict_guide.md"
          - Interactive Settings Editor: "settingseditor/editor.md"
  - Contributing:
      - Introduction: "dev_guides/dev_guides_index.md"
      - Making Changes to Splink:
          - Building a Virtual Environment: "dev_guides/changing_splink/building_env_locally.md"
          - Linting and Formatting: "dev_guides/changing_splink/lint_and_format.md"
          - Testing: "dev_guides/changing_splink/testing.md"
          - Building Docs: "dev_guides/changing_splink/build_docs_locally.md"
          - Releasing a Package Version: "dev_guides/changing_splink/releases.md"
      - How Splink works:
          - Understanding and debugging Splink: "dev_guides/debug_modes.md"
          - Transpilation using sqlglot: "dev_guides/transpilation.md"
          - Performance and caching:
              - Caching and pipelining: "dev_guides/caching.md"
              - Spark caching: "dev_guides/spark_pipelining_and_caching.md"
          - Comparison and comparison level libraries:
              - Creating new comparisons and comparison levels: "dev_guides/comparisons/new_library_comparisons_and_levels.md"
              - Extending existing comparisons and comparison levels: "dev_guides/comparisons/extending_library_comparisons_and_levels.md"
          - User-Defined Functions: "dev_guides/udfs.md"
extra_css:
  - css/custom.css
extra_javascript:
  - javascripts/mathjax.js
  - https://polyfill.io/v3/polyfill.min.js?features=es6
  - https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js
extra:
  analytics:
    provider: google
    property: G-L6JT8NK528
  social:
    - icon: fontawesome/brands/github
      link: https://github.com/moj-analytical-services/splink
    - icon: fontawesome/brands/python
      link: https://pypi.org/project/splink/
    - icon: fontawesome/solid/chevron-right
      link: https://www.robinlinacre.com/