Commit

nd format
PhilipMay committed Nov 14, 2023
1 parent 37a23df commit a7e0fa9
Showing 7 changed files with 23 additions and 13 deletions.
2 changes: 1 addition & 1 deletion source/blog/2022-02-20-lightgbm-optuna-demo.md
@@ -4,7 +4,7 @@ This week I published a project to show how to combine
LightGBM and Optuna efficiently to train good models.
The purpose of this work is to be able to be reused as a template for new projects.

-:::{figure} ../_static/img/lightgbm-optuna.png
+:::{figure} ../\_static/img/lightgbm-optuna.png
:width: 50 %

LightGBM & Optuna
2 changes: 1 addition & 1 deletion source/blog/2022-02-22-german-wikipedia-corpus-released.md
@@ -2,7 +2,7 @@

Today I published a new Wikipedia-based German text corpus. It is to be used for NLP machine learning tasks.

-:::{figure} ../_static/img/wikipedia.png
+:::{figure} ../\_static/img/wikipedia.png
:width: 50 %

Wikipedia
2 changes: 1 addition & 1 deletion source/blog/2022-02-23-mlsum-anomalies.md
@@ -5,7 +5,7 @@ my colleague [Michal Harakal](https://www.harakal.de/) and I noticed that in man
sentence of the input text.
Instead, it should generate an independent summary of the whole text.

-:::{figure} ../_static/img/text-unsplash.jpg
+:::{figure} ../\_static/img/text-unsplash.jpg
:width: 50 %

Photo by [Sandy Millar](https://unsplash.com/@sandym10?utm_content=creditCopyText&utm_medium=referral&utm_source=unsplash) on [Unsplash](https://unsplash.com/photos/a-close-up-of-a-book-with-some-type-of-text-Kl4LNdg6on4?utm_content=creditCopyText&utm_medium=referral&utm_source=unsplash)
6 changes: 6 additions & 0 deletions source/blog/2022-07-23-python-conda-pip.md
@@ -5,6 +5,7 @@ It is a subjective article and represents my own opinion and experience.
The article is structured by several recommendations.

## Recommendation 1: Never install Python

This may sound a bit strange, but the first recommendation is to never install Python itself.
The reason is that a direct installation commits you to a single, very specific Python version.
In principle you don't want that, because there are different packages that have
@@ -13,6 +14,7 @@ different version requirements.
But how do you install Python without installing it?

## Recommendation 2: Use conda to install and manage Python

You should use [conda](https://docs.conda.io/) to install and manage Python:

> Conda is an open source package management system and environment management system that
@@ -30,19 +32,22 @@ More details about the use and installation of conda you can find on my
[conda page](/python/conda/).

## Recommendation 3: Disable conda automatic base activation

After the conda installation, the so-called base environment is automatically activated in every shell.
If you then install a package - without explicitly activating another environment first - the
package is installed into this base environment. This clutters up the base environment and
is annoying. To force an explicit environment activation, you can disable conda's automatic base
activation with the following command: `conda config --set auto_activate_base false`

## Recommendation 4: Never install Anaconda

Anaconda also includes conda. During the installation, however, numerous other packages are installed
completely unnecessarily. This is the reason why Anaconda is just unnecessary and
completely bloated software that I cannot recommend to anyone.
Nothing more needs to be said about this.

## Recommendation 5: Do not use conda to install Packages

Conda can be used not only to manage environments and
different Python versions, but also to install Python packages like NumPy or pandas.

@@ -57,6 +62,7 @@ Many maintainers release only unofficially or not at all a conda version of thei
Then the conda package is maintained by someone completely different.

## Recommendation 6: Use pip to install Packages

To avoid the problem described above, I always use [pip](https://pip.pypa.io/en/stable/)
for package installation.
Conda is then only used to create and manage the environments and to install Python.
3 changes: 3 additions & 0 deletions source/blog/2022-10-12-date-encoding.md
@@ -9,20 +9,23 @@ The general options to encode the time dimension like the birth date of a custom
3. relative to "today" - e.g. number of days before today

## Pros and cons: separate encoding of year, month and maybe also day and weekday

If you believe in astrology, this might be your favorite way to encode a birth date, since the month is preserved. If you want to encode a *production date*, it might also be useful to encode the weekday, because there might be a relation between product quality and the weekday of production. Parts manufactured on Mondays may have the most severe quality variations.

The disadvantage is that you need multiple columns to encode the date.
Furthermore, this approach also suffers from a
[concept drift](https://en.wikipedia.org/wiki/Concept_drift) problem (see below).
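
The separate encoding can be sketched in a few lines of Python with the standard library (the function name and feature dictionary are my own illustration, not from the article):

```python
from datetime import date

def encode_date_separately(d: date) -> dict:
    """Encode a date as separate year, month, day and weekday features."""
    return {
        "year": d.year,
        "month": d.month,
        "day": d.day,
        "weekday": d.weekday(),  # Monday == 0, ..., Sunday == 6
    }

# A production date encoded this way keeps the weekday visible to the model:
print(encode_date_separately(date(2022, 10, 12)))
```

Each dictionary key becomes its own feature column, which is exactly where the "multiple columns per date" disadvantage comes from.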

## Pros and cons: relative to a certain point in time in the past

This is easy to calculate because the "point in time in the past" (January 1st 1900, for example) is fixed. This contrasts with the encoding relative to "today". But this encoding has the following problem:

There are circumstances that in reality are related not to the date itself but to the age. The remaining service life of a technical device is much more directly related to its age than to its production date. Whether a customer is interested in an airplane trip or a train ticket also depends on age, not so much on the date of birth. So if you represent the date of birth relative to a point in time in the past, the resulting model has a built-in [concept drift](https://en.wikipedia.org/wiki/Concept_drift).

For example, suppose two predictions are made for the same person with his or her date of birth: one in January 2022 and one in January 2023. The person is obviously one year older at the second prediction in January 2023. But this would not be visible in the encoding of the date of birth (if it is encoded relative to a point in time in the past). The model would therefore experience concept drift and would have to be re-trained.
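
A minimal sketch of this encoding, assuming January 1st 1900 as the fixed reference point (the names are my own):

```python
from datetime import date

REFERENCE = date(1900, 1, 1)  # a fixed "point in time in the past"

def days_since_reference(d: date) -> int:
    """Encode a date as the number of days since the fixed reference date."""
    return (d - REFERENCE).days

birth = date(1990, 5, 17)
# The encoding is identical no matter when the prediction is made,
# so the model cannot "see" that the person is getting older:
print(days_since_reference(birth))
```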

## Pros and cons: relative to "today"

This would be the encoding of choice if there is a relation between age and prediction, because it prevents the concept drift described above. The disadvantage of this encoding is that the reference day "today" is very dynamic and not fixed. So you have to be very careful how you define "today".
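
A sketch of the "today"-relative encoding (the names are my own, not from the article). Note how the same birth date yields a different value at each prediction time, which is exactly what avoids the concept drift:

```python
from datetime import date

def days_before_today(d: date, today: date) -> int:
    """Encode a date as the number of days before the reference day 'today'."""
    return (today - d).days

birth = date(1990, 5, 17)
# The same person, encoded at two different prediction times:
age_2022 = days_before_today(birth, date(2022, 1, 15))
age_2023 = days_before_today(birth, date(2023, 1, 15))
print(age_2023 - age_2022)  # 365 - the model sees the person aging
```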

A distinction must be made between generating the training data (as well as validation and test data) and prediction at production time. Prediction at production time is easy to understand: "today" is simply the day on which the prediction is made. Generating the training data, however, is a bit more difficult. Here, "today" must not be the day on which the training data is generated. Instead, "today" is the day on which the label was created. The easiest way to explain this is with an example:
19 changes: 10 additions & 9 deletions source/index.md
@@ -42,6 +42,7 @@ Talk about this model:\
:::

:::{grid-item-card} German T5 models in 3 different sizes

- [GermanT5/t5-efficient-gc4-all-german-large-nl36](https://huggingface.co/GermanT5/t5-efficient-gc4-all-german-large-nl36)
- [GermanT5/t5-efficient-gc4-german-base-nl36](https://huggingface.co/GermanT5/t5-efficient-gc4-german-base-nl36)
- [GermanT5/t5-efficient-gc4-all-german-small-el32](https://huggingface.co/GermanT5/t5-efficient-gc4-all-german-small-el32)
@@ -103,11 +104,11 @@ Includes also a prepared corpus for English and German language.
This repository contains two datasets:

1. A labeled multi-domain (21 domains) German and
-English dataset with 25K user utterances for human-robot interaction.
-It is also available as a Hugging Face dataset:
-[deutsche-telekom/NLU-Evaluation-Data-en-de](https://huggingface.co/datasets/deutsche-telekom/NLU-Evaluation-Data-en-de)
+English dataset with 25K user utterances for human-robot interaction.
+It is also available as a Hugging Face dataset:
+[deutsche-telekom/NLU-Evaluation-Data-en-de](https://huggingface.co/datasets/deutsche-telekom/NLU-Evaluation-Data-en-de)
2. A dataset with 1,127 German sentence pairs with a similarity score. The sentences originate from the first data set.
-:::
+:::

:::{grid-item-card} [deutsche-telekom/NLU-few-shot-benchmark-en-de](https://huggingface.co/datasets/deutsche-telekom/NLU-few-shot-benchmark-en-de)
This is a few-shot training dataset from the domain of human-robot interaction.
@@ -197,22 +198,22 @@ An [Arch Linux](https://archlinux.org/) package ([AUR](https://wiki.archlinux.or
- refactor slow sentencepiece tokenizers and add tests: [#11716](https://github.com/huggingface/transformers/pull/11716),
[#11737](https://github.com/huggingface/transformers/pull/11737)
- [more fixes and improvements](https://github.com/huggingface/transformers/pulls?q=is%3Apr+author%3APhilipMay)
-:::
+:::

:::{grid-item-card} [Optuna](https://github.com/optuna/optuna)

- add MLflow integration callback: [#1028](https://github.com/optuna/optuna/pull/1028)
- trial level suggest for same variable with different parameters give warning: [#908](https://github.com/optuna/optuna/pull/908)
- [more fixes and improvements](https://github.com/optuna/optuna/pulls?q=is%3Apr+author%3APhilipMay)
:::
:::

:::{grid-item-card} [Sentence Transformers](https://github.com/UKPLab/sentence-transformers)

- add callback so we can do pruning and check for nan values: [#327](https://github.com/UKPLab/sentence-transformers/pull/327)
- add option to pass params to tokenizer: [#342](https://github.com/UKPLab/sentence-transformers/pull/342)
- always store best_score: [#439](https://github.com/UKPLab/sentence-transformers/pull/439)
- fix for OOM problems on GPU with large datasets: [#525](https://github.com/UKPLab/sentence-transformers/pull/525)
-:::
+:::

:::{grid-item-card} [SetFit - Efficient Few-shot Learning with Sentence Transformers](https://github.com/huggingface/setfit)

@@ -222,7 +223,7 @@ An [Arch Linux](https://archlinux.org/) package ([AUR](https://wiki.archlinux.or
- add option to use amp / FP16 [#134](https://github.com/huggingface/setfit/pull/134)
- add num_epochs to train_step calculation [#139](https://github.com/huggingface/setfit/pull/134)
- add more loss function options [#159](https://github.com/huggingface/setfit/pull/159)
-:::
+:::

:::{grid-item-card} Other Fixes and Improvements

@@ -233,7 +234,7 @@ An [Arch Linux](https://archlinux.org/) package ([AUR](https://wiki.archlinux.or
- [deepset-ai/FARM](https://github.com/deepset-ai/FARM): [various fixes and improvements](https://github.com/deepset-ai/FARM/pulls?q=is%3Apr+author%3APhilipMay)
- [hyperopt/hyperopt](https://github.com/hyperopt/hyperopt): add progressbar with tqdm [#455](https://github.com/hyperopt/hyperopt/pull/455)
- [mlflow/mlflow](https://github.com/mlflow/mlflow): add possibility to use client cert. with tracking API [#2843](https://github.com/mlflow/mlflow/pull/2843)
-:::
+:::

::::

2 changes: 1 addition & 1 deletion source/it/freifunk.md
@@ -88,7 +88,7 @@
- USB socket desoldered - see photo below
- WPS and WLAN switches clipped off - see photo below

-:::{figure} ../_static/img/passiv-poe-umbau-fritz-box-4020.jpg
+:::{figure} ../\_static/img/passiv-poe-umbau-fritz-box-4020.jpg

Photo of the hardware modification
:::
