diff --git a/adjust_logs.html b/adjust_logs.html index 043968e..45def7a 100644 --- a/adjust_logs.html +++ b/adjust_logs.html @@ -13,15 +13,15 @@
eventlog()
/activitylog()
set_
-functionsset_
-functionsRead more:
patients %>%
  process_map(type = frequency("relative_case"),
              sec = frequency("absolute"))
Both primary and secondary layers can be differentiated between nodes and edges.
%>%
@@ -685,8 +685,8 @@ patients Adding secondary information
type_edges = performance(units = "hours"),
sec_nodes = frequency("absolute"),
sec_edges = performance(FUN = max, units = "hours"))
patients %>%
  process_map(type_nodes = frequency("relative_case", color_scale = "PuBu"),
              type_edges = performance(mean, color_edges = "dodgerblue4"))
eventdataR
package.
A basic animation with static color and token size:
animate_process(patients)
Default token color, size, or image can be changed as follows:
animate_process(patients, mapping = token_aes(size = token_scale(12), shape = "rect"))
animate_process(patients, mapping = token_aes(color = token_scale("red")))
The example animation at the top of this site:
animate_process(patients, mode = "relative", jitter = 10, legend = "color",
mapping = token_aes(color = token_scale("employee",
scale = "ordinal",
range = RColorBrewer::brewer.pal(7, "Paired"))))
Tokens can also be assigned images, for example:
animate_process(patients,
mapping = token_aes(shape = "image",
size = token_scale(10),
image = token_scale("https://upload.wikimedia.org/wikipedia/en/5/5f/Pacman.gif")))
It is possible to use a secondary data frame to determine the @@ -730,8 +730,8 @@
processanimateR
animation can also be used interactively
as part of a (Shiny) web-application. Here, an example application that
expects attributes are of an appropriate data type and automatically
@@ -996,8 +996,8 @@
The colors can be adjusted by the range
argument. In
this case the scale is reversed with rev()
to go from blue
to red. See Ordinal scales
mapping = token_aes(color = token_scale("employee",
scale = "ordinal",
range = RColorBrewer::brewer.pal(8, "Paired"))))
Source: https://bupaverse.github.io/processanimateR/
Feeding the resulting table back to traffic_fines
with
augment()
makes the trace length metric available as a case
attribute for further analysis.
patients %>%
  processing_time(level = "activity", units = "hours")
## # A tibble: 7 × 11
-## handling min q1 mean median q3 max st_dev iqr total relat…¹
-## <fct> <drt> <drt> <drt> <drtn> <drt> <drt> <dbl> <dbl> <drt> <dbl>
-## 1 Registration 0.82… 2.0… 2.7… 2.71… 3.4… 5.6… 0.954 1.33 1376… 0.184
-## 2 Triage and As… 5.86… 11.3… 13.1… 13.34… 15.0… 18.8… 2.76 3.68 6552… 0.184
-## 3 Discuss Resul… 1.33… 2.3… 2.7… 2.77… 3.2… 4.5… 0.628 0.906 1374… 0.182
-## 4 Check-out 0.66… 1.6… 2.0… 2.07… 2.4… 3.8… 0.620 0.860 1014… 0.181
-## 5 X-Ray 2.29… 3.8… 4.8… 4.79… 5.6… 8.1… 1.28 1.76 1264… 0.0959
-## 6 Blood test 3.08… 4.7… 5.5… 5.46… 6.2… 8.1… 1.06 1.51 1311… 0.0871
-## 7 MRI SCAN 2.48… 3.6… 4.1… 4.09… 4.6… 5.9… 0.735 1.09 979… 0.0867
-## # … with abbreviated variable name ¹relative_frequency
+## handling min q1 mean median q3 max st_dev iqr total
+## <fct> <drtn> <drt> <drt> <drtn> <drt> <drt> <dbl> <dbl> <drt>
+## 1 Registration 0.828… 2.0… 2.7… 2.71… 3.4… 5.6… 0.954 1.33 1376…
+## 2 Triage and Assessment 5.868… 11.3… 13.1… 13.34… 15.0… 18.8… 2.76 3.68 6552…
+## 3 Discuss Results 1.333… 2.3… 2.7… 2.77… 3.2… 4.5… 0.628 0.906 1374…
+## 4 Check-out 0.667… 1.6… 2.0… 2.07… 2.4… 3.8… 0.620 0.860 1014…
+## 5 X-Ray 2.294… 3.8… 4.8… 4.79… 5.6… 8.1… 1.28 1.76 1264…
+## 6 Blood test 3.089… 4.7… 5.5… 5.46… 6.2… 8.1… 1.06 1.51 1311…
+## 7 MRI SCAN 2.489… 3.6… 4.1… 4.09… 4.6… 5.9… 0.735 1.09 979…
+## # ℹ 1 more variable: relative_frequency <dbl>
Calling augment
without any further arguments will add
all columns, from min until relative_frequency to the
data.
Naturally, both conditions can use any of the available variables. The following selects all cases that started between midnight and 6am. Note that no condition is applied on the end activity instance using the @@ -1134,23 +1129,23 @@
The interval
can be defined as half-open using
NA
for the first or second element. The example below selects
cases where payment follows after 4 weeks.
Note that we can also use reverse = TRUE
. However, this
will also include cases where Create Fine is
not followed by Payment at all. Therefore, the
@@ -1211,23 +1206,23 @@
traffic_fines %>% process_map()
In this map, we can observe several unique directly follows relations, as well as flows occurring only 2 or 3 times. Using the filter, we can remove the cases that lead to these flows as follows:
traffic_fines %>%
  filter_infrequent_flows(min_n = 5) %>%
  process_map()
We can immediately observe fewer very infrequent flows in the process map.
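The idea behind removing infrequent flows can be sketched in base R. This is only an illustration under stated assumptions, not the implementation of filter_infrequent_flows(): the `traces` list and the `flows_of` helper are hypothetical, start and end flows are ignored, and the filter is applied in a single pass.

```r
# Hypothetical mini-log: one activity sequence per case.
traces <- list(
  c1 = c("A", "B", "C"),
  c2 = c("A", "B", "C"),
  c3 = c("A", "C"),
  c4 = c("A", "B", "C")
)

# Directly-follows relations of one trace, e.g. "A -> B".
flows_of <- function(trace) {
  if (length(trace) < 2) return(character(0))
  paste(head(trace, -1), tail(trace, -1), sep = " -> ")
}

# Count how often each flow occurs over all cases.
flow_counts <- table(unlist(lapply(traces, flows_of)))

# Flows occurring fewer than min_n times are considered infrequent.
min_n <- 2
infrequent <- names(flow_counts)[as.vector(flow_counts) < min_n]

# Keep only the cases that contain no infrequent flow.
keep <- !vapply(traces, function(tr) any(flows_of(tr) %in% infrequent), logical(1))
filtered <- traces[keep]
```

Because whole cases are dropped, the flow counts of the remaining log can change, which is why a single pass is only an approximation of the idea.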
It is important to note that filter_infrequent_flows()
@@ -1378,12 +1373,11 @@
## # A tibble: 3 × 3
-## trace absol…¹ relat…²
-## <chr> <int> <dbl>
-## 1 Registration,Triage and Assessment,Blood test,MRI SCAN,Discus… 234 0.987
-## 2 Registration,Triage and Assessment,Blood test,MRI SCAN,Discus… 2 0.00844
-## 3 Registration,Triage and Assessment,Blood test 1 0.00422
-## # … with abbreviated variable names ¹absolute_frequency, ²relative_frequency
+## trace absolute_frequency relative_frequency
+## <chr> <int> <dbl>
+## 1 Registration,Triage and Assessment,Bloo… 234 0.987
+## 2 Registration,Triage and Assessment,Bloo… 2 0.00844
+## 3 Registration,Triage and Assessment,Bloo… 1 0.00422
The following selects cases where Triage and Assessment is eventually followed by both Blood test and X-Ray, which never happens.
@@ -1405,15 +1399,14 @@## # A tibble: 6 × 3
-## trace absol…¹ relat…²
-## <chr> <int> <dbl>
-## 1 Registration,Triage and Assessment,X-Ray,Discuss Results,Chec… 258 0.518
-## 2 Registration,Triage and Assessment,Blood test,MRI SCAN,Discus… 234 0.470
-## 3 Registration,Triage and Assessment,Blood test,MRI SCAN,Discus… 2 0.00402
-## 4 Registration,Triage and Assessment,X-Ray 2 0.00402
-## 5 Registration,Triage and Assessment,X-Ray,Discuss Results 1 0.00201
-## 6 Registration,Triage and Assessment,Blood test 1 0.00201
-## # … with abbreviated variable names ¹absolute_frequency, ²relative_frequency
+## trace absolute_frequency relative_frequency
+## <chr> <int> <dbl>
+## 1 Registration,Triage and Assessment,X-Ra… 258 0.518
+## 2 Registration,Triage and Assessment,Bloo… 234 0.470
+## 3 Registration,Triage and Assessment,Bloo… 2 0.00402
+## 4 Registration,Triage and Assessment,X-Ray 2 0.00402
+## 5 Registration,Triage and Assessment,X-Ra… 1 0.00201
+## 6 Registration,Triage and Assessment,Bloo… 1 0.00201
This final example only retains cases where Triage and Assessment is not followed by any of the three consequent activities. The result is 2 incomplete cases where the last activity was @@ -1450,7 +1443,7 @@
## EMPTY EVENT LOG
## # A tibble: 0 × 20
-## # … with 20 variables: case_id <chr>, activity <fct>, lifecycle <fct>,
+## # ℹ 20 variables: case_id <chr>, activity <fct>, lifecycle <fct>,
## # resource <fct>, timestamp <dttm>, amount <chr>, article <dbl>,
## # dismissal <chr>, expense <chr>, lastsent <chr>, matricola <dbl>,
## # notificationtype <chr>, paymentamount <dbl>, points <dbl>,
@@ -1710,52 +1703,49 @@ Interval-based
filter_trace_frequency(interval = c(10,50)) %>%
traces()
## # A tibble: 5 × 3
-## trace absol…¹ relat…²
-## <chr> <int> <dbl>
-## 1 ER Registration,ER Triage,ER Sepsis Triage 35 0.333
-## 2 ER Registration,ER Triage,ER Sepsis Triage,Leucocytes,CRP 24 0.229
-## 3 ER Registration,ER Triage,ER Sepsis Triage,CRP,Leucocytes 22 0.210
-## 4 ER Registration,ER Triage,ER Sepsis Triage,CRP,LacticAcid,Leu… 13 0.124
-## 5 ER Registration,ER Triage,ER Sepsis Triage,Leucocytes,CRP,Lac… 11 0.105
-## # … with abbreviated variable names ¹absolute_frequency, ²relative_frequency
+## trace absolute_frequency relative_frequency
+## <chr> <int> <dbl>
+## 1 ER Registration,ER Triage,ER Sepsis Tri… 35 0.333
+## 2 ER Registration,ER Triage,ER Sepsis Tri… 24 0.229
+## 3 ER Registration,ER Triage,ER Sepsis Tri… 22 0.210
+## 4 ER Registration,ER Triage,ER Sepsis Tri… 13 0.124
+## 5 ER Registration,ER Triage,ER Sepsis Tri… 11 0.105
Also here you can use half-open intervals.
sepsis %>%
  filter_trace_frequency(interval = c(5,NA)) %>%
  traces()
## # A tibble: 11 × 3
-## trace absol…¹ relat…²
-## <chr> <int> <dbl>
-## 1 ER Registration,ER Triage,ER Sepsis Triage 35 0.248
-## 2 ER Registration,ER Triage,ER Sepsis Triage,Leucocytes,CRP 24 0.170
-## 3 ER Registration,ER Triage,ER Sepsis Triage,CRP,Leucocytes 22 0.156
-## 4 ER Registration,ER Triage,ER Sepsis Triage,CRP,LacticAcid,Le… 13 0.0922
-## 5 ER Registration,ER Triage,ER Sepsis Triage,Leucocytes,CRP,La… 11 0.0780
-## 6 ER Registration,ER Triage,ER Sepsis Triage,Leucocytes,CRP,La… 9 0.0638
-## 7 ER Registration,ER Triage,ER Sepsis Triage,Leucocytes,Lactic… 7 0.0496
-## 8 ER Registration,ER Triage,ER Sepsis Triage,Leucocytes,CRP,Ad… 5 0.0355
-## 9 ER Registration,ER Triage,ER Sepsis Triage,LacticAcid,Leucoc… 5 0.0355
-## 10 ER Registration,ER Triage,ER Sepsis Triage,CRP,Leucocytes,La… 5 0.0355
-## 11 ER Registration,ER Triage,CRP,Leucocytes,ER Sepsis Triage 5 0.0355
-## # … with abbreviated variable names ¹absolute_frequency, ²relative_frequency
+## trace absolute_frequency relative_frequency
+## <chr> <int> <dbl>
+## 1 ER Registration,ER Triage,ER Sepsis Tr… 35 0.248
+## 2 ER Registration,ER Triage,ER Sepsis Tr… 24 0.170
+## 3 ER Registration,ER Triage,ER Sepsis Tr… 22 0.156
+## 4 ER Registration,ER Triage,ER Sepsis Tr… 13 0.0922
+## 5 ER Registration,ER Triage,ER Sepsis Tr… 11 0.0780
+## 6 ER Registration,ER Triage,ER Sepsis Tr… 9 0.0638
+## 7 ER Registration,ER Triage,ER Sepsis Tr… 7 0.0496
+## 8 ER Registration,ER Triage,ER Sepsis Tr… 5 0.0355
+## 9 ER Registration,ER Triage,ER Sepsis Tr… 5 0.0355
+## 10 ER Registration,ER Triage,ER Sepsis Tr… 5 0.0355
+## 11 ER Registration,ER Triage,CRP,Leucocyt… 5 0.0355
And use reverse = TRUE
.
sepsis %>%
  filter_trace_frequency(interval = c(5,NA), reverse = TRUE) %>%
  traces()
## # A tibble: 835 × 3
-## trace absol…¹ relat…²
-## <chr> <int> <dbl>
-## 1 ER Registration,ER Triage,ER Sepsis Triage,CRP,LacticAcid,Le… 4 0.00440
-## 2 ER Registration,ER Triage,ER Sepsis Triage,LacticAcid,Leucoc… 4 0.00440
-## 3 ER Registration,ER Triage,ER Sepsis Triage,IV Liquid,CRP,Leu… 4 0.00440
-## 4 ER Registration,ER Triage,ER Sepsis Triage,CRP,Leucocytes,Ad… 4 0.00440
-## 5 ER Registration,ER Triage,Leucocytes,CRP,ER Sepsis Triage 4 0.00440
-## 6 ER Registration,ER Triage,ER Sepsis Triage,Leucocytes,CRP,La… 4 0.00440
-## 7 ER Registration,ER Triage,ER Sepsis Triage,Leucocytes,Lactic… 4 0.00440
-## 8 ER Registration,ER Triage,ER Sepsis Triage,IV Liquid,Leucocy… 3 0.00330
-## 9 ER Registration,ER Triage,LacticAcid,Leucocytes,CRP,ER Sepsi… 3 0.00330
-## 10 ER Registration,ER Triage,CRP,LacticAcid,Leucocytes,ER Sepsi… 3 0.00330
-## # … with 825 more rows, and abbreviated variable names ¹absolute_frequency,
-## # ²relative_frequency
+## trace absolute_frequency relative_frequency
+## <chr> <int> <dbl>
+## 1 ER Registration,ER Triage,ER Sepsis Tr… 4 0.00440
+## 2 ER Registration,ER Triage,ER Sepsis Tr… 4 0.00440
+## 3 ER Registration,ER Triage,ER Sepsis Tr… 4 0.00440
+## 4 ER Registration,ER Triage,ER Sepsis Tr… 4 0.00440
+## 5 ER Registration,ER Triage,Leucocytes,C… 4 0.00440
+## 6 ER Registration,ER Triage,ER Sepsis Tr… 4 0.00440
+## 7 ER Registration,ER Triage,ER Sepsis Tr… 4 0.00440
+## 8 ER Registration,ER Triage,ER Sepsis Tr… 3 0.00330
+## 9 ER Registration,ER Triage,LacticAcid,L… 3 0.00330
+## 10 ER Registration,ER Triage,CRP,LacticAc… 3 0.00330
+## # ℹ 825 more rows
## # A tibble: 846 × 3
-## trace absol…¹ relat…²
-## <chr> <int> <dbl>
-## 1 ER Registration,ER Triage,ER Sepsis Triage 35 0.0333
-## 2 ER Registration,ER Triage,ER Sepsis Triage,Leucocytes,CRP 24 0.0229
-## 3 ER Registration,ER Triage,ER Sepsis Triage,CRP,Leucocytes 22 0.0210
-## 4 ER Registration,ER Triage,ER Sepsis Triage,CRP,LacticAcid,Le… 13 0.0124
-## 5 ER Registration,ER Triage,ER Sepsis Triage,Leucocytes,CRP,La… 11 0.0105
-## 6 ER Registration,ER Triage,ER Sepsis Triage,Leucocytes,CRP,La… 9 0.00857
-## 7 ER Registration,ER Triage,ER Sepsis Triage,Leucocytes,Lactic… 7 0.00667
-## 8 ER Registration,ER Triage,ER Sepsis Triage,Leucocytes,CRP,Ad… 5 0.00476
-## 9 ER Registration,ER Triage,ER Sepsis Triage,LacticAcid,Leucoc… 5 0.00476
-## 10 ER Registration,ER Triage,ER Sepsis Triage,CRP,Leucocytes,La… 5 0.00476
-## # … with 836 more rows, and abbreviated variable names ¹absolute_frequency,
-## # ²relative_frequency
+## trace absolute_frequency relative_frequency
+## <chr> <int> <dbl>
+## 1 ER Registration,ER Triage,ER Sepsis Tr… 35 0.0333
+## 2 ER Registration,ER Triage,ER Sepsis Tr… 24 0.0229
+## 3 ER Registration,ER Triage,ER Sepsis Tr… 22 0.0210
+## 4 ER Registration,ER Triage,ER Sepsis Tr… 13 0.0124
+## 5 ER Registration,ER Triage,ER Sepsis Tr… 11 0.0105
+## 6 ER Registration,ER Triage,ER Sepsis Tr… 9 0.00857
+## 7 ER Registration,ER Triage,ER Sepsis Tr… 7 0.00667
+## 8 ER Registration,ER Triage,ER Sepsis Tr… 5 0.00476
+## 9 ER Registration,ER Triage,ER Sepsis Tr… 5 0.00476
+## 10 ER Registration,ER Triage,ER Sepsis Tr… 5 0.00476
+## # ℹ 836 more rows
You can again set reverse = TRUE
if you instead want 80%
of the cases with the lowest frequency.
sepsis %>%
  filter_trace_frequency(percentage = 0.2, reverse = TRUE) %>%
  traces()
## # A tibble: 784 × 3
-## trace absol…¹ relat…²
-## <chr> <int> <dbl>
-## 1 ER Registration,ER Triage,ER Sepsis Triage,LacticAcid,Leucoc… 1 0.00128
-## 2 ER Registration,ER Triage,ER Sepsis Triage,IV Antibiotics,Le… 1 0.00128
-## 3 ER Registration,ER Triage,ER Sepsis Triage,CRP,Leucocytes,IV… 1 0.00128
-## 4 ER Registration,ER Triage,ER Sepsis Triage,LacticAcid,Leucoc… 1 0.00128
-## 5 ER Registration,IV Liquid,ER Triage,Leucocytes,CRP,LacticAci… 1 0.00128
-## 6 ER Registration,ER Triage,ER Sepsis Triage,CRP,LacticAcid,Le… 1 0.00128
-## 7 ER Registration,ER Triage,ER Sepsis Triage,Leucocytes,CRP,La… 1 0.00128
-## 8 ER Registration,ER Triage,CRP,LacticAcid,Leucocytes,ER Sepsi… 1 0.00128
-## 9 ER Registration,ER Triage,ER Sepsis Triage,Leucocytes,CRP,La… 1 0.00128
-## 10 ER Registration,ER Triage,Leucocytes,CRP,LacticAcid,ER Sepsi… 1 0.00128
-## # … with 774 more rows, and abbreviated variable names ¹absolute_frequency,
-## # ²relative_frequency
+## trace absolute_frequency relative_frequency
+## <chr> <int> <dbl>
+## 1 ER Registration,ER Triage,ER Sepsis Tr… 1 0.00128
+## 2 ER Registration,ER Triage,ER Sepsis Tr… 1 0.00128
+## 3 ER Registration,ER Triage,ER Sepsis Tr… 1 0.00128
+## 4 ER Registration,ER Triage,ER Sepsis Tr… 1 0.00128
+## 5 ER Registration,IV Liquid,ER Triage,Le… 1 0.00128
+## 6 ER Registration,ER Triage,ER Sepsis Tr… 1 0.00128
+## 7 ER Registration,ER Triage,ER Sepsis Tr… 1 0.00128
+## 8 ER Registration,ER Triage,CRP,LacticAc… 1 0.00128
+## 9 ER Registration,ER Triage,ER Sepsis Tr… 1 0.00128
+## 10 ER Registration,ER Triage,Leucocytes,C… 1 0.00128
+## # ℹ 774 more rows
Note that the obtained percentage of cases will not always be exactly
the specified percentage, as there can be ties. For example, in the
sepsis
data set, 784 of the 1050 cases (75%) follow a
@@ -1838,60 +1826,57 @@
## # A tibble: 703 × 3
-## trace absol…¹ relat…²
-## <chr> <int> <dbl>
-## 1 ER Registration,ER Triage,ER Sepsis Triage,LacticAcid,Leucoc… 4 0.00540
-## 2 ER Registration,ER Triage,ER Sepsis Triage,Leucocytes,CRP,La… 4 0.00540
-## 3 ER Registration,ER Triage,ER Sepsis Triage,Leucocytes,Lactic… 4 0.00540
-## 4 ER Registration,ER Triage,ER Sepsis Triage,IV Liquid,IV Anti… 3 0.00405
-## 5 ER Registration,ER Triage,ER Sepsis Triage,Leucocytes,CRP,La… 3 0.00405
-## 6 ER Registration,ER Triage,ER Sepsis Triage,CRP,LacticAcid,Le… 3 0.00405
-## 7 ER Registration,ER Triage,ER Sepsis Triage,IV Liquid,Leucocy… 3 0.00405
-## 8 ER Registration,ER Triage,ER Sepsis Triage,LacticAcid,Leucoc… 2 0.00270
-## 9 ER Registration,ER Triage,ER Sepsis Triage,LacticAcid,CRP,Le… 2 0.00270
-## 10 ER Registration,ER Triage,ER Sepsis Triage,Leucocytes,CRP,La… 2 0.00270
-## # … with 693 more rows, and abbreviated variable names ¹absolute_frequency,
-## # ²relative_frequency
+## trace absolute_frequency relative_frequency
+## <chr> <int> <dbl>
+## 1 ER Registration,ER Triage,ER Sepsis Tr… 4 0.00540
+## 2 ER Registration,ER Triage,ER Sepsis Tr… 4 0.00540
+## 3 ER Registration,ER Triage,ER Sepsis Tr… 4 0.00540
+## 4 ER Registration,ER Triage,ER Sepsis Tr… 3 0.00405
+## 5 ER Registration,ER Triage,ER Sepsis Tr… 3 0.00405
+## 6 ER Registration,ER Triage,ER Sepsis Tr… 3 0.00405
+## 7 ER Registration,ER Triage,ER Sepsis Tr… 3 0.00405
+## 8 ER Registration,ER Triage,ER Sepsis Tr… 2 0.00270
+## 9 ER Registration,ER Triage,ER Sepsis Tr… 2 0.00270
+## 10 ER Registration,ER Triage,ER Sepsis Tr… 2 0.00270
+## # ℹ 693 more rows
Also here you can use half-open intervals.
sepsis %>%
  filter_trace_length(interval = c(10,NA)) %>%
  traces()
## # A tibble: 715 × 3
-## trace absol…¹ relat…²
-## <chr> <int> <dbl>
-## 1 ER Registration,ER Triage,ER Sepsis Triage,LacticAcid,Leucoc… 4 0.00531
-## 2 ER Registration,ER Triage,ER Sepsis Triage,Leucocytes,CRP,La… 4 0.00531
-## 3 ER Registration,ER Triage,ER Sepsis Triage,Leucocytes,Lactic… 4 0.00531
-## 4 ER Registration,ER Triage,ER Sepsis Triage,IV Liquid,IV Anti… 3 0.00398
-## 5 ER Registration,ER Triage,ER Sepsis Triage,Leucocytes,CRP,La… 3 0.00398
-## 6 ER Registration,ER Triage,ER Sepsis Triage,CRP,LacticAcid,Le… 3 0.00398
-## 7 ER Registration,ER Triage,ER Sepsis Triage,IV Liquid,Leucocy… 3 0.00398
-## 8 ER Registration,ER Triage,ER Sepsis Triage,LacticAcid,Leucoc… 2 0.00266
-## 9 ER Registration,ER Triage,ER Sepsis Triage,LacticAcid,CRP,Le… 2 0.00266
-## 10 ER Registration,ER Triage,ER Sepsis Triage,Leucocytes,CRP,La… 2 0.00266
-## # … with 705 more rows, and abbreviated variable names ¹absolute_frequency,
-## # ²relative_frequency
+## trace absolute_frequency relative_frequency
+## <chr> <int> <dbl>
+## 1 ER Registration,ER Triage,ER Sepsis Tr… 4 0.00531
+## 2 ER Registration,ER Triage,ER Sepsis Tr… 4 0.00531
+## 3 ER Registration,ER Triage,ER Sepsis Tr… 4 0.00531
+## 4 ER Registration,ER Triage,ER Sepsis Tr… 3 0.00398
+## 5 ER Registration,ER Triage,ER Sepsis Tr… 3 0.00398
+## 6 ER Registration,ER Triage,ER Sepsis Tr… 3 0.00398
+## 7 ER Registration,ER Triage,ER Sepsis Tr… 3 0.00398
+## 8 ER Registration,ER Triage,ER Sepsis Tr… 2 0.00266
+## 9 ER Registration,ER Triage,ER Sepsis Tr… 2 0.00266
+## 10 ER Registration,ER Triage,ER Sepsis Tr… 2 0.00266
+## # ℹ 705 more rows
And use reverse = TRUE
.
sepsis %>%
  filter_trace_length(interval = c(10,NA), reverse = TRUE) %>%
  traces()
## # A tibble: 131 × 3
-## trace absol…¹ relat…²
-## <chr> <int> <dbl>
-## 1 ER Registration,ER Triage,ER Sepsis Triage 35 0.118
-## 2 ER Registration,ER Triage,ER Sepsis Triage,Leucocytes,CRP 24 0.0808
-## 3 ER Registration,ER Triage,ER Sepsis Triage,CRP,Leucocytes 22 0.0741
-## 4 ER Registration,ER Triage,ER Sepsis Triage,CRP,LacticAcid,Le… 13 0.0438
-## 5 ER Registration,ER Triage,ER Sepsis Triage,Leucocytes,CRP,La… 11 0.0370
-## 6 ER Registration,ER Triage,ER Sepsis Triage,Leucocytes,CRP,La… 9 0.0303
-## 7 ER Registration,ER Triage,ER Sepsis Triage,Leucocytes,Lactic… 7 0.0236
-## 8 ER Registration,ER Triage,ER Sepsis Triage,Leucocytes,CRP,Ad… 5 0.0168
-## 9 ER Registration,ER Triage,ER Sepsis Triage,LacticAcid,Leucoc… 5 0.0168
-## 10 ER Registration,ER Triage,ER Sepsis Triage,CRP,Leucocytes,La… 5 0.0168
-## # … with 121 more rows, and abbreviated variable names ¹absolute_frequency,
-## # ²relative_frequency
+## trace absolute_frequency relative_frequency
+## <chr> <int> <dbl>
+## 1 ER Registration,ER Triage,ER Sepsis Tr… 35 0.118
+## 2 ER Registration,ER Triage,ER Sepsis Tr… 24 0.0808
+## 3 ER Registration,ER Triage,ER Sepsis Tr… 22 0.0741
+## 4 ER Registration,ER Triage,ER Sepsis Tr… 13 0.0438
+## 5 ER Registration,ER Triage,ER Sepsis Tr… 11 0.0370
+## 6 ER Registration,ER Triage,ER Sepsis Tr… 9 0.0303
+## 7 ER Registration,ER Triage,ER Sepsis Tr… 7 0.0236
+## 8 ER Registration,ER Triage,ER Sepsis Tr… 5 0.0168
+## 9 ER Registration,ER Triage,ER Sepsis Tr… 5 0.0168
+## 10 ER Registration,ER Triage,ER Sepsis Tr… 5 0.0168
+## # ℹ 121 more rows
## # A tibble: 514 × 3
-## trace absol…¹ relat…²
-## <chr> <int> <dbl>
-## 1 ER Registration,ER Triage,ER Sepsis Triage,LacticAcid,Leucoc… 2 0.00381
-## 2 ER Registration,ER Triage,ER Sepsis Triage,IV Liquid,IV Anti… 2 0.00381
-## 3 ER Registration,ER Triage,ER Sepsis Triage,Leucocytes,CRP,La… 2 0.00381
-## 4 ER Registration,ER Triage,ER Sepsis Triage,Leucocytes,CRP,La… 2 0.00381
-## 5 ER Registration,ER Triage,ER Sepsis Triage,LacticAcid,Leucoc… 2 0.00381
-## 6 ER Registration,ER Triage,ER Sepsis Triage,CRP,Leucocytes,La… 2 0.00381
-## 7 ER Registration,ER Triage,ER Sepsis Triage,Leucocytes,CRP,La… 2 0.00381
-## 8 ER Registration,ER Triage,ER Sepsis Triage,IV Liquid,Leucocy… 2 0.00381
-## 9 ER Registration,ER Triage,ER Sepsis Triage,CRP,LacticAcid,Le… 2 0.00381
-## 10 ER Registration,ER Triage,ER Sepsis Triage,IV Liquid,CRP,Lac… 2 0.00381
-## # … with 504 more rows, and abbreviated variable names ¹absolute_frequency,
-## # ²relative_frequency
+## trace absolute_frequency relative_frequency
+## <chr> <int> <dbl>
+## 1 ER Registration,ER Triage,ER Sepsis Tr… 2 0.00381
+## 2 ER Registration,ER Triage,ER Sepsis Tr… 2 0.00381
+## 3 ER Registration,ER Triage,ER Sepsis Tr… 2 0.00381
+## 4 ER Registration,ER Triage,ER Sepsis Tr… 2 0.00381
+## 5 ER Registration,ER Triage,ER Sepsis Tr… 2 0.00381
+## 6 ER Registration,ER Triage,ER Sepsis Tr… 2 0.00381
+## 7 ER Registration,ER Triage,ER Sepsis Tr… 2 0.00381
+## 8 ER Registration,ER Triage,ER Sepsis Tr… 2 0.00381
+## 9 ER Registration,ER Triage,ER Sepsis Tr… 2 0.00381
+## 10 ER Registration,ER Triage,ER Sepsis Tr… 2 0.00381
+## # ℹ 504 more rows
You can again set reverse = TRUE
if you instead want 50%
of the cases with the shortest traces.
Percentage-based filter_trace_length(percentage = 0.5, reverse = TRUE) %>% traces()
## # A tibble: 337 × 3
-## trace absol…¹ relat…²
-## <chr> <int> <dbl>
-## 1 ER Registration,ER Triage,ER Sepsis Triage 35 0.0667
-## 2 ER Registration,ER Triage,ER Sepsis Triage,Leucocytes,CRP 24 0.0457
-## 3 ER Registration,ER Triage,ER Sepsis Triage,CRP,Leucocytes 22 0.0419
-## 4 ER Registration,ER Triage,ER Sepsis Triage,CRP,LacticAcid,Le… 13 0.0248
-## 5 ER Registration,ER Triage,ER Sepsis Triage,Leucocytes,CRP,La… 11 0.0210
-## 6 ER Registration,ER Triage,ER Sepsis Triage,Leucocytes,CRP,La… 9 0.0171
-## 7 ER Registration,ER Triage,ER Sepsis Triage,Leucocytes,Lactic… 7 0.0133
-## 8 ER Registration,ER Triage,ER Sepsis Triage,Leucocytes,CRP,Ad… 5 0.00952
-## 9 ER Registration,ER Triage,ER Sepsis Triage,LacticAcid,Leucoc… 5 0.00952
-## 10 ER Registration,ER Triage,ER Sepsis Triage,CRP,Leucocytes,La… 5 0.00952
-## # … with 327 more rows, and abbreviated variable names ¹absolute_frequency,
-## # ²relative_frequency
+## trace absolute_frequency relative_frequency
+## <chr> <int> <dbl>
+## 1 ER Registration,ER Triage,ER Sepsis Tr… 35 0.0667
+## 2 ER Registration,ER Triage,ER Sepsis Tr… 24 0.0457
+## 3 ER Registration,ER Triage,ER Sepsis Tr… 22 0.0419
+## 4 ER Registration,ER Triage,ER Sepsis Tr… 13 0.0248
+## 5 ER Registration,ER Triage,ER Sepsis Tr… 11 0.0210
+## 6 ER Registration,ER Triage,ER Sepsis Tr… 9 0.0171
+## 7 ER Registration,ER Triage,ER Sepsis Tr… 7 0.0133
+## 8 ER Registration,ER Triage,ER Sepsis Tr… 5 0.00952
+## 9 ER Registration,ER Triage,ER Sepsis Tr… 5 0.00952
+## 10 ER Registration,ER Triage,ER Sepsis Tr… 5 0.00952
+## # ℹ 327 more rows
Note that the obtained percentage of cases will not always be exactly the specified percentage, as there can be ties.
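Why ties make the result deviate from the requested percentage can be seen in a small base R sketch. It assumes, for illustration only, that the longest traces are selected until the target share of cases is reached and that all cases tied at the cut-off are kept together. The `cases` table and the `select_longest` function are hypothetical and not part of any package.

```r
# Hypothetical case table: one row per case with its trace length.
cases <- data.frame(
  case_id = paste0("c", 1:10),
  trace_length = c(3, 3, 3, 3, 5, 5, 5, 7, 7, 9)
)

# Select the cases with the longest traces until at least `percentage`
# of the cases is covered; cases tied at the cut-off length are all kept.
select_longest <- function(cases, percentage) {
  n_target <- ceiling(percentage * nrow(cases))
  sorted <- sort(cases$trace_length, decreasing = TRUE)
  cutoff <- sorted[n_target]
  cases[cases$trace_length >= cutoff, ]
}

selected <- select_longest(cases, percentage = 0.5)
share <- nrow(selected) / nrow(cases)
```

Here the fifth-longest case has length 5, but three cases share that length, so six of the ten cases are returned instead of five.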
diff --git a/case_filters_files/figure-html/unnamed-chunk-20-1.png b/case_filters_files/figure-html/unnamed-chunk-20-1.png index 27cfc2b..0f350cc 100644 Binary files a/case_filters_files/figure-html/unnamed-chunk-20-1.png and b/case_filters_files/figure-html/unnamed-chunk-20-1.png differ diff --git a/collapse.html b/collapse.html index 4fc51b3..11ac204 100644 --- a/collapse.html +++ b/collapse.html @@ -13,19 +13,19 @@Let’s say we want to combine the activities Blood test, MRI SCAN and X-Ray scan into a single Examination activity. This can be done as follows:
patients %>%
  act_collapse(Examination = c("Blood test","MRI SCAN","X-Ray")) %>%
  process_map()
Read more:
-library(bupaR)
Activity presence shows in what percentage of cases an activity is present. It has no level-argument.
-patients %>% activity_presence() %>%
-  plot
+patients %>% activity_presence() %>%
+  plot
The frequency of activities can be calculated using the activity_frequency function, at the levels log, trace and activity.
-patients %>%
-  activity_frequency("activity")
+patients %>%
+  activity_frequency("activity")
## # A tibble: 7 × 3
## handling absolute relative
## <fct> <int> <dbl>
@@ -675,13 +678,13 @@ Activity Frequency
## 6 Blood test 237 0.0871
## 7 MRI SCAN 236 0.0867
The start of cases can be described using the start_activities function. Available levels are activity, case, log, resource and resource activity.
-patients %>%
-  start_activities("resource-activity")
+patients %>%
+  start_activities("resource-activity")
## # A tibble: 1 × 5
## employee handling absolute relative cum_sum
## <fct> <fct> <int> <dbl> <dbl>
@@ -689,13 +692,13 @@ Start Activities
This shows that in this event log, all cases are started with the
Registration by resource r1.
Conversely, the end_activities function describes the end of cases, using the same levels: log, case, activity, resource and resource-activity.
-patients %>%
-  end_activities("resource-activity")
+patients %>%
+  end_activities("resource-activity")
## # A tibble: 5 × 5
## employee handling absolute relative cum_sum
## <fct> <fct> <int> <dbl> <dbl>
@@ -707,37 +710,81 @@ End Activities
In contrast to the start of cases, the end of cases seems to differ
more frequently, although it is mostly the Check-Out activity.
The trace coverage metric shows the relationship between the number of different activity sequences (i.e. traces) and the number of cases they cover.
-patients %>%
-  trace_coverage("trace") %>%
-  plot()
+patients %>%
+  trace_coverage("trace") %>%
+  plot()
In the patients log, there are only 7 different traces, and 2 of them cover nearly 100% of the event log.
The trace length metric describes the length of traces, i.e. the number of activity instances for each case. It can be computed at the levels case, trace and log.
-patients %>%
-  trace_length("log") %>%
-  plot
+patients %>%
+  trace_length("log") %>%
+  plot
It can be seen that in this simple event log, most cases have a trace length of 5 or 6, while a minority has a trace length lower than 5.
Documentation coming soon
+Several metrics to measure rework (repeated work) are provided by
+bupaR. A distinction is made between self-loops and repetitions. A
+self-loop is an immediate recurrence of the same activity (i.e. no other
+activity in between), while a repetition is a recurrence after some
+other activities.
+The metrics number_of_repetitions
and
+number_of_selfloops
can be used to analyse these
+occurrences at the levels of log, case, activity, resource and
+resource-activity. The metrics size_of_repetitions
and
+size_of_selfloops
(available at the same levels) provide
+further insight into the extent of the repeats within a single case
+(e.g. is it repeated only once, or multiple times?). Finally, all these
+metrics are able to distinguish between two types of rework:
+repeat rework, where the same resource does the rework, and
+redo rework, where the rework is done by another resource. This
+can be specified with the type
argument. Specifying
+type = all
makes no distinction based on resources.
+sepsis %>%
+  number_of_repetitions()
## # A tibble: 1 × 8
+## min q1 median mean q3 max st_dev iqr
+## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
+## 1 0 0 2 1.64 3 5 1.28 3
+sepsis %>%
+  number_of_selfloops()
## # A tibble: 1 × 8
+## min q1 median mean q3 max st_dev iqr
+## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
+## 1 0 0 0 0.827 1 33 1.82 1
+sepsis %>%
+  size_of_repetitions()
## Using default type: all
+## Using default level: log
+## # A tibble: 1 × 8
+## min q1 median mean q3 max st_dev iqr
+## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
+## 1 1 1 2 2.67 3 58 3.72 2
+sepsis %>%
+  size_of_selfloops()
## Using default type: all
+## Using default level: log
+## # A tibble: 1 × 8
+## min q1 median mean q3 max st_dev iqr
+## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
+## 1 1 1 1 1.19 1 8 0.717 0
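The difference between self-loops and repetitions can be made concrete for a single trace using base R. The counting rules below are one simple reading of the definitions above and are assumptions for illustration; the package's own metrics may count differently, and the example `trace` is hypothetical.

```r
# Hypothetical trace: B occurs three times in a row, A comes back twice later.
trace <- c("A", "B", "B", "B", "C", "A", "D", "A")

# Collapse immediate recurrences into runs of the same activity.
runs <- rle(trace)

# A self-loop is a run longer than one; its size is the number of extra occurrences.
selfloops <- sum(runs$lengths > 1)
selfloop_sizes <- runs$lengths[runs$lengths > 1] - 1

# A repetition is an activity that returns after other activities:
# every run of an activity beyond its first one counts once.
run_counts <- table(runs$values)
repetitions <- sum(run_counts - 1)
```

In this trace, the run of B is one self-loop of size 2, and the two later returns of A count as two repetitions.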
+Using the package processcheckR,
procedural rules can be
checked in an event log. Checking rules will add a boolean case
attribute, which can be used for filtering or in analysis.
In the following example, the first rule checks the starting activity, while the second rule checks whether CRP and LacticAcid occur together.
-library(bupaR)
-library(processcheckR)
-sepsis %>%
-  # check if cases start with "ER Registration"
-  check_rule(starts("ER Registration"), label = "r1") %>%
-  # check if activities "CRP" and "LacticAcid" occur together
-  check_rule(and("CRP","LacticAcid"), label = "r2") %>%
-  group_by(r1, r2) %>%
-  n_cases()
+library(bupaR)
+library(processcheckR)
+sepsis %>%
+  # check if cases start with "ER Registration"
+  check_rule(starts("ER Registration"), label = "r1") %>%
+  # check if activities "CRP" and "LacticAcid" occur together
+  check_rule(and("CRP","LacticAcid"), label = "r2") %>%
+  group_by(r1, r2) %>%
+  n_cases()
## # A tibble: 4 × 3
## r1 r2 n_cases
## <lgl> <lgl> <int>
@@ -769,12 +816,12 @@ Checking multiple rules
Using the function check_rules
, multiple rules can be
checked with one function call, by providing them as named arguments.
The following code is equivalent to that above.
-sepsis %>%
-  check_rules(
-    r1 = starts("ER Registration"),
-    r2 = and("CRP","LacticAcid")) %>%
-  group_by(r1, r2) %>%
-  n_cases()
+sepsis %>%
+  check_rules(
+    r1 = starts("ER Registration"),
+    r2 = and("CRP","LacticAcid")) %>%
+  group_by(r1, r2) %>%
+  n_cases()
## # A tibble: 4 × 3
## r1 r2 n_cases
## <lgl> <lgl> <int>
@@ -788,16 +835,15 @@ Rule-based filtering
Instead of adding logical values for each rule, you can also
immediately filter the cases which adhere to one or more rules, using
the filter_rules
-sepsis %>%
-  filter_rules(
-    r1 = starts("ER Registration"),
-    r2 = and("CRP","LacticAcid")) %>%
-  n_cases()
+sepsis %>%
+  filter_rules(
+    r1 = starts("ER Registration"),
+    r2 = and("CRP","LacticAcid")) %>%
+  n_cases()
## [1] 858
Currently the following declarative rules can be checked:
Cardinality rules:
The available rules are explained in more detail below.
-Arguments:
activity
: a single activity name.
[Example] How many cases have three or more occurrences of Leucocytes?
-sepsis %>%
-  check_rule(processcheckR::contains("Leucocytes", n = 3)) %>%
-  group_by(contains_Leucocytes_3) %>%
-  n_cases()
+sepsis %>%
+  check_rule(processcheckR::contains("Leucocytes", n = 3)) %>%
+  group_by(contains_Leucocytes_3) %>%
+  n_cases()
## # A tibble: 2 × 2
## contains_Leucocytes_3 n_cases
## <lgl> <int>
## 1 FALSE 590
## 2 TRUE 460
Arguments:
activity
: a single activity name.
[Example] How many cases have exactly four occurrences of Leucocytes?
-sepsis %>%
-  check_rule(contains_exactly("Leucocytes", n = 4), label = "r1") %>%
-  group_by(r1) %>%
-  n_cases()
+sepsis %>%
+  check_rule(contains_exactly("Leucocytes", n = 4), label = "r1") %>%
+  group_by(r1) %>%
+  n_cases()
## # A tibble: 2 × 2
## r1 n_cases
## <lgl> <int>
@@ -873,8 +919,8 @@ contains_exactly
## 2 TRUE 90
Returns: cases where activity
occurs n
.
Arguments:
activity
: a single activity name.min
and max
times.
[Example] How many cases have between 0 and 10 occurrences of Leucocytes?
-sepsis %>%
-  check_rule(contains_between("Leucocytes", min = 0, max = 10), label = "r1") %>%
-  group_by(r1) %>%
-  n_cases()
+sepsis %>%
+  check_rule(contains_between("Leucocytes", min = 0, max = 10), label = "r1") %>%
+  group_by(r1) %>%
+  n_cases()
## # A tibble: 2 × 2
## r1 n_cases
## <lgl> <int>
## 1 FALSE 38
## 2 TRUE 1012
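The cardinality checks shown so far all come down to counting how often an activity occurs per case and comparing that count with a threshold. The following base-R sketch illustrates this on made-up traces; the helper `n_occurrences` is hypothetical, and the inclusive bounds for the "between" check are an assumption rather than a statement about the package internals.

```r
# Made-up traces; each element lists the activities of one case in order.
traces <- list(
  c1 = c("Leucocytes", "CRP", "Leucocytes", "Leucocytes"),
  c2 = c("CRP", "Leucocytes"),
  c3 = c("ER Registration")
)

# Hypothetical helper: number of occurrences of an activity in each case.
n_occurrences <- function(traces, activity) {
  vapply(traces, function(trace) sum(trace == activity), integer(1))
}

counts <- n_occurrences(traces, "Leucocytes")

at_least_3  <- counts >= 3                # cf. contains(..., n = 3)
exactly_1   <- counts == 1                # cf. contains_exactly(..., n = 1)
between_0_2 <- counts >= 0 & counts <= 2  # cf. contains_between(..., min = 0, max = 2)

print(data.frame(counts, at_least_3, exactly_1, between_0_2))
```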
Arguments:
activity
: a single activity name.contains_between(min = 0, max = x)
[Example] How many cases have between 0 and 10 occurrences of Leucocytes?
-sepsis %>%
-  check_rule(absent("Leucocytes", n = 10), label = "r1") %>%
-  group_by(r1) %>%
-  n_cases()
+sepsis %>%
+  check_rule(absent("Leucocytes", n = 10), label = "r1") %>%
+  group_by(r1) %>%
+  n_cases()
## # A tibble: 2 × 2
## r1 n_cases
## <lgl> <int>
@@ -922,46 +968,46 @@ absent
## 2 TRUE 38
Arguments:
activity
: a single activity nameReturns: cases that start with activity
.
[Example] How many cases start with “ER Registration”?
-sepsis %>%
-  check_rule(starts("ER Registration"), label = "r1") %>%
-  group_by(r1) %>%
-  n_cases()
+sepsis %>%
+  check_rule(starts("ER Registration"), label = "r1") %>%
+  group_by(r1) %>%
+  n_cases()
## # A tibble: 2 × 2
## r1 n_cases
## <lgl> <int>
## 1 FALSE 55
## 2 TRUE 995
Arguments:
activity
: a single activity nameReturns: cases that end with activity
.
[Example] How many cases end with “Release A”?
-sepsis %>%
-  check_rule(ends("Release A"), label = "r1") %>%
-  group_by(r1) %>%
-  n_cases()
+sepsis %>%
+  check_rule(ends("Release A"), label = "r1") %>%
+  group_by(r1) %>%
+  n_cases()
## # A tibble: 2 × 2
## r1 n_cases
## <lgl> <int>
## 1 FALSE 657
## 2 TRUE 393
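Start and end checks only look at the first and last activity of each case. A minimal base-R sketch on made-up traces (the data and variable names are illustrative assumptions, not package code):

```r
# Made-up traces, ordered by time within each case.
traces <- list(
  c1 = c("ER Registration", "ER Triage", "Release A"),
  c2 = c("ER Triage", "ER Registration", "Release B")
)

# First and last activity of every case.
first_activity <- vapply(traces, function(trace) trace[1], character(1))
last_activity  <- vapply(traces, function(trace) trace[length(trace)], character(1))

starts_ok <- first_activity == "ER Registration"  # cf. starts("ER Registration")
ends_ok   <- last_activity == "Release A"         # cf. ends("Release A")

print(data.frame(first_activity, last_activity, starts_ok, ends_ok))
```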
Arguments:
activity_a
: a single activity nameactivity_b
, if
either activity_a
or activity_b
occurs.
[Example] In how many cases is “ER Sepsis Triage” succeeded by “CRP”?
-sepsis %>%
-  check_rule(succession("ER Sepsis Triage","CRP"), label = "r1") %>%
-  group_by(r1) %>%
-  n_cases()
+sepsis %>%
+  check_rule(succession("ER Sepsis Triage","CRP"), label = "r1") %>%
+  group_by(r1) %>%
+  n_cases()
## # A tibble: 2 × 2
## r1 n_cases
## <lgl> <int>
## 1 FALSE 229
## 2 TRUE 821
Arguments:
activity_a
: a single activity nameactivity_b
, if
activity_a
occurs.
[Example] In how many cases is “ER Sepsis Triage” followed by “CRP”, if “ER Sepsis Triage” occurs?
-sepsis %>%
-  check_rule(response("ER Sepsis Triage","CRP"), label = "r1") %>%
-  group_by(r1) %>%
-  n_cases()
+sepsis %>%
+  check_rule(response("ER Sepsis Triage","CRP"), label = "r1") %>%
+  group_by(r1) %>%
+  n_cases()
## # A tibble: 2 × 2
## r1 n_cases
## <lgl> <int>
## 1 FALSE 106
## 2 TRUE 944
Arguments:
activity_a
: a single activity nameactivity_b
occurs.
[Example] In how many cases is “CRP” preceded by “ER Sepsis Triage”, if “CRP” occurs?
-sepsis %>%
-  check_rule(precedence("ER Sepsis Triage","CRP"), label = "r1") %>%
-  group_by(r1) %>%
-  n_cases()
+sepsis %>%
+  check_rule(precedence("ER Sepsis Triage","CRP"), label = "r1") %>%
+  group_by(r1) %>%
+  n_cases()
## # A tibble: 2 × 2
## r1 n_cases
## <lgl> <int>
## 1 FALSE 186
## 2 TRUE 864
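The three ordering rules differ mainly in which activity triggers the check. The base-R sketch below follows the usual Declare reading (response: the last `a` is followed by a `b`; precedence: the first `b` is preceded by an `a`; succession: both). This reading is an assumption, and processcheckR may treat repeated occurrences differently. The function names are hypothetical.

```r
# Response: if a occurs, some b must come after the last a.
response_holds <- function(trace, a, b) {
  pos_a <- which(trace == a)
  pos_b <- which(trace == b)
  length(pos_a) == 0 || (length(pos_b) > 0 && max(pos_b) > max(pos_a))
}

# Precedence: if b occurs, some a must come before the first b.
precedence_holds <- function(trace, a, b) {
  pos_a <- which(trace == a)
  pos_b <- which(trace == b)
  length(pos_b) == 0 || (length(pos_a) > 0 && min(pos_a) < min(pos_b))
}

# Succession: both directions must hold.
succession_holds <- function(trace, a, b) {
  response_holds(trace, a, b) && precedence_holds(trace, a, b)
}

# Made-up traces covering the main situations.
traces <- list(
  in_order   = c("ER Sepsis Triage", "CRP"),
  reversed   = c("CRP", "ER Sepsis Triage"),
  neither    = c("ER Registration"),
  only_first = c("ER Sepsis Triage")
)

result <- data.frame(
  response   = vapply(traces, response_holds, logical(1),
                      a = "ER Sepsis Triage", b = "CRP"),
  precedence = vapply(traces, precedence_holds, logical(1),
                      a = "ER Sepsis Triage", b = "CRP"),
  succession = vapply(traces, succession_holds, logical(1),
                      a = "ER Sepsis Triage", b = "CRP")
)
print(result)
```

Note how the `only_first` trace satisfies precedence vacuously (there is no "CRP" to check) but violates response and therefore succession.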
Arguments:
activity_a
: a single activity nameactivity_b
occurs (but not vice versa)
[Example] How many cases contain both “CRP” and “ER Sepsis Triage”, if “CRP” occurs?
-sepsis %>%
-  check_rule(responded_existence("CRP", "ER Sepsis Triage"), label = "r1") %>%
-  group_by(r1) %>%
-  n_cases()
+sepsis %>%
+  check_rule(responded_existence("CRP", "ER Sepsis Triage"), label = "r1") %>%
+  group_by(r1) %>%
+  n_cases()
## # A tibble: 2 × 2
## r1 n_cases
## <lgl> <int>
@@ -1046,10 +1092,10 @@ responded_existence
## 2 TRUE 1049
Arguments:
activity_a
: a single activity nameactivity_b
occur or both are absent
[Example] How many cases contain both “CRP” and “ER Sepsis Triage”?
-sepsis %>%
-  check_rule(and("CRP", "ER Sepsis Triage"), label = "r1") %>%
-  group_by(r1) %>%
-  n_cases()
+sepsis %>%
+  check_rule(and("CRP", "ER Sepsis Triage"), label = "r1") %>%
+  group_by(r1) %>%
+  n_cases()
## # A tibble: 2 × 2
## r1 n_cases
## <lgl> <int>
## 1 FALSE 44
## 2 TRUE 1006
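The co-occurrence rules ignore ordering and only test whether activities are present. A base-R sketch of the two descriptions given above, using made-up traces and hypothetical function names (not the package's implementation):

```r
# responded_existence: if a occurs, b must occur too (not vice versa).
responded_existence_holds <- function(trace, a, b) {
  !(a %in% trace) || (b %in% trace)
}

# and: a and b either both occur or are both absent.
and_holds <- function(trace, a, b) {
  (a %in% trace) == (b %in% trace)
}

traces <- list(
  both        = c("CRP", "ER Sepsis Triage"),
  only_crp    = c("CRP", "Leucocytes"),
  only_triage = c("ER Sepsis Triage"),
  neither     = c("ER Registration")
)

result <- data.frame(
  responded_existence = vapply(traces, responded_existence_holds, logical(1),
                               a = "CRP", b = "ER Sepsis Triage"),
  and = vapply(traces, and_holds, logical(1),
               a = "CRP", b = "ER Sepsis Triage")
)
print(result)
```

The `only_triage` trace shows the asymmetry: it satisfies the responded-existence check (no "CRP" occurs, so nothing is required) but fails the symmetric `and` check.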
Arguments:
activity_a
: a single activity nameReturns: cases where either activity_a
or
activity_b
occur, but not both.
[Example] How many cases contain either “CRP” or “ER Sepsis Triage”, but not both?
-sepsis %>%
-  check_rule(xor("CRP", "ER Sepsis Triage"), label = "r1") %>%
-  group_by(r1) %>%
-  n_cases()
+sepsis %>%
+  check_rule(xor("CRP", "ER Sepsis Triage"), label = "r1") %>%
+  group_by(r1) %>%
+  n_cases()
## # A tibble: 2 × 2
## r1 n_cases
## <lgl> <int>
@@ -1091,6 +1137,8 @@ xor
Copyright © 2023 bupaR - Hasselt University
The table below shows the same data as above, but now using the
activitylog
format. It can be seen that there are now just
@@ -859,27 +817,27 @@
eventlog
vs activitylog
## # Log of 10 events consisting of:
+## # Log of 12 events consisting of:
## 1 trace
## 1 case
-## 5 instances of 5 activities
+## 6 instances of 6 activities
## 0 resources
-## Events occurred from 2017-07-03 05:21:36 until 2017-07-09 18:01:06
+## Events occurred from 2018-03-20 19:07:17 until 2018-04-12 21:41:01
##
## # Variables were mapped as follows:
## Case identifier: patient
@@ -1015,14 +979,15 @@ Scenario 1
## Resource identifier: employee
## Timestamps: start, complete
##
-## # A tibble: 5 × 5
+## # A tibble: 6 × 5
## patient handling start complete .order
## <chr> <fct> <dttm> <dttm> <int>
-## 1 188 Check-out 2017-07-09 16:27:02 2017-07-09 18:01:06 1
-## 2 188 Discuss Results 2017-07-09 12:17:58 2017-07-09 16:27:02 2
-## 3 188 Registration 2017-07-03 05:21:36 2017-07-03 09:22:46 3
-## 4 188 Triage and Assessment 2017-07-03 16:49:13 2017-07-04 08:14:12 4
-## 5 188 X-Ray 2017-07-09 01:43:34 2017-07-09 06:55:29 5
+## 1 464 Blood test 2018-04-06 20:04:09 2018-04-07 01:18:17 1
+## 2 464 Check-out 2018-04-12 19:02:11 2018-04-12 21:41:01 2
+## 3 464 Discuss Results 2018-04-12 11:00:16 2018-04-12 13:59:44 3
+## 4 464 MRI SCAN 2018-04-07 06:30:56 2018-04-07 09:37:26 4
+## 5 464 Registration 2018-03-20 19:07:17 2018-03-20 21:15:41 5
+## 6 464 Triage and Assessment 2018-03-21 15:58:55 2018-03-22 05:21:56 6
Note that in case a resource identifier is available, this
information can be added in the activitylog
call.
## Warning in validate_eventlog(eventlog): The following activity instances are
-## connected to more than one resource: 1932,2427,698
-## # Log of 10 events consisting of:
+## connected to more than one resource: 1054,116,1291,1850,2345,616
+## # Log of 12 events consisting of:
## 1 trace
## 1 case
-## 5 instances of 5 activities
-## 5 resources
-## Events occurred from 2017-07-12 06:43:07 until 2017-07-19 12:13:53
+## 6 instances of 6 activities
+## 6 resources
+## Events occurred from 2017-04-29 03:24:59 until 2017-05-03 06:16:03
##
## # Variables were mapped as follows:
## Case identifier: patient
@@ -1334,20 +1315,22 @@ Scenario 3
## Timestamp: time
## Lifecycle transition: registration_type
##
-## # A tibble: 10 × 7
-## patient handling emplo…¹ handl…² regis…³ time .order
-## <chr> <fct> <fct> <chr> <fct> <dttm> <int>
-## 1 198 Registration r6 198 start 2017-07-12 06:43:07 1
-## 2 198 Registration r6 198 comple… 2017-07-12 10:27:51 2
-## 3 198 Triage and Assess… r7 698 start 2017-07-12 15:46:36 3
-## 4 198 Triage and Assess… r5 698 comple… 2017-07-13 06:31:09 4
-## 5 198 X-Ray r2 1576 start 2017-07-18 14:10:06 5
-## 6 198 X-Ray r2 1576 comple… 2017-07-18 20:56:08 6
-## 7 198 Discuss Results r5 1932 start 2017-07-19 06:02:50 7
-## 8 198 Discuss Results r1 1932 comple… 2017-07-19 08:20:45 8
-## 9 198 Check-out r1 2427 start 2017-07-19 10:33:43 9
-## 10 198 Check-out r7 2427 comple… 2017-07-19 12:13:53 10
-## # … with abbreviated variable names ¹employee, ²handling_id, ³registration_type
+## # A tibble: 12 × 7
+## patient handling employee handling_id registration_type time
+## <chr> <fct> <fct> <chr> <fct> <dttm>
+## 1 116 Registrat… r2 116 start 2017-04-29 03:24:59
+## 2 116 Registrat… r6 116 complete 2017-04-29 06:23:09
+## 3 116 Triage an… r1 616 start 2017-04-29 15:41:27
+## 4 116 Triage an… r7 616 complete 2017-04-30 03:04:21
+## 5 116 Blood test r4 1054 start 2017-04-30 15:13:28
+## 6 116 Blood test r6 1054 complete 2017-04-30 21:24:18
+## 7 116 MRI SCAN r1 1291 start 2017-05-01 01:12:51
+## 8 116 MRI SCAN r4 1291 complete 2017-05-01 05:32:37
+## 9 116 Discuss R… r3 1850 start 2017-05-01 09:44:20
+## 10 116 Discuss R… r7 1850 complete 2017-05-01 14:00:48
+## 11 116 Check-out r3 2345 start 2017-05-03 04:02:35
+## 12 116 Check-out r2 2345 complete 2017-05-03 06:16:03
+## # ℹ 1 more variable: .order <int>
Note that we need an eventlog
irrespective of which
attribute values are differing, i.e. it can be resources, but also any
additional variables you have in your data set. For the special case of
@@ -1493,95 +1476,113 @@
If you have a large dataset, and want to have an overview of the @@ -1590,13 +1591,15 @@
log %>%
  detect_resource_inconsistencies()
## # A tibble: 4 × 5
+## # A tibble: 6 × 5
## patient handling handling_id complete start
## <chr> <fct> <chr> <chr> <chr>
-## 1 232 Check-out 2461 r2 r5
-## 2 232 Registration 232 r7 r5
-## 3 232 Triage and Assessment 732 r1 r7
-## 4 232 X-Ray 1596 r2 r1
+## 1 206 Blood test 1100 r3 r1
+## 2 206 Check-out 2435 r7 r1
+## 3 206 Discuss Results 1940 r4 r2
+## 4 206 MRI SCAN 1337 r2 r6
+## 5 206 Registration 206 r3 r4
+## 6 206 Triage and Assessment 706 r7 r6
If you want to remove these inconsistencies, a quick fix is to merge
the resource labels together with
fix_resource_inconsistencies()
. Note that this is not
@@ -1609,22 +1612,24 @@
log %>%
  fix_resource_inconsistencies()
## *** OUTPUT ***
-## A total of 4 activity executions in the event log are classified as inconsistencies.
+## A total of 6 activity executions in the event log are classified as inconsistencies.
## They are spread over the following cases and activities:
-## # A tibble: 4 × 5
+## # A tibble: 6 × 5
## patient handling handling_id complete start
## <chr> <fct> <chr> <chr> <chr>
-## 1 232 Check-out 2461 r2 r5
-## 2 232 Registration 232 r7 r5
-## 3 232 Triage and Assessment 732 r1 r7
-## 4 232 X-Ray 1596 r2 r1
+## 1 206 Blood test 1100 r3 r1
+## 2 206 Check-out 2435 r7 r1
+## 3 206 Discuss Results 1940 r4 r2
+## 4 206 MRI SCAN 1337 r2 r6
+## 5 206 Registration 206 r3 r4
+## 6 206 Triage and Assessment 706 r7 r6
## Inconsistencies solved succesfully.
-## # Log of 10 events consisting of:
+## # Log of 12 events consisting of:
## 1 trace
## 1 case
-## 5 instances of 5 activities
-## 5 resources
-## Events occurred from 2017-08-13 19:50:42 until 2017-08-22 16:55:37
+## 6 instances of 6 activities
+## 6 resources
+## Events occurred from 2017-07-19 15:48:14 until 2017-07-28 03:55:13
##
## # Variables were mapped as follows:
## Case identifier: patient
@@ -1634,20 +1639,22 @@ Inconsistent Resources
## Timestamp: time
## Lifecycle transition: registration_type
##
-## # A tibble: 10 × 7
-## patient handling emplo…¹ handl…² regis…³ time .order
-## <chr> <fct> <chr> <chr> <fct> <dttm> <int>
-## 1 232 Registration r7 - r5 232 start 2017-08-13 19:50:42 1
-## 2 232 Registration r7 - r5 232 comple… 2017-08-13 23:07:38 2
-## 3 232 Triage and Assess… r1 - r7 732 start 2017-08-14 16:33:50 3
-## 4 232 Triage and Assess… r1 - r7 732 comple… 2017-08-15 05:41:07 4
-## 5 232 X-Ray r2 - r1 1596 start 2017-08-15 22:43:19 5
-## 6 232 X-Ray r2 - r1 1596 comple… 2017-08-16 03:55:58 6
-## 7 232 Discuss Results r6 1966 start 2017-08-16 13:20:54 7
-## 8 232 Discuss Results r6 1966 comple… 2017-08-16 17:14:01 8
-## 9 232 Check-out r2 - r5 2461 start 2017-08-22 15:38:38 9
-## 10 232 Check-out r2 - r5 2461 comple… 2017-08-22 16:55:37 10
-## # … with abbreviated variable names ¹employee, ²handling_id, ³registration_type
+## # A tibble: 12 × 7
+## patient handling employee handling_id registration_type time
+## <chr> <fct> <chr> <chr> <fct> <dttm>
+## 1 206 Registrat… r3 - r4 206 start 2017-07-19 15:48:14
+## 2 206 Triage an… r7 - r6 706 start 2017-07-19 17:03:44
+## 3 206 Registrat… r3 - r4 206 complete 2017-07-19 17:03:44
+## 4 206 Triage an… r7 - r6 706 complete 2017-07-20 07:28:53
+## 5 206 Blood test r3 - r1 1100 start 2017-07-25 03:02:14
+## 6 206 Blood test r3 - r1 1100 complete 2017-07-25 08:14:46
+## 7 206 MRI SCAN r2 - r6 1337 start 2017-07-25 12:37:36
+## 8 206 MRI SCAN r2 - r6 1337 complete 2017-07-25 16:52:16
+## 9 206 Discuss R… r4 - r2 1940 start 2017-07-26 07:36:36
+## 10 206 Discuss R… r4 - r2 1940 complete 2017-07-26 11:08:03
+## 11 206 Check-out r7 - r1 2435 start 2017-07-28 02:54:17
+## 12 206 Check-out r7 - r1 2435 complete 2017-07-28 03:55:13
+## # ℹ 1 more variable: .order <int>
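The merged labels in the output above (for example `r3 - r4`) suggest that the resources of each activity instance are pasted together. A base-R sketch of that idea on made-up events follows; the data, the `merge_labels` helper, and the label order are assumptions, and the package may order the labels differently.

```r
# Made-up events: instance "a1" has two different resources, "a2" only one.
events <- data.frame(
  handling_id = c("a1", "a1", "a2", "a2"),
  registration_type = c("start", "complete", "start", "complete"),
  employee = c("r4", "r3", "r6", "r6"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: collapse the distinct labels of one activity instance.
merge_labels <- function(x) paste(unique(x), collapse = " - ")

# ave() applies the helper per activity instance and recycles the result.
events$employee <- ave(events$employee, events$handling_id, FUN = merge_labels)
print(events)
```

After the merge every activity instance is connected to exactly one (possibly composite) resource label, which is what the validation warning asks for.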
Read more:
Or, the duration of “Treatment” should be within 0 to 15 minutes.
hospital_actlog %>%
  detect_duration_outliers(Treatment = duration_within(lower_bound = 0, upper_bound = 15))
hospital_actlog %>%
  detect_missing_values(level_of_aggregation = "activity")
## Selected level of aggregation:activity
## *** OUTPUT ***
## Absolute number of missing values per column (per activity):
## # A tibble: 9 × 8
-## activity patient_vi…¹ origi…² start compl…³ triag…⁴ speci…⁵ .order
-## <chr> <int> <int> <int> <int> <int> <int> <int>
-## 1 0 0 1 0 0 0 0 0
-## 2 Clinical exam 0 0 1 0 1 0 0
-## 3 Registration 0 1 0 0 0 0 0
-## 4 Trage 0 0 0 0 0 0 0
-## 5 Treatment 0 0 0 0 0 0 0
-## 6 Treatment evaluation 0 0 0 0 0 0 0
-## 7 Triaga 0 0 0 0 0 0 0
-## 8 Triage 0 0 0 0 0 0 0
-## 9 registration 0 0 0 0 0 0 0
-## # … with abbreviated variable names ¹patient_visit_nr, ²originator, ³complete,
-## # ⁴triagecode, ⁵specialization
+## activity patient_visit_nr originator start complete triagecode specialization
+## <chr> <int> <int> <int> <int> <int> <int>
+## 1 0 0 1 0 0 0 0
+## 2 Clinical… 0 0 1 0 1 0
+## 3 Registra… 0 1 0 0 0 0
+## 4 Trage 0 0 0 0 0 0
+## 5 Treatment 0 0 0 0 0 0
+## 6 Treatmen… 0 0 0 0 0 0
+## 7 Triaga 0 0 0 0 0 0
+## 8 Triage 0 0 0 0 0 0
+## 9 registra… 0 0 0 0 0 0
+## # ℹ 1 more variable: .order <int>
## Relative number of missing values per column (per activity, expressed as percentage):
## # A tibble: 9 × 8
-## activity patient_vi…¹ origi…² start compl…³ triag…⁴ speci…⁵ .order
-## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
-## 1 0 0 1 0 0 0 0 0
-## 2 Clinical exam 0 0 0.111 0 0.111 0 0
-## 3 Registration 0 0.0714 0 0 0 0 0
-## 4 Trage 0 0 0 0 0 0 0
-## 5 Treatment 0 0 0 0 0 0 0
-## 6 Treatment evaluation 0 0 0 0 0 0 0
-## 7 Triaga 0 0 0 0 0 0 0
-## 8 Triage 0 0 0 0 0 0 0
-## 9 registration 0 0 0 0 0 0 0
-## # … with abbreviated variable names ¹patient_visit_nr, ²originator, ³complete,
-## # ⁴triagecode, ⁵specialization
+## activity patient_visit_nr originator start complete triagecode specialization
+## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
+## 1 0 0 1 0 0 0 0
+## 2 Clinical… 0 0 0.111 0 0.111 0
+## 3 Registra… 0 0.0714 0 0 0 0
+## 4 Trage 0 0 0 0 0 0
+## 5 Treatment 0 0 0 0 0 0
+## 6 Treatmen… 0 0 0 0 0 0
+## 7 Triaga 0 0 0 0 0 0
+## 8 Triage 0 0 0 0 0 0
+## 9 registra… 0 0 0 0 0 0
+## # ℹ 1 more variable: .order <dbl>
## Overview of activity log rows which are incomplete:
## # Log of 7 events consisting of:
## 3 traces
@@ -981,14 +974,13 @@ Missing Values
## Timestamps: start, complete
##
## # A tibble: 4 × 8
-## patient_visi…¹ activ…² origi…³ start complete triag…⁴
-## <dbl> <chr> <chr> <dttm> <dttm> <dbl>
-## 1 510 Clinic… Doctor… 2017-11-20 11:35:01 2017-11-20 11:36:09 NA
-## 2 533 0 <NA> 2017-11-22 18:35:00 2017-11-22 18:37:00 7
-## 3 534 Regist… <NA> 2017-11-22 18:35:00 2017-11-22 18:37:00 0
-## 4 512 Clinic… Doctor… NA 2017-11-20 11:33:57 3
-## # … with 2 more variables: specialization <chr>, .order <int>, and abbreviated
-## # variable names ¹patient_visit_nr, ²activity, ³originator, ⁴triagecode
+## patient_visit_nr activity originator start complete
+## <dbl> <chr> <chr> <dttm> <dttm>
+## 1 510 Clinical … Doctor 7 2017-11-20 11:35:01 2017-11-20 11:36:09
+## 2 533 0 <NA> 2017-11-22 18:35:00 2017-11-22 18:37:00
+## 3 534 Registrat… <NA> 2017-11-22 18:35:00 2017-11-22 18:37:00
+## 4 512 Clinical … Doctor 7 NA 2017-11-20 11:33:57
+## # ℹ 3 more variables: triagecode <dbl>, specialization <chr>, .order <int>
hospital_actlog %>%
  detect_missing_values(
level_of_aggregation = "column",
@@ -1013,11 +1005,10 @@ Missing Values
## Timestamps: start, complete
##
## # A tibble: 1 × 8
-## patient_visi…¹ activ…² origi…³ start complete triag…⁴
-## <dbl> <chr> <chr> <dttm> <dttm> <dbl>
-## 1 510 Clinic… Doctor… 2017-11-20 11:35:01 2017-11-20 11:36:09 NA
-## # … with 2 more variables: specialization <chr>, .order <int>, and abbreviated
-## # variable names ¹patient_visit_nr, ²activity, ³originator, ⁴triagecode
+## patient_visit_nr activity originator start complete
+## <dbl> <chr> <chr> <dttm> <dttm>
+## 1 510 Clinical … Doctor 7 2017-11-20 11:35:01 2017-11-20 11:36:09
+## # ℹ 3 more variables: triagecode <dbl>, specialization <chr>, .order <int>
hospital_actlog %>%
  detect_unique_values(column_labels = c("activity", "originator"))
Read more:
## EMPTY EVENT LOG
## # A tibble: 0 × 7
-## # … with 7 variables: handling <fct>, patient <chr>, employee <fct>,
+## # ℹ 7 variables: handling <fct>, patient <chr>, employee <fct>,
## # handling_id <chr>, registration_type <fct>, time <dttm>, .order <int>
just as the following will not work.
patients %>%
  filter("patient" == 1)
## EMPTY EVENT LOG
## # A tibble: 0 × 7
-## # … with 7 variables: handling <fct>, patient <chr>, employee <fct>,
+## # ℹ 7 variables: handling <fct>, patient <chr>, employee <fct>,
## # handling_id <chr>, registration_type <fct>, time <dttm>, .order <int>
In order to successfully do this, we could use the symbol:
%>%
@@ -785,21 +785,21 @@ patients Object classes
## Lifecycle transition: registration_type
##
## # A tibble: 12 × 7
-## handling patient emplo…¹ handl…² regis…³ time .order
-## <fct> <chr> <fct> <chr> <fct> <dttm> <int>
-## 1 Registration 1 r1 1 start 2017-01-02 11:41:53 1
-## 2 Triage and Assess… 1 r2 501 start 2017-01-02 12:40:20 2
-## 3 Blood test 1 r3 1001 start 2017-01-05 08:59:04 3
-## 4 MRI SCAN 1 r4 1238 start 2017-01-05 21:37:12 4
-## 5 Discuss Results 1 r6 1735 start 2017-01-07 07:57:49 5
-## 6 Check-out 1 r7 2230 start 2017-01-09 17:09:43 6
-## 7 Registration 1 r1 1 comple… 2017-01-02 12:40:20 7
-## 8 Triage and Assess… 1 r2 501 comple… 2017-01-02 22:32:25 8
-## 9 Blood test 1 r3 1001 comple… 2017-01-05 14:34:27 9
-## 10 MRI SCAN 1 r4 1238 comple… 2017-01-06 01:54:23 10
-## 11 Discuss Results 1 r6 1735 comple… 2017-01-07 10:18:08 11
-## 12 Check-out 1 r7 2230 comple… 2017-01-09 19:45:45 12
-## # … with abbreviated variable names ¹employee, ²handling_id, ³registration_type
+## handling patient employee handling_id registration_type time
+## <fct> <chr> <fct> <chr> <fct> <dttm>
+## 1 Registrat… 1 r1 1 start 2017-01-02 11:41:53
+## 2 Triage an… 1 r2 501 start 2017-01-02 12:40:20
+## 3 Blood test 1 r3 1001 start 2017-01-05 08:59:04
+## 4 MRI SCAN 1 r4 1238 start 2017-01-05 21:37:12
+## 5 Discuss R… 1 r6 1735 start 2017-01-07 07:57:49
+## 6 Check-out 1 r7 2230 start 2017-01-09 17:09:43
+## 7 Registrat… 1 r1 1 complete 2017-01-02 12:40:20
+## 8 Triage an… 1 r2 501 complete 2017-01-02 22:32:25
+## 9 Blood test 1 r3 1001 complete 2017-01-05 14:34:27
+## 10 MRI SCAN 1 r4 1238 complete 2017-01-06 01:54:23
+## 11 Discuss R… 1 r6 1735 complete 2017-01-07 10:18:08
+## 12 Check-out 1 r7 2230 complete 2017-01-09 19:45:45
+## # ℹ 1 more variable: .order <int>
More on symbols and !!: https://adv-r.hadley.nz/quasiquotation.html
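Why the quoted version returns an empty log can be seen without any tidyverse machinery. The sketch below is a base-R analogue, assuming a small made-up data frame: `as.name()` plays the role of the symbol conversion, and `eval()` looks the symbol up in the data. It illustrates the idea and is not how dplyr or rlang are implemented.

```r
# Made-up data; patient is stored as character, as in the printed logs.
patients_df <- data.frame(
  patient = c("1", "1", "2"),
  handling = c("Registration", "Check-out", "Registration"),
  stringsAsFactors = FALSE
)
column <- "patient"

# A quoted name is just a string: this compares "patient" with 1,
# yielding a single FALSE, so no row is ever selected.
literal_test <- "patient" == 1

# Turning the string into a symbol makes it refer to the column instead.
condition <- call("==", as.name(column), 1)
mask <- eval(condition, patients_df)
matching_rows <- patients_df[mask, ]

print(literal_test)
print(matching_rows)
```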
@@ -822,21 +822,21 @@eventlog()
will perform some checks to make sure the provided mapping corresponds to the data model. This means that each value of the activity_instance_id
should be connected to a single case_id
and a single activity_id
. For larger event logs, this does take some time. You can circumvent these by setting the argument validate = FALSE
.eventlog
or activitylog
object. The activitylog
is a simplified object class that does not require special validation.
bupaR
does not contain a magic slider that can simplify a process_map
. We believe that process maps should always be a transparent visualization of the log, and simplifying should be done by the conscious use of filters. You might consider filter_infrequent_flows()
, filter_trace_frequency()
or filter_activity_frequency()
for this job.
renderProcessMap()
in the Server, and processMapOutput
in the UI of your app. Make sure to use the width
and height
arguments to set proper dimensions for the map.
export_map()
. To do this, note that you should use the argument render = FALSE
in the call to process_map()
. Use the argument title
to add a caption to the image, and width
and height
to adjust the file dimensions.
Note that this is the default process map configuration, and is thus equivalent to the following.
tmp %>%
  process_map()
tmp %>%
  process_map(frequency("absolute-case"))
tmp %>%
  process_map(frequency("relative"))
tmp %>%
  process_map(frequency("relative-case"))
tmp %>%
  process_map(frequency("relative-consequent"))
Read more:
The next piece of code returns the first 10 activity @@ -770,29 +770,29 @@
This is not impacted by a different ordering of the data since it will take the time aspect into account.
%>%
@@ -891,19 +891,19 @@ patients first_n, last_n
## Lifecycle transition: registration_type
##
## # A tibble: 10 × 7
-## handling patient emplo…¹ handl…² regis…³ time .order
-## <fct> <chr> <fct> <chr> <fct> <dttm> <int>
-## 1 Registration 1 r1 1 start 2017-01-02 11:41:53 1
-## 2 Registration 2 r1 2 start 2017-01-02 11:41:53 2
-## 3 Triage and Assess… 1 r2 501 start 2017-01-02 12:40:20 4
-## 4 Registration 1 r1 1 comple… 2017-01-02 12:40:20 6
-## 5 Registration 2 r1 2 comple… 2017-01-02 15:16:38 7
-## 6 Triage and Assess… 2 r2 502 start 2017-01-02 22:32:25 5
-## 7 Triage and Assess… 1 r2 501 comple… 2017-01-02 22:32:25 9
-## 8 Triage and Assess… 2 r2 502 comple… 2017-01-03 12:34:01 10
-## 9 Registration 4 r1 4 start 2017-01-04 01:34:04 3
-## 10 Registration 4 r1 4 comple… 2017-01-04 04:25:06 8
-## # … with abbreviated variable names ¹employee, ²handling_id, ³registration_type
+## handling patient employee handling_id registration_type time
+## <fct> <chr> <fct> <chr> <fct> <dttm>
+## 1 Registrat… 1 r1 1 start 2017-01-02 11:41:53
+## 2 Registrat… 2 r1 2 start 2017-01-02 11:41:53
+## 3 Triage an… 1 r2 501 start 2017-01-02 12:40:20
+## 4 Registrat… 1 r1 1 complete 2017-01-02 12:40:20
+## 5 Registrat… 2 r1 2 complete 2017-01-02 15:16:38
+## 6 Triage an… 2 r2 502 start 2017-01-02 22:32:25
+## 7 Triage an… 1 r2 501 complete 2017-01-02 22:32:25
+## 8 Triage an… 2 r2 502 complete 2017-01-03 12:34:01
+## 9 Registrat… 4 r1 4 start 2017-01-04 01:34:04
+## 10 Registrat… 4 r1 4 complete 2017-01-04 04:25:06
+## # ℹ 1 more variable: .order <int>
In combination with group_by_case
, it is very easy to
select the heads or tails of each case. Below, we explore the 95% most
common first 3 activities in the sepsis
log.
patients %>%
  sample_n(size = 10)
## # Log of 108 events consisting of:
-## 3 traces
+## # Log of 110 events consisting of:
+## 2 traces
## 10 cases
-## 54 instances of 7 activities
+## 55 instances of 7 activities
## 7 resources
-## Events occurred from 2017-03-29 22:12:55 until 2018-05-04 21:50:07
+## Events occurred from 2017-04-01 22:15:15 until 2018-02-25 14:21:06
##
## # Variables were mapped as follows:
## Case identifier: patient
@@ -935,20 +935,21 @@ sample_n
## Timestamp: time
## Lifecycle transition: registration_type
##
-## # A tibble: 108 × 7
-## handling patient employee handling_id regist…¹ time .order
-## <fct> <chr> <fct> <chr> <fct> <dttm> <int>
-## 1 Registration 80 r1 80 start 2017-03-29 22:12:55 1
-## 2 Registration 92 r1 92 start 2017-04-04 17:42:26 2
-## 3 Registration 156 r1 156 start 2017-06-03 10:05:28 3
-## 4 Registration 170 r1 170 start 2017-06-17 15:10:30 4
-## 5 Registration 202 r1 202 start 2017-07-17 03:11:39 5
-## 6 Registration 231 r1 231 start 2017-08-13 19:50:42 6
-## 7 Registration 328 r1 328 start 2017-11-12 04:23:27 7
-## 8 Registration 434 r1 434 start 2018-02-19 02:53:00 8
-## 9 Registration 462 r1 462 start 2018-03-20 07:37:11 9
-## 10 Registration 497 r1 497 start 2018-04-30 09:42:11 10
-## # … with 98 more rows, and abbreviated variable name ¹registration_type
+## # A tibble: 110 × 7
+## handling patient employee handling_id registration_type time
+## <fct> <chr> <fct> <chr> <fct> <dttm>
+## 1 Registrat… 83 r1 83 start 2017-04-01 22:15:15
+## 2 Registrat… 124 r1 124 start 2017-05-03 12:50:34
+## 3 Registrat… 149 r1 149 start 2017-05-26 15:01:49
+## 4 Registrat… 206 r1 206 start 2017-07-19 15:48:14
+## 5 Registrat… 239 r1 239 start 2017-08-20 03:17:18
+## 6 Registrat… 257 r1 257 start 2017-09-12 23:14:23
+## 7 Registrat… 295 r1 295 start 2017-10-14 00:21:58
+## 8 Registrat… 298 r1 298 start 2017-10-15 18:31:02
+## 9 Registrat… 430 r1 430 start 2018-02-17 15:44:17
+## 10 Registrat… 434 r1 434 start 2018-02-19 02:53:00
+## # ℹ 100 more rows
+## # ℹ 1 more variable: .order <int>
Note that this function can also be used with a sample size bigger than the number of cases in the event log, if you allow for replacement of drawn cases.
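The role of replacement can be illustrated with base R's `sample()` on a made-up vector of case identifiers. This only sketches the sampling idea at the case level and says nothing about how the event-log method is implemented.

```r
set.seed(1)  # only to make the sketch reproducible

case_ids <- c("1", "2", "3")

# Drawing more cases than exist works when drawn cases are put back.
oversample <- sample(case_ids, size = 5, replace = TRUE)

# Without replacement, base R refuses to draw more items than available.
too_many <- tryCatch(sample(case_ids, size = 5), error = function(e) NULL)

print(oversample)
print(is.null(too_many))
```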
diff --git a/index.html b/index.html index c6a9f45..e9e5c6c 100644 --- a/index.html +++ b/index.html @@ -13,21 +13,22 @@
patients %>% cases()
## # A tibble: 500 × 10
-## patient trace…¹ numbe…² start_timestamp complete_timestamp trace trace…³
-## <chr> <int> <int> <dttm> <dttm> <chr> <dbl>
-## 1 1 6 6 2017-01-02 11:41:53 2017-01-09 19:45:45 Regi… 4
-## 2 10 5 5 2017-01-06 05:58:54 2017-01-10 15:41:59 Regi… 7
-## 3 100 5 5 2017-04-11 16:34:31 2017-04-22 09:58:07 Regi… 7
-## 4 101 5 5 2017-04-16 06:38:58 2017-04-23 02:55:23 Regi… 7
-## 5 102 5 5 2017-04-16 06:38:58 2017-04-22 10:50:04 Regi… 7
-## 6 103 6 6 2017-04-19 20:22:01 2017-04-23 02:36:55 Regi… 4
-## 7 104 6 6 2017-04-19 20:22:01 2017-04-23 02:07:20 Regi… 4
-## 8 105 6 6 2017-04-21 02:19:09 2017-04-27 01:09:05 Regi… 4
-## 9 106 6 6 2017-04-21 02:19:09 2017-05-01 09:54:39 Regi… 4
-## 10 107 5 5 2017-04-22 18:32:16 2017-04-27 02:45:57 Regi… 7
-## # … with 490 more rows, 3 more variables: duration <drtn>,
-## # first_activity <fct>, last_activity <fct>, and abbreviated variable names
-## # ¹trace_length, ²number_of_activities, ³trace_id
+## patient trace_length number_of_activities start_timestamp
+## <chr> <int> <int> <dttm>
+## 1 1 6 6 2017-01-02 11:41:53
+## 2 10 5 5 2017-01-06 05:58:54
+## 3 100 5 5 2017-04-11 16:34:31
+## 4 101 5 5 2017-04-16 06:38:58
+## 5 102 5 5 2017-04-16 06:38:58
+## 6 103 6 6 2017-04-19 20:22:01
+## 7 104 6 6 2017-04-19 20:22:01
+## 8 105 6 6 2017-04-21 02:19:09
+## 9 106 6 6 2017-04-21 02:19:09
+## 10 107 5 5 2017-04-22 18:32:16
+## # ℹ 490 more rows
+## # ℹ 6 more variables: complete_timestamp <dttm>, trace <chr>, trace_id <dbl>,
+## # duration <drtn>, first_activity <fct>, last_activity <fct>
patients %>% traces()
## # A tibble: 7 × 3
-## trace absol…¹ relat…²
-## <chr> <int> <dbl>
-## 1 Registration,Triage and Assessment,X-Ray,Discuss Results,Chec… 258 0.516
-## 2 Registration,Triage and Assessment,Blood test,MRI SCAN,Discus… 234 0.468
-## 3 Registration,Triage and Assessment,Blood test,MRI SCAN,Discus… 2 0.004
-## 4 Registration,Triage and Assessment,X-Ray 2 0.004
-## 5 Registration,Triage and Assessment 2 0.004
-## 6 Registration,Triage and Assessment,X-Ray,Discuss Results 1 0.002
-## 7 Registration,Triage and Assessment,Blood test 1 0.002
-## # … with abbreviated variable names ¹absolute_frequency, ²relative_frequency
+## trace absolute_frequency relative_frequency
+## <chr> <int> <dbl>
+## 1 Registration,Triage and Assessment,X-Ra… 258 0.516
+## 2 Registration,Triage and Assessment,Bloo… 234 0.468
+## 3 Registration,Triage and Assessment,Bloo… 2 0.004
+## 4 Registration,Triage and Assessment,X-Ray 2 0.004
+## 5 Registration,Triage and Assessment 2 0.004
+## 6 Registration,Triage and Assessment,X-Ra… 1 0.002
+## 7 Registration,Triage and Assessment,Bloo… 1 0.002
Before continuing to further analyses, note that you might want to
ungroup the log using ungroup_eventlog()
. More on grouping.
+patients %>%
+  resource_map()
A more compact representation of hand-over-of-work is given by the
resource_matrix
function, which works the same as the
-precedence matrix
functions.
-patients %>%
-  resource_matrix() %>%
-  plot()
process matrix
functions.
+patients %>%
+  resource_matrix() %>%
+  plot()
The metrics for exploring and describing event data which are -available are based on literature in the field of operational excellence -and are organized in the following (sub)categories
-Three different time metrics can be computed:
The idle time is the time that there is no activity in a case or for a resource. It can only be calculated when there are both start and end timestamps available for activity instances. It can be computed at the @@ -699,27 +689,27 @@
patients %>%
  idle_time("resource", units = "days") %>%
  plot()
The processing time can be computed at the levels log, trace, case, activity and resource-activity. It can only be calculated when there are both start and end timestamps available for activity instances.
patients %>%
  processing_time("activity") %>%
  plot()
The throughput time is the time from the very first event to the last event of a case. The levels at which it can be computed are log, trace, or case.
patients %>%
  throughput_time("log") %>%
  plot()
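The definition of throughput time at case level can be made concrete with a few made-up events in base R. The data frame and column names below are illustrative assumptions; the sketch simply takes the difference between the last and first timestamp per case.

```r
# Made-up events for two cases.
events <- data.frame(
  patient = c("1", "1", "1", "2", "2"),
  time = as.POSIXct(
    c("2017-01-02 11:00:00", "2017-01-02 13:00:00", "2017-01-03 11:00:00",
      "2017-01-05 08:00:00", "2017-01-05 20:00:00"),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

# Throughput time per case: last event minus first event, in hours.
throughput_hours <- vapply(
  split(events$time, events$patient),
  function(times) as.numeric(difftime(max(times), min(times), units = "hours")),
  numeric(1)
)
print(throughput_hours)
```

Aggregating these per-case values (for example with `summary()`) gives the kind of log-level overview that the plot above visualizes.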
There are three different parameters specific to the
performance()
configuration: the aggregation function, the
time units, and the flow time type.
patients %>%
  process_map(performance(FUN = max))
Any function that takes a numerical vector and returns a single value can be used. For example, let’s say we want to show the 0.90 percentile.
@@ -686,8 +686,8 @@Note that the ...
is mandatory as
process_map()
will automatically add na.rm = T
to the aggregation function call.
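A custom aggregation function that accepts `...` could look like the sketch below. The name `p90` is hypothetical; the point is that extra arguments such as `na.rm` are passed through instead of causing an "unused argument" error.

```r
# Hypothetical helper: 90th percentile that forwards extra arguments.
p90 <- function(x, ...) {
  quantile(x, probs = 0.9, ...)
}

# Works when na.rm is added by the caller.
with_dots <- p90(c(1:10, NA), na.rm = TRUE)

# A version without `...` fails as soon as na.rm is supplied.
p90_strict <- function(x) quantile(x, probs = 0.9)
failed <- inherits(try(p90_strict(1:10, na.rm = TRUE), silent = TRUE), "try-error")

print(with_dots)
print(failed)
```

Such a function would then be supplied through the `FUN` argument of `performance()`, in the same way as `max` or `mean` in the surrounding examples.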
patients %>%
  process_map(performance(mean, "days"))
patients %>%
  process_map(performance(mean, "hours"))
df <- prepare_examples(traffic_fines, task = "outcome")
df
## # A tibble: 34,724 × 11
-## ith_case case_id prefix prefi…¹ outcome k activ…² resou…³
-## <int> <chr> <chr> <list> <fct> <dbl> <chr> <fct>
-## 1 1 A2127 Create Fine <chr> Payment 0 Create… 537
-## 2 1 A2127 Create Fine - Payment <chr> Payment 1 Payment <NA>
-## 3 2 A15 Create Fine <chr> Send f… 0 Create… 561
-## 4 2 A15 Create Fine - Send Fi… <chr> Send f… 1 Send F… <NA>
-## 5 2 A15 Create Fine - Send Fi… <chr> Send f… 2 Insert… <NA>
-## 6 2 A15 Create Fine - Send Fi… <chr> Send f… 3 Add pe… <NA>
-## 7 2 A15 Create Fine - Send Fi… <chr> Send f… 4 Send f… <NA>
-## 8 3 A1820 Create Fine <chr> Payment 0 Create… 563
-## 9 3 A1820 Create Fine - Payment <chr> Payment 1 Payment <NA>
-## 10 4 A22 Create Fine <chr> Payment 0 Create… 561
-## # … with 34,714 more rows, 3 more variables: start_time <dttm>,
-## # end_time <dttm>, remaining_trace_list <list>, and abbreviated variable
-## # names ¹prefix_list, ²activity, ³resource
+## ith_case case_id prefix prefix_list outcome k activity resource
+## <int> <chr> <chr> <list> <fct> <dbl> <chr> <fct>
+## 1 1 A2127 Create Fine <chr [1]> Payment 0 Create … 537
+## 2 1 A2127 Create Fine - P… <chr [2]> Payment 1 Payment <NA>
+## 3 2 A15 Create Fine <chr [1]> Send f… 0 Create … 561
+## 4 2 A15 Create Fine - S… <chr [2]> Send f… 1 Send Fi… <NA>
+## 5 2 A15 Create Fine - S… <chr [3]> Send f… 2 Insert … <NA>
+## 6 2 A15 Create Fine - S… <chr [4]> Send f… 3 Add pen… <NA>
+## 7 2 A15 Create Fine - S… <chr [5]> Send f… 4 Send fo… <NA>
+## 8 3 A1820 Create Fine <chr [1]> Payment 0 Create … 563
+## 9 3 A1820 Create Fine - P… <chr [2]> Payment 1 Payment <NA>
+## 10 4 A22 Create Fine <chr [1]> Payment 0 Create … 561
+## # ℹ 34,714 more rows
+## # ℹ 3 more variables: start_time <dttm>, end_time <dttm>,
+## # remaining_trace_list <list>
We split the transformed dataset df into train and test sets for later use in fit() and predict(), respectively. The proportion of the train set is configured with the
@@ -750,28 +750,26 @@
split <- df %>% split_train_test(split = 0.8)
split$train_df %>% head(5)
## # A tibble: 5 × 11
-## ith_case case_id prefix prefi…¹ outcome k activ…² resou…³
-## <int> <chr> <chr> <list> <fct> <dbl> <chr> <fct>
-## 1 1 A2127 Create Fine <chr> Payment 0 Create… 537
-## 2 1 A2127 Create Fine - Payment <chr> Payment 1 Payment <NA>
-## 3 2 A15 Create Fine <chr> Send f… 0 Create… 561
-## 4 2 A15 Create Fine - Send Fine <chr> Send f… 1 Send F… <NA>
-## 5 2 A15 Create Fine - Send Fin… <chr> Send f… 2 Insert… <NA>
-## # … with 3 more variables: start_time <dttm>, end_time <dttm>,
-## # remaining_trace_list <list>, and abbreviated variable names ¹prefix_list,
-## # ²activity, ³resource
+## ith_case case_id prefix prefix_list outcome k activity resource
+## <int> <chr> <chr> <list> <fct> <dbl> <chr> <fct>
+## 1 1 A2127 Create Fine <chr [1]> Payment 0 Create … 537
+## 2 1 A2127 Create Fine - Pa… <chr [2]> Payment 1 Payment <NA>
+## 3 2 A15 Create Fine <chr [1]> Send f… 0 Create … 561
+## 4 2 A15 Create Fine - Se… <chr [2]> Send f… 1 Send Fi… <NA>
+## 5 2 A15 Create Fine - Se… <chr [3]> Send f… 2 Insert … <NA>
+## # ℹ 3 more variables: start_time <dttm>, end_time <dttm>,
+## # remaining_trace_list <list>
split$test_df %>% head(5)
## # A tibble: 5 × 11
-## ith_case case_id prefix prefix_…¹ outcome k activ…² resou…³
-## <int> <chr> <chr> <list> <fct> <dbl> <chr> <fct>
-## 1 8001 A24869 Create Fine <chr [1]> Payment 0 Create… 559
-## 2 8001 A24869 Create Fine - Payment <chr [2]> Payment 1 Payment <NA>
-## 3 8002 A24871 Create Fine <chr [1]> Payment 0 Create… 559
-## 4 8002 A24871 Create Fine - Payment <chr [2]> Payment 1 Payment <NA>
-## 5 8003 A24872 Create Fine <chr [1]> Send f… 0 Create… 559
-## # … with 3 more variables: start_time <dttm>, end_time <dttm>,
-## # remaining_trace_list <list>, and abbreviated variable names ¹prefix_list,
-## # ²activity, ³resource
+## ith_case case_id prefix prefix_list outcome k activity resource
+## <int> <chr> <chr> <list> <fct> <dbl> <chr> <fct>
+## 1 8001 A24869 Create Fine <chr [1]> Payment 0 Create … 559
+## 2 8001 A24869 Create Fine - Pa… <chr [2]> Payment 1 Payment <NA>
+## 3 8002 A24871 Create Fine <chr [1]> Payment 0 Create … 559
+## 4 8002 A24871 Create Fine - Pa… <chr [2]> Payment 1 Payment <NA>
+## 5 8003 A24872 Create Fine <chr [1]> Send f… 0 Create … 559
+## # ℹ 3 more variables: start_time <dttm>, end_time <dttm>,
+## # remaining_trace_list <list>
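As a rough illustration of a train/test split that keeps all rows of a case together and orders cases by time, the base R sketch below splits a toy prefix table. The helper `split_by_case()`, the toy data, and the choice to order cases by their first `start_time` are assumptions for illustration only; this is not the implementation of `split_train_test()`.

```r
# Toy prefix data (hypothetical): several rows per case, one start time each
df_toy <- data.frame(
  case_id = c("c1", "c1", "c2", "c2", "c2", "c3", "c4", "c4", "c5"),
  start_time = as.POSIXct(
    c("2024-01-01", "2024-01-01", "2024-01-02", "2024-01-02", "2024-01-02",
      "2024-01-03", "2024-01-04", "2024-01-04", "2024-01-05"),
    tz = "UTC"
  )
)

# Hypothetical helper: assign whole cases to train or test, earliest first
split_by_case <- function(data, split = 0.8) {
  # First start time per case (assumed ordering criterion)
  first_start <- tapply(as.numeric(data$start_time), data$case_id, min)
  ordered_cases <- names(first_start)[order(first_start)]

  # The earliest share of cases goes to the train set
  n_train <- floor(length(ordered_cases) * split)
  train_cases <- ordered_cases[seq_len(n_train)]

  in_train <- data$case_id %in% train_cases
  list(
    train_df = data[in_train, , drop = FALSE],
    test_df = data[!in_train, , drop = FALSE]
  )
}

parts <- split_by_case(df_toy, split = 0.8)
unique(parts$train_df$case_id)  # the four earliest cases
unique(parts$test_df$case_id)   # the latest case
```

Splitting on case identifiers rather than on rows avoids having prefixes of the same case in both sets.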
It’s important to note that the split is done at case level (a case is fully part of either the train data or the test data). Furthermore, the split is done chronologically, meaning that the train
diff --git a/process_matrix.html b/process_matrix.html index d8c1117..7b317b4 100644 --- a/process_matrix.html +++ b/process_matrix.html @@ -13,14 +13,14 @@