Algorithm: Rule-Based Data Cleaning & Flagging in Database
Input
- Q: A query to identify data for processing.
- D: The database containing the data.
- t: A tuple or set of tuples identified as potentially erroneous.
Output
- A list of correction and flagging actions (CorrectionEdits and FlaggingEdits).
Initialization
- CorrectionEdits = ∅: Initialize an empty set for correction edits.
- FlaggingEdits = ∅: Initialize an empty set for flagging edits.
- S = infer_issues(t, Q, D): Initialize a set S containing tuples suspected of being erroneous.
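The initialization can be sketched in Python as follows. The text does not define infer_issues, so the version here is a hypothetical stand-in: it assumes D is an in-memory dict of rows keyed by id, Q is a predicate over rows, and t is a set of suspected row ids. None of these representations come from the original.

```python
from typing import Callable

Row = dict  # assumed row representation: column name -> value


def infer_issues(t: set, Q: Callable[[Row], bool], D: dict) -> set:
    """Hypothetical stand-in: keep the suspected ids in t that exist in D and that Q selects."""
    return {row_id for row_id in t if row_id in D and Q(D[row_id])}


def select_all(row: Row) -> bool:
    """Assumed query Q: select every row for processing."""
    return True


# Toy database (assumption): row id -> row
D = {
    1: {"name": "Ada", "age": 36},
    2: {"name": "Bob", "age": -5},
    3: {"name": "", "age": 41},
}
t = {2, 3, 99}  # 99 is not present in D, so it is dropped

correction_edits: set = set()  # CorrectionEdits = ∅
flagging_edits: set = set()    # FlaggingEdits = ∅
S = infer_issues(t, select_all, D)
```

In this sketch S holds row ids rather than full tuples, which keeps the example small; a real implementation would more likely run Q against the database itself.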
Procedure
- while S ≠ ∅ do
  - foreach tuple r in S do
    - Apply data cleaning rules to r.
    - if is_corrected(r) then
      - CorrectionEdits ← r+
      - Update r in database D.
    - else
      - FlaggingEdits ← r−
      - Flag r in database D.
    - Remove r from S.
  - if S ≠ ∅ then
    - r_most_common = MostFrequentTuple(S)
    - if is_corrected(r_most_common) then
      - S = {s \ {r_most_common} | s ∈ S}
      - CorrectionEdits ← r_most_common+
    - else
      - Remove from S all sets that contain r_most_common.
      - FlaggingEdits ← r_most_common−
- return (CorrectionEdits, FlaggingEdits)
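The two phases of the procedure can be sketched in Python as below. This is an illustration under several assumptions, not the algorithm's reference implementation: rows are (id, age) tuples, a rule returns a repaired row or None, "flagging in D" is modeled with a separate set of ids, and MostFrequentTuple is taken to mean the tuple that appears in the most sets of S. The second phase treats S as a collection of sets of tuples, as the expression s ∈ S suggests. The names clean_pass, resolve_most_common, and fix_negative_age are hypothetical.

```python
from collections import Counter


def fix_negative_age(row):
    """Hypothetical rule: treat a negative age as a sign error."""
    row_id, age = row
    if isinstance(age, int) and age < 0:
        return (row_id, -age)
    return None  # this rule cannot repair the row


def clean_pass(S, D, flagged, rules):
    """Per-tuple phase: repair each suspect tuple if some rule applies, else flag it."""
    correction_edits, flagging_edits = [], []
    for r in list(S):  # iterate over a copy because S shrinks inside the loop
        repaired = None
        for rule in rules:
            repaired = rule(r)
            if repaired is not None:
                break
        if repaired is not None:                    # plays the role of is_corrected(r)
            correction_edits.append((r, repaired))  # r+
            D[r[0]] = repaired                      # update r in D
        else:
            flagging_edits.append(r)                # r−
            flagged.add(r[0])                       # flag r in D
        S.remove(r)
    return correction_edits, flagging_edits


def most_frequent_tuple(S):
    """Assumed meaning of MostFrequentTuple: the tuple contained in the most sets."""
    counts = Counter(r for s in S for r in s)
    return counts.most_common(1)[0][0]


def resolve_most_common(S, is_corrected):
    """Set-level phase: strip a corrected tuple from every set, or drop the sets containing it."""
    r = most_frequent_tuple(S)
    if is_corrected(r):
        remaining = [s - {r} for s in S]          # S = {s \ {r} | s ∈ S}
        action = ("correct", r)
    else:
        remaining = [s for s in S if r not in s]  # remove all sets that contain r
        action = ("flag", r)
    # Added assumption: empty sets are discarded so that S can eventually become empty.
    return [s for s in remaining if s], action


# Per-tuple phase on a toy database
D = {1: (1, 36), 2: (2, -5), 3: (3, None)}
S = {(2, -5), (3, None)}
flagged = set()
corrections, flags = clean_pass(S, D, flagged, [fix_negative_age])

# Set-level phase on a toy collection of sets
groups = [{(2, 5), (3, None)}, {(2, 5), (4, 17)}, {(5, 200)}]
remaining_flag, action_flag = resolve_most_common(groups, lambda r: False)
remaining_fix, action_fix = resolve_most_common(groups, lambda r: True)
```

Edits are collected in lists here so their order is visible; the pseudocode's sets would work equally well. Note that the per-tuple loop empties S by itself, so how the two phases interleave in practice depends on details the pseudocode leaves open.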