Skip to content

Latest commit

 

History

History
37 lines (31 loc) · 1.22 KB

rule_base_algorithm.md

File metadata and controls

37 lines (31 loc) · 1.22 KB

Algorithm: Rule Base Data Cleaning & Flagging in Database

Input

  • Q: A query to identify data for processing.
  • D: The database containing the data.
  • t: A tuple or set of tuples identified as potentially erroneous.

Output

  • A list of correction and flagging actions (CorrectionEdits and FlaggingEdits).

Initialization

  • CorrectionEdits = ∅: Initialize an empty set for correction edits.
  • FlaggingEdits = ∅: Initialize an empty set for flagging edits.
  • S = infer_issues(t, Q, D): Initialize a set S containing tuples suspected of being erroneous.

Procedure

  1. while S ≠ ∅ do
  2. foreach tuple r in S do
    • Apply data cleaning rules to r.
  3. if is_corrected(r) then
    • CorrectionEdits ← r+
    • Update r in database D.
  4. else
    • FlaggingEdits ← r−
    • Flag r in database D.
  5. Remove r from S.
  6. if S ≠ ∅ then
    • r_most_common = MostFrequentTuple(S)
    • if is_corrected(r_most_common) then
      • S = {s \ {r_most_common} | s ∈ S}
      • CorrectionEdits ← r_most_common+
    • else
      • Remove from S all sets that contain r_most_common.
      • FlaggingEdits ← r_most_common−
  7. return (CorrectionEdits, FlaggingEdits)