Unique key detector tool #6900

simonaubertbd · 2024-09-21T07:05:24Z

What's your use case?
More than often, you have to deal with a dataset without knowing what's make a row unique. This can lead to misinterpret the data, cartesian product at join and other funny stuff.

What's your proposed solution?
This is a feature I haven't seen in any data prepation/etl. The core feature is to detect the unique key in a dataframe.

How do I imagine that ?

Entry; one dataframe, ability to select fields or check all, ability to specify a max number of field for combination (empty or 0=no max).
Algo : it tests the count distinct every combination of field versus the count of rows

Result : one row by field combination that works. If no result : "no field combination is unique. check for duplicate or need for aggregation upstream".

ex :

order_id	line_id	amount	customer	site
1	1	100	A	U_250
1	2	12	A	U_250
1	3	45	A	U_250
2	1	75	A	U_250
2	2	12	A	U_250
3	1	15	B	U_250
4	1	45	B	U_251

The user will select every field but excluding Amount (he knows that Amount would have no sense in key)

The algo will test the following key
-each separate field
-each combination of two fields
-each combination of three fields
-each combination of four fields

to match the number of row (7)
And gives something like that

choice	number of fields	field combination
very good	2	order_id,line_id
average	3	order_id,line_id, customer
average	3	order_id,line_id, site
bad	4	order_id,line_id, site, customer
…	…	….

Are there any alternative solutions?
N/A

Best regards,

Simon

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unique key detector tool #6900

Unique key detector tool #6900

simonaubertbd commented Sep 21, 2024

Unique key detector tool #6900

Unique key detector tool #6900

Comments

simonaubertbd commented Sep 21, 2024