Add Synthetic CrossCat datasets #175

srvasude · 2024-08-20T17:05:42Z

Add four synthetic CrossCat datasets for each of the data types. I will add unit tests later that verify we get the appropriate clusterings with these.

ThomasColthurst · 2024-08-20T17:47:02Z

cxx/assets/test_files/test_crosscat_bernoulli.schema

@@ -0,0 +1,4 @@
+col1 ~ bernoulli(id)


Doesn't the relation name here have to match the second column of the .obs file? Here it is "col1" but the .obs file uses "has_col1".

Yeah, this was a bad push. Fixed here and elsewhere.
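A mismatch like the one flagged in this thread could be caught mechanically before review. The following is only a minimal sketch: it assumes schema lines have the shape `<relation> ~ <distribution>(<domains>)` as in the hunk above, and that the .obs file is comma-separated with the relation name in the second column. The function names are hypothetical and not part of this repository.

```python
import re

# Assumed schema line shape: "<relation> ~ <distribution>(<domains>)".
SCHEMA_LINE = re.compile(r"^\s*(\w+)\s*~")


def schema_relations(schema_text):
    """Return the set of relation names declared in a schema."""
    names = set()
    for line in schema_text.splitlines():
        match = SCHEMA_LINE.match(line)
        if match:
            names.add(match.group(1))
    return names


def obs_relations(obs_text, delimiter=","):
    """Return the relation names found in the second column of an .obs file.

    The comma delimiter is an assumption; only the column position comes
    from the review comment.
    """
    names = set()
    for line in obs_text.splitlines():
        fields = [field.strip() for field in line.split(delimiter)]
        if len(fields) >= 2 and fields[1]:
            names.add(fields[1])
    return names


def mismatched_relations(schema_text, obs_text):
    """Return (names only in the schema, names only in the .obs file)."""
    declared = schema_relations(schema_text)
    observed = obs_relations(obs_text)
    return sorted(declared - observed), sorted(observed - declared)
```

With the names from this thread, a schema declaring `col1` and an .obs file using `has_col1` would be reported as one name missing on each side; two empty lists mean the files agree.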

ThomasColthurst · 2024-08-20T17:48:51Z

cxx/assets/test_files/test_crosscat_categorical.schema

@@ -0,0 +1,4 @@
+col1 ~ stringcat[strings="a:b:c:d",delim=:](id)


In addition to the "has_" issue, there also appears to be an off-by-one error. Here, "col1" is the one that has a/b/c/d values, but it is "has_col0" in the .obs file that has those values.

Similarly for the other relations.

Fixed as well.
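An off-by-one between relation names and their values, as described in this thread, shows up as observed values that fall outside the vocabulary declared for a relation. A rough sketch of such a check follows. It assumes the `stringcat[strings="...",delim=X]` shape shown in the hunk above, a comma-separated .obs file, and that the observed value sits in the first column (the review comment only confirms that the relation name is in the second). All function names are hypothetical.

```python
import re

# Assumed spec shape, taken from the line under review:
#   col1 ~ stringcat[strings="a:b:c:d",delim=:](id)
STRINGCAT = re.compile(
    r'^\s*(\w+)\s*~\s*stringcat\[strings="([^"]*)",delim=(.)\]'
)


def stringcat_vocab(schema_text):
    """Map each stringcat relation name to its set of allowed strings."""
    vocab = {}
    for line in schema_text.splitlines():
        match = STRINGCAT.match(line)
        if match:
            name, strings, delim = match.groups()
            vocab[name] = set(strings.split(delim))
    return vocab


def out_of_vocab(schema_text, obs_text, delimiter=","):
    """Return sorted (relation, value) pairs whose value is not declared.

    Assumes column 1 holds the value and column 2 the relation name.
    """
    vocab = stringcat_vocab(schema_text)
    bad = set()
    for line in obs_text.splitlines():
        fields = [field.strip() for field in line.split(delimiter)]
        if len(fields) < 2:
            continue
        value, relation = fields[0], fields[1]
        if relation in vocab and value not in vocab[relation]:
            bad.add((relation, value))
    return sorted(bad)
```

An empty result means every observed value of a stringcat relation is one of its declared strings; relations with other distributions are ignored by this check.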

ThomasColthurst · 2024-08-20T17:52:08Z

cxx/assets/test_files/test_crosscat_generator.py

+NUM_SAMPLES_1 = 33
+NUM_SAMPLES_2 = 50
+NUM_SAMPLES = 100
+


What do you think about adding some tests for this program?

I'm adding integration tests in a later PR that will test the files (checking some basic invariants, such as creating two IRMs).

Add crosscat datasets

d859c3f

srvasude requested a review from ThomasColthurst August 20, 2024 17:05

ThomasColthurst reviewed Aug 20, 2024

srvasude added 2 commits August 20, 2024 11:22

Fix datasets

b8b6832

fix files

3fb6b45

ThomasColthurst approved these changes Aug 20, 2024

srvasude merged commit f7a39b8 into master Aug 21, 2024
2 checks passed