Copied (and adapted) existing k-means test cases to new k-means #3

fschopp · 2012-09-24T23:57:30Z

This is still work in progress. Pull request opened to facilitate discussion. Please only merge when requested.

fschopp · 2012-09-24T23:59:16Z

@huor It seems that dataset.xml and kmeans.xml (and probably the casespecs of other modules) contain redundant information. Is it really necessary to list a method in dataset.xml?

fschopp · 2012-09-25T00:07:21Z

@huor Are there assumptions about the names in dataset.xml, algorithms.xml, and the casespecs (such as kmeans.xml)? For instance, is a test_suite assumed to be named <method>_<name>?

huor · 2012-09-25T02:11:59Z

The algorithms/methods defined in algorithms.xml, and datasets defined in dataset.xml will be used in kmeans.xml.
The algorithms/methods defined in algorithms.xml can also be used in dataset.xml.

huor · 2012-09-25T02:15:07Z

The information in dataset.xml and kmeans.xml is not redundant. Each dataset defined in dataset.xml is a kind of algorithm-parameter combination, which makes it easier for us to write cases in kmeans.xml.

fschopp · 2012-09-25T22:54:49Z

So, you are saying that, e.g., the fact that test suite kmeans_cset_negative_src_relation contains init_cset_rel and that dataset.xml contains the same parameter is an oversight and not necessary?

fschopp · 2012-09-25T22:57:28Z

Regarding the naming conventions: Yes, I see that some names are references. But I am wondering whether you also make the assumption that names are composed of different substrings. Do you ever concatenate strings and then assume that there is a certain algorithm/test suite/method with that composed name?

geeg · 2012-09-27T23:17:52Z

testspec/casespec/kmeans.xml

+                </list_parameter>
+                <list_parameter>
+                    <name>dist_metric</name>
+                    <value>squared_dist_norm1</value><value>squared_dist_norm2</value><value>squared_angle</value><value>squared_tanimoto</value>


I found that I had to explicitly include the madlib schema, e.g.: madlib.squared_dist_norm1
(A similar change had to be made in the algorithmspec.xml file for madlib.avg.)
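The schema qualification described above can be applied mechanically to a list of function names. This is a minimal sketch only: `qualify` is a hypothetical helper, not part of MADmark, and it assumes that a name containing a dot is already schema-qualified.

```python
def qualify(name, schema="madlib"):
    """Return `name` prefixed with `schema` unless it is already qualified."""
    # Hypothetical helper for illustration; not part of MADmark.
    # Assumption: any name that contains a dot already carries a schema.
    if "." in name:
        return name
    return f"{schema}.{name}"


# The four distance metrics listed in the kmeans.xml excerpt above.
metrics = [
    "squared_dist_norm1",
    "squared_dist_norm2",
    "squared_angle",
    "squared_tanimoto",
]
qualified = [qualify(m) for m in metrics]
print(qualified)
```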

geeg · 2012-09-27T23:41:37Z

The MADmark Installation Guide that Jiali sent explains everything pretty well. Here's a summary of how to run select test cases:

After updating the 3 XML files accordingly, run 'cd $MADMARK_HOME/bin; python run.py -g' to generate all your test cases. So, if in kmeans.xml, you have a test_suite named "kmeans_baseline" and you have a total of 4 combinations of parameters to test, you'll be generating four case files in $MADMARK_HOME/testcase:
kmeans_baseline_0_0
kmeans_baseline_0_1
kmeans_baseline_0_2
kmeans_baseline_0_3
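The mapping from parameter combinations to case names can be sketched as follows. This is an illustration, not MADmark's generator: `expand_cases` is a hypothetical function, the naming pattern is inferred only from the four names above, and the meaning of the middle index (always 0 in the example) is not explained in this thread, so it is treated here as a fixed value.

```python
from itertools import product


def expand_cases(suite, list_parameters, group=0):
    """Enumerate every parameter combination and give each one a case name.

    Assumption: names follow <suite>_<group>_<index>, as in the example above.
    """
    names = list(list_parameters)  # keep the declaration order
    combos = product(*(list_parameters[n] for n in names))
    return [
        (f"{suite}_{group}_{i}", dict(zip(names, combo)))
        for i, combo in enumerate(combos)
    ]


# One list parameter with four values yields four cases.
cases = expand_cases(
    "kmeans_baseline",
    {
        "dist_metric": [
            "madlib.squared_dist_norm1",
            "madlib.squared_dist_norm2",
            "madlib.squared_angle",
            "madlib.squared_tanimoto",
        ]
    },
)
for case_name, params in cases:
    print(case_name, params)
```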

To run select test cases:

1. Create a .yaml file in $MADMARK_HOME/schedule, e.g. $MADMARK_HOME/schedule/example.yaml
2. Add relevant case files that you want to run, e.g. (in example.yaml):
       cases : case_example
       platform : GPDB42
3. In the case file (which should also be in the $MADMARK_HOME/schedule directory), include whichever test cases you want to run, e.g. (in case_example):
       kmeans_baseline_0_0
       kmeans_baseline_0_1
       kmeans_baseline_0_2
       kmeans_baseline_0_3
4. Run 'cd $MADMARK_HOME/bin; python run.py -s example.yaml'
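The two files described in the steps above could also be written by a small script. This is a sketch under assumptions: `write_schedule` is a hypothetical helper, and it writes into a temporary directory rather than $MADMARK_HOME/schedule. The file contents mirror the example in this comment.

```python
import os
import tempfile


def write_schedule(schedule_dir, schedule_name, case_file, platform, cases):
    """Write a schedule file and the case file it refers to.

    Hypothetical helper; the file layout follows the example above.
    """
    os.makedirs(schedule_dir, exist_ok=True)

    schedule_path = os.path.join(schedule_dir, schedule_name)
    with open(schedule_path, "w") as f:
        f.write(f"cases : {case_file}\n")
        f.write(f"platform : {platform}\n")

    # The case file lists one test case name per line.
    case_path = os.path.join(schedule_dir, case_file)
    with open(case_path, "w") as f:
        f.write("\n".join(cases) + "\n")

    return schedule_path, case_path


# A temporary directory stands in for $MADMARK_HOME/schedule.
demo_dir = tempfile.mkdtemp()
schedule_path, case_path = write_schedule(
    demo_dir,
    "example.yaml",
    "case_example",
    "GPDB42",
    [f"kmeans_baseline_0_{i}" for i in range(4)],
)
print(open(schedule_path).read())
print(open(case_path).read())
```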


geeg · 2012-10-01T23:29:40Z

testspec/metadata/algorithmspec.xml

+                    , {src_col_data}         -- expr_point
+                    , (SELECT centroids FROM {table_name}) -- centroids
+                    , {dist_metric}          -- fn_dist
+                    );


I think we still need the "DROP TABLE {table_name};" in order to be able to run different tests using the same dataset (otherwise, after using the dataset table_name once, all other tests that use the same dataset ERROR out since the table_name already exists).

fschopp · 2012-10-01

I think a teardown section should be used for that. This is what I did.
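The effect of a teardown step can be illustrated with a small stand-in. This sketch uses Python's built-in sqlite3 module instead of Greenplum, and `run_case` is a hypothetical function rather than MADmark's test runner. It only shows why dropping the table after each case lets several cases reuse the same table name.

```python
import sqlite3


def run_case(conn, table_name, body):
    """Create the result table, run the case body, then always drop the table."""
    # table_name is assumed to be a trusted identifier from the test spec.
    conn.execute(f"CREATE TABLE {table_name} (centroids TEXT)")
    try:
        return body(conn, table_name)
    finally:
        # Teardown: without this, the next case that uses the same
        # table name fails because the table already exists.
        conn.execute(f"DROP TABLE IF EXISTS {table_name}")


def count_rows(conn, table_name):
    conn.execute(f"INSERT INTO {table_name} VALUES ('c0')")
    return conn.execute(f"SELECT count(*) FROM {table_name}").fetchone()[0]


conn = sqlite3.connect(":memory:")
first = run_case(conn, "kmeans_result", count_rows)
second = run_case(conn, "kmeans_result", count_rows)  # same name, no error
print(first, second)
```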


geeg reviewed Sep 27, 2012

Florian Schoppmann added 3 commits September 28, 2012 10:59

Copied testspec/casespec/kmeans.xml to testspec/casespec/kmeans_new.xml, to track changes compared to old kmeans test cases.

b169878

Copied (and adapted) existing k-means test cases to new k-means

bfc1c8c

Made distance-function and mean-aggregate parameters schema-qualified. Now use CTAS workaround to be compatible with older versions of Greenplum.

b161515

geeg reviewed Oct 1, 2012

Florian Schoppmann added 2 commits October 2, 2012 15:17

Distance argument to k-means and silhouette function now supplied as a pair. Added declarations for kmeanspp_seeding and kmeans_random_seeding.

76fc5dc

Allow argument values that contain double quotes

3667558

More complex arguments may contain quotes, e.g., ARRAY['madlib.squared_dist_norm2','madlib.dist_norm2']. Previously, quotes did not pass through the shell invocation but caused errors.
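The quoting problem can be reproduced without running a shell. The sketch below is not the change made in commit 3667558, whose diff is not shown here. It only illustrates one common way to let such an argument survive shell parsing, using the standard-library `shlex` module; `build_command` is a hypothetical helper.

```python
import shlex


def build_command(executable, arguments):
    """Join a command line so that each argument survives shell parsing unchanged."""
    # Hypothetical helper for illustration only.
    return " ".join(
        [shlex.quote(executable)] + [shlex.quote(a) for a in arguments]
    )


arg = "ARRAY['madlib.squared_dist_norm2','madlib.dist_norm2']"
command = build_command("echo", [arg])

# shlex.split parses the string with POSIX shell rules, so a round trip
# shows whether the embedded quotes were preserved.
round_trip = shlex.split(command)
print(command)
print(round_trip)
```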
