Sparkdeduplication #429
base: master
Commits on Mar 17, 2017
-
Initial version of spark-based document deduplication. It contains
a new version of the clustering mechanism with auto-sizing clusters.
Commit eafc85d
Commits on Apr 4, 2017
-
Work on the algorithm which splits large document clusters across the
compute cluster to obtain better scalability.
Commit e95a8cc
Commits on Apr 7, 2017
-
Complete version with tiled comparison task.
Added programmatic logging to stdout for easier log reading.
Commit efb3fc2
Commits on Apr 14, 2017
-
Commit fab0915
Commits on Jun 23, 2017
-
Stable version; does the job properly within 2.5 h on the full data set.
Needs code cleanup and quality assurance.
Commit 6180bba
Commits on Jun 26, 2017
-
Added options parsing from command line to control app behaviour.
Version used for performance testing.
Commit 28834d2
Commits on Jul 11, 2017
-
Commit 3a749d4
Commits on Jul 14, 2017
-
Commit 0cfdcbb
Commits on Jul 23, 2017
-
Commit 2e0c205
Commits on Jul 24, 2017
-
Commit 1a2c5dd
-
Commit ca4592b
Commits on Jul 25, 2017
-
Initial version of spark-based document deduplication. It contains
a new version of the clustering mechanism with auto-sizing clusters.
Commit ece39dc
-
Work on the algorithm which splits large document clusters across the
compute cluster to obtain better scalability.
Commit c056a0b
-
Complete version with tiled comparison task.
Added programmatic logging to stdout for easier log reading.
Commit e7ad7aa
-
Commit 0cf6672
-
Stable version; does the job properly within 2.5 h on the full data set.
Needs code cleanup and quality assurance.
Commit 013b53c
-
Added options parsing from command line to control app behaviour.
Version used for performance testing.
Commit ac56042
-
Added dependency on scopt.
Task tiling class rewritten in Scala, with tests.
Commit 81f6509
-
Commit cd1014c
-
Fixed oozie workflow building.
Cleaning up project files. Fixed workflow building for oozie.
Commit 221cd52
-
Merge branch 'sparkdeduplication' of https://github.com/axnow/CoAnSys into sparkdeduplication
Commit 20da6f8