Renewed Pull request with datasets removed #663

Closed
wants to merge 194 commits into from

Forthoney
Contributor

I updated my branch and removed the datasets directory from git. The core contents of this pull request are the same as those of the previous request, #662.
I further customized the fit.sh script to match our specific dataset.

@github-actions

OS:ubuntu-20.04
Fri Mar 17 15:45:14 UTC 2023
intro: 2/2 tests passed.
interface: 39/39 tests passed.
compiler: 54/54 tests passed.
agg: 109/109 tests passed.

@github-actions

OS = Debian 10
CPU = Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
Ram = 15752
Hash = f474f48
Kernel= Linux 4.15.0-197-generic x86_64

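| benchmark | tests | passed | failed | untested | unresolved | unsupported | not_in_use | other_status |
|---|---|---|---|---|---|---|---|---|
| posix | 494 | 375 | 41 | 31 | 6 | 40 | 1 | 0 |
| intro | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 0 |
| interface | 39 | 39 | 0 | 0 | 0 | 0 | 0 | 0 |
| compiler | 54 | 54 | 0 | 0 | 0 | 0 | 0 | 0 |
| aggregator | 109 | 109 | 0 | 0 | 0 | 0 | 0 | 0 |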

angelhof and others added 28 commits September 21, 2023 16:06
Signed-off-by: Forthoney <[email protected]>
* checkpoint: remote execution infra for single commands

Signed-off-by: Tammam Mustafa <[email protected]>

* infra for debugging and accessing config

Signed-off-by: Tammam Mustafa <[email protected]>

* fixed some bugs in ir.py

Signed-off-by: Tammam Mustafa <[email protected]>

* added a function for replacing edges in IR

Signed-off-by: Tammam Mustafa <[email protected]>

* Checkpoint 2: graph is being split, augmented with remote read/write and distributed to workers

Signed-off-by: Tammam Mustafa <[email protected]>

* simple nc read/write in golang

Signed-off-by: Tammam Mustafa <[email protected]>

* improved socket read/write and added timeout

Signed-off-by: Tammam Mustafa <[email protected]>

* use custom implementation of nc and choose ports from offset instead of randomly

Signed-off-by: Tammam Mustafa <[email protected]>

* add retry to both listening and dialing

Signed-off-by: Tammam Mustafa <[email protected]>

* Script to get local paths of hdfs files

Signed-off-by: Tammam Mustafa <[email protected]>

* improve dspash setup script

Signed-off-by: Tammam Mustafa <[email protected]>

* some clean up and docs to dspash ir_helper

Signed-off-by: Tammam Mustafa <[email protected]>

* Extend fids and resources to detect if a fid can be serviced from a particular host

Signed-off-by: Tammam Mustafa <[email protected]>

* improve workers manager to choose workers depending on data availability

Signed-off-by: Tammam Mustafa <[email protected]>

* Extend graph splitting to support multi source graphs and lay background for supporting stdin

Signed-off-by: Tammam Mustafa <[email protected]>

* improvement to graph splitting

Signed-off-by: Tammam Mustafa <[email protected]>

* Refactor: rewrite graph splitting for distributed execution

- Changed graph splitting algorithm to support arbitrary DAGs (not just uniform ones)
- Generate a graph to run in the original user shell
- Named fifos and files are stored where the user executed pa.sh and not the remote worker directory
- The main shell graph handles the creation and wiring of named pipes and new/old files to remote worker

Signed-off-by: Tammam Mustafa <[email protected]>

* Avoid calling to_shell before saving the Graph for distributed exec

to_shell seems to directly modify the graph which causes problems down the line

Signed-off-by: Tammam Mustafa <[email protected]>

* fixed bug that caused multi-sink graphs to terminate early

Signed-off-by: Tammam Mustafa <[email protected]>

* fixed bug in graph splitting

Signed-off-by: Tammam Mustafa <[email protected]>

* removed already done TODO comment

Signed-off-by: Tammam Mustafa <[email protected]>

* revert incorrect changes to source_nodes function in IR

Signed-off-by: Tammam Mustafa <[email protected]>

* Initial design to correctly use workers on different machines

Signed-off-by: Tammam Mustafa <[email protected]>

* fix bug in worker caused by graphviz

Signed-off-by: Tammam Mustafa <[email protected]>

* dspash setup script to use PASH_TOP instead of relative access

Signed-off-by: Tammam Mustafa <[email protected]>

* Add workers from cluster.json file

Signed-off-by: Tammam Mustafa <[email protected]>

* use defer for closing to ensure sockets close on panic

Signed-off-by: Tammam Mustafa <[email protected]>

* socket_pipe: improved connection reusability

Signed-off-by: Tammam Mustafa <[email protected]>

* update gitignore to include socket_pipe

Signed-off-by: Tammam Mustafa <[email protected]>

* clean up and remove old remote exec as it's not used

Signed-off-by: Tammam Mustafa <[email protected]>

* Revert "Extend fids and resources to detect if a fid can be serviced from a particular host"

This reverts commit 63dae2f.

Signed-off-by: Tammam Mustafa <[email protected]>

* complete reverting fid location extension

Signed-off-by: Tammam Mustafa <[email protected]>

* fix missing extra line at eof

Signed-off-by: Tammam Mustafa <[email protected]>

* move remote read/write to runtime folder

Signed-off-by: Tammam Mustafa <[email protected]>

* wait for all processes to finish in worker

Signed-off-by: Tammam Mustafa <[email protected]>

* improve docs for dspash ir helpers

Signed-off-by: Tammam Mustafa <[email protected]>

* remove arg annotation from dspash pr

Signed-off-by: Tammam Mustafa <[email protected]>

* remove added lines from pr

Signed-off-by: Tammam Mustafa <[email protected]>

* more dspash docs fix

Signed-off-by: Tammam Mustafa <[email protected]>

Co-authored-by: Pratyush Das <[email protected]>
Signed-off-by: Forthoney <[email protected]>
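
For readers following the graph-splitting commits above, here is a minimal sketch of the idea. It is not the code in this PR. It assumes the DAG is given as a plain edge list, plus a hypothetical `placement` map from node name to worker name. The function name `split_by_worker` and the `remote_write:`/`remote_read:` labels are made up for illustration. An edge that stays on one worker is kept as-is; an edge that crosses workers is replaced by a write endpoint on the producer side and a matching read endpoint on the consumer side.

```python
from collections import defaultdict


def split_by_worker(edges, placement):
    """Split a DAG into one edge list per worker.

    edges:     list of (src, dst) node names.
    placement: dict mapping node name -> worker name (hypothetical input).

    An edge whose endpoints live on different workers is replaced by
    (src, "remote_write:<id>") on the producer's worker and
    ("remote_read:<id>", dst) on the consumer's worker, where <id> is the
    index of the edge in the input list.
    """
    subgraphs = defaultdict(list)
    for edge_id, (src, dst) in enumerate(edges):
        src_worker = placement[src]
        dst_worker = placement[dst]
        if src_worker == dst_worker:
            # Local edge: no network hop needed.
            subgraphs[src_worker].append((src, dst))
        else:
            # Cross-worker edge: pair a remote write with a remote read.
            subgraphs[src_worker].append((src, f"remote_write:{edge_id}"))
            subgraphs[dst_worker].append((f"remote_read:{edge_id}", dst))
    return dict(subgraphs)


# A diamond-shaped (non-uniform) DAG spread over two hypothetical workers.
edges = [("cat", "grep"), ("cat", "sort"), ("grep", "merge"), ("sort", "merge")]
placement = {"cat": "w1", "grep": "w1", "sort": "w2", "merge": "w2"}
parts = split_by_worker(edges, placement)
for worker, worker_edges in sorted(parts.items()):
    print(worker, worker_edges)
```

Because the function only looks at one edge at a time, it makes no assumption about the shape of the graph, which is the property the refactor commit describes as supporting arbitrary DAGs.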
Signed-off-by: a5ob7r <[email protected]>
Signed-off-by: Forthoney <[email protected]>
Signed-off-by: Dimitris Karnikis <[email protected]>
* hdfs cat annotation

Signed-off-by: Tammam Mustafa <[email protected]>

* rename and small refactor to hdfs getPaths.py

Signed-off-by: Tammam Mustafa <[email protected]>

* Added hdfs utils

Signed-off-by: Tammam Mustafa <[email protected]>

* Added HDFSCat dfgNode

Signed-off-by: Tammam Mustafa <[email protected]>

* Created RemoteFileResource and HDFSResource

Signed-off-by: Tammam Mustafa <[email protected]>

* Added method in FileId and remote resource to check availability on given host

Signed-off-by: Tammam Mustafa <[email protected]>

* worker manager filters workers based on given fid locations

Signed-off-by: Tammam Mustafa <[email protected]>

* Implemented full HDFS cat support

Signed-off-by: Tammam Mustafa <[email protected]>

* fixed small bug in FileId has_remote_file_resource

Signed-off-by: Tammam Mustafa <[email protected]>

* addressed comments

Signed-off-by: Tammam Mustafa <[email protected]>
Signed-off-by: Forthoney <[email protected]>
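
The HDFS commits above mention filtering workers by where the data lives. A minimal sketch of that kind of filter follows, under stated assumptions: `Worker`, `workers_with_all_blocks`, and the host addresses are hypothetical and do not reflect the PR's actual classes or signatures. Each file block is represented by the set of hosts that hold a replica, and a worker qualifies only if its host appears in every set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Worker:
    # Hypothetical stand-in for a worker record; not the PR's class.
    name: str
    host: str


def workers_with_all_blocks(workers, block_locations):
    """Return the workers whose host holds a replica of every block.

    block_locations: list of sets, one set of replica hosts per block.
    With no blocks to satisfy, every worker qualifies.
    """
    return [
        w for w in workers
        if all(w.host in hosts for hosts in block_locations)
    ]


workers = [
    Worker("w1", "10.0.0.1"),
    Worker("w2", "10.0.0.2"),
    Worker("w3", "10.0.0.3"),
]
# Two blocks, each replicated on two hosts (made-up addresses).
block_locations = [
    {"10.0.0.1", "10.0.0.2"},
    {"10.0.0.2", "10.0.0.3"},
]
eligible = workers_with_all_blocks(workers, block_locations)
print([w.name for w in eligible])
```

Requiring every block to be local is the strictest possible policy; a real scheduler might instead rank workers by how many blocks they hold.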
Signed-off-by: Dimitris Karnikis <[email protected]>
Signed-off-by: Konstantinos Kallas <[email protected]>
Signed-off-by: Forthoney <[email protected]>
…lgorithm

Signed-off-by: Tammam Mustafa <[email protected]>
Signed-off-by: Forthoney <[email protected]>
Signed-off-by: Tammam Mustafa <[email protected]>
Signed-off-by: Forthoney <[email protected]>
* grpc client server protobuf for reading logical splits

Signed-off-by: Tammam Mustafa <[email protected]>

* improve naming to more general split reader

Signed-off-by: Tammam Mustafa <[email protected]>

* make config positional argument

Signed-off-by: Tammam Mustafa <[email protected]>

* start file server with worker

Signed-off-by: Tammam Mustafa <[email protected]>

* add new dfs reader node to compiler

Signed-off-by: Tammam Mustafa <[email protected]>

* small fix to dspash setup

Signed-off-by: Tammam Mustafa <[email protected]>

* fix some bugs

Signed-off-by: Tammam Mustafa <[email protected]>

* run on 0.0.0.0

Signed-off-by: Tammam Mustafa <[email protected]>

* add prefix to rpc requests

Signed-off-by: Tammam Mustafa <[email protected]>

* fix bug from printing

Signed-off-by: Tammam Mustafa <[email protected]>

* small code quality improvements

Signed-off-by: Tammam Mustafa <[email protected]>

* added a small readme

Signed-off-by: Tammam Mustafa <[email protected]>

* small fix

Signed-off-by: Tammam Mustafa <[email protected]>
Signed-off-by: Forthoney <[email protected]>
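
The commits above add a reader for logical splits but do not show its semantics. The sketch below illustrates one common line-oriented convention, assumed here rather than confirmed by the PR: a line belongs to the split that contains its first byte. A reader for byte range `[start, end)` therefore skips a partial first line and reads past `end` to finish its last line. The function `read_logical_split` is a hypothetical name, and it works on an in-memory `bytes` object rather than over gRPC.

```python
def read_logical_split(data: bytes, start: int, end: int) -> bytes:
    """Return the whole lines whose first byte lies in [start, end).

    Assumes 0 <= start <= end <= len(data) and b"\n" as line terminator.
    """
    # Find the first line start at or after `start`.
    pos = start
    if start > 0 and data[start - 1:start] != b"\n":
        # `start` is mid-line; that line belongs to an earlier split.
        newline = data.find(b"\n", start)
        if newline == -1:
            return b""
        pos = newline + 1
    if pos >= end:
        # No line begins inside this range.
        return b""

    # Find the first line start at or after `end`.
    if end >= len(data) or data[end - 1:end] == b"\n":
        stop = min(end, len(data))
    else:
        newline = data.find(b"\n", end)
        stop = len(data) if newline == -1 else newline + 1
    return data[pos:stop]


data = b"alpha\nbeta\ngamma\n"
first = read_logical_split(data, 0, 8)    # range ends in the middle of "beta"
second = read_logical_split(data, 8, len(data))
print(first, second)
```

Under this convention the splits never duplicate or drop a line, so concatenating the results of consecutive ranges reproduces the original file regardless of where the byte boundaries fall.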
angelhof and others added 26 commits September 21, 2023 16:07
Signed-off-by: Forthoney <[email protected]>
Signed-off-by: Forthoney <[email protected]>
Signed-off-by: Forthoney <[email protected]>
Signed-off-by: Forthoney <[email protected]>
Signed-off-by: Forthoney <[email protected]>
Signed-off-by: Konstantinos Kallas <[email protected]>
Signed-off-by: Forthoney <[email protected]>
Signed-off-by: Forthoney <[email protected]>
Signed-off-by: Forthoney <[email protected]>
* modify up to not use git

* bump version

* add back git in a comment

* add the alternative in a comment, it might interfere with antikythera and other CI scripts and there is no need to mess with this now

Signed-off-by: Forthoney <[email protected]>
Signed-off-by: Forthoney <[email protected]>
Signed-off-by: Forthoney <[email protected]>
@Forthoney Forthoney closed this Sep 21, 2023
@github-actions

OS:ubuntu-20.04
Thu Sep 21 20:15:57 UTC 2023
intro: 0/2 tests passed.
interface: 6/39 tests passed.
compiler: 0/54 tests passed.
agg: 10/109 tests passed.
demo-spell.sh are not identical
hello-world.sh are not identical
test1 are not identical
test2 are not identical
test3 are not identical
test4 are not identical
test5 are not identical
test6 are not identical
test8 are not identical
test9 are not identical
test10 are not identical
test12 are not identical
test13 are not identical
test14 are not identical
test15 are not identical
test16 are not identical
test17 are not identical
test18 are not identical
test_set are not identical
test_set_e are not identical
test_redirect are not identical
test_unparsing are not identical
test_set_e_2 are not identical
test_set_e_3 are not identical
test_new_line_in_var are not identical
test_cmd_sbst are not identical
test_cmd_sbst2 are not identical
test_cat_hyphen are not identical
test_trap are not identical
test_umask are not identical
test_quoting are not identical
test_var_assgn_default are not identical
test_exclam are not identical
test_redir_var_test are not identical
test_star are not identical
diff.sh are not identical
diff.sh are not identical
set-diff.sh are not identical
set-diff.sh are not identical
export_var_script.sh are not identical
export_var_script.sh are not identical
comm-par-test.sh are not identical
comm-par-test.sh are not identical
comm-par-test2.sh are not identical
comm-par-test2.sh are not identical
tee_web_index_bug.sh are not identical
tee_web_index_bug.sh are not identical
fun-def.sh are not identical
fun-def.sh are not identical
bigrams.sh are not identical
bigrams.sh are not identical
spell-grep.sh are not identical
spell-grep.sh are not identical
grep.sh are not identical
grep.sh are not identical
minimal_sort.sh are not identical
minimal_sort.sh are not identical
minimal_grep.sh are not identical
minimal_grep.sh are not identical
topn.sh are not identical
topn.sh are not identical
wf.sh are not identical
wf.sh are not identical
spell.sh are not identical
spell.sh are not identical
shortest_scripts.sh are not identical
shortest_scripts.sh are not identical
alt_bigrams.sh are not identical
alt_bigrams.sh are not identical
deadlock_test.sh are not identical
deadlock_test.sh are not identical
double_sort.sh are not identical
double_sort.sh are not identical
no_in_script.sh are not identical
no_in_script.sh are not identical
for_loop_simple.sh are not identical
for_loop_simple.sh are not identical
minimal_grep_stdin.sh are not identical
minimal_grep_stdin.sh are not identical
micro_10.sh are not identical
micro_10.sh are not identical
sed-test.sh are not identical
sed-test.sh are not identical
tr-test.sh are not identical
tr-test.sh are not identical
grep-test.sh are not identical
grep-test.sh are not identical
ann-agg.sh are not identical
ann-agg.sh are not identical
test-1 are not identical
test-2 are not identical
test-3 are not identical
test-4 are not identical
test-5 are not identical
test-6 are not identical
test-7 are not identical
test-10 are not identical
test-11 are not identical
test-12 are not identical
test-13 are not identical
test-14 are not identical
test-17 are not identical
test-18 are not identical
test-21 are not identical
test-24 are not identical
test-25 are not identical
test-28 are not identical
test-30 are not identical
test-31 are not identical
test-32 are not identical
test-34 are not identical
test-35 are not identical
test-36 are not identical
test-38 are not identical
test-40 are not identical
test-41 are not identical
test-42 are not identical
test-43 are not identical
test-45 are not identical
test-46 are not identical
test-47 are not identical
test-48 are not identical
test-50 are not identical
test-51 are not identical
test-52 are not identical
test-53 are not identical
test-55 are not identical
test-56 are not identical
test-58 are not identical
test-62 are not identical
test-63 are not identical
test-70 are not identical
test-73 are not identical
test-82 are not identical
test-84 are not identical
test-89 are not identical
test-90 are not identical
test-93 are not identical
test-95 are not identical
test-96 are not identical
test-97 are not identical
test-98 are not identical
test-99 are not identical
test-100 are not identical
test-101 are not identical
test-102 are not identical
test-103 are not identical
test-104 are not identical
test-105 are not identical
test-106 are not identical
test-107 are not identical
test-108 are not identical
test-109 are not identical
test-110 are not identical
test-111 are not identical
test-113 are not identical
test-114 are not identical
test-115 are not identical
test-116 are not identical
test-117 are not identical
test-118 are not identical
test-120 are not identical
test-121 are not identical
test-122 are not identical
test-123 are not identical
test-125 are not identical
test-129 are not identical
test-130 are not identical
test-131 are not identical
test-140 are not identical
test-149 are not identical
test-150 are not identical
test-152 are not identical
test-153 are not identical
test-160 are not identical
test-163 are not identical
test-165 are not identical
test-170 are not identical
test-175 are not identical
test-176 are not identical
test-177 are not identical
test-178 are not identical
test-179 are not identical
test-180 are not identical
test-181 are not identical
test-182 are not identical
test-187 are not identical
test-192 are not identical

@github-actions

OS = Debian 10
CPU = Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
Ram = 15752
Hash = 75279b4
Kernel= Linux 4.15.0-197-generic x86_64

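| benchmark | tests | passed | failed | untested | unresolved | unsupported | not_in_use | other_status |
|---|---|---|---|---|---|---|---|---|
| posix | 494 | 375 | 41 | 31 | 6 | 40 | 1 | 0 |
| intro | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 0 |
| interface | 41 | 41 | 0 | 0 | 0 | 0 | 0 | 0 |
| compiler | 54 | 54 | 0 | 0 | 0 | 0 | 0 | 0 |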

@nvasilakis
Collaborator

nvasilakis commented Sep 21, 2023 via email

9 participants