Running on single CPU node system #438

Open
vineel96 opened this issue May 9, 2023 · 6 comments

Comments

@vineel96

vineel96 commented May 9, 2023

Hi @cTatu ,
Is it possible to run dislib on a single-node system with an 8-core CPU? (I get a non-reachable nodes error when running on a single node.) Also, would the performance boost remain the same?

vineel96 changed the title from "Running on single CPU node(system)" to "Running on single CPU node system" on May 9, 2023
@lezzidan

lezzidan commented May 9, 2023

Could you please send the logs and details on how/where you are running?

@vineel96
Author

vineel96 commented May 11, 2023

Hi @lezzidan,
Hardware info:

  1. AWS c7g.4xlarge instance
  2. Architecture: ARM (aarch64)
  3. Number of CPUs: 1, CPU cores: 16, no hyperthreading (i.e., one thread per core)
  4. RAM: 32 GB

Installation: pycompss=3.1, dislib=0.8.0, python=3.9 (followed the installation steps in the docs)
Algorithm: dislib KMeans
Dataset size: 236930 x 14

Command 1:
python kmeans_dislib.py
Observation: the program gets stuck/hangs for a long time.
htop: only 1 core is being used.

Command 2:
export ComputingUnits=8
runcompss kmeans_dislib.py
Observation: "No task could be scheduled to any of the available resources, shutting down COMPSs"

[Screenshot 2023-05-11 102243]

htop output: different cores are used at random at different times.
[Screenshot (30)]

@cTatu
Collaborator

cTatu commented May 11, 2023

Hi, I suspect that the default ComputingUnits value in the COMPSs resources XML is set to only 4 cores.
Look at /opt/COMPSs/Runtime/configuration/xml/resources/default_resources.xml and change <ComputingUnits>4</ComputingUnits> to 16.
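
A minimal Python sketch of that edit, in case it helps to script it. It assumes the file holds literal <ComputingUnits>N</ComputingUnits> elements as quoted above; the helpers set_computing_units and update_resources_file are hypothetical, not part of COMPSs. It rewrites every matching element (possibly more than one if the file describes several resources) and keeps a .bak copy; writing under /opt/COMPSs will likely require sudo.

```python
from __future__ import annotations

import re
import shutil
from pathlib import Path

# Matches the element quoted above, e.g. <ComputingUnits>4</ComputingUnits>,
# tolerating whitespace around the number.
_CU_PATTERN = re.compile(r"(<ComputingUnits>)\s*\d+\s*(</ComputingUnits>)")


def set_computing_units(xml_text: str, units: int) -> tuple[str, int]:
    """Return (updated_text, number_of_elements_changed). Hypothetical helper."""
    # \g<1> and \g<2> keep the original tags; only the number is replaced.
    return _CU_PATTERN.subn(rf"\g<1>{units}\g<2>", xml_text)


def update_resources_file(path: str, units: int) -> int:
    """Rewrite the file in place, keeping a .bak copy; return the change count."""
    p = Path(path)
    new_text, count = set_computing_units(p.read_text(), units)
    if count:
        shutil.copy2(p, p.with_name(p.name + ".bak"))  # backup before overwriting
        p.write_text(new_text)
    return count


# Hypothetical snippet, NOT the real layout of default_resources.xml.
demo = "<Resource><ComputingUnits>4</ComputingUnits></Resource>"
print(set_computing_units(demo, 16))

# Real use (likely needs sudo):
# update_resources_file(
#     "/opt/COMPSs/Runtime/configuration/xml/resources/default_resources.xml", 16)
```

Matching with a regular expression rather than an XML parser leaves the rest of the file (comments, namespaces, indentation) byte-for-byte unchanged.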

You can also try export ComputingUnits=1.
You mentioned a dataset size of 236930 x 14, but what block size are you using? That determines the number of tasks launched in parallel.
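
To illustrate why the block size matters, a rough Python sketch that counts the blocks each choice produces. It assumes a ds-array is split into ceil(rows / block_rows) x ceil(cols / block_cols) blocks, with smaller edge blocks, and it assumes (not confirmed in this thread) that KMeans launches roughly one task per row block per iteration. The helpers, including the one-row-block-per-core heuristic, are hypothetical.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division without floating-point rounding."""
    return -(-a // b)


def block_grid(shape: tuple, block_size: tuple) -> tuple:
    """Number of blocks along (rows, cols); edge blocks may be smaller."""
    return (ceil_div(shape[0], block_size[0]), ceil_div(shape[1], block_size[1]))


def rows_per_block(n_rows: int, n_cores: int) -> int:
    """Hypothetical heuristic: about one row block per available core."""
    return ceil_div(n_rows, n_cores)


shape = (236930, 14)  # dataset size reported in this issue
for bs in [(229616, 7), (2, 2)]:  # block sizes tried in this issue
    r, c = block_grid(shape, bs)
    print(f"block_size={bs}: {r} row blocks x {c} col blocks = {r * c} blocks")

bs = (rows_per_block(shape[0], 16), shape[1])  # 16 cores, full-width blocks
print("suggested block_size:", bs, "->", block_grid(shape, bs))
```

This prints 4 blocks (only 2 row blocks) for (229616, 7), 829255 blocks (118465 row blocks) for (2, 2), and (14809, 14) -> (16, 1) for the heuristic. If the one-task-per-row-block assumption holds, the first choice leaves at most about 2 cores busy, while the second creates so many tiny tasks that scheduling overhead could plausibly look like a hang.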

@vineel96
Author

vineel96 commented May 18, 2023

Hi @cTatu,
I changed the ComputingUnits value to 16 in default_resources.xml. The problem remains the same: either the error "No task could be scheduled to any of the available resources, shutting down COMPSs" appears or the program hangs for a long time.
I also tried export ComputingUnits=1, and the same issue persists.
I tried two block sizes: (229616, 7) and (2, 2).
With both block sizes the behavior is the same: the program either hangs or reports that no task could be scheduled and shuts down COMPSs.

@vineel96
Author

Hi @cTatu, @lezzidan,
Could I get any suggestions or help with this issue?

@cTatu
Collaborator

cTatu commented Jun 19, 2023

Hey, sorry for the delay.

One possible cause is that the SSH daemon is not running. COMPSs needs SSH access to the worker node (which, in your case, is the same machine). To check, run ssh localhost; it should log in without asking for a password (using RSA keys). Make sure the service is installed (sudo apt install openssh-server) and running (sudo service ssh start).
For the password-less configuration, you can follow our guide: https://compss-doc.readthedocs.io/en/stable/Sections/01_Installation/05_Additional_configuration.html
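
A small Python sketch of the ssh localhost check described above, for scripting it before launching runcompss. It relies on the standard OpenSSH client options BatchMode=yes (fail instead of prompting for a password) and ConnectTimeout; the function name passwordless_ssh_ok is hypothetical. With BatchMode, a first-time connection can also fail on an unknown host key, so running ssh localhost once interactively to accept the key may be needed.

```python
import shutil
import subprocess


def passwordless_ssh_ok(host: str = "localhost", timeout: int = 10) -> bool:
    """Return True if 'ssh host true' succeeds without any password prompt."""
    if shutil.which("ssh") is None:
        return False  # no ssh client installed
    cmd = [
        "ssh",
        "-o", "BatchMode=yes",  # fail instead of prompting for a password/passphrase
        "-o", f"ConnectTimeout={timeout}",
        host,
        "true",  # run a no-op command on the remote side
    ]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout + 5)
    except subprocess.TimeoutExpired:
        return False
    if result.returncode != 0:
        # e.g. "Connection refused" if sshd is not running,
        # or "Permission denied" if key-based login is not set up
        print(result.stderr.strip())
    return result.returncode == 0


print("password-less ssh to localhost:", passwordless_ssh_ok())
```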

Hope this works
Best regards

Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants