Submission Systems

Submission systems play an important role if you want to develop your own pygromos code. Most of the time they are hidden inside the Simulation_runner blocks, but you may want to develop something that needs direct access to the submission system.

This notebook gives some examples of how to use the submission systems. Note that all submission systems are written in the same way, so you can exchange them quickly.
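The idea of exchangeable submission systems can be sketched without pygromos itself. The classes below are hypothetical stand-ins (`PrintOnlySystem`, `RecordingSystem` and `run_workflow` are not part of pygromos, and their method takes a plain string rather than a `Submission_job`); they only illustrate that workflow code which relies on a shared `submit_to_queue` method does not need to know which backend it talks to.

```python
from typing import List, Protocol


class SubmissionSystemSketch(Protocol):
    """Hypothetical minimal interface shared by the stand-in systems below."""

    def submit_to_queue(self, command: str) -> int: ...


class PrintOnlySystem:
    """Stand-in for a dummy system: prints the command, runs nothing."""

    def submit_to_queue(self, command: str) -> int:
        print("would run:", command)
        return -1  # no real job exists, so hand back a placeholder ID


class RecordingSystem:
    """Stand-in for a real backend: records commands and hands out IDs."""

    def __init__(self) -> None:
        self.submitted: List[str] = []

    def submit_to_queue(self, command: str) -> int:
        self.submitted.append(command)
        return len(self.submitted)  # running counter used as job ID


def run_workflow(system: SubmissionSystemSketch) -> List[int]:
    """Workflow code relies only on submit_to_queue, not on the backend."""
    return [system.submit_to_queue(f"echo step{i}") for i in range(3)]


dummy_ids = run_workflow(PrintOnlySystem())
recorded_ids = run_workflow(RecordingSystem())
```

Swapping the backend is then a one-line change at the call site, which is the property the notebook relies on when it moves from the local system to LSF.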


[1]:
from pygromos.simulations.hpc_queuing.submission_systems import local # this executes your code in your local session.
from pygromos.simulations.hpc_queuing.submission_systems import lsf # this module can be used to submit to the lsf-queue (e.g. on euler)
from pygromos.simulations.hpc_queuing.submission_systems import dummy # this is a dummy system, that only prints the commands
from pygromos.simulations.hpc_queuing.submission_systems.submission_job import Submission_job # this class stores all data for a single job

Local Submission

This system executes the commands directly in your current session, which allows you to test or execute your code locally. If your process later needs much more time, you can switch to a submission system with job queueing.

[2]:
sub_local = local.LOCAL()
sub_local.verbose = True
[3]:
bash_command = "sleep 2; echo \"WUHA\"; sleep 2"
job = Submission_job(bash_command)

job_id = sub_local.submit_to_queue(job)
job_id
Submission Command:      . / j o b _ t e s t . s h
STDOUT:
                b'WUHA\n'
END

[3]:
0
[4]:
#This is a dummy function, to not break the code!
sub_local.get_jobs_from_queue("FUN")
[4]:
[]
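For orientation, running a bash command in the current session can be sketched with the standard library alone. This is a minimal illustration assuming a POSIX shell is available; it is not the code pygromos uses internally, and the sleeps from the example are left out to keep it quick.

```python
import subprocess

# Same kind of command as in the local example, without the sleeps.
bash_command = 'echo "WUHA"'

# shell=True hands the whole string to the system shell for interpretation.
result = subprocess.run(bash_command, shell=True, capture_output=True)

exit_code = result.returncode  # 0 signals success
stdout_bytes = result.stdout   # raw bytes, which is why output shows up as b'...'
print(exit_code, stdout_bytes)
```

The captured output is a bytes object, which matches the `b'WUHA\n'` form seen in the STDOUT section of the local submission output.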

LSF Submission

The LSF submission system allows you to submit jobs to the IBM LSF queueing system.

Careful! This part requires a running LSF queueing system on your machine.

You can submit jobs and job arrays to the queue, kill them, and retrieve information from the queue list.
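Before running the LSF cells, it may help to check whether the LSF client tools are installed at all. The sketch below uses only the standard library; `command_available` is a hypothetical helper, and `bjobs` is the LSF command that lists queued jobs.

```python
import shutil


def command_available(name: str) -> bool:
    """Return True if an executable called `name` is found on PATH."""
    return shutil.which(name) is not None


# `bjobs` is the LSF command used to list jobs (for example `bjobs -w`).
if command_available("bjobs"):
    print("LSF client tools found; the LSF cells should be able to run.")
else:
    print("No `bjobs` on PATH; the LSF cells will raise an exception.")
```

A check like this only confirms that the executable exists; it does not guarantee that the scheduler itself is reachable.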

[5]:
#Construct system:
sub_lsf = lsf.LSF(nmpi=1, job_duration = "24:00", max_storage=100)
sub_lsf.verbose = True

sub_lsf._refresh_job_queue_list_all_s = 0 #you must wait at least 1s to update job_queue list
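The refresh setting suggests that the queue list is cached and only re-queried after some time has passed. How pygromos implements this is not shown here; the following is a minimal sketch of such time-stamp based caching, where `CachedQueueSketch`, `fake_fetch` and `refresh_interval_s` are hypothetical names chosen for illustration.

```python
from datetime import datetime, timedelta
from typing import Callable, List, Optional


class CachedQueueSketch:
    """Re-query the queue only if the cached list is older than the interval."""

    def __init__(self, fetch: Callable[[], List[str]], refresh_interval_s: float):
        self._fetch = fetch
        self._interval = timedelta(seconds=refresh_interval_s)
        self._time_stamp: Optional[datetime] = None
        self._jobs: List[str] = []

    def get_queued_jobs(self) -> List[str]:
        now = datetime.now()
        stale = self._time_stamp is None or now - self._time_stamp >= self._interval
        if stale:
            self._jobs = self._fetch()  # the expensive call to the scheduler
            self._time_stamp = now
        return self._jobs


fetch_calls: List[int] = []


def fake_fetch() -> List[str]:
    """Stand-in for a real queue query; counts how often it is called."""
    fetch_calls.append(1)
    return ["Test1"]


slow = CachedQueueSketch(fake_fetch, refresh_interval_s=3600)
slow.get_queued_jobs()
slow.get_queued_jobs()  # second call is served from the cache
calls_with_long_interval = len(fetch_calls)
```

With an interval of 0 seconds, as set above, a cache of this kind would be treated as stale on every call, so each lookup reaches the scheduler.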

Queue Checking:

[6]:
sub_lsf.get_queued_jobs()
sub_lsf.job_queue_list
---------------------------------------------------------------------------
ChildProcessError                         Traceback (most recent call last)
~/PyGromosTools/pygromos/simulations/hpc_queuing/submission_systems/lsf.py in get_queued_jobs(self)
    307                     else:
--> 308                         out_process = bash.execute("bjobs -w", catch_STD=True)
    309                     job_list_str = list(map(lambda x: x.decode("utf-8"), out_process.stdout.readlines()))

~/PyGromosTools/pygromos/utils/bash.py in execute(command, verbose, catch_STD, env)
    934 ):
--> 935     return execute_subprocess(command=command, verbose=verbose, catch_STD=catch_STD, env=env)
    936

~/PyGromosTools/pygromos/utils/bash.py in execute_subprocess(command, catch_STD, env, verbose)
    828         msg += "NONE" if (p.stdout is None) else "\n\t".join(map(str, p.stderr.readlines()))
--> 829         raise ChildProcessError(msg)
    830     if verbose:

ChildProcessError: SubProcess Failed due to returncode: 127
 COMMAND:
        bjobs -w
STDOUT:

STDERR:
        b'/bin/sh: 1: bjobs: not found\n'

During handling of the above exception, another exception occurred:

Exception                                 Traceback (most recent call last)
/tmp/ipykernel_4765/3507705770.py in <module>
----> 1 sub_lsf.get_queued_jobs()
      2 sub_lsf.job_queue_list

~/PyGromosTools/pygromos/simulations/hpc_queuing/submission_systems/lsf.py in get_queued_jobs(self)
    317                     self._job_queue_time_stamp = datetime.now()
    318                 except Exception as err:
--> 319                     raise Exception("Could not get job_list!\nerr:\n" + "\n".join(err.args))
    320             else:
    321                 job_list_str = []

Exception: Could not get job_list!
err:
SubProcess Failed due to returncode: 127
 COMMAND:
        bjobs -w
STDOUT:

STDERR:
        b'/bin/sh: 1: bjobs: not found\n'

Submission:

Here you can submit jobs to the queue as bash commands.
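To give a feeling for what a submission to LSF involves, the sketch below assembles a `bsub` command line from the settings used in this notebook. The mapping of `nmpi`, `job_duration` and `max_storage` to the `bsub` options `-n`, `-W` and `-R "rusage[mem=...]"` is an assumption made for illustration, and `build_bsub_command` is a hypothetical helper; the command that pygromos actually generates may differ. Nothing is submitted here; the command is only printed.

```python
import shlex
from typing import List


def build_bsub_command(bash_command: str, job_name: str, nmpi: int,
                       job_duration: str, max_storage: int) -> List[str]:
    """Assemble a bsub argument list (assumed option mapping, not pygromos code)."""
    return [
        "bsub",
        "-n", str(nmpi),                     # number of tasks for the job
        "-W", job_duration,                  # run-time limit as [hours:]minutes
        "-J", job_name,                      # job name shown in the queue
        "-R", f"rusage[mem={max_storage}]",  # memory reservation (assumed meaning)
        bash_command,                        # the command the job executes
    ]


args = build_bsub_command('sleep 5; echo "WUHA"; sleep 2', "Test1",
                          nmpi=1, job_duration="24:00", max_storage=100)
print(shlex.join(args))  # shell-quoted form, for inspection only
```

Keeping the arguments in a list and quoting them only for display avoids quoting mistakes when the bash command itself contains quotes or semicolons.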

[7]:
bash_command = "sleep 5; echo \"WUHA\"; sleep 2"
job_name = "Test1"

job_id = sub_lsf.submit_to_queue(Submission_job(bash_command, job_name))
check queue
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
/tmp/ipykernel_4765/3188711362.py in <module>
      2 job_name = "Test1"
      3
----> 4 job_id = sub_lsf.submit_to_queue(Submission_job(bash_command, job_name))

~/PyGromosTools/pygromos/simulations/hpc_queuing/submission_systems/lsf.py in submit_to_queue(self, sub_job)
     74             if self.verbose:
     75                 print("check queue")
---> 76             ids = list(self.search_queue_for_jobname(sub_job.jobName).index)
     77
     78             if len(ids) > 0:

~/PyGromosTools/pygromos/simulations/hpc_queuing/submission_systems/lsf.py in search_queue_for_jobname(self, job_name, regex)
    376         """
    377
--> 378         self.get_queued_jobs()
    379         if regex:
    380             return self._job_queue_list.where(self._job_queue_list.JOB_NAME.str.match(job_name)).dropna()

~/PyGromosTools/pygromos/simulations/hpc_queuing/submission_systems/lsf.py in get_queued_jobs(self)
    317                     self._job_queue_time_stamp = datetime.now()
    318                 except Exception as err:
--> 319                     raise Exception("Could not get job_list!\nerr:\n" + "\n".join(err.args))
    320             else:
    321                 job_list_str = []

Exception: Could not get job_list!
err:
SubProcess Failed due to returncode: 127
 COMMAND:
        bjobs -w
STDOUT:

STDERR:
        b'/bin/sh: 1: bjobs: not found\n'
[8]:
#search for the just submitted job in the queue
sub_lsf.search_queue_for_jobid(job_id)
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
/tmp/ipykernel_4765/1706125824.py in <module>
      1 #search for the just submitted job in the queue
----> 2 sub_lsf.search_queue_for_jobid(job_id)

~/PyGromosTools/pygromos/simulations/hpc_queuing/submission_systems/lsf.py in search_queue_for_jobid(self, job_id)
    357
    358     def search_queue_for_jobid(self, job_id: int) -> pd.DataFrame:
--> 359         self.get_queued_jobs()
    360         return self._job_queue_list.where(self._job_queue_list.JOBID == job_id).dropna()
    361

~/PyGromosTools/pygromos/simulations/hpc_queuing/submission_systems/lsf.py in get_queued_jobs(self)
    317                     self._job_queue_time_stamp = datetime.now()
    318                 except Exception as err:
--> 319                     raise Exception("Could not get job_list!\nerr:\n" + "\n".join(err.args))
    320             else:
    321                 job_list_str = []

Exception: Could not get job_list!
err:
SubProcess Failed due to returncode: 127
 COMMAND:
        bjobs -w
STDOUT:

STDERR:
        b'/bin/sh: 1: bjobs: not found\n'
[9]:
sub_lsf.search_queue_for_jobname("Test1")
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
/tmp/ipykernel_4765/1654124141.py in <module>
----> 1 sub_lsf.search_queue_for_jobname("Test1")

~/PyGromosTools/pygromos/simulations/hpc_queuing/submission_systems/lsf.py in search_queue_for_jobname(self, job_name, regex)
    376         """
    377
--> 378         self.get_queued_jobs()
    379         if regex:
    380             return self._job_queue_list.where(self._job_queue_list.JOB_NAME.str.match(job_name)).dropna()

~/PyGromosTools/pygromos/simulations/hpc_queuing/submission_systems/lsf.py in get_queued_jobs(self)
    317                     self._job_queue_time_stamp = datetime.now()
    318                 except Exception as err:
--> 319                     raise Exception("Could not get job_list!\nerr:\n" + "\n".join(err.args))
    320             else:
    321                 job_list_str = []

Exception: Could not get job_list!
err:
SubProcess Failed due to returncode: 127
 COMMAND:
        bjobs -w
STDOUT:

STDERR:
        b'/bin/sh: 1: bjobs: not found\n'

Submitting multiple jobs

[10]:
bash_command = "sleep 2; echo \"WUHA\"; sleep 2"
job_ids = []
for test in range(3):
    job_name = "Test"+str(test)

    job_id = sub_lsf.submit_to_queue(Submission_job(bash_command, job_name))
    job_ids.append(job_id)
check queue
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
/tmp/ipykernel_4765/422912753.py in <module>
      4     job_name = "Test"+str(test)
      5
----> 6     job_id = sub_lsf.submit_to_queue(Submission_job(bash_command, job_name))
      7     job_ids.append(job_id)

~/PyGromosTools/pygromos/simulations/hpc_queuing/submission_systems/lsf.py in submit_to_queue(self, sub_job)
     74             if self.verbose:
     75                 print("check queue")
---> 76             ids = list(self.search_queue_for_jobname(sub_job.jobName).index)
     77
     78             if len(ids) > 0:

~/PyGromosTools/pygromos/simulations/hpc_queuing/submission_systems/lsf.py in search_queue_for_jobname(self, job_name, regex)
    376         """
    377
--> 378         self.get_queued_jobs()
    379         if regex:
    380             return self._job_queue_list.where(self._job_queue_list.JOB_NAME.str.match(job_name)).dropna()

~/PyGromosTools/pygromos/simulations/hpc_queuing/submission_systems/lsf.py in get_queued_jobs(self)
    317                     self._job_queue_time_stamp = datetime.now()
    318                 except Exception as err:
--> 319                     raise Exception("Could not get job_list!\nerr:\n" + "\n".join(err.args))
    320             else:
    321                 job_list_str = []

Exception: Could not get job_list!
err:
SubProcess Failed due to returncode: 127
 COMMAND:
        bjobs -w
STDOUT:

STDERR:
        b'/bin/sh: 1: bjobs: not found\n'
[11]:
sub_lsf.search_queue_for_jobname("Te", regex=True)
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
/tmp/ipykernel_4765/722442852.py in <module>
----> 1 sub_lsf.search_queue_for_jobname("Te", regex=True)

~/PyGromosTools/pygromos/simulations/hpc_queuing/submission_systems/lsf.py in search_queue_for_jobname(self, job_name, regex)
    376         """
    377
--> 378         self.get_queued_jobs()
    379         if regex:
    380             return self._job_queue_list.where(self._job_queue_list.JOB_NAME.str.match(job_name)).dropna()

~/PyGromosTools/pygromos/simulations/hpc_queuing/submission_systems/lsf.py in get_queued_jobs(self)
    317                     self._job_queue_time_stamp = datetime.now()
    318                 except Exception as err:
--> 319                     raise Exception("Could not get job_list!\nerr:\n" + "\n".join(err.args))
    320             else:
    321                 job_list_str = []

Exception: Could not get job_list!
err:
SubProcess Failed due to returncode: 127
 COMMAND:
        bjobs -w
STDOUT:

STDERR:
        b'/bin/sh: 1: bjobs: not found\n'
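The traceback shows that the regex search relies on pandas `str.match`, which, like `re.match`, only matches at the start of a string. The effect of `regex=True` can therefore be sketched with the standard library. The job names below are made up, `match_job_names` is a hypothetical helper, and treating the non-regex case as an exact comparison is an assumption.

```python
import re
from typing import List


def match_job_names(job_names: List[str], pattern: str, regex: bool) -> List[str]:
    """Select job names by exact comparison or by a regex anchored at the start."""
    if regex:
        return [name for name in job_names if re.match(pattern, name)]
    return [name for name in job_names if name == pattern]


queue = ["Test0", "Test1", "Test2", "MyTest"]  # hypothetical job names
prefix_hits = match_job_names(queue, "Te", regex=True)
exact_hits = match_job_names(queue, "Te", regex=False)
print(prefix_hits, exact_hits)
```

Because the match is anchored at the start, `"Te"` selects the three `Test…` jobs but not `MyTest`; a pattern such as `".*Test"` would be needed to match names that merely contain the text.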

Killing a job

Remove a job from the job queue:

sub_lsf.kill_jobs(job_ids=[job_id])
sub_lsf.search_queue_for_jobname("Te", regex=True)