Submission Systems¶
Submission systems play an important role when you develop your own PyGromos code. They are usually hidden inside the Simulation_runner blocks, but sometimes you may want to build something that needs direct access to the submission system.
This notebook gives some examples of how to use the submission systems. Note that all submission systems are written in the same way, so you can exchange them quickly.
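To illustrate why a shared interface makes the systems interchangeable, here is a minimal, self-contained sketch. It does not use PyGromos: `Job`, `PrintOnly`, `Counting` and `run_workflow` are hypothetical stand-ins invented for this example, and only the method name `submit_to_queue` is borrowed from the cells in this notebook.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class Job:
    """Hypothetical stand-in for a job description (not pygromos' Submission_job)."""
    command: str
    job_name: str = "test"


class SubmissionSystem(Protocol):
    """Shared interface: every system accepts a job and returns a job ID."""

    def submit_to_queue(self, job: Job) -> int: ...


class PrintOnly:
    """Hypothetical system that only records commands and never runs them."""

    def __init__(self) -> None:
        self.seen: List[str] = []

    def submit_to_queue(self, job: Job) -> int:
        self.seen.append(job.command)
        return -1  # no real job was created


class Counting:
    """Hypothetical system that hands out increasing job IDs."""

    def __init__(self) -> None:
        self.next_id = 0

    def submit_to_queue(self, job: Job) -> int:
        job_id = self.next_id
        self.next_id += 1
        return job_id


def run_workflow(system: SubmissionSystem, commands: List[str]) -> List[int]:
    """Workflow code relies only on submit_to_queue, so any system can be passed in."""
    return [system.submit_to_queue(Job(cmd)) for cmd in commands]


print(run_workflow(PrintOnly(), ["echo a", "echo b"]))
print(run_workflow(Counting(), ["echo a", "echo b"]))
```

Because `run_workflow` never checks which class it receives, swapping one system for another is a one-line change at the call site. This is the idea behind exchanging `local`, `lsf` and `dummy` in the cells that follow.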
[1]:
from pygromos.simulations.hpc_queuing.submission_systems import local # this executes your code in your local session.
from pygromos.simulations.hpc_queuing.submission_systems import lsf # this module can be used to submit to the lsf-queue (e.g. on euler)
from pygromos.simulations.hpc_queuing.submission_systems import dummy # this is a dummy system, that only prints the commands
from pygromos.simulations.hpc_queuing.submission_systems.submission_job import Submission_job # this class stores all data for a single job
Local Submission¶
This system executes the commands directly in your current session, which allows you to test or run your code locally. If your process later needs much more time, you can switch to a submission system with job queueing.
[2]:
sub_local = local.LOCAL()
sub_local.verbose = True
[3]:
bash_command = "sleep 2; echo \"WUHA\"; sleep 2"
job = Submission_job(bash_command)
job_id = sub_local.submit_to_queue(job)
job_id
Submission Command: . / j o b _ t e s t . s h
STDOUT:
b'WUHA\n'
END
[3]:
0
[4]:
#This is a dummy function that keeps the interface consistent, so code written for queue-based systems does not break.
sub_local.get_jobs_from_queue("FUN")
[4]:
[]
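Conceptually, running a job locally means handing the bash command to a shell in the current session and collecting its output. The sketch below shows that idea with the standard library only. It is not how `local.LOCAL` is implemented, and `run_locally` is a hypothetical helper name chosen for this example.

```python
import subprocess


def run_locally(bash_command: str) -> subprocess.CompletedProcess:
    """Run a shell command in the current session and capture its output."""
    return subprocess.run(
        bash_command,
        shell=True,           # let the shell interpret ';' and quoting
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode bytes to str
        check=False,          # do not raise on a non-zero return code
    )


result = run_locally('sleep 1; echo "WUHA"')
print("return code:", result.returncode)
print("stdout:", repr(result.stdout))
```

The call blocks until the command has finished, which is why a local system suits quick tests better than long simulations.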
LSF Submission¶
The LSF submission system allows you to submit jobs to the IBM LSF queueing system.
Careful! This part requires a running LSF queueing system on your machine. The outputs shown below were produced on a machine without one, so every queue call fails with "bjobs: not found".
You can submit and kill jobs and job arrays, and retrieve information from the queue list.
[5]:
#Construct system:
sub_lsf = lsf.LSF(nmpi=1, job_duration = "24:00", max_storage=100)
sub_lsf.verbose = True
sub_lsf._refresh_job_queue_list_all_s = 0 #you must wait at least 1s to update job_queue list
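Since the LSF cells depend on the `bjobs` command-line tool, it can help to check for it before running them. The following is a small sketch using only the standard library. `command_available` is a hypothetical helper and not part of PyGromos.

```python
import shutil


def command_available(name: str) -> bool:
    """Return True if an executable called `name` is found on the PATH."""
    return shutil.which(name) is not None


if command_available("bjobs"):
    print("bjobs found: the LSF cells should be able to query the queue.")
else:
    print("bjobs not found: the LSF cells will fail on this machine.")
```

Finding `bjobs` on the PATH does not guarantee that an LSF cluster is reachable. It only rules out the "command not found" failure.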
Queue Checking:¶
[6]:
sub_lsf.get_queued_jobs()
sub_lsf.job_queue_list
---------------------------------------------------------------------------
ChildProcessError Traceback (most recent call last)
~/PyGromosTools/pygromos/simulations/hpc_queuing/submission_systems/lsf.py in get_queued_jobs(self)
307 else:
--> 308 out_process = bash.execute("bjobs -w", catch_STD=True)
309 job_list_str = list(map(lambda x: x.decode("utf-8"), out_process.stdout.readlines()))
~/PyGromosTools/pygromos/utils/bash.py in execute(command, verbose, catch_STD, env)
934 ):
--> 935 return execute_subprocess(command=command, verbose=verbose, catch_STD=catch_STD, env=env)
936
~/PyGromosTools/pygromos/utils/bash.py in execute_subprocess(command, catch_STD, env, verbose)
828 msg += "NONE" if (p.stdout is None) else "\n\t".join(map(str, p.stderr.readlines()))
--> 829 raise ChildProcessError(msg)
830 if verbose:
ChildProcessError: SubProcess Failed due to returncode: 127
COMMAND:
bjobs -w
STDOUT:
STDERR:
b'/bin/sh: 1: bjobs: not found\n'
During handling of the above exception, another exception occurred:
Exception Traceback (most recent call last)
/tmp/ipykernel_4765/3507705770.py in <module>
----> 1 sub_lsf.get_queued_jobs()
2 sub_lsf.job_queue_list
~/PyGromosTools/pygromos/simulations/hpc_queuing/submission_systems/lsf.py in get_queued_jobs(self)
317 self._job_queue_time_stamp = datetime.now()
318 except Exception as err:
--> 319 raise Exception("Could not get job_list!\nerr:\n" + "\n".join(err.args))
320 else:
321 job_list_str = []
Exception: Could not get job_list!
err:
SubProcess Failed due to returncode: 127
COMMAND:
bjobs -w
STDOUT:
STDERR:
b'/bin/sh: 1: bjobs: not found\n'
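The traceback above reports return code 127. POSIX shells use this code when the requested command cannot be found, which matches the "bjobs: not found" message in STDERR. The sketch below reproduces the same situation with a deliberately non-existent command. The command name is made up for this example, and a POSIX shell is assumed.

```python
import subprocess

# Run a command that does not exist, through the shell, and capture the result.
proc = subprocess.run(
    "hypothetical_missing_command_xyz -w",
    shell=True,
    capture_output=True,
    text=True,
)

print("return code:", proc.returncode)
print("stderr:", proc.stderr.strip())

if proc.returncode == 127:
    print("The shell could not find the command; check your PATH or load the required module.")
```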
Submission:¶
Here you can submit jobs to the queue as bash commands.
[7]:
bash_command = "sleep 5; echo \"WUHA\"; sleep 2"
job_name = "Test1"
job_id = sub_lsf.submit_to_queue(Submission_job(bash_command, job_name))
check queue
---------------------------------------------------------------------------
ChildProcessError Traceback (most recent call last)
~/PyGromosTools/pygromos/simulations/hpc_queuing/submission_systems/lsf.py in get_queued_jobs(self)
307 else:
--> 308 out_process = bash.execute("bjobs -w", catch_STD=True)
309 job_list_str = list(map(lambda x: x.decode("utf-8"), out_process.stdout.readlines()))
~/PyGromosTools/pygromos/utils/bash.py in execute(command, verbose, catch_STD, env)
934 ):
--> 935 return execute_subprocess(command=command, verbose=verbose, catch_STD=catch_STD, env=env)
936
~/PyGromosTools/pygromos/utils/bash.py in execute_subprocess(command, catch_STD, env, verbose)
828 msg += "NONE" if (p.stdout is None) else "\n\t".join(map(str, p.stderr.readlines()))
--> 829 raise ChildProcessError(msg)
830 if verbose:
ChildProcessError: SubProcess Failed due to returncode: 127
COMMAND:
bjobs -w
STDOUT:
STDERR:
b'/bin/sh: 1: bjobs: not found\n'
During handling of the above exception, another exception occurred:
Exception Traceback (most recent call last)
/tmp/ipykernel_4765/3188711362.py in <module>
2 job_name = "Test1"
3
----> 4 job_id = sub_lsf.submit_to_queue(Submission_job(bash_command, job_name))
~/PyGromosTools/pygromos/simulations/hpc_queuing/submission_systems/lsf.py in submit_to_queue(self, sub_job)
74 if self.verbose:
75 print("check queue")
---> 76 ids = list(self.search_queue_for_jobname(sub_job.jobName).index)
77
78 if len(ids) > 0:
~/PyGromosTools/pygromos/simulations/hpc_queuing/submission_systems/lsf.py in search_queue_for_jobname(self, job_name, regex)
376 """
377
--> 378 self.get_queued_jobs()
379 if regex:
380 return self._job_queue_list.where(self._job_queue_list.JOB_NAME.str.match(job_name)).dropna()
~/PyGromosTools/pygromos/simulations/hpc_queuing/submission_systems/lsf.py in get_queued_jobs(self)
317 self._job_queue_time_stamp = datetime.now()
318 except Exception as err:
--> 319 raise Exception("Could not get job_list!\nerr:\n" + "\n".join(err.args))
320 else:
321 job_list_str = []
Exception: Could not get job_list!
err:
SubProcess Failed due to returncode: 127
COMMAND:
bjobs -w
STDOUT:
STDERR:
b'/bin/sh: 1: bjobs: not found\n'
[8]:
#search for the just submitted job in the queue
sub_lsf.search_queue_for_jobid(job_id)
---------------------------------------------------------------------------
ChildProcessError Traceback (most recent call last)
~/PyGromosTools/pygromos/simulations/hpc_queuing/submission_systems/lsf.py in get_queued_jobs(self)
307 else:
--> 308 out_process = bash.execute("bjobs -w", catch_STD=True)
309 job_list_str = list(map(lambda x: x.decode("utf-8"), out_process.stdout.readlines()))
~/PyGromosTools/pygromos/utils/bash.py in execute(command, verbose, catch_STD, env)
934 ):
--> 935 return execute_subprocess(command=command, verbose=verbose, catch_STD=catch_STD, env=env)
936
~/PyGromosTools/pygromos/utils/bash.py in execute_subprocess(command, catch_STD, env, verbose)
828 msg += "NONE" if (p.stdout is None) else "\n\t".join(map(str, p.stderr.readlines()))
--> 829 raise ChildProcessError(msg)
830 if verbose:
ChildProcessError: SubProcess Failed due to returncode: 127
COMMAND:
bjobs -w
STDOUT:
STDERR:
b'/bin/sh: 1: bjobs: not found\n'
During handling of the above exception, another exception occurred:
Exception Traceback (most recent call last)
/tmp/ipykernel_4765/1706125824.py in <module>
1 #search for the just submitted job in the queue
----> 2 sub_lsf.search_queue_for_jobid(job_id)
~/PyGromosTools/pygromos/simulations/hpc_queuing/submission_systems/lsf.py in search_queue_for_jobid(self, job_id)
357
358 def search_queue_for_jobid(self, job_id: int) -> pd.DataFrame:
--> 359 self.get_queued_jobs()
360 return self._job_queue_list.where(self._job_queue_list.JOBID == job_id).dropna()
361
~/PyGromosTools/pygromos/simulations/hpc_queuing/submission_systems/lsf.py in get_queued_jobs(self)
317 self._job_queue_time_stamp = datetime.now()
318 except Exception as err:
--> 319 raise Exception("Could not get job_list!\nerr:\n" + "\n".join(err.args))
320 else:
321 job_list_str = []
Exception: Could not get job_list!
err:
SubProcess Failed due to returncode: 127
COMMAND:
bjobs -w
STDOUT:
STDERR:
b'/bin/sh: 1: bjobs: not found\n'
[9]:
sub_lsf.search_queue_for_jobname("Test1")
---------------------------------------------------------------------------
ChildProcessError Traceback (most recent call last)
~/PyGromosTools/pygromos/simulations/hpc_queuing/submission_systems/lsf.py in get_queued_jobs(self)
307 else:
--> 308 out_process = bash.execute("bjobs -w", catch_STD=True)
309 job_list_str = list(map(lambda x: x.decode("utf-8"), out_process.stdout.readlines()))
~/PyGromosTools/pygromos/utils/bash.py in execute(command, verbose, catch_STD, env)
934 ):
--> 935 return execute_subprocess(command=command, verbose=verbose, catch_STD=catch_STD, env=env)
936
~/PyGromosTools/pygromos/utils/bash.py in execute_subprocess(command, catch_STD, env, verbose)
828 msg += "NONE" if (p.stdout is None) else "\n\t".join(map(str, p.stderr.readlines()))
--> 829 raise ChildProcessError(msg)
830 if verbose:
ChildProcessError: SubProcess Failed due to returncode: 127
COMMAND:
bjobs -w
STDOUT:
STDERR:
b'/bin/sh: 1: bjobs: not found\n'
During handling of the above exception, another exception occurred:
Exception Traceback (most recent call last)
/tmp/ipykernel_4765/1654124141.py in <module>
----> 1 sub_lsf.search_queue_for_jobname("Test1")
~/PyGromosTools/pygromos/simulations/hpc_queuing/submission_systems/lsf.py in search_queue_for_jobname(self, job_name, regex)
376 """
377
--> 378 self.get_queued_jobs()
379 if regex:
380 return self._job_queue_list.where(self._job_queue_list.JOB_NAME.str.match(job_name)).dropna()
~/PyGromosTools/pygromos/simulations/hpc_queuing/submission_systems/lsf.py in get_queued_jobs(self)
317 self._job_queue_time_stamp = datetime.now()
318 except Exception as err:
--> 319 raise Exception("Could not get job_list!\nerr:\n" + "\n".join(err.args))
320 else:
321 job_list_str = []
Exception: Could not get job_list!
err:
SubProcess Failed due to returncode: 127
COMMAND:
bjobs -w
STDOUT:
STDERR:
b'/bin/sh: 1: bjobs: not found\n'
Submitting multiple jobs¶
[10]:
bash_command = "sleep 2; echo \"WUHA\"; sleep 2"
job_ids = []
for test in range(3):
    job_name = "Test"+str(test)
    job_id = sub_lsf.submit_to_queue(Submission_job(bash_command, job_name))
    job_ids.append(job_id)
check queue
---------------------------------------------------------------------------
ChildProcessError Traceback (most recent call last)
~/PyGromosTools/pygromos/simulations/hpc_queuing/submission_systems/lsf.py in get_queued_jobs(self)
307 else:
--> 308 out_process = bash.execute("bjobs -w", catch_STD=True)
309 job_list_str = list(map(lambda x: x.decode("utf-8"), out_process.stdout.readlines()))
~/PyGromosTools/pygromos/utils/bash.py in execute(command, verbose, catch_STD, env)
934 ):
--> 935 return execute_subprocess(command=command, verbose=verbose, catch_STD=catch_STD, env=env)
936
~/PyGromosTools/pygromos/utils/bash.py in execute_subprocess(command, catch_STD, env, verbose)
828 msg += "NONE" if (p.stdout is None) else "\n\t".join(map(str, p.stderr.readlines()))
--> 829 raise ChildProcessError(msg)
830 if verbose:
ChildProcessError: SubProcess Failed due to returncode: 127
COMMAND:
bjobs -w
STDOUT:
STDERR:
b'/bin/sh: 1: bjobs: not found\n'
During handling of the above exception, another exception occurred:
Exception Traceback (most recent call last)
/tmp/ipykernel_4765/422912753.py in <module>
4 job_name = "Test"+str(test)
5
----> 6 job_id = sub_lsf.submit_to_queue(Submission_job(bash_command, job_name))
7 job_ids.append(job_id)
~/PyGromosTools/pygromos/simulations/hpc_queuing/submission_systems/lsf.py in submit_to_queue(self, sub_job)
74 if self.verbose:
75 print("check queue")
---> 76 ids = list(self.search_queue_for_jobname(sub_job.jobName).index)
77
78 if len(ids) > 0:
~/PyGromosTools/pygromos/simulations/hpc_queuing/submission_systems/lsf.py in search_queue_for_jobname(self, job_name, regex)
376 """
377
--> 378 self.get_queued_jobs()
379 if regex:
380 return self._job_queue_list.where(self._job_queue_list.JOB_NAME.str.match(job_name)).dropna()
~/PyGromosTools/pygromos/simulations/hpc_queuing/submission_systems/lsf.py in get_queued_jobs(self)
317 self._job_queue_time_stamp = datetime.now()
318 except Exception as err:
--> 319 raise Exception("Could not get job_list!\nerr:\n" + "\n".join(err.args))
320 else:
321 job_list_str = []
Exception: Could not get job_list!
err:
SubProcess Failed due to returncode: 127
COMMAND:
bjobs -w
STDOUT:
STDERR:
b'/bin/sh: 1: bjobs: not found\n'
[11]:
sub_lsf.search_queue_for_jobname("Te", regex=True)
---------------------------------------------------------------------------
ChildProcessError Traceback (most recent call last)
~/PyGromosTools/pygromos/simulations/hpc_queuing/submission_systems/lsf.py in get_queued_jobs(self)
307 else:
--> 308 out_process = bash.execute("bjobs -w", catch_STD=True)
309 job_list_str = list(map(lambda x: x.decode("utf-8"), out_process.stdout.readlines()))
~/PyGromosTools/pygromos/utils/bash.py in execute(command, verbose, catch_STD, env)
934 ):
--> 935 return execute_subprocess(command=command, verbose=verbose, catch_STD=catch_STD, env=env)
936
~/PyGromosTools/pygromos/utils/bash.py in execute_subprocess(command, catch_STD, env, verbose)
828 msg += "NONE" if (p.stdout is None) else "\n\t".join(map(str, p.stderr.readlines()))
--> 829 raise ChildProcessError(msg)
830 if verbose:
ChildProcessError: SubProcess Failed due to returncode: 127
COMMAND:
bjobs -w
STDOUT:
STDERR:
b'/bin/sh: 1: bjobs: not found\n'
During handling of the above exception, another exception occurred:
Exception Traceback (most recent call last)
/tmp/ipykernel_4765/722442852.py in <module>
----> 1 sub_lsf.search_queue_for_jobname("Te", regex=True)
~/PyGromosTools/pygromos/simulations/hpc_queuing/submission_systems/lsf.py in search_queue_for_jobname(self, job_name, regex)
376 """
377
--> 378 self.get_queued_jobs()
379 if regex:
380 return self._job_queue_list.where(self._job_queue_list.JOB_NAME.str.match(job_name)).dropna()
~/PyGromosTools/pygromos/simulations/hpc_queuing/submission_systems/lsf.py in get_queued_jobs(self)
317 self._job_queue_time_stamp = datetime.now()
318 except Exception as err:
--> 319 raise Exception("Could not get job_list!\nerr:\n" + "\n".join(err.args))
320 else:
321 job_list_str = []
Exception: Could not get job_list!
err:
SubProcess Failed due to returncode: 127
COMMAND:
bjobs -w
STDOUT:
STDERR:
b'/bin/sh: 1: bjobs: not found\n'
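The traceback above shows that with `regex=True` the job names are filtered through pandas' `str.match`, which follows the semantics of `re.match`: the pattern must match at the start of the name. The sketch below illustrates that behaviour with plain Python on a made-up list of job names. `match_job_names` and the queue contents are hypothetical.

```python
import re
from typing import List


def match_job_names(pattern: str, names: List[str]) -> List[str]:
    """Keep the names for which the pattern matches at the beginning."""
    return [name for name in names if re.match(pattern, name)]


queue = ["Test0", "Test1", "Test2", "myTest", "Other"]

print(match_job_names("Te", queue))          # names starting with "Te"
print(match_job_names(r"Test[01]$", queue))  # only Test0 and Test1
```

Note that "myTest" is not selected by the pattern "Te", because the match is anchored at the start of the string.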
Killing jobs¶
Remove a job from the job queue:
sub_lsf.kill_jobs(job_ids=[job_id])
sub_lsf.search_queue_for_jobname("Te", regex=True)