psij.executors package¶
Subpackages¶
- psij.executors.batch package
- Submodules
- psij.executors.batch.batch_scheduler_executor module
- psij.executors.batch.cobalt module
- psij.executors.batch.escape_functions module
- psij.executors.batch.lsf module
- psij.executors.batch.pbspro module
- psij.executors.batch.script_generator module
- psij.executors.batch.slurm module
- psij.executors.batch.template_function_library module
- Module contents
Submodules¶
psij.executors.flux module¶
This module contains the Flux JobExecutor.
Implementation references: github.com/flux-framework/flux-core/blob/master/src/bindings/python/flux/job/executor.py flux-framework.readthedocs.io/projects/flux-core/en/latest/python/job_submission.html#the-fluxexecutor-interface
Events and state transitions: github.com/flux-framework/rfc/blob/master/spec_21.rst
- class FluxJobExecutor(url=None, config=None)[source]¶
Bases:
JobExecutorA
JobExecutorfor the Flux scheduler.The Flux resource manager framework is deployed and used on a per-user basis at many sites, and is slated to become the system-level resource manager at LLNL.
Uses Flux’s python library/bindings to submit, monitor, and manipulate jobs.
- Parameters
url (Optional[str]) – Not used, but required by the spec for automatic initialization.
config (Optional[JobExecutorConfig]) – The FluxJobExecutor does not have any configuration options.
- Return type
None
psij.executors.local module¶
This module contains the local JobExecutor.
- class LocalJobExecutor(url=None, config=None)[source]¶
Bases:
JobExecutorA job executor that runs jobs locally using
subprocess.Popen.This job executor is intended to be used either to run jobs directly on the same machine as the PSI/J library or for testing purposes.
Note
In Linux, attached jobs always appear to complete with a zero exit code regardless of the actual exit code.
Warning
Instantiation of a local executor from both parent process and a fork()-ed process is not guaranteed to work. In general, using fork() and multi-threading in Linux is unsafe, as suggested by the fork() man page. While PSI/J attempts to minimize problems that can arise when fork() is combined with threads (which are used by PSI/J), no guarantees can be made and the chances of unexpected behavior are high. Please do not use PSI/J with fork(). If you do, please be mindful that support for using PSI/J with fork() will be limited.
- Parameters
url (Optional[str]) – Not used, but required by the spec for automatic initialization.
config (JobExecutorConfig) – The LocalJobExecutor does not have any configuration options.
- Return type
None
- attach(job, native_id)[source]¶
Attaches a job to a process.
The job must be in the
NEWstate. The exit code of the attached job will not be available upon completion and a zero exit code will always be returned for jobs attached by the LocalJobExecutor.
- list()[source]¶
Return a list of ids representing jobs that are running on the underlying implementation.
Specifically for the LocalJobExecutor, this returns a list of ~psij.NativeId objects corresponding to the processes running under the current user on the local machine. These processes need not correspond to jobs statrted by calling the submit() method of an instance of a LocalJobExecutor.
- submit(job)[source]¶
Submits the specified
Jobto be run locally.Successful return of this method indicates that the job has been started locally and all changes in the job status, including failures, are reported using notifications. If the job specification is invalid, an
InvalidJobExceptionis thrown. If the actual submission fails for reasons outside the validity of the job, aSubmitExceptionis thrown.- Parameters
job (Job) – The job to be submitted.
- Return type
None
psij.executors.rp module¶
This module contains the RP JobExecutor.
- class RPJobExecutor(url=None, config=None)[source]¶
Bases:
JobExecutorA job executor that runs jobs via the RADICAL Pilot system.
- Parameters
url (Optional[str]) – Not used, but required by the spec for automatic initialization.
config (Optional[JobExecutorConfig]) – The RPJobExecutor does not have any configuration options.
- Return type
None
- list()[source]¶
See
list().Return a list of ids representing jobs that are running on the underlying implementation - in this case RP task IDs.
- submit(job)[source]¶
Submits the specified
Jobto the pilot.Successful return of this method indicates that the job has been submitted to RP and all changes in the job status, including failures, are reported using notifications. If the job specification is invalid, an
InvalidJobExceptionis thrown. If the actual submission fails for reasons outside the validity of the job, aSubmitExceptionis thrown.- Parameters
job (Job) – The job to be submitted.
- Return type
None
Module contents¶
A package containing psij.JobExecutor implementations.
- class CobaltJobExecutor(url=None, config=None)[source]¶
Bases:
BatchSchedulerExecutorA
JobExecutorfor the Cobalt Workload Manager.The Cobalt HPC Job Scheduler, is used by Argonne’s ALCF systems.
Uses the
qsub,qstat, andqdelcommands, respectively, to submit, monitor, and cancel jobs.Creates a batch script with #COBALT directives when submitting a job.
Custom attributes prefixed with cobalt. are rendered as long-form directives in the script. For example, setting custom_attributes[‘cobalt.m’] = ‘co’ results in the #COBALT –m=co directive being placed in the submit script.
- Parameters
url (Optional[str]) – This parameter is not used and is only provided for compatibility reasons.
config (Optional[CobaltExecutorConfig]) – An optional configuration for this executor.
- Return type
None
- get_cancel_command(native_id)[source]¶
See
get_cancel_command().
- get_list_command()[source]¶
See
get_list_command().
- get_status_command(native_ids)[source]¶
See
get_status_command().- Parameters
native_ids (Collection[str]) –
- Return type
- get_submit_command(job, submit_file_path)[source]¶
See
get_submit_command().
- process_cancel_command_output(exit_code, out)[source]¶
See
process_cancel_command_output().This should be unnecessary because qdel only seems to fail on non-integer job IDs.
- class LocalJobExecutor(url=None, config=None)[source]¶
Bases:
JobExecutorA job executor that runs jobs locally using
subprocess.Popen.This job executor is intended to be used either to run jobs directly on the same machine as the PSI/J library or for testing purposes.
Note
In Linux, attached jobs always appear to complete with a zero exit code regardless of the actual exit code.
Warning
Instantiation of a local executor from both parent process and a fork()-ed process is not guaranteed to work. In general, using fork() and multi-threading in Linux is unsafe, as suggested by the fork() man page. While PSI/J attempts to minimize problems that can arise when fork() is combined with threads (which are used by PSI/J), no guarantees can be made and the chances of unexpected behavior are high. Please do not use PSI/J with fork(). If you do, please be mindful that support for using PSI/J with fork() will be limited.
- Parameters
url (Optional[str]) – Not used, but required by the spec for automatic initialization.
config (JobExecutorConfig) – The LocalJobExecutor does not have any configuration options.
- Return type
None
- attach(job, native_id)[source]¶
Attaches a job to a process.
The job must be in the
NEWstate. The exit code of the attached job will not be available upon completion and a zero exit code will always be returned for jobs attached by the LocalJobExecutor.
- list()[source]¶
Return a list of ids representing jobs that are running on the underlying implementation.
Specifically for the LocalJobExecutor, this returns a list of ~psij.NativeId objects corresponding to the processes running under the current user on the local machine. These processes need not correspond to jobs statrted by calling the submit() method of an instance of a LocalJobExecutor.
- submit(job)[source]¶
Submits the specified
Jobto be run locally.Successful return of this method indicates that the job has been started locally and all changes in the job status, including failures, are reported using notifications. If the job specification is invalid, an
InvalidJobExceptionis thrown. If the actual submission fails for reasons outside the validity of the job, aSubmitExceptionis thrown.- Parameters
job (Job) – The job to be submitted.
- Return type
None
- class LsfJobExecutor(url, config=None)[source]¶
Bases:
BatchSchedulerExecutorA
JobExecutorfor the LSF Workload Manager.The IBM Spectrum LSF workload manager is the system resource manager on LLNL’s Sierra and Lassen, and ORNL’s Summit.
Uses the ‘bsub’, ‘bjobs’, and ‘bkill’ commands, respectively, to submit, monitor, and cancel jobs.
Creates a batch script with #BSUB directives when submitting a job.
Renders all custom attributes of the form lsf.<name> into the corresponding LSF directive. For example, setting job.spec.attributes.custom_attributes[‘lsf.core_isolation’] = ‘0’ results in a `#BSUB -core_isolation 0 directive being placed in the submit script.
- Parameters
url (Optional[str]) – Not used, but required by the spec for automatic initialization.
config (Optional[LsfExecutorConfig]) – An optional configuration for this executor.
- get_cancel_command(native_id)[source]¶
See
get_cancel_command().bkillwill exit with an error set if the job does not exist or has already finished.
- get_list_command()[source]¶
See
get_list_command().
- get_status_command(native_ids)[source]¶
See
get_status_command().- Parameters
native_ids (Collection[str]) –
- Return type
- get_submit_command(job, submit_file_path)[source]¶
See
get_submit_command().
- parse_status_output(exit_code, out)[source]¶
-
Iterate through the RECORDS entry, grabbing JOBID and STAT entries, as well as any state-change reasons if present.
- process_cancel_command_output(exit_code, out)[source]¶
See
process_cancel_command_output().Check if the error was raised only because a job already exited.
- class SlurmJobExecutor(url=None, config=None)[source]¶
Bases:
BatchSchedulerExecutorA
JobExecutorfor the Slurm Workload Manager.The Slurm Workload Manager is a widely used resource manager running on machines such as NERSC’s Perlmutter, as well as a variety of LLNL machines.
Uses the ‘sbatch’, ‘squeue’, and ‘scancel’ commands, respectively, to submit, monitor, and cancel jobs.
Creates a batch script with #SBATCH directives when submitting a job.
Renders all custom attributes set on a job’s attributes with a slurm. prefix into corresponding Slurm directives with long-form parameters. For example, job.spec.attributes.custom_attributes[‘slurm.qos’] = ‘debug’ causes a directive #SBATCH –qos=debug to be placed in the submit script.
- Parameters
url (Optional[str]) – Not used, but required by the spec for automatic initialization.
config (Optional[SlurmExecutorConfig]) – An optional configuration for this executor.
- get_cancel_command(native_id)[source]¶
See
get_cancel_command().
- get_list_command()[source]¶
See
get_list_command().
- get_status_command(native_ids)[source]¶
See
get_status_command().- Parameters
native_ids (Collection[str]) –
- Return type
- get_submit_command(job, submit_file_path)[source]¶
See
get_submit_command().