psij.executors package

Submodules

psij.executors.flux module

This module contains the Flux JobExecutor.

Implementation references: github.com/flux-framework/flux-core/blob/master/src/bindings/python/flux/job/executor.py flux-framework.readthedocs.io/projects/flux-core/en/latest/python/job_submission.html#the-fluxexecutor-interface

Events and state transitions: github.com/flux-framework/rfc/blob/master/spec_21.rst

class FluxJobExecutor(url=None, config=None)[source]

Bases: JobExecutor

A JobExecutor for the Flux scheduler.

The Flux resource manager framework is deployed and used on a per-user basis at many sites, and is slated to become the system-level resource manager at LLNL.

Uses Flux’s python library/bindings to submit, monitor, and manipulate jobs.

Parameters
  • url (Optional[str]) – Not used, but required by the spec for automatic initialization.

  • config (Optional[JobExecutorConfig]) – The FluxJobExecutor does not have any configuration options.

Return type

None

attach(job, native_id)[source]

Attaches a job to a process.

The job must be in the NEW state.

Parameters
  • job (Job) – The job to attach.

  • native_id (str) – The native ID of the process to attached to, as obtained through list() method.

Return type

None

cancel(job)[source]

See cancel().

Parameters

job (Job) –

Return type

None

list()[source]

See list().

Return a list of ids representing jobs that are running on the underlying implementation - in this case Flux job IDs.

Returns

The list of known tasks.

Return type

List[str]

submit(job)[source]

See submit().

Parameters

job (Job) –

Return type

None

psij.executors.local module

This module contains the local JobExecutor.

class LocalJobExecutor(url=None, config=None)[source]

Bases: JobExecutor

A job executor that runs jobs locally using subprocess.Popen.

This job executor is intended to be used either to run jobs directly on the same machine as the PSI/J library or for testing purposes.

Note

In Linux, attached jobs always appear to complete with a zero exit code regardless of the actual exit code.

Warning

Instantiation of a local executor from both parent process and a fork()-ed process is not guaranteed to work. In general, using fork() and multi-threading in Linux is unsafe, as suggested by the fork() man page. While PSI/J attempts to minimize problems that can arise when fork() is combined with threads (which are used by PSI/J), no guarantees can be made and the chances of unexpected behavior are high. Please do not use PSI/J with fork(). If you do, please be mindful that support for using PSI/J with fork() will be limited.

Parameters
  • url (Optional[str]) – Not used, but required by the spec for automatic initialization.

  • config (JobExecutorConfig) – The LocalJobExecutor does not have any configuration options.

Return type

None

attach(job, native_id)[source]

Attaches a job to a process.

The job must be in the NEW state. The exit code of the attached job will not be available upon completion and a zero exit code will always be returned for jobs attached by the LocalJobExecutor.

Parameters
  • job (Job) – The job to attach.

  • native_id (str) – The native ID of the process to attached to, as obtained through list() method.

Return type

None

cancel(job)[source]

Cancels a job.

Parameters

job (Job) – The job to cancel.

Return type

None

list()[source]

Return a list of ids representing jobs that are running on the underlying implementation.

Specifically for the LocalJobExecutor, this returns a list of ~psij.NativeId objects corresponding to the processes running under the current user on the local machine. These processes need not correspond to jobs statrted by calling the submit() method of an instance of a LocalJobExecutor.

Returns

The list of ~psij.NativeId objects corresponding to the current user’s processes running locally.

Return type

List[str]

submit(job)[source]

Submits the specified Job to be run locally.

Successful return of this method indicates that the job has been started locally and all changes in the job status, including failures, are reported using notifications. If the job specification is invalid, an InvalidJobException is thrown. If the actual submission fails for reasons outside the validity of the job, a SubmitException is thrown.

Parameters

job (Job) – The job to be submitted.

Return type

None

psij.executors.rp module

This module contains the RP JobExecutor.

class RPJobExecutor(url=None, config=None)[source]

Bases: JobExecutor

A job executor that runs jobs via the RADICAL Pilot system.

Parameters
  • url (Optional[str]) – Not used, but required by the spec for automatic initialization.

  • config (Optional[JobExecutorConfig]) – The RPJobExecutor does not have any configuration options.

Return type

None

attach(job, native_id)[source]

Attaches a job to a process.

The job must be in the NEW state.

Parameters
  • job (Job) – The job to attach.

  • native_id (str) – The native ID of the process to attached to, as obtained through list() method.

Return type

None

cancel(job)[source]

Cancels a job.

Parameters

job (Job) – The job to cancel.

Return type

None

list()[source]

See list().

Return a list of ids representing jobs that are running on the underlying implementation - in this case RP task IDs.

Returns

The list of known tasks.

Return type

List[str]

submit(job)[source]

Submits the specified Job to the pilot.

Successful return of this method indicates that the job has been submitted to RP and all changes in the job status, including failures, are reported using notifications. If the job specification is invalid, an InvalidJobException is thrown. If the actual submission fails for reasons outside the validity of the job, a SubmitException is thrown.

Parameters

job (Job) – The job to be submitted.

Return type

None

Module contents

A package containing psij.JobExecutor implementations.

class CobaltJobExecutor(url=None, config=None)[source]

Bases: BatchSchedulerExecutor

A JobExecutor for the Cobalt Workload Manager.

The Cobalt HPC Job Scheduler, is used by Argonne’s ALCF systems.

Uses the qsub, qstat, and qdel commands, respectively, to submit, monitor, and cancel jobs.

Creates a batch script with #COBALT directives when submitting a job.

Custom attributes prefixed with cobalt. are rendered as long-form directives in the script. For example, setting custom_attributes[‘cobalt.m’] = ‘co’ results in the #COBALT –m=co directive being placed in the submit script.

Parameters
Return type

None

generate_submit_script(job, context, submit_file)[source]

See generate_submit_script().

Parameters
Return type

None

get_cancel_command(native_id)[source]

See get_cancel_command().

Parameters

native_id (str) –

Return type

List[str]

get_list_command()[source]

See get_list_command().

Return type

List[str]

get_status_command(native_ids)[source]

See get_status_command().

Parameters

native_ids (Collection[str]) –

Return type

List[str]

get_submit_command(job, submit_file_path)[source]

See get_submit_command().

Parameters
  • job (Job) –

  • submit_file_path (Path) –

Return type

List[str]

job_id_from_submit_output(out)[source]

See job_id_from_submit_output().

Parameters

out (str) –

Return type

str

parse_status_output(exit_code, out)[source]

See parse_status_output().

Parameters
  • exit_code (int) –

  • out (str) –

Return type

Dict[str, JobStatus]

process_cancel_command_output(exit_code, out)[source]

See process_cancel_command_output().

This should be unnecessary because qdel only seems to fail on non-integer job IDs.

Parameters
  • exit_code (int) –

  • out (str) –

Return type

None

class LocalJobExecutor(url=None, config=None)[source]

Bases: JobExecutor

A job executor that runs jobs locally using subprocess.Popen.

This job executor is intended to be used either to run jobs directly on the same machine as the PSI/J library or for testing purposes.

Note

In Linux, attached jobs always appear to complete with a zero exit code regardless of the actual exit code.

Warning

Instantiation of a local executor from both parent process and a fork()-ed process is not guaranteed to work. In general, using fork() and multi-threading in Linux is unsafe, as suggested by the fork() man page. While PSI/J attempts to minimize problems that can arise when fork() is combined with threads (which are used by PSI/J), no guarantees can be made and the chances of unexpected behavior are high. Please do not use PSI/J with fork(). If you do, please be mindful that support for using PSI/J with fork() will be limited.

Parameters
  • url (Optional[str]) – Not used, but required by the spec for automatic initialization.

  • config (JobExecutorConfig) – The LocalJobExecutor does not have any configuration options.

Return type

None

attach(job, native_id)[source]

Attaches a job to a process.

The job must be in the NEW state. The exit code of the attached job will not be available upon completion and a zero exit code will always be returned for jobs attached by the LocalJobExecutor.

Parameters
  • job (Job) – The job to attach.

  • native_id (str) – The native ID of the process to attached to, as obtained through list() method.

Return type

None

cancel(job)[source]

Cancels a job.

Parameters

job (Job) – The job to cancel.

Return type

None

list()[source]

Return a list of ids representing jobs that are running on the underlying implementation.

Specifically for the LocalJobExecutor, this returns a list of ~psij.NativeId objects corresponding to the processes running under the current user on the local machine. These processes need not correspond to jobs statrted by calling the submit() method of an instance of a LocalJobExecutor.

Returns

The list of ~psij.NativeId objects corresponding to the current user’s processes running locally.

Return type

List[str]

submit(job)[source]

Submits the specified Job to be run locally.

Successful return of this method indicates that the job has been started locally and all changes in the job status, including failures, are reported using notifications. If the job specification is invalid, an InvalidJobException is thrown. If the actual submission fails for reasons outside the validity of the job, a SubmitException is thrown.

Parameters

job (Job) – The job to be submitted.

Return type

None

class LsfJobExecutor(url, config=None)[source]

Bases: BatchSchedulerExecutor

A JobExecutor for the LSF Workload Manager.

The IBM Spectrum LSF workload manager is the system resource manager on LLNL’s Sierra and Lassen, and ORNL’s Summit.

Uses the ‘bsub’, ‘bjobs’, and ‘bkill’ commands, respectively, to submit, monitor, and cancel jobs.

Creates a batch script with #BSUB directives when submitting a job.

Renders all custom attributes of the form lsf.<name> into the corresponding LSF directive. For example, setting job.spec.attributes.custom_attributes[‘lsf.core_isolation’] = ‘0’ results in a `#BSUB -core_isolation 0 directive being placed in the submit script.

Parameters
generate_submit_script(job, context, submit_file)[source]

See generate_submit_script().

Parameters
Return type

None

get_cancel_command(native_id)[source]

See get_cancel_command().

bkill will exit with an error set if the job does not exist or has already finished.

Parameters

native_id (str) –

Return type

List[str]

get_list_command()[source]

See get_list_command().

Return type

List[str]

get_status_command(native_ids)[source]

See get_status_command().

Parameters

native_ids (Collection[str]) –

Return type

List[str]

get_submit_command(job, submit_file_path)[source]

See get_submit_command().

Parameters
  • job (Job) –

  • submit_file_path (Path) –

Return type

List[str]

job_id_from_submit_output(out)[source]

See job_id_from_submit_output().

Parameters

out (str) –

Return type

str

parse_status_output(exit_code, out)[source]

See parse_status_output().

Iterate through the RECORDS entry, grabbing JOBID and STAT entries, as well as any state-change reasons if present.

Parameters
  • exit_code (int) –

  • out (str) –

Return type

Dict[str, JobStatus]

process_cancel_command_output(exit_code, out)[source]

See process_cancel_command_output().

Check if the error was raised only because a job already exited.

Parameters
  • exit_code (int) –

  • out (str) –

Return type

None

class SlurmJobExecutor(url=None, config=None)[source]

Bases: BatchSchedulerExecutor

A JobExecutor for the Slurm Workload Manager.

The Slurm Workload Manager is a widely used resource manager running on machines such as NERSC’s Perlmutter, as well as a variety of LLNL machines.

Uses the ‘sbatch’, ‘squeue’, and ‘scancel’ commands, respectively, to submit, monitor, and cancel jobs.

Creates a batch script with #SBATCH directives when submitting a job.

Renders all custom attributes set on a job’s attributes with a slurm. prefix into corresponding Slurm directives with long-form parameters. For example, job.spec.attributes.custom_attributes[‘slurm.qos’] = ‘debug’ causes a directive #SBATCH –qos=debug to be placed in the submit script.

Parameters
generate_submit_script(job, context, submit_file)[source]

See generate_submit_script().

Parameters
Return type

None

get_cancel_command(native_id)[source]

See get_cancel_command().

Parameters

native_id (str) –

Return type

List[str]

get_list_command()[source]

See get_list_command().

Return type

List[str]

get_status_command(native_ids)[source]

See get_status_command().

Parameters

native_ids (Collection[str]) –

Return type

List[str]

get_submit_command(job, submit_file_path)[source]

See get_submit_command().

Parameters
  • job (Job) –

  • submit_file_path (Path) –

Return type

List[str]

job_id_from_submit_output(out)[source]

See job_id_from_submit_output().

Parameters

out (str) –

Return type

str

parse_status_output(exit_code, out)[source]

See parse_status_output().

Parameters
  • exit_code (int) –

  • out (str) –

Return type

Dict[str, JobStatus]

process_cancel_command_output(exit_code, out)[source]

See process_cancel_command_output().

Parameters
  • exit_code (int) –

  • out (str) –

Return type

None