Parsl is grabbing all pre-existing EC2 instances in an AWS account on startup, causing a KeyError in self.resources in AWSProvider().status() #3764

Open
thomas-watson-tandemai opened this issue Jan 30, 2025 · 0 comments

Describe the bug
When AWSProvider.status() is called with an empty job_ids list (this happened on initial submission), the boto client method describe_instances returns every InstanceId in the account instead of none. If the account already has EC2 instances that Parsl did not launch, their IDs are not keys in self.resources, so updating their status raises a KeyError at this line: https://github.com/Parsl/parsl/blob/master/parsl/providers/aws/aws.py#L624.
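
The failure mode described above can be illustrated without touching AWS. The following is a minimal sketch, not Parsl or boto3 code: `FakeEC2Client`, `update_status`, and the instance ID are hypothetical, and the fake client simply assumes the behavior reported here (an empty `InstanceIds` list acts as no filter).

```python
class FakeEC2Client:
    """Hypothetical stand-in for the boto3 EC2 client (not the real API)."""

    def __init__(self, instance_ids):
        self._instance_ids = list(instance_ids)

    def describe_instances(self, InstanceIds=None):
        # Assumption taken from the report: an empty list means "no filter",
        # so every instance in the account is returned.
        wanted = InstanceIds or self._instance_ids
        instances = [
            {"InstanceId": i, "State": {"Name": "running"}}
            for i in self._instance_ids
            if i in wanted
        ]
        return {"Reservations": [{"Instances": instances}]}


def update_status(client, resources, job_ids):
    """Simplified version of the status() loop, without an empty-list guard."""
    states = []
    response = client.describe_instances(InstanceIds=list(job_ids))
    for reservation in response["Reservations"]:
        for instance in reservation["Instances"]:
            state = instance["State"]["Name"]
            # Raises KeyError when the instance was never tracked in resources.
            resources[instance["InstanceId"]]["status"] = state
            states.append(state)
    return states


client = FakeEC2Client(["i-preexisting"])  # instance not launched by Parsl
resources = {}                             # nothing is tracked yet

try:
    update_status(client, resources, job_ids=[])
    error = None
except KeyError as exc:
    error = exc

print(repr(error))
```

With nothing tracked and one unrelated instance in the account, the lookup in `resources` fails on the unrelated instance's ID, which matches the symptom in this report.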

To Reproduce
Steps to reproduce the behavior:

  1. Set up Parsl 1.3.0-dev with Python 3.11 on a cluster, laptop, or EC2 node
  2. Run a test workflow (adding numbers) with an AWS config like this:
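import parsl
from parsl.addresses import address_by_hostname
from parsl.config import Config
from parsl.executors import HighThroughputExecutor
from parsl.launchers import SingleNodeLauncher
from parsl.providers import AWSProvider

config = Config(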
    executors=[
        HighThroughputExecutor(
            label="aws_htex",
            address=address_by_hostname(),
            worker_debug=True,
            provider=AWSProvider(
                image_id="ami-XYZ",
                instance_type="t2.large",
                key_file="credentials.json",
                key_name="my-laptop",
                region="us-east-1",
                min_blocks=1,
                max_blocks=1,
                nodes_per_block=1,
                launcher=SingleNodeLauncher(debug=True, fail_on_any=True),
                worker_init="source /etc/profile && conda activate my-env",
                walltime="01:00:00",
                linger=True
            ),
        )
    ]
)

parsl.load(config)

but with a credentials.json for an AWS account where EC2 instances are already running.
  3. Let it run for a few seconds
  4. See the error. It was always the same KeyError, and the key was always the ID of an existing EC2 instance that Parsl had not launched.

Expected behavior
Parsl should only track the EC2 instances it launches. With the early return marked in the status() method below, everything submits:

    def status(self, job_ids):
        """Get the status of a list of jobs identified by their ids.

        Parameters
        ----------
        job_ids : list of str
            Identifiers for the jobs.

        Returns
        -------
        list of int
            The status codes of the requested jobs.
        """

        all_states = []

        if len(job_ids) == 0:  # <------------------- Everything submits if I add these lines
            return all_states

        status = self.client.describe_instances(InstanceIds=list(job_ids))
        for r in status['Reservations']:
            for i in r['Instances']:
                instance_id = i['InstanceId']
                instance_state = translate_table.get(i['State']['Name'], JobState.UNKNOWN)
                instance_status = JobStatus(instance_state)
                self.resources[instance_id]['status'] = instance_status
                all_states.extend([instance_status])

        return all_states
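
Beyond the early return, the loop could also skip any instance ID that is not a key in the tracked resources, so an unexpected ID in the response never raises. The following is a minimal sketch of that idea as a standalone function, not the actual AWSProvider method: `status_tracked_only`, `StubClient`, the plain-string states, and the instance IDs are all hypothetical.

```python
def status_tracked_only(client, resources, job_ids):
    """Return states only for instances that are tracked in `resources`."""
    job_ids = list(job_ids)
    if not job_ids:
        # Never call describe_instances with an empty filter.
        return []

    states = []
    response = client.describe_instances(InstanceIds=job_ids)
    for reservation in response["Reservations"]:
        for instance in reservation["Instances"]:
            instance_id = instance["InstanceId"]
            if instance_id not in resources:
                # Not launched by this provider, so ignore it.
                continue
            state = instance["State"]["Name"]
            resources[instance_id]["status"] = state
            states.append(state)
    return states


class StubClient:
    """Hypothetical client that always reports one tracked and one foreign instance."""

    def __init__(self):
        self.calls = 0

    def describe_instances(self, InstanceIds=None):
        self.calls += 1
        return {"Reservations": [{"Instances": [
            {"InstanceId": "i-tracked", "State": {"Name": "running"}},
            {"InstanceId": "i-foreign", "State": {"Name": "running"}},
        ]}]}


stub = StubClient()
resources = {"i-tracked": {"status": "pending"}}

empty_result = status_tracked_only(stub, resources, [])
tracked_result = status_tracked_only(stub, resources, ["i-tracked"])
```

The empty-list call returns without contacting the client, and the second call updates only the tracked instance. Whether silently skipping unknown IDs is preferable to logging them is a design choice for the maintainers.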

Environment

  • OS: Ubuntu 22 and WSL
  • Python 3.11
  • Parsl 1.3.0-dev

Distributed Environment

  • Where are you running the Parsl script from? Laptop, workstation, and EC2 node
  • Where do you need the workers to run? AWS nodes in an account where EC2 instances are already running for other services.