Deployment cdk script fails with database 404 #15

Open
kittyandrew opened this issue Aug 1, 2024 · 9 comments

Comments

@kittyandrew

After configuring everything according to the readme and more (bootstrapping the AWS CDK environment, creating the engineering group, etc.), I'm stuck with a new error:
[screenshot of the deployment error]

This seems to be related to some database setup, but it isn't described anywhere in the readme, and it doesn't appear to be related to the existing database setup section, which comes much later than npm run deploy.

From my limited AWS experience, and after reading the readme, it's also strange that it failed to find the "default" group or subnet, because as far as I understand that is a reserved name that should already exist (?).

Please let me know if I'm missing something trivial, or doing something wrong.

@dmmiller
Collaborator

dmmiller commented Aug 6, 2024

@tyrtel, was this what you were seeing as well? Were you able to get past it?

@dmmiller
Collaborator

dmmiller commented Aug 6, 2024

@flooey , did you run into this at all? And if so, what was the fix/workaround?

@yukigesho

I got the same error and have been stuck trying to figure it out. Have you found any solution or workaround for this? @kittyandrew

@kittyandrew
Author

nope

@yukigesho

I discovered that in /ops/aws/src/radical-stack/rds/ProdReplica.ts the default subnet group is hardcoded. However, it cannot simply be 'default'; it should be 'default-vpc-xxxxxxxxxxxxx', which you can verify under RDS > Subnet groups. After updating this value, another error surfaced. To resolve it, create a security group in your VPC and update the values in /ops/aws/src/radical-stack/rds/ProdReplica.ts (a rough sketch of both changes is at the end of this comment).

It's also important to note that ops/aws/src/radical-stack/ec2/vpc.ts contains hardcoded IPs and subnet masks, so make sure everything is configured correctly there too. Currently, I am struggling with this issue:

[screenshot of the next error]

I hope it's just a matter of time.
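For anyone hitting the same thing, here's a minimal sketch of the two changes described above, assuming the replica uses the standard aws-cdk-lib RDS/EC2 constructs. The stack setup, construct IDs, subnet group name, and security group ID are placeholders to replace with your own values, not the actual ProdReplica.ts code:

import { App, Stack, aws_ec2 as EC2, aws_rds as RDS } from 'aws-cdk-lib';

const app = new App();
const stack = new Stack(app, 'RadicalStack'); // stand-in for radicalStack()

// Look up the subnet group by the exact name shown under RDS > Subnet groups
// ('default-vpc-xxxxxxxxxxxxx' is a placeholder, not a literal value).
const subnetGroup = RDS.SubnetGroup.fromSubnetGroupName(
  stack,
  'ReplicaSubnetGroup',
  'default-vpc-xxxxxxxxxxxxx',
);

// Reference the security group created in your VPC by its ID.
const replicaSecurityGroup = EC2.SecurityGroup.fromSecurityGroupId(
  stack,
  'ReplicaSecurityGroup',
  'sg-0123456789abcdef0',
);

// Pass these to the read replica's props (subnetGroup / securityGroups)
// instead of the hardcoded 'default' values.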

@yukigesho

Okay, now I received:
[screenshot of the new error]
It's progress, but this error message doesn't help me much.
Maybe someone knows the root cause of it?

@dmmiller
Collaborator

Someone else ran into that and they just removed monitoring from the deployment to see if they could get it to work. That got them past the monitoring issue.

@yukigesho

@dmmiller I'm experiencing a freeze with no error messages, and the screen has been unchanged for about two hours. Here's the screen I'm stuck on:

[screenshot of the stalled deployment output]

I minimized instance sizes to see if that would enable deployment (I'm using the free tier). Could this adjustment be causing the freeze? I'm unsure whether this is a minor issue or indicates a deeper problem.

For example, in ops/aws/src/radical-stack/ec2/autoScalingGroup.ts, I changed EC2.InstanceSize.XLARGE to EC2.InstanceSize.LARGE.

const asg = new autoScaling.AutoScalingGroup(
  radicalStack(),
  `${tier}ServerLondonASG`,
  {
    autoScalingGroupName,
    instanceType: EC2.InstanceType.of(
      // m5.xlarge: general purpose instance type
      // 4 vCPUs, 16 GiB of RAM
      EC2.InstanceClass.M5,
      EC2.InstanceSize.LARGE
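      // changed from EC2.InstanceSize.XLARGE; m5.large is 2 vCPUs, 8 GiB of RAM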
    ...

@jwatzman
Copy link

Driving by: if anyone wants to actually fix the issue with monitoring, I have a potential lead from my memory of the issues it had at Cord. It's something like this: there is a persistent drive that monitoring uses so that historical data isn't lost when the EC2 instance is rebuilt, and the IAM permissions are set so that only the monitoring instance can mount that drive.

At least in steady state, this caused a chicken-and-egg problem when the EC2 instance was rebuilt: CF wouldn't assign the new IAM role (which is what allows it to mount the drive) to the new monitoring instance until the instance was up/healthy, but it wouldn't be considered up/healthy until it had mounted the drive. Or something like that -- I didn't 100% pin it down, but that's my recollection of what I strongly suspected was going on.

This used to (somehow!) work and then it broke earlier this year. We rebuilt monitoring infrequently enough that, the two or three times I hit it, I just attached the drive by hand in the EC2 console during the window while CF was waiting on it (you have about a 10m window, though you need to be watching to find the right time...).

I'd be unsurprised if an extremely similar issue affected creating monitoring for the first time.

If you want to at least try to have a monitoring instance, you can try removing that persistent drive logic and just having monitoring write to the raw root drive (which you might want to increase in size a little bit). You'll lose all data on rebuild, but it will at least give you something.
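For what it's worth, here's a minimal sketch of that fallback, assuming the monitoring host is a plain aws-cdk-lib EC2 instance. The VPC, instance type, AMI, construct IDs, and the 100 GiB size are placeholders, and it deliberately has no separate persistent volume, so monitoring just writes to the enlarged root drive:

import { App, Stack, aws_ec2 as EC2 } from 'aws-cdk-lib';

const app = new App();
const stack = new Stack(app, 'MonitoringStack'); // stand-in for the real stack

// Placeholder VPC; the real stack would reuse the VPC from ec2/vpc.ts.
const vpc = new EC2.Vpc(stack, 'Vpc');

// No separate persistent data volume: just enlarge the root EBS volume so
// monitoring writes its historical data there. All data is lost on rebuild.
new EC2.Instance(stack, 'MonitoringInstance', {
  vpc,
  instanceType: EC2.InstanceType.of(EC2.InstanceClass.T3, EC2.InstanceSize.MEDIUM),
  machineImage: EC2.MachineImage.latestAmazonLinux2(),
  blockDevices: [
    {
      deviceName: '/dev/xvda', // root device name on Amazon Linux 2 AMIs
      volume: EC2.BlockDeviceVolume.ebs(100), // 100 GiB root volume (placeholder size)
    },
  ],
});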

