App Framework: Add option to use path style s3 URLs #1291

Closed
paheath opened this issue Feb 22, 2024 · 28 comments
Labels
2.6.0 app framework New App Framework enhancement New feature or request

Comments

@paheath

paheath commented Feb 22, 2024

Please select the type of request

Enhancement

Tell us more

Describe the request
I am deploying the operator in an on-prem environment with a storage solution that only supports path-style S3 URLs. As far as I can tell, the operator defaults to virtual-hosted-style S3 URLs when downloading apps. I propose keeping the current behavior as the default and providing an option in the AppFramework spec to explicitly switch the S3 URLs to path style. I rebuilt the operator with S3ForcePathStyle: aws.Bool(true) added here, and the app framework worked as expected.
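To make the difference between the two addressing styles concrete, here is a minimal Go sketch (standard library only) of how the request URL for one object changes with the style. The function name `objectURL`, the endpoint `https://s3.example.internal`, and the bucket `apps` are hypothetical; a real S3 client (for example aws-sdk-go with `S3ForcePathStyle`) also handles escaping, request signing, and region-specific endpoints, all omitted here. Virtual-hosted style typically needs DNS (and TLS certificates) covering `<bucket>.<endpoint>`, which is a common reason on-prem S3-compatible storage supports only path style.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// objectURL builds the URL an S3-compatible client would request for
// bucket/key against endpoint (any path already on endpoint is ignored).
//   pathStyle=true:  https://<endpoint>/<bucket>/<key>
//   pathStyle=false: https://<bucket>.<endpoint>/<key> (virtual-hosted style)
func objectURL(endpoint, bucket, key string, pathStyle bool) (string, error) {
	u, err := url.Parse(endpoint)
	if err != nil {
		return "", err
	}
	key = strings.TrimPrefix(key, "/")
	if pathStyle {
		// Bucket goes into the path; the host name is left untouched.
		u.Path = "/" + bucket + "/" + key
	} else {
		// Bucket becomes a subdomain of the endpoint host.
		u.Host = bucket + "." + u.Host
		u.Path = "/" + key
	}
	return u.String(), nil
}

func main() {
	for _, pathStyle := range []bool{false, true} {
		s, err := objectURL("https://s3.example.internal", "apps", "node/app.tgz", pathStyle)
		if err != nil {
			panic(err)
		}
		fmt.Printf("pathStyle=%v -> %s\n", pathStyle, s)
	}
}
```

Only the path-style form keeps the endpoint host name unchanged, so it works against storage that is reachable under a single host name.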

SmartStore offers a similar option for specifying the URL version, and it defaults to path style. See remote.s3.url_version here.

Expected behavior
Force the s3 client to use path style URLs when downloading apps, when set as such in the AppFramework spec.

Splunk setup on K8S
SearchHeadCluster, IndexerCluster, ClusterManager, LicenseManager, MonitoringConsole, and Standalone heavy forwarder.

Reproduction/Testing steps
Enable path style s3 URLs via the AppFramework spec. Verify that apps are correctly downloaded and installed.

K8s environment
On-prem k8s cluster with on-prem s3-compatible NAS.

@paheath paheath added the app framework New App Framework label Feb 22, 2024
@yaroslav-nakonechnikov

I guess this is related: #1030 (comment)

@vivekr-splunk
Collaborator

vivekr-splunk commented Mar 29, 2024

Hello @yaroslav-nakonechnikov @paheath, we will work on this change and get back to you.

@vivekr-splunk vivekr-splunk added enhancement New feature or request Q2 labels Apr 24, 2024
@akondur
Collaborator

akondur commented Apr 30, 2024

Hello @paheath, we are exploring possible solutions for path-style S3 URLs. Meanwhile, could you please provide an example of the appFramework configuration that works for path-style URLs with the modified Splunk Operator image?

Also, path style URLs will be discontinued per AWS documentation.

Currently, Amazon S3 supports both virtual-hosted–style and path-style URL access in all AWS Regions. However, path-style URLs will be discontinued in the future. For more information, see the following Important note.

@paheath
Author

paheath commented Apr 30, 2024

This is an excerpt from my helm chart, and the underlying operator image is modified as indicated in the original bug description. I don't think any of the value substitutions necessarily impact the functionality. I've defined it in the yaml as documented here https://splunk.github.io/splunk-operator/AppFramework.html

appRepo:
  appsRepoPollIntervalSeconds: {{ .Values.configPollInterval }}
  defaults:
    volumeName: {{ .Values.volumeName }}
  appSources:
  - name: node
    location: node/
    scope: local
  volumes:
  - name: {{ .Values.volumeName }}
    storageType: s3
    path: {{ .Values.bucketPath }}/
    provider: aws
    region: {{ .Values.bucketRegion }}
    endpoint: {{ .Values.bucketEndpoint }}
    secretRef: {{ .Values.secretRef }}

@akondur
Collaborator

akondur commented Apr 30, 2024

Hi @paheath, thanks for the example above. To test our solution further, could you let us know which storage provider you are using to test path-style S3 URLs? By default, AWS S3 buckets currently support both path-style and virtual-hosted-style URLs. I was able to test path style specifically on S3 buckets.

@paheath
Author

paheath commented Apr 30, 2024

I'm testing against an on-prem s3-compatible NAS. I think testing against any s3-compatible storage might be sufficient, as long as you can confirm the outbound request is hitting the path-style endpoint when configured to do so. Maybe even locally block outbound traffic to the virtual endpoint. Testing might be similar to how the smartstore path-style config is tested.
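The testing idea above (confirm that requests go to the path-style endpoint, and block virtual-hosted requests) can be sketched with Go's `net/http/httptest`. This is an illustration under assumptions: `newPathStyleOnlyServer` and `fetch` are hypothetical helpers, not part of the operator, and the fake server only routes `GET /<bucket>/<key>`. A virtual-hosted request is simulated by putting the bucket in the `Host` header and sending only the key as the path, which is how such a request arrives at the server.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// newPathStyleOnlyServer starts a fake S3-compatible endpoint that, like a
// path-style-only NAS, serves objects only at /<bucket>/<key>; every other
// path gets a 404, regardless of the Host header.
func newPathStyleOnlyServer(bucket string, objects map[string]string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		prefix := "/" + bucket + "/"
		if !strings.HasPrefix(r.URL.Path, prefix) {
			http.NotFound(w, r)
			return
		}
		body, ok := objects[strings.TrimPrefix(r.URL.Path, prefix)]
		if !ok {
			http.NotFound(w, r)
			return
		}
		io.WriteString(w, body)
	}))
}

// fetch sends a GET to rawURL, optionally overriding the Host header, and
// returns the status code and body.
func fetch(rawURL, host string) (int, string, error) {
	req, err := http.NewRequest(http.MethodGet, rawURL, nil)
	if err != nil {
		return 0, "", err
	}
	if host != "" {
		req.Host = host // what a virtual-hosted request would carry
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return 0, "", err
	}
	defer resp.Body.Close()
	b, err := io.ReadAll(resp.Body)
	return resp.StatusCode, string(b), err
}

func main() {
	srv := newPathStyleOnlyServer("apps", map[string]string{"node/app.tgz": "tarball-bytes"})
	defer srv.Close()
	host := strings.TrimPrefix(srv.URL, "http://")

	// Path style: bucket in the path.
	code, body, err := fetch(srv.URL+"/apps/node/app.tgz", "")
	fmt.Println("path-style:", code, body, err)

	// Virtual-hosted style: bucket in the Host header, key alone in the path.
	code, _, err = fetch(srv.URL+"/node/app.tgz", "apps."+host)
	fmt.Println("virtual-hosted:", code, err)
}
```

Pointing a client under test at such a server makes a wrong addressing style show up as a 404 instead of a silent success.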

@akondur
Collaborator

akondur commented May 22, 2024

@paheath Are you able to test the changes in the MR to see if they work before we merge? If anything is missing, please comment on the MR or here and it will be fixed.

@akondur
Collaborator

akondur commented May 30, 2024

@paheath Please let us know if this solution works so we can merge it.

@paheath
Author

paheath commented Jun 3, 2024

Unfortunately I can't get this change to work. I'm seeing my clustermanager instance reporting Ready, but all the apps in the description report this:

appDeploymentInfo:
- appName: myapp.tgz
  appPackageTopFolder: ""
  deployStatus: 1
  isUpdate: false
  objectHash: <hash>
  phaseInfo:
    failCount: 3
    phase: download
    status: 199
  repoState: 1

The associated indexer cluster also never reconciles, and I don't see the apps in the pod under /opt/splunk/etc/apps or /opt/splunk/etc/manager-apps.

@akondur
Collaborator

akondur commented Jun 3, 2024

Hey @paheath, can you share any Splunk Operator pod logs that show errors?

The CR status code 199 indicates that the app package was not downloaded properly.

@paheath
Author

paheath commented Jun 4, 2024

The operator appears to run through this sequence periodically for the nodes that use the App Framework:

2024-06-04T00:47:27.481032478Z  INFO    updatePplnWorkerPhaseInfo   changing the status {"controller": "licensemanager", "controllerGroup": "enterprise.splunk.com", "controllerKind": "LicenseManager", "LicenseManager": {"name":"lm","namespace":"test"}, "namespace": "test", "name": "lm", "reconcileID": "56c9a258-e763-484e-9ffe-6888469133de", "appName": "app.tgz", "old status": "Download In Progress", "new status": "Download Pending"}
2024-06-04T00:47:27.657331829Z  INFO    downloadPhaseManager    Download worker got a run slot  {"controller": "licensemanager", "controllerGroup": "enterprise.splunk.com", "controllerKind": "LicenseManager", "LicenseManager": {"name":"lm","namespace":"test"}, "namespace": "test", "name": "lm", "reconcileID": "56c9a258-e763-484e-9ffe-6888469133de", "name": "lm", "namespace": "test", "App name": "app.tgz", "digest": "<digest>"} 
2024-06-04T00:47:27.663811632Z  INFO    isAppAlreadyDownloaded  App not present on operator pod {"controller": "licensemanager", "controllerGroup": "enterprise.splunk.com", "controllerKind": "LicenseManager", "LicenseManager": {"name":"lm","namespace":"test"}, "namespace": "test", "name": "lm", "reconcileID": "56c9a258-e763-484e-9ffe-6888469133de", "app name": "app.tgz"}
2024-06-04T00:47:27.663872366Z  INFO    updatePplnWorkerPhaseInfo   changing the status {"controller": "licensemanager", "controllerGroup": "enterprise.splunk.com", "controllerKind": "LicenseManager", "LicenseManager": {"name":"lm","namespace":"test"}, "namespace": "test", "name": "lm", "reconcileID": "56c9a258-e763-484e-9ffe-6888469133de", "appName": "app.tgz", "old status": "Download Pending", "new status": "Download In Progress"}
2024-06-04T00:47:27.664103782Z  INFO    GetRemoteStorageClient  Creating the client {"controller": "licensemanager", "controllerGroup": "enterprise.splunk.com", "controllerKind": "LicenseManager", "LicenseManager": {"name":"lm","namespace":"test"}, "namespace": "test", "name": "lm", "reconcileID": "56c9a258-e763-484e-9ffe-6888469133de", "name": "lm", "namespace": "test", "volume": "config-repo", "bucket": "<bucket>", "bucket path": "lic_manager/"}
2024-06-04T00:47:27.664283386Z  INFO    InitAWSClientSession    AWS Client Session initialization successful.   {"controller": "licensemanager", "controllerGroup": "enterprise.splunk.com", "controllerKind": "LicenseManager", "LicenseManager": {"name":"lm","namespace":"test"}, "namespace": "test", "name": "lm", "reconcileID": "56c9a258-e763-484e-9ffe-6888469133de", "region": "zone1", "TLS Version": "TLS 1.2"}
2024-06-04T00:47:27.820996027Z  ERROR   DownloadApp Unable to download item {"controller": "licensemanager", "controllerGroup": "enterprise.splunk.com", "controllerKind": "LicenseManager", "LicenseManager": {"name":"lm","namespace":"test"}, "namespace": "test", "name": "lm", "reconcileID": "56c9a258-e763-484e-9ffe-6888469133de", "remoteFile": "lic_manager/app.tgz", "localFile": "/opt/splunk/appframework/downloadedApps/test/LicenseManager/lm/local/lic_manager/app.tgz_<etag>", "etag": "<etag>", "RemoteFile": "lic_manager/app.tgz", "error": "stream error: stream ID 7; NO_ERROR; received from peer"}
github.com/splunk/splunk-operator/pkg/splunk/client.(*AWSS3Client).DownloadApp
    /workspace/pkg/splunk/client/awss3client.go:277
github.com/splunk/splunk-operator/pkg/splunk/enterprise.(*RemoteDataClientManager).DownloadApp
    /workspace/pkg/splunk/enterprise/util.go:842
github.com/splunk/splunk-operator/pkg/splunk/enterprise.(*PipelineWorker).download
    /workspace/pkg/splunk/enterprise/afwscheduler.go:497
2024-06-04T00:47:27.821131931Z  ERROR   PipelineWorker.Download()   unable to download app  {"controller": "licensemanager", "controllerGroup": "enterprise.splunk.com", "controllerKind": "LicenseManager", "LicenseManager": {"name":"lm","namespace":"test"}, "namespace": "test", "name": "lm", "reconcileID": "56c9a258-e763-484e-9ffe-6888469133de", "name": "lm", "namespace": "test", "App name": "app.tgz", "objectHash": "<digest>", "appName": "app.tgz", "error": "stream error: stream ID 7; NO_ERROR; received from peer"}
github.com/splunk/splunk-operator/pkg/splunk/enterprise.(*PipelineWorker).download
    /workspace/pkg/splunk/enterprise/afwscheduler.go:499

@paheath
Author

paheath commented Jun 5, 2024

This is the cluster manager App Framework spec I'm using. It is the same as before, with s3PathUrl: true set.

appRepo:
  appsRepoPollIntervalSeconds: {{ .Values.configPollInterval }}
  defaults:
    volumeName: {{ .Values.volumeName }}
  appSources:
  - name: node
    location: node/
    scope: local
  volumes:
  - name: {{ .Values.volumeName }}
    storageType: s3
    path: {{ .Values.bucketPath }}/
    provider: aws
    region: {{ .Values.bucketRegion }}
    s3PathUrl: true
    endpoint: {{ .Values.bucketEndpoint }}
    secretRef: {{ .Values.secretRef }}
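Custom resources reach the Kubernetes API server as JSON (kubectl converts YAML manifests before sending them), so one way to picture how the `s3PathUrl` key maps onto Go code is a JSON-tagged struct. `volumeSpec` and `parseVolume` below are hypothetical, trimmed-down stand-ins, not the operator's actual API types. The sketch shows why an opt-in boolean satisfies the request to keep the current behavior as the default: an omitted key decodes to `false`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// volumeSpec is a hypothetical, trimmed-down stand-in for an App Framework
// volume entry; only the fields needed for this illustration are included.
type volumeSpec struct {
	Name        string `json:"name"`
	StorageType string `json:"storageType"`
	Provider    string `json:"provider"`
	Endpoint    string `json:"endpoint"`
	// S3PathURL mirrors the s3PathUrl key in the spec above. A Go bool's zero
	// value is false, so omitting the key keeps virtual-hosted-style URLs.
	S3PathURL bool `json:"s3PathUrl,omitempty"`
}

// parseVolume decodes one volume entry from JSON.
func parseVolume(doc string) (volumeSpec, error) {
	var v volumeSpec
	err := json.Unmarshal([]byte(doc), &v)
	return v, err
}

func main() {
	withFlag, err := parseVolume(`{"name":"vol","storageType":"s3","provider":"aws","endpoint":"https://nas.example","s3PathUrl":true}`)
	if err != nil {
		panic(err)
	}
	without, err := parseVolume(`{"name":"vol","storageType":"s3","provider":"aws","endpoint":"https://nas.example"}`)
	if err != nil {
		panic(err)
	}
	fmt.Println("with s3PathUrl:", withFlag.S3PathURL, "| without:", without.S3PathURL)
}
```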

@akondur
Collaborator

akondur commented Jun 5, 2024

Hey @paheath, while we debug further: were you able to successfully install the new CRDs on the new cluster before deploying the clusterManager CR? Please let us know.

@paheath
Author

paheath commented Jun 5, 2024

Yes, I updated the CRDs beforehand. And the cluster manager accepted the s3PathUrl setting.

@paheath
Author

paheath commented Jun 5, 2024

Well, maybe it did not take. In the cluster manager spec s3PathUrl is set to true. But when I describe the cluster manager, I see status.Smartstore.Volumes.s3PathUrl is false. Was s3PathUrl added for smartstore also?

@paheath
Author

paheath commented Jun 5, 2024

Disregard, I see status.AppContext.AppRepo.AppSources.Volumes.s3PathUrl is set to true as expected. I didn't catch that the false setting was in the smartstore status section.

@akondur
Collaborator

akondur commented Jun 6, 2024

Thank you @paheath. I believe the difference is that we set pathStyleUrl by updating the AWS S3 client after it has been created, just before creating the downloader, whereas your successful example here sets it when the client is created. Some posts online recommend against updating a client once it has been created, so I will rework the changes to set this option at creation time.
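The difference between setting an option at construction time and updating an existing client can be shown with a toy model. In aws-sdk-go the construction-time route is to include `S3ForcePathStyle: aws.Bool(true)` in the `aws.Config` that the session or client is created from, as in the original request. The Go sketch below does not reproduce the SDK's internals: `clientConfig`, `storeClient`, and `newStoreClient` are hypothetical, and the client deliberately fixes its URL-building behavior in its constructor, so a flag flipped afterward is recorded but never used.

```go
package main

import (
	"fmt"
	"strings"
)

// clientConfig is a hypothetical stand-in for the options an S3 client is built from.
type clientConfig struct {
	Endpoint  string // e.g. "https://nas.example"
	PathStyle bool   // analogous to S3ForcePathStyle
}

// storeClient is a hypothetical client that decides how to build object
// URLs once, in its constructor.
type storeClient struct {
	cfg      clientConfig                     // copy of the options, kept for inspection
	buildURL func(bucket, key string) string // fixed at construction time
}

func newStoreClient(cfg clientConfig) *storeClient {
	host := strings.TrimPrefix(cfg.Endpoint, "https://")
	c := &storeClient{cfg: cfg}
	if cfg.PathStyle {
		// Path style: bucket in the path.
		c.buildURL = func(bucket, key string) string { return "https://" + host + "/" + bucket + "/" + key }
	} else {
		// Virtual-hosted style: bucket in the host name.
		c.buildURL = func(bucket, key string) string { return "https://" + bucket + "." + host + "/" + key }
	}
	return c
}

func main() {
	// Updating after creation: the stored flag changes, buildURL does not.
	late := newStoreClient(clientConfig{Endpoint: "https://nas.example"})
	late.cfg.PathStyle = true
	fmt.Println("updated after creation:", late.buildURL("apps", "node/app.tgz"))

	// Setting the option at creation time takes effect.
	early := newStoreClient(clientConfig{Endpoint: "https://nas.example", PathStyle: true})
	fmt.Println("set at creation:", early.buildURL("apps", "node/app.tgz"))
}
```

This is one plausible way an after-the-fact update can be silently ignored; the thread does not establish whether the SDK behaves this way for `S3ForcePathStyle`.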

@akondur
Collaborator

akondur commented Jun 6, 2024

@paheath Are you able to try it out with the latest changes?

@akondur
Collaborator

akondur commented Jun 10, 2024

@paheath Please let us know if the latest changes are working.

@paheath
Author

paheath commented Jun 10, 2024

Forgive me, my bandwidth is limited at the moment. I will do my best to get to this today.

@paheath
Author

paheath commented Jun 11, 2024

With the latest patch I'm seeing the same "unable to download item" error logs as before. The general behavior is also the same, blocking indexer cluster creation.

@akondur
Collaborator

akondur commented Jun 11, 2024

Hi @paheath, thank you for testing. Could you provide Splunk Operator pod logs similar to this:

2024-06-06T01:03:17.019639356Z  INFO    InitAWSClientSession    Setting up AWS SDK client       {"controller": "standalone", "controllerGroup": "enterprise.splunk.com", "controllerKind": "Standalone", "Standalone": {"name":"example","namespace":"splunk-operator"}, "namespace": "splunk-operator", "name": "ido", "reconcileID": "4c684039-fe1b-4bea-b550-ce618f2ef57e", "regionWithEndpoint": "us-west-2|https://s3-us-west-2.amazonaws.com", "pathStyleUrl": true}

The changes in the MR were made with this issue's description in mind; the changes are here.

@paheath
Author

paheath commented Jun 14, 2024

I see similar logs for all nodes using the App Framework (standalone, licensemanager, clustermanager):

2024-06-14T21:32:41.612345298Z  INFO    InitAWSClientSession    Setting up AWS SDK client       {"controller": "licensemanager", "controllerGroup": "enterprise.splunk.com", "controllerKind": "LicenseManager", "LicenseManager": {"name":"lm","namespace":"test"}, "namespace": "test", "name": "lm", "reconcileID": "<id>", "regionWithEndpoint": "zone1|https://<endpoint-fqdn>", "pathStyleUrl": true}
2024-06-14T21:32:41.61252801Z   INFO    InitAWSClientSession    AWS Client Session initialization successful.   {"controller": "licensemanager", "controllerGroup": "enterprise.splunk.com", "controllerKind": "LicenseManager", "LicenseManager": {"name":"lm","namespace":"test"}, "namespace": "test", "name": "lm", "reconcileID": "<id>", "region": "<region>", "TLS Version": "TLS 1.2"}

@akondur
Collaborator

akondur commented Jun 25, 2024

Hey @paheath, in the MR the S3ForcePathStyle field of aws.Config is set here, per your original request. Did you make any other changes to get this working? If not, could you open a customer JIRA with Splunk Support so we can debug the issue further?

@paheath
Author

paheath commented Jun 27, 2024

I've been able to test this more thoroughly today. Previously I only had to add that one line to make this work, but I was testing on top of 2.4.0. Today I was able to reproduce the working behavior on top of 2.4.0, but cherry-picking the one-line change onto 2.5.2 did not work. Can you think of anything that changed between 2.4.0 and 2.5.2 that would affect the behavior of the AWS S3 client? I compared the two releases but couldn't see anything obvious. I assume whatever breaks this in 2.5.2 is also breaking your PR.

@akondur
Collaborator

akondur commented Aug 1, 2024

Hi @paheath, after comparing 2.4.0 and 2.5.2, I couldn't see any major differences that would cause the AWS SDK client to behave differently.

We just released 2.6.0. The MR has been rebased. Could you please try with the new version?

@akondur
Collaborator

akondur commented Aug 8, 2024

Hey @paheath, did you get a chance to try 2.6.0? If it's not working, can you please open a Splunk support case with these details?

@akondur
Collaborator

akondur commented Aug 19, 2024

Closing the issue for now. Please re-open a Splunk support ticket if the issue persists.

@akondur akondur closed this as completed Aug 19, 2024
Projects
None yet
Development

No branches or pull requests

8 participants