[processor/transform] Add support for flat configuration style #37444

Open
wants to merge 1 commit into
base: main

edmocosta
Contributor

Description

This PR is part of #29017 and was split from #36888. It adds support for the flat configuration style to the transformprocessor.

Change log:

  • It now supports an additional configuration style, in which statements are expressed as a plain list of strings. In this style every path must include its context prefix, and the context of each statement is inferred from those paths by the context inferrer and ottl.ParserCollection. For example:
    log_statements:
     - set(log.body, "bear") where log.attributes["http.path"] == "/animal"
     - set(resource.attributes["name"], "bear")
  • Mixed configuration styles are supported.
  • The context's cache values are shared only among flat statements.
  • Structured configuration cache values remain isolated: a cache written by a structured configuration group is available only to that group's statements, and is not shared with flat statements or with other structured configuration groups. For example:
    log_statements:
    - set(resource.cache["flat"], "value")

    - statements:
      - set(resource.cache["name"], "bear")
      - set(resource.attributes["name"], resource.cache["name"]) # OK
      - set(resource.attributes["name"], resource.cache["flat"]) # Fail (not set by this group of statements)

    - set(resource.attributes["name"], resource.cache["name"]) # Fail (not set by a flat statement)
    - set(resource.attributes["flat"], resource.cache["flat"]) # OK

    - statements:
      - set(resource.attributes["name"], resource.cache["name"]) # Fail (set by another group)
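
The cache visibility rules in the example above can be modelled with plain maps. This is only an illustrative sketch: the `group` and `caches` types and the `forGroup` method are hypothetical names, not the transformprocessor's actual types, and the real caches are not simple string maps.

```go
package main

import "fmt"

// group is a hypothetical stand-in for one block of statements.
type group struct {
	name   string
	shared bool // true for flat statements, false for structured groups
}

// caches holds one shared cache plus one private cache per structured group.
type caches struct {
	shared  map[string]string
	private map[string]map[string]string
}

func newCaches() *caches {
	return &caches{
		shared:  map[string]string{},
		private: map[string]map[string]string{},
	}
}

// forGroup returns the cache that the given group reads from and writes to.
func (c *caches) forGroup(g group) map[string]string {
	if g.shared {
		return c.shared
	}
	if c.private[g.name] == nil {
		c.private[g.name] = map[string]string{}
	}
	return c.private[g.name]
}

func main() {
	c := newCaches()
	flat := group{name: "flat", shared: true}
	first := group{name: "structured-1"}
	second := group{name: "structured-2"}

	c.forGroup(flat)["flat"] = "value" // written by a flat statement
	c.forGroup(first)["name"] = "bear" // written by the first structured group

	_, ok := c.forGroup(first)["flat"]
	fmt.Println("structured-1 sees flat:", ok) // false
	_, ok = c.forGroup(flat)["name"]
	fmt.Println("flat sees name:", ok) // false
	_, ok = c.forGroup(flat)["flat"]
	fmt.Println("flat sees flat:", ok) // true
	_, ok = c.forGroup(second)["name"]
	fmt.Println("structured-2 sees name:", ok) // false
}
```

The four printed results line up with the `# OK` and `# Fail` comments in the configuration example.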

Link to tracking issue

#29017

Testing

Unit tests

// Although it's configurable via `mapstructure`, users won't be able to set it on their
// configurations, as it's currently meant for internal use only, and it's validated by
// the transformprocessor Config unmarshaller function.
SharedCache bool `mapstructure:"shared_cache"`
Contributor Author

It's currently set programmatically, and users cannot set it in their configurations (https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/37444/files#diff-1e527186a992bb04852a9e8cd6fe43ef611d0e071360c4e40a1432a30efc1d38R89).

That's a conservative approach to keep the behavior the same, but there's no technical reason not to allow it.
If you folks also think it might be useful, we could make this setting available, so users would be able to control which statement groups use the shared cache.

Member

Let's be opinionated and hide it for now. Config support can be added later.

Member

Can we unexport it and/or remove the mapstructure tags? That would mean the unmarshal function doesn't have to worry about users trying to set it.

Contributor Author

TBH, I couldn't find a clean solution for this field, so I ended up with this approach, considering there's a possibility of making this setting available to users in the future.

Given we're still relying on mapstructure to unmarshal the configuration, unexporting this field would require both a custom unmarshalling function for common.ContextStatements to set the field value and some mechanism to pass this information down from the transformprocessor.Config Unmarshal function (which is the one that knows its value). Unexported fields are ignored by mapstructure, as it's not possible to set their values using reflection.

I was able to unexport it and make it work by passing the extra shared_cache key here (as it's currently doing) and an extra confmap.WithIgnoreUnused() option here (otherwise mapstructure returns an error). With that key in the conf map, we just need to read it and set the field value in the common.ContextStatements unmarshaller function. The problem with this approach is that invalid keys are no longer validated, and we would need to validate them manually, which IMO is not ideal.

Finally, another option would be removing the mapstructure tag and keeping the field exported, so we wouldn't need to worry about users trying to set it in their configurations. To set it internally, we would need to use reflection, as I initially implemented in the draft (see 498f9b1).

Do you have any thoughts or ideas on how to work around it?

// object, with empty [common.ContextStatements.Context] value.
// On the other hand, structured configurations are parsed following the mapstructure Config format.
// Mixed configuration styles are also supported.
func (c *Config) Unmarshal(component *confmap.Conf) error {
Member

We typically call this conf.

Suggested change
- func (c *Config) Unmarshal(component *confmap.Conf) error {
+ func (c *Config) Unmarshal(conf *confmap.Conf) error {

@@ -44,6 +48,63 @@ type Config struct {
logger *zap.Logger
}

// Unmarshal is used internally by mapstructure to parse the transformprocessor configuration (Config),
Member

Can you update the godoc comment with examples of both supported config styles?
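
To make the two styles concrete, here is a rough sketch of how a mixed list of raw entries could be normalized: strings are treated as flat statements and maps as structured groups. The `normalize` function is hypothetical. Grouping consecutive flat statements into one entry marked with `shared_cache` is an assumption made for illustration, not something this conversation confirms about the PR.

```go
package main

import "fmt"

// normalize converts a mixed list of raw config entries into structured
// entries only. Consecutive flat statements (strings) are collected into a
// single entry with no context and a shared_cache marker; structured
// entries (maps) are passed through unchanged.
func normalize(entries []any) ([]map[string]any, error) {
	var out []map[string]any
	var flat []any // pending run of flat statements

	flush := func() {
		if len(flat) > 0 {
			out = append(out, map[string]any{
				"statements":   flat,
				"shared_cache": true,
			})
			flat = nil
		}
	}

	for i, entry := range entries {
		switch v := entry.(type) {
		case string:
			flat = append(flat, v)
		case map[string]any:
			flush()
			out = append(out, v)
		default:
			return nil, fmt.Errorf("entry %d: unsupported type %T", i, entry)
		}
	}
	flush()
	return out, nil
}

func main() {
	entries := []any{
		`set(log.body, "bear") where log.attributes["http.path"] == "/animal"`,
		`set(resource.attributes["name"], "bear")`,
		map[string]any{
			"context":    "log",
			"statements": []any{`set(body, "bear")`},
		},
	}
	groups, err := normalize(entries)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(groups)) // 2: one flat group, one structured group
}
```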

Comment on lines +86 to +91
if ok {
	_, hasShareCacheKey := configuredKeys["shared_cache"]
	if hasShareCacheKey {
		return fmt.Errorf("%s[%d] has invalid keys: %s", fieldName, i, "shared_cache")
	}
}
Member

What is this check protecting against? Someone doing

trace_statements:
  - name: span
    shared_cache: true
    statements: 
        - set(name, "test")

Contributor Author

Yes, exactly, although it wouldn't prevent API consumers from setting or changing it.
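
As a standalone illustration of the check being discussed, the sketch below rejects any structured entry in which the `shared_cache` key is set. The `validateNoSharedCache` function and its input shape are hypothetical. Only the error format string is taken from the snippet under review.

```go
package main

import "fmt"

// validateNoSharedCache returns an error if any structured entry sets the
// internal-only shared_cache key. Flat entries (plain strings) carry no
// keys and are skipped.
func validateNoSharedCache(fieldName string, entries []any) error {
	for i, entry := range entries {
		configuredKeys, ok := entry.(map[string]any)
		if !ok {
			continue
		}
		if _, has := configuredKeys["shared_cache"]; has {
			return fmt.Errorf("%s[%d] has invalid keys: %s", fieldName, i, "shared_cache")
		}
	}
	return nil
}

func main() {
	entries := []any{
		`set(span.name, "test")`,
		map[string]any{
			"shared_cache": true,
			"statements":   []any{`set(name, "test")`},
		},
	}
	fmt.Println(validateNoSharedCache("trace_statements", entries))
	// trace_statements[1] has invalid keys: shared_cache
}
```

As noted in the reply, a check like this only covers values arriving through configuration; code that constructs the struct directly can still set the field.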

Comment on lines +14 to +16
if !sharedCache {
	return nil
}
Member

Instead of this function including a parameter that makes it a no-op, can the callers decide whether to call this function?
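
The refactoring suggested here might look roughly like the following. What the reviewed function actually loads or returns is not visible in this excerpt, so the map-backed `cache` type and both function names are assumptions made purely for illustration.

```go
package main

import "fmt"

// cache is a hypothetical stand-in for a per-context cache.
type cache map[string]any

// Before: the boolean parameter turns the whole function into a no-op.
func loadCacheWithFlag(store map[string]cache, id string, sharedCache bool) cache {
	if !sharedCache {
		return nil
	}
	return loadCache(store, id)
}

// After: the function always does its job, creating the cache on first use.
func loadCache(store map[string]cache, id string) cache {
	c, ok := store[id]
	if !ok {
		c = cache{}
		store[id] = c
	}
	return c
}

func main() {
	store := map[string]cache{}
	sharedCache := true // the caller already knows this value

	// The caller decides whether the shared cache is needed at all.
	var c cache
	if sharedCache {
		c = loadCache(store, "resource-1")
	}
	fmt.Println(c != nil, len(store)) // true 1
}
```

Moving the condition to the call site keeps `loadCache` single-purpose and makes the skipped case visible where the decision is made.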
