Skip to content

Commit

Permalink
feedback: more details on severity, language-specific, replacement fo…
Browse files Browse the repository at this point in the history
…r span events
  • Loading branch information
lmolkova committed Jan 14, 2025
1 parent adfc60b commit 0bc7c56
Showing 1 changed file with 85 additions and 31 deletions.
116 changes: 85 additions & 31 deletions oteps/4333-recording-exceptions-on-logs.md
Original file line number Diff line number Diff line change
Expand Up @@ -50,70 +50,124 @@ starting point, but they are encouraged to adjust it to their needs.

This guidance boils down to the following:

Instrumentations should record exception information (along with other context) as a log record and
use appropriate severity - only unhandled exceptions should be recorded as `Error` or higher.
Instrumentations should strive to report each exception once.
Instrumentations SHOULD record exception information (along with other context) as a log record with appropriate severity.
Only unhandled exceptions SHOULD be recorded as `Error` or higher. Instrumentations SHOULD do the best effort to report
each exception once.

Instrumentations should provide the whole exception instance to the OTel SDK (instead of individual attributes)
and the SDK should, based on configuration, decide which information to record. The default SDK
behavior should be to record exception stack traces when logging exceptions at `Error` or higher severity.
Instrumentations SHOULD provide the whole exception instance to the OTel SDK so it can
record it fully or partially based on provided configuration. The default SDK behavior SHOULD
be to record exception stack traces when logging exceptions at `Error` or higher severity.

In the long term, exceptions recorded on logs will replace span events (according to [Event vision OTEP](./0265-event-vision.md)).

### Details

1. Exceptions should be recorded as [logs](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/exceptions/exceptions-logs.md)
1. Exceptions SHOULD be recorded as [logs](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/exceptions/exceptions-logs.md)
or [log-based events](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/general/events.md)

2. Instrumentations for incoming requests, message processing, background job execution, or others that wrap user code and usually create local root spans, should record logs
for unhandled exceptions with `Error` severity.
2. Instrumentations for incoming requests, message processing, background job execution, or others that wrap user code and usually
create local root spans, SHOULD record logs for unhandled exceptions with `Error` severity.

Some runtimes provide global exception handler that can be used to log exceptions.
Priority should be given to the instrumentation point where the operation context is available.
Language SIGs are encouraged to give runtime-specific guidance. For example, here is the
[.NET guidance](https://github.com/open-telemetry/opentelemetry-dotnet/blob/610045298873397e55e0df6cd777d4901ace1f63/docs/trace/reporting-exceptions/README.md#unhandled-exception)
for recording exceptions on traces.

3. Natively instrumented libraries should record a log describing an exception and the context it happened in
3. Natively instrumented libraries SHOULD record a log describing an exception and the context it happened in
as soon as the exception is detected (or where the most context is available).

4. It's not recommended to record the same exception as it propagates through the stack frames, or
4. It's NOT RECOMMENDED to record the same exception as it propagates through the stack frames, or
to attach the same instance of an exception to multiple log records.

5. An error should be logged with appropriate severity depending on the available context.
5. An exception (or error) SHOULD be logged with appropriate severity depending on the available context.

- Exceptions or errors that don't indicate actual issues SHOULD be recorded with
severity not higher than `Info`.

Such exceptions can be used to control application logic and have a minor impact, if any,
on application functionality, availability, or performance.

Examples:

- exception is thrown when checking optional dependency or resource existence.
- exception thrown when client disconnects before reading full response from the server

- Exceptions or errors that are expected to be retried or handled by the caller or another
layer of the component SHOULD be recorded with severity not higher than `Warning`.

Such exceptions represent transient failures that are common and expected in
distributed applications. They typically increase the latency of individual
operations and have a minor impact on overall application availability.

Examples:

- attempt to connect to the required remote dependency times out
- remote dependency returns 401 "Unauthorized" response code
- writing data to a file results in IO exception
- remote dependency returned 503 "Service Unavailable" response for 5 times in a row,
retry attempts are exhausted and the corresponding operation has failed.

- Errors that don't indicate any issue should be recorded with severity not higher than `Info`.
- Transient errors (even if it's the last try) should be recorded with severity not higher than `Warning`.
- Unhandled exceptions that don't result in application shutdown should be recorded with severity `Error`
- Errors that result in application shutdown should be recorded with severity `Fatal`.
- Unhandled (by the user code) exceptions that don't result in application shutdown SHOULD
be recorded with severity `Error`

6. When recording exception on logs, user applications and instrumentations are encouraged to add additional attributes
These exceptions are not expected and may indicate a bug in the application logic
that this application instance was not able to recover from or a gap in the error
handling logic.

Examples:

- Background job terminates with an exception
- HTTP framework error handler catches exception thrown by the user code.

Note: some frameworks use exceptions as a communication mechanism when request fails. For example,
Spring users can throw [ResponseStatusException](https://docs.spring.io/spring-framework/docs/current/javadoc-api/org/springframework/web/server/ResponseStatusException.html)
exception to return unsuccessful status code. Such exceptions represent errors already handled by the application code.
Application code, in this case, is expected to logs error at appropriate severity and
general-purpose instrumentation SHOULD NOT record such exceptions.

- Exceptions or errors that result in application shutdown SHOULD be recorded with severity `Fatal`.

- The application detects an invalid configuration at startup and shuts down.
- The application encounters a (presumably) terminal error, such as an out-of-memory condition.

1. When recording exception on logs, user applications and instrumentations are encouraged to add additional attributes
to describe the context that the exception was thrown in.
They are also encouraged to define their own error events and enrich them with exception details.

7. OTel SDK should record stack traces on exceptions with severity `Error` or higher and should allow users to
2. OTel SDK SHOULD record stack traces on exceptions with severity `Error` or higher and SHOULD allow users to
change the threshold.

See [logback exception config](https://logback.qos.ch/manual/layouts.html#ex) for an example of configuration that
records stack trace conditionally.

3. Instrumentation libraries that record exceptions using span events SHOULD gracefully migrate
to log-based exceptions offering it as an opt-in feature first and then switching to log-based exceptions
in the next major version update.

## API changes

> [!IMPORTANT]
> [!NOTE]
>
> It should not be an instrumentation concern to decide whether exception stack trace
> should be recorded or not.
>
> A natively instrumented library may write logs providing exception instance
> through a log bridge and not be aware of this guidance.
>
> OTel should provide API like `setException` when creating log record that will record only necessary information depending
> on the configuration and log severity.
> It also may be desirable by some vendors/apps to record all exception details at all levels.
It should not be an instrumentation library concern to decide whether exception stack trace should be recorded or not.
An instrumentation library may write logs providing an exception instance through a log bridge and not be aware of this guidance.
OTel Logs API SHOULD provide methods that enrich log record with exception details such as
`setException(exception)` and similar to [RecordException](../specification/trace/api.md#L682) method on span.

It also may be desirable by some vendors/apps to record all exception details at all levels.
OTel SDK, based on the log severity and configuration, SHOULD record exception details fully or partially.

OTel Logs API should provide methods that enrich log record with exception details such as
`setException(exception)` similar to [RecordException](../specification/trace/api.md?plain=1#L682)
method on span.
The signature of the method is to be determined by each language
and can be overloaded as appropriate including ability to collect and customize stack trace
collection.

OTel SDK should implement such methods and set exception attributes based on configuration
and in the optimal way. This would also provide a more scalable way to evolve this guidance
and extend configuration options if necessary.
It MUST be possible to efficiently set exception information on a log record without
using the `setException` method.

## Examples

Expand Down Expand Up @@ -280,7 +334,7 @@ See [corresponding Java (tracing) instrumentation](https://github.com/open-telem

## Trade-offs and mitigations

1. Breaking change for any component following existing [exception guidance](https://github.com/open-telemetry/opentelemetry-specification/blob/a265ae0628177be25dc477ea8fe200f1c825b871/specification/trace/exceptions.md) which recommends recording exceptions as span events in every instrumentation that detects them.
1. Breaking change for any component following existing [exception guidance](/specification/trace/exceptions.md) which recommends recording exceptions as span events in every instrumentation that detects them.

**Mitigation:**
- OpenTelemetry API and/or SDK in the future may provide opt-in span events -> log-based events conversion,
Expand Down

0 comments on commit 0bc7c56

Please sign in to comment.