Summary of errors in logs that are not yet monitored

Here is a summary of `error=""` entries in our logs that we may want to monitor more closely in our metrics. We may want to follow Heroku's approach and assign error codes to these cases. We should also ensure that all of them carry an `at=error ` prefix so they can be easily ingested from the logs.

* `error="canceling statement due to statement timeout"`
* `error="unhealthy database pool"`
* `error="there is no unique or exclusion constraint matching the ON CONFLICT specification"`
* `error="end of file reached"` (on the crate publish endpoint)
* `downloads_counter error: unhealthy database pool`
  * Only 1 occurrence in the last month. I'm opening a PR to log this as `at=error mod=downloads_counter error="unhealthy database pool"`.
* `Error: error sending request for url (https://events.pagerduty.com/generic/2010-04-15/create_event.json): operation timed out`
  * A few occurrences in the last month. (When nothing is wrong we send a "resolved" update, and that request failed a few times.)
* `error="failed to upload crate: error sending request for url (https://crates-io.s3-us-west-1.amazonaws.com/crates/xyz/xyz-0.2.0.crate): connection closed before message completed"`
* From `crate_owner_invitations`: `error="missing user {private.inviter_id}"` and `error="missing crate with id {invitation.crate_id}"`. These don't look like internal (status=500) errors and should perhaps be user-facing instead.
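As a minimal sketch of the code-assignment idea, a helper could guarantee that every error line starts with `at=error ` and carries a code. The `ErrorCode` enum, the `E01` through `E04` values and the `error_line` helper are all hypothetical, not existing crates.io code:

```rust
use std::fmt;

/// Hypothetical error codes, loosely modeled on Heroku's H-codes.
/// The names and values are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ErrorCode {
    StatementTimeout,
    UnhealthyDbPool,
    OnConflictMismatch,
    UploadFailed,
}

impl ErrorCode {
    fn as_str(self) -> &'static str {
        match self {
            ErrorCode::StatementTimeout => "E01",
            ErrorCode::UnhealthyDbPool => "E02",
            ErrorCode::OnConflictMismatch => "E03",
            ErrorCode::UploadFailed => "E04",
        }
    }
}

/// Escape backslashes and double quotes, then wrap the value in quotes
/// so it stays a single logfmt field.
fn quote(value: &str) -> String {
    let escaped = value.replace('\\', "\\\\").replace('"', "\\\"");
    format!("\"{}\"", escaped)
}

/// Build a log line that always starts with `at=error ` and includes a code.
fn error_line(code: ErrorCode, module: &str, error: impl fmt::Display) -> String {
    format!(
        "at=error code={} mod={} error={}",
        code.as_str(),
        module,
        quote(&error.to_string())
    )
}

fn main() {
    let line = error_line(
        ErrorCode::UnhealthyDbPool,
        "downloads_counter",
        "unhealthy database pool",
    );
    println!("{}", line);
}
```

Quoting and escaping the message keeps entries such as the `ON CONFLICT` error parseable as a single field by logfmt-style ingestion.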

Additionally, we may want to add an `at=warn ` prefix that could be used to flag slow requests and other operationally interesting events that aren't strictly errors.
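A rough sketch of the `at=warn` idea for slow requests follows. The one-second threshold, the `request_log_line` helper and the Heroku-router-style `service=` field are assumptions for illustration; the issue does not define a threshold or field names:

```rust
use std::time::Duration;

/// Hypothetical threshold; the issue does not specify one.
const SLOW_REQUEST_THRESHOLD: Duration = Duration::from_millis(1000);

/// Emit `at=warn` for requests at or above the threshold, `at=info` otherwise.
fn request_log_line(method: &str, path: &str, elapsed: Duration) -> String {
    let at = if elapsed >= SLOW_REQUEST_THRESHOLD { "warn" } else { "info" };
    format!(
        "at={} method={} path={} service={}ms",
        at,
        method,
        path,
        elapsed.as_millis()
    )
}

fn main() {
    println!("{}", request_log_line("GET", "/example", Duration::from_millis(42)));
    println!("{}", request_log_line("PUT", "/example", Duration::from_millis(2500)));
}
```

Because `at=warn` lines use the same shape as `at=error` lines, the same ingestion rule could count both by prefix.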
