Implement `ProcessCdnLog` and `ProcessCdnLogQueue` background jobs #8036
Conversation
Codecov Report

Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##             main    #8036      +/-   ##
==========================================
- Coverage   87.55%   87.38%   -0.17%
==========================================
  Files         265      270       +5
  Lines       25663    26172     +509
==========================================
+ Hits        22468    22871     +403
- Misses       3195     3301     +106
```

☔ View full report in Codecov by Sentry.
Force-pushed from 58deb25 to 0aaebfa
Since Codecov was unhappy I've now doubled the size of the diff and added a couple of tests... 😅
Looking forward to having this "in production" so we can check it against the existing logic!
debug!("Deleting message {message_id} from the CDN log queue…"); | ||
queue | ||
.delete_message(receipt_handle) |
Maybe a question for infra, but this queue is configured to only deliver messages exactly once, right?

(What I'm concerned about here is double counting if this job runs twice, and the second `receive_messages()` call occurs before the messages handled by the first job have been deleted.)
> this queue is configured to only deliver messages exactly once, right?

From what I've seen so far that does not appear to be the case. I assume if that were the case then the delivered messages would be auto-deleted upon reception, right?
I will treat this as a non-blocking concern since the code here is experimental and read-only for now anyway. If we can change the queue to use once-only delivery then we can probably just delete the `delete_message()` calls here.
> Maybe a question for infra, but this queue is configured to only deliver messages exactly once, right?

This is a standard queue that only guarantees at-least-once delivery. If we want exactly-once delivery, we'd need to recreate the queue as a FIFO queue.
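For context, here is a rough sketch of the receive/process/delete pattern being discussed, written against the raw `aws-sdk-sqs` client rather than the PR's own queue wrapper; the function name and structure are illustrative assumptions, not the PR's code:

```rust
use aws_sdk_sqs::Client;

// Illustrative sketch only: the PR uses its own queue abstraction rather
// than the raw client, and this function name is an assumption.
//
// With a standard (non-FIFO) queue, SQS only guarantees at-least-once
// delivery: a message that is not deleted before its visibility timeout
// expires can be delivered again, so the same log file could be counted twice.
async fn drain_cdn_log_queue(client: &Client, queue_url: &str) -> anyhow::Result<()> {
    let response = client
        .receive_message()
        .queue_url(queue_url)
        .max_number_of_messages(10)
        .send()
        .await?;

    for message in response.messages.unwrap_or_default() {
        // ...parse the message body and enqueue a `ProcessCdnLog` job here...

        // Delete the message only after it has been handled, so a crash
        // before this point results in a retry instead of a lost message.
        if let Some(receipt_handle) = message.receipt_handle() {
            client
                .delete_message()
                .queue_url(queue_url)
                .receipt_handle(receipt_handle)
                .send()
                .await?;
        }
    }

    Ok(())
}
```

Deleting only after successful processing trades the risk of losing a message for the risk of occasionally handling it twice, which is why idempotent handling (or a FIFO queue with deduplication) becomes relevant once the counts are actually written to the database.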
This allows us to access the `config` at runtime too, instead of only having access at startup.
Force-pushed from 2cb6c5c to 64b0a7f
This allows us to use in-memory storage for testing purposes and fall back to the local file system for development.
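A minimal sketch of what such a storage fallback could look like with the `object_store` crate (the helper names, the selection logic, and the `aws` feature flag are assumptions, not the PR's actual wiring):

```rust
use std::sync::Arc;

use object_store::{
    aws::AmazonS3Builder, local::LocalFileSystem, memory::InMemory, ObjectStore,
};

// Hypothetical selection helper: pick an `ObjectStore` backend depending on
// how the application is configured.
fn build_log_store(s3: Option<(String, String)>) -> object_store::Result<Arc<dyn ObjectStore>> {
    Ok(match s3 {
        // Production: read CDN logs from the configured S3 bucket.
        Some((region, bucket)) => Arc::new(
            AmazonS3Builder::from_env()
                .with_region(region)
                .with_bucket_name(bucket)
                .build()?,
        ),
        // Development: fall back to the local file system.
        None => Arc::new(LocalFileSystem::new_with_prefix("./local_cdn_logs")?),
    })
}

// Tests can construct an in-memory store directly and skip the file system.
fn build_test_store() -> Arc<dyn ObjectStore> {
    Arc::new(InMemory::new())
}
```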
Force-pushed from 64b0a7f to c27b101
This PR builds on top of #8010, and implements two new background worker jobs.

The `ProcessCdnLog` job takes an AWS region, an S3 bucket name, and an object path inside the bucket, connects to the bucket via the `object_store` crate, and then stream-downloads the file while running `count_downloads()` on it. If download counting is successful, it prints a summary of the result to the logs.

The `ProcessCdnLogQueue` job connects to an AWS SQS queue, receives messages from it, parses them, and enqueues `ProcessCdnLog` jobs based on the content of these messages.

You may be asking: "Where are the counted downloads saved to the database?!" and the answer is: nowhere. The plan is to run this read-only setup in production for a couple of days to ensure that the download counting works correctly with production data. Once we have verified that everything works as expected, we will adjust the `ProcessCdnLog` job to save the result to the database instead of, or in addition to, reporting it to the logs.

Best reviewed commit by commit 😉
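To make the streaming flow concrete, here is a rough sketch of what the `ProcessCdnLog` job described above might look like. It is not the PR's actual implementation: the function name and the use of `futures-util`/`tokio-util` helpers are assumptions, and `count_downloads()` from #8010 is replaced by a plain line count because its real signature isn't shown here.

```rust
use futures_util::TryStreamExt;
use object_store::{aws::AmazonS3Builder, path::Path, ObjectStore};
use tokio::io::AsyncBufReadExt;

// Rough, hypothetical sketch of the streaming flow described above.
async fn process_cdn_log(region: &str, bucket: &str, path: &str) -> anyhow::Result<()> {
    let store = AmazonS3Builder::from_env()
        .with_region(region)
        .with_bucket_name(bucket)
        .build()?;

    // Stream-download the log file instead of buffering it fully in memory.
    let byte_stream = store
        .get(&Path::from(path))
        .await?
        .into_stream()
        .map_err(|err| std::io::Error::new(std::io::ErrorKind::Other, err));
    let reader = tokio_util::io::StreamReader::new(byte_stream);

    // Stand-in for `count_downloads()`: just count the lines we streamed.
    let mut lines = tokio::io::BufReader::new(reader).lines();
    let mut count = 0u64;
    while lines.next_line().await?.is_some() {
        count += 1;
    }

    // The real job would print a summary of the counted downloads instead.
    println!("{bucket}/{path}: processed {count} log lines");
    Ok(())
}
```

Streaming keeps memory usage flat even for large CDN log files, which is presumably why the job reads via `object_store` rather than downloading the whole object up front.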