Implement ProcessCdnLog and ProcessCdnLogQueue background jobs #8036


Merged
10 commits merged into rust-lang:main on Feb 1, 2024

Conversation

Turbo87
Member

@Turbo87 Turbo87 commented Jan 31, 2024

This PR builds on top of #8010, and implements two new background worker jobs.

The ProcessCdnLog job takes an AWS region, an S3 bucket name, and an object path inside the bucket, connects to the bucket via the object_store crate, and then stream-downloads the file while running count_downloads() on it. If download counting is successful, it prints a summary of the result to the logs.
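To make the counting step concrete, here is a hypothetical, heavily simplified stand-in for the count_downloads() logic described above: it scans CDN log lines for `.crate` download paths and tallies them per (crate, version) pair. The real implementation stream-downloads the S3 object via the object_store crate and parses the actual CDN log format; the path layout and helper below are illustrative assumptions, not the PR's code.

```rust
use std::collections::HashMap;
use std::io::{BufRead, BufReader, Cursor};

/// Simplified sketch of download counting: tally `.crate` downloads
/// per (crate name, version) from a stream of log lines.
fn count_downloads(reader: impl BufRead) -> HashMap<(String, String), u64> {
    let mut counts = HashMap::new();
    for line in reader.lines().map_while(Result::ok) {
        // Assume paths like `/crates/serde/serde-1.0.195.crate` appear
        // somewhere in the line (the real log format is more involved).
        let Some(start) = line.find("/crates/") else { continue };
        let path = &line[start + "/crates/".len()..];
        let Some((name, rest)) = path.split_once('/') else { continue };
        let Some(file) = rest.split_whitespace().next() else { continue };
        let Some(stem) = file.strip_suffix(".crate") else { continue };
        let Some(version) = stem
            .strip_prefix(name)
            .and_then(|v| v.strip_prefix('-'))
        else {
            continue;
        };
        *counts
            .entry((name.to_string(), version.to_string()))
            .or_insert(0) += 1;
    }
    counts
}

fn main() {
    let log = "\
2024-01-31 GET /crates/serde/serde-1.0.195.crate 200
2024-01-31 GET /crates/serde/serde-1.0.195.crate 200
2024-01-31 GET /crates/rand/rand-0.8.5.crate 200
2024-01-31 GET /index.html 200
";
    let counts = count_downloads(BufReader::new(Cursor::new(log)));
    for ((name, version), n) in &counts {
        println!("{name} {version}: {n}");
    }
}
```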

The ProcessCdnLogQueue job connects to an AWS SQS queue, receives messages from it, parses them and enqueues ProcessCdnLog jobs based on the content of these messages.
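The queue-draining control flow can be sketched roughly as follows. This is not the PR's actual code: the real job talks to AWS SQS, and the message body is an S3 event notification; here the queue and enqueueing side are abstracted behind illustrative traits (`CdnLogQueue`, `Message`, a space-separated placeholder body format) so the loop is visible without AWS dependencies.

```rust
struct Message {
    id: String,
    receipt_handle: String,
    body: String, // an S3 event notification JSON in the real system
}

trait CdnLogQueue {
    fn receive_messages(&mut self, max: usize) -> Vec<Message>;
    fn delete_message(&mut self, receipt_handle: &str);
}

/// Extract (region, bucket, path) from a message body. Placeholder
/// format "region bucket path"; the real code parses JSON.
fn parse_message(body: &str) -> Option<(String, String, String)> {
    let mut parts = body.split_whitespace();
    Some((
        parts.next()?.to_string(),
        parts.next()?.to_string(),
        parts.next()?.to_string(),
    ))
}

fn run(queue: &mut impl CdnLogQueue, mut enqueue_job: impl FnMut(String, String, String)) {
    loop {
        let messages = queue.receive_messages(10);
        if messages.is_empty() {
            break; // queue drained, stop polling
        }
        for message in messages {
            if let Some((region, bucket, path)) = parse_message(&message.body) {
                // Enqueue a `ProcessCdnLog` job for this log file.
                enqueue_job(region, bucket, path);
            }
            println!("Deleting message {} from the CDN log queue…", message.id);
            queue.delete_message(&message.receipt_handle);
        }
    }
}

/// In-memory queue used to exercise the loop.
struct MockQueue(Vec<Message>);

impl CdnLogQueue for MockQueue {
    fn receive_messages(&mut self, max: usize) -> Vec<Message> {
        let n = max.min(self.0.len());
        self.0.drain(..n).collect()
    }
    fn delete_message(&mut self, _receipt_handle: &str) {}
}

fn main() {
    let mut queue = MockQueue(vec![Message {
        id: "1".into(),
        receipt_handle: "rh-1".into(),
        body: "us-west-1 cloudfront-logs path/to/log.gz".into(),
    }]);
    let mut enqueued = Vec::new();
    run(&mut queue, |region, bucket, path| {
        enqueued.push((region, bucket, path))
    });
    println!("{enqueued:?}");
}
```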

You may be asking: "Where are the counted downloads saved to the database?!" and the answer is: nowhere. The plan is to run this read-only setup in production for a couple of days to ensure that the download counting works correctly with production data. Once we have verified that everything works as expected, we will adjust the ProcessCdnLog job to save the result to the database instead of or in addition to reporting it to the logs.

Best reviewed commit by commit 😉

@Turbo87 Turbo87 added the C-internal 🔧 (Category: Nonessential work that would make the codebase more consistent or clear) and A-backend ⚙️ labels on Jan 31, 2024
@Turbo87 Turbo87 requested a review from a team January 31, 2024 09:16

codecov bot commented Jan 31, 2024

Codecov Report

Attention: 118 lines in your changes are missing coverage. Please review.

Comparison is base (1f4abb9) 87.55% compared to head (c27b101) 87.38%.

| File | Patch % | Lines missing |
| --- | --- | --- |
| src/worker/jobs/downloads.rs | 86.42% | 52 Missing ⚠️ |
| src/config/cdn_log_storage.rs | 29.03% | 22 Missing ⚠️ |
| src/sqs.rs | 56.86% | 22 Missing ⚠️ |
| src/config/cdn_log_queue.rs | 0.00% | 17 Missing ⚠️ |
| src/admin/enqueue_job.rs | 0.00% | 2 Missing ⚠️ |
| src/config/server.rs | 0.00% | 2 Missing ⚠️ |
| src/bin/background-worker.rs | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8036      +/-   ##
==========================================
- Coverage   87.55%   87.38%   -0.17%     
==========================================
  Files         265      270       +5     
  Lines       25663    26172     +509     
==========================================
+ Hits        22468    22871     +403     
- Misses       3195     3301     +106     


@Turbo87
Member Author

Turbo87 commented Jan 31, 2024

Since CodeCov was unhappy I've now doubled the size of the diff and added a couple of tests... 😅

Contributor

@LawnGnome LawnGnome left a comment


Looking forward to having this "in production" so we can check it against the existing logic!


debug!("Deleting message {message_id} from the CDN log queue…");
queue
    .delete_message(receipt_handle)
Contributor


Maybe a question for infra, but this queue is configured to only deliver messages exactly once, right?

(What I'm concerned about here is double counting if this job runs twice, and the second receive_messages() call occurs before messages being handled by the first job are deleted.)

Member Author


> this queue is configured to only deliver messages exactly once, right?

From what I've seen so far, that does not appear to be the case. I assume that if it were, the delivered messages would be auto-deleted upon reception, right?

Member Author

@Turbo87 Turbo87 Feb 1, 2024


I will treat this as a non-blocking concern since the code here is experimental and read-only for now anyway. If we can change the queue to use once-only delivery then we can probably just delete the delete_message() calls here.

Member


> Maybe a question for infra, but this queue is configured to only deliver messages exactly once, right?

This is a standard queue that only guarantees at-least-once delivery. If we want exactly-once, we'd need to recreate the queue as a FIFO queue.
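Since the standard queue only guarantees at-least-once delivery, one way to guard against double counting (beyond switching to a FIFO queue) would be to make handling idempotent by deduplicating on the log file's identity. The sketch below is a hypothetical illustration of that idea, not anything in this PR; a production version would likely persist the seen-set in the database rather than keep it in memory.

```rust
use std::collections::HashSet;

/// Hypothetical deduplication guard for an at-least-once queue:
/// processing is keyed on (bucket, path), so a redelivered message
/// for an already-counted log file can be skipped.
struct DedupGuard {
    seen: HashSet<(String, String)>,
}

impl DedupGuard {
    fn new() -> Self {
        Self { seen: HashSet::new() }
    }

    /// Returns true only the first time a (bucket, path) pair is seen.
    fn first_delivery(&mut self, bucket: &str, path: &str) -> bool {
        self.seen.insert((bucket.to_string(), path.to_string()))
    }
}

fn main() {
    let mut guard = DedupGuard::new();
    assert!(guard.first_delivery("cloudfront-logs", "2024/01/31/log.gz"));
    // Second delivery of the same message: skip it instead of
    // counting the same log file twice.
    assert!(!guard.first_delivery("cloudfront-logs", "2024/01/31/log.gz"));
    println!("duplicate delivery detected and skipped");
}
```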

This allows us to access the `config` at runtime too, instead of only having access at startup.
@Turbo87 Turbo87 force-pushed the wip/queue-job branch 2 times, most recently from 2cb6c5c to 64b0a7f Compare February 1, 2024 08:48
@Turbo87 Turbo87 enabled auto-merge February 1, 2024 08:52
@Turbo87 Turbo87 merged commit 348c52b into rust-lang:main Feb 1, 2024
@Turbo87 Turbo87 deleted the wip/queue-job branch February 1, 2024 08:59