Skip to content

Big v1 Release #1

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 34 commits into from
Feb 1, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
34 commits
Select commit Hold shift + click to select a range
b0a7341
Initial project.
metaskills Nov 28, 2020
6107864
Send A Message!
metaskills Nov 29, 2020
fedc605
Some FIFO/Delay Logic
metaskills Nov 29, 2020
cd47e1c
Simple confirm/redrive setup.
metaskills Nov 30, 2020
7f7f29c
Use AWS SDK stubs. Client w/default options. New TODO file.
metaskills Dec 30, 2020
bba9357
Remove Job Buffer & Api Trackers. Not needed with SDK stubs.
metaskills Dec 30, 2020
f66876b
Refactor event helpers.
metaskills Dec 30, 2020
449621f
Use queue name from job. New RecordTest.
metaskills Dec 30, 2020
8aa5595
TODO
metaskills Dec 31, 2020
d195290
Debug #receive_count
metaskills Dec 31, 2020
98ae306
Test jobs in full with retries.
metaskills Dec 31, 2020
c3c9745
Use `handler` name like Lamby.
metaskills Dec 31, 2020
3347d17
Return a custom wrapped JobError with clean backtrace.
metaskills Dec 31, 2020
e4f77cc
DEBUG Delay.
metaskills Jan 1, 2021
b5c684a
Test wrapped error.
metaskills Jan 1, 2021
59811de
Initial metrics PORO.
metaskills Jan 1, 2021
5723df0
Few more job tests WRT delay and FIFO.
metaskills Jan 2, 2021
d521c55
FIFO Queues Work With Delay. Logging/Perform Helpers.
metaskills Jan 2, 2021
e70546b
CloudWatch Embedded Metrics
metaskills Jan 3, 2021
760be27
Implement Worker mixin with lambdakiq_options.
metaskills Jan 4, 2021
061fc5a
Stronger time sensitive tests.
metaskills Jan 4, 2021
2f68769
TODO Notes.
metaskills Jan 4, 2021
05dc1ef
Fix FIFO pseudo delays using visibility timeout.
metaskills Jan 4, 2021
02316bd
No backtrace for FifoDelayError
metaskills Jan 4, 2021
58f692e
Support async enqueue with concurrent-ruby.
metaskills Jan 4, 2021
340cd83
Allow metric app name to be a config.
metaskills Jan 4, 2021
b61c2f5
Tweak Metrics.
metaskills Jan 10, 2021
282b4b8
Instrument enqueue_retry and retry_stopped
metaskills Jan 11, 2021
f77587e
Docs and MessageGroupId change.
metaskills Jan 14, 2021
ad4d170
Doc images.
metaskills Jan 14, 2021
654ebfb
Todo
metaskills Jan 14, 2021
4571334
DEBUG: Can we change visibility without consumer errors?
metaskills Jan 15, 2021
0e1c4ea
Revert "DEBUG: Can we change visibility without consumer errors?"
metaskills Jan 15, 2021
78aa8ea
FINAL DOCS!
metaskills Feb 1, 2021
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
45 changes: 45 additions & 0 deletions Gemfile.lock
Original file line number Diff line number Diff line change
Expand Up @@ -4,10 +4,25 @@ PATH
lambdakiq (1.0.0)
activejob
aws-sdk-sqs
concurrent-ruby
railties

GEM
remote: https://rubygems.org/
specs:
actionpack (6.0.3.4)
actionview (= 6.0.3.4)
activesupport (= 6.0.3.4)
rack (~> 2.0, >= 2.0.8)
rack-test (>= 0.6.3)
rails-dom-testing (~> 2.0)
rails-html-sanitizer (~> 1.0, >= 1.2.0)
actionview (6.0.3.4)
activesupport (= 6.0.3.4)
builder (~> 3.1)
erubi (~> 1.4)
rails-dom-testing (~> 2.0)
rails-html-sanitizer (~> 1.1, >= 1.2.0)
activejob (6.0.3.4)
activesupport (= 6.0.3.4)
globalid (>= 0.3.6)
Expand All @@ -29,25 +44,54 @@ GEM
aws-sigv4 (~> 1.1)
aws-sigv4 (1.2.2)
aws-eventstream (~> 1, >= 1.0.2)
builder (3.2.4)
coderay (1.1.3)
concurrent-ruby (1.1.7)
crass (1.0.6)
erubi (1.10.0)
globalid (0.4.2)
activesupport (>= 4.2.0)
i18n (1.8.5)
concurrent-ruby (~> 1.0)
jmespath (1.4.0)
loofah (2.8.0)
crass (~> 1.0.2)
nokogiri (>= 1.5.9)
macaddr (1.7.2)
systemu (~> 2.6.5)
method_source (1.0.0)
mini_portile2 (2.4.0)
minitest (5.14.2)
minitest-focus (1.2.1)
minitest (>= 4, < 6)
mocha (1.11.2)
nokogiri (1.10.10)
mini_portile2 (~> 2.4.0)
pry (0.13.1)
coderay (~> 1.1)
method_source (~> 1.0)
rack (2.2.3)
rack-test (1.1.0)
rack (>= 1.0, < 3)
rails-dom-testing (2.0.3)
activesupport (>= 4.2.0)
nokogiri (>= 1.6)
rails-html-sanitizer (1.3.0)
loofah (~> 2.3)
railties (6.0.3.4)
actionpack (= 6.0.3.4)
activesupport (= 6.0.3.4)
method_source
rake (>= 0.8.7)
thor (>= 0.20.3, < 2.0)
rake (13.0.1)
systemu (2.6.5)
thor (1.0.1)
thread_safe (0.3.6)
tzinfo (1.2.8)
thread_safe (~> 0.1)
uuid (2.3.9)
macaddr (~> 1.0)
zeitwerk (2.4.2)

PLATFORMS
Expand All @@ -61,6 +105,7 @@ DEPENDENCIES
mocha
pry
rake
uuid

BUNDLED WITH
2.1.4
239 changes: 237 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,12 +1,247 @@

![Lambdakiq: ActiveJob on SQS & Lambda](images/Lambdakiq.png)

# Lambdakiq

TODO ...
<a href="https://lamby.custominktech.com"><img src="https://user-images.githubusercontent.com/2381/59363668-89edeb80-8d03-11e9-9985-2ce14361b7e3.png" alt="Lamby: Simple Rails & AWS Lambda Integration using Rack." align="right" width="300" /></a>A drop-in replacement for [Sidekiq](https://github.com/mperham/sidekiq) when running Rails in AWS Lambda using the [Lamby](https://lamby.custominktech.com) gem.

Lambdakiq allows you to leverage AWS' managed infrastructure to the fullest extent. Gone are the days of managing pods and long polling processes. Instead AWS delivers messages directly to your Rails' job functions and scales it up and down as needed. Observability is built in using AWS CloudWatch Metrics, Dashboards, and Alarms. Learn more about [Using AWS Lambda with Amazon SQS](https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html) or get started now.

## Key Features

* Distinct web & jobs Lambda functions.
* AWS fully managed polling. Event-driven.
* Maximum 12 retries. Per job configurable.
* Mirror Sidekiq's retry [backoff](https://github.com/mperham/sidekiq/wiki/Error-Handling#automatic-job-retry) timing.
* Last retry is at 11 hours 30 minutes.
* Supports ActiveJob's wait/delay. Up to 15 minutes.
* Dead messages are stored for up to 14 days.

## Project Setup

This gem assumes your Rails application is on AWS Lambda, ideally with our [Lamby](https://lamby.custominktech.com) gem. It could be using Lambda's traditional zip package type or the newer [container](https://dev.to/aws-heroes/lambda-containers-with-rails-a-perfect-match-4lgb) format. If Rails on Lambda is new to you, consider following our [quick start](https://lamby.custominktech.com/docs/quick_start) guide to get your first application up and running. From there, to use Lambdakiq, here are steps to setup your project


### Bundle & Config

Add the Lambdakiq gem to your `Gemfile`.

```ruby
gem 'lambdakiq'
```

Open `config/initializers/production.rb` and set Lambdakiq as your ActiveJob queue adapter.

```ruby
config.active_job.queue_adapter = :lambdakiq
```

Open `app/jobs/application_job.rb` and add our worker module. The queue name will be set by an environment using CloudFormation further down.

```ruby
class ApplicationJob < ActiveJob::Base
include Lambdakiq::Worker
queue_as ENV['JOBS_QUEUE_NAME']
end
```

### SQS Resources

Open up your project's SAM [`template.yaml`](https://lamby.custominktech.com/docs/anatomy#file-template-yaml) file and make the following additions and changes. First, we need to create your [SQS queues](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-sqs-queues.html) under the `Resources` section.

```yaml
JobsQueue:
Type: AWS::SQS::Queue
Properties:
ReceiveMessageWaitTimeSeconds: 10
RedrivePolicy:
deadLetterTargetArn: !GetAtt JobsDLQueue.Arn
maxReceiveCount: 13
VisibilityTimeout: 301

JobsDLQueue:
Type: AWS::SQS::Queue
Properties:
MessageRetentionPeriod: 1209600
```

In this example above we are also creating a queue to automatically handle our redrives and storage for any dead messages. We use [long polling](https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-short-and-long-polling.html#sqs-long-polling) to receive messages for lower costs. In most cases your message is consumed almost immediately. Sidekiq polling is around 10s too.

The max receive count is 13 which means you get 12 retries. This is done so we can mimic Sidekiq's [automatic retry and backoff](https://github.com/mperham/sidekiq/wiki/Error-Handling#automatic-job-retry). The dead letter queue retains messages for the maximum of 14 days. This can be changed as needed. We also make no assumptions on how you want to handle dead jobs.

### Queue Name Environment Variable

We need to pass the newly created queue's name as an environment variable to your soon to be created jobs function. Since it is common for your Rails web and jobs functions to share these, we can leverage [SAM's Globals](https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/sam-specification-template-anatomy-globals.html) section.

```yaml
Globals:
Function:
Environment:
Variables:
RAILS_ENV: !Ref RailsEnv
JOBS_QUEUE_NAME: !GetAtt JobsQueue.QueueName
```

We can remove the `Environment` section from our web function and all functions in this stack will now use the globals. Here we are using an [intrinsic function](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/intrinsic-function-reference-getatt.html) to pass the queue's name as the `JOBS_QUEUE_NAME` environment variable.

### IAM Permissions

Both functions will need capabilities to access the SQS jobs queue. We can add or extend the [SAM Policies](https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/sam-resource-function.html#sam-function-policies) section of our `RailsLambda` web function so it (and our soon to be created jobs function) have full capabilities to this new queue.

```yaml
Policies:
- Version: '2012-10-17'
Statement:
- Effect: Allow
Action:
- sqs:*
Resource:
- !Sub arn:aws:sqs:${AWS::Region}:${AWS::AccountId}:${JobsQueue.QueueName}
```

Now we can duplicate our `RailsLambda` resource YAML (except for the `Events` property) to a new `JobsLambda` one. This gives us a distinct Lambda function to process jobs whose events, memory, timeout, and more can be independently tuned. However, both the `web` and `jobs` functions will use the same ECR container image!

```yaml
JobsLambda:
Type: AWS::Serverless::Function
Metadata:
DockerContext: ./.lamby/RailsLambda
Dockerfile: Dockerfile
DockerTag: jobs
Properties:
Events:
SQSJobs:
Type: SQS
Properties:
Queue: !GetAtt JobsQueue.Arn
BatchSize: 1
MemorySize: 1792
PackageType: Image
Policies:
- Version: '2012-10-17'
Statement:
- Effect: Allow
Action:
- sqs:*
Resource:
- !Sub arn:aws:sqs:${AWS::Region}:${AWS::AccountId}:${JobsQueue.QueueName}
Timeout: 300
```

Here are some key aspects of our `JobsLambda` resource above:

* The `Events` property uses the [SQS Type](https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/sam-property-function-sqs.html).
* Our [BatchSize](https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/sam-property-function-sqs.html#sam-function-sqs-batchsize) is set to one so we can handle retrys more easily without worrying about idempotency in larger batches.
* The `Metadata`'s Docker properties must be the same as our web function except for the `DockerTag`. This is needed for the image to be shared. This works around a known [SAM issue](https://github.com/aws/aws-sam-cli/issues/2466) vs using the `ImageConfig` property.
* The jobs function `Timeout` must be lower than the `JobsQueue`'s `VisibilityTimeout` property. When the batch size is one, the queue's visibility is generally one second more.

🎉 Deploy your application and have fun with ActiveJob on SQS & Lambda.

## Configuration

Most general Lambdakiq configuration options are exposed via the Rails standard configuration method.

### Rails Configs

```ruby
config.lambdakiq
```

* `max_retries=` - Retries for all jobs. Default is the Lambdakiq maximum of `12`.
* `metrics_namespace=` - The CloudWatch Embedded Metrics namespace. Default is `Lambdakiq`.
* `metrics_logger=` - Set to the Rails logger which is STDOUT via Lamby/Lambda.

### ActiveJob Configs

You can also set configuration options on a per job basis using the `lambdakiq_options` method.

```ruby
class OrderProcessorJob < ApplicationJob
lambdakiq_options retry: 2
end
```

* `retry` - Overrides the default Lambdakiq `max_retries` for this one job.

## Concurrency & Limits

AWS SQS is highly scalable with [few limits](https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-quotas.html). As your jobs in SQS increases so should your concurrent functions to process that work. However, as this article, ["Why isn't my Lambda function with an Amazon SQS event source scaling optimally?"](https://aws.amazon.com/premiumsupport/knowledge-center/lambda-sqs-scaling/) describes it is possible that errors will effect your concurrency.

To help keep your queue and workers scalable, reduce the errors raised by your jobs. You an also reduce the retry count.

## Observability with CloudWatch

Get ready to gain way more insights into your ActiveJobs using AWS' [CloudWatch](https://aws.amazon.com/cloudwatch/) service. Every AWS service, including SQS & Lambda, publishes detailed [CloudWatch Metrics](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/working_with_metrics.html). This gem leverages [CloudWatch Embedded Metrics](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch_Embedded_Metric_Format.html) to add detailed ActiveJob metrics to that system. You can mix and match these data points to build your own [CloudWatch Dashboards](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch_Dashboards.html). If needed, any combination can be used to trigger [CloudWatch Alarms](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html). Much like Sumo Logic, you can search & query for data using [CloudWatch Logs Insights](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/AnalyzingLogData.html).

![CloudWatch Dashboard](https://user-images.githubusercontent.com/2381/106465990-be7a6200-6468-11eb-8461-93db0046cda5.png)

Metrics are published under the `Lambdakiq` namespace. This is configurable using `config.lambdakiq.metrics_namespace` but should not be needed since all metrics are published using these three dimensions which allow you to easily segment metrics/dashboards to a specific application.

### Metric Dimensions

* `AppName` - This is the name of your Rails application. Ex: `MyApp`
* `JobEvent` - Name of the ActiveSupport Notification. Ex: `*.active_job`.
* `JobName` - The class name of the ActiveSupport job. Ex: `NotificationJob`

### ActiveJob Event Names
For reference, here are the `JobEvent` names published by ActiveSupport. A few of these are instrumented by Lambdakiq since we use custom retry logic like Sidekiq. These event/metrics are found in the Rails application CloudWatch logs because they publish/enqueue jobs.

* `enqueue.active_job`
* `enqueue_at.active_job`

While these event/metrics can be found in the jobs function's log.

* `perform_start.active_job`
* `perform.active_job`
* `enqueue_retry.active_job`
* `retry_stopped.active_job`

### Metric Properties

These are the properties published with each metric. Remember, properties can not be used as metric data in charts but can be searched using [CloudWatch Logs Insights](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/AnalyzingLogData.html).

* `JobId` - ActiveJob Unique ID. Ex: `9f3b6977-6afc-4769-aed6-bab1ad9a0df5`
* `QueueName` - SQS Queue Name. Ex: `myapp-JobsQueue-14F18LG6XFUW5.fifo`
* `MessageId` - SQS Message ID. Ex: `5653246d-dc5e-4c95-9583-b6b83ec78602`
* `ExceptionName` - Class name of error raised. Present in perform and retry events.
* `EnqueuedAt` - When ActiveJob enqueued the message. Ex: `2021-01-14T01:43:38Z`
* `Executions` - The number of current executions. Counts from `1` and up.
* `JobArg#{n}` - Enumerated serialized arguments.

### Metric Data

And finally, here are the metrics which each dimension can chart using [CloudWatch Metrics & Dashboards](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch_Dashboards.html).

* `Duration` - Of the job event in milliseconds.
* `Count` - Of the event.
* `ExceptionCount` - Of the event. Useful with `ExceptionName`.

### CloudWatch Dashboard Examples

Please share how you are using CloudWatch to monitor and/or alert on your ActiveJobs with Lambdakiq!

💬 https://github.com/customink/lambdakiq/discussions/3


## Common Questions

**Are Scheduled Jobs Supported?** - No. If you need a scheduled job please use the [SAM Schedule](https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/sam-property-function-schedule.html) event source which invokes your function with an [Eventbridege AWS::Events::Rule](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-events-rule.html).

**Are FIFO Queues Supported?** - Yes. When you create your [AWS::SQS::Queue](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-sqs-queues.html) resources you can set the [FifoQueue](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-sqs-queues.html#aws-sqs-queue-fifoqueue) property to `true`. Remember that both your jobs queue and the redrive queue must be the same. When using FIFO we:

* Simulate `delay_seconds` for ActiveJob's wait by using visibility timeouts under the hood. We still cap it to non-FIFO's 15 minutes.
* Set both the messages `message_group_id` and `message_deduplication_id` to the unique job id provided by ActiveJob.

**Can I Use Multiple Queues?** - Yes. Nothing is stopping you from creating any number of queues and/or functions to process them. Your subclasses can use ActiveJob's `queue_as` method as needed. This is an easy way to handle job priorities too.

```ruby
# TODO ...
class SomeLowPriorityJob < ApplicationJob
queue_as ENV['BULK_QUEUE_NAME']
end
```

**What Is The Max Message Size?** - 256KB. ActiveJob messages should be small however since Rails uses the [GlobalID](https://github.com/rails/globalid) gem to avoid marshaling large data structures to jobs.

## Contributing

After checking out the repo, run:
Expand Down
2 changes: 1 addition & 1 deletion Rakefile
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@ require "rake/testtask"
Rake::TestTask.new(:test) do |t|
t.libs << "test"
t.libs << "lib"
t.test_files = FileList["test/**/*_test.rb"]
t.test_files = FileList["test/cases/**/*_test.rb"]
t.verbose = false
t.warning = false
end
Expand Down
14 changes: 14 additions & 0 deletions bin/_console
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
#!/usr/bin/env ruby

require "bundler/setup"
require "lambdakiq"

# You can add fixtures and/or initialization code here to make experimenting
# with your gem easier. You can also use a different console, if you like.

# (If you use this, don't forget to add pry to your Gemfile!)
# require "pry"
# Pry.start

require "irb"
IRB.start(__FILE__)
18 changes: 5 additions & 13 deletions bin/console
Original file line number Diff line number Diff line change
@@ -1,14 +1,6 @@
#!/usr/bin/env ruby
#!/bin/bash
set -e

require "bundler/setup"
require "lambdakiq"

# You can add fixtures and/or initialization code here to make experimenting
# with your gem easier. You can also use a different console, if you like.

# (If you use this, don't forget to add pry to your Gemfile!)
# require "pry"
# Pry.start

require "irb"
IRB.start(__FILE__)
docker-compose run \
lambdakiqgem \
./bin/_console
Binary file added images/Lambdakiq.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added images/Lambdakiq.sketch
Binary file not shown.
Loading