Description
Use case
As part of the implementation of the log buffering feature for Logger, we need to implement logic that allows the Logger to add log entries to a buffer and later retrieve them. This buffer logic should be able to hold one or more buffers of logs, with each buffer tied to a request.
For example, assuming the following code:
```ts
import { Logger } from '@aws-lambda-powertools/logger';

const logger = new Logger({
  buffering: {
    maxBytes: 1024,
  },
});

export const handler = (event: { userId: string }) => {
  logger.debug('hello', { userId: event.userId });
};
```
The following two requests:

```
_X_AMZN_TRACE_ID=abcd ; event: { userId: 'Alice' }
_X_AMZN_TRACE_ID=efgh ; event: { userId: 'Bob' }
```

should allow the Logger class to create two buffers, one for each request, in which to store the respective logs of each request. Logs of different requests should never end up in the same buffer:

```
['abcd', [{ userId: 'Alice' }]]
['efgh', [{ userId: 'Bob' }]]
```
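Conceptually, the resulting structure is a map of per-request buffers keyed by an identifier such as the trace id (how requests are identified is settled in a separate issue); a sketch of that shape:

```ts
// Conceptual shape only: a Map of per-request buffers keyed by trace id
const buffers = new Map<string, Set<{ userId: string }>>([
  ['abcd', new Set([{ userId: 'Alice' }])],
  ['efgh', new Set([{ userId: 'Bob' }])],
]);
```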
For this part of the implementation we will focus on the data structures that allow us to create and hold these buffers; how the logs are added to the buffers and what identifies a request are points that will be tackled in separate issues.
For a full overview of the feature, please refer to this RFC (#3410) and specifically to this comment for the final spec.
Solution/User Experience
This should be a standalone module under `packages/logger/src/` in which we'll add the logic needed to create a circular buffer. To keep the buffer decoupled from the rest, consumers of this buffer should only: 1/ set a max size in bytes, 2/ optionally provide an `onBufferOverflow()` callback function that is called by the buffer whenever it overflows, and 3/ be able to set, get, and delete items at a specific key.
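For illustration, consuming such a buffer could look roughly like this; the `CircularMap` and `setItem` names anticipate the pseudo code further down and are not a final API:

```ts
// Hypothetical usage of the buffer module; names are illustrative only
const buffers = new CircularMap<string>({
  maxBytesSize: 1024,
  onBufferOverflow: () => {
    // the consumer decides what happens on overflow, e.g. warn once per request
    console.warn('Some log entries were evicted from the buffer');
  },
});

buffers.setItem('abcd', '{"message":"hello"}', 8); // add a log under key "abcd"
const buffer = buffers.get('abcd'); // retrieve the buffer for that key
buffers.delete('abcd'); // drop the buffer once the logs have been flushed
```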
Since we need to create one buffer for each AWS Lambda invocation, my current line of thinking is to use a `Set` to hold the logs of a request (aka the buffer), and a `Map` to hold these buffers. Since these two data structures provide neither a way to keep track of size in bytes nor an eviction mechanism, I was thinking we could extend them rather than starting from scratch.
The first of the two pieces is the buffer itself; below is some pseudo code of how I think it could work:
```ts
class SizedItem<V> {
  public value: V;
  public logLevel: number;
  public byteSize: number;

  public constructor(value: V, logLevel: number) {
    this.value = value;
    this.logLevel = logLevel;
    // Placeholder size calculation - see the note on sizing near the end
    // of this section; the final approach is left to the implementer
    this.byteSize = Buffer.byteLength(JSON.stringify(value), 'utf8');
  }
}

class SizedSet<V> extends Set<SizedItem<V>> {
  public currentBytesSize: number;

  public constructor() {
    super();
    this.currentBytesSize = 0;
  }

  public add(item: SizedItem<V>): this {
    this.currentBytesSize += item.byteSize;
    super.add(item);
    return this;
  }

  public delete(item: SizedItem<V>): boolean {
    const isDeleted = super.delete(item);
    if (isDeleted) {
      this.currentBytesSize -= item.byteSize;
    }
    return isDeleted;
  }

  public clear(): void {
    super.clear();
    this.currentBytesSize = 0;
  }

  // Removes and returns the oldest item (insertion order), like
  // Array.prototype.shift does for arrays
  public shift(): SizedItem<V> | undefined {
    const firstElement = this.values().next().value;
    if (firstElement === undefined) {
      return undefined;
    }
    this.delete(firstElement);
    return firstElement;
  }
}
```
The key features of this extended set are:
- it starts with a size of `0`, and every time an item is added, the item's size is added to the tally
- when an item is deleted, its size in bytes is subtracted from the total
- we provide a custom method called `shift` (name borrowed from the analogous array method) that gets the first element in the set and deletes it

I also added a custom data structure called `SizedItem`; its main purpose is to centralize the logic that calculates the size of an item, so that the size is computed once and reused as much as possible rather than recalculated.
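A quick, hypothetical walk-through of the set's behaviour, based on the untested sketch above:

```ts
const buffer = new SizedSet<string>();
buffer.add(new SizedItem('first log', 8));
buffer.add(new SizedItem('second log', 8));
// currentBytesSize is now the sum of both items' byte sizes

const oldest = buffer.shift(); // removes and returns the oldest item
console.log(oldest?.value); // 'first log'
// currentBytesSize was decremented by the evicted item's size
```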
The second part is the map of buffers that holds all the request buffers; again, below is some pseudo code:
```ts
class CircularMap<V> extends Map<string, SizedSet<V>> {
  #maxBytesSize: number;
  #onBufferOverflow?: () => void;

  public constructor({
    maxBytesSize,
    onBufferOverflow,
  }: {
    maxBytesSize: number;
    onBufferOverflow?: () => void;
  }) {
    super();
    this.#maxBytesSize = maxBytesSize;
    this.#onBufferOverflow = onBufferOverflow;
  }

  // Named setItem because overriding Map#set with a different
  // signature (key, value, logLevel) would not type-check
  public setItem(key: string, value: V, logLevel: number): this {
    const item = new SizedItem<V>(value, logLevel);
    if (item.byteSize > this.#maxBytesSize) {
      throw new Error('Item too big');
    }
    const buffer = this.get(key) || new SizedSet<V>();
    if (buffer.currentBytesSize === 0) {
      super.set(key, buffer.add(item));
      return this;
    }
    if (buffer.currentBytesSize + item.byteSize >= this.#maxBytesSize) {
      this.#deleteFromBufferUntilSizeIsLessThanMax(buffer, item);
      // inform the callback that the buffer was full and elements were evicted
      this.#onBufferOverflow?.();
    }
    super.set(key, buffer.add(item));
    return this;
  }

  // Evicts the oldest items until the new one fits; the buffer.size check
  // guards against looping forever once the buffer is empty
  #deleteFromBufferUntilSizeIsLessThanMax = (
    buffer: SizedSet<V>,
    item: SizedItem<V>,
  ) => {
    while (
      buffer.size > 0 &&
      buffer.currentBytesSize + item.byteSize >= this.#maxBytesSize
    ) {
      buffer.shift();
    }
  };
}
```
The key features of this second data structure are:
- it accepts a `maxBytesSize` parameter that represents the max size a buffer can reach
- it's responsible for orchestrating the addition and removal of items based on the max size
- it calls the optional callback whenever a buffer reaches the max size, but it's not concerned with what the callback does
- when attempting to add an item that is larger than the buffer max size, it throws an error
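Putting the two pieces together, the expected behaviour of the (untested) sketch would be along these lines; the byte counts assume the placeholder JSON.stringify-based size calculation in `SizedItem`:

```ts
const buffers = new CircularMap<string>({
  maxBytesSize: 20,
  onBufferOverflow: () => console.warn('buffer overflowed, oldest logs evicted'),
});

buffers.setItem('abcd', '0123456789', 8); // 12 bytes serialized, fits
buffers.setItem('abcd', '0123456789', 8); // 24 > 20: evicts the first item,
// then calls onBufferOverflow and stores the new item

try {
  buffers.setItem('abcd', 'x'.repeat(100), 8); // larger than maxBytesSize
} catch (error) {
  console.error(error); // Error: Item too big
}
```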
I have not tested these implementations at all, so they might be broken or I might be missing something; whoever implements this should feel free to deviate from them. If you do, I'd still be curious to discuss the reasoning and benefits, but other than that it's fine.
Also, I imagine that calculating the size of items might require different treatments depending on the type of the item stored. Ideally we should be able to store objects (i.e. `{ message: 'hello world', age: 42 }`), but if that complicates things too much we can also just assume everything is a string and move on (i.e. `'{"message":"hello world","age":42}'`). The decision on this topic is left to the implementer, but I'd say let's prioritize completion over versatility at this stage, since we already know we'll be storing logs; the question is just whether we store them serialized or not.
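If we do go with the serialized-string assumption, a minimal sketch of the size calculation could look like the following; the `getItemByteSize` helper is hypothetical and assumes the Node.js `Buffer` API is available:

```ts
// Hypothetical helper centralizing the size logic, as SizedItem intends:
// serialize non-string values, then measure the UTF-8 byte length
const getItemByteSize = (item: unknown): number => {
  const serialized = typeof item === 'string' ? item : JSON.stringify(item);
  return Buffer.byteLength(serialized, 'utf8');
};

getItemByteSize('{"message":"hello world","age":42}'); // 34
getItemByteSize({ message: 'hello world', age: 42 }); // 34 after serialization
```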
In terms of unit tests for these new data structures, you can use these as a reference for the kinds of cases we should handle.
Alternative solutions
Acknowledgment
- This feature request meets Powertools for AWS Lambda (TypeScript) Tenets
- Should this be considered in other Powertools for AWS Lambda languages? i.e. Python, Java, and .NET
Future readers
Please react with 👍 and your use case to help us understand customer demand.