Description
Runtime: Python
Is your feature request related to a problem? Please describe
IHAC who was looking into the benefits of the logger formatting but was having a hard time getting Athena to ingest the logs from CloudWatch and create a schema out of them. They were wondering whether any consideration had been given to making the logger format friendlier to data ingestion by services outside of CloudWatch. For example, this is how it currently outputs to CloudWatch with the Powertools Logger on:
```
2021-05-10T15:26:57.772Z {"level":"INFO","location":"checkOrderInput:50","message":"Checking required fields ['contact', 'address', 'id', 'expedite', 'installDate', 'addressValidationOverride', 'product', 'asset']","timestamp":"2021-05-10 15:26:57,771+0000","service":"new-order","xray_trace_id":"1-609950be-18655ee7e321f53ab8b4f629"}
```
With this format you could extract two columns:
1. the timestamp
2. the entire message as a JSON struct column

Alternatively, you could use the Grok SerDe in Athena with a regex to try to grab a pattern. In either case you would still need to run this through some type of data processing, no different than if you had used the built-in Python logger with a custom configuration, but slightly more challenging due to the JSON structure.
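For reference, a minimal sketch of the Powertools Logger setup that produces a line like the one shown above (the service name and message are taken from the sample output; the handler itself is illustrative):

```python
from aws_lambda_powertools import Logger

logger = Logger(service="new-order")

@logger.inject_lambda_context
def lambda_handler(event, context):
    # Each call emits one structured JSON log entry to CloudWatch
    logger.info("Checking required fields [...]")
    return {"statusCode": 200}
```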
Describe the solution you'd like
Either for the Powertools Logger to output each line as a plain JSON object (removing the timestamp at the beginning of the log) so Athena/Glue can use the built-in JsonSerDe to parse it and create columns, or some documentation or examples on how to leverage this logger formatting in queries and when creating Metric Filters from the logs.
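To illustrate why the leading timestamp is the blocker, a small Python sketch (the sample line is abbreviated from the output above):

```python
import json

line = ('2021-05-10T15:26:57.772Z {"level":"INFO",'
        '"service":"new-order","message":"Checking required fields"}')

# As-is, the line is not valid JSON because of the timestamp prefix
try:
    json.loads(line)
except json.JSONDecodeError:
    pass  # this is effectively what Athena's JsonSerDe trips over

# With the prefix removed, each line is a clean JSON object that a
# row-per-line JSON SerDe could map straight to columns
record = json.loads(line[line.index("{"):])
assert record["service"] == "new-order"
```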
Describe alternatives you've considered
Using a messy DDL statement in Athena with the Grok SerDe:
```sql
CREATE EXTERNAL TABLE ugi (
  loglevel string COMMENT 'from deserializer',
  timestamp string COMMENT 'from deserializer',
  service string COMMENT 'from deserializer',
  traceid string COMMENT 'from deserializer')
ROW FORMAT SERDE
  'com.amazonaws.glue.serde.GrokSerDe'
WITH SERDEPROPERTIES (
  'input.format'='(?<loglevel>"level":"(.{4,10})),([^.]+)(?<timestamp>"timestamp":"(.{10,28}))"([^.]+)(?<service>"service":"(.{3,10}))",([^.]+)(?<traceid>xray_trace_id":"(.{30,40}))"')
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  's3://ugi/'
TBLPROPERTIES (
  'CrawlerSchemaDeserializerVersion'='1.0',
  'CrawlerSchemaSerializerVersion'='1.0',
  'UPDATED_BY_CRAWLER'='powertoollogs',
  'averageRecordSize'='191',
  'classification'='powertoollogs',
  'compressionType'='none',
  'grokPattern'='(?<loglevel>"level":"(.{4,10})),([^.]+)(?<timestamp>"timestamp":"(.{10,28}))"([^.]+)(?<service>"service":"(.{3,10}))",([^.]+)(?<traceid>xray_trace_id":"(.{30,40}))"',
  'objectCount'='1',
  'recordCount'='1',
  'sizeKey'='191',
  'typeOfData'='file')
```
Just using the built-in Python logger with the traditional CW-Firehose-S3 architecture for streaming the logs into an S3 bucket for Athena, avoiding the need to parse the JSON structure.
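For comparison, a rough sketch of the kind of custom configuration the built-in logger would need in order to emit one JSON object per line (the field names mirror the Powertools output; the service value is illustrative):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""
    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "location": f"{record.funcName}:{record.lineno}",
            "message": record.getMessage(),
            "timestamp": self.formatTime(record),
            "service": "new-order",  # illustrative
        })

root = logging.getLogger()
root.setLevel(logging.INFO)
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
root.addHandler(handler)
```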
If you provide guidance, is this something you'd like to contribute?
I am not the best developer, but sure! I could help.
Additional context
Providing some examples of how others have leveraged the Powertools Logger, or any use cases where this logger has made operational tasks easier, would be very valuable and make it easier to sell to customers. Right now, outside of the nice structured, uniform formatting it creates in CloudWatch Logs, I do not see another benefit when it comes to using this data efficiently.