
Logging JSON in Python

Python’s standard library provides the logging module, which makes it simple to write out messages along with the current time and the location in the code. Log entries are often useful for debugging, performance measurement, and statistical analysis.

The built-in formatter provided by logging.basicConfig() can be configured with a format string. Its main focus is readability for humans.

import logging

FMT = "[%(asctime)s] %(levelname)s [%(filename)s:%(lineno)d] %(message)s"
logging.basicConfig(format=FMT)
logging.getLogger().setLevel(logging.DEBUG)

logging.debug("Lorem ipsum")
logging.warning("Duis aute irure")

This code prints:

[2021-09-05 13:47:57,292] DEBUG [example.py:7] Lorem ipsum
[2021-09-05 13:47:57,292] WARNING [example.py:8] Duis aute irure

Computers have a hard time parsing such log entries correctly. Command-line tools may be enough for coarse-grained parsing, but they are not reliable for extracting details from large volumes of log records. JSON is a well-defined format easily parsed correctly by many tools and programming languages. I will therefore show how Python’s logging module can be used to write log records as JSON.

A simple solution for writing JSON log records would be to just set the format string in the example above to a JSON-like template. The pitfall is that any attribute containing JSON-specific characters can corrupt the JSON object.
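For illustration, a message containing a double quote breaks such a naive format string (this snippet is my own sketch, not part of the original example):

import logging

# Naive attempt: a JSON-looking format string
logging.basicConfig(format='{"lvl": "%(levelname)s", "msg": "%(message)s"}')

# The unescaped quote in the message yields invalid JSON:
# {"lvl": "WARNING", "msg": "He said "hi""}
logging.warning('He said "hi"')

We instead have to subclass logging.Formatter and let the json module properly escape all attributes: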

import json, logging, sys

class JsonFormatter(logging.Formatter):
    def format(self, record):
        # All `logging.LogRecord` attributes, including keys passed via
        # the `extra` argument, live in the record's `__dict__`
        extra = record.__dict__
        json_record = {
            "time": extra.get("created"),
            "lvl": extra.get("levelname"),
            "file": extra.get("filename"),
            "line": extra.get("lineno"),
            # getMessage() applies any %-style arguments to the message
            "msg": record.getMessage(),
            "additional_info": extra.get("additional_info"),
        }
        return json.dumps(json_record)

def jsonConfig(filename=None):
    # Mimic the function signature of `logging.basicConfig()`
    # https://github.com/python/cpython/blob/3.9/Lib/logging/__init__.py#L1908
    if filename:
        h = logging.FileHandler(filename)
    else:
        h = logging.StreamHandler(sys.stdout)
    h.setFormatter(JsonFormatter())
    # Attach our custom handler to the root logger
    logging.getLogger().addHandler(h)

jsonConfig()
logging.getLogger().setLevel(logging.DEBUG)
logging.debug("Lorem ipsum")
logging.warning("Duis aute irure", extra={"additional_info": 123})

This code prints one JSON object per line:

{"time": 1630842477.0521548, "lvl": "DEBUG", "file": "py-log-json.py", "line": 35, "msg": "Lorem ipsum", "additional_info": null}
{"time": 1630842477.052344, "lvl": "WARNING", "file": "py-log-json.py", "line": 36, "msg": "Duis aute irure", "additional_info": 123}

Parsing these log lines is now straightforward, and the JSON object makes it easy to attach additional details to each log record. The logging.LogRecord objects created for each logging call offer many more runtime attributes, such as threading info and function names, all accessible from the logging.Formatter-based class. There is also support for setting custom keys in the log record, as seen above where extra={"additional_info": 123} is passed as an argument. A short sketch of both ideas follows.
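As a minimal sketch, consuming such a log file takes little more than json.loads() per line; the file name app.log is made up for illustration:

import json

# More `logging.LogRecord` attributes, e.g. "funcName" and "threadName",
# could be added to `json_record` in the formatter in the same way.

# Reading the JSON log lines back, one object per line
with open("app.log") as f:
    for line in f:
        record = json.loads(line)
        if record["lvl"] == "WARNING":
            print(record["line"], record["msg"])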

The Python logging module could be used for writing other formats as well, such as CSV or binary Protocol Buffers. YAML is another popular format that may be suitable for large log records. It is a superset of JSON with additional syntax for improved readability and compactness. In our JSON example, we use a newline to separate each JSON object, which is a common approach for streaming JSON objects. YAML has the benefit of supporting object streaming directly in the syntax using the --- delimiter. In YAML, each log record can therefore span multiple lines, improving human readability for large log records.
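As a rough sketch of that idea, assuming the third-party PyYAML package is installed (it is not part of the standard library), a YAML-emitting formatter could look like this:

import logging
import yaml  # third-party: pip install pyyaml

class YamlFormatter(logging.Formatter):
    def format(self, record):
        yaml_record = {
            "time": record.created,
            "lvl": record.levelname,
            "msg": record.getMessage(),
        }
        # explicit_start=True prefixes the document with the "---" delimiter,
        # so consecutive records form a stream of YAML documents
        return yaml.dump(yaml_record, explicit_start=True).rstrip("\n")

Attached to a handler just like the JsonFormatter above, each record then becomes its own multi-line YAML document.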