In the Python world, there's a saying: "Flat is better than nested."
Maybe times have changed or maybe that adage just applies more to code than data. In spite of the warning, nested data continues to grow, from document stores to RPC systems to structured logs to plain ol' JSON web services.
After all, if "flat" were the be-all-end-all, why would namespaces be one honking great idea? Nobody likes artificial flatness, and nobody wants to call a function with 40 arguments.
Nested data is tricky though. Reaching into deeply structured data can get you some ugly errors. Consider this simple line:
value = target.a['b']['c']
That single line can result in at least four different exceptions, each less helpful than the last:
AttributeError: 'TargetType' object has no attribute 'a'
KeyError: 'b'
TypeError: 'NoneType' object has no attribute '__getitem__'
TypeError: list indices must be integers, not str
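To see those failure modes side by side, here is a small sketch in plain Python. The targets are hypothetical stand-ins built with `SimpleNamespace`, and `reach` is an illustrative helper, not part of any library. Exact message wording varies between Python versions, so the sketch reports only the exception type.

```python
from types import SimpleNamespace

def reach(target):
    """Run the line from the text and report the exception type, if any."""
    try:
        return target.a['b']['c']
    except (AttributeError, KeyError, TypeError) as exc:
        return type(exc).__name__

# Hypothetical targets, each broken at a different level of nesting.
no_attr = SimpleNamespace()                    # no attribute 'a'
no_key = SimpleNamespace(a={})                 # 'a' exists, but has no key 'b'
none_value = SimpleNamespace(a={'b': None})    # 'b' is None, so ['c'] fails
list_value = SimpleNamespace(a={'b': ['c']})   # 'b' is a list indexed with a str
ok = SimpleNamespace(a={'b': {'c': 42}})       # fully populated

results = [reach(t) for t in (no_attr, no_key, none_value, list_value, ok)]
print(results)
# ['AttributeError', 'KeyError', 'TypeError', 'TypeError', 42]
```

None of the four failures says which step of the access went wrong, which is the gap the rest of this post is about.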
Clearly, we need our tools to catch up to our nested data.
Many nested data tools simply perform deep gets and searches, stopping short after solving the problem posed above. Realizing that access almost always precedes assignment, glom takes the paradigm further, enabling total declarative transformation of the data.
By way of introduction, let's start off with space-age access, the classic "deep-get":
We can react to changing data requirements as fast as the data itself can change, naturally restructuring our results, despite the input's nested nature. Like a list comprehension, but for nested data, our code mirrors our output.
Most other implementations are limited to a particular data format or pure model, be it jmespath or XPath/XSLT. glom makes no such sacrifices of practicality, harnessing the full power of Python itself.
Going back to our example, let's say we wanted to get an aggregate moon count:
With glom, you have full access to Python at any given moment. Pass values to functions, whether built-in, imported, or defined inline with lambda. But glom doesn't stop there.
Now we get to one of my favorite features by far. Leaning into Python's power, we unlock the following syntax:
from glom import glom, T
spec = T['system']['planets'][-1].values()
glom(target, spec)
# ['jupiter', 69] (wrapped in a dict_values view on Python 3)
What just happened?
T stands for target, and it acts as your data's stunt double. T records every key you get, every attribute you access, every index you index, and every method you call. And out comes a spec that's usable like any other.
No more worrying if an attribute is None or a key isn't set. Take that leap with T. T never raises an exception, so worst case you get a meaningful error message when you run glom() on it.
And if you're ok with the data not being there, just set a default:
Tools like jq provide a lot of value on the console, but leave a dubious path forward for further integration. glom's full-featured command-line interface is only a stepping stone to using it more extensively inside application logic.
Piping hot JSON into glom with a cool Python literal spec, with pretty-printed JSON out. A great way to process and filter API calls, and explore some data. Something genuinely enjoyable, because you know you won't be stuck in this pipe.
Everything on the command line ports directly into production-grade Python, complete with better error handling and limitless integration possibilities.
Never before glom have I put a piece of code into production so quickly.
Within two weeks of the first commit, glom has paid its weight in gold, with glom specs replacing Django Rest Framework code 2x to 5x their size, making the codebase faster and more readable. Meanwhile, glom's core is so tight that we're on pace to have more docs and tests than code very soon.
The glom() function is stable, along with the rest of the API, unless otherwise specified.
A lot of other features are baking or in the works. For now, we'll be focusing on the following growth areas: