Merging Dicts of Dicts

Image: "Merge" by Taylor Riché, CC BY-NC

2022-Jan-13 - 4 minutes read - 845 words

I am not a coder, but i do dabble with Python from time to time. I know enough Python to be dangerous but not enough to be clever. I know there are Pythonic ways of doing things, but i’m just not there yet. But since the tagline of this blog is Learning in public, i might as well do so, with the hope that somebody will tell me how to do this properly. "This" being working with, and merging, multiple dictionaries.

I’m working on a script to look at machine inventory data from multiple sources and munge¹ these together. The ultimate goal for me is to figure out which of these machines i should have in AppleCare Enterprise (ACE) and which not; which ones i should buy an ACE license for, which ones should come for free, and whether there are machines that should be grandfathered in at zero cost but have been billed for. After juggling multiple spreadsheets until my brain was sore, i decided to throw some Python at it.

Internally, we keep our gear in a bunch of fairly structured text files. Each record, which describes a laptop, a phone, a monitor, a tablet, etc., is a line of key=value; pairs. I say fairly structured, because they really are curated by hand and only sometimes checked with code. Yes, we’re shopping for a management system and i’ll probably get to that in a much later post. Files sent from Apple are in Excel format. I have some from our account manager and i get some from Apple Business Manager and the ACE interface. And any day now, i’ll probably incorporate export files from our MDM (for fun and profit).
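As a rough sketch of how one such line might be read into a dictionary keyed by serial number: this assumes the pairs are separated by semicolons and that the serial number lives under a key called serial. The key name, the function names and the sample line are all made up for illustration and are not our real format.

```python
def parse_record(line):
    """Turn one 'key=value; key=value;' line into a dictionary."""
    record = {}
    for pair in line.split(';'):
        pair = pair.strip()
        if not pair:
            continue  # skip the empty piece after a trailing semicolon
        key, _, value = pair.partition('=')
        record[key.strip()] = value.strip()
    return record


def read_inventory(lines, serial_key='serial'):
    """Build {SERIAL: record}, upper-casing serial numbers on the way in.

    serial_key is a hypothetical field name; adjust it to the real files.
    """
    inventory = {}
    for line in lines:
        record = parse_record(line)
        serial = record.pop(serial_key, '').upper()
        if serial:  # lines without a serial number are silently skipped
            inventory[serial] = record
    return inventory


sample = ['serial=c02f2400001; user=Robin Laurén; asset_tag=L1234;']
print(read_inventory(sample))
# {'C02F2400001': {'user': 'Robin Laurén', 'asset_tag': 'L1234'}}
```

Upper-casing the serial number at read time means every later lookup and merge can assume the keys already match.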

Each type of file read ends up in a Python dictionary, where the key is the serial number (which i must remember to ensure is in UPPER CASE in, ahem, case it’s not) and the value is another dictionary containing all key=value pairs. Ideally, the files read would have 1:1 mappings of all serial numbers, but life isn’t perfect and there are some unique snowflakes that only exist in one type of data source but not the other. This is actually also useful information and i hope to be able to write a tool that can help me get to it. Anyway, this information needs to be consolidated or, in Python parlance, these dictionaries need to be merged. Luckily, there’s a Python routine for that. Several, actually. Sadly, most of them don’t work. At least not here.
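A side note on those snowflakes before getting to the merging: since both dictionaries are keyed by serial number, set operations on their keys are probably enough to find the machines that show up in only one source. This is a minimal sketch with made-up sample data; find_snowflakes, ours and apples are hypothetical names.

```python
def find_snowflakes(first, second):
    """Return the serial numbers found in only one of two dictionaries."""
    # dict.keys() behaves like a set, so '-' gives the difference
    only_in_first = sorted(first.keys() - second.keys())
    only_in_second = sorted(second.keys() - first.keys())
    return only_in_first, only_in_second


# Made-up sample data, keyed by serial number
ours = {
    'C02F2400001': {'user': 'Robin Laurén'},
    'C03VX300002': {'user': 'Alex Lifeson'},
}
apples = {
    'C02F2400001': {'agreement': '3141592'},
    'X0T4VS00004': {'agreement': '3141592'},
}

print(find_snowflakes(ours, apples))
# (['C03VX300002'], ['X0T4VS00004'])
```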

The two most delightful dictionary update methods (sorry if i’m misusing the word here) are

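# update() changes the dictionary in place and returns None,
# so copy first if you want a separate merged dictionary
merged_dictionary = some_dictionary.copy()
merged_dictionary.update(another_dictionary)
# ...and (Python 3.9 or later)...
merged_dictionary = some_dictionary | another_dictionary

# or if you just want to update a dictionary in place and aren't militant about immutability
some_dictionary |= another_dictionary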

This is really useful for regular dictionaries:

some_dictionary = {
  'Neil': 'drums',
  'Alex': 'guitar'
}

another_dictionary = {
  'Fox': 'agent',
  'Alex': 'double agent'
}

some_dictionary | another_dictionary
# Yields: {'Neil': 'drums', 'Alex': 'double agent', 'Fox': 'agent'}

Once your dictionary values (‘drums’, ‘double agent’, etc.) are themselves dictionaries, merging this way will just overwrite all kinds of valuable stuff, because each inner dictionary is replaced wholesale rather than merged:

inventory = {
  'C02F2400001': { 'user': 'Robin Laurén', 'asset_tag': 'L1234' },
  'C03VX300002': { 'user': 'Alex Lifeson', 'asset_tag': 'L2112' }
}

applecare = {
 'C02F2400001': { 'end_date': '2022-04-01', 'agreement': '3141592' },
 'X0T4VS00004': { 'end_date': '2023-12-11', 'agreement': '3141592' }
}

inventory | applecare
# Yields:
# {'C02F2400001': {'end_date': '2022-04-01', 'agreement': '3141592'},
#  'C03VX300002': {'user': 'Alex Lifeson', 'asset_tag': 'L2112'},
#  'X0T4VS00004': {'end_date': '2023-12-11', 'agreement': '3141592'}}

Clearly not what we want.

Now here is where i see, not that i must be stupid, but that i simply lack the required pythonité. Help is appreciated.

To deep-merge these dictionaries-of-dictionaries, i took the following, trivial approach:

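def merge_dods(d1, d2):
    # Merge d2 into d1, one level deep; note that d1 itself is modified
    for key in d2:
        # A key missing from d1 starts out as an empty dictionary
        d1[key] = d1.get(key, {}) | d2[key]
    return d1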

merge_dods(inventory, applecare)
# Yields:
# {
#  'C02F2400001': {'user': 'Robin Laurén', 'asset_tag': 'L1234', 'end_date': '2022-04-01', 'agreement': '3141592'},
#  'C03VX300002': {'user': 'Alex Lifeson', 'asset_tag': 'L2112'},
#  'X0T4VS00004': {'end_date': '2023-12-11', 'agreement': '3141592'}
# }

The magic is in running through all keys of d2 and, for each one, replacing d1[key] with the result of merging d1[key] and d2[key]. If a given key doesn’t exist in d1, the routine merges an empty dictionary with d2[key], which thankfully results in just the value of d2[key] and not some strange concoction of an empty dictionary and the value of d2[key].

I probably need to write that last paragraph one more time, since reading it twice after writing it half a dozen times still didn’t make it as concise as the code. But i’m sure there’s a more elegant way which doesn’t include looping through all keys of either dictionary to do the merge.
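One possible variation, offered as a sketch rather than as the proper answer: a dictionary comprehension over the union of both key sets. It still loops, only inside the comprehension, but it returns a new dictionary and leaves both inputs alone. The name merged_dods is made up here, the | operator needs Python 3.9 or later, and the order of the keys in the result is not guaranteed because the union is a set.

```python
def merged_dods(d1, d2):
    """Return a new dictionary of dictionaries; d1 and d2 are left untouched."""
    return {
        # A key missing from either side contributes an empty dictionary
        key: d1.get(key, {}) | d2.get(key, {})
        for key in d1.keys() | d2.keys()
    }


inventory = {
    'C02F2400001': {'user': 'Robin Laurén', 'asset_tag': 'L1234'},
    'C03VX300002': {'user': 'Alex Lifeson', 'asset_tag': 'L2112'},
}
applecare = {
    'C02F2400001': {'end_date': '2022-04-01', 'agreement': '3141592'},
    'X0T4VS00004': {'end_date': '2023-12-11', 'agreement': '3141592'},
}

combined = merged_dods(inventory, applecare)
print(combined['C02F2400001'])
# {'user': 'Robin Laurén', 'asset_tag': 'L1234', 'end_date': '2022-04-01', 'agreement': '3141592'}
```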

If you’re trying to merge dictionaries of dictionaries of dictionaries, maybe you need to untangle your thinking. The alternative will be messy, recursive, or both.
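For the curious, the recursive flavour of messy might look roughly like this. It is a sketch under the assumption that anything which is a dictionary on both sides should be merged and anything else should simply be overwritten by the second dictionary; deep_merge and the nested applecare key are made-up names for illustration.

```python
def deep_merge(d1, d2):
    """Return a new dictionary with d2 merged into d1, at any depth."""
    result = dict(d1)  # shallow copy, so d1 itself is not modified
    for key, value in d2.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # Both sides hold a dictionary here: merge them one level down
            result[key] = deep_merge(result[key], value)
        else:
            # Otherwise the value from d2 wins
            result[key] = value
    return result


a = {'C02F2400001': {'user': 'Robin Laurén', 'applecare': {'agreement': '3141592'}}}
b = {'C02F2400001': {'applecare': {'end_date': '2022-04-01'}}}

print(deep_merge(a, b))
# {'C02F2400001': {'user': 'Robin Laurén', 'applecare': {'agreement': '3141592', 'end_date': '2022-04-01'}}}
```

Nested dictionaries that exist on only one side are shared with the result rather than copied, so changing them later changes both.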

So there you are. I really tried to look for a solution on the interwebs, but i guess i’m just not that good at googling.

¹ cf. meaning 3
