Another Dependency Management Scheme

This post is based on an accumulator pattern I encountered when converting recursive functions to be tail recursive.

For a long time I have been looking for a way to cleanly handle errors that arise in functions. It’s well known among programmers that only about 20% of code does any real work, the rest is there to take care of  IO and error conditions. That’s one of the afterthoughts that Joe Morasco has in How to learn a new programming language.

It takes me a few hours to recall how to make the linked list
work and get something on the air. To allow someone else to play the
game unsupervised (which means doing some error handling) usually
takes me a few days of programming.

KDBQ lets you omit a lot of ceremony and boilerplate so you can get to work faster. Often a function that does something useful, ie real work, is a few lines. However, you still need to do the error handling somewhere and it’s best if we can keep our functions doing mostly work. If there are specific error conditions that need to be handled then by all means handle it, otherwise assume that the input you get is what you expect. (I know this sounds crazy to all those paranoid defensive programmers, but frankly you might like where this post goes.)

The only ceremony that I insist on is that every function you write takes a dictionary as the input. I find this to be pretty liberating in general as you can easily add functionality over time without breaking an API and it allows for the function to implement sensible default values while allowing customization.

What I eventually settled on was something that would allow me to create dependent functions by ordering them.

I think of them as “getters.” Each function “gets” a value using a dictionary of configurations.

A list of getters can be easily seen on a page and read. If you know what each getter is doing then you can keep reading, otherwise you can go look at the definition.

I then would create a getter dictionary, each key in the getter dictionary will eventually be assigned a value. The value is either the value that was already in the input or the value of applying the current getter to the dictionary.

To make this a bit clearer, suppose that I want to create a function that gives my pet animal a name.

I have the following dependencies:

What pet animal am I getting? Turtle, Elephant, Dog, Snake

getAnimalType:{[args] first 1?`Turtle`Elephant`Dog`Snake}

Is the animal a boy or a girl? (obviously turtles and elephants are more likely to be girls)

getBoyorGirl:{[args] 
   d:`Turtle`Elephant`Dog`Snake!0.7 0.7 0.5 0.5;
    $[first d[args[`AnimalType]]>1?1.0;`girl;`boy]}

Choose a name based on those two criteria.

getName:{[args] $[`boy~args[`BoyorGirl];`Kunee;`Kuni]} 
/(we like the name IPA transcription kʊniː)

So here is the basic idea: We combine these three separate verbs into a larger verb:

getPetAnimalName:{[args] 
  getters:()!();
  getters[`AnimalType]:getAnimalType;
  getters[`BoyorGirl]:getBoyorGirl;
  getters[`Name]:getName;
  :setIfMissing /[args;key getters; value getters]}

We can read this function by looking at what keys are being set in the getters dictionary. The keys tell you what this function plans on using or setting; the values tell you how the function will set them if the key is missing in the dictionary you pass in.

So for example if you don’t know what you want you can simply pass an empty dictionary to getPetAnimalName which will give you back something like:

q)getPetAnimalName[()!()]
AnimalType| Turtle
BoyorGirl | girl
Name | Kuni

Now suppose you are set on a particular pet animal you can pass a dictionary that has the animal preset.

q)presetAnimalDict:enlist[`AnimalType]!enlist[`Elephant]
q)getPetAnimalName[presetAnimalDict]
AnimalType| Elephant
BoyorGirl | girl
Name | Kuni

So far we have simply seen that this construct of setIfMissing allows us to push in default arguments and have default verbs that use our preset arguments. Now we can also show that we also get basic error handling for free.

Suppose that we break the function that chooses the animal type. We would expect that the function will not return boy or a girl if we pass nothing, but should work as usual if we do pass it an animal type.

/break function getAnimalType
q)getAnimalType:{[args] break;first 1?`Turtle`Elephant`Dog`Snake}
q)getPetAnimalName[()!()] /we get back as much as was possible 
Name | Kuni
q)getPetAnimalName[presetAnimalDict] /works as before
AnimalType| Elephant
BoyorGirl | girl
Name | Kuni

Mechanics of how the verb setIfMissing operates:

The premise of this verb, as its name suggests, is that it sets the key in the dictionary that was passed in if that key was missing. If that key exists, it does nothing. To set a key we call the function that was passed in the dictionary we have so far. This allows functions to depend on values from previous functions or from arguments that are passed in. If a function throws an error, it catches the error and tries to set the next key that is given to it.

I have created essentially two different versions of this verb. They follow two different philosophies.

Simple Version: with primitive error handling and logging

The first version is the most simple and just logs what keys it is setting and what keys it failed to set. Here it is, pretty short:

log:{neg[2] (string .z.Z)," : ",x,"\t",y;}
setIfMissing:{[a;k;f]  log["INFO";"setting key: ",string k];
  .[{[a;k;f]$[ k in key a;a; @[a;k;:;f[a]]]};
    (a;k;f); 
     {[a;k;e] log["ERROR";
   "Fail to set key: ",(string k)," Error: ",e]];a}[a;k]]};

Here I have fully elucidated every step:

/first we create a logging utility
.log:{
 neg[2] /this is the handle to standard error 
 (string .z.Z) /we convert the time into a string
 ," : ",x,"\t",y; / and append the type of error to the message
 };
/so for example 
/.log["INFO";"HELLO WORLD"] 
/prints 2017.09.17T09:06:14.309 : INFO HELLO WORLD

/now we define setIfMissing
setIfMissing:{[a;k;f] /a is the dictionary,k is the key, f is the function
 .log["INFO";"setting key: ",string k]; /log that we will be setting k
 .[ /protected execution will trap any errors
 {[a;k;f] /a,k,f are the same as the beginning
 $[ k in key a; /check if k is in the dictionary
 a; /if it is return a
 @[ /general apply, allow you to modify contents of a container
 a; /a is our container or dictionary in this case
 k; /k is the key or index in the container we will modify
 :; /ammend (set) is the function we will use 
 f[a] /is the value we will set, which is the verb f applied to a
 / this is what allows f to depend on any set items in the dictionary
 ] /this ends general apply 
 ] /this ends the conditional of whether k was in a
 }; / this ends the first function in the protected execution
 (a;k;f); /These are the arguments to the protected execution
 {[a;k;e] /a is still the dictionary, k is the key and e is the error
 .log["ERROR";"Fail to set key: ",(string k)," Error: ",e]; /log error
 a} /return the dictionary, and ends the trap function
 / this allows us to chain functions even if there is an error
 [a;k] /pass in the value of a and k, 
 /otherwise the error handler only sees the error
 ] /ends the protected execution
 }; /ends setIfMissing

Fancier version with introspection and profiling:

The second version of setIfMissing is likely to please those defensive programmers that I mentioned earlier.  I will not go through and elucidate this entire function. Instead I will point out the salient differences in this significantly more involved version of setIfMissing.

The first difference is that there no longer is text logging. Instead the dictionary that you pass in gets updated with all of the information about what happened. There are two tables that get added simErrors and simProfiling. So for example here is what our examples from before look like:

q)getPetAnimalName[()!()]
            | ::
simErrors   | (+(,`failed)!,`AnimalType`BoyorGirl`Name)!+(,`reason)!,("break"..
simProfiling| (+(,`simKey)!,,`initialized)!+(,`time)!,,2017.09.17T09:53:24.471
q)getPetAnimalName[presetAnimalDict]
AnimalType | `Elephant
 | ::
simErrors | (+(,`failed)!,())!+(,`reason)!,()
simProfiling | (+(,`simKey)!,`initialized`AnimalType`BoyorGirl`Name`WelcomeM..
BoyorGirl | `girl
Name | `Kuni

We can look at those tables

q)getPetAnimalName[()!()][`simErrors]
failed    | reason 
----------| ------------
AnimalType| "break" 
BoyorGirl | "AnimalType"
Name      | "BoyorGirl" 
q)getPetAnimalName[()!()][`simProfiling]
simKey     | time 
-----------| -----------------------
initialized| 2017.09.17T09:55:59.166
q)getPetAnimalName[presetAnimalDict][`simErrors]
failed| reason
------| ------
q)getPetAnimalName[presetAnimalDict][`simProfiling]
simKey     | time 
-----------| -----------------------
initialized| 2017.09.17T10:09:04.062
AnimalType | 2017.09.17T10:09:04.062
BoyorGirl  | 2017.09.17T10:09:04.062
Name       | 2017.09.17T10:09:04.062

As we can see, we now get all of the logging information from the object itself. This is very powerful because it allows us to do queries against this data later to either improve performance by using the profiling table or to see where the errors are coming from.

The second feature takes advantage of the simErrors table. Here is where we become super defensive,  instead of naively trying to run every function against every key, we can preemptively fail if we know a function depends on an earlier key that failed. This is the reason, we now no longer return a name if boyOrGirl failed. Since getName explicitly depends on boyOrGirl being set, so we can premtively bail and not even bother running the query.  Therefore, you can see that reason given will either be the error message from running the function or the keys that were required. To demonstrate this a bit further, let’s add another function to our getPetAnimalName a welcome message:

getWelcomeMessage:{[args] 
  "Welcome Home, ", string args[`AnimalType]," ", args[`Name]}
getPetAnimalName:{[args] 
 getters:()!();
 getters[`AnimalType]:getAnimalType;
 getters[`BoyorGirl]:getBoyorGirl;
 getters[`Name]:getName;
 getters[`WelcomeMessage]:getWelcomeMessage;
 :setIfMissing /[args;key getters; value getters]}

Now if we run getPetAnimalName:

q)getPetAnimalName[()!()][`simErrors]
failed        | reason 
--------------| -----------------
AnimalType    | "break" 
BoyorGirl     | "AnimalType" 
Name          | "BoyorGirl" 
WelcomeMessage| "AnimalType,Name"

As we can see all keys that were required were mentioned as the reason for the failure. In particular WelcomeMessage failed because both AnimalType and Name were missing. We can do this by inspecting the function before calling it, if any of the keys in the failed column of simErrors table appear in the function, we report them and bail.

Finally, the last feature of setIfMissing is that we can pass the dictionary by name or by value. So far in all the examples we have passed the dictionary by value and gotten a new value back with some keys set. However, if we call setIfMissing with a name of a dictionary it will return the name and modify the dictionary in place.

q)presetAnimalDict
AnimalType| Elephant
q)getPetAnimalName[`presetAnimalDict]
`presetAnimalDict
q)presetAnimalDict
AnimalType    | `Elephant
              | ::
simErrors     | (+(,`failed)!,())!+(,`reason)!,()
simProfiling  | (+(,`simKey)!,`initialized`AnimalType`BoyorGirl`Name`WelcomeM..
BoyorGirl     | `girl
Name          | `Kuni
WelcomeMessage| "Welcome Home, Elephant Kuni!"

Finally here is the code for the introspecting and more defensive setIfMissing:

setIfMissing:{[a;k;f] 
 if[not `simErrors in key a[];
 simT:``simErrors`simProfiling!
 (::;([failed:()]reason:());([simKey:(1#`initialized)]time:1#.z.Z));
 $[neg[11h]~type a;
 a set a[],simT;
 a:a,simT]
 ];
 .[{[a;k;f]
 $[ k in key a;
 @[a;`simErrors`simProfiling;
 {[x;y]y@x};
 ({[k;d]d _ k}[k];{[k;d]d upsert (k;.z.Z)}[k])]; 
 [$[any c:@[0!a[`simErrors];`failed] in `$1_'s where(s:-4!last value f) like "`*";
 'raze "," sv string @[0!a[`simErrors];`failed] where c;
 res:f[a[]]
 ];
 @[a;(`simErrors;`simProfiling;k);
 {[x;y]y@x};
 ({[k;d]d _ k}[k];{[k;d]d upsert (k;.z.Z)}[k];{y;x}[res])]]
 ]};
 (a;k;f);
 {[a;k;e]  @[a;`simErrors;upsert;(k;e)]}[a;k]]};


/#### Short and Sweet Documentation:
How to use:
/Create a list of getter function that take a dictionary as inputs
getA:{[kwargs]5};
getB:{[kwargs]42};
getC:{[kwargs]kwargs[`A]+kwargs[`B]};




/Create a meta function that composes these getters
getABC:{[kwargs]
 getters:()!();
 getters[`A]:getA;
 getters[`B]:getB;
 getters[`C]:getC;
 :kwargs:setIfMissing/[kwargs;key getters;value getters]}

/Apply the meta function to either a dictionary or a name of a dictionary
emptyD:()!()
q)getABC[emptyD]
sigil | ::
simErrors | (+(,`failed)!,())!+(,`reason)!,()
simProfiling| (+(,`simKey)!,`started`A`B`C)!+(,`time)!,2017.09.14T13:44:24.05..
A | 5
B | 42
C | 47
q)getABC[`emptyD]
`emptyD
q)emptyD
sigil | ::
simErrors | (+(,`failed)!,())!+(,`reason)!,()
simProfiling| (+(,`simKey)!,`started`A`B`C)!+(,`time)!,2017.09.14T13:45:02.09..
A | 5
B | 42
C | 47
q)notSoEmptyD:`A`B!(10;14)
q)getABC[notSoEmptyD]
sigil | ::
A | 10
B | 14
simErrors | (+(,`failed)!,())!+(,`reason)!,()
simProfiling| (+(,`simKey)!,`started`A`B`C)!+(,`time)!,2017.09.14T13:47:59.24..
C | 24




/We can look at profiling information
q)emptyD.simProfiling
simKey | time 
-------| -----------------------
started| 2017.09.14T13:45:02.099
A | 2017.09.14T13:45:02.099
B | 2017.09.14T13:45:02.099
C | 2017.09.14T13:45:02.099

/We get automatic error handling and prememptive failiure
/let's add an error into getA
getA:{[kwargs]'`broken};
emptyD:()!();
q)emptyD:getABC[emptyD]
sigil | ::
simErrors | (+(,`failed)!,`A`C)!+(,`reason)!,("broken";,"A")
simProfiling| (+(,`simKey)!,`started`B)!+(,`time)!,2017.09.14T13:51:06.419 20..
B | 42
q)emptyD.simErrors
failed| reason 
------| --------
A | "broken"
C | ,"A" 
/If the value is set the meta function will still work.
q)getABC[notSoEmptyD]
sigil | ::
A | 10
B | 14
simErrors | (+(,`failed)!,())!+(,`reason)!,()
simProfiling| (+(,`simKey)!,`started`A`B`C)!+(,`time)!,2017.09.14T13:53:46.03..
C | 24




Caveats:
 The sim prefix is used in the dictionary