Evolutes

November 19, 2018 / pindash91 / Leave a comment

This next article was inspired by two sources:

An Advent of Code puzzle from 2017

And a playing with J article:

An evolute is a matrix whose cells are numbered to spiral out from the center.

17  16  15  14  13
18   5   4   3  12
19   6   1   2  11
20   7   8   9  10
21  22  23---> ...

The first question asks us to find the Manhattan distance between some positive integer cell and the 1st cell.

I see two ways to do this:

Build the evolute, find the location of the integer and then calculate the distance
Calculate the coordinate with respect to the center directly and take the sum of the absolute value of the coordinates.

I will start with the second method because it is simpler. If we look at the structure of the evolute we will see that each new layer of the matrix will have an odd square in it’s bottom right corner. 1 9 25 49 ….

That means we can locate the layer by finding the nearest odd square root. Here is the code for that:

f:{j:floor sqrt x; j - not j mod 2}
q)f 10
3
q)f 25
5
q)f 24
3

Next we can find the grid coordinate or how far we are from the center for that corner. I plugged in max x so we can use the function on lists of numbers. The ‘?’ verb is being used to find index which corresponds to the steps away from the center.

g:{(1+2*til max x)?f x} 
q)g 10
1
q)g 25
2
q)g 24
1

Now all we need to do is figure out where we are on the layer.

We can be in one of 5 places:

Exactly at the end of a layer
Between the bottom right corner and the top right corner
Between the top right corner and the top left corner
Between the top left corner and the bottom left corner
Between the bottom left corner and the end of the layer

We can divide by the size of the layer to calculate this. We can also find the remainder to know how far between we are. This gives us the following code:

place:{(x-j*j) div 1+j:f x}
offset:{(x-j*j) mod 1+j:f x}
q)place 1+til 25
0 0 1 1 2 2 3 3 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 0
q)offset 1+til 25
0 1 0 1 0 1 0 1 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0

This allows us to write the full function to calculate the coordinate. All we need to do is to check which of those conditions are met fill in the coordinates from g, plus the appropriate offset and correct sign.

coordinate:{[x]
    x:(),x;
    f:{j:floor sqrt x; j - not j mod 2};
    g:{(1+2til max x)?y};     place:{(x-yy) div 1+y};
    offset:{(x-yy) mod 1+y};      j:f x;     brc:x=jj;
     grid:g[x;j];
     p:place[x;j];
     o:offset[x;j];
    ?[brc;flip (grid;neg[grid]);
     ?[0=p;flip (grid+1;neg[grid]+o);
      ?[1=p;flip (grid+1-o;grid+1);
       ?[2=p;flip (neg[grid+1];1+grid-o);
        flip (neg[grid+1]+o;neg[1+grid])]]]]
    }
q)coordinate 1+til 9
0  0 
1  1 
1  1 
0  1 
-1 1 
-1 0 
-1 -1
0  -1
1  -1

At this point we can find the coordinate of any cell number. Since we can do this for all the points, we should be able to create the evolute for any dimension by generating all the coordinates and then shifting them to the appropriate column/row number and inserting them into that matrix. We can do this cleverly by sorting the row and column separately, since we want a particular orientation and sorting each of them corresponds to either a horizontal transposition or a vertical transposition.

evol:{t:`row`col!(x div 2)+flip cordinate j:1+til x*x; (x;x)#exec val from `col xdesc `row xasc flip update val:j from t} 
q)evol 7
37 36 35 34 33 32 31
38 17 16 15 14 13 30
39 18 5  4  3  12 29
40 19 6  1  2  11 28
41 20 7  8  9  10 27
42 21 22 23 24 25 26
43 44 45 46 47 48 49

Now that we can create a evolute and we see that given an evolute we can find the coordinates, we also see that permuting the row and column indexes gives us different orientations. This leads us to see how Joey Tuttle/Eugene McDonnell created an evolute from scratch. Let’s raze an evolute:

q)raze evol 5
17 16 15 14 13 18 5 4 3 12 19 6 1 2 11 20 7 8 9 10 21 22 23 24 25

Now let us permute it by the magnitude and then look at the differences:

q)iasc raze evol 5
12 13 8 7 6 11 16 17 18 19 14 9 4 3 2 1 0 5 10 15 20 21 22 23 24
q)deltas iasc raze evol 5
12 1 -5 -1 -1 5 5 1 1 1 -5 -5 -5 -1 -1 -1 -1 5 5 5 5 1 1 1 1
q)deltas iasc raze evol 3
4 1 -3 -1 -1 3 3 1 1

We now see a pretty clear pattern. We are repeating a pattern of 1,neg x,-1,x.
Each time we are taking 2 elements from the pattern. We take an increasing number of them and continue until we take x elements in shot. Here is Eugene’s explanation translated into q

f:{-1+2*x}
g:{(-1;x;1;neg x)}
k:{(f x)#g x}
h:{1+til x}
j:{-1 _ raze 2#'h x}
l:{-1 rotate raze (j x)#'k x}
m:{iasc sums l x}
n:{(x;x)#m x}
q)f 5
9  
q)g 5
- -1 5 1 -5  
q)k 5
-1 5 1 -5 -1 5 1 -5 -1  
q)h 5
1 2 3 4 5  
q)j 5
1 1 2 2 3 3 4 4 5
q)l 5
-1 -1 5 1 1 -5 -5 -1 -1 -1 5 5 5 1 1 1 1 -5 -5 -5 -5 -1 -1 -1 -1  
q)m 5
24 23 22 21 20 9 8 7 6 19 10 1 0 5 18 11 2 3 4 17 12 13 14 15 16  
q)n 5
24 23 22 21 20
9  8  7  6  19
10 1  0  5  18
11 2  3  4  17
12 13 14 15 16

Combining this into function we get:

evolute:{(x;x)#iasc sums -1 rotate raze (-1 _ raze 2#'1+til x)#'(-1+2*x)#(1;neg x;-1;x)}

We can shorten it just a bit and make it a bit faster by seeing that grabbing parts j,k,l can make use of how kdb overloads ‘where’. In KDB ‘where’ gives you the indexes of the 1s in a boolean mask. For example:
q)where 101b
0 2
However, a little thought reveals that we are returning the index of the element the number at that index. So we return one 0, zero 1,one 2. Generalizing this, we can think that where of 1 2 3 should return one 0,two 1s and three 2s and indeed KDB does this.
q) where 1 2 3
0 1 1 2 2 2
Knowing this, we can rewrite j k and l, which make heavy use of each and remove all the razing.
I am going to show how I built this up.

q){((x-1)#2),1} 5
2 2 2 2 1
q){1+where ((x-1)#2),1} 5 
1 1 2 2 3 3 4 4 5
q){(where 1+(where ((x-1)#2),1))} 5
0 1 2 2 3 3 4 4 4 5 5 5 6 6 6 6 7 7 7 7 8 8 8 8 8
q){(where 1+(where ((x-1)#2),1)) mod 4} 5 /so that we cycle
0 1 2 2 3 3 0 0 0 1 1 1 2 2 2 2 3 3 3 3 0 0 0 0 0
q){(1;neg x;-1;x)(where 1+(where ((x-1)#2),1)) mod 4} 5 
1 -5 -1 -1 5 5 1 1 1 -5 -5 -5 -1 -1 -1 -1 5 5 5 5 1 1 1 1 1
q){-1 rotate (1;neg x;-1;x)(where 1+(where ((x-1)#2),1)) mod 4} 5
1 1 -5 -1 -1 5 5 1 1 1 -5 -5 -5 -1 -1 -1 -1 5 5 5 5 1 1 1 1

The last step recreates l. So the whole function using where looks like this:

evolute:{(x;x)#iasc sums -1 rotate (1;neg x;-1;x)(where 1+(where ((x-1)#2),1)) mod 4}

Part 2 of the advent of code is even more interesting. It describes us filling the spiral by summing the neighbors, the center square starts at 1. Here is the description from advent.

Then, in the same allocation order as shown above, they store the sum of the values in all adjacent squares, including diagonals.
So, the first few squares' values are chosen as follows:
Square 1 starts with the value 1.
Square 2 has only one adjacent filled square (with value 1), so it also stores 1.
Square 3 has both of the above squares as neighbors and stores the sum of their values, 2.
Square 4 has all three of the aforementioned squares as neighbors and stores the sum of their values, 4.
Square 5 only has the first and fourth squares as neighbors, so it gets the value 5.
Once a square is written, its value does not change. Therefore, the first few squares would receive the following values:

147  142  133  122   59
304    5    4    2   57
330   10    1    1   54
351   11   23   25   26
362  747  806--->   ...


What is the first value written that is larger than your puzzle input?

Now given our earlier coordinate function, we just need to add elements in into empty matrix of 0s and sum the neighbors at each step to determine what to add. First, let’s write the simple pieces of finding the neighbors and summing them. Then we will conquer the composition.

/To find the neighbors according to our rule, 
/ we basically find the coordinates for (1..9)
/ then we can add to x pair so it's centered around that one 
neighbors:{x+/:coordinate 1+til 9}
q)neighbors (1;0)
0  1
1  1
1  2
0  2
-1 2
-1 1
-1 0
0  0
1  0
/We need to shift our coordinate system depending on the size of the matrix 
shift:{i:div[count x;2]; (i+y)}
q)shift[3 3#0;coordinate 5]
0 2
/calculate the sum of the neighbors 
sumN:{sum over .[x] each neighbors y}
q)(1+evolute 5)
17 16 15 14 13
18 5  4  3  12
19 6  1  2  11
20 7  8  9  10
21 22 23 24 25
q)(1+evolute 5)[2;3]
2
/Manually sum the neghbors of 2
q)sum 4 3 12 1 2 11 8 9 10
60
q)sumN[1+evolute 5;(2;3)]
60

Okay, we are going to start with a matrix that has single 1 in the center.

center1:{.[(x;x)#0;(a;a:x div 2);:;1]} /we are ammending a 1 at the x div 2
q)center1  5
0 0 0 0 0
0 0 0 0 0
0 0 1 0 0
0 0 0 0 0
0 0 0 0 0

Now that we see how easy it is to amend an element at a particular index, we can combine this with the notion of a fold(over). The idea is that we will keep updating this matrix with new elements based on the neighbor sum:

cumEvolute:{{.[x;y;:;sumN[x;y]]}/[j;shift[j:center1[x]] coordinate 1+til (x*x)]}
q)cumEvolute 5
362 351 330 304 147
747 11  10  5   142
806 23  1   4   133
880 25  1   2   122
931 26  54  57  59 
/If we want to rotate it we can simply reverse flip
q)reverse flip cumEvolute 5
147 142 133 122 59 
304 5   4   2   57 
330 10  1   1   54 
351 11  23  25  26 
362 747 806 880 931

Using cumEvolute we can find when we generate the first integer bigger than the input, by creating a stopping condition and counting the number of iterations.

stopCum:{{$[z<max over x;x;.[x;y;:;sumN[x;y]]]}/[j;shift[j:center1[x]] coordinate 1+til (x*x);y]}
q)stopCum[5;20]
0 0  0  0 0
0 11 10 5 0
0 23 1  4 0
0 0  1  2 0
0 0  0  0 0
q)max over stopCum[5;20]
23
q)max over stopCum[10;100]
122

Controlling the Tower

November 4, 2018November 4, 2018 / pindash91 / Leave a comment

I recently came across this problem and it looked interesting enough, so I solved it. It’s from Topcoder, which is nice because they provide test cases so you can be sure that you have solved it.

Unfortunately, Topcoder has dracanion copyright laws, so I will paraphrase the problem in case the link to the description dies. You want to control some territory, the territory has t (an integer) towers and each tower is a certain number of stories s (an integer). To control the territory you must control the majority of the active towers. A tower is active if someone controls the majority of stories in a tower. If a tower has no majority controller, it is inactive and does not count. What is the minimum number of stories you must win to guarantee control of the territory?

The first thing that helped me was converting the test cases into the following two lists and created a table:

inputs:((47);(9;9);(1;1;1;1;1;1);(2;2;2;2;2;2);(1;2;3;4;5;6;7;8;9);(1;2;3;4;5;6;7;8;9;10);(1;1;100;100;100;100;200;200);(2);(999;999;999;999;999;999;999;999;999;999;999;999;999;999;999;999;999;999;999;999;999;999;999;999;999;999;999;999;999;999;999;999;999;999;999;999;999;999;999;999;999;999;999;999;999;999;999;999;999;999);(2;3;9);(1);(3);(4);(5);(6);(1;1);(1;2);(2;2);(1;3);(1;4);(2;3);(2;4);(3;3);(100;100);(1;1;1);(1;2;1);(2;2;1);(2;2;2);(10;100;900);(9;99;999);(2;8;4);(2;2;2;2);(2;2;2;2;4);(784;451;424;422;224;478;509;380;174;797;340;513;842;233;527;342;654;816;637);(342;574;20;606;413;479;51;630;802;257;16;925;938;951;764;393;766;414;764;208;959;222;85;526;104;848;53;638;969;214;806;318;461;950;768;200;972;746;171;463);(877;587;587;325;528;5;500;714;548;688;243;5;508;783;561;659;297;44;70;131;329;621;374;268;3;113;782;295;902);(159;797;323;629;647;941;75;226);(528;259;717;920;633;957;712;966;216;812;362;146;472;462;140;743;672;512;232;16;186;721;171;831;641;446;183;483;198;55;610;489;884;396;344;499;368;618;812;211;771;449;289);(608;405;630;751;288;836))
results:(24;14;4;7;36;45;699;2;37451;12;1;2;3;3;4;2;3;3;4;5;4;5;5;150;2;3;4;4;955;1053;11;5;8;7806;18330;10838;3404;18074;2866)
t:([]inputs;results)

q)select results, inputs from asc t / just to give a visual sample

results inputs
———————————–
1 1
2 2
2 3
3 4
3 5
4 6
24 47
2 1 1
2 1 1 1
4 1 1 1 1 1 1
699 1 1 100 100 100 100 200 200

That way, I could write a function and start testing and seeing if I was making progress.

If there is only one tower, then I obviously need to take it, which means I need to take the majority.

So that is pretty easy:

f{[x] if[count[x]=1; :sum 1+floor x%2]; :0} /one tower,
/ otherwise return 0 this is wrong but we want to return a number

I can run my function and see which cases I still haven’t solved:

q) select myResult, results, inputs from (update myResult:f each inputs from t ) where myResult<>results

This gave me a way to check if I was making progress.

Next, we know that we need to win the majority of the towers and we need to win them in the most expensive way possible in order to guarantee a victory because we don’t get to chose how our stories our distributed. So we sort the towers desc by height and fill up half of them. Then comes the tricky part, the opposition can have a simple majority in all the other towers. In that case, there will be one final tower that decides everything.

For example, suppose there are 5 towers, 6 6 5 5 4. We need to win three towers. So we win the first 2 by getting 12 stories. Then we only need one more tower but the opposition can strategically win the shortest towers. It’s important to note that winning a 4 story tower and a 5 story tower takes 3 either way. So they can do better by winning the two 5 story towers and simply stop me from winning the 4 story tower by taking 2 stories.

This explains the following slightly convoluted algorithm:

f:{
if[count[x]=1;:sum 1+floor x%2]; /one tower
t:floor count[x]%2; / majority of towers, one less if there are an odd number
v:sum t#x:desc x; /most expensive way to win t towers
r:reverse (t-count[x])#x; /remaining towers
o:sum 1+floor %[;2] t#r; /opposition needs to take at least t towers
o+:sum floor 0.5+%[;2] d _ r; /oppositions needs to prevent the rest
o-:(any(d _ r)mod 2)*(any not(d#r)mod 2) ;
/subtract if opposition wins during preventing
/ and you can strategically exchange an odd tower with an even one
v+1+sum[r]-o
};

If anyone has a simpler, more straightforward approach, or wants to tell me how my approach is wrong, I would love to hear about it.

Commutative Business Days

October 19, 2018 / pindash91 / Leave a comment

The concept of business days is pretty straightforward. For most people, business days are the 5 days a week we commute to work.

Questions like:

What is the next business day?
What was the previous business day?
List the next 5 days.

are all pretty straightforward and I expect that the average high school graduate can answer it with the aid of a calendar (assuming it includes holidays).

For some reason, the implementations I’ve seen have been pretty hideous. So here is, in my opinion, an elegant solution that also demonstrates some neat features of KDB/Q.

The first step is to get all the business days in the year. (Depending on your region/industry you might have different days. I’ll use USA business days M-F with 10 government holidays (see US court business days).

/1 calendar year
year:2018.01.01+til 365
holidays:2018.01.01 2018.01.15 2018.02.19 2018.05.28 2018.07.04 2018.09.03 2018.10.08 2018.11.12 2018.11.22 2018.12.25
yearNoWeekend:year where in[;2 3 4 5 6] year mod 7
buisnessYear:yearNoWeekend except holidays
/
Once you have the buisnessYear for the relevant period
—
we need to be aware that the calendar will not work correctly for data we have not loaded
—
We can create a data structure that will allow us to answer all these types of questions.
The data structure we will be using is a sorted dictionary.
We will have a dictionary, nDay (next day) and a pDay function (previous day).
We can look up any date in our dictionary/function to find the next or previous date.
We use a sorted dictionary so that when we look up non buisness days,
q will return the value of the closest next/previous date.
\
/create our dictionary and function,
/fill the last day in the next day list with the first day from 2019
nDay:`s#buisnessYear!`s#2019.01.02^next buisnessYear
pDay:{[nDay;d]p:nDay?d; ?[null p;?[nDay] nDay d;p]}[nDay]

/let’s find the next day July 3rd
nDay 2018.07.03
/as expected returns 2018.07.05
nDay 2018.07.04
/as expected also returns 2018.07.05

/let’s find the previous day
pDay 2018.07.05
/as expected returns 2018.07.03
pDay 2018.07.04
/as expected returns 2018.07.03

/let’s find 5th day after a given date
nDay/[5;2018.07.01]
/as expected we get 2018.07.09
/lets find 5th day previous to a given date
pDay/[5;2018.07.01]
/as expected we get 2018.06.25

/We can ask to get the next 5 days
nDay\[5;2018.07.01]
/we get the given day and next 5 days
/2018.07.01 2018.07.02 2018.07.03 2018.07.05 2018.07.06 2018.07.09
/We can ask to get the previous 5 days
pDay\[5;2018.07.01]
/we get the given day and previous 5 days in reverse order
/2018.07.01 2018.06.29 2018.06.28 2018.06.27 2018.06.26 2018.06.25

/We can ask for the next days for 5 dates
nDay 2018.01.14 2018.02.18 2018.05.27 2018.09.02 2018.10.07
/ we get 2018.01.16 2018.02.20 2018.05.29 2018.09.04 2018.10.09

/We can ask for the previous days for 5 dates
pDay 2018.01.16 2018.02.20 2018.05.29 2018.09.04 2018.10.09
/we get 2018.01.12 2018.02.16 2018.05.25 2018.08.31 2018.10.05

Bonus Section, and the real reason for the title of the essay:

Here is a final piece that might strike you as a bit weird until you untangle it.

In General:

0b=(pDay nDay d)~(nDay pDay d)

In other words, these functions don’t commute. This seems strange since nDay is simply addition by one and pDay is simply subtraction by one.

This was used by the mathematician Peano to show how you can get all the natural numbers. Simply start with 1 and keep adding 1. If you want whole numbers start with 0. If you want integers create the concept of subtraction and you can count your way to negative infinity.

This is called Peano arithmetic and it commutes. You can always get back to where you started by simply adding or subtracting one appropriately. The order in which you do so does not matter.

However, business days exhibit a certain property that prevents them from commuting. Namely, although any calendar day has a next business day, not all days are business days. Which means that non-business days are not represented intuitively. For example, the next business day of Saturday is Monday and previous business day of Monday is Friday, but the previous business day of Saturday is Friday and so the next business day of Friday is Monday. Depending on whether we started with pDay or nDay we get different results.

The missing weekend from the business days illustrates the reason why, in general, floating arithmetic does not commute. There are numbers that cannot be represented by floating point arithmetic which means that sometimes the order of additions and subtractions will make a difference to the final result.

Backtracking In Q/ word ladder

July 15, 2018July 16, 2018 / pindash91 / 1 Comment

This is a variation on the word ladder puzzle. Here is the puzzle:

You are presented with a set of strings (assume they are unique). Each string can be of arbitrary length. Your goal is to find the length of the longest chain.
A chain can start from any string. Each following link in the chain must be exactly one letter shorter than the previous link, any character from any position from the previous link can be dropped, this new link must appear in the set of words given.

For example, if you are given the set of strings (“a”,”ab”,”abc”,”abdc”, “babdc”,”dd”, “ded”)

Both
“ded” -> “dd”
“babdc” -> “abdc” -> “abc” -> “ab” -> “a”

are valid chains. The longest chain’s length is 5.

First I will present python code that my friend wrote, it is very concise and has a kdb flavor (at least in my mind).

Python Code:

def word_list(array):
  # Storing the solutions for each word in a dictionary
  dict_array = {word:0 for word in array}

  # Sorting the word list by lengths
  array.sort(key = lambda s: len(s))

  # Finding solutions to each word by using the previous solutions
  for word in array:
    options = [0]
    for i in range(len(word)):
      word_minus_i = "".join([word[k] for k in range(len(word)) if k != i])
      #check if the word appears in list of words
      if word_minus_i in dict_array:
        # If it does add a link to that word
        options.append(dict_array[word_minus_i]+1)
    dict_array[word] = max(options)
  return max(dict_array.values())+1

Now to a KDB version:

I had two versions of the code both rely on the same insight as the python code

The difference is in the post processing.

The key insight into this problem is that if you hash all the strings. Then if you create every possible one letter drop you can check if that string is in the legal set of strings, if it is you can record which longer string it maps to.

At that point you have a list of every connection between two strings. Since checking the hash takes constant time and generating every 1 letter missing string takes 1- number of letters in the string* number of unique strings. This processing takes linear time.

At that point, we have a graph, and at first I wanted to reuse the code from the previous post. I would generate all the connected components, the use those nodes to figure out how many different length strings there were. The connected component with the largest number of distinct length strings is going to be the longest chain and the number of distinct lengths is the total chain length. (this code is [3-6]x faster for small inputs (10,000, 100,000 words) than the python but is about the same speed once we get up to a set of 1 million words)

The second version, uses a quicker technique to just calculate the depth of the graph from every node.

Let’s start from the first version:
Let’s generate 10,000 strings (.Q.a is “abcdefghijklmnopqrstuvwxyz”)
We want strings of length between 1 and 20 (1+til 20)
We want ten thousand strings, so we take 10000 of the list 1 2 3 .. 20
We then sample the number from the alphabet (3?.Q.a) gives us a 3 letter string
We do this for each right (\:) number in the list
We only want the distinct words

q)words:distinct words:(10000#1+til 20)?\:.Q.a

Here we are sampling 10 of these words:

q)10?words
“uqoakdinpzdsw”
“vzahb”
“hhotqkijpkyso”
“adbtny”
“iotrggqgxvkqmnxqaxp”
“rcgnpgotlzastbv”
“idptfuivvtz”

So we need a function that will generate the indexes of all the possible 1 letter drops of a word.
The most intuitive way I know to do this, is to create equal length boolean string with one 0 and rotate that 0 around the whole string. So for example in the 3 letter word case we have:
011b
101b
110b
We then find the indexes where there is a 1 and that would gives all the 1 missing indexes.

/0b,(x-1)#1b creates a string of length x with 1 0 value
/rotate\: says to rotate this string for each right argument (which is just the range from 0 to x)
/where finds the indexes that are 1
allDrops:{where each (til x) rotate\: 0b,(x-1)#1b}

Lets do a little processing on the list of strings, and put them into a table:

/sort the words by count of the number of letters and only store the distinct words in hwords

q)hwords:distinct words iasc count each words

/create a table, with the distinct words, the number of letters in the word, and a symbolic version of the word for fast lookup, with a unique attribute so that KDB knows to hash this list

q)show t:([]wsym:`u#`$hwords;w:hwords;c:count each hwords);
wsym w c
———–
s ,”s” 1
t ,”t” 1
j ,”j” 1
n ,”n” 1

/wsym contains a symbolic version of the word
/w is just the original word
/c is the length of the word

We then want to populate a dictionary with all the indexes, so we select the distinct counts of the words and run them through the allDrops function

q)drops:n!allDrops each n:exec distinct c from t /we select the distinct counts run the function and key the distinct counts to the result from the function.
q)drops
1 | ,`long$()
2 | (,1;,0)
3 | (1 2;0 1;0 2)
4 | (1 2 3;0 1 2;0 1 3;0 2 3)
5 | (1 2 3 4;0 1 2 3;0 1 2 4;0 1 3 4;0 2 3 4)
…

We do this mostly because we know that we will need these indexes many times, but there are relatively few distinct lengths of words.

Now we calculate all the links:

/drops c will give us a list of lists of all the 1 letter missing possible indexes for each word.
/To show what this does let’s select by c so that we can see what it does for each c
select d:drops first c by c from t
c | d ..
–| ————————————————————————-..
1 | ,`long$() ..
2 | (,1;,0) ..
3 | (1 2;0 1;0 2) ..
4 | (1 2 3;0 1 2;0 1 3;0 2 3)

Because q will apply a particular list on indexes and preserve the shape, we just need to tell it, that each word should be projected onto all of these lists. This is done with the @’ this will pair off each list of lists with the words. Here is what that looks like , again grouping by c so that we can see what that looks like:

q)select by c from update d:(w)@’drops c from t
c | wsym w d ..
–| ————————————————————————-..
1 | x ,”x” ,”” ..
2 | cw “cw” (,”w”;,”c”) ..
3 | fvw “fvw” (“vw”;”fv”;”fw”) ..
4 | yoku “yoku” (“oku”;”yok”;”you”;”yku”)

We see that one letter words, become empty strings, two letter words become a list of words each 1 letter long. three letter strings become a list of 2 letter words.
Now we just need to find where these words are in our original table. If they are not there they will get the 1+last index, which is how q lets you know that a value is missing in a lookup. We will also cast all these strings to symbols so that we can do faster lookups (`$):

/col is the index of the original string and row are all the matches. We are modeling the graph using an adjacency matrix idea.
q)show sparseRes:update col:i, row:t[`wsym]?`$(w@’drops c) from t
wsym w c col row
———————-
h ,”h” 1 0 864593
r ,”r” 1 1 864593
f ,”f” 1 2 864593
w ,”w” 1 3 864593
/We see that no 1 letter string has any matches which is why all of the indexes are last in the table, this makes sense since one letter strings can’t match anything but the empty string and all of our strings have at least one character
/Again selecting by c so we can see a variety of lengths,
q)select by c from update col:i, row:t[`wsym]?`$(w@’drops c) from t
c | wsym w col row ..
–| ————————————————————————-..
1 | x ,”x” 25 ,864593 ..
2 | cw “cw” 701 3 11 ..
3 | fvw “fvw” 17269 541 275 357 ..
4 | yoku “yoku” 64705 864593 7279 2368 3524 ..
5 | vvbud “vvbud” 114598 864593 46501 864593

As we can see the row column is a list of matches and also contains false matches that go to the end of the table we would like to clean that up. We can do that using ungroup.
Ungroup will take a keyed table, and generate a row for each item in the list of records.
In our case it will create a col, row record for each item in the row column.

q)show sparseRes:ungroup `col xkey sparseRes
col wsym w c row
——————-
0 h h 1 864593
1 r r 1 864593
2 f f 1 864593

Now we want to delete all of the fake links, so that will be wherever the row is the length of the original table:

q) show sparseRes:delete from sparseRes where row=count t
col wsym w c row
—————-
26 bj b 2 2
26 bj j 2 10
27 oj o 2 2
27 oj j 2 12

At this point we are done processing the data. We have found all the connections between the words, all that is left is to find the connected components, which we know how to do from the previous post. I use a slightly different function that calculates the connected components on a sparse matrix. That is represented as a table with columns row and col.

findConnectedSparse:{[j;m]
neighbors: exec col from m where row in j; /now we are searching a table instead of a matrix
f:{n:exec col from y where row in .[_;x]; /new neighbors
x[0]:count x[1];x[1]:distinct x[1],n; /update the two pieces of x
x}[;m]; /project this function on the sparseMatrix
last f over (0; neighbors)};

allComponentsSparse:{[m]
points:til max m[`row]; /
connected:();
while[count points;
i:first points;
connected,:enlist n:`s#n:distinct asc i,findConnectedSparse[i;m]; /add point in case it is island
points:points except n];
connected}

Given these two functions that work on sparse adjacency matrixes. The final step is:

q)max {count select distinct c from x} each t allComponentsSparse sparseRes

Let’s unpack this a bit:

sparseRes is our table of links.

allComponentsSparse will return all of the connected components in this graph.

`s#0 28 42 56 61 94 117 133 149 157 170 175 181 187 195 220 241 260 288 323 3..
`s#1 62 64 77 79 98 115 126 131 144 154 162 166 173 189 191 208 214 258 267 2..
`s#2 29 40 66 91 108 112 114 125 135 154 167 168 184 211 226 245 267 281 291 ..
`s#3 69 77 94 102 109 110 119 129 159 183 192 196 212 214 216 220 224 247 259..
…
apply the table t to this will give us a table that is indexed on the connected components

+`wsym`w`c!(`q`aq`oq`qw`wq`jq`tq`yq`qv`nq`xq`qa`zq`qn`qc`qj`qf`qd`bq`qb`lq`dq..
+`wsym`w`c!(`b`fb`bg`jb`zb`hb`cb`ba`bc`be`bm`vb`kb`xb`bp`bu`by`bj`ub`mb`pb`bn..
+`wsym`w`c!(`m`fm`mz`mh`mx`dm`xm`md`me`mc`bm`nm`mm`cm`pm`mi`ml`mb`ms`rm`jm`ym..
+`wsym`w`c!(`j`jj`jb`jq`jt`jg`jd`ji`ej`jy`uj`jz`gj`jv`bj`rj`qj`je`jn`xj`lj`hj..

So for each connected component we are getting a table that has only those strings that match.
let’s look at the first one:

q)first t allComponentsSparse sparseRes
wsym w c
———–
q ,”q” 1
aq “aq” 2
oq “oq” 2
qw “qw” 2
wq “wq” 2
jq “jq” 2

We can see immediately a problem, we only need one 2 letter word but we are getting every possible match. Since we are only interested in the length we can simply count the number of distinct c.

q)count select distinct c from first t allComponentsSparse sparseRes
4

So we see that the first connected component has 4 links in the chain.

Doing this for each connected component and then selecting the max ensures that we have found the length of the longest chain.

q)max {count select distinct c from x} each t allComponentsSparse sparseRes
5

That completes the first method.

Here is the whole function together

longestChain:{[words]
hwords:distinct words iasc count each words;
t:([]wsym:`u#`$hwords;w:hwords;c:count each hwords);
allDrops:{where each (til x) rotate\: 0b,(x-1)#1b};
drops:n!allDrops'[n:1+til max t[`c]];
sparseRes:ungroup `col xkey update col:i, row:t[`wsym]?`$(w@’drops c) from t;
sparseRes:delete from sparseRes where row=count t;
max {count select distinct c from x} each t allComponentsSparse sparseRes}

findConnectedSparse:{[j;m]
neighbors: exec col from m where row in j;
f:{n:exec col from y where row in .[_;x]; /new neighbors
x[0]:count x[1];x[1]:distinct x[1],n; /update the two pieces of x
x}[;m]; /project this function on the sparse matrix
last f over (0; neighbors)};

allComponentsSparse:{[m]
points:til max m[`row];
connected:();
while[count points;
i:first points;
connected,:enlist n:`s#n:distinct asc i,findConnectedSparse[i;m]; /add point in case it is island
points:points except n];
connected}

Now for method 2, instead of relying on the previous algorithm to find all the connected components, we just want to keep track of how deep each chain is.

We can do this by creating a new table q which only has columns, row and col and is grouped by row. Since we created the original links by going from columns to rows, we know that the links travel up from shorter strings(row) to longer ones(col)

q)show q:select col by row from sparseRes
row| col ..
—| ————————————————————————..
0 | 28 42 56 61 94 117 133 149 157 170 175 181 187 195 220 241 260 288 323 3..
1 | 62 64 77 79 98 115 126 131 144 154 162 166 173 189 191 208 214 258 267 2..
2 | 29 40 66 91 108 112 114 125 135 154 167 168 168 184 211 226 245 267 281 ..
3 | 69 69 77 94 102 109 110 119 129 159 183 192 196 212 214 216 220 224 247 ..
4 | 38 48 60 74 86 92 95 115 123 131 132 135 145 184 195 197 213 238 246 249..
5 | 43 49 119 124 137 142 143 180 219 226 231 239 246 277 289 303 308 319 32..
6 | 28 31 72 73 88 100 104 126 141 145 161 175 190 198 235 240 248 253 289 3..

Now since this table is keyed on the row column, we can index into the table using a index table that has the row numbers and keep doing this until there are no more rows to return. The number of times we can index before we return nothing, is the length of the chain from that row.

First lets demonstrate indexing:

q)indexTable:([]row:0 1)
row
—
0
1
q)q ([]row:0 1)
col
——————————————————————————-
28 42 56 61 94 117 133 149 157 170 175 181 187 195 220 241 260 288 323 342 36..
62 64 77 79 98 115 126 131 144 154 162 166 173 189 191 208 214 258 267 272 27..

We get the two rows that match from q.

If we flatten this list and rename this column row. We can reindex into q,

q)q select row:raze col from q ([]row:0 1)
col
——————-
452 549 634
,493
391 391
587 786 803
395 493 759
524 629
704 834
613 823

We see the next level of neighbors, we can keep doing this. Again we can take advantage of over and just count how long before we return an empty table.

We will do this by creating a function that takes a dictionary and updates it, that way we can store both the intermediate results and the number of times we went down and all the nodes we visited.

the keys will be cur, which is the list of nodes we wish to visit next, depth, which is how deep we gone so far, and visited a list of all the nodes we saw. Then we can write a function dive which will dive one level down into the table.

dive:{ $[count x[`cur]:select row:raze col from y x[`cur];
/index the table on the current neighbors we are exploring.
/We set cur to be the next level of neighbors
/if there are any new neighbors:
/we will add 1 to depth, and add the current new neighbors to visited and return x
[x[`depth]+:1;x[`visited],:exec row from x[`cur];
:x]
/otherwise we will return x
;:x]}[;q]

We then create a table of all the nodes we would like to visit with the initialized keys:

q)show n:update visited:count[n]#() from n:([]depth:0;cur:exec row from q);
depth cur visited
—————–
0 0
0 1
0 2
0 3
0 4

lets’ run dive on one the first of n:

q)first n
depth | 0
cur | 0
visited| ()
q)dive over first n
depth | 4
cur | +(,`row)!,()
visited| 28 42 56 61 94 117 133 149 157 170 175 181 187 195 220 241 260 288 3..

We see that we get back a dictionary of all the nodes visited as well as, the maximum depth achieved. If we do this for all n, we can select the max depth and we are done.

q)exec max depth from (dive/)each n}
5

Notice, we are storing all the visited nodes, so we can reconstruct what the path actually was, but we are not actually using it. So the entire second version looks like this:

longestChainFast:{[words]
hwords:distinct words iasc count each words;
t:([]wsym:`u#`$hwords;w:hwords;c:count each hwords);
allDrops:{where each (til x) rotate\: 0b,(x-1)#1b};
drops:n!allDrops'[n:1+til max t[`c]];
sparseRes:ungroup `col xkey update col:i, row:t[`wsym]?`$(w@’drops c) from t;
sparseRes:delete from sparseRes where row=count t;
q:select col by row from sparseRes;
dive:{ $[count x[`cur]:select row:raze col from y x[`cur];
[x[`depth]+:1;x[`visited],:exec row from x[`cur];:x]
;:x]}[;q];
n:update visited:count[n]#() from n:([]depth:0;cur:exec row from q);
exec max depth from (dive/)each n}

On my computer, I was able to find the length of the longest chain for a list of million words in under 15 seconds, using this second version and in a minute using the first version. If I was using less than one hundred thousand words, then the two versions were about the same.

The python code took about a minute for a million words, (without using anything fancy, I’m sure it can be improved by replacing pieces with numpy components or using the latest version of python 3.7. where dictionary look ups are faster).

The real benefit here from KDB, is that reconstructing the path is super straightforward since we have the path indexes and can use them to index into the word table.

Graph Algo, Finding Friends

July 10, 2018July 12, 2018 / pindash91 / 1 Comment

This is a rather standard interview question:

Suppose that you have points and the points are connected by edges. Imagine that the points are numbered, then we can represent whether there exists a connection between two points by writing all the pairs of points.

(1,2)
(2,3)
(5,1)
(4,6)
We can also represent this graph, as matrix, where 1 means that there exists a connection between i and j and 0 otherwise. We call this matrix the adjacency matrix since it tells us which points are connected. adjacencymatrix_1002

Given such a matrix, we want to find all the connected components. A connected component is either just one node if it has no connections or all the nodes that can be reached by traveling along the edges.

In the picture above, each of these graphs has only one connected component, but together there are three distinct connected components, since there is no way to reach from the first graph to the second by traveling along an edge.

First I will show my original solution in KDB. Then a slight improvement.
First note that the matrix must be symmetric, since if point 1 is connected to point 2, then point 2 is connected to point 1.

I wanted to write a function that would get all the points that were connected to a single point.

If I think of a particular row in this matrix, then all the 1s represent the neighbors of this point. If I then find all of their neighbors, and so on, I will have all the points that can possibly be reached.

First imagine I have a matrix, a:
To create it, I first create 100 zeros (100#0)
Then I pick six places to add connections, (6?100)
I replace the zeros with 1s.
I add this matrix to itself flipped, so that it is symmetric (a+flip a)
Divide by two so that any place that 1s overlapped are either 1 or .5, (.5*)
Then I cast it to an integer so that they are all 1s or 0s (`long$)

q)a:`long$.5*a+flip a:10 10#@[100#0;6?100;:;1]
q)a
0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 1 1 0 0 0
0 0 0 0 0 0 0 0 1 0
1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0
0 1 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0
0 0 0 0 1 0 0 0 0 0
0 0 1 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 1 0

Okay given a, I can find the neighbors of any of the 10 points.

q)a[1]
0 0 0 0 0 1 1 0 0 0
q)where a[1]
5 6

So we get a list of points that are neighbors.

We need to create a function that will keep asking for neighbors and discard any duplicates.

We can find all the neighbors of the neighbors by selecting the neighbor rows of matrix and then asking where for each row:

q)where each a 5 6
1
1

We get a list of neighbors.

We can then raze this list to flatten it:

q)raze where each a 5 6
1 1

Then we can ask for the distinct items:

q)distinct raze where each a 5 6
,1

Now we would like to add this list to our original neighbors and check if we can find more neighbors:

q)distinct raze where each a 5 6 1
1 5 6

Running this again we get:

q)distinct raze where each 1 5 6
5 6 1

So there are no more neighbors to find, just the order changes and we will flip back and forth.

Anytime, we need to apply a function like this we can make use of the adverb in q called over. Over will apply the function until the results are the same for two successive iterations.

So my function to find all connected with comments looks like:

findAllConnected:{[i;m]
neighbors: where m[i]; /find the initial list of neighbors
f:{distinct raze x,where each y x}[;m]; /create the function where second argument is fixed to be the adjacency matrix.
f over neighbors} /run the function over the initial neighbors and keep adding neighbors until there are no more to add.

/testing on a:
q)findAllConnected[1;a]
5 6 1

As expected this gives us all points connected to point 1, including point 1 itself.
To list all of the connected components we need to run this on each of the points:

q)findAllConnected[;a] each til count a
3 0
5 6 1
8 2 9
0 3
7 4
1 5 6
1 5 6
4 7
2 9 8
8 2 9

We get 10 rows specifying the connections. Now we need find the unique set.

The easiest way to do this is to sort each one and ask for the distinct lists:

q)distinct asc each findAllConnected[;a] each til count a
`s#0 3
`s#1 5 6
`s#2 8 9
`s#4 7

Which gives us 4 connected components.

We can look for the islands, by checking if there are any elements not on any of these lists. If so, they must unconnected to anything else.

Items not in the list are islands (only one node no connections)

q)connected:distinct asc each findAllConnected[;a] each til count a
q)islands:(til count a) except raze connected
`long$()

q)count each connected /size of the components

We get an empty list, which means there are no islands to report.

Problem solved.

For those who have followed, you can see that we are actually doing a lot of duplicate work, since we keep asking for neighbors for points that we have already seen. To solve this we simply keep track of how many neighbors we have found so far and only query for the neighbors we have yet to see.

Instead of x being a list of neighbors, it is now a list with two items. The first is the number of neighbors found so far and the second is the list of neighbors.

We can use dot(.) application of drop(_) to remove the neighbors we have seen so far from the neighbors list. (.[_;x]):

findAllConnectedFast:{[i;m]

neighbors: where m[i];

f:{n:raze where each y .[_;x]; /new neighbors

x[0]:count x[1];x[1]:distinct x[1],n; /update the two pieces of x

x}[;m]; /project this function on the matrix

last f over (0; neighbors)} /apply the function on the neighbors, with 0 currently visited

/running on a:

q)findAllConnectedFast[1;a]
5 6 1

/large sparse matrix to test with

q)c:7h$.5*c+flip c:1000 1000#@[1000000#0;1000?1000000;:;1]

/Time both on my 2012 macbook
q)\t distinct asc each findAllConnected[;c] each til count c
11960
q)\t distinct asc each findAllConnectedFast[;c] each til count c
1170

A 10x improvement, the keen observer will notice that we don’t have to run the function on all the points in the first place. We can go down the points and skip all the points that we have already seen. Here is what that looks like:

allComponents:{[m]

points:til count m;

connected:();

while[count points;

i:first points;

connected,:enlist n:`s#n:distinct asc i,findAllConnectedFast[i;m]; /add point in case it is island

points:points except n];

connected}

q)allComponents a
`s#0 3
`s#1 5 6
`s#2 8 9
`s#4 7

q)\t allComponents c
6

This results in a 1000x increase speed up. Because we are only calling the function the minimum number of times.

Stable Sorting

June 1, 2018July 11, 2018 / pindash91 / Leave a comment

Stable Sorting preserves the order of a column even after sorting on a second column.

KDB provides stable sorting by default. This is a great thing, and it allows you to easily implement grouped attributes on a table.

A table that looks like:

q)t:([]time:09:30:00+`time$til 9;sym:9#`AAPL`MSFT`IBM;px:100*9?1.0)
q)t
time sym px
————————–
09:30:00.000 AAPL 81.12026
09:30:00.001 MSFT 20.86614
09:30:00.002 IBM 99.07116
09:30:00.003 AAPL 57.94801
09:30:00.004 MSFT 90.29713
09:30:00.005 IBM 20.11578
09:30:00.006 AAPL 6.832366
09:30:00.007 MSFT 59.89167
09:30:00.008 IBM 4.881728

which is sorted by time and sort it by symbol so that all the symbols that are the same are together, while still preserving the time attribute.

q)`sym xasc t
time sym px
————————–
09:30:00.000 AAPL 81.12026
09:30:00.003 AAPL 57.94801
09:30:00.006 AAPL 6.832366
09:30:00.002 IBM 99.07116
09:30:00.005 IBM 20.11578
09:30:00.008 IBM 4.881728
09:30:00.001 MSFT 20.86614
09:30:00.004 MSFT 90.29713
09:30:00.007 MSFT 59.89167

While this seems pretty cool, I want to argue that this behavior is not only good, it is also simpler than the alternative.

In many languages, perl,java,python there is something called a comparator interface. Which usually asks the user to define a function

F(a,b) => if a comes before b;
1
if a comes after b:
-1
if they are equal
0

The search then uses some algo to swap the items.

However, we can simplify this interface. We can call it swapporator, it only allows two options, swap, True or False. The items are presented in the order of the original list to the swap function. This will automatically be stable.

As of Join Performance a surprising result

April 26, 2018 / pindash91 / Leave a comment

The Kx wiki describes in detail how to structure as of join queries for best performance.

https://code.kx.com/wiki/Reference/aj

3. There is no need to select on quote, i.e. irrespective of the number of quote records, use:
aj[`sym`time;select .. from trade where ..;quote]
instead of
aj[`sym`time;select .. from trade where ..;
             select .. from quote where ..]

The reason for this is that since the quote table is partitioned on date and grouped on sym the as of join function simply scans the sectors of the disk in linear order and grabs the first matches using binary search.

Linear access to disk is very different from random access. So when you perform a search linearly on partitioned historical database it runs pretty fast.

However, someone asked me if you were going to query over and over, the same data whether then it made sense to do some pre-filtering on the quote table before doing multiple ajs.

At the time, I said it shouldn’t be faster. I thought that the overhead of reading from disk afresh each time was going to be much smaller than the overhead of allocating a very large amount of room in memory.

So I tested it:

taj1:{[t;d] now:.z.T;
do[10;aj[`sym`time;select from t where date=d;select from quote where date=d]]; after:.z.T;
after-now}

taj2:{[t;d]
now:.z.T;
quotecache:select from quote where date=d, sym in exec sym from t; do[10;aj[`sym`time;select from t where date=d;quotecache]];
after:.z.T;
after-now}

The timings, were not even close the taj2, which precaches data when run 10 times, for a trade table with only 1000 records on only 100 symbols took more than 5 minutes to run. While taj1 was taking only 5 seconds, that is approximately 2 aj per second.

When I scaled the number of symbols to 1000, the cached version didn’t comeback and I had to cancel the query after 25 minutes. The non cached version took 10 seconds or 1 aj per second.

The intuition is correct, if you can prefilter, then not having to read in all the data twice should be faster. So I checked the meta of the cached table, it was loosing the p attribute while filtering.

I then created version taj3:

taj3:{[t;d]
now:.z.T;
quotecache:update `p#sym from select from quote where date=d, sym in exec sym from t;
do[10;aj[`sym`time;select from t where date=d;quotecache]];
after:.z.T;
after-now}

If you reapply the p attribute the first query for 1000 symbols costs you 3.6 seconds, but increasing the number of times you run the aj is almost free. So running 10 times only costs 4 seconds. Running 20 times the taj3 also took 4 seconds. So if you know you have a limited universe than reducing and pre-filtering is worth it if you will be running the queries again and again, JUST REMEMBER TO REAPPLY THE P ATTRIBUTE. Since the quote table is only filtered, the p attribute will be a really cheap operation, since order is preserved.

Q Idioms assemembered

March 1, 2018October 11, 2024 / pindash91 / Leave a comment

This is more of an expanding list of Q idioms I have had to either assemble or remember or some combination.

Cross product of two lists is faster in table form

q)show x:til 3

0 1 2

q)show y:2#7

7 7

q)x cross y

0 7

0 7

1 7

1 7

2 7

2 7

q) ([] x) cross ([] y)

x y

—

0 7

0 7

1 7

1 7

2 7

2 7

\t til[1000] cross 1000#5

85

\t flip value flip ([] til 1000) cross ([] x1:1000#5)

33

/and if you are happy working with it in table form

\t ([] til 1000) cross ([] x1:1000#5)

5
Take last N observations for a column in a tableBoth of these can be used to create a list of indexes and the columns can be simply projected on to the list of indexes

/a) Intuitive way take last n from c

q) N:10; C:til 100000;

q) {[n;c]c{y-x}[til n] each til count c}[N;C] /timing 31 milli

0

1 0

2 1 0

3 2 1 0

4 3 2 1 0

….

/b) Faster way using xprev and flip

q) \t {[n;c] flip (1+til n) xprev \\: c}[10;c] /timing 9 millesec
Create a Polynomial Function from the coefficients

poly:{[x] (‘)[wsum[x;];xexp/:[;til count x]]};

f:poly 0 1 2 3;

f til 5

0 6 34 102 228
Count Non Null entries

All in K:

fastest:{(#x)-+/^x}

slower:{+/~^x}

slowest:{#*=^x}

Q translation:

fastest:{count[x]-sum null x}

slower:{sum not null x}

slowest:{count first group null x}
Camel case char separated symbols:

camelCase:{[r;c]`$ssr[;r;””] each @'[h;i;:;]upper h @’ i:1+ss'[;r] h:string x}
most common case
q)c:` sv/: `a`b cross `e`fff`ggk cross `r`f
`a.e.r`a.e.f`a.fff.r`a.fff.f`a.ggk.r`a.ggk.f`b.e.r`b.e.f`b.fff.r`b.fff.f`b.ggk.r`b.ggk.f
camelCase[“.”;c]
`aER`aEF`aFffR`aFffF`aGgkR`aGgkF`bER`bEF`bFffR`bFffF`bGgkR`bGgkF

6. Progress style bars:

p:0;do[10; p+:1; system “sleep .1”;1 “\r “,p#”#”];-1 “”;

{[p]1 “\r “,(a#”#”),((75-a:7h$75*p%100)#” “),string[p],”%”;if[p=100;-1 “”;]}

7. AutoCorr:

autocorr:{x%first x:x{(y#x)$neg[y]#x}/:c-til c:count x-:avg x}

8. Index of distinct elements:

idistinct:{first each group x}

idistinct{x?distinct x}

9. index and transpose a matrix

{sv[x;::] y}

Tree Tables/Parent Vector representation of Dictionaries

October 3, 2017October 10, 2017 / pindash91 / Leave a comment

Introduction:

http://archive.vector.org.uk/art10500340

Vector, the Journal of the British APL Association

TreeTables and the parent vector are magical things and they are a great way to flatten deeply nested structures. In particular, I will use this representation of a dictionary in the next post to allow you to import q libraries under a different namespace, kind of like what python allows with “import x as y”.

In this post I will focus on how we can easily rewrite a dictionary into a flat table with 4 columns. Let me motivate this a bit first. Take a deeply nested dictionary:

a| `b`c!(2;`d`e!(6;`f`g!7 8))
b| `c`d!4 5
c| `d`e!(6;`f`g!7 8)

In json that might look like this:

{  
  "a":{  
    "b":2,
    "c":{  
      "d":6,
      "e":{  
        "f":7,
        "g":8
      }
    }
  },
  "b":{  
    "c":4,
    "d":5
  },
  "c":{  
    "d":6,
    "e":{  
      "f":7,
      "g":8
    }
  }
}

As we can see each key is a letter and each value is either another dictionary or a number. In q we can apply functions into these arbitrarily nested dictionaries provided all the values conform. So for example we can add 10+d

a| `b`c!(12;`d`e!(16;`f`g!17 18))
b| `c`d!14 15
c| `d`e!(16;`f`g!17 18)

However, once your dictionaries stop conforming things can get a bit hairy. As a simple example let’s add a simple dictionary f!f to the key f in d

a| `b`c!(2;`d`e!(6;`f`g!7 8))
b| `c`d!4 5
c| `d`e!(6;`f`g!7 8)
f| (,`f)!,`f

Suddenly 10+d throws a type error. Because we can’t add 10 to the non numeric dictionary. We can always right logic to avoid those cases, but it would be simpler to simply pull out all the conforming values add 10 to them and then replace them in the original structure. A tabular tree representation aids in this. As an example the same dictionary in tabular form: (I have omitted some results in the middle for ease of reading.) Which I will call treeTable:

l p c d
———
0 0 :: 1
1 0 `a 1
1 0 `b 1
1 0 `c 1
1 0 `f 1
2 1 `b 1
2 1 `c 1
…
4 15 8 0
5 21 7 0
5 22 8 0

The column c (child) contains every key and value, column d tells you if the row is leaf node or has a dictionary below it. Pulling out all the numeric children is as simple as:

numeric:6 7 8 9h
select c from treeTable where not d, (abs type each c) in numeric
c
-
2
4
5
6
6
7
8
7
8

This table can then be modified and assuming we can convert it back into its dictionary form we could then work with nested data easily.

How the Magic is done:

At this point you either believe this structure is useful or you believe it isn’t but you are still interested in understanding how this transformation happens.

Let’s look at the original treeTable and unpack the meaning of the columns.

The first column l represents the level of the dictionary that the key or value is located in.
The second column p is the index of the parent row in this table. The root of the table is self parenting meaning that the 0th row is at the 0th level and it’s parent is 0 (keep this in mind it will be useful in a second). All elements will have a parent.
The third column c is the child and it is the value at this level of nesting. It will either be a key if there are more levels below or it will just be the value at that level.
The fourth and final column d indicates whether or not this row is a dictionary. That is, whether the child should be treated as a key or a value.

To convert a dictionary into a treeTable we use breadth first search, that is the purpose of the l column. We first define a primitive treeTable with only one row the root.

l p  c  d
---------
0 0  :: 1

All treeTables will have this row. If the thing we are trying to convert into a treeTable is not a dictionary but is instead SOMEKIND_OF_THING_THAT_IS_NOT_A_DICTIONARY. Then there will be only one more row in this table.

l p  c  d
---------
0 0  :: 1
1 0 SOMEKIND_OF_THING_THAT_IS_NOT_A_DICTIONARY 0

We will then know we are done because there are no more dictionaries to unpack at the last level.

If instead we get a dictionary. We will first record all the keys at that first level and return the table.

Our ability to find a value at a particular level is enabled through the use of the parent column in the treeTable. Suppose we have a simple dictionary:

{  
 "d":6,
 "e":{  
   "f":7,
   "g":8
  }
}

We can record this as the following two columns:

p c index
----------
0 :: 0 
0 `d 1 
0 `e 2 
2 `f 3 
2 `g 4 
1  6 5 
3  7 6 
4  8 7

The first row is the root it is self parenting. The next two rows are both top level keys. So their parent is the root. f and g both are under e so their parent is 2, but 6 is under d so it’s parent is 1. To make this easier to see, I have added the virtual index column which is always available. Then the final two rows are under f and g respectively.

If we want the parent of particular row, and we have the parent column, which I will call p:0 0 0 2 2 1 3 4

We can index into that row to get the parent. p 6 -> 3 which is f

We can repeat this until we get to the root. p 3 -> 2; p 2-> 0 ; p 0 -> 0. To find the root in one step we can use KDB’s built-in converge function which will apply until two consecutive results are the same. This is why it was so convenient for the root to be self-parenting. So see the path to the root use scan instead of over.

p over 6 -> 0

p scan 6 -> 3 2 0. Now that we have a path, we just need to get the keys that correspond to that path, this is done by indexing the path against the child column.

c 3 2 0 -> f e ::

We now can get the unique path to any element.

The next step is using the path to index into any level of the dictionary. This is accomplished with a special object called getItems.

getItems is defined by combining the indexing at depth verb with a function that reverses the path list and checks if the path list happens to be only the root. In which case, we simply return the original item.

Using just those two ideas, we are able to construct the treeTable. The algorithm is to index one level at a time each time recording the if the level contains dictionaries or not. If it contains no dictionaries we are done and we will return the same table twice in row, which means that our function will converge. Using breadth first search we avoid any stackoverflow issues that could happen with a recursive solution, instead the function becomes tail recursive, meaning all the necessary ingredients to call the function again are returned as the output. That is why on the first call the function returns the first row of a treeTable. That way each call after simply indexes deeper into the original dictionary to return more levels of the treeTable.

The Code and An Example:

toTreeTable:{[d]
 getItems:('[;] over (.[d;];{$[x~(enlist[::]);x;1_reverse x]}));
 tTT:{[getItems;t] 
 $[98h~type t;;t:([]l:(1#0);p:0;c:(::);d:1b)];
 lev:last t[`l];
 k:exec i from t where l=lev,d;
 $[count k;;:t];
 paths:t[`c] (t[`p]\')k;
 items:getItems each paths;
 id:where bd:99h=type each items;
 p:raze (count each key each items[id])#'k[id];
 c:raze key each items[id];
 df:count[p]#1b;
 lvl:count[p]#lev+1;
 id:where not bd;
 p:p,k[id];
 c:c,items[id];
 df:df,count[id]#0b;
 lvl:lvl,count[id]#lev+1;
 t upsert flip `l`p`c`d!(lvl;p;c;df)}[getItems];
 tTT over ()}



/an example of a nested structure 


b:`c`d!4 5
e:`f`g!7 8
c:`d`e!(6;e)
a:`b`c!(2; c)
d:`a`b`c!(a;b;c)
q)toTreeTable d
l p c d
---------
0 0 :: 1
1 0 `a 1
1 0 `b 1
1 0 `c 1
2 1 `b 1
2 1 `c 1
2 2 `c 1
2 2 `d 1
2 3 `d 1
2 3 `e 1
3 5 `d 1
3 5 `e 1
3 9 `f 1
3 9 `g 1
3 4 2 0
3 6 4 0
3 7 5 0
3 8 6 0
4 11 `f 1
4 11 `g 1
4 10 6 0
4 12 7 0
4 13 8 0
5 18 7 0
5 19 8 0

And Back Again!

Now that we covered how to get a treeTable we can also understand how to go back to a dictionary.

We apply the opposite approach. The core function returns a dictionary. Each time we return a dictionary that is slightly deeper than the previous time. We put placeholder empty dictionaries until we build the final result. Since we know whether each row is a key or a value, we know whether the current item requires a placeholder.

toDictFromTreeTable:{[tt]
 tD:{[tt;dSoFar;lev]
 dS:exec {x!count[x]#enlist[()!()]}[c] by p from tt where l=lev, d;
 dS:dS,.[!; value exec p,c from tt where l=lev, not d];
 pR:tt[`c](-1_|:) each (tt[`p]\')[key dS];
 pC:tt[`c]key dS;
 paths:raze each {(1_x;enlist[y])}'[pR;pC];
 $[lev>1;.[;;:;]/[dSoFar;paths;value dS];first value dS]}[tt];
 tD/[()!();1+til last tt[`l]]}

Wow This is Even More General Than We Thought:

When I first built this, I tried to make sure that I covered simple dictionaries and values. So I was curious what would happen to keyed tables. Keyed tables are special in that they are essentially dictionaries whose key and values are both dictionaries. Since a dictionary is a pair of lists and a list of dictionaries is a table. A keyed table is simply a dictionary whose key is a table and and whose value is a table. A trivial example to illustrate this point:

q)k:([]k:til 5)
k
-
0
1
2
3
4
q)v:([]v:10*til 5)
v 
--
0 
10
20
30
40
q)kv:k!v
k| v 
-| --
0| 0 
1| 10
2| 20
3| 30
4| 40
/Indexing against the key table returns the value table
q)kv[k]
v 
--
0 
10
20
30
40
/but we can also apply select only certain rows using the k column
/in this case I reverse the key table and take the first 2 rows.
q)kv[2#reverse k]
v 
--
40
30

Now what happens if we turn a key table into a treeTable:

l p c d
---------------
0 0 :: 1
1 0 (,`k)!,0 1
1 0 (,`k)!,1 1
1 0 (,`k)!,2 1
1 0 (,`k)!,3 1
1 0 (,`k)!,4 1
2 1 `v 1
2 2 `v 1
2 3 `v 1
2 4 `v 1
2 5 `v 1
3 6 0 0
3 7 10 0
3 8 20 0
3 9 30 0
3 10 40 0

It converts the key part of the table into key dictionaries that are the parents of the value dictionaries in the table. And we can turn it back:

q)toDictFromTreeTable toTreeTable kv
k| v 
-| --
0| 0 
1| 10
2| 20
3| 30
4| 40
q)kv ~toDictFromTreeTable toTreeTable kv
1b

In other words, keyedTables are treated like dictionaries, this means that if you only want to look at values, you will only see values, simply by select from the treeTable where not d. The internal dictionaries inside a keyed table are broken apart into their component dictionaries and the values are stored independently.

Tables Get Treated as singletons.

Since tables are actually lists of dictionaries, and lists are treated as values. A table is also treated as a value and placed directly into the child column.

q)t:([] til 10)
x
-
0
1
2
3
4
5
6
7
8
9
q)toTreeTable t
l p c d
---------------------------------
0 0 :: 1
1 0 +(,`x)!,0 1 2 3 4 5 6 7 8 9 0

The function from TreeTable correctly undoes the toTreeTable function
but the treeTable form is actually more nested than the original table.

q)toDictFromTreeTable toTreeTable t
x
-
0
1
2
3
4
5
6
7
8
9

We can fix this by expanding our parent vector to notate whether a current element is a dictionary, list or an atom. That way we would create a node that is the head of every list and then iterate through the indexes in the list. This is left as an exercise, or until I need this functionality.

Another Dependency Management Scheme

September 17, 2017October 1, 2017 / pindash91 / Leave a comment

This post is based on an accumulator pattern I encountered when converting recursive functions to be tail recursive.

For a long time I have been looking for a way to cleanly handle errors that arise in functions. It’s well known among programmers that only about 20% of code does any real work, the rest is there to take care of IO and error conditions. That’s one of the afterthoughts that Joe Morasco has in How to learn a new programming language.

It takes me a few hours to recall how to make the linked list
work and get something on the air. To allow someone else to play the
game unsupervised (which means doing some error handling) usually
takes me a few days of programming.

KDBQ lets you omit a lot of ceremony and boilerplate so you can get to work faster. Often a function that does something useful, ie real work, is a few lines. However, you still need to do the error handling somewhere and it’s best if we can keep our functions doing mostly work. If there are specific error conditions that need to be handled then by all means handle it, otherwise assume that the input you get is what you expect. (I know this sounds crazy to all those paranoid defensive programmers, but frankly you might like where this post goes.)

The only ceremony that I insist on is that every function you write takes a dictionary as the input. I find this to be pretty liberating in general as you can easily add functionality over time without breaking an API and it allows for the function to implement sensible default values while allowing customization.

What I eventually settled on was something that would allow me to create dependent functions by ordering them.

I think of them as “getters.” Each function “gets” a value using a dictionary of configurations.

A list of getters can be easily seen on a page and read. If you know what each getter is doing then you can keep reading, otherwise you can go look at the definition.

I then would create a getter dictionary, each key in the getter dictionary will eventually be assigned a value. The value is either the value that was already in the input or the value of applying the current getter to the dictionary.

To make this a bit clearer, suppose that I want to create a function that gives my pet animal a name.

I have the following dependencies:

What pet animal am I getting? Turtle, Elephant, Dog, Snake

getAnimalType:{[args] first 1?`Turtle`Elephant`Dog`Snake}

Is the animal a boy or a girl? (obviously turtles and elephants are more likely to be girls)

getBoyorGirl:{[args] 
   d:`Turtle`Elephant`Dog`Snake!0.7 0.7 0.5 0.5;
    $[first d[args[`AnimalType]]>1?1.0;`girl;`boy]}

Choose a name based on those two criteria.

getName:{[args] $[`boy~args[`BoyorGirl];`Kunee;`Kuni]} 
/(we like the name IPA transcription kʊniː)

So here is the basic idea: We combine these three separate verbs into a larger verb:

getPetAnimalName:{[args] 
  getters:()!();
  getters[`AnimalType]:getAnimalType;
  getters[`BoyorGirl]:getBoyorGirl;
  getters[`Name]:getName;
  :setIfMissing /[args;key getters; value getters]}

We can read this function by looking at what keys are being set in the getters dictionary. The keys tell you what this function plans on using or setting; the values tell you how the function will set them if the key is missing in the dictionary you pass in.

So for example if you don’t know what you want you can simply pass an empty dictionary to getPetAnimalName which will give you back something like:

q)getPetAnimalName[()!()]
AnimalType| Turtle
BoyorGirl | girl
Name | Kuni

Now suppose you are set on a particular pet animal you can pass a dictionary that has the animal preset.

q)presetAnimalDict:enlist[`AnimalType]!enlist[`Elephant]
q)getPetAnimalName[presetAnimalDict]
AnimalType| Elephant
BoyorGirl | girl
Name | Kuni

So far we have simply seen that this construct of setIfMissing allows us to push in default arguments and have default verbs that use our preset arguments. Now we can also show that we also get basic error handling for free.

Suppose that we break the function that chooses the animal type. We would expect that the function will not return boy or a girl if we pass nothing, but should work as usual if we do pass it an animal type.

/break function getAnimalType
q)getAnimalType:{[args] break;first 1?`Turtle`Elephant`Dog`Snake}
q)getPetAnimalName[()!()] /we get back as much as was possible 
Name | Kuni
q)getPetAnimalName[presetAnimalDict] /works as before
AnimalType| Elephant
BoyorGirl | girl
Name | Kuni

Mechanics of how the verb setIfMissing operates:

The premise of this verb, as its name suggests, is that it sets the key in the dictionary that was passed in if that key was missing. If that key exists, it does nothing. To set a key we call the function that was passed in the dictionary we have so far. This allows functions to depend on values from previous functions or from arguments that are passed in. If a function throws an error, it catches the error and tries to set the next key that is given to it.

I have created essentially two different versions of this verb. They follow two different philosophies.

Simple Version: with primitive error handling and logging

The first version is the most simple and just logs what keys it is setting and what keys it failed to set. Here it is, pretty short:

log:{neg[2] (string .z.Z)," : ",x,"\t",y;}
setIfMissing:{[a;k;f]  log["INFO";"setting key: ",string k];
  .[{[a;k;f]$[ k in key a;a; @[a;k;:;f[a]]]};
    (a;k;f); 
     {[a;k;e] log["ERROR";
   "Fail to set key: ",(string k)," Error: ",e]];a}[a;k]]};

Here I have fully elucidated every step:

/first we create a logging utility
.log:{
 neg[2] /this is the handle to standard error 
 (string .z.Z) /we convert the time into a string
 ," : ",x,"\t",y; / and append the type of error to the message
 };
/so for example 
/.log["INFO";"HELLO WORLD"] 
/prints 2017.09.17T09:06:14.309 : INFO HELLO WORLD

/now we define setIfMissing
setIfMissing:{[a;k;f] /a is the dictionary,k is the key, f is the function
 .log["INFO";"setting key: ",string k]; /log that we will be setting k
 .[ /protected execution will trap any errors
 {[a;k;f] /a,k,f are the same as the beginning
 $[ k in key a; /check if k is in the dictionary
 a; /if it is return a
 @[ /general apply, allow you to modify contents of a container
 a; /a is our container or dictionary in this case
 k; /k is the key or index in the container we will modify
 :; /ammend (set) is the function we will use 
 f[a] /is the value we will set, which is the verb f applied to a
 / this is what allows f to depend on any set items in the dictionary
 ] /this ends general apply 
 ] /this ends the conditional of whether k was in a
 }; / this ends the first function in the protected execution
 (a;k;f); /These are the arguments to the protected execution
 {[a;k;e] /a is still the dictionary, k is the key and e is the error
 .log["ERROR";"Fail to set key: ",(string k)," Error: ",e]; /log error
 a} /return the dictionary, and ends the trap function
 / this allows us to chain functions even if there is an error
 [a;k] /pass in the value of a and k, 
 /otherwise the error handler only sees the error
 ] /ends the protected execution
 }; /ends setIfMissing

Fancier version with introspection and profiling:

The second version of setIfMissing is likely to please those defensive programmers that I mentioned earlier. I will not go through and elucidate this entire function. Instead I will point out the salient differences in this significantly more involved version of setIfMissing.

The first difference is that there no longer is text logging. Instead the dictionary that you pass in gets updated with all of the information about what happened. There are two tables that get added simErrors and simProfiling. So for example here is what our examples from before look like:

q)getPetAnimalName[()!()]
            | ::
simErrors   | (+(,`failed)!,`AnimalType`BoyorGirl`Name)!+(,`reason)!,("break"..
simProfiling| (+(,`simKey)!,,`initialized)!+(,`time)!,,2017.09.17T09:53:24.471
q)getPetAnimalName[presetAnimalDict]
AnimalType | `Elephant
 | ::
simErrors | (+(,`failed)!,())!+(,`reason)!,()
simProfiling | (+(,`simKey)!,`initialized`AnimalType`BoyorGirl`Name`WelcomeM..
BoyorGirl | `girl
Name | `Kuni

We can look at those tables

q)getPetAnimalName[()!()][`simErrors]
failed    | reason 
----------| ------------
AnimalType| "break" 
BoyorGirl | "AnimalType"
Name      | "BoyorGirl" 
q)getPetAnimalName[()!()][`simProfiling]
simKey     | time 
-----------| -----------------------
initialized| 2017.09.17T09:55:59.166
q)getPetAnimalName[presetAnimalDict][`simErrors]
failed| reason
------| ------
q)getPetAnimalName[presetAnimalDict][`simProfiling]
simKey     | time 
-----------| -----------------------
initialized| 2017.09.17T10:09:04.062
AnimalType | 2017.09.17T10:09:04.062
BoyorGirl  | 2017.09.17T10:09:04.062
Name       | 2017.09.17T10:09:04.062

As we can see, we now get all of the logging information from the object itself. This is very powerful because it allows us to do queries against this data later to either improve performance by using the profiling table or to see where the errors are coming from.

The second feature takes advantage of the simErrors table. Here is where we become super defensive, instead of naively trying to run every function against every key, we can preemptively fail if we know a function depends on an earlier key that failed. This is the reason, we now no longer return a name if boyOrGirl failed. Since getName explicitly depends on boyOrGirl being set, so we can premtively bail and not even bother running the query. Therefore, you can see that reason given will either be the error message from running the function or the keys that were required. To demonstrate this a bit further, let’s add another function to our getPetAnimalName a welcome message:

getWelcomeMessage:{[args] 
  "Welcome Home, ", string args[`AnimalType]," ", args[`Name]}
getPetAnimalName:{[args] 
 getters:()!();
 getters[`AnimalType]:getAnimalType;
 getters[`BoyorGirl]:getBoyorGirl;
 getters[`Name]:getName;
 getters[`WelcomeMessage]:getWelcomeMessage;
 :setIfMissing /[args;key getters; value getters]}

Now if we run getPetAnimalName:

q)getPetAnimalName[()!()][`simErrors]
failed        | reason 
--------------| -----------------
AnimalType    | "break" 
BoyorGirl     | "AnimalType" 
Name          | "BoyorGirl" 
WelcomeMessage| "AnimalType,Name"

As we can see all keys that were required were mentioned as the reason for the failure. In particular WelcomeMessage failed because both AnimalType and Name were missing. We can do this by inspecting the function before calling it, if any of the keys in the failed column of simErrors table appear in the function, we report them and bail.

Finally, the last feature of setIfMissing is that we can pass the dictionary by name or by value. So far in all the examples we have passed the dictionary by value and gotten a new value back with some keys set. However, if we call setIfMissing with a name of a dictionary it will return the name and modify the dictionary in place.

q)presetAnimalDict
AnimalType| Elephant
q)getPetAnimalName[`presetAnimalDict]
`presetAnimalDict
q)presetAnimalDict
AnimalType    | `Elephant
              | ::
simErrors     | (+(,`failed)!,())!+(,`reason)!,()
simProfiling  | (+(,`simKey)!,`initialized`AnimalType`BoyorGirl`Name`WelcomeM..
BoyorGirl     | `girl
Name          | `Kuni
WelcomeMessage| "Welcome Home, Elephant Kuni!"

Finally here is the code for the introspecting and more defensive setIfMissing:

setIfMissing:{[a;k;f] 
 if[not `simErrors in key a[];
 simT:``simErrors`simProfiling!
 (::;([failed:()]reason:());([simKey:(1#`initialized)]time:1#.z.Z));
 $[neg[11h]~type a;
 a set a[],simT;
 a:a,simT]
 ];
 .[{[a;k;f]
 $[ k in key a;
 @[a;`simErrors`simProfiling;
 {[x;y]y@x};
 ({[k;d]d _ k}[k];{[k;d]d upsert (k;.z.Z)}[k])]; 
 [$[any c:@[0!a[`simErrors];`failed] in `$1_'s where(s:-4!last value f) like "`*";
 'raze "," sv string @[0!a[`simErrors];`failed] where c;
 res:f[a[]]
 ];
 @[a;(`simErrors;`simProfiling;k);
 {[x;y]y@x};
 ({[k;d]d _ k}[k];{[k;d]d upsert (k;.z.Z)}[k];{y;x}[res])]]
 ]};
 (a;k;f);
 {[a;k;e]  @[a;`simErrors;upsert;(k;e)]}[a;k]]};

/#### Short and Sweet Documentation:
How to use:
/Create a list of getter function that take a dictionary as inputs
getA:{[kwargs]5};
getB:{[kwargs]42};
getC:{[kwargs]kwargs[`A]+kwargs[`B]};




/Create a meta function that composes these getters
getABC:{[kwargs]
 getters:()!();
 getters[`A]:getA;
 getters[`B]:getB;
 getters[`C]:getC;
 :kwargs:setIfMissing/[kwargs;key getters;value getters]}

/Apply the meta function to either a dictionary or a name of a dictionary
emptyD:()!()
q)getABC[emptyD]
sigil | ::
simErrors | (+(,`failed)!,())!+(,`reason)!,()
simProfiling| (+(,`simKey)!,`started`A`B`C)!+(,`time)!,2017.09.14T13:44:24.05..
A | 5
B | 42
C | 47
q)getABC[`emptyD]
`emptyD
q)emptyD
sigil | ::
simErrors | (+(,`failed)!,())!+(,`reason)!,()
simProfiling| (+(,`simKey)!,`started`A`B`C)!+(,`time)!,2017.09.14T13:45:02.09..
A | 5
B | 42
C | 47
q)notSoEmptyD:`A`B!(10;14)
q)getABC[notSoEmptyD]
sigil | ::
A | 10
B | 14
simErrors | (+(,`failed)!,())!+(,`reason)!,()
simProfiling| (+(,`simKey)!,`started`A`B`C)!+(,`time)!,2017.09.14T13:47:59.24..
C | 24




/We can look at profiling information
q)emptyD.simProfiling
simKey | time 
-------| -----------------------
started| 2017.09.14T13:45:02.099
A | 2017.09.14T13:45:02.099
B | 2017.09.14T13:45:02.099
C | 2017.09.14T13:45:02.099

/We get automatic error handling and prememptive failiure
/let's add an error into getA
getA:{[kwargs]'`broken};
emptyD:()!();
q)emptyD:getABC[emptyD]
sigil | ::
simErrors | (+(,`failed)!,`A`C)!+(,`reason)!,("broken";,"A")
simProfiling| (+(,`simKey)!,`started`B)!+(,`time)!,2017.09.14T13:51:06.419 20..
B | 42
q)emptyD.simErrors
failed| reason 
------| --------
A | "broken"
C | ,"A" 
/If the value is set the meta function will still work.
q)getABC[notSoEmptyD]
sigil | ::
A | 10
B | 14
simErrors | (+(,`failed)!,())!+(,`reason)!,()
simProfiling| (+(,`simKey)!,`started`A`B`C)!+(,`time)!,2017.09.14T13:53:46.03..
C | 24




Caveats:
 The sim prefix is used in the dictionary

	pt.instafollowfast.c… on Backtracking In Q/ word l…
	Alex on The Is-Ought Distinction
	Fred Mleczko on How the Mighty have Fallen, or…
	pindash91 on A Better Way To Load Data into…
	Andrew Stanton (@and… on A Better Way To Load Data into…

iabdb

It's All Been Done Before.

Author: pindash91