data parallel

I want to calculate the moving window max or min in a data parallel way.
Lets start with the two argument function max

max(x,y) returns the greater of x and y
max(x,x) returns x (idempotent)

max(5,3) return 5

max(max(5,3),max(5,3)) reduces to max(5,5) which just returns 5

If we want max of an entire list we can simply think of it as a reduction in the lisp/APL sense

max(head (x), (max (head (tail (x), max( head(tail(tail(x))), …. )

or in a more readable way, replace max with |, and insert it between every element in the list

x[0] | x[1] | x[2] | x[3] this is a standard reduction/over.

Here is a concrete example:

x:5 4 3 2 7 2 9 1

Max over(x) –> 9

|/x –> 9

We can look at the intermediate results

Max scan(x) –> 5 5 5 5 7 7 9 9

|\x –> 5 5 5 5 7 7 9 9

Now let’s say that we want to look at the 3 slice moving window.

Let’s take advantage of the fact that max(x,x) yields x, idempotent

We can calculate the max between each pair in our list. Read max each-prior (‘:)

(|’:)x –> 5 5 4 3 7 7 9 9

applying this function n-1 times gives us a moving max

so (|’:) (|’:) x is 3 slice moving window, which we can rewrite as
2 |’:/ x

5 5 5 4 7 7 9 9

We can see that |\x is equivalent to (count[x]-1) |’:/ x which is data parallel by construction. In other words, we are doing an adjacent transform.

To make this a bit clearer, we can show intermediate results by using a scan(\) instead of an over(/)

(count[x]-1)|’:\x

5 4 3 2 7 2 9 1
5 5 4 3 7 7 9 9
5 5 5 4 7 7 9 9
5 5 5 5 7 7 9 9
5 5 5 5 7 7 9 9
5 5 5 5 7 7 9 9
5 5 5 5 7 7 9 9
5 5 5 5 7 7 9 9

One thing we can notice is that we could terminate early, since we know that if an adjacent element did not change there is no way for the max value to propagate further.

Which means we can rewrite (count[x]-1)|’:\x to simply (|’:\)x

and indeed this gives us:

(|’:\)x

5 4 3 2 7 2 9 1
5 5 4 3 7 7 9 9
5 5 5 4 7 7 9 9
5 5 5 5 7 7 9 9

Technically this is still slower than maxs(x) but if we had gpu support for each-prior(‘:) we could calculate maxs in n calls to max prior which has a parallelization factor of n/2. Depending on the range of x, the n calls might be bound significantly such that for small ranges, for example you are dealing with a short int (8 bytes), you can effectively compute the maxs in O(c*n/2) time with parallelism. Where c is a function of the effective range of your inputs — max[x]-min[x].

Now let’s get back to the problem of max in a sliding window, like this leetcode problem. This problem is classified as hard and the reason is that leetcode doesn’t want you to use parallelism to solve sliding window, it wants you to do it in O(n) time directly.

We know we can solve the problem in O(n) time when the window is equal to 1,2, and n.
identity({x}), max each prior ({|’:x}), and maxs ({|\x}) respectively.

The key to solving it for all window sizes is to recognize that each entry only depends on itself and its neighbors and that the result is the same if you duplicate neighbors – again idempotentcy comes to the rescue. max(1 2 3) is equal to max(1 1 2 2 3 3). Another good explanation can be found here. This is technically a dynamic programming approach with two pointers, but I have never seen it explained in what I think is the most straightforward way.
Let’s borrow the example from leetcode and work through what should happen.

Suppose list of numbers n=1 4 3 0 5 2 6 7 and window size k=3. Then reusing the prior code to show intermediate steps, we get:

1 4 3 0 5 2 6 7 1 4 4 3 5 5 6 7 1 4 4 4 5 5 6 7

If we look at a particular slot, 0 for example, it gets overtaken by 4, but the 5 slot only needs to compete with 3. In other words, each element is looking at at most n-1 elements to the right and n-1 elements to the left. So we can rearrange n to do this in a natural way. Let’s reshape n, so that there are k columns. q supports ragged arrays, so our last row is not the same length which is nice.

q)(0N;k)#n 1 4 3 0 5 2 6 7

We can then compute the maxs of each row

q)maxs each (0N;k)#n 1 4 4 0 5 5 6 7

And flatten it back:

q)l:raze maxs each (0N;k)#n q)l 1 4 4 0 5 5 6 7

This gives us our left window. Now we will repeat the steps but instead of going from the left we will go from the right.
The simplest way go from the right is to reverse the list apply the function as normal from the left and reverse the list again. In J this operation is called under, as in going under anesthesia, perform the operation and then wake you from the anesthesia. This is equivalent to looking at the cummulative max from the right or walking backward through the array.

q)under:{[f;g](f g f ::)}
q)(reverse maxs reverse)~under[reverse;maxs] /this is the same
1b

q)(reverse maxs reverse ::) each (0N;k)#n 4 4 3 5 5 2 7 7

Now we just need to flatten the list and call it r.

q)r:raze (reverse maxs reverse ::) each (0N;k)#n

Because r is going from the right, we need to shift it by k-1 elements to the right so that we are not using data from the future when looking at what the current max should be. Printing l on top of r, we get:

q)(l;(k-1) xprev r) 1 4 4 0 5 5 6 7 0N 0N 4 4 3 5 5 2

As we can see, a simple max along the columns will give us the correct answer. (Note: the 0N in the beginning correspond to nulls)

q)max (l;(k-1) xprev r) 1 4 4 4 5 5 6 7

This idea can be generalized, so let’s write a generic function that can take an idempotent operation and create a moving window version.

fmx:{[f;g;h;m;x]l:raze (f')w:(0N;m)#x;r:raze (g')w;h[l;(m-1) xprev r]}

fmx takes a function f that will be applied to generate the left window, a function g that will be used to generate the right window and h a function that will combine left and right windows shifting r based on the window.

We can now generate (mmax, mmin, mfill):
fmmax:fmx[maxs;(reverse maxs reverse ::);|] fmmin:fmx[mins;(reverse mins reverse ::);{&[x;x^y]}] /fill the nulls with x fmfill:fmx[fills;(reverse ({y^x}) reverse ::);{y^x}]

We can also generate a function that will allow us to inspect what elements we have access to in the right and left windows so that we can debug/make new functions, with some small modifications.

inspect:fmx[,\;(reverse (,\) reverse ::);{([]l:x;r:y)}]

Here we define f to concatenate the elements seen so far, g is the reverse concatenate, and h is return a table of the l and r.

When we run inspect on our original n, we see that every row has the information from the appropriate sliding window in it, though sometimes more than once.

q)inspect[k;n]
l      r       
--------------
,1     `long$()
1 4    `long$()
1 4 3  1 4 3   
,0     4 3     
0 5    3       
0 5 2  0 5 2   
,6     5 2     
6 7    2

We can see that the combination of left and right windows will always have at least the three elements required.

To summarize, we saw that idempotent functions, like max and min, allow for parallelization and allow us to use the dynamic programming two pointer technique to solve sliding window calculation in O(n).

Below is the python code for sliding maximum window, written in numpy/APL style, I’m not sure it makes the concept clearer, but more people read python than q. numpy doesn’t like ragged arrays, so there is a bit of extra code to handle the cases where count of n is not evenly divisible by k.

def maxSlidingWindow(k,nums)
  cmax=np.maximum.accumulate
  n=len(nums)
  z=np.zeros(k*np.ceil(n/k).astype(int))
  z[:n]=nums
  z=z.reshape(-1,k)
  l=np.resize(cmax(z,1).reshape(z.size),n)
  r=np.resize(np.flip(cmax(np.flip(z,1),1),1).reshape(z.size),n)
  return list(np.max(np.stack([l[k-1:],r[:r.size-(k-1)]]),0).astype(int))

all code can be found here:
https://github.com/pindash/q_misc/blob/master/sliding_window_max.q

	pt.instafollowfast.c… on Backtracking In Q/ word l…
	Alex on The Is-Ought Distinction
	Fred Mleczko on How the Mighty have Fallen, or…
	pindash91 on A Better Way To Load Data into…
	Andrew Stanton (@and… on A Better Way To Load Data into…

iabdb

It's All Been Done Before.

data parallel

Idempotent + Moving Window is simply a reduction