Dosage Search [Part 2]

So since publishing the first post on this topic. I have researched this question.

First I need to acknowledge, that my CS friends have all pointed out the similarity between this question and the two egg puzzle.

You are given two eggs and placed in a 100 floor building. You need to find the highest floor where the egg could survive a drop.

If you only have one egg, then you obviously just go up one floor at a time. Eventually the egg breaks or you reach floor 100. In first case, the floor below is the answer. In the second,  floor 100, since there are no higher floors.

If you have 10 eggs. You can just do binary search and find the right floor very quickly.

Since you have two eggs you can do better than one egg. The post I linked to earlier has a solution that tries to ensure the same number of drops as the worst case of binary search. Another approach is to drop the first egg on 2^n floor. So the first time would be from floor 2 then from floor 4, then from floor 8 and that way quickly climb up the building. This is similar to how network connections back off when a service is not responding. Retrying less frequently as the number of retries goes up.  (This allows the service to restart without falling over from the bombardment of ever more frequent requests)

As similar as this egg approach is, it fails to capture something that I think is essential to the dosage question. That is that, there is a difference between making a mistake of dropping the egg the fifty floors too high and making a mistake by one floor.  — Even if during the search you can’t tell, as that would be an obvious way to find the result in two tries.

So, I ran the code from the previous section and got some result for lists that had up to 1000 elements. However, python started to take too long for larger lists.

This prompted me to rewrite the functions into q. q and it’s sibling k, is an obscure language except to those in the financial services industry. It was created by a genius, Arthur Whitney, whose interview speaks for itself. The language itself is a derivative of APL and Lisp with a few more good ideas thrown in for good measure.  (More on this language in a different post)

Anyway the same algorithm as before which in python took 30+ hours ran in 20 minutes in q.

The relative speed of the k code. Allowed me to see a pattern for the divisor. As the list grows longer, the divisor that we use to split the list should get larger, albeit slowly.

Data (More at the bottom)
Number of elements in list divisor Average cost
50 5 7.78
100 6 11.48
500 13 27.128
1000 17 38.85
4000 31 78.87175
5000 35 88.3382
10000 49 125.4369

 

Since the divisor wasn’t growing for lists that were similar in size, I realized the mistake I made by keeping the divisor constant. Obviously as the list gets shorter we need to update the divisor that we are using to divide the list, since the problem has been reduced to searching a smaller list.

 

This led me to create another type of binary search with an updating divisor. The divisor grows proportional to log(n)^3.

This also increased the speed of the search since I was no longer doing linear search on any list smaller than the divisor I was testing. To explain: if you have a static divisor, then when you start you have a list of 1 million elements. So you divide by 200 and you know if the item you are searching for is in the first range [0:5,000) or in the second range (5000: 1,000,000]. However, gradually the list gets smaller, but you keep dividing the list in this very uneven way, so that when the number of elements is less than 200, you keep looking at the first element. This is equivalent to linear search.

If instead we start to divide the list by smaller divisors, then we can get closer to binary search and since much of the list is eliminated our chances of making a huge mistake are also smaller.

Here is the q code with the two versions of search: (this might be unreadable to most people but I plan on having a tutorial series on k, q, kdb soon)

dif_cost: {:x-y};
trivial_cost: {x+y; :1};
scalar_cost: {x+y; :100};
/binary search with cost function
st: {[l;item;cf;d] cost:0; while [((count l) > 1); m: (count l) div d; $[l[m]<item; [cost+:1; l: (m+1)_l]; l[m]>item; [cost +: (cf[l[m]; item]); l:m#l]; :cost+:1];]; cost};
/calculates average cost for a particular divisor and n, by searching for each element 
/then avg all those costs
st_avg: {[n; d] i: til n; res: @[i; i; st[i; ; dif_cost; d]]; avg res };
 
/iterates over divisors only going from 2 to 10 *( floor (log (n)))
/example for 100 it goes from 2 3 4 … 37 38 39 
/ this covers the divisors that are relevant and minimizes the cost and removes unneeded computations
st_div: {[n] d: $[n<50; 2_til n; 2_til 10 * floor log n]; res: st_avg[n] each d; d[first where res = min res],min res}

 

/Algorithm with updating divisor

s_auto: {[l;item;cf, f] cost:0; while [((count l) > 1); d: max (2, floor (log (count l) xexp f) ); m: (count l) div d; $[l[m]<item; [cost+:1; l: (m+1)_l]; l[m]>item; [cost +: (cf[l[m]; item]); l:m#l]; :cost+:1];]; cost};

/Then we can minimize over f, which is the factor that we exponentiate the log N.

 

 

Data continued:

num d cost
50 5 7.78
100 6 11.48
150 7 14.32667
200 8 16.755
250 9 18.768
300 10 20.72667
350 11 22.49143
400 11 24.15
450 12 25.66
500 13 27.128
550 13 28.51455
600 13 29.85833
650 14 31.12
700 14 32.34143
750 14 33.50267
800 15 34.62875
850 15 35.75529
900 15 36.84667
950 16 37.84526
1000 17 38.85
1050 17 39.83143
1100 17 40.78818
1150 17 41.75652
1200 17 42.7125
1250 19 43.6
1300 19 44.46538
1350 19 45.35333
1400 19 46.18
1450 19 47.02966
1500 19 47.84467
1550 19 48.68581
1600 20 49.46
1650 21 50.25697
1700 21 51.01176
1750 21 51.79314
1800 21 52.54167
1850 21 53.27514
1900 22 54.02211
1950 22 54.73641
2000 22 55.4305
2050 23 56.16927
2100 23 56.82714
2150 23 57.52884
2200 23 58.20273
2250 23 58.88222
2300 23 59.55609
2350 25 60.20553
2400 25 60.85167
2450 25 61.47714
2500 25 62.0944
2550 25 62.74235
2600 25 63.33962
2650 25 64.00113
2700 25 64.59259
2750 26 65.21273
2800 27 65.84
2850 27 66.39614
2900 27 67.0331
2950 27 67.58475
3000 27 68.20133
3050 27 68.73803
3100 27 69.29355
3150 28 69.88635
3200 27 70.44688
3250 28 71.00954
3300 28 71.56636
3350 29 72.11164
3400 29 72.64853
3450 29 73.16986
3500 29 73.70571
3550 30 74.30901
3600 29 74.79778
3650 29 75.31123
3700 31 75.84622
3750 31 76.37627
3800 31 76.87974
3850 31 77.36545
3900 31 77.85179
3950 31 78.39165
4000 31 78.87175
4050 31 79.37975
4100 31 79.88659
4150 31 80.37084
4200 31 80.87167
4250 32 81.35765
4300 31 81.8307
4350 33 82.3223
4400 33 82.81409
4450 32 83.28472
4500 33 83.76067
4550 33 84.21473
4600 33 84.69391
4650 33 85.15935
4700 33 85.60809
4750 33 86.07411
4800 33 86.555
4850 33 86.98887
4900 33 87.45633
4950 35 87.92222
5000 35 88.3382
10000 49 125.4369

 

 

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

w

Connecting to %s