# Dosage Search [Part 2]

Since publishing the first post on this topic, I have done some more research on the question.

First, I should acknowledge that my CS friends have all pointed out the similarity between this question and the two-egg puzzle:

You are given two eggs and placed in a 100-floor building. You need to find the highest floor from which an egg can survive a drop.

If you only have one egg, then you obviously just go up one floor at a time. Eventually either the egg breaks or you reach floor 100. In the first case, the floor below is the answer; in the second, it is floor 100, since there are no higher floors.

If you have ten eggs, you can just do binary search and find the right floor very quickly (binary search on 100 floors needs at most 7 drops, so you can never run out of eggs).

Since you have two eggs, you can do better than with one. The post I linked to earlier has a solution that tries to guarantee the same number of drops as the worst case of binary search. Another approach is to drop the first egg from floor 2^n: first from floor 2, then from floor 4, then from floor 8, quickly climbing up the building. Once the first egg breaks, you fall back to the one-egg strategy, walking up one floor at a time from the last safe floor with the second egg. This is similar to how network connections back off when a service is not responding, retrying less frequently as the number of retries goes up. (This allows the service to restart without falling over from a bombardment of ever more frequent requests.)
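To make the doubling strategy concrete, here is a minimal Python sketch; the function name and the `breaks` predicate are my own naming, not part of the original puzzle statement:

```python
def highest_safe_floor(breaks, top=100):
    """Two-egg search over floors 1..top.
    `breaks(floor)` returns True if an egg dropped from `floor` breaks.
    Returns the highest floor an egg survives (0 if it breaks from floor 1)."""
    # First egg: drop from floors 2, 4, 8, ... until it breaks
    # (or we run out of building).
    last_safe, floor = 0, 2
    while floor <= top and not breaks(floor):
        last_safe, floor = floor, floor * 2
    # Second egg: the one-egg strategy on the remaining range,
    # walking up one floor at a time from the last safe floor.
    for f in range(last_safe + 1, min(floor, top + 1)):
        if breaks(f):
            return f - 1
    return min(floor, top + 1) - 1
```

For example, `highest_safe_floor(lambda f: f > 57)` returns 57: the first egg survives 2, 4, 8, 16, 32 and breaks on the drop from 64, and the second egg walks up from floor 33.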

As similar as this egg puzzle is, it fails to capture something that I think is essential to the dosage question: there is a difference between making a mistake of fifty floors too high and making a mistake of one floor too high. (Even if during the search you can't tell how far off you are; if you could, that would be an obvious way to find the result in a couple of tries.)
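In code, that distinction is a cost function on the size of the overshoot. The three models that appear in the q code further down look roughly like this in Python:

```python
# A probe that lands below the target always costs 1 (one more try).
# A probe that overshoots is charged by one of these models:
def dif_cost(guess, target):      # cost grows with the size of the overshoot
    return guess - target

def trivial_cost(guess, target):  # overshooting costs the same as any probe
    return 1

def scalar_cost(guess, target):   # any overshoot pays a flat penalty of 100
    return 100
```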

So, I ran the code from the previous post and got results for lists of up to 1,000 elements. However, Python started to take too long for larger lists.

This prompted me to rewrite the functions in q. q and its sibling k are obscure languages outside the financial services industry. They were created by a genius, Arthur Whitney, whose interview speaks for itself. The language is a derivative of APL and Lisp, with a few more good ideas thrown in for good measure. (More on this language in a different post.)

Anyway, the same algorithm as before, which took 30+ hours in Python, ran in 20 minutes in q.

The relative speed of the q code allowed me to see a pattern for the divisor: as the list grows longer, the divisor we use to split the list should get larger, albeit slowly.

Data (more at the bottom of the post):

| Number of elements | Divisor | Average cost |
|---:|---:|---:|
| 50 | 5 | 7.78 |
| 100 | 6 | 11.48 |
| 500 | 13 | 27.128 |
| 1000 | 17 | 38.85 |
| 4000 | 31 | 78.87175 |
| 5000 | 35 | 88.3382 |
| 10000 | 49 | 125.4369 |

Since the divisor wasn't growing much between lists of similar size, I realized the mistake I had made by keeping the divisor constant during a search. Obviously, as the list gets shorter, we need to update the divisor we use to divide it, since the problem has been reduced to searching a smaller list.

This led me to create another type of binary search with an updating divisor, which grows in proportion to log(n)^3.

This also increased the speed of the search, since I was no longer doing linear search on any list smaller than the divisor I was testing. To explain: with a static divisor, you start with a list of 1 million elements, divide by 200, probe element 5,000, and learn whether the item you are searching for is in the first range [0, 5000) or in the second range (5000, 1000000). Gradually the list gets smaller, but you keep dividing it in this very uneven way, so that once the number of elements drops below 200, the division yields zero and you keep probing the first element. This is equivalent to linear search.

If instead we divide by smaller divisors as the list shrinks, we get closer to binary search, and since much of the list has already been eliminated, our chances of making a huge mistake are smaller too.
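For readers who don't speak q, here is a rough Python equivalent of the updating-divisor search. The names are mine, and the exponent `f` is left as a parameter (per the log(n)^3 observation above, it defaults to 3):

```python
import math

def search_cost(lst, item, cost_fn, f=3.0):
    # Cost of finding `item` in the sorted list `lst` when an overshooting
    # probe is penalized by `cost_fn`, and the divisor is recomputed from
    # the current sublist length as max(2, floor(log(n) ** f)).
    cost = 0
    while len(lst) > 1:
        d = max(2, math.floor(math.log(len(lst)) ** f))
        m = len(lst) // d
        if lst[m] < item:
            cost += 1                      # safe probe: item is higher up
            lst = lst[m + 1:]
        elif lst[m] > item:
            cost += cost_fn(lst[m], item)  # overshot: pay the penalty
            lst = lst[:m]
        else:
            return cost + 1                # found it
    return cost                            # one candidate left

def average_cost(n, f=3.0):
    # Search for every element of range(n) and average the costs,
    # using the distance-based cost model.
    lst = list(range(n))
    return sum(search_cost(lst, i, lambda g, t: g - t, f) for i in lst) / n
```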

Here is the q code with the two versions of the search (this might be unreadable to most people, but I plan on writing a tutorial series on k, q, and kdb soon):

```q
dif_cost: {:x-y};         / overshoot cost grows with the size of the miss
trivial_cost: {x+y; :1};  / x+y is discarded; it only keeps the function binary
scalar_cost: {x+y; :100}; / flat penalty of 100 per overshoot
/ binary search with cost function cf and static divisor d
st: {[l;item;cf;d]
  cost:0;
  while[(count l)>1;
    m: (count l) div d;
    / probe l[m]: below the item costs 1, above the item costs cf, equal returns
    $[l[m]<item; [cost+:1; l:(m+1)_l];
      l[m]>item; [cost+:cf[l[m];item]; l:m#l];
      :cost+:1]];
  cost};
/ average cost for a particular divisor d and list size n:
/ search for every element of til n, then average all those costs
st_avg: {[n;d] i: til n; res: @[i; i; st[i; ; dif_cost; d]]; avg res};
```

```q
/ iterates over divisors from 2 up to 10 * floor log n
/ e.g. for n=100 it tries 2 3 4 ... 37 38 39
/ this covers the relevant divisors while avoiding unneeded computation
st_div: {[n]
  d: $[n<50; 2_til n; 2_til 10 * floor log n];
  res: st_avg[n] each d;
  d[first where res = min res], min res};
/ e.g. st_div[100] should give 6 11.48, matching the table above
```

And the algorithm with the updating divisor:

```q
/ same search, but d is recomputed from the current list length
/ as max(2, floor (log count l) xexp f)
s_auto: {[l;item;cf;f]
  cost:0;
  while[(count l)>1;
    d: max 2, floor (log count l) xexp f;
    m: (count l) div d;
    $[l[m]<item; [cost+:1; l:(m+1)_l];
      l[m]>item; [cost+:cf[l[m];item]; l:m#l];
      :cost+:1]];
  cost};
```

Then we can minimize over f, the exponent we raise log n to.
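A sketch of that minimization in Python, reusing `average_cost` from the sketch above (the candidate grid is my choice):

```python
def best_f(n, candidates=(1.5, 2.0, 2.5, 3.0, 3.5)):
    # Pick the exponent f that minimizes the average search cost for size n.
    return min(candidates, key=lambda f: average_cost(n, f))
```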

Data continued:

| n | Divisor | Average cost |
|---:|---:|---:|
| 50 | 5 | 7.78 |
| 100 | 6 | 11.48 |
| 150 | 7 | 14.32667 |
| 200 | 8 | 16.755 |
| 250 | 9 | 18.768 |
| 300 | 10 | 20.72667 |
| 350 | 11 | 22.49143 |
| 400 | 11 | 24.15 |
| 450 | 12 | 25.66 |
| 500 | 13 | 27.128 |
| 550 | 13 | 28.51455 |
| 600 | 13 | 29.85833 |
| 650 | 14 | 31.12 |
| 700 | 14 | 32.34143 |
| 750 | 14 | 33.50267 |
| 800 | 15 | 34.62875 |
| 850 | 15 | 35.75529 |
| 900 | 15 | 36.84667 |
| 950 | 16 | 37.84526 |
| 1000 | 17 | 38.85 |
| 1050 | 17 | 39.83143 |
| 1100 | 17 | 40.78818 |
| 1150 | 17 | 41.75652 |
| 1200 | 17 | 42.7125 |
| 1250 | 19 | 43.6 |
| 1300 | 19 | 44.46538 |
| 1350 | 19 | 45.35333 |
| 1400 | 19 | 46.18 |
| 1450 | 19 | 47.02966 |
| 1500 | 19 | 47.84467 |
| 1550 | 19 | 48.68581 |
| 1600 | 20 | 49.46 |
| 1650 | 21 | 50.25697 |
| 1700 | 21 | 51.01176 |
| 1750 | 21 | 51.79314 |
| 1800 | 21 | 52.54167 |
| 1850 | 21 | 53.27514 |
| 1900 | 22 | 54.02211 |
| 1950 | 22 | 54.73641 |
| 2000 | 22 | 55.4305 |
| 2050 | 23 | 56.16927 |
| 2100 | 23 | 56.82714 |
| 2150 | 23 | 57.52884 |
| 2200 | 23 | 58.20273 |
| 2250 | 23 | 58.88222 |
| 2300 | 23 | 59.55609 |
| 2350 | 25 | 60.20553 |
| 2400 | 25 | 60.85167 |
| 2450 | 25 | 61.47714 |
| 2500 | 25 | 62.0944 |
| 2550 | 25 | 62.74235 |
| 2600 | 25 | 63.33962 |
| 2650 | 25 | 64.00113 |
| 2700 | 25 | 64.59259 |
| 2750 | 26 | 65.21273 |
| 2800 | 27 | 65.84 |
| 2850 | 27 | 66.39614 |
| 2900 | 27 | 67.0331 |
| 2950 | 27 | 67.58475 |
| 3000 | 27 | 68.20133 |
| 3050 | 27 | 68.73803 |
| 3100 | 27 | 69.29355 |
| 3150 | 28 | 69.88635 |
| 3200 | 27 | 70.44688 |
| 3250 | 28 | 71.00954 |
| 3300 | 28 | 71.56636 |
| 3350 | 29 | 72.11164 |
| 3400 | 29 | 72.64853 |
| 3450 | 29 | 73.16986 |
| 3500 | 29 | 73.70571 |
| 3550 | 30 | 74.30901 |
| 3600 | 29 | 74.79778 |
| 3650 | 29 | 75.31123 |
| 3700 | 31 | 75.84622 |
| 3750 | 31 | 76.37627 |
| 3800 | 31 | 76.87974 |
| 3850 | 31 | 77.36545 |
| 3900 | 31 | 77.85179 |
| 3950 | 31 | 78.39165 |
| 4000 | 31 | 78.87175 |
| 4050 | 31 | 79.37975 |
| 4100 | 31 | 79.88659 |
| 4150 | 31 | 80.37084 |
| 4200 | 31 | 80.87167 |
| 4250 | 32 | 81.35765 |
| 4300 | 31 | 81.8307 |
| 4350 | 33 | 82.3223 |
| 4400 | 33 | 82.81409 |
| 4450 | 32 | 83.28472 |
| 4500 | 33 | 83.76067 |
| 4550 | 33 | 84.21473 |
| 4600 | 33 | 84.69391 |
| 4650 | 33 | 85.15935 |
| 4700 | 33 | 85.60809 |
| 4750 | 33 | 86.07411 |
| 4800 | 33 | 86.555 |
| 4850 | 33 | 86.98887 |
| 4900 | 33 | 87.45633 |
| 4950 | 35 | 87.92222 |
| 5000 | 35 | 88.3382 |
| 10000 | 49 | 125.4369 |