# Dosage Search [Part 2]

Since publishing the first post on this topic, I have done some more research on the question.

First, I should acknowledge that my CS friends have all pointed out the similarity between this question and the two-egg puzzle:

You are given two eggs and placed in a 100-floor building. You need to find the highest floor from which an egg can survive a drop.

If you only have one egg, then you obviously just go up one floor at a time. Eventually either the egg breaks or you reach floor 100. In the first case, the floor below is the answer; in the second, it is floor 100, since there are no higher floors.

If you have ten eggs, you can just do binary search and find the right floor very quickly (binary search on 100 floors needs at most 7 drops, so you can never run out of eggs).

Since you have two eggs, you can do better than with one. The post I linked to earlier has a solution that tries to guarantee the same number of drops as the worst case of binary search. Another approach is to drop the first egg from floor 2^n: first from floor 2, then from floor 4, then from floor 8, quickly climbing up the building. Once the first egg breaks, you fall back to the one-egg strategy, walking up one floor at a time from the last safe floor with the second egg. This is similar to how network connections back off when a service is not responding, retrying less frequently as the number of retries goes up. (This allows the service to restart without falling over from a bombardment of ever more frequent requests.)
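To make the doubling strategy concrete, here is a minimal Python sketch; the function name and the `breaks` predicate are my own naming, not part of the original puzzle statement:

```python
def highest_safe_floor(breaks, top=100):
    """Two-egg search over floors 1..top.
    `breaks(floor)` returns True if an egg dropped from `floor` breaks.
    Returns the highest floor an egg survives (0 if it breaks from floor 1)."""
    # First egg: drop from floors 2, 4, 8, ... until it breaks
    # (or we run out of building).
    last_safe, floor = 0, 2
    while floor <= top and not breaks(floor):
        last_safe, floor = floor, floor * 2
    # Second egg: the one-egg strategy on the remaining range,
    # walking up one floor at a time from the last safe floor.
    for f in range(last_safe + 1, min(floor, top + 1)):
        if breaks(f):
            return f - 1
    return min(floor, top + 1) - 1
```

For example, `highest_safe_floor(lambda f: f > 57)` returns 57: the first egg survives 2, 4, 8, 16, 32 and breaks on the drop from 64, and the second egg walks up from floor 33.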

As similar as this egg puzzle is, it fails to capture something that I think is essential to the dosage question: there is a difference between making a mistake of fifty floors too high and making a mistake of one floor too high. (Even if during the search you can't tell how far off you are; if you could, that would be an obvious way to find the result in a couple of tries.)
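In code, that distinction is a cost function on the size of the overshoot. The three models that appear in the q code further down look roughly like this in Python:

```python
# A probe that lands below the target always costs 1 (one more try).
# A probe that overshoots is charged by one of these models:
def dif_cost(guess, target):      # cost grows with the size of the overshoot
    return guess - target

def trivial_cost(guess, target):  # overshooting costs the same as any probe
    return 1

def scalar_cost(guess, target):   # any overshoot pays a flat penalty of 100
    return 100
```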

So, I ran the code from the previous post and got results for lists of up to 1,000 elements. However, Python started to take too long for larger lists.

This prompted me to rewrite the functions in q. q and its sibling k are obscure languages outside the financial services industry. They were created by a genius, Arthur Whitney, whose interview speaks for itself. The language is a derivative of APL and Lisp, with a few more good ideas thrown in for good measure. (More on this language in a different post.)

Anyway, the same algorithm as before, which took 30+ hours in Python, ran in 20 minutes in q.

The relative speed of the q code allowed me to see a pattern for the divisor: as the list grows longer, the divisor we use to split the list should get larger, albeit slowly.

Data (more at the bottom of the post):

| Number of elements | Divisor | Average cost |
|---:|---:|---:|
| 50 | 5 | 7.78 |
| 100 | 6 | 11.48 |
| 500 | 13 | 27.128 |
| 1000 | 17 | 38.85 |
| 4000 | 31 | 78.87175 |
| 5000 | 35 | 88.3382 |
| 10000 | 49 | 125.4369 |

Since the divisor wasn't growing much between lists of similar size, I realized the mistake I had made by keeping the divisor constant during a search. Obviously, as the list gets shorter, we need to update the divisor we use to divide it, since the problem has been reduced to searching a smaller list.

This led me to create another type of binary search with an updating divisor, which grows in proportion to log(n)^3.

This also increased the speed of the search, since I was no longer doing linear search on any list smaller than the divisor I was testing. To explain: with a static divisor, you start with a list of 1 million elements, divide by 200, probe element 5,000, and learn whether the item you are searching for is in the first range [0, 5000) or in the second range (5000, 1000000). Gradually the list gets smaller, but you keep dividing it in this very uneven way, so that once the number of elements drops below 200, the division yields zero and you keep probing the first element. This is equivalent to linear search.

If instead we divide by smaller divisors as the list shrinks, we get closer to binary search, and since much of the list has already been eliminated, our chances of making a huge mistake are smaller too.
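For readers who don't speak q, here is a rough Python equivalent of the updating-divisor search. The names are mine, and the exponent `f` is left as a parameter (per the log(n)^3 observation above, it defaults to 3):

```python
import math

def search_cost(lst, item, cost_fn, f=3.0):
    # Cost of finding `item` in the sorted list `lst` when an overshooting
    # probe is penalized by `cost_fn`, and the divisor is recomputed from
    # the current sublist length as max(2, floor(log(n) ** f)).
    cost = 0
    while len(lst) > 1:
        d = max(2, math.floor(math.log(len(lst)) ** f))
        m = len(lst) // d
        if lst[m] < item:
            cost += 1                      # safe probe: item is higher up
            lst = lst[m + 1:]
        elif lst[m] > item:
            cost += cost_fn(lst[m], item)  # overshot: pay the penalty
            lst = lst[:m]
        else:
            return cost + 1                # found it
    return cost                            # one candidate left

def average_cost(n, f=3.0):
    # Search for every element of range(n) and average the costs,
    # using the distance-based cost model.
    lst = list(range(n))
    return sum(search_cost(lst, i, lambda g, t: g - t, f) for i in lst) / n
```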

Here is the q code with the two versions of the search (this might be unreadable to most people, but I plan on writing a tutorial series on k, q, and kdb soon):

```q
dif_cost: {:x-y};         / overshoot cost grows with the size of the miss
trivial_cost: {x+y; :1};  / x+y is discarded; it only keeps the function binary
scalar_cost: {x+y; :100}; / flat penalty of 100 per overshoot
/ binary search with cost function cf and static divisor d
st: {[l;item;cf;d]
  cost:0;
  while[(count l)>1;
    m: (count l) div d;
    / probe l[m]: below the item costs 1, above the item costs cf, equal returns
    $[l[m]<item; [cost+:1; l:(m+1)_l];
      l[m]>item; [cost+:cf[l[m];item]; l:m#l];
      :cost+:1]];
  cost};
/ average cost for a particular divisor d and list size n:
/ search for every element of til n, then average all those costs
st_avg: {[n;d] i: til n; res: @[i; i; st[i; ; dif_cost; d]]; avg res};
```

```q
/ iterates over divisors from 2 up to 10 * floor log n
/ e.g. for n=100 it tries 2 3 4 ... 37 38 39
/ this covers the relevant divisors while avoiding unneeded computation
st_div: {[n]
  d: $[n<50; 2_til n; 2_til 10 * floor log n];
  res: st_avg[n] each d;
  d[first where res = min res], min res};
/ e.g. st_div[100] should give 6 11.48, matching the table above
```

And the algorithm with the updating divisor:

```q
/ same search, but d is recomputed from the current list length
/ as max(2, floor (log count l) xexp f)
s_auto: {[l;item;cf;f]
  cost:0;
  while[(count l)>1;
    d: max 2, floor (log count l) xexp f;
    m: (count l) div d;
    $[l[m]<item; [cost+:1; l:(m+1)_l];
      l[m]>item; [cost+:cf[l[m];item]; l:m#l];
      :cost+:1]];
  cost};
```

Then we can minimize over f, the exponent we raise log n to.
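A sketch of that minimization in Python, reusing `average_cost` from the sketch above (the candidate grid is my choice):

```python
def best_f(n, candidates=(1.5, 2.0, 2.5, 3.0, 3.5)):
    # Pick the exponent f that minimizes the average search cost for size n.
    return min(candidates, key=lambda f: average_cost(n, f))
```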

Data continued:

| n | Divisor | Average cost |
|---:|---:|---:|
| 50 | 5 | 7.78 |
| 100 | 6 | 11.48 |
| 150 | 7 | 14.32667 |
| 200 | 8 | 16.755 |
| 250 | 9 | 18.768 |
| 300 | 10 | 20.72667 |
| 350 | 11 | 22.49143 |
| 400 | 11 | 24.15 |
| 450 | 12 | 25.66 |
| 500 | 13 | 27.128 |
| 550 | 13 | 28.51455 |
| 600 | 13 | 29.85833 |
| 650 | 14 | 31.12 |
| 700 | 14 | 32.34143 |
| 750 | 14 | 33.50267 |
| 800 | 15 | 34.62875 |
| 850 | 15 | 35.75529 |
| 900 | 15 | 36.84667 |
| 950 | 16 | 37.84526 |
| 1000 | 17 | 38.85 |
| 1050 | 17 | 39.83143 |
| 1100 | 17 | 40.78818 |
| 1150 | 17 | 41.75652 |
| 1200 | 17 | 42.7125 |
| 1250 | 19 | 43.6 |
| 1300 | 19 | 44.46538 |
| 1350 | 19 | 45.35333 |
| 1400 | 19 | 46.18 |
| 1450 | 19 | 47.02966 |
| 1500 | 19 | 47.84467 |
| 1550 | 19 | 48.68581 |
| 1600 | 20 | 49.46 |
| 1650 | 21 | 50.25697 |
| 1700 | 21 | 51.01176 |
| 1750 | 21 | 51.79314 |
| 1800 | 21 | 52.54167 |
| 1850 | 21 | 53.27514 |
| 1900 | 22 | 54.02211 |
| 1950 | 22 | 54.73641 |
| 2000 | 22 | 55.4305 |
| 2050 | 23 | 56.16927 |
| 2100 | 23 | 56.82714 |
| 2150 | 23 | 57.52884 |
| 2200 | 23 | 58.20273 |
| 2250 | 23 | 58.88222 |
| 2300 | 23 | 59.55609 |
| 2350 | 25 | 60.20553 |
| 2400 | 25 | 60.85167 |
| 2450 | 25 | 61.47714 |
| 2500 | 25 | 62.0944 |
| 2550 | 25 | 62.74235 |
| 2600 | 25 | 63.33962 |
| 2650 | 25 | 64.00113 |
| 2700 | 25 | 64.59259 |
| 2750 | 26 | 65.21273 |
| 2800 | 27 | 65.84 |
| 2850 | 27 | 66.39614 |
| 2900 | 27 | 67.0331 |
| 2950 | 27 | 67.58475 |
| 3000 | 27 | 68.20133 |
| 3050 | 27 | 68.73803 |
| 3100 | 27 | 69.29355 |
| 3150 | 28 | 69.88635 |
| 3200 | 27 | 70.44688 |
| 3250 | 28 | 71.00954 |
| 3300 | 28 | 71.56636 |
| 3350 | 29 | 72.11164 |
| 3400 | 29 | 72.64853 |
| 3450 | 29 | 73.16986 |
| 3500 | 29 | 73.70571 |
| 3550 | 30 | 74.30901 |
| 3600 | 29 | 74.79778 |
| 3650 | 29 | 75.31123 |
| 3700 | 31 | 75.84622 |
| 3750 | 31 | 76.37627 |
| 3800 | 31 | 76.87974 |
| 3850 | 31 | 77.36545 |
| 3900 | 31 | 77.85179 |
| 3950 | 31 | 78.39165 |
| 4000 | 31 | 78.87175 |
| 4050 | 31 | 79.37975 |
| 4100 | 31 | 79.88659 |
| 4150 | 31 | 80.37084 |
| 4200 | 31 | 80.87167 |
| 4250 | 32 | 81.35765 |
| 4300 | 31 | 81.8307 |
| 4350 | 33 | 82.3223 |
| 4400 | 33 | 82.81409 |
| 4450 | 32 | 83.28472 |
| 4500 | 33 | 83.76067 |
| 4550 | 33 | 84.21473 |
| 4600 | 33 | 84.69391 |
| 4650 | 33 | 85.15935 |
| 4700 | 33 | 85.60809 |
| 4750 | 33 | 86.07411 |
| 4800 | 33 | 86.555 |
| 4850 | 33 | 86.98887 |
| 4900 | 33 | 87.45633 |
| 4950 | 35 | 87.92222 |
| 5000 | 35 | 88.3382 |
| 10000 | 49 | 125.4369 |