# Dosage Search [Part 2]

Since publishing the first post on this topic, I have done some more research on the question.

First, I should acknowledge that my CS friends have all pointed out the similarity between this question and the two-egg puzzle:

You are given two eggs and placed in a 100-floor building. You need to find the highest floor from which an egg can survive a drop.

If you only have one egg, then you obviously just go up one floor at a time. Eventually either the egg breaks or you reach floor 100. In the first case, the floor below is the answer; in the second, it is floor 100, since there are no higher floors.

If you have ten eggs, you can just do binary search and find the right floor very quickly (binary search on 100 floors needs at most 7 drops, so you can never run out of eggs).

Since you have two eggs, you can do better than with one. The post I linked to earlier has a solution that tries to guarantee the same number of drops as the worst case of binary search. Another approach is to drop the first egg from floor 2^n: first from floor 2, then from floor 4, then from floor 8, quickly climbing up the building. Once the first egg breaks, you fall back to the one-egg strategy, walking up one floor at a time from the last safe floor with the second egg. This is similar to how network connections back off when a service is not responding, retrying less frequently as the number of retries goes up. (This allows the service to restart without falling over from a bombardment of ever more frequent requests.)
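To make the doubling strategy concrete, here is a minimal Python sketch; the function name and the `breaks` predicate are my own naming, not part of the original puzzle statement:

```python
def highest_safe_floor(breaks, top=100):
    """Two-egg search over floors 1..top.
    `breaks(floor)` returns True if an egg dropped from `floor` breaks.
    Returns the highest floor an egg survives (0 if it breaks from floor 1)."""
    # First egg: drop from floors 2, 4, 8, ... until it breaks
    # (or we run out of building).
    last_safe, floor = 0, 2
    while floor <= top and not breaks(floor):
        last_safe, floor = floor, floor * 2
    # Second egg: the one-egg strategy on the remaining range,
    # walking up one floor at a time from the last safe floor.
    for f in range(last_safe + 1, min(floor, top + 1)):
        if breaks(f):
            return f - 1
    return min(floor, top + 1) - 1
```

For example, `highest_safe_floor(lambda f: f > 57)` returns 57: the first egg survives 2, 4, 8, 16, 32 and breaks on the drop from 64, and the second egg walks up from floor 33.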

As similar as this egg puzzle is, it fails to capture something that I think is essential to the dosage question: there is a difference between making a mistake of fifty floors too high and making a mistake of one floor too high. (Even if during the search you can't tell how far off you are; if you could, that would be an obvious way to find the result in a couple of tries.)
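In code, that distinction is a cost function on the size of the overshoot. The three models that appear in the q code further down look roughly like this in Python:

```python
# A probe that lands below the target always costs 1 (one more try).
# A probe that overshoots is charged by one of these models:
def dif_cost(guess, target):      # cost grows with the size of the overshoot
    return guess - target

def trivial_cost(guess, target):  # overshooting costs the same as any probe
    return 1

def scalar_cost(guess, target):   # any overshoot pays a flat penalty of 100
    return 100
```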

So, I ran the code from the previous post and got results for lists of up to 1,000 elements. However, Python started to take too long for larger lists.

This prompted me to rewrite the functions in q. q and its sibling k are obscure languages outside the financial services industry. They were created by a genius, Arthur Whitney, whose interview speaks for itself. The language is a derivative of APL and Lisp, with a few more good ideas thrown in for good measure. (More on this language in a different post.)

Anyway, the same algorithm as before, which took 30+ hours in Python, ran in 20 minutes in q.

The relative speed of the q code allowed me to see a pattern for the divisor: as the list grows longer, the divisor we use to split the list should get larger, albeit slowly.

Data (more at the bottom of the post):

| Number of elements | Divisor | Average cost |
|---:|---:|---:|
| 50 | 5 | 7.78 |
| 100 | 6 | 11.48 |
| 500 | 13 | 27.128 |
| 1000 | 17 | 38.85 |
| 4000 | 31 | 78.87175 |
| 5000 | 35 | 88.3382 |
| 10000 | 49 | 125.4369 |

Since the divisor wasn't growing much between lists of similar size, I realized the mistake I had made by keeping the divisor constant during a search. Obviously, as the list gets shorter, we need to update the divisor we use to divide it, since the problem has been reduced to searching a smaller list.

This led me to create another type of binary search with an updating divisor, which grows in proportion to log(n)^3.

This also increased the speed of the search, since I was no longer doing linear search on any list smaller than the divisor I was testing. To explain: with a static divisor, you start with a list of 1 million elements, divide by 200, probe element 5,000, and learn whether the item you are searching for is in the first range [0, 5000) or in the second range (5000, 1000000). Gradually the list gets smaller, but you keep dividing it in this very uneven way, so that once the number of elements drops below 200, the division yields zero and you keep probing the first element. This is equivalent to linear search.

If instead we divide by smaller divisors as the list shrinks, we get closer to binary search, and since much of the list has already been eliminated, our chances of making a huge mistake are smaller too.
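For readers who don't speak q, here is a rough Python equivalent of the updating-divisor search. The names are mine, and the exponent `f` is left as a parameter (per the log(n)^3 observation above, it defaults to 3):

```python
import math

def search_cost(lst, item, cost_fn, f=3.0):
    # Cost of finding `item` in the sorted list `lst` when an overshooting
    # probe is penalized by `cost_fn`, and the divisor is recomputed from
    # the current sublist length as max(2, floor(log(n) ** f)).
    cost = 0
    while len(lst) > 1:
        d = max(2, math.floor(math.log(len(lst)) ** f))
        m = len(lst) // d
        if lst[m] < item:
            cost += 1                      # safe probe: item is higher up
            lst = lst[m + 1:]
        elif lst[m] > item:
            cost += cost_fn(lst[m], item)  # overshot: pay the penalty
            lst = lst[:m]
        else:
            return cost + 1                # found it
    return cost                            # one candidate left

def average_cost(n, f=3.0):
    # Search for every element of range(n) and average the costs,
    # using the distance-based cost model.
    lst = list(range(n))
    return sum(search_cost(lst, i, lambda g, t: g - t, f) for i in lst) / n
```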

Here is the q code with the two versions of the search (this might be unreadable to most people, but I plan on writing a tutorial series on k, q, and kdb soon):

```q
dif_cost: {:x-y};         / overshoot cost grows with the size of the miss
trivial_cost: {x+y; :1};  / x+y is discarded; it only keeps the function binary
scalar_cost: {x+y; :100}; / flat penalty of 100 per overshoot
/ binary search with cost function cf and static divisor d
st: {[l;item;cf;d]
  cost:0;
  while[(count l)>1;
    m: (count l) div d;
    / probe l[m]: below the item costs 1, above the item costs cf, equal returns
    $[l[m]<item; [cost+:1; l:(m+1)_l];
      l[m]>item; [cost+:cf[l[m];item]; l:m#l];
      :cost+:1]];
  cost};
/ average cost for a particular divisor d and list size n:
/ search for every element of til n, then average all those costs
st_avg: {[n;d] i: til n; res: @[i; i; st[i; ; dif_cost; d]]; avg res};
```

```q
/ iterates over divisors from 2 up to 10 * floor log n
/ e.g. for n=100 it tries 2 3 4 ... 37 38 39
/ this covers the relevant divisors while avoiding unneeded computation
st_div: {[n]
  d: $[n<50; 2_til n; 2_til 10 * floor log n];
  res: st_avg[n] each d;
  d[first where res = min res], min res};
/ e.g. st_div[100] should give 6 11.48, matching the table above
```

And the algorithm with the updating divisor:

```q
/ same search, but d is recomputed from the current list length
/ as max(2, floor (log count l) xexp f)
s_auto: {[l;item;cf;f]
  cost:0;
  while[(count l)>1;
    d: max 2, floor (log count l) xexp f;
    m: (count l) div d;
    $[l[m]<item; [cost+:1; l:(m+1)_l];
      l[m]>item; [cost+:cf[l[m];item]; l:m#l];
      :cost+:1]];
  cost};
```

Then we can minimize over f, the exponent we raise log n to.
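A sketch of that minimization in Python, reusing `average_cost` from the sketch above (the candidate grid is my choice):

```python
def best_f(n, candidates=(1.5, 2.0, 2.5, 3.0, 3.5)):
    # Pick the exponent f that minimizes the average search cost for size n.
    return min(candidates, key=lambda f: average_cost(n, f))
```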

Data continued:

| n | Divisor | Average cost |
|---:|---:|---:|
| 50 | 5 | 7.78 |
| 100 | 6 | 11.48 |
| 150 | 7 | 14.32667 |
| 200 | 8 | 16.755 |
| 250 | 9 | 18.768 |
| 300 | 10 | 20.72667 |
| 350 | 11 | 22.49143 |
| 400 | 11 | 24.15 |
| 450 | 12 | 25.66 |
| 500 | 13 | 27.128 |
| 550 | 13 | 28.51455 |
| 600 | 13 | 29.85833 |
| 650 | 14 | 31.12 |
| 700 | 14 | 32.34143 |
| 750 | 14 | 33.50267 |
| 800 | 15 | 34.62875 |
| 850 | 15 | 35.75529 |
| 900 | 15 | 36.84667 |
| 950 | 16 | 37.84526 |
| 1000 | 17 | 38.85 |
| 1050 | 17 | 39.83143 |
| 1100 | 17 | 40.78818 |
| 1150 | 17 | 41.75652 |
| 1200 | 17 | 42.7125 |
| 1250 | 19 | 43.6 |
| 1300 | 19 | 44.46538 |
| 1350 | 19 | 45.35333 |
| 1400 | 19 | 46.18 |
| 1450 | 19 | 47.02966 |
| 1500 | 19 | 47.84467 |
| 1550 | 19 | 48.68581 |
| 1600 | 20 | 49.46 |
| 1650 | 21 | 50.25697 |
| 1700 | 21 | 51.01176 |
| 1750 | 21 | 51.79314 |
| 1800 | 21 | 52.54167 |
| 1850 | 21 | 53.27514 |
| 1900 | 22 | 54.02211 |
| 1950 | 22 | 54.73641 |
| 2000 | 22 | 55.4305 |
| 2050 | 23 | 56.16927 |
| 2100 | 23 | 56.82714 |
| 2150 | 23 | 57.52884 |
| 2200 | 23 | 58.20273 |
| 2250 | 23 | 58.88222 |
| 2300 | 23 | 59.55609 |
| 2350 | 25 | 60.20553 |
| 2400 | 25 | 60.85167 |
| 2450 | 25 | 61.47714 |
| 2500 | 25 | 62.0944 |
| 2550 | 25 | 62.74235 |
| 2600 | 25 | 63.33962 |
| 2650 | 25 | 64.00113 |
| 2700 | 25 | 64.59259 |
| 2750 | 26 | 65.21273 |
| 2800 | 27 | 65.84 |
| 2850 | 27 | 66.39614 |
| 2900 | 27 | 67.0331 |
| 2950 | 27 | 67.58475 |
| 3000 | 27 | 68.20133 |
| 3050 | 27 | 68.73803 |
| 3100 | 27 | 69.29355 |
| 3150 | 28 | 69.88635 |
| 3200 | 27 | 70.44688 |
| 3250 | 28 | 71.00954 |
| 3300 | 28 | 71.56636 |
| 3350 | 29 | 72.11164 |
| 3400 | 29 | 72.64853 |
| 3450 | 29 | 73.16986 |
| 3500 | 29 | 73.70571 |
| 3550 | 30 | 74.30901 |
| 3600 | 29 | 74.79778 |
| 3650 | 29 | 75.31123 |
| 3700 | 31 | 75.84622 |
| 3750 | 31 | 76.37627 |
| 3800 | 31 | 76.87974 |
| 3850 | 31 | 77.36545 |
| 3900 | 31 | 77.85179 |
| 3950 | 31 | 78.39165 |
| 4000 | 31 | 78.87175 |
| 4050 | 31 | 79.37975 |
| 4100 | 31 | 79.88659 |
| 4150 | 31 | 80.37084 |
| 4200 | 31 | 80.87167 |
| 4250 | 32 | 81.35765 |
| 4300 | 31 | 81.8307 |
| 4350 | 33 | 82.3223 |
| 4400 | 33 | 82.81409 |
| 4450 | 32 | 83.28472 |
| 4500 | 33 | 83.76067 |
| 4550 | 33 | 84.21473 |
| 4600 | 33 | 84.69391 |
| 4650 | 33 | 85.15935 |
| 4700 | 33 | 85.60809 |
| 4750 | 33 | 86.07411 |
| 4800 | 33 | 86.555 |
| 4850 | 33 | 86.98887 |
| 4900 | 33 | 87.45633 |
| 4950 | 35 | 87.92222 |
| 5000 | 35 | 88.3382 |
| 10000 | 49 | 125.4369 |