I am taking a brief break from the tutorial series (I’m down the rabbit hole on adverbs and I haven’t sneaked/snuck back out).
While on the Aquaq Group I got the following message: (check out the opensource torq)
I’m trying to build a table that looks like
now y2
———————————-
1 8 1 9
2 5 6 5 13 7
This is a table with list of float as a column type. I manage to get something by doing the following
table:([] now:enlist `float$(); y2:enlist `float$())
.[`table;();,;((1;20;1);(1;7;9))]
.[`table;();,;((1;8);(1;9))]
I get
now y2
—————————————–
`float$() `float$()
1 20 1 1 7 9
1 8 1 9
Is there a way to create the table without the `float$()? Thanks.
I came up with some solutions:
The first just load the data in from your set
q)now: 2.0+til 4
q)y2:1.0*til 4
flip `now`y2!(enlist now; enlist y2)
Or initialize it the way it was done in the question and then immediately delete everything.
q)delete from `table where i>0
But the question got me thinking, that there was something wrong with the data model.
It seems unlikely that the best way to handle data is with lists of lists especially in q.
The reason is simple. simple lists in q/k are really fast. they are contiguous blocks of memory. A list of lists are essentially a list of pointers to smaller contiguous blocks. So to test this I created the data-structure that I think would accomplish everything the questioner wanted:
First let’s create some dummy data and put it into the form of the questioner
q) x: ?[;1.0] each 1+100?9
q)y: ?[;1.0] each 1+100?9
?[;1.0] each 1+100?9 this code is two blocks. The first ?[;1.0] is a function that will return a list of random floats between 0 and 1 based on the number.
For example (til 10)!(10?1.0) This is just a dictionary where the key is a number from 1 to 10 and the value is the list of random floats.
0| 0.50994288921356201 1| 0.90591901540756226 2| 0.033651269972324371 3| 0.35632792115211487 4| 0.93293070793151855 5| 0.40784171223640442 6| 0.84914475679397583 7| 0.95187389850616455 8| 0.30074122548103333 9| 0.42668652534484863
The second part 1+100?9 create a 100 element list of random numbers between 1-9.
The each adverb applies the first function (which is a projection) onto each of the elements in this list.
Which gives us a list of lists of floats.
We can see this easily by applying the verbs count and type.
q) count x
100
q)count each x
1 1 4 9 6 4 7 2 1 8 6 9 4 5 7 1 2 6 4 7 8 4 8 2 2 9 4 5 9 2 9 6 3 5 1 2 9 6 2.. /random numbers in fact the same ones generated by the original command 1+100?9
q)(count each) each x
,1
,1
1 1 1 1
1 1 1 1 1 1 1 1 1
1 1 1 1 1 1
1 1 1 1
..
Similarly we can do the same with type
q) type x
0h /a mixed list
q)type each x
9 9 9 9 9 9 9 9 9 9 9 .. /the type for a list of floats
q)(count each) each x
-9 -9 -9 -9 -9 -9 -9 -9h
-9 -9 -9h
-9 -9 -9h
-9 -9 -9 -9 -9 -9 -9h
-9 -9 -9 -9h
-9 -9 -9 -9 -9 -9 -9h
-9 -9 -9 -9 -9 -9 -9h
-9 -9 -9 -9 -9 -9h
-9 -9 -9h
-9 -9 -9 -9 -9h
.. /the type for an atom float. (remember negative means atom)
So to create the table that the question was asking about we simply use table notation:
q)table:([]x;y)
x y --------------------------------------------------------------------------------------------------------------------------------------------------------------- ,0.069488948909565806 ,0.61275040498003364 ,0.18058245000429451 0.074964647181332111 0.21553308074362576 0.98548699356615543 0.09629874653182.. 0.62432846496812999 0.34225802053697407 0.16392032452858984 0.7786109980661422 0.49375488748773932 0.48304210440255702 0.49691119533963501 0.259749417193233.. 0.53403982776217163 0.88706594565883279 0.81696920236572623 0.530728749930858.. 0.04861030331812799 0.13365717371925712 0.99836521386168897 0.152357474202290.. 0.36319364630617201 0.15776891494169831 0.45724554779008031 0.974702587584033.. 0.87366164615377784 0.31437360099516809 0.12580484454520047 0.635657876031473.. 0.56634279619902372 0.87585472338832915 0.78419443848542869 0.83294638991355896 0.78629159089177847 0.26546700927428901 0.12905376916751266 0.761838810984045.. 0.087665317580103874 0.23685944243334234 0.043390074744820595 0.8760851731058.. 0.10706076468341053 0.85431159776635468 0.012844447279348969 0.32782130455598235 0.72754358011297882 0.89185469760559499 0.651039635296911 0.48640229227021337 0.68811673740856349.. ,0.5765261803753674 ,0.28950302815064788
Done.
Now to make the structure that we want to make, we need to turn these lists into one list. That’s easily done with a verb raze. However, we will lose all information about the list of lists. It just flattens the whole thing.
q) raze x
0.69158544205129147 0.92801779927685857 0.95605599298141897 0.909228001954033..
So we need to zip up each element with it’s row number.
We can easily zip each row and row number like this:
q) (til count x),’x
(0;0.069488948909565806)
(1;0.18058245000429451)
(2;0.62432846496812999;0.34225802053697407;0.16392032452858984;0.778610998066..
(3;0.53403982776217163;0.88706594565883279;0.81696920236572623;0.530728749930..
(4;0.36319364630617201;0.15776891494169831;0.45724554779008031;0.974702587584..
(5;0.56634279619902372;0.87585472338832915;0.78419443848542869;0.832946389913..
(6;0.087665317580103874;0.23685944243334234;0.043390074744820595;0.8760851731..
(7;0.32782130455598235;0.72754358011297882)
(8;0.5765261803753674)
..
What we really want is to zip up each item with it’s row number.
q) (til count x),”x
,(0;0.069488948909565806)
,(1;0.18058245000429451)
((2;0.62432846496812999);(2;0.34225802053697407);(2;0.16392032452858984);(2;0..
((3;0.53403982776217163);(3;0.88706594565883279);(3;0.81696920236572623);(3;0..
((4;0.36319364630617201);(4;0.15776891494169831);(4;0.45724554779008031);(4;0..
((5;0.56634279619902372);(5;0.87585472338832915);(5;0.78419443848542869);(5;0..
((6;0.087665317580103874);(6;0.23685944243334234);(6;0.043390074744820595);(6..
((7;0.32782130455598235);(7;0.72754358011297882))
,(8;0.5765261803753674)
.. /notice that the row number is now next to each element
Now we can simply raze this and get
q) raze (til count x),”x
0 0.069488948909565806
1 0.18058245000429451
2 0.62432846496812999
2 0.34225802053697407
2 0.16392032452858984
2 0.7786109980661422
3 0.53403982776217163
3 0.88706594565883279
3 0.81696920236572623
..
Because we need to account for the fact that these list can have variable size, it’s best if we give them also an element number this will also be necessary for creating the table.
Remembering that we can find the element number by using the idiom til count we add the elements place in the list with the following
q) (til each count each x),”x
,(0;0.069488948909565806)
,(0;0.18058245000429451)
((0;0.62432846496812999);(1;0.34225802053697407);(2;0.16392032452858984);(3;0..
((0;0.53403982776217163);(1;0.88706594565883279);(2;0.81696920236572623);(3;0..
((0;0.36319364630617201);(1;0.15776891494169831);(2;0.45724554779008031);(3;0..
((0;0.56634279619902372);(1;0.87585472338832915);(2;0.78419443848542869);(3;0..
((0;0.087665317580103874);(1;0.23685944243334234);(2;0.043390074744820595);(3..
((0;0.32782130455598235);(1;0.72754358011297882))
,(0;0.5765261803753674)
If we add back our row numbers we get
q)(til count x),”(til each count each x),”x
,(0;0;0.069488948909565806)
,(1;0;0.18058245000429451)
((2;0;0.62432846496812999);(2;1;0.34225802053697407);(2;2;0.16392032452858984..
((3;0;0.53403982776217163);(3;1;0.88706594565883279);(3;2;0.81696920236572623..
((4;0;0.36319364630617201);(4;1;0.15776891494169831);(4;2;0.45724554779008031..
((5;0;0.56634279619902372);(5;1;0.87585472338832915);(5;2;0.78419443848542869..
((6;0;0.087665317580103874);(6;1;0.23685944243334234);(6;2;0.0433900747448205..
((7;0;0.32782130455598235);(7;1;0.72754358011297882))
,(8;0;0.5765261803753674)
.. /the first number is the row number, the second the element number and finally the element itself
If we now raze this entire mess. We get a list of lists of three elements.
q)raze (til count x),”(til each count each x),”x
0 0 0.069488948909565806
1 0 0.18058245000429451
2 0 0.62432846496812999
2 1 0.34225802053697407
2 2 0.16392032452858984
2 3 0.7786109980661422
3 0 0.53403982776217163
..
Now we simply create a table from this normalized list. let’s call it xn
q)xn:raze (til count x),”(til each count each x),”x
q)tx:([rid:xn[;0];eid:xn[;1]]x:xn[;2])
q) tx
rid eid| x -------| -------------------- 0 0 | 0.069488948909565806 1 0 | 0.18058245000429451 2 0 | 0.62432846496812999 2 1 | 0.34225802053697407 2 2 | 0.16392032452858984 2 3 | 0.7786109980661422 3 0 | 0.53403982776217163 3 1 | 0.88706594565883279 3 2 | 0.81696920236572623 3 3 | 0.53072874993085861 3 4 | 0.49852972221560776 3 5 | 0.081481670029461384
Let’s pretend this is a cooking show and we have done all the previous steps to y.
Here is our y table
rid eid| y -------| -------------------- 0 0 | 0.61275040498003364 1 0 | 0.074964647181332111 1 1 | 0.21553308074362576 1 2 | 0.98548699356615543 1 3 | 0.096298746531829238 1 4 | 0.60473616444505751 1 5 | 0.98754768050275743 1 6 | 0.46964235557243228 2 0 | 0.49375488748773932 2 1 | 0.48304210440255702 2 2 | 0.49691119533963501
Now we simply need to join these two tables. Q makes this really easy with union join (uj) this will automatically join on the key columns if they are the same and add nulls where the data is missing.
q)t:tx uj ty
Some pretty clever stuff happens underneath. First the row ids are matched up and then the element ids are matched the columns will be matched up. If the second table has missing data it will be filled in with missing values. If one of the keys is missing in the first table then the second table rows will be appended to the end.
t now looks like:
rid eid| x y -------| ----------------------------------------- 0 0 | 0.069488948909565806 0.61275040498003364 1 0 | 0.18058245000429451 0.074964647181332111 2 0 | 0.62432846496812999 0.49375488748773932 2 1 | 0.34225802053697407 0.48304210440255702 2 2 | 0.16392032452858984 0.49691119533963501 2 3 | 0.7786109980661422 0.25974941719323397 3 0 | 0.53403982776217163 0.04861030331812799 3 1 | 0.88706594565883279 0.13365717371925712 3 2 | 0.81696920236572623 0.99836521386168897 3 3 | 0.53072874993085861 0.15235747420229018 3 4 | 0.49852972221560776 0.55927985697053373 3 5 | 0.081481670029461384 0.5587915435899049 3 6 | 0.022724361391738057 0.73704647365957499 3 7 | 0.99410609365440905 0.14246862730942667 3 8 | 0.70526488753966987
Because Q will append the missing values from the second table to the end we need to resort the rid column so that we can easily find the rows that go together as in the original structure. This also explains why the only missing rows appear in the y column. Lets sort the rid.
q) `rid xasc `t
rid eid| x y -------| ----------------------------------------- 0 0 | 0.069488948909565806 0.61275040498003364 1 0 | 0.18058245000429451 0.074964647181332111 1 1 | 0.21553308074362576 1 2 | 0.98548699356615543 1 3 | 0.096298746531829238 1 4 | 0.60473616444505751 1 5 | 0.98754768050275743 1 6 | 0.46964235557243228 2 0 | 0.62432846496812999 0.49375488748773932 2 1 | 0.34225802053697407 0.48304210440255702 2 2 | 0.16392032452858984 0.49691119533963501 2 3 | 0.7786109980661422 0.25974941719323397 2 4 | 0.4275001569185406 2 5 | 0.032995089888572693 2 6 | 0.83006249205209315 3 0 | 0.53403982776217163 0.04861030331812799 3 1 | 0.88706594565883279 0.13365717371925712 3 2 | 0.81696920236572623 0.99836521386168897 3 3 | 0.53072874993085861 0.15235747420229018 3 4 | 0.49852972221560776 0.55927985697053373 3 5 | 0.081481670029461384 0.5587915435899049 3 6 | 0.022724361391738057 0.73704647365957499 3 7 | 0.99410609365440905 0.14246862730942667 3 8 | 0.70526488753966987
We are done. We have preserved all the information of the original and normalized the Xs and Ys into a single list.
The original table
here is row 6 two different ways:
q) table 6
x| 0.1797088 0.8024157 0.5423326 y| 0.1165928 0.3872393 0.850714 0.04278559 0.6582815
q) select from table where i=6
x y ------------------------------------------------------------------------------- 0.1797088 0.8024157 0.5423326 0.1165928 0.3872393 0.850714 0.04278559 0.6582815
In the new table we can get it easily only through sql select
q)select from t where rid=6
rid eid| x y -------| -------------------- 6 0 | 0.1797088 0.1165928 6 1 | 0.8024157 0.3872393 6 2 | 0.5423326 0.850714 6 3 | 0.04278559 6 4 | 0.6582815
Getting back to the original table from this new refactored table is much easier:
q)select x:enlist x, y:enlist y by rid from t
rid| x y ---| ----------------------------------------------------------------------------------------------------- 0 | 0.09087531 0.6808 0.03380431 0.8703531 0.8429108 0.4163949 1 | 0.4949937 0.5467487 0.5798376 2 | 0.3858707 0.2949814 0.8767788 0.1909843 0.6298909 0.2454277 3 | 0.4772341 0.3541479 0.3746654 0.1032089 0.2343539 0.5393872 0.5799004 4 | 0.5165465 0.1865312 0.8763466 0.4432449 0.3670384 5 | 0.6212155 0.6579 0.163703 0.8951596 0.1463084 6 | 0.1797088 0.8024157 0.5423326 0.1165928 0.3872393 0.850714 0.04278559 0.6582815 7 | 0.7720778 0.7476806 0.8778866 0.9120672 0.955969 0.3724551 0.7092033 0.5509018 0.180806 8 | 0.5743992 0.01259725 0.7083142 0.3574173 0.06719808 0.9434847 0.60344
However, our new vectorized table allows us to find data about these individual values and we can use it to zip up x and y values that might belong together or do any other fun calculations.