Tables full of Lists or Not

I am taking a brief break from the tutorial series (I’m down the rabbit hole on adverbs and I haven’t sneaked/snuck back out).

While on the Aquaq Group I got the following message: (check out the opensource torq)

I’m trying to build a table that looks like

now                   y2
———————————-
1 8                     1 9
2 5 6                  5 13 7

This is a table with list of float as a column type. I manage to get something by doing the following

table:([] now:enlist `float$(); y2:enlist `float$())
.[`table;();,;((1;20;1);(1;7;9))]
.[`table;();,;((1;8);(1;9))]

I get

now                   y2
—————————————–
`float$()             `float$()
1 20 1               1 7 9
1 8                    1 9

Is there a way to create the table without the `float$()? Thanks.
I came up with some solutions:

The first just load the data in from your set

q)now: 2.0+til 4

q)y2:1.0*til 4

flip `now`y2!(enlist now; enlist y2)

Or initialize it the way it was done in the question and then immediately delete everything.

q)delete from `table where i>0

But the question got me thinking, that there was something wrong with the data model.

It seems unlikely that the best way to handle data is with lists of lists especially in q.

The reason is simple. simple lists in q/k are really fast. they are contiguous blocks of memory. A list of lists are essentially a list of pointers to smaller contiguous blocks. So to test this I created the data-structure that I think would accomplish everything the questioner wanted:

First let’s create some dummy data and put it into the form of the questioner

q) x: ?[;1.0] each 1+100?9
q)y: ?[;1.0] each 1+100?9

?[;1.0] each 1+100?9 this code is two blocks. The first ?[;1.0] is a function that will return a list of random floats between 0 and 1 based on the number.

For example (til 10)!(10?1.0) This is just a dictionary where the key is a number from 1 to 10 and the value is the list of random floats.

0| 0.50994288921356201
1| 0.90591901540756226
2| 0.033651269972324371
3| 0.35632792115211487
4| 0.93293070793151855
5| 0.40784171223640442
6| 0.84914475679397583
7| 0.95187389850616455
8| 0.30074122548103333
9| 0.42668652534484863

The second part 1+100?9 create a 100 element list of random numbers between 1-9.

The each adverb applies the first function (which is a projection) onto each of the elements in this list.

Which gives us a list of lists of floats.

We can see this easily by applying the verbs count and type.

q) count x
100
q)count each x
1 1 4 9 6 4 7 2 1 8 6 9 4 5 7 1 2 6 4 7 8 4 8 2 2 9 4 5 9 2 9 6 3 5 1 2 9 6 2.. /random numbers in fact the same ones generated by the original command 1+100?9
q)(count each) each x
,1
,1
1 1 1 1
1 1 1 1 1 1 1 1 1
1 1 1 1 1 1
1 1 1 1
..

Similarly we can do the same with type

q) type x
0h /a mixed list
q)type each x
9 9 9 9 9 9 9 9 9 9 9 .. /the type for a list of floats
q)(count each) each x
-9 -9 -9 -9 -9 -9 -9 -9h
-9 -9 -9h
-9 -9 -9h
-9 -9 -9 -9 -9 -9 -9h
-9 -9 -9 -9h
-9 -9 -9 -9 -9 -9 -9h
-9 -9 -9 -9 -9 -9 -9h
-9 -9 -9 -9 -9 -9h
-9 -9 -9h
-9 -9 -9 -9 -9h
.. /the type for an atom float. (remember negative means atom)

So to create the table that the question was asking about we simply use table notation:

q)table:([]x;y)

x                                                                               y                                                                              
---------------------------------------------------------------------------------------------------------------------------------------------------------------
,0.069488948909565806                                                           ,0.61275040498003364                                                           
,0.18058245000429451                                                            0.074964647181332111 0.21553308074362576 0.98548699356615543 0.09629874653182..
0.62432846496812999 0.34225802053697407 0.16392032452858984 0.7786109980661422  0.49375488748773932 0.48304210440255702 0.49691119533963501 0.259749417193233..
0.53403982776217163 0.88706594565883279 0.81696920236572623 0.530728749930858.. 0.04861030331812799 0.13365717371925712 0.99836521386168897 0.152357474202290..
0.36319364630617201 0.15776891494169831 0.45724554779008031 0.974702587584033.. 0.87366164615377784 0.31437360099516809 0.12580484454520047 0.635657876031473..
0.56634279619902372 0.87585472338832915 0.78419443848542869 0.83294638991355896 0.78629159089177847 0.26546700927428901 0.12905376916751266 0.761838810984045..
0.087665317580103874 0.23685944243334234 0.043390074744820595 0.8760851731058.. 0.10706076468341053 0.85431159776635468 0.012844447279348969                   
0.32782130455598235 0.72754358011297882                                         0.89185469760559499 0.651039635296911 0.48640229227021337 0.68811673740856349..
,0.5765261803753674                                                             ,0.28950302815064788

Done.

Now to make the structure that we want to make, we need to turn these lists into one list. That’s easily done with a verb raze. However, we will lose all information about the list of lists. It just flattens the whole thing.

q) raze x
0.69158544205129147 0.92801779927685857 0.95605599298141897 0.909228001954033..

So we need to zip up each element with it’s row number.

We can easily zip each row and row number like this:

q) (til count x),’x
(0;0.069488948909565806)
(1;0.18058245000429451)
(2;0.62432846496812999;0.34225802053697407;0.16392032452858984;0.778610998066..
(3;0.53403982776217163;0.88706594565883279;0.81696920236572623;0.530728749930..
(4;0.36319364630617201;0.15776891494169831;0.45724554779008031;0.974702587584..
(5;0.56634279619902372;0.87585472338832915;0.78419443848542869;0.832946389913..
(6;0.087665317580103874;0.23685944243334234;0.043390074744820595;0.8760851731..
(7;0.32782130455598235;0.72754358011297882)
(8;0.5765261803753674)
..
What we really want is to zip up each item with it’s row number.
q) (til count x),”x
,(0;0.069488948909565806)
,(1;0.18058245000429451)
((2;0.62432846496812999);(2;0.34225802053697407);(2;0.16392032452858984);(2;0..
((3;0.53403982776217163);(3;0.88706594565883279);(3;0.81696920236572623);(3;0..
((4;0.36319364630617201);(4;0.15776891494169831);(4;0.45724554779008031);(4;0..
((5;0.56634279619902372);(5;0.87585472338832915);(5;0.78419443848542869);(5;0..
((6;0.087665317580103874);(6;0.23685944243334234);(6;0.043390074744820595);(6..
((7;0.32782130455598235);(7;0.72754358011297882))
,(8;0.5765261803753674)
.. /notice that the row number is now next to each element

Now we can simply raze this and get

q) raze (til count x),”x
0 0.069488948909565806
1 0.18058245000429451
2 0.62432846496812999
2 0.34225802053697407
2 0.16392032452858984
2 0.7786109980661422
3 0.53403982776217163
3 0.88706594565883279
3 0.81696920236572623
..

Because we need to account for the fact that these list can have variable size, it’s best if we give them also an element number this will also be necessary for creating the table.

Remembering that we can find the element number by using the idiom til count we add the elements place in the list with the following

q) (til each count each x),”x
,(0;0.069488948909565806)
,(0;0.18058245000429451)
((0;0.62432846496812999);(1;0.34225802053697407);(2;0.16392032452858984);(3;0..
((0;0.53403982776217163);(1;0.88706594565883279);(2;0.81696920236572623);(3;0..
((0;0.36319364630617201);(1;0.15776891494169831);(2;0.45724554779008031);(3;0..
((0;0.56634279619902372);(1;0.87585472338832915);(2;0.78419443848542869);(3;0..
((0;0.087665317580103874);(1;0.23685944243334234);(2;0.043390074744820595);(3..
((0;0.32782130455598235);(1;0.72754358011297882))
,(0;0.5765261803753674)

If we add back our row numbers we get

q)(til count x),”(til each count each x),”x
,(0;0;0.069488948909565806)
,(1;0;0.18058245000429451)
((2;0;0.62432846496812999);(2;1;0.34225802053697407);(2;2;0.16392032452858984..
((3;0;0.53403982776217163);(3;1;0.88706594565883279);(3;2;0.81696920236572623..
((4;0;0.36319364630617201);(4;1;0.15776891494169831);(4;2;0.45724554779008031..
((5;0;0.56634279619902372);(5;1;0.87585472338832915);(5;2;0.78419443848542869..
((6;0;0.087665317580103874);(6;1;0.23685944243334234);(6;2;0.0433900747448205..
((7;0;0.32782130455598235);(7;1;0.72754358011297882))
,(8;0;0.5765261803753674)
.. /the first number is the row number, the second the element number and finally the element itself

If we now raze this entire mess. We get a list of lists of three elements.

q)raze (til count x),”(til each count each x),”x
0 0 0.069488948909565806
1 0 0.18058245000429451
2 0 0.62432846496812999
2 1 0.34225802053697407
2 2 0.16392032452858984
2 3 0.7786109980661422
3 0 0.53403982776217163
..

Now we simply create a table from this normalized list. let’s call it xn

q)xn:raze (til count x),”(til each count each x),”x
q)tx:([rid:xn[;0];eid:xn[;1]]x:xn[;2])
q) tx

rid eid| x                   
-------| --------------------
0   0  | 0.069488948909565806
1   0  | 0.18058245000429451 
2   0  | 0.62432846496812999 
2   1  | 0.34225802053697407 
2   2  | 0.16392032452858984 
2   3  | 0.7786109980661422  
3   0  | 0.53403982776217163 
3   1  | 0.88706594565883279 
3   2  | 0.81696920236572623 
3   3  | 0.53072874993085861 
3   4  | 0.49852972221560776 
3   5  | 0.081481670029461384

Let’s pretend this is a cooking show and we have done all the previous steps to y.

Here is our y table

rid eid| y                   
-------| --------------------
0   0  | 0.61275040498003364 
1   0  | 0.074964647181332111
1   1  | 0.21553308074362576 
1   2  | 0.98548699356615543 
1   3  | 0.096298746531829238
1   4  | 0.60473616444505751 
1   5  | 0.98754768050275743 
1   6  | 0.46964235557243228 
2   0  | 0.49375488748773932 
2   1  | 0.48304210440255702 
2   2  | 0.49691119533963501

Now we simply need to join these two tables. Q makes this really easy with union join (uj) this will automatically join on the key columns if they are the same and add nulls where the data is missing.

q)t:tx uj ty
Some pretty clever stuff happens underneath. First the row ids are matched up and then the element ids are matched the columns will be matched up. If the second table has missing data it will be filled in with missing values. If one of the keys is missing in the first table then the second table rows will be appended to the end.

t now looks like:

rid eid| x                    y                   
-------| -----------------------------------------
0   0  | 0.069488948909565806 0.61275040498003364 
1   0  | 0.18058245000429451  0.074964647181332111
2   0  | 0.62432846496812999  0.49375488748773932 
2   1  | 0.34225802053697407  0.48304210440255702 
2   2  | 0.16392032452858984  0.49691119533963501 
2   3  | 0.7786109980661422   0.25974941719323397 
3   0  | 0.53403982776217163  0.04861030331812799 
3   1  | 0.88706594565883279  0.13365717371925712 
3   2  | 0.81696920236572623  0.99836521386168897 
3   3  | 0.53072874993085861  0.15235747420229018 
3   4  | 0.49852972221560776  0.55927985697053373 
3   5  | 0.081481670029461384 0.5587915435899049  
3   6  | 0.022724361391738057 0.73704647365957499 
3   7  | 0.99410609365440905  0.14246862730942667 
3   8  | 0.70526488753966987

Because Q will append the missing values from the second table to the end we need to resort the rid column so that we can easily find the rows that go together as in the original structure. This also explains why the only missing rows appear in the y column. Lets sort the rid.

q) `rid xasc `t

rid eid| x                    y                   
-------| -----------------------------------------
0   0  | 0.069488948909565806 0.61275040498003364 
1   0  | 0.18058245000429451  0.074964647181332111
1   1  |                      0.21553308074362576 
1   2  |                      0.98548699356615543 
1   3  |                      0.096298746531829238
1   4  |                      0.60473616444505751 
1   5  |                      0.98754768050275743 
1   6  |                      0.46964235557243228 
2   0  | 0.62432846496812999  0.49375488748773932 
2   1  | 0.34225802053697407  0.48304210440255702 
2   2  | 0.16392032452858984  0.49691119533963501 
2   3  | 0.7786109980661422   0.25974941719323397 
2   4  |                      0.4275001569185406  
2   5  |                      0.032995089888572693
2   6  |                      0.83006249205209315 
3   0  | 0.53403982776217163  0.04861030331812799 
3   1  | 0.88706594565883279  0.13365717371925712 
3   2  | 0.81696920236572623  0.99836521386168897 
3   3  | 0.53072874993085861  0.15235747420229018 
3   4  | 0.49852972221560776  0.55927985697053373 
3   5  | 0.081481670029461384 0.5587915435899049  
3   6  | 0.022724361391738057 0.73704647365957499 
3   7  | 0.99410609365440905  0.14246862730942667 
3   8  | 0.70526488753966987

We are done. We have preserved all the information of the original and normalized the Xs and Ys into a single list.

The original table

here is row 6 two different ways:

q) table 6

x| 0.1797088 0.8024157 0.5423326
y| 0.1165928 0.3872393 0.850714 0.04278559 0.6582815

q) select from table where i=6

x                             y                                                
-------------------------------------------------------------------------------
0.1797088 0.8024157 0.5423326 0.1165928 0.3872393 0.850714 0.04278559 0.6582815

In the new table we can get it easily only through sql select

q)select from t where rid=6

rid eid| x         y         
-------| --------------------
6   0  | 0.1797088 0.1165928 
6   1  | 0.8024157 0.3872393 
6   2  | 0.5423326 0.850714  
6   3  |           0.04278559
6   4  |           0.6582815

Getting back to the original table from this new refactored table is much easier:

q)select x:enlist x, y:enlist y by rid from t

rid| x                                                   y                                                
---| -----------------------------------------------------------------------------------------------------
0  | 0.09087531 0.6808 0.03380431 0.8703531 0.8429108    0.4163949                                        
1  | 0.4949937                                           0.5467487 0.5798376                              
2  | 0.3858707 0.2949814 0.8767788                       0.1909843 0.6298909 0.2454277                    
3  | 0.4772341 0.3541479                                 0.3746654 0.1032089 0.2343539 0.5393872 0.5799004
4  | 0.5165465 0.1865312 0.8763466                       0.4432449 0.3670384                              
5  | 0.6212155 0.6579                                    0.163703 0.8951596 0.1463084                     
6  | 0.1797088 0.8024157 0.5423326                       0.1165928 0.3872393 0.850714 0.04278559 0.6582815
7  | 0.7720778 0.7476806 0.8778866 0.9120672             0.955969 0.3724551 0.7092033 0.5509018 0.180806  
8  | 0.5743992 0.01259725 0.7083142                      0.3574173 0.06719808 0.9434847 0.60344

However, our new vectorized table allows us to find data about these individual values and we can use it to zip up x and y values that might belong together or do any other fun calculations.

	pt.instafollowfast.c… on Backtracking In Q/ word l…
	Alex on The Is-Ought Distinction
	Fred Mleczko on How the Mighty have Fallen, or…
	pindash91 on A Better Way To Load Data into…
	Andrew Stanton (@and… on A Better Way To Load Data into…

iabdb

It's All Been Done Before.

Tables full of Lists or Not

Leave a comment Cancel reply

Share this:

Related

Leave a comment Cancel reply