This is a rather standard interview question:

Suppose that you have points connected by edges. If the points are numbered, we can record which points are connected by writing out all the connected pairs:

(1,2)

(2,3)

(5,1)

(4,6)

We can also represent this graph as a matrix, where entry (i,j) is 1 if there is a connection between points i and j, and 0 otherwise. We call this matrix the adjacency matrix, since it tells us which points are connected.
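As a minimal sketch, the pairs listed above can be turned into such a matrix. The names edges, n and m are only for illustration, and the points are shifted down by one because q indexes from 0.

```q
/ edge list from the text, one pair per row (points are numbered from 1)
edges:(1 2;2 3;5 1;4 6)
n:max raze edges                  / highest point number, 6 here
/ start from an n x n matrix of zeros and set both (i;j) and (j;i) to 1,
/ subtracting 1 from each pair to get 0-based indices
m:{.[x;y-1;:;1]}/[(n;n)#0;edges,reverse each edges]
```

Here where m 0 should return 1 4, which are points 2 and 5, the two neighbors of point 1.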

Given such a matrix, we want to find all the connected components. A connected component is a set of nodes that can all be reached from one another by traveling along the edges; a node with no connections is a component of its own.

In the picture above, each of these graphs has only one connected component, but together there are three distinct connected components, since there is no way to get from one graph to another by traveling along an edge.

First I will show my original solution in KDB, then a slight improvement.

First note that the matrix must be symmetric, since if point 1 is connected to point 2, then point 2 is connected to point 1.

I wanted to write a function that would get all the points that were connected to a single point.

If I think of a particular row in this matrix, then all the 1s represent the neighbors of this point. If I then find all of their neighbors, and so on, I will have all the points that can possibly be reached.

First imagine I have a matrix, a:

To create it, I first create 100 zeros (100#0)

Then I pick six places to add connections, (6?100)

I replace the zeros with 1s.

I add this matrix to itself flipped, so that it is symmetric (a+flip a)

Multiply by .5, so that every entry is 0, .5, or 1 (.5*)

Then I cast it to a long, which rounds each .5 up to 1, so that they are all 1s or 0s (`long$)

q)a:`long$.5*a+flip a:10 10#@[100#0;6?100;:;1]

q)a

0 0 0 1 0 0 0 0 0 0

0 0 0 0 0 1 1 0 0 0

0 0 0 0 0 0 0 0 1 0

1 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 1 0 0

0 1 0 0 0 0 0 0 0 0

0 1 0 0 0 0 0 0 0 0

0 0 0 0 1 0 0 0 0 0

0 0 1 0 0 0 0 0 0 1

0 0 0 0 0 0 0 0 1 0
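To make the construction steps easier to follow, here is a small sketch on a 3x3 matrix with fixed positions instead of random ones, so the result is reproducible. The names z, b, sym and alt are hypothetical, and the element-wise max in the last line is an alternative I am assuming is acceptable here, not part of the original solution.

```q
z:@[9#0;1 5;:;1]        / nine zeros with 1s at flat positions 1 and 5 (fixed here, not random)
b:3 3#z                 / reshape into a 3x3 matrix: (0 1 0;0 0 1;0 0 0)
sym:`long$.5*b+flip b   / same pipeline as above: add the transpose, halve, cast back to long
alt:b|flip b            / assumption: element-wise max gives the same matrix without the float step
```

Both sym and alt should come out as (0 1 0;1 0 1;0 1 0), and sym~flip sym confirms the symmetry.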

Okay, given a, I can find the neighbors of any of the 10 points.

q)a[1]

0 0 0 0 0 1 1 0 0 0

q)where a[1]

5 6

So we get a list of points that are neighbors.

We need to create a function that will keep asking for neighbors and discard any duplicates.

We can find all the neighbors of the neighbors by selecting the neighbor rows of the matrix and then asking where for each row:

q)where each a 5 6

1

1

We get a list of neighbors.

We can then raze this list to flatten it:

q)raze where each a 5 6

1 1

Then we can ask for the distinct items:

q)distinct raze where each a 5 6

,1

Now we would like to add this list to our original neighbors and check if we can find more neighbors:

q)distinct raze where each a 5 6 1

1 5 6

Running this again we get:

q)distinct raze where each a 1 5 6

5 6 1

So there are no more neighbors to find; only the order changes, and repeating this step just flips back and forth between the two orderings.

Any time we need to apply a function repeatedly like this, we can make use of the adverb in q called over. With a one-argument function, over will apply the function until the results are the same for two successive iterations (or the result comes back to the initial value).
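To watch the convergence happen, scan can be swapped in for over, since it keeps every intermediate result. This is only a sketch on a hypothetical 4-point path graph 0-1-2-3; the names edges, p, f, steps and final are mine, and the expansion step also joins x so that points already found are kept.

```q
/ hypothetical path graph 0-1-2-3, built from its edge list
edges:(0 1;1 2;2 3)
p:{.[x;y;:;1]}/[4 4#0;edges,reverse each edges]
f:{distinct raze x,where each y x}[;p]   / one expansion step, with the matrix fixed to p
steps:f scan where p 0                   / scan keeps the start value and every new result
final:f over where p 0                   / over keeps only the converged result
```

Starting from the neighbors of point 0, steps should be (,1;1 0 2;1 0 2 3) and final should be 1 0 2 3: the list grows by one hop per iteration and stops once nothing new is added.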

So my function to find all connected points, with comments, looks like:

findAllConnected:{[i;m]
  neighbors: where m[i]; /find the initial list of neighbors
  f:{distinct raze x,where each y x}[;m]; /create the function where second argument is fixed to be the adjacency matrix
  f over neighbors} /run the function over the initial neighbors and keep adding neighbors until there are no more to add

/testing on a:

q)findAllConnected[1;a]

5 6 1

As expected this gives us all points connected to point 1, including point 1 itself.

To list all of the connected components we need to run this on each of the points:

q)findAllConnected[;a] each til count a

3 0

5 6 1

8 2 9

0 3

7 4

1 5 6

1 5 6

4 7

2 9 8

8 2 9

We get 10 rows specifying the connections. Now we need to find the unique sets.

The easiest way to do this is to sort each one and ask for the distinct lists:

q)distinct asc each findAllConnected[;a] each til count a

`s#0 3

`s#1 5 6

`s#2 8 9

`s#4 7

Which gives us 4 connected components.

We can look for the islands (a single node with no connections) by checking if there are any points not on any of these lists. If so, they must be unconnected to anything else.

q)connected:distinct asc each findAllConnected[;a] each til count a

q)islands:(til count a) except raze connected

q)islands

`long$()

q)count each connected /size of the components

We get an empty list, which means there are no islands to report.
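Since a has no islands, here is a small sketch of the same check on a hypothetical 3-point graph where point 2 has no edges. The function is a one-line restatement of findAllConnected so the example runs on its own; the matrix m is made up for illustration.

```q
/ findAllConnected as above, condensed so this sketch is self-contained
findAllConnected:{[i;m] f:{distinct raze x,where each y x}[;m]; f over where m i}
/ hypothetical graph: points 0 and 1 are connected, point 2 has no edges
m:(0 1 0;1 0 0;0 0 0)
connected:distinct asc each findAllConnected[;m] each til count m
islands:(til count m) except raze connected
```

Here islands should be the one-item list ,2. Note that connected should also carry an empty entry for point 2, since its neighbor search finds nothing.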

Problem solved.

For those who have followed, you can see that we are actually doing a lot of duplicate work, since we keep asking for neighbors for points that we have already seen. To solve this we simply keep track of how many neighbors we have found so far and only query for the neighbors we have yet to see.

Instead of x being a list of neighbors, it is now a list with two items. The first is the number of neighbors found so far and the second is the list of neighbors.

We can use dot(.) application of drop(_) to remove the neighbors we have seen so far from the neighbors list. (.[_;x]):

findAllConnectedFast:{[i;m]
  neighbors: where m[i];
  f:{n:raze where each y .[_;x]; /new neighbors
    x[0]:count x[1];x[1]:distinct x[1],n; /update the two pieces of x
    x}[;m]; /project this function on the matrix
  last f over (0; neighbors)} /apply the function on the neighbors, with 0 currently visited
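The .[_;x] step inside f can be tried on its own. A small sketch, assuming a state where two neighbors have already been expanded and three have been found; the names rest and same are only for illustration.

```q
x:(2;5 6 1)          / hypothetical state: (count already expanded; neighbors found so far)
rest:.[_;x]          / dot-apply passes the two items of x as the arguments of drop
same:x[0] _ x[1]     / the equivalent infix form, 2 _ 5 6 1
```

Both rest and same should be the one-item list ,1, the only neighbor whose row has not been queried yet.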

/running on a:

q)findAllConnectedFast[1;a]

5 6 1

/large sparse matrix to test with

q)c:7h$.5*c+flip c:1000 1000#@[1000000#0;1000?1000000;:;1]

/Time both on my 2012 macbook

q)\t distinct asc each findAllConnected[;c] each til count c

11960

q)\t distinct asc each findAllConnectedFast[;c] each til count c

1170

A 10x improvement. The keen observer will notice that we don't have to run the function on all the points in the first place: we can go down the points and skip all the points that we have already seen. Here is what that looks like:

allComponents:{[m]
  points:til count m;
  connected:();
  while[count points;
    i:first points;
    connected,:enlist n:`s#n:distinct asc i,findAllConnectedFast[i;m]; /add point in case it is island
    points:points except n];
  connected}

q)allComponents a

`s#0 3

`s#1 5 6

`s#2 8 9

`s#4 7

q)\t allComponents c

6

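This is roughly a 2000x speedup over the original solution (11960 ms down to 6 ms) and about 200x over the faster per-point version, because we are only calling the function the minimum number of times: once per connected component.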