Is the given Graph a tree? Faster than below approach –

Issue

I was given a question during an interview and although my answer was accepted at the end they wanted a faster approach and I went blank..

Question :

Given an undirected graph, can you see if it’s a tree? If so, return true and false otherwise.

A tree:

A - B
|
C - D

not a tree:

A
/ \
B - C
/
D

You’ll be given two parameters: n for number of nodes, and a multidimensional array of edges like such: [[1, 2], [2, 3]], each pair representing the vertices connected by the edge.

Note:Expected space complexity : O(|V|)
The array edges can be empty

Here is My code: 105ms

def is_graph_tree(n, edges):
nodes = [None] * (n + 1)

for i in range(1, n+1):
nodes[i] = i

for i in range(len(edges)):
start_edge = edges[i]
dest_edge = edges[i]

if nodes[start_edge] != start_edge:
start_edge = nodes[start_edge]

if nodes[dest_edge] != dest_edge:
dest_edge = nodes[dest_edge]

if start_edge == dest_edge:
return False

nodes[start_edge] = dest_edge

return len(edges) <= n - 1

Solution

You could approach this from the perspective of tree leaves. Every leaf node in a tree will have exactly one edge connected to it. So, if you count the number of edges for each nodes, you can get the list of leaves (i.e. the ones with only one edge).

Then, take the linked node from these leaves and reduce their edge count by one (as if you were removing all the leaves from the tree. That will give you a new set of leaves corresponding to the parents of the original leaves. Repeat the process until you have no more leaves.

[EDIT] checking that the number of edges is N-1 eliminiates the need to do the multi-root check because there will be another discrepancy (e.g. double link, missing node) in the graph if there are multiple ‘roots’ or a disconnected subtree

If the graph is a tree, this process should eliminate all nodes from the node counts (i.e. they will all be flagged as leaves at some point).

Using the Counter class (from collections) will make this relatively easy to implement:

from collections import Counter

def isTree(N,E):
if N==1 and not E: return True                # root only is a tree
if len(E) != N-1:  return False               # a tree has N-1 edges
counts  = Counter(n for ab in E for n in ab)  # edge counts per node
if len(counts) != N : return False            # unlinked nodes
while True:
leaves = {n for n,c in counts.items() if c==1} # new leaves
if not leaves:break
for a,b in E:                             # subtract leaf counts
if counts[a]>1 and b in leaves: counts[a] -= 1
if counts[b]>1 and a in leaves: counts[b] -= 1
for n in leaves: counts[n] = -1           # flag leaves in counts
return all(c==-1 for c in counts.values())    # all must become leaves

output:

G = [[1,2],[1,3],[4,5],[4,6]]
print(isTree(6,G)) # False (disconnected sub-tree)

G = [[1,2],[1,3],[1,4],[2,3],[5,6]]
print(isTree(6,G)) # False (doubly linked node 3)

G = [[1,2],[2,6],[3,4],[5,1],[2,3]]
print(isTree(6,G)) # True

G = [[1,2],[2,3]]
print(isTree(3,G)) # True

G = [[1,2],[2,3],[3,4]]
print(isTree(4,G)) # True

G = [[1,2],[1,3],[2,5],[2,4]]
print(isTree(6,G)) # False (missing node)

Space complexity is O(N) because the counts dictionary has one entry per node(vertex) with an integer as value. Time complexity will be O(ExL) where E is the number of edges and L is the number of levels in the tree. The worts case time is O(E^2) for a tree where all parents have only one child node. However, since the initial condition is for E to be less than V, the worst case will actually be O(V^2)

Note that this algorithm makes no assumption on edge order or numerical relationships between node numbers. The root (last node to be made a leaf) found by this algorithm is not necessarily the only possible root given that, unless the nodes have an implicit cardinality relationship (or edges have an order), there could be ambiguous scenarios:

[1,2],[2,3],[2,4]  could be:

1                 2            3
|_2        OR     |_1     OR   |_2
|_3             |_3            |_1
|_4             |_4            |_4

If a cardinality relationship between node numbers or an order of edges can be relied upon, the algorithm could potentially be made more time efficient (because we could easily determine which node is the root and start from there).

[EDIT2] Alternative method using groups.

When the number of edges is N-1, if the graph is a tree, all nodes should be reachable from any other node. This means that, if we form groups of reachable nodes for each node and merge them together based on the edges, we should end up with a single group after going through all the edges.

Here is the modified function based on that approach:

def isTree(N,E):
if N==1 and not E: return True                # root only is a tree
if len(E) != N-1:  return False               # a tree has N-1 edges
groups = {n:[n] for ab in E for n in ab}      # each node in its own group
if len(groups) != N : return False            # unlinked nodes
for a,b in E:
groups[a].extend(groups[b])               # merge groups
for n in groups[b]: groups[n] = groups[a] # update nodes' groups
return len(set(map(id,groups.values()))) == 1 # only one group when done

Given that we start out with fewer edges than nodes and that group merging will consume at most 2x a group size (so also < N), the space complexity will remain O(V). The time complexity will also be O(V^2) at for the worts case scenarios