# Streaming algorithm for counting triangles in a graph

As described in the reference, the algorithm (see at the bottom) supposes to output an estimator $$\hat T$$ for the # of triangles in a given graph $$G = (V, E)$$, denoted $$T$$. It is written that "it can easily be shown" that $$E[\hat T] = T$$ But unfortunately, I’m not seeing it. Trying to analyze $$E[\hat T]$$, I think as follows:

• At line 1, denote the probability to randomly (and uniformly) choose an edge which is part of a triangle as $$p$$. Since triangles can share edges, $$\frac T m \le p \le \frac {3T} m$$ For example, consider the following case:

The central triangle doesn’t add new edges to the # of possibilities to choose an edge which is part of a triangle. You can imagine a different configuration, in which there are only the 3 outer triangles and they don’t touch each other (in this configuration, we won’t see the central 4th triangle). In both cases ((case i) 4 triangles as seen in the image; (case ii) 3 disjoint triangles), the probability to choose an edge which is part of a triangle is 1 (although the # of triangles is different).

• At line 2, the probability to choose uniformly at random a vertex which "closes a triangle" with the edge from the previous step is exactly $$\frac 1 {n-2}$$.

Therefore I only see that

$$T \le E[\hat T] \le 3T$$

What am I missing?

Another question I have is regarding line 3. The stream is ordered, and we first pick a random edge $$(u, v)$$ (line 1), then a random vertex $$w$$ from $$V \backslash \{u, v\}$$ (line 2). I feel that the analysis should take into account that at line 3 we check whether $$(u, w)$$ and $$(v, w)$$ appear after $$(u, v)$$ in the stream. Maybe after I’ll understand the answer to my first question, it will be clearer.

Algorithm:

1. Pick an edge $$(u, v)$$ uniformly at random from the stream.
2. Pick a vertex $$w$$ uniformly at random from $$V \backslash \{u, v\}$$
3. If $$(u, w)$$ and $$(v, w)$$ appear after $$(u, v)$$ in the stream, then output $$m(n-2)$$. Else, output $$0$$.

Also, although I didn’t see it written, I believe there’s an assumption that $$V$$ is known ahead.

Reference: Data streams lecture notes by Prof. Amit Chakrabarti, section "15.3 Triangle Counting", https://www.cs.dartmouth.edu/~ac/Teach/data-streams-lecnotes.pdf

Best regards