As described in the reference, the algorithm (see at the bottom) supposes to output an estimator $ \hat T$ for the # of triangles in a given graph $ G = (V, E)$ , denoted $ T$ . It is written that "it can easily be shown" that $ $ E[\hat T] = T $ $ But unfortunately, I’m not seeing it. Trying to analyze $ E[\hat T]$ , I think as follows:
-
At line 1, denote the probability to randomly (and uniformly) choose an edge which is part of a triangle as $ p$ . Since triangles can share edges, $ $ \frac T m \le p \le \frac {3T} m $ $ For example, consider the following case:

The central triangle doesn’t add new edges to the # of possibilities to choose an edge which is part of a triangle. You can imagine a different configuration, in which there are only the 3 outer triangles and they don’t touch each other (in this configuration, we won’t see the central 4th triangle). In both cases ((case i) 4 triangles as seen in the image; (case ii) 3 disjoint triangles), the probability to choose an edge which is part of a triangle is 1 (although the # of triangles is different).
-
At line 2, the probability to choose uniformly at random a vertex which "closes a triangle" with the edge from the previous step is exactly $ \frac 1 {n-2}$ .
Therefore I only see that
$ $ T \le E[\hat T] \le 3T $ $
What am I missing?
Another question I have is regarding line 3. The stream is ordered, and we first pick a random edge $ (u, v)$ (line 1), then a random vertex $ w$ from $ V \backslash \{u, v\}$ (line 2). I feel that the analysis should take into account that at line 3 we check whether $ (u, w)$ and $ (v, w)$ appear after $ (u, v)$ in the stream. Maybe after I’ll understand the answer to my first question, it will be clearer.
Algorithm:
- Pick an edge $ (u, v)$ uniformly at random from the stream.
- Pick a vertex $ w$ uniformly at random from $ V \backslash \{u, v\}$
- If $ (u, w)$ and $ (v, w)$ appear after $ (u, v)$ in the stream, then output $ m(n-2)$ . Else, output $ 0$ .
Also, although I didn’t see it written, I believe there’s an assumption that $ V$ is known ahead.
Reference: Data streams lecture notes by Prof. Amit Chakrabarti, section "15.3 Triangle Counting", https://www.cs.dartmouth.edu/~ac/Teach/data-streams-lecnotes.pdf
Best regards