Smallest Circuit for Square of Sparse Symmetric Matrix

I have an n by n symmetric matrix, and I would like to compute its square in as small a circuit complexity as possible. It’s sparse: there are sqrt(n) nonzero entries in each row/column, so the input will be sparse by the output will be nearly a density of 1. A normal sparse multiplication algorithm would take O(n^2) time which is clearly optimal, but the circuit depth for any sparse representation I can think of is at least O(n^3). (The act of “accessing a given location in memory” is not efficient in circuits.)

Using a fast matrix multiplication method like Coppersmith Winograd would bring me to O(n^~2.4). I know I can use the symmetry to reduce by a constant factor, but I figure not more than that. I’m hoping that the fact that it’s squaring (and not multiplying two different matrices) can reduce the exponent, but I don’t know how — I know that it can cut my work down a bit because the first iteration of C-W will have some identical results. I can’t find any way to use the sparsity at all.

Numerical precision isn’t an issue for me; the input is a 0-1 matrix.

In terms of practical purposes, I particularly care about n~3000, but am generally curious about asymptotically good answers too. (Just saying this to explain that very huge constant factors are a big negative for me.) Any references very much appreciated!