Calculate amount of FLOPs for an eigenvalue problem solver

I’ve got two complex, non-symmetric matrices, $A_{1000 \times 1000}$ and $B_{1000 \times 1000}$, and I am using Matlab to compute their eigenvalues (with functions like eig or eigs). The two matrices differ: one is denser and the other has more complex values. To compare the cost of the eigenvalue solve for the two matrices, I would like to count the FLOPs needed for this procedure. Of course, it is possible to measure the time the eigenvalue solver needs to complete its task, but that measurement is very noisy, since background processes can perturb it.

In Matlab there is no function that would let me get the FLOP count for eigs, but I could use other software, since the only things I need are the matrices $A, B$, which can be exported. Does anyone have an idea how I could reach my goal?
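For a rough order of magnitude, an analytical estimate can stand in for a measured count. The figures below are assumptions, not measurements: a commonly quoted textbook estimate for the dense unsymmetric QR algorithm is about $10 n^3$ real flops when only eigenvalues are wanted, and a complex multiply-add costs about 8 real flops instead of 2, hence the factor of 4.

$$
\text{flops}_{\text{real}} \approx 10\,n^3 = 10 \cdot 1000^3 = 10^{10},
\qquad
\text{flops}_{\text{complex}} \approx 4 \cdot 10\,n^3 = 4 \times 10^{10}.
$$

This estimate depends only on $n$, so by itself it would not distinguish $A$ from $B$. The actual count also depends on how many iterations the solver needs for each matrix, which this sketch does not capture.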

How to measure FLOPS for CUDA code?

I have a naive matrix-matrix multiplication code in CUDA, which has a complexity of O(n^3). I’m trying to measure the performance of my code with:

FLOPS = 2*N^3/(Time in kernel)

I tested with a 2000*2000 matrix and it took 0.000035 seconds. From this, the performance would be 457 TFLOPS, while the theoretical peak performance of my card is less than 5 TFLOPS. I know this value simply doesn’t make sense. The following is how I measured the wall time of my kernel:

    get_walltime(&t_start);
    GPU_multi<<<blocksPerGrid, threadsPerBlock>>>(dev_A, dev_B, dev_C, MAX);
    get_walltime(&t_end);
    walltime = t_end - t_start;

Does this mean my code is skipping the calculation at some point? Or am I measuring the performance of the program the wrong way? I printed the first 10*10 section of the resulting matrix and the values seem reasonable.

Minimum number of flip-flops for the given sequence


We want to design a synchronous counter that counts the sequence 0-1-0-2-0-3 and then repeats. The minimum number of flip-flops required to implement this counter is________

In my view, the answer should be 2, since the maximum value in the sequence is 3, and 3 can be represented with only 2 bits.

But my solution manual says the answer should be 4.

I can’t really fathom this answer given by my manual.

Am I right? If not, why am I getting the wrong answer?

Thanks in advance!