How can I optimize a program’s performance when no profiling tools are available?

I am currently working on an OpenGL program whose performance I would like to improve. Performance is acceptable, though not ideal, on powerful dedicated GPUs, but it is abysmal on integrated graphics (< 10 fps). In a normal program (CPU-based, no OpenGL or other GPU API), I would run a profiler (perhaps the one built into CLion), see where most of the time is spent, and then either work on a better algorithm for those areas or find a way to reduce how often that code is called.

Using this technique on my OpenGL program shows that the vast majority of the time (~86%) on the main thread (the one that I want to optimize) is spent in the OpenGL driver’s .so file. Additionally, the program’s CPU usage while running is very low, but GPU usage hovers between 95% and 100%. Taken together, these pieces of information tell me that the bottleneck is on the GPU, so that is where I should optimize.

This is where the problem arises. My normal technique of using a profiler to guide my optimizations won’t work without a GPU-specific profiler. As such, I did some research to find a profiler that will tell me where GPU processing time is being spent. I could not find anything remotely usable. Everything was Windows-only (I run Linux exclusively, and my program isn’t ported to Windows yet, nor will it be until it is much further along), no longer updated, or far more expensive than this project’s budget allows, and in some cases more than one of these.

As such, I ask: how can I optimize my program’s performance when the relevant profiler does not exist? I tried guessing where the issues are and optimizing based on that. However, it made no difference whatsoever to the frame rate, even though I was able to confirm that my optimization (frustum culling) cut the amount of work sent to the GPU by about half. A good answer will give a profiling technique that is applicable to OpenGL on Linux, or a technique that works without a profiler.
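
For reference, frustum culling of this kind is usually a bounding-volume test against the six planes of the view frustum. Below is a minimal sketch, assuming bounding spheres and plane normals that point into the frustum. `Vec3`, `Plane`, `Sphere`, and `sphereInFrustum` are hypothetical names chosen for illustration, not code from the program in question.

```cpp
#include <array>

// Hypothetical types for illustration only.
struct Vec3 { float x, y, z; };

// Plane in the form dot(n, p) + d = 0, with n pointing into the frustum
// (an assumption of this sketch). n should be normalized so that the
// result of signedDistance is in the same units as the sphere radius.
struct Plane { Vec3 n; float d; };

struct Sphere { Vec3 center; float radius; };

// Signed distance from point p to the plane; positive means "inside".
inline float signedDistance(const Plane& pl, const Vec3& p) {
    return pl.n.x * p.x + pl.n.y * p.y + pl.n.z * p.z + pl.d;
}

// Returns false only when the sphere lies entirely behind at least one
// plane. The test is conservative: a sphere near a frustum corner may be
// reported as visible even though it is not, which is harmless for culling.
inline bool sphereInFrustum(const std::array<Plane, 6>& frustum,
                            const Sphere& s) {
    for (const Plane& pl : frustum) {
        if (signedDistance(pl, s.center) < -s.radius) {
            return false;  // completely outside this plane: cull
        }
    }
    return true;  // inside or intersecting: draw
}
```

Objects for which `sphereInFrustum` returns false are skipped before any draw call is issued, which is how the amount of submitted work was reduced.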