CUDA memory issues when using a large PyTorch model and a TensorFlow model together

The PyTorch network is Tencent's DSFD:

and the TensorFlow network is WideResNet:

I’m using the former for face detection (it is very robust across poses) and the latter for gender detection. Both get instantiated in the same driver script. I am on Ubuntu 18.04 with PyTorch GPU and TensorFlow GPU installed as Conda packages in a conda env. I read on a Puget Systems blog that this is all that is needed with regard to CUDA and cuDNN installation, and the GPU does indeed get used, but running both models results in CUDA runtime out-of-memory errors.
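One thing I have considered is capping TensorFlow's allocator, since by default it reserves nearly all GPU memory when the session is created, leaving PyTorch almost nothing. A sketch for the TF 1.x-style API (the 0.4 fraction is an arbitrary starting point, not a tested value; `tf.compat.v1` is available from TF 1.13 onward, older 1.x versions use `tf.ConfigProto` directly):

```python
import tensorflow as tf

# TensorFlow grabs (almost) the whole GPU up front by default. Restrict it
# so the PyTorch detector can coexist on the same 8 GB card.
config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True                    # allocate on demand
config.gpu_options.per_process_gpu_memory_fraction = 0.4  # hard cap at ~40%
sess = tf.compat.v1.Session(config=config)

# If WideResNet is built with Keras, hand it this session:
# tf.compat.v1.keras.backend.set_session(sess)
```

In TF 2.x the equivalent knob is `tf.config.experimental.set_memory_growth` on each physical GPU device.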

My proposed workaround is to detect faces first, save the bounding boxes and face crops as .npz files, and then run gender detection in a separate process.
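The two-stage pipeline I have in mind looks roughly like this (file and key names are illustrative, not from my actual code):

```python
import numpy as np

# Stage 1 (detection script): after DSFD runs on one image, persist its
# bounding boxes and face crops. Crops can differ in size, so each one is
# stored under its own key rather than stacked into a single array.
boxes = np.array([[10, 20, 110, 140]], dtype=np.int32)  # (N, 4) x1, y1, x2, y2
crops = [np.zeros((120, 100, 3), dtype=np.uint8)]       # N face crops

np.savez(
    "faces_0001.npz",
    boxes=boxes,
    **{"crop_%d" % i: c for i, c in enumerate(crops)},
)

# Stage 2 (separate process, so only TensorFlow touches the GPU): reload
# the crops and feed them to the gender model.
data = np.load("faces_0001.npz")
loaded_boxes = data["boxes"]
loaded_crops = [data[k] for k in sorted(data.files) if k.startswith("crop_")]
```

Since the two stages run as separate processes, each framework gets the whole GPU to itself while it is active.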

It was working fine with a much smaller gender-detection model (Hassner and Levi's), but that model's accuracy was too low.

Would installing CUDA and cuDNN on the Ubuntu 18.04 server itself make a difference? I.e., would that give me a better chance of running the script with both models loaded? The GPU is a GTX 1080 with 8 GB of memory.