Most discussions of RNNs and LSTMs allude to the varying ability of different architectures to capture long-term dependencies. However, most demonstrations use generated text to show the absence of long-term dependencies in a vanilla RNN.

Is there any way to explicitly measure the long-term dependency structure of a given trained RNN, much like the ACF and PACF of a given ARMA time series?
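To make the analogy concrete, here is a minimal numpy sketch of the kind of measure I have in mind on the time-series side: the sample ACF of a simulated AR(1) process (the series, the coefficient `phi`, and the helper `acf` are all illustrative, not part of any RNN measure).

```python
import numpy as np

rng = np.random.default_rng(1)
n, phi = 5000, 0.7

# Simulate an AR(1) series: x_t = phi * x_{t-1} + noise
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.normal()

def acf(x, lag):
    """Sample autocorrelation at the given lag."""
    xc = x - x.mean()
    return np.dot(xc[:-lag], xc[lag:]) / np.dot(xc, xc) if lag else 1.0

print([round(acf(x, k), 2) for k in range(5)])
```

For an AR(1) process the theoretical ACF at lag $k$ is $\phi^k$, so the printed values should decay roughly like $0.7^k$; I am looking for something with this kind of per-lag interpretability, but for a trained RNN.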

I am currently looking at the Frobenius norm of the gradients of the memory $s_k$ with respect to the inputs $x_k$, summed over the training examples $\{x^i\}_{i=1}^N$: $\sum_{i=1}^N \big\|\frac{\partial s_k}{\partial x_k}(x^i)\big\|_F$. I would like to know whether there are more refined or more widely used alternatives.
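To illustrate what I mean (extended across lags, i.e. $\partial s_T / \partial x_k$ for $k < T$ rather than only the same-index gradient), here is a minimal numpy sketch. A vanilla tanh RNN with random weights stands in for a trained model; all dimensions and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d_in, d_h = 15, 3, 8  # sequence length, input dim, hidden dim (illustrative)

# Random weights standing in for a trained vanilla RNN
W_x = rng.normal(scale=1.0 / np.sqrt(d_in), size=(d_h, d_in))
W_h = rng.normal(scale=1.0 / np.sqrt(d_h), size=(d_h, d_h))
x = rng.normal(size=(T, d_in))

# Forward pass: s_{t+1} = tanh(W_x x_t + W_h s_t), with s_0 = 0
s = np.zeros((T + 1, d_h))
for t in range(T):
    s[t + 1] = np.tanh(W_x @ x[t] + W_h @ s[t])

def jacobian_norm(k):
    """||d s_T / d x_k||_F via backpropagation through time.

    Chain rule: d s_T / d x_k = (prod_{t=k+1}^{T-1} D_t W_h) D_k W_x,
    where D_t = diag(1 - s_{t+1}^2) is the tanh derivative at step t.
    """
    J = np.diag(1.0 - s[k + 1] ** 2) @ W_x      # d s_{k+1} / d x_k
    for t in range(k + 1, T):
        J = np.diag(1.0 - s[t + 1] ** 2) @ W_h @ J
    return np.linalg.norm(J)                     # Frobenius norm

norms = [jacobian_norm(k) for k in range(T)]
for k, n in enumerate(norms):
    print(f"k={k:2d}  ||d s_T / d x_k||_F = {n:.3e}")
```

Plotting these norms against the lag $T - k$ (and averaging over examples, as in the sum above) gives a per-lag "memory profile"; for a vanilla RNN one typically expects the norms to decay as the lag grows, which is the vanishing-gradient picture.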

Thank you very much!