I have a very large dataset (31,552,000 lines) of xyz coordinates in the following format:

`1 2 3 4 5 6 7 8 9 . . . `

I have to compute distances using the custom method below, which accounts for periodic boundary conditions in a box of side 40.

```mathematica
Distance[{a_, b_, c_}, {d_, e_, f_}] :=
  Sqrt[
    If[Abs[a - d] >= 40/2, Abs[a - d] - 40, Abs[a - d]]^2 +
    If[Abs[b - e] >= 40/2, Abs[b - e] - 40, Abs[b - e]]^2 +
    If[Abs[c - f] >= 40/2, Abs[c - f] - 40, Abs[c - f]]^2
  ]
```
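As an aside, if I read the `If` branches correctly this is the minimum-image distance for a periodic box of side 40. Assuming all coordinates lie in [0, 40], the same metric can be written more compactly (one sketch of an equivalent form) by wrapping each component difference into [-20, 20) with `Mod`:

```mathematica
(* sketch: minimum-image distance for a periodic box of side 40,
   assuming both points have coordinates in [0, 40] *)
pbcDistance[p1_, p2_] := Sqrt@Total[(Mod[p1 - p2 + 20, 40] - 20)^2]
```

Because `Mod` threads over lists, this form also applies directly to whole arrays of differences, not just single pairs.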

Then I import the data.

```mathematica
data = Partition[Partition[ReadList["input.txt", {Real, Real, Real}], 16], 38];
```
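One quick sanity check on the nesting (assuming the file holds exactly 29,000 × 38 × 16 coordinate rows) is to inspect the array shape:

```mathematica
(* expect {29000, 38, 16, 3}: timestep, molecule, atom, coordinate *)
Dimensions[data]
```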

The formatting is somewhat unusual: every 16 rows is one molecule, and every 38 molecules is one timestep. For each timestep, I take the distance between the 16th atom of each molecule and the 5th atom of every molecule. Then I select the distances that are at most 5.55 and determine the length of the resulting list. This is repeated for each of the 29,000 timesteps.

```mathematica
analysis = Flatten[
   Table[
    Table[
     Length[Select[
       Table[Distance[data[[r, y, 16]], data[[r, x, 5]]], {x, 1, 38}],
       # <= 5.55 &]],
     {y, 1, 38}],
    {r, 1, 29000}]];
```

This last section is the most computationally intensive part. For 29,000 timesteps and 38 molecules, it takes 40 minutes to run fully, and it uses too much memory (16+ GB per kernel) to parallelize. Is there another method that would improve the performance? I have tried `Compile`, but I realized that `Table`, the biggest bottleneck, is already compiled to machine code.
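One direction that may help, sketched below under the assumption of the same 40-unit periodic box and data layout: replace the innermost per-pair `Table` calls with whole-array arithmetic on packed arrays, so each timestep's 38×38 distance matrix is built in a few vectorized operations instead of 38×38 scalar calls. (`analysis2` and its intermediates are names I introduce here, not from the original code.)

```mathematica
analysis2 = Flatten@Table[
    Module[{a16, a5, diff, dist},
      a16 = data[[r, All, 16]];  (* 38 x 3: atom 16 of each molecule *)
      a5  = data[[r, All, 5]];   (* 38 x 3: atom 5 of each molecule *)
      (* all 38 x 38 pairwise component differences, wrapped to the
         minimum image in [-20, 20) *)
      diff = Mod[Outer[Subtract, a16, a5, 1] + 20, 40] - 20;
      (* 38 x 38 distance matrix, then a count per reference molecule *)
      dist = Sqrt[Total[diff^2, {3}]];
      Count[#, d_ /; d <= 5.55] & /@ dist
    ],
    {r, Length[data]}];
```

On a small sample this can be checked against the original code with `analysis2 == analysis`; if it agrees, the remaining outer `Table` over timesteps is cheap by comparison and should also need far less memory per kernel, since only one timestep's 38×38×3 difference array is alive at a time.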

Below is an example dataset that takes my computer 2 minutes to process with the analysis code. It scales to more timesteps by changing 4000 to a larger number.

```mathematica
data = Partition[Partition[Partition[
     Table[RandomReal[{0, 40}], 3*16*38*4000], 3], 16], 38];
```