Assuming IO is not an issue, is saving intermediate results considered a best practice? What are the pros and cons, and situations that warrant doing so or not?
Say I have two components along a long pipeline,
others--> Component_1 --> Component_2 --> others.
I can either save the output from Component_1, pass the path to Component_2, and have Component_2 read and process it from there, or I can return the output from Component_1 and pass it directly to Component_2.
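To make the two hand-off styles concrete, here is a minimal sketch in Python. It assumes the intermediate data is JSON-serializable; the functions component_1 and component_2, their toy logic, and the file name component_1_output.json are all hypothetical stand-ins for the real pipeline stages.

```python
import json
import tempfile
from pathlib import Path


def component_1(raw):
    # Hypothetical first stage: doubles every value.
    return [x * 2 for x in raw]


def component_2(data):
    # Hypothetical second stage: sums the values.
    return sum(data)


def run_in_memory(raw):
    # Return the output of component_1 and pass it on directly.
    return component_2(component_1(raw))


def run_via_file(raw, workdir):
    # Save the output of component_1 and hand over only the path.
    path = Path(workdir) / "component_1_output.json"  # assumed file name
    path.write_text(json.dumps(component_1(raw)))
    # component_2's side: read from the path, then process.
    data = json.loads(path.read_text())
    return component_2(data)


# Both styles should produce the same result for the same input.
with tempfile.TemporaryDirectory() as workdir:
    same = run_in_memory([1, 2, 3]) == run_via_file([1, 2, 3], workdir)
```

With the file-based version, a test for component_2 can start from a saved file without rerunning component_1, which is the convenience the first pro below refers to.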
This is for data-processing tasks where the server process itself runs continuously, but each item of user input triggers a single run through the pipeline, and that run completes before the server retrieves the next item from the input queue.
Pros of saving:
1. Makes testing and debugging a bit easier? If I don’t want to rerun Component_1, I don’t have to save its output myself in my test/debug code before working on it. A debugger that saves all intermediate data can do that as well, of course, but because it saves everything, it might take a while to run.
2. Makes debugging easier when actual runs fail, for the same reason as the previous point.
Cons of saving:
1. Performance hit, but we assume it is negligible here.
2. Having to move all intermediate files to trash somewhere during/at the end of runs.
3. Having a separate debug folder with a unique ID tag for each run, but that’s usually necessary anyway, if only to store output that the UI retrieves and presents.
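One possible way to handle the cleanup and per-run folder points above is a small context manager. This is only a sketch under assumptions: the name run_folder, the keep flag, and the policy of keeping the folder when a run fails and deleting it when the run succeeds are all hypothetical choices, not an established practice.

```python
import shutil
import tempfile
import uuid
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def run_folder(base, keep=False):
    # Create base/<unique id>/ for a single pass through the pipeline.
    folder = Path(base) / uuid.uuid4().hex
    folder.mkdir(parents=True)
    try:
        yield folder
    except Exception:
        # Leave the intermediate files in place so the failed run can be inspected.
        raise
    else:
        # Successful run: remove the intermediates unless asked to keep them.
        if not keep:
            shutil.rmtree(folder)


# Usage: intermediates live only for the duration of a successful run.
with tempfile.TemporaryDirectory() as base:
    with run_folder(base) as folder:
        (folder / "component_1_output.json").write_text("[2, 4, 6]")
```

Whether intermediates from successful runs should also be kept (keep=True here) depends on how often old runs need to be inspected and how much disk space is available.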