This question is the follow-up to this previous question.

## Background

Using this simulation I investigate a system in which enzymes proliferate in cells. During the replications of enzymes, parasites can come to be due to mutation. They can drive the system into extinction. I’m interested in where in the parameter space coexistence is possible.

I have made the changes advised by HoboProber. Namely correction of style and implementing the model relying on Numpy. So now the system is a 2-dimensional array. Cells are the rows of the array. The values of the first column are the numbers of enzymes and the values of the second column are the numbers of parasites.

## My request

The speed of the new implementation is already way better than that of the previous one. But as I would like to increase `population_size`

and `gen_max`

every bit of performance improvement counts.

So far I examined the system with population sizes ranging from 100 to 1000 cells and with the maximal number of generations being 10000. The amount of increase in population size depends on performance. A million cells would be a perfectly reasonable assumption concerning the modelled system. The maximal number of generations should be 20-30000.

- Primarily, does the code make use of vectorization and Numpy as effectively as it can? E.g. at which parts should I pay attention to C-ordering and F-ordering or should I at all?
- Could performance profit from making use of multithreading/multiprocessing in this case? E.g. when the proliferation of molecules happens, the replication events in different cells are independent of each other. They could happen in parallel maybe.
- Could performance profit from making use of static typing and compiling? E.g. using Cython or Numba.

Of course any advice is highly appreciated! (E.g. on writing data to file more effectively.)

## The code

`# -*- coding: utf-8 -*- """ Collect data on an enzyme-parasite system explicitly assuming compartmentalization. Functions --------- simulation() Simulate mentioned system. write_out_file() Write data to csv output file. """ import csv import time import numpy as np def simulation(population_size, cell_size, replication_rate_p, mutation_rate, gen_max): """ Simulate an enzyme-parasite system explicitly assuming compartmentalization. Parameters ---------- population_size : int The number of cells. cell_size : int The maximal number of replicators of cells at which cell division takes place. replication_rate_p : int or float The fitness (replication rate) of the parasites relative to the fitness (replication rate) of the enzymes. Example ------- $ replication_rate_p = 2 This means that the parasites' fitness is twice as that of the enzymes. mutation_rate : int or float The probability of mutation during a replication event. gen_max : int The maximal number of generations. A generation corresponds to one outer while cycle. If the system extincts, the number of generations doesn't reach gen_max. Yield ------- generator object Contains data on the simulated system. """ def fitness(population): """ Calculate fitnesses of cells. Fitness of a cell = number of enzymes/(number of enzymes + number of parasites) Parameter --------- population : ndarray The system itself. Return ------ ndarray The fitness of each cell of the system. """ return population[:, 0]/population.sum(axis=1) def population_stats(population): """ Calculate statistics of the system. Parameter --------- population : ndarray The system itself. Return ------- tuple Contains statistics of the simulated system. """ gyak_sums = population.sum(axis=0) gyak_means = population.mean(axis=0) gyak_variances = population.var(axis=0) gyak_percentiles_25 = np.percentile(population, 25, axis=0) gyak_medians = np.median(population, axis=0) gyak_percentiles_75 = np.percentile(population, 75, axis=0) fitness_list = fitness(population) return ( gyak_sums[0], gyak_sums[1], (population[:, 0] > 1).sum(), gyak_means[0], gyak_variances[0], gyak_percentiles_25[0], gyak_medians[0], gyak_percentiles_75[0], gyak_means[1], gyak_variances[1], gyak_percentiles_25[1], gyak_medians[1], gyak_percentiles_75[1], fitness_list.mean(), fitness_list.var(), np.percentile(fitness_list, 25), np.median(fitness_list), np.percentile(fitness_list, 75) ) # Creating the system with the starting state being # half full cells containing only enzymes. population = np.zeros((population_size, 2), dtype=int) population[:, 0] = int(cell_size//2) gen = 0 yield (gen, *population_stats(population), population_size, cell_size, mutation_rate, replication_rate_p, "aft") print(f"N = {population_size}, rMax = {cell_size}, " f"aP = {replication_rate_p}, U = {mutation_rate}") while population.size > 0 and gen < gen_max: gen += 1 # Replicator proliferation until cell_size in each cell. while np.any(population.sum(axis=1) < cell_size): # Calculating probabilites of choosing a parasite to replication. repl_probs_p = population[population.sum(axis=1) < cell_size].copy() repl_probs_p[:, 1] *= replication_rate_p repl_probs_p = repl_probs_p[:, 1]/repl_probs_p.sum(axis=1) # Determining if an enzyme or a parasite replicates, # and if an enzyme replicates, will it mutate to a parasite. # (Outcome can differ among cells. Parasites don't mutate.) repl_choices = np.random.random_sample(repl_probs_p.shape[0]) mut_choices = np.random.random_sample(repl_probs_p.shape[0]) lucky_replicators = np.zeros(repl_probs_p.shape[0], dtype=int) lucky_replicators[ (repl_choices < repl_probs_p) | (mut_choices < mutation_rate) ] = 1 population[population.sum(axis=1) < cell_size, lucky_replicators] += 1 if gen % 100 == 0: yield (gen, *population_stats(population), population_size, cell_size, mutation_rate, replication_rate_p, "bef") # Each cell divides. new_population = np.empty_like(population) new_population[:, 0] = np.random.binomial(population[:, 0], 0.5) new_population[:, 1] = np.random.binomial(population[:, 1], 0.5) population -= new_population # Discarding dead cells. population = np.concatenate([population[population[:, 0] > 1, :], new_population[new_population[:, 0] > 1, :]]) # Choosing survivor cells according to their fitnesses # if there are more viable cells than population_size. # Hence population_size or less cells move on to the next generation. if (population.size > 0) & (population.shape[0] > population_size): fitness_list = fitness(population) fitness_list = fitness_list/fitness_list.sum() population = population[np.random.choice(population.shape[0], population_size, replace=False, p=fitness_list), :] elif population.size == 0: for i in range(2): yield (gen+i, *(0, 0)*9, population_size, cell_size, mutation_rate, replication_rate_p, "aft") print(f"{gen} generations are done. Cells are extinct.") if (gen % 100 == 0) & (population.size > 0): yield (gen, *population_stats(population), population_size, cell_size, mutation_rate, replication_rate_p, "aft") if (gen % 1000 == 0) & (population.size > 0): print(f"{gen} generations are done.") def write_out_file(result, n_run): """ Write data to csv output file. Parameters ---------- result : generator object or list of generator objects Contains data on the simulated system. n_run : int The number of consecutive runs. """ local_time = time.strftime("%m_%d_%H_%M_%S_%Y", time.localtime(time.time())) with open("output_data_" + local_time + ".csv", "w", newline="") as out_file: out_file.write( "gen;" "eSzamSum;pSzamSum;alive;" "eSzamAtl;eSzamVar;eSzamAKv;eSzamMed;eSzamFKv;" "pSzamAtl;pSzamVar;pSzamAKv;pSzamMed;pSzamFKv;" "fitAtl;fitVar;fitAKv;fitMed;fitFKv;" "N;rMax;U;aP;boaSplit\n" ) out_file = csv.writer(out_file, delimiter=";") counter = 0 print(counter, "/", n_run) for i in result: out_file.writerows(i) counter += 1 print(counter, "/", n_run) RESULT = [simulation(100, 20, 1, 0, 10000)] RESULT.append(simulation(100, 20, 1, 1, 10000)) N_RUN = 2 write_out_file(RESULT, N_RUN) # Normally I call the functions from another script, # these last 4 lines are meant to be just an example. ``` `