How to reduce calculation time using ParallelDo instead of a Do loop?

I am trying to use ParallelDo together with ParametricNDSolveValue and NIntegrate.

The question is pretty similar to "how to apply ParallelDo to my code".

At first I used a Do loop and it works fine, but it takes a long time, so to reduce the computational time I want to use ParallelDo. However, I get an error message and I don't understand where it comes from.

Moreover, I intend to write the output data to a data file using

    file = OpenWrite["file1.dat", FormatType -> TableForm];

But I got the following error: OutputStream[file1.dat, 3] is not open.

Below is my minimal working code and some comments:

    l1 = 0.81;
    Z = 1500;
    x0 = 10;
    v0 = 0.02;
    \[Epsilon] = $MachineEpsilon;
    l0 = 0.0714`20.;

    ps = ParametricNDSolveValue[{y''[r] + 2 y'[r]/r == -4 \[Pi] l k Exp[-y[r]],
        y[\[Epsilon]] == y0, y'[\[Epsilon]] == 0,
        WhenEvent[r == 1, y'[r] -> y'[r] + Z l]},
       {y, y'}, {r, \[Epsilon], R}, {k, l},
       Method -> {"StiffnessSwitching"}, AccuracyGoal -> 5,
       PrecisionGoal -> 4, WorkingPrecision -> 15];

    file = OpenWrite["file1.dat", FormatType -> TableForm];

    ParallelDo[
      x = i x0;
      v = i^3 v0;
      R = Rationalize[v^(-1/3), 0];
      l = Rationalize[l1/(i x0), 0];
      nn = FindRoot[Last[ps[y0, l]][R], {y0, -1}, Evaluated -> False][[1, 2]];
      Tot = 4 \[Pi] nn NIntegrate[r^2 Exp[-First[ps[nn, l]][r]],
         {r, \[Epsilon], R}, PrecisionGoal -> 4];
      Print[NumberForm[i*1., 5], "  ", NumberForm[Tot, 5]];,
      {i, 292/100, 31/10, 1/100}] // Quiet // AbsoluteTiming

    Close[file];

Implementing a better algorithm for calculation of bacterial growth

I am working on a mathematical model that describes the growth of 4 different bacterial populations and immune system cells under certain conditions. The model is governed by the equations below.

POPULATION 1:

$ {\underbrace{\frac{dN_{PS}}{dt}}_{\text{ Population change rate }}=\underbrace{r N_{PS}}_{\text{ Exponential growth }}\cdot\underbrace{\left(1-\frac{N_{PS}+N_{PR}}{K}\right)}_{\text{ Growth limitation }}-\underbrace{\theta_{PS}N_{PS}}_{\text{ Natural death }}-\underbrace{A_{PS}N_{PS}}_{\text{Biofilm formation}}+\underbrace{D_{BS}N_{BS}}_{\text{Biofilm dispersion}}-\underbrace{\phi_{PS}N_{PS}}_{\text{Mutation rate}}-\underbrace{\eta\delta_{PS}A_{m}N_{PS}}_{\text{Antibiotic inhibition}}-\left\{ \underbrace{\Gamma N_{PS}I}_{\text{Immune system}}\right\} }$

POPULATION 2:

$ {\frac{dN_{BS}}{dt}=(r-c_{b})N_{BS}\cdot\left(1-\frac{N_{BS}+N_{BR}}{K}\right)-\theta_{BS}N_{BS}+A_{PS}N_{PS}-D_{BS}N_{BS}-\phi_{BS}N_{BS}-\eta\delta_{BS}A_{m}N_{BS}-\left\{ \Gamma N_{BS}I\right\} }$

POPULATION 3:

$ {\frac{dN_{PR}}{dt}=(r-c_{r})N_{PR}\cdot\left(1-\frac{N_{PS}+N_{PR}}{K}\right)-\theta_{PR}N_{PR}-A_{PR}N_{PR}+D_{BR}N_{BR}+\phi_{PS}N_{PS}-\eta\delta_{PR}A_{m}N_{PR}-\left\{ \Gamma N_{PR}I\right\} }$

POPULATION 4:

$ {\frac{dN_{BR}}{dt}=(r-(c_{b}+c_{r}))N_{BR}\cdot\left(1-\frac{N_{BS}+N_{BR}}{K}\right)-\theta_{BR}N_{BR}+A_{PR}N_{PR}-D_{BR}N_{BR}+\phi_{BS}N_{BS}-\eta\delta_{BR}A_{m}N_{BR}-\left\{ \Gamma N_{BR}I\right\} }$

NAIVE CELLS (IMMUNE SYSTEM):

$ {\frac{dV}{dt}=\frac{-\sigma VB}{\pi+B}}$

EFFECTOR CELLS (IMMUNE SYSTEM):

$ {{\frac{dE}{dt}=(2\sigma V+\sigma E)\cdot\frac{B}{\pi+B}-hE\left(1-\frac{B}{\pi+B}\right)}}$

MEMORY CELLS (IMMUNE SYSTEM):

$ {{\frac{dM}{dt}=fEh\left(1-\frac{B}{\pi+B}\right)}}$

TOTAL BACTERIAL DENSITY:

$ {\frac{dB}{dt}=N_{PS}+N_{PR}+N_{BS}+N_{BR}}$

IMMUNE SYSTEM DENSITY:

$ {\frac{dI}{dt}=V+E+M}$

ANTIBIOTIC UPTAKE 1:

$ {\eta=\begin{cases} 1 & t_{1}\leq t\leq t_{1}+t_{2}\\ 0 & t_{1}>t\ \text{or}\ t>t_{1}+t_{2} \end{cases}}$

ANTIBIOTIC UPTAKE 2:

$ {\eta=\begin{cases} 1 & B\geq\varOmega\\ 0 & B<\varOmega \end{cases}}$

I am interested in how $ N_{PS}, N_{PR}, N_{BS}, N_{BR}, V, E, M $ change over time, for which I implemented an algorithm in Python to solve the equations. Greek letters and the other parameters not described above (e.g. $ r $, $ K $) are mainly constants that are set at the beginning of the program's execution.

One example of the functions used is shown below. Currently, as you can see from the code, I am using the Euler method to solve the equations; however, I would like to implement at least Heun's method or even some higher-order Runge-Kutta method.

I am already stuck with Heun's method and do not know how to implement it. I am asking for help on how to modify the following code so that it uses, for example, Heun's method, if that is even possible with these equations.

    def sensitive_bacteria_PS(previous_density, growth_rate, density_PR, maximum_density_P,
                              death_rate_PS, attachment_rate_PS, mutation_rate_PS,
                              antibiotic_uptake, antibiotic_inhibition_PS, antibiotic_concentration,
                              lymphocyte_inhibition, total_immune_cells,
                              detachment_rate_BS, density_BS, time_step):
        return previous_density + ((previous_density * (growth_rate * (1 - (previous_density + density_PR) / maximum_density_P)
                                                        - death_rate_PS - attachment_rate_PS - mutation_rate_PS
                                                        - antibiotic_uptake * antibiotic_inhibition_PS * antibiotic_concentration
                                                        - lymphocyte_inhibition * total_immune_cells)
                                    + detachment_rate_BS * density_BS) * time_step)
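
What I am hoping for is something along these lines: a single Heun (improved Euler) step evaluates the right-hand side twice, once at the current state and once at the Euler prediction, and averages the two slopes. Below is a minimal sketch, assuming the update above is refactored into a pure derivative function; the names heun_step and sensitive_bacteria_PS_rate are mine, not from my code:

    def sensitive_bacteria_PS_rate(density_PS, growth_rate, density_PR, maximum_density_P,
                                   death_rate_PS, attachment_rate_PS, mutation_rate_PS,
                                   antibiotic_uptake, antibiotic_inhibition_PS, antibiotic_concentration,
                                   lymphocyte_inhibition, total_immune_cells,
                                   detachment_rate_BS, density_BS):
        """dN_PS/dt only: the same expression as above, without the Euler update."""
        return (density_PS * (growth_rate * (1 - (density_PS + density_PR) / maximum_density_P)
                              - death_rate_PS - attachment_rate_PS - mutation_rate_PS
                              - antibiotic_uptake * antibiotic_inhibition_PS * antibiotic_concentration
                              - lymphocyte_inhibition * total_immune_cells)
                + detachment_rate_BS * density_BS)


    def heun_step(f, y, time_step, *args):
        """One Heun (improved Euler) step for y' = f(y, *args)."""
        k1 = f(y, *args)                        # slope at the current state
        y_predict = y + time_step * k1          # Euler predictor
        k2 = f(y_predict, *args)               # slope at the predicted state
        return y + time_step * 0.5 * (k1 + k2)  # corrector: average of the two slopes

The main loop would then call heun_step(sensitive_bacteria_PS_rate, N_PS, time_step, growth_rate, density_PR, ...) in place of sensitive_bacteria_PS(...). Strictly speaking, because the populations are coupled, the predictor should advance the whole state vector at once and all right-hand sides should be re-evaluated at that predicted state, not one equation at a time.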

Cause of disparity between methods of calculation of specific angular momentum?

I’m trying to numerically solve for the delta-V cost of a transfer between two close elliptic orbits which are inclined relative to each other. The method I’m using essentially calculates the velocity vectors of the initial orbit at one node, the final orbit at the opposing node, and then calculates the transfer orbit from initial flight path angle, initial radius, and final radius.

One key step is to calculate the specific angular momentum vector and eccentricity vector of the transfer orbit, in order to calculate the perifocal-to-inertial direction cosine matrix for the transfer orbit. However, when I calculate the angular momentum vector h of the transfer orbit in the inertial frame from the cross product of the position and velocity vectors in the inertial frame, I find significant error (relative error is -3.9521e-8) between the magnitude of this vector and the scalar specific angular momentum calculated earlier in the code.

This is strange to me because that scalar angular momentum is used to calculate the velocity vector. I’m confused as to where the loss of precision is occurring.

I’ve tried providing inputs with greater precision, specifically the mu value I have been using, but this did not shift the relative error at all. When I use the same cross product method to calculate specific angular momentum of orbits 1 and 2, the error is on the order of machine precision.

Here is the MATLAB code:

    mu = 3.98600437823e+14;

    thetaNT = -55.1582940061466; % deg
    eT = 0.022905923178296;
    aT = 7.243582592195826e+06; % m

    r1A = 7.146263097977215e+06; % m
    v1RA = -1.390985544431790e+02; % m/s
    v1ThetaA = 7.494958913236144e+03; % m/s

    eR1 = [0.355828643065080;-0.934551216774375;0];
    eTheta1 = [0.934551216774375;0.355828643065080;0];

    nCpf1 = [0.263190394679355,-0.840751409136755,0.473146789255815;
        0.880932410956014,0.00949753358184791,-0.473146789255815;
        0.393305102275257,0.541338032000730,0.743144825477394];
    nCpf2 = [0.107314578042381,-0.875080710676727,0.471929370924401;
        0.879361618777851,-0.137938482815824,-0.455736896003458;
        0.463903788257849,0.463903788257849,0.754709580222772];

    v1A = sqrt(v1RA^2 + v1ThetaA^2); % Total speed of orbit 1 at A
    r1A = K1;

    hT = sqrt(aT*mu*(1-eT^2)); % Specific angular momentum of transfer orbit

    eRTB = [-cosd(thetaNT);sind(thetaNT+180);0];
    eThetaTB = [-sind(thetaNT+180);-cosd(thetaNT);0];

    % Calculation of radial speed and tangential speed
    vTRA = mu/hT*eT*sind(thetaNT);
    vTThetaA = mu/hT*(1+eT*cosd(thetaNT));

    vTA = sqrt(vTRA^2+vTThetaA^2);

    vTRB = mu/hT*eT*sind(thetaNT+180);
    vTThetaB = mu/hT*(1-eT*cosd(thetaNT));

    % Conversion of radius and speeds into radius and velocity vectors
    % in perifocal frames
    r1APF1 = r1A.*eR1;
    v1APF1 = v1RA.*eR1 + v1ThetaA.*eTheta1;

    vTBPFT = vTRB.*eRTB + vTThetaB.*eThetaTB;

    v2BPF2 = v2RB.*eR2 + v2ThetaB.*eTheta2;

    % Conversion to inertial reference frame
    r1AN = nCpf1*r1APF1;
    v1AN = nCpf1*v1APF1;

    v2BN = nCpf2*v2BPF2;

    rTAN = r1AN;
    vTAN = v1AN.*(vTA/v1A);

    % Calculation of angular momentum and eccentricity vectors in
    % inertial frame
    hTN = cross(rTAN, vTAN);
    diff = (norm(hTN)-hT)/hT


I would expect diff to be on the order of machine precision, roughly 2.2e-16, but it is instead on the order of 1e-8, 8 orders of magnitude larger than it should be.
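
For clarity, the two quantities whose relative difference diff measures are, restating the standard two-body relations used in the code,

$ h_T = \sqrt{\mu\, a_T \left(1 - e_T^{2}\right)}, \qquad \lVert \mathbf{h} \rVert = \lVert \mathbf{r} \times \mathbf{v} \rVert = r\, v_{\theta}, $

where $ v_{\theta} $ is the component of the velocity perpendicular to the position vector, so the cross-product check is sensitive only to how the transverse component of vTAN comes out.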

Factorial calculation UI freezes when loading many items into a ListBox

I have been struggling with this performance issue for about 3 days and really need your help. I am a novice at C# and cannot work out what I can do to optimize my application.

The problem is that the UI is not as responsive as it should be while many items are loading.

I decided to write an application, for learning purposes, that calculates the factorial of a large input number. It is also important that the calculations run concurrently on several threads.

I also want to bind a progress bar to the calculation progress and show the interim calculations.

For example: Input number: 5

Interim calculations:

    0! = 1
    1! = 1
    2! = 2
    3! = 6
    4! = 24
    5! = 120

Result: 5!=120

Everything is fine when the input number is less than 1000. If it is larger, the UI starts to freeze.

What I’ve tried:

  1. At first I tried to write my own observable concurrent collection to which I add items from other threads.
  2. Then I tried to push the interim results, which are calculated on other threads, into a ConcurrentQueue and dequeue them from there into my observable collection (sketched below, after this list).
  3. I tried turning on virtualization, but that did not help either. Actually, I am fairly sure the problem is that the collection my ItemsSource is bound to keeps growing, and when I add more elements to it the ListBox redraws completely.
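
To make point 2 concrete, here is the pattern I mean as a minimal, purely illustrative sketch (written in Python rather than C#, with made-up helper names): worker threads enqueue interim results in coarse batches and a single consumer drains whatever is currently available, which is what the timer in my view model is supposed to do.

    import queue
    import threading

    results = queue.Queue()  # plays the role of the ConcurrentQueue

    def worker(start, stop, step):
        """Compute factorials for one interleaved slice and enqueue them as one batch."""
        batch = {}
        for i in range(start, stop, step):
            f = 1
            for k in range(2, i + 1):
                f *= k
            batch[i] = f
        results.put(batch)  # one coarse-grained enqueue per worker, not one per item

    def drain():
        """Consumer side (the timer callback): take everything currently queued."""
        drained = {}
        while True:
            try:
                drained.update(results.get_nowait())
            except queue.Empty:
                break
        return drained

    threads = [threading.Thread(target=worker, args=(s, 21, 4)) for s in range(1, 5)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(sorted(drain().items()))  # interim results 1! .. 20!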

CalculationProcessView:

<UserControl x:Class="FactorialCalculator.WPF.Views.CalculationProcessView"          xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"          xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"          xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"           xmlns:d="http://schemas.microsoft.com/expression/blend/2008"           xmlns:local="clr-namespace:FactorialCalculator.WPF.Views"          mc:Ignorable="d"           d:DesignHeight="450" d:DesignWidth="800">  <StackPanel>     <Label Content="Calculation:"/>      <Border BorderBrush="Gainsboro" BorderThickness="0,2">         <ListBox              x:Name="CalculationProcessListBox"             Height="100"             ItemsSource="{Binding Path=Calculations, Mode=OneWay, IsAsync=True}"             VirtualizingPanel.IsContainerVirtualizable="True"             VirtualizingPanel.IsVirtualizing="True"             VirtualizingPanel.VirtualizationMode="Recycling"             VirtualizingPanel.CacheLengthUnit="Page"             VirtualizingPanel.CacheLength="2,2">              <ListBox.Template>                 <ControlTemplate>                     <ScrollViewer>                         <ItemsPresenter />                     </ScrollViewer>                 </ControlTemplate>             </ListBox.Template>              <ListBox.ItemsPanel>                 <ItemsPanelTemplate>                     <VirtualizingStackPanel />                 </ItemsPanelTemplate>             </ListBox.ItemsPanel>          </ListBox>     </Border>      <ProgressBar Minimum="0"                  Maximum="{Binding ProgressBarMaxValue}"                  Value="{Binding Calculations.Count, Mode=OneWay}"                  Height="28" /> </StackPanel> 

CalculatorFormViewModel:

namespace FactorialCalculator.WPF.ViewModels {     public class CalculatorFormViewModel : BaseViewModel, ICalculatorFormViewModel     {         private string _progressBarMaxValue;         private ICommand _calculateCommand;         private IModelFactory _modelFactory;         private SystemTimer _dispatcherTimer;         private ICalculationModel _calculation;         private ICommandFactory<RelayCommand> _commandFactory;         private IFactorialCalculationService _factorialCalculationService;         private static object _syncLock = new object();          public CalculatorFormViewModel(Dispatcher uiDispatcher,                                        IModelFactory modelFactory,                                        ICommandFactory<RelayCommand> commandFactory,                                        IFactorialCalculationService factorialCalculationService) : base(uiDispatcher)         {             _modelFactory = modelFactory;             _commandFactory = commandFactory;             _factorialCalculationService = factorialCalculationService;             _calculation = _modelFactory.CreateCalculationModel();             Calculations = new ConcurrentSortedList<ICalculationModel>();             BindingOperations.EnableCollectionSynchronization(Calculations, _syncLock);         }          public ConcurrentSortedList<ICalculationModel> Calculations { get; set; }          public string ProgressBarMaxValue         {             get { return _progressBarMaxValue; }             set             {                 _progressBarMaxValue = value;                 NotifyPropertyChanged("ProgressBarMaxValue");             }         }          public ICalculationModel Calculation         {             get             {                 return _calculation;             }             set             {                 _calculation = value;                 NotifyPropertyChanged("Calculation");             }         }          #region ICalculatorFormViewModel implementation          public ICommand CalculateCommand         {             get             {                 return _calculateCommand ??                   
(                       _calculateCommand                             = _commandFactory.CreateCommand(                                  _ => this.Calculate().FireAndForgetSafeAsync(null))                   );             }         }          public async Task Calculate()         {             if (_factorialCalculationService.IsInPorcess)             {                 _factorialCalculationService.StopCalculation();             }              if (BigInteger.TryParse(Calculation.InputNumber, out BigInteger number))             {                 try                 {                     Calculations.Clear();                      ProgressBarMaxValue = Calculation.InputNumber;                     StartMonitoringCalculationProgress();                      await _factorialCalculationService.StartCalculation(number);                 }                 finally                 {                     StopMonitoringCalculationProgress();                 }             }             else             {                 throw new ArgumentException("Incorrect input value!");             }         }          #endregion          protected void StartMonitoringCalculationProgress()         {             _dispatcherTimer = new SystemTimer() { Interval = 500 };              _dispatcherTimer.Elapsed += (o, args) =>             {                  if (_factorialCalculationService                             .TryGetInterimResults(out IDictionary<BigInteger, BigInteger> interimResults))                 {                     lock (_syncLock)                     {                         Calculations.AddItems(                             interimResults.Select(                                     (item) =>                                     {                                         var calculationModel = _modelFactory.CreateCalculationModel();                                          calculationModel.InputNumber = item.Key.ToString();                                         calculationModel.Factorial = item.Value.ToString();                                          return calculationModel;                                     }                                 ).ToList()                         );                     }                 }             };              _dispatcherTimer.Start();         }          protected void StopMonitoringCalculationProgress()         {             _dispatcherTimer.Stop();             _dispatcherTimer.Dispose();         }     } } 

FactorialCalculationService:

namespace FactorialCalculator.Application.Services {     public class FactorialCalculationService : IFactorialCalculationService     {         private bool _isCanceled;         private bool _isInPorcess;         private CancellationTokenSource _tokenSourse;         private ConcurrentQueue<IDictionary<BigInteger, BigInteger>> _calculationQueue;         private readonly IFactorialCalculatorService _factorialCalculatorService;          public FactorialCalculationService(IFactorialCalculatorService factorialCalculatorService)         {             _isCanceled = false;             _isInPorcess = false;             _factorialCalculatorService = factorialCalculatorService;             _calculationQueue = new ConcurrentQueue<IDictionary<BigInteger, BigInteger>>();         }          #region IFactorialCalculationService implementation          public bool IsInPorcess => _isInPorcess;          public bool TryGetInterimResults(out IDictionary<BigInteger, BigInteger> interimResults)         {             return _calculationQueue.TryDequeue(out interimResults);         }          public Task StartCalculation(BigInteger number)         {             return                  Task.Run(                     () =>                     {                         try                         {                             _isCanceled = false;                             _isInPorcess = true;                             _tokenSourse = new CancellationTokenSource();                              _factorialCalculatorService.Calculate(                                                              number,                                                              (dictionary) => _calculationQueue.Enqueue(dictionary),                                                              _tokenSourse.Token                                                          );                               //Wait while all the results will be consumed                             while (!_calculationQueue.IsEmpty);                         }                         catch (AggregateException ax)                         {                             ax.Handle(ex => {                                 OperationCanceledException exception = ex as OperationCanceledException;                                  if (exception != null)                                 {                                     _isCanceled = true;                                 }                                  return exception != null;                             });                          }                         finally                         {                             _isInPorcess = false;                         }                     }                 );         }          public void StopCalculation()         {             if (!_isCanceled)             {                 _calculationQueue = new ConcurrentQueue<IDictionary<BigInteger, BigInteger>>();                 _tokenSourse.Cancel();             }         }          #endregion     } } 

FactorialCalculatorService:

namespace FactorialCalculator.Domain.Services {     public class FactorialCalculatorService : IFactorialCalculatorService     {         private int _maxTasks;          #region IFactorialCalculatorService implementation          public BigInteger Calculate(BigInteger number,                                     Action<IDictionary<BigInteger, BigInteger>> callback = null,                                     CancellationToken cancellationToken = default(CancellationToken))         {             if (number < 0)                 throw new ArgumentOutOfRangeException(nameof(number));              BigInteger countOfTasks = number / Environment.ProcessorCount;             countOfTasks = countOfTasks > int.MaxValue ? int.MaxValue : (int)countOfTasks;             _maxTasks = countOfTasks == 0 ? 1 : (int)countOfTasks;              var tasks = ParallelizeCalculation(number, callback, cancellationToken);              Task.WaitAll(tasks);              return GetFinalResult(tasks);         }          #endregion           protected virtual BigInteger CalculateClass(BigInteger upperBound,                                                     BigInteger startFrom,                                                     CancellationToken cancellationToken = default(CancellationToken),                                                     Action<IDictionary<BigInteger, BigInteger>> callback = null)         {             cancellationToken.ThrowIfCancellationRequested();              SortedDictionary<BigInteger, BigInteger> calculationHistory = new SortedDictionary<BigInteger, BigInteger>();              for (var i = startFrom; i <= upperBound; i += _maxTasks)             {                 //Thread.Sleep(50);                 cancellationToken.ThrowIfCancellationRequested();                  var internalResult = Calculate(i);                 calculationHistory.Add(i, internalResult);             }              callback?.Invoke(calculationHistory);              return calculationHistory.Last().Value;         }          protected virtual BigInteger Calculate(BigInteger number)         {             BigInteger result = 1;              for (BigInteger i = result; i <= number; i++)             {                 result *= i;             }              return result;         }          protected virtual Task<BigInteger>[] ParallelizeCalculation(BigInteger number,                                                         Action<IDictionary<BigInteger, BigInteger>> callback = null,                                                         CancellationToken cancellationToken = default(CancellationToken))         {             var tasks = Enumerable.Range(0, _maxTasks)                                   .Select(                                             i => Task.Factory.StartNew(                                                 () => CalculateClass(number, i, cancellationToken, callback),                                                 TaskCreationOptions.AttachedToParent                                             )                                          )                                   .ToArray();               return tasks;         }          private BigInteger GetFinalResult(Task<BigInteger>[] tasks)         {             BigInteger finalResult = 1;              foreach(var task in tasks)             {                 finalResult *= task.Result;             }              return finalResult;         }     } } 

ConcurrentSortedList:

namespace FactorialCalculator.WPF.Infrastructure.Collections {     public class ConcurrentSortedList<T> : ObservableCollection<T>     {         private readonly ReaderWriterLockSlim _lock;         private bool suspendCollectionChangeNotification;          public ConcurrentSortedList()             : base()         {             this.suspendCollectionChangeNotification = false;             _lock = new ReaderWriterLockSlim(LockRecursionPolicy.SupportsRecursion);         }          public override event NotifyCollectionChangedEventHandler CollectionChanged;           public void AddItems(IList<T> items)         {             try             {                 _lock.EnterWriteLock();                  this.SuspendCollectionChangeNotification();                  foreach (var i in items)                 {                     Add(i);                 }             }             finally             {                 _lock.ExitWriteLock();             }              this.NotifyChanges();           }         public new void Add(T item)         {             try             {                 _lock.EnterWriteLock();                  int i = 0;                 while (i < Items.Count && Comparer<T>.Default.Compare(Items[i], item) < 0)                     i++;                  Items.Insert(i, item);             }             finally             {                 _lock.ExitWriteLock();             }              OnPropertyChanged(new System.ComponentModel.PropertyChangedEventArgs("Count"));         }           public void NotifyChanges()         {             this.ResumeCollectionChangeNotification();             var arg = new NotifyCollectionChangedEventArgs(NotifyCollectionChangedAction.Reset);             this.OnCollectionChanged(arg);         }          public void RemoveItems(IList<T> items)         {             try             {                 _lock.EnterWriteLock();                  this.SuspendCollectionChangeNotification();                  foreach (var i in items)                 {                     Remove(i);                 }             }             finally             {                 _lock.ExitWriteLock();             }              this.NotifyChanges();         }          public void ResumeCollectionChangeNotification()         {             this.suspendCollectionChangeNotification = false;         }          public void SuspendCollectionChangeNotification()         {             this.suspendCollectionChangeNotification = true;         }          protected override void OnCollectionChanged(NotifyCollectionChangedEventArgs e)         {              using (BlockReentrancy())             {                 if (!this.suspendCollectionChangeNotification)                 {                     NotifyCollectionChangedEventHandler eventHandler =                           this.CollectionChanged;                     if (eventHandler == null)                     {                         return;                     }                      Delegate[] delegates = eventHandler.GetInvocationList();                      foreach (NotifyCollectionChangedEventHandler handler in delegates)                     {                         DispatcherObject dispatcherObject                              = handler.Target as DispatcherObject;                          if (dispatcherObject != null                                && !dispatcherObject.CheckAccess())                         {                             dispatcherObject.Dispatcher.BeginInvoke(DispatcherPriority.DataBind, handler, this, e);                    
     }                         else                         {                             handler(this, e);                         }                     }                 }             }         }       } } 

BaseViewModel:

public abstract class BaseViewModel : INotifyPropertyChanged {     protected readonly Dispatcher _uiDispatcher;      protected BaseViewModel(Dispatcher uiDispatcher)     {         _uiDispatcher = uiDispatcher;     }      #region INotifyPropertyChanged implementation      public event PropertyChangedEventHandler PropertyChanged;      protected void NotifyPropertyChanged([CallerMemberName] string propertyName = null)     {         var propertyChanged = PropertyChanged;         if (propertyChanged == null)         {             return;         }          _uiDispatcher.Invoke(() => propertyChanged.Invoke(this, new PropertyChangedEventArgs(propertyName)));     }      #endregion } 

App: (screenshot of the running application omitted)

Source code: (repository link omitted)

Confusion about speedup calculation for a pipelined architecture

This is an online question I am trying to solve.

You are given a non-pipelined processor design which has a cycle time of 10 ns and an average CPI of 1.4. If a pipelined version of the processor has 5 stages with latencies of 1 ns, 1.5 ns, 4 ns, 3 ns, and 0.5 ns, what is the best speedup you can get compared to the original processor?

Approach I: In a pipelined architecture, in steady state, the CPI tends to 1 provided there is no fixed percentage of NOPs or stalls. Thus speedup = CPI_non_pipelined / CPI_pipelined = 1.4 / 1 = 1.4.

Approach II: To pipeline the execution, the clock cycle must be long enough for the slowest stage. Thus the pipelined cycle time should be max{stage durations} = max{1 ns, 1.5 ns, 4 ns, 3 ns, 0.5 ns} = 4 ns. So speedup = cycle_time_non_pipelined / cycle_time_pipelined = 10 / 4 = 2.5.

I am wondering why there is this difference. Any help will be much appreciated.
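
My own attempt at reconciling the two, assuming the pipelined CPI really does reach 1: if the comparison is made on time per instruction rather than on either factor in isolation, the two effects multiply,

$ \text{Speedup} = \frac{\mathrm{CPI}_{\text{non-pipelined}} \times T_{\text{non-pipelined}}}{\mathrm{CPI}_{\text{pipelined}} \times T_{\text{pipelined}}} = \frac{1.4 \times 10\ \text{ns}}{1 \times 4\ \text{ns}} = 3.5, $

which suggests each of my approaches captures only one of the two factors, but I would like this confirmed.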

Calculation of the average of the values of each column of a matrix

I am trying to obtain a list of values corresponding to the mean of the values in each column of a matrix. Here is a quick example (my actual matrix has dimensions 1000 × 1000).

If I have a matrix like this: matrixExample = {{1, 2, 3, 4}, {5, 6, 7, 8}, {9, 9, 9, 9}}, I would like to obtain a list like this: mean values = {5, 5.7, 6.3, 7} (approximately).

How can I do that?
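
Just to make the intent unambiguous, here is the same operation written with NumPy (purely illustrative; what I actually need is the Mathematica way of doing this):

    import numpy as np

    matrix_example = np.array([[1, 2, 3, 4],
                               [5, 6, 7, 8],
                               [9, 9, 9, 9]])

    column_means = matrix_example.mean(axis=0)  # average over rows, one value per column
    print(column_means)                         # 5, 5.666..., 6.333..., 7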

Numerically stable log1pexp calculation

What are good approximations for computing log1pexp for single precision and double precision floating point numbers?

Note: log1pexp(x) is log(1 + exp(x))

I have found a few implementations of log1pexp for double precision, but they do not explain how they arrived at the approximations. Hence, I am not able to implement log1pexp for single-precision numbers (without converting to double-precision intermediates, of course).

Reference implementation: https://github.com/davisking/dlib/blob/master/dlib/dnn/utilities.h#L16-L29
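
For what it's worth, the double-precision implementations I have found (including the dlib reference above, as far as I can tell) all follow the same piecewise pattern. Here is a minimal Python sketch of that pattern using the commonly quoted double-precision cutoffs, which are exactly the values I do not know how to re-derive for single precision:

    import math

    def log1pexp(x: float) -> float:
        """Piecewise log(1 + exp(x)) for double precision. The cutoffs below are the
        usual double-precision ones (assumption); they would need re-deriving for float32."""
        if x <= -37.0:
            return math.exp(x)              # log1p(exp(x)) ~= exp(x) when exp(x) is tiny
        elif x <= 18.0:
            return math.log1p(math.exp(x))  # mid-range: exp(x) neither overflows nor swamps the 1
        elif x <= 33.3:
            return x + math.exp(-x)         # log(1 + e^x) = x + log(1 + e^-x) ~= x + e^-x
        else:
            return x                        # e^-x is below double rounding relative to x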

How to use hook_views to add a calculation to a field?

In a custom module (assoc), I have a custom Content Entity (Person) with these fields:

    $fields['lastname'] = BaseFieldDefinition::create('string')
    $fields['firstname'] = BaseFieldDefinition::create('string')
    $fields['email'] = BaseFieldDefinition::create('email')
    $fields['cellphone'] = BaseFieldDefinition::create('telephone')
    $fields['balance'] = BaseFieldDefinition::create('integer')

I use Views to display a table of the persons (Views preview screenshot omitted). Before displaying the output, I would like to add some value to the field 'balance':

balance = balance (of the content entity) + somecredit - somedebit

'somecredit' and 'somedebit' are calculated by a query to the database.
Which hook_views hook should I use, and how do I use it?

Parallel Ramanujan’s formula for 1/π calculation

I finished my university project for calculating $ 1/\pi$ and I would love to get some feedback.

Before you jump into this code, please keep in mind that I am a newcomer to C++; I decided to use it for this project because I assumed its libraries would be better than others in terms of arbitrary precision.

A few words about the project: it should be a console application with some parameters (how many terms of the formula to calculate, how many threads to use, the log level, and more if desired).

I am using MPFR and MPIR on Windows for the arbitrary-precision calculations, and the Visual C++ compiler with no extra flags other than the ones set up by Visual Studio (perhaps I can be advised here too 🙂).

I am interested in the following kinds of feedback, in priority order:

  1. Performance
  2. Readability
  3. Design?

Since the project tends to be big, I will only post the parts I am most interested in (the calculation); if anyone would like to advise me on the overall design, please check my repository for the full code base.
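
For reference, the series the code evaluates (reconstructed from the constants 1103, 26390 and 396 and the factorials in calculatePi; it is the classical Ramanujan series):

$ \frac{1}{\pi} = \frac{2\sqrt{2}}{9801} \sum_{k=0}^{\infty} \frac{(4k)!\,(1103 + 26390\,k)}{(k!)^{4}\, 396^{4k}} $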

main.cpp

boost::multiprecision::mpfr_float inc_multiply(long start, long end) {     boost::multiprecision::mpfr_float fact = 1;     for (; start <= end; ++start)         fact *= start;      return fact; } // <-- the real calculation of pi boost::multiprecision::mpfr_float calculatePi(int count, ...) {     using boost::multiprecision::mpfr_float;     mpfr_float partition;      std::stringstream ss;     ss << std::this_thread::get_id();     std::string thread_id = ss.str();      LOG_VERBOSE("Thread id: " + thread_id + " started.\n");     auto clock_start = std::chrono::system_clock::now();      std::va_list ranges;     va_start(ranges, count);      for (int i = 0; i < count; ++i)     {         auto boundary = va_arg(ranges, range_t);         mpfr_float previous_fac_start = 1.0;         mpfr_float previous_fac_4xstart = 1.0;          for (; boundary.first < boundary.second; ++boundary.first)         {             mpfr_float fac_start = previous_fac_start * (boundary.first == 0 ? 1 : boundary.first);             mpfr_float fac_4xstart = previous_fac_4xstart * inc_multiply((4 * boundary.first - 3) < 0 ? 1 : 4 * boundary.first - 3, 4 * boundary.first);              mpfr_float n = fac_4xstart * mpfr_float(1103 + 26390 * boundary.first);             mpfr_float d = boost::multiprecision::pow(fac_start, 4) * boost::multiprecision::pow((mpfr_float)396, 4 * boundary.first);              partition += (n / d);             previous_fac_start = fac_start;             previous_fac_4xstart = fac_4xstart;         }     }      va_end(ranges);      std::chrono::duration<double> elapsed = std::chrono::system_clock::now() - clock_start;     LOG_VERBOSE("Thread id: " + thread_id + " stopped.\n");     LOG_VERBOSE("Thread " + thread_id + " execution time was " + std::to_string(elapsed.count()) + "ms\n");      return (2 * boost::multiprecision::sqrt((mpfr_float)2) / 9801) * partition; } // <-- used to setup the strategy for calculation boost::multiprecision::mpfr_float calculate(ProgramOptions* options) {     std::vector<std::future<boost::multiprecision::mpfr_float>> futures;      // Choose thread work sepration strategy     // TODO: Fix duplication, think of dynamic creation of strategies.     if (options->IsOptimized())     {         auto strategy = new OptimizedSeparationStrategy();         auto partitions = strategy->Separate(options->GetIterations(), options->GetThreadsCount());          for (auto partition : partitions)             futures.emplace_back(std::async(calculatePi, 2, partition.first, partition.second));     }     else     {         auto strategy = new EqualSeparationStrategy();         auto partitions = strategy->Separate(options->GetIterations(), options->GetThreadsCount());          for (auto partition : partitions)             futures.emplace_back(std::async(calculatePi, 1, partition));     }      boost::multiprecision::mpfr_float pi;     for (auto threadId = 0; threadId < futures.size(); ++threadId)         pi += futures[threadId].get();      return pi; } 

SeparationStrategy.cpp

simple_ranges NaiveSeparationStrategy::Separate(long iterations, short threads) {     simple_ranges ranges;     if (threads == 0)         return ranges;      ranges.reserve(threads);      auto chunk = iterations / threads;     auto remainder = iterations % threads;      auto loopIterations = 0;     for (auto partition = 0; partition <= iterations - chunk + loopIterations; partition += chunk + 1)     {         ranges.emplace_back(partition, partition + chunk > iterations ? iterations : partition + chunk);         ++loopIterations;     }      return ranges; }  advanced_ranges OptimizedSeparationStrategy::Separate(long iterations, short threads) {     advanced_ranges ranges;     if (threads == 0)         return ranges;      ranges.reserve(threads);      auto leftChunk = 0, rightChunk = 0;      if ((iterations / threads) % 2 != 0)     {         leftChunk = (iterations / threads) / 2;         rightChunk = (iterations / threads) / 2 + 1;     }     else     {         leftChunk = rightChunk = (iterations / threads) / 2;     }      long l = 0, r = iterations, previous_l = l, previous_r = r;      while (l < r)     {         ranges.emplace_back(std::make_pair(previous_l, l + leftChunk), std::make_pair(r - rightChunk, previous_r));          previous_l = l + leftChunk + 1;         previous_r = r - rightChunk - 1;          l += leftChunk;         r -= rightChunk;     }      // make last range excluding last number     // as it will be calculated from the second last range.     ranges.back().first.second--;      return ranges; } 

SIMD Mandelbrot calculation

I was messing around with GPU compute shaders the other day and created a Mandelbrot shader. Unfortunately, Metal doesn’t support double-precision in compute shaders, so beyond a certain zoom level, I need to switch back to the CPU. In doing so, I decided to try writing SIMD code for the calculations to make it faster.

In the code below I am using AVX-512 instructions, and I do get a speedup over the scalar code. I break the image into 64×64-pixel tiles and farm them out to the available cores. For the scalar code on one particular test image, the average time to calculate a tile is 0.757288 seconds; for the SIMD version below it is 0.466437 seconds. That is roughly a 1.6× speedup, which is OK, but given that I am calculating 8 times as many pixels at once, I was hoping for more.
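
For context, the per-pixel iteration being vectorized is the usual real-arithmetic form of $ z \leftarrow z^{2} + c $, which is what the scalar comments inside the function below describe:

$ a_{n+1} = a_n^{2} - b_n^{2} + c, \qquad b_{n+1} = 2\,a_n b_n + d, \qquad \text{stop when } a_{n+1}^{2} + b_{n+1}^{2} \ge 4 \text{ or } n = \text{maxIterations}, $

where $(c, d)$ are the pixel's coordinates in the complex plane.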

These are some useful types I use in the code.

    typedef struct RGBA8Pixel {
        uint8_t red;
        uint8_t green;
        uint8_t blue;
        uint8_t alpha;
    } RGBA8Pixel;

    typedef union intVec8 {
        __m512i ivec;
        int64_t vec[8];
    } intVec8;

    typedef union doubleVec8 {
        __m512d dvec;
        double  vec[8];
    } doubleVec8;

And here is my function for calculating a single 64×64 tile:

- (void)calculateSIMDFromRow:(int)startPixelRow                        toRow:(int)endPixelRow                      fromCol:(int)startPixelCol                        toCol:(int)endPixelCol; {     if (!_keepRendering)     {         return;     }      const doubleVec8 k0s = {         .vec[0] = 0.0,         .vec[1] = 0.0,         .vec[2] = 0.0,         .vec[3] = 0.0,         .vec[4] = 0.0,         .vec[5] = 0.0,         .vec[6] = 0.0,         .vec[7] = 0.0,     };      const intVec8   k1s = {         .vec[0] = 1,         .vec[1] = 1,         .vec[2] = 1,         .vec[3] = 1,         .vec[4] = 1,         .vec[5] = 1,         .vec[6] = 1,         .vec[7] = 1,     };      const doubleVec8    k2s = {         .vec[0] = 2.0,         .vec[1] = 2.0,         .vec[2] = 2.0,         .vec[3] = 2.0,         .vec[4] = 2.0,         .vec[5] = 2.0,         .vec[6] = 2.0,         .vec[7] = 2.0,     };      const doubleVec8    k4s = {         .vec[0] = 4.0,         .vec[1] = 4.0,         .vec[2] = 4.0,         .vec[3] = 4.0,         .vec[4] = 4.0,         .vec[5] = 4.0,         .vec[6] = 4.0,         .vec[7] = 4.0,     };      UInt64      maxIterations   = [self maxIterations];     NSSize      viewportSize    = [self viewportSize];     for (int row = startPixelRow; (row < endPixelRow) && (_keepRendering); ++row)     {         RGBA8Pixel* nextPixel   = _outputBitmap + (row * (int)viewportSize.width) + startPixelCol;         double      yCoord      = _yCoords [ row ];         doubleVec8  yCoords;         for (int i = 0; i < 8; i++)         {             yCoords.vec [ i ] = yCoord;         }         double*     nextXCoord  = &_xCoords [ startPixelCol ];         for (int col = startPixelCol; (col < endPixelCol) && (_keepRendering); col += 8)         {             __m512d as = _mm512_load_pd(nextXCoord);             nextXCoord += 8;             __m512d bs = yCoords.dvec;             __m512d cs = as;             __m512d ds = bs;              UInt64 scalarIters = 1;             __m512i iterations  = k1s.ivec;             __m512d dists       = k0s.dvec;             __mmask8 allDone    = 0;             while ((allDone != 0xFF) && (_keepRendering))             {                 // newA = a * a - b * b + c                 __m512d newA;                 __m512d newB;                 newA = _mm512_mul_pd(as, as);                 newA = _mm512_sub_pd(newA, _mm512_mul_pd(bs, bs));                 newA = _mm512_add_pd(newA, cs);                  //double    newB    = 2 * a * b + d;                 newB = _mm512_mul_pd(_mm512_mul_pd(k2s.dvec, as), bs);                 newB = _mm512_add_pd(newB, ds);                  as = newA;                 bs = newB;                  dists = _mm512_mul_pd(newB, newB);                 dists = _mm512_add_pd(_mm512_mul_pd(newA, newA), dists);                 __mmask8 escaped = _mm512_cmplt_pd_mask(dists, k4s.dvec);                  iterations = _mm512_mask_add_epi64(iterations, escaped, iterations, k1s.ivec);                 scalarIters++;                 __mmask8 hitMaxIterations = (scalarIters == maxIterations) ? 
0xFF : 0;                  allDone = ~escaped | hitMaxIterations;             }              intVec8 iters = { .ivec = iterations };             for (int i = 0; i < 8; i++)             {                 UInt64  nextIteration = iters.vec [ i ];                 if (nextIteration == maxIterations)                 {                     *nextPixel = kBlack;                 }                 else                 {                     *nextPixel = kPalette [ nextIteration % kPaletteSize ];                 }                 nextPixel++;             }         }     } } 

I’m new to Intel SIMD instructions and frankly find them quite confusing. If there are better ways to do any of the above, please let me know. I tried using the fused multiply-add and multiply-add-negate instructions, and they made the code significantly slower than using 2 or 3 separate instructions, unfortunately.