There are 60 million shipments per day. Each shipment has about 50 metrics to be calculated. Each metric is calculated from one type of event (say event_1 has the information required to calculate metric_1, event_2 has it for metric_2, and so on). The events are independent of each other apart from one dependency: a single event (say event_1) carries vital information required to process each of the other events.
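For a sense of scale, here is a rough back-of-the-envelope calculation. It assumes one event per metric and an even spread of events over the day, neither of which is guaranteed in practice; the constant names are only illustrative.

```python
# Rough average load, assuming one event per metric and uniform arrival.
SHIPMENTS_PER_DAY = 60_000_000
METRICS_PER_SHIPMENT = 50          # "about 50", so this is approximate
SECONDS_PER_DAY = 24 * 60 * 60

events_per_day = SHIPMENTS_PER_DAY * METRICS_PER_SHIPMENT
events_per_second = events_per_day / SECONDS_PER_DAY

print(f"{events_per_day:,} events/day")
print(f"about {events_per_second:,.0f} events/second on average")
```

Peak rates would be higher than this average if events arrive in bursts.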
The current design:
(In order) Scenario 1: event_1 arrives first. We calculate metric_1 and store the vital information required to process the other events in DynamoDB. The other events (event_4, event_2, ...) then arrive and are processed by reading that information from DynamoDB.
(Out of order) Scenario 2: event_3 arrives first. The system checks DynamoDB for the required information and does not find it, so it places the event in the dead letter queue to be retried after a period of time. Once event_1 arrives and is processed, the other events go through.
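To make the two scenarios concrete, here is a minimal in-memory sketch of the flow described above. It is not the production code: a dict stands in for DynamoDB, a deque for the dead letter queue, and a list for Redshift, and the names handle, retry_dead_letters, shipment_id, and payload are hypothetical.

```python
from collections import deque

BASE_EVENT = "event_1"   # the event carrying the shared information

store = {}               # stand-in for DynamoDB: shipment_id -> base info
dead_letter = deque()    # stand-in for the dead letter queue
metrics = []             # stand-in for Redshift: (shipment_id, metric name)


def handle(event):
    """Process one event; return False if it had to be parked for retry."""
    shipment_id, event_type = event["shipment_id"], event["type"]
    if event_type == BASE_EVENT:
        # Scenario 1: save the base information for the other events.
        store[shipment_id] = event["payload"]
    elif shipment_id not in store:
        # Scenario 2: base information is missing, so park the event.
        dead_letter.append(event)
        return False
    # Placeholder for the real metric calculation.
    metrics.append((shipment_id, event_type.replace("event", "metric")))
    return True


def retry_dead_letters():
    """Re-drive each parked event once; still-blocked events are parked again."""
    for _ in range(len(dead_letter)):
        handle(dead_letter.popleft())


# Out-of-order arrival: event_3 before event_1.
handle({"shipment_id": "s1", "type": "event_3", "payload": {}})
handle({"shipment_id": "s1", "type": "event_1", "payload": {"origin": "X"}})
retry_dead_letters()
print(metrics)
```

In the real system the retry is driven by the queue after a delay rather than by an explicit call, but the dependency check is the same.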
Is using a data store and a retry mechanism the right approach to resolve the dependency on the base event (event_1)?
Are there better approaches/patterns to solve the event dependency problem?
Additional context: Although I believe this information is irrelevant, I am including it in case it helps. Source of events: SNS topics. Event processing: SNS -> SQS -> Lambda. Data store: DynamoDB. Metrics are stored in Redshift.