MQ integration: How to notify consumers of upcoming message format changes?

We have multiple microservices communicating over MQ. As the MQ messages are the interface/contract between the services, whenever we change the MQ message published by a service we need to make matching adjustments in the services that consume the message.

As of now, the number of services is small enough that we know which services communicate with each other, and we can keep the MQ message contract in sync between them. But as the number of services grows, this becomes harder.

Option 1: Break things first, then fix it

I’ve been thinking of implementing some kind of health check. Let’s say service A during operation may emit message type X, which is consumed by service B. Service A could then, on startup, emit a health-check type of message, something along the lines of a message X dry-run. When service B receives this, it simply verifies that the message conforms to the contract. If not, for example if service A has removed a critical field from the message, then service B will reject the message, which in turn will end up on a dead-letter exchange, which again will trigger a warning notification to the devops staff.

This approach won’t prevent us from deploying incompatible message types, but it will notify us pretty much instantly when we do. For our use case this might work, since we have a very small number of developers and projects, so if we break things like this we’d be able to fix it quite quickly.

Option 2: Early probes

A variation on this might be that we start versioning the MQ message format (which we probably should and will do anyway). Then, when service A plans to upgrade from version 1 of message type X to version 2, service A could start emitting “dry-run” messages of version 2 of message type X early on. This would cause service B to drop the message. Say this happens a few days or weeks before service A performs the actual switch from version 1 to version 2; the devops team would then have time to add support for version 2 in the meantime.
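To make options 1 and 2 concrete, here is a minimal sketch of the consumer-side check, assuming a RabbitMQ-style broker (so a rejected message falls through to the dead-letter exchange) and Jackson for JSON; the x-dry-run header, the required field names and the class name are my own placeholders, not part of any real contract:

    // Hypothetical consumer-side contract check for message X.
    import com.fasterxml.jackson.databind.JsonNode;
    import com.fasterxml.jackson.databind.ObjectMapper;
    import com.rabbitmq.client.AMQP;
    import com.rabbitmq.client.Channel;
    import com.rabbitmq.client.DefaultConsumer;
    import com.rabbitmq.client.Envelope;

    import java.io.IOException;
    import java.util.List;
    import java.util.Map;

    public class MessageXConsumer extends DefaultConsumer {

        private static final List<String> REQUIRED_FIELDS = List.of("orderId", "amount", "currency");
        private final ObjectMapper mapper = new ObjectMapper();

        public MessageXConsumer(Channel channel) {
            super(channel);
        }

        @Override
        public void handleDelivery(String consumerTag, Envelope envelope,
                                   AMQP.BasicProperties properties, byte[] body) throws IOException {
            JsonNode message = mapper.readTree(body);
            boolean matchesContract = REQUIRED_FIELDS.stream().allMatch(message::hasNonNull);

            Map<String, Object> headers = properties.getHeaders();
            boolean dryRun = headers != null && Boolean.TRUE.equals(headers.get("x-dry-run"));

            if (!matchesContract) {
                // Reject without requeue: the broker routes the message to the
                // dead-letter exchange, which is what alerts the devops staff.
                getChannel().basicNack(envelope.getDeliveryTag(), false, false);
                return;
            }
            if (dryRun) {
                // Contract satisfied; a dry-run probe carries nothing to process.
                getChannel().basicAck(envelope.getDeliveryTag(), false);
                return;
            }
            // ... normal processing of message X ...
            getChannel().basicAck(envelope.getDeliveryTag(), false);
        }
    }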

Option 3: Manually detecting conflicts before deployment

Another approach would be to have some way of detecting – before the actual deployment – that service A is about to start emitting incompatible messages in the first place. This would mean that we would need to maintain some kind of matrix of which versions of message X are supported by which consumers, and defer deploying service A (with the new version of message X) until all the consumers are ready for it. How to implement this effectively I don’t know.
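Purely as an illustration of what such a matrix could look like, here is a rough pre-deployment check; where the matrix data would actually live (a shared file, a registry service, CI metadata) is an open question, and all names are made up:

    // Hypothetical pre-deployment compatibility check for option 3.
    import java.util.Map;
    import java.util.Set;

    public class CompatibilityCheck {

        // message type -> (consumer -> supported versions)
        private static final Map<String, Map<String, Set<Integer>>> SUPPORT_MATRIX = Map.of(
                "message-x", Map.of(
                        "service-b", Set.of(1),
                        "service-c", Set.of(1, 2)));

        /** True only if every known consumer of the message type supports the new version. */
        public static boolean canDeploy(String messageType, int newVersion) {
            return SUPPORT_MATRIX.getOrDefault(messageType, Map.of())
                    .values().stream()
                    .allMatch(versions -> versions.contains(newVersion));
        }

        public static void main(String[] args) {
            // Service A wants to start emitting version 2 of message X:
            if (!canDeploy("message-x", 2)) {
                System.err.println("Blocked: at least one consumer does not yet support message-x v2.");
                System.exit(1);
            }
        }
    }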

Other alternatives

How do others handle message type compatibility between services that communicate over MQ – how do you know that when your service A makes a change to message type X, it won’t break any of the consumers?

PS. I posted this over at Reddit a few days ago, but due to the lack of feedback I decided to post here as well.

Would I have one domain model (.NET Project) for all consumers or one domain model (.NET Project) per consumer?

Say I have a bounded context called: ‘Loans’ and the following APIs:

  • HSBC
  • NatWest
  • TSB

The three banks above are consumers, and each has an API. I am using the scatter-gather pattern (https://www.enterpriseintegrationpatterns.com/patterns/messaging/BroadcastAggregate.html).

Would I have one domain model (.NET Project) for all three consumers or one domain model per consumer (.NET Project)? I believe I should have one domain model per consumer as the domain logic is only relevant to that consumer.
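To make the “one model per consumer” option concrete, here is a rough sketch (in Java rather than .NET, and with invented type names): each bank keeps its own response model, and only a small shared quote type crosses into the aggregation step of the scatter-gather:

    // Hypothetical scatter-gather shape with one domain model per consumer.
    import java.math.BigDecimal;
    import java.util.Comparator;
    import java.util.List;

    record LoanQuote(String bank, BigDecimal monthlyPayment) {}        // shared aggregation result

    record HsbcLoanResponse(BigDecimal monthlyCost) {}                 // HSBC-specific model
    record NatWestLoanResponse(BigDecimal paymentPerMonth) {}          // NatWest-specific model
    record TsbLoanResponse(BigDecimal installment) {}                  // TSB-specific model

    class QuoteAggregator {
        // Gather step: translate each consumer-specific model into the shared
        // LoanQuote and pick the cheapest one.
        LoanQuote bestOf(HsbcLoanResponse hsbc, NatWestLoanResponse natWest, TsbLoanResponse tsb) {
            List<LoanQuote> quotes = List.of(
                    new LoanQuote("HSBC", hsbc.monthlyCost()),
                    new LoanQuote("NatWest", natWest.paymentPerMonth()),
                    new LoanQuote("TSB", tsb.installment()));
            return quotes.stream()
                    .min(Comparator.comparing(LoanQuote::monthlyPayment))
                    .orElseThrow();
        }
    }

The cost of this layout is a translation step per consumer; the benefit is that a change in one bank’s API only touches that bank’s model.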

This is more of a thought experiment rather than a real business problem. However, we will have to do something similar with the scatter-gather pattern at some point, hence the question.

I realise both approaches I have described will work. I am talking more from the perspective of the principle of least astonishment.

Functionality design: Multiple sources, multiple consumers with configuration

This is a design problem which I am laying out here.

I have multiple event sources in my app, each of which produces events with a different set of data (but each source produces only one type of event). On the other side, I’ve got multiple actions that can be triggered by sources.

Sources and actions may be extended in the future (there may be more of them).

Which action is triggered, and when, is stated in a configuration that can be changed at runtime – in other words, the user should be able to say, for example, “when source 1 produces an event, fire action 2” or “when source 2 produces an event, fire action 1 and action 2”, etc.

What’s more, events should be able to be filtered based on their settings; for example, the user can set “when source 1 produces an event, fire action 2, but only when the produced event contains a date earlier than today”.

I’ve tried to draw something to maybe better illustrate my problem (and solution I was thinking of):


As noted in the drawing, I think some kind of mediator pattern should be used here.

But I have the following problems:

  1. I’m not sure if the mediator is a good place for filtering. I’ve marked that the configuration is injected here, but I’m still not sure about it.

  2. Each action should be able to handle each source – but since the events are different and have no common ancestor, each action will have to be able to consume each event type. So when adding a new source, I would need to add a handling method to each action, and I’m wondering if there is a way to avoid that (see the sketch after this list).
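One possible shape for this, sketched under my own assumptions: events carry their data as a bag of named attributes (so actions do not need one method per source), the configuration is a list of (source, filter, action) rules, and the mediator is the place where the filters from the configuration are evaluated. All names are illustrative.

    // Hypothetical mediator routing events from sources to actions
    // according to runtime-editable rules with optional filters.
    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.CopyOnWriteArrayList;
    import java.util.function.Predicate;

    // A generic event: source id plus named attributes, so an action does not
    // need a separate overload per event type.
    record Event(String sourceId, Map<String, Object> attributes) {}

    interface Action {
        void execute(Event event);
    }

    // One configuration entry: "events from this source, passing this filter, trigger this action".
    record Rule(String sourceId, Predicate<Event> filter, Action action) {}

    class EventMediator {
        private final List<Rule> rules = new CopyOnWriteArrayList<>();   // editable at runtime

        void addRule(Rule rule)    { rules.add(rule); }
        void removeRule(Rule rule) { rules.remove(rule); }

        // Sources call this; the filtering lives here, driven by the injected rules.
        void publish(Event event) {
            for (Rule rule : rules) {
                if (rule.sourceId().equals(event.sourceId()) && rule.filter().test(event)) {
                    rule.action().execute(event);
                }
            }
        }
    }

The trade-off is that an attribute map gives up compile-time typing of event data; keeping strongly typed events instead (for example with a visitor) is possible, but then each action does grow a method per source, which is exactly the problem described in point 2.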

Algorithm to assign producers to consumers with respect to connections

I am trying to analyze supply chains in a game and have come across this problem:

First, an informal description: I have producers and consumers. Each producer produces a certain amount of goods, and each consumer requires a certain amount of goods. Each producer is connected to some consumers (these connections have infinite capacity).

Is there a way to transport goods from producers to consumers so that each consumer receives sufficiently many goods? Each producer can provide goods to multiple consumers and each consumer can accept goods from multiple producers. A producer also doesn’t have to deliver its entire production, as the rest can be discarded.

The connections form a bipartite graph with producers $p \in P$ and consumers $c \in C$ as vertices (so $V = P \sqcup C$) and edges $E \subseteq P \times C$. Each producer has a positive production rate and each consumer a positive consumption rate.

We model how much is transported over each connection with a weight function $w: E \to \mathbb{R}$. For each producer $p$, we get the required production by summing up the weights of all edges incident to $p$. This must not be higher than the production of $p$.

Similarly, for each consumer $c$, we get the total delivery by summing up the weights of all edges incident to $c$. This must not be lower than the consumption of $c$.
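Writing these two constraints out (with $\mathrm{prod}(p)$ and $\mathrm{cons}(c)$ as my own names for the production and consumption rates):

$$\sum_{c \,:\, (p,c) \in E} w(p,c) \;\le\; \mathrm{prod}(p) \quad \text{for all } p \in P, \qquad \sum_{p \,:\, (p,c) \in E} w(p,c) \;\ge\; \mathrm{cons}(c) \quad \text{for all } c \in C.$$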

Does such a $w$ exist for a given graph, production rate and consumption rate?

The left example has such a function, the right one does not:

[Figure: an instance where such a weight function exists (left) and one where it does not (right).]

I am pretty sure that if we allow negative $w(e)$, the problem is very easy (for connected graphs, just compare total production to total consumption). Therefore, it probably makes sense to restrict $w(e) \geq 0$.

I have tried to find similar problems, but most flow problems have limited flow rates and sources with infinite capacity (whereas here, it’s exactly the other way around). I also know about the Assignment problem, but I don’t think it applies here.

Perhaps there is a way to slowly remove consumers and producers. In the first example, we know that the top producer can only supply the top consumer, so we could just remove the producer and reduce the consumption rate on the right. It might also be possible to merge multiple producers and consumers if they all supply each other. However, I don’t think that these operations alone suffice to solve the problem in all instances.

Perhaps there is no efficient algorithm, so I’ve also tried proving that the problem is NP-complete, but my attempt to reduce SAT to this problem wasn’t successful.
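For experimenting on small instances, one technique that seems applicable here (this is my own assumption, not something established above) is to phrase the feasibility check as a maximum-flow problem: add a super-source with an edge of capacity $\mathrm{prod}(p)$ to each producer, a super-sink with an edge of capacity $\mathrm{cons}(c)$ from each consumer, give the producer-consumer edges effectively unlimited capacity, and test whether the maximum flow equals the total consumption. A small Edmonds-Karp sketch:

    // Sketch: feasibility test via max flow on a tiny, made-up instance.
    // Node layout: 0 = super-source, then producers, then consumers, last = super-sink.
    import java.util.ArrayDeque;
    import java.util.Arrays;
    import java.util.Queue;

    public class SupplyFeasibility {

        /** Edmonds-Karp max flow on an adjacency matrix of capacities (modified in place). */
        static long maxFlow(long[][] cap, int s, int t) {
            int n = cap.length;
            long flow = 0;
            while (true) {
                int[] parent = new int[n];
                Arrays.fill(parent, -1);
                parent[s] = s;
                Queue<Integer> queue = new ArrayDeque<>();
                queue.add(s);
                while (!queue.isEmpty() && parent[t] == -1) {
                    int u = queue.poll();
                    for (int v = 0; v < n; v++) {
                        if (parent[v] == -1 && cap[u][v] > 0) {
                            parent[v] = u;
                            queue.add(v);
                        }
                    }
                }
                if (parent[t] == -1) return flow;          // no augmenting path left
                long bottleneck = Long.MAX_VALUE;
                for (int v = t; v != s; v = parent[v]) {
                    bottleneck = Math.min(bottleneck, cap[parent[v]][v]);
                }
                for (int v = t; v != s; v = parent[v]) {
                    cap[parent[v]][v] -= bottleneck;       // use up forward capacity
                    cap[v][parent[v]] += bottleneck;       // add residual capacity
                }
                flow += bottleneck;
            }
        }

        public static void main(String[] args) {
            long INF = Long.MAX_VALUE / 4;
            // 2 producers (nodes 1, 2) with production 5 and 3,
            // 2 consumers (nodes 3, 4) with consumption 4 and 4.
            long[][] cap = new long[6][6];
            cap[0][1] = 5;   cap[0][2] = 3;                // source -> producers
            cap[3][5] = 4;   cap[4][5] = 4;                // consumers -> sink
            cap[1][3] = INF; cap[1][4] = INF;              // producer 1 supplies both consumers
            cap[2][4] = INF;                               // producer 2 supplies consumer 2 only
            long totalConsumption = 4 + 4;
            System.out.println("Feasible: " + (maxFlow(cap, 0, 5) == totalConsumption));
        }
    }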

This probably isn’t the most enjoyable way to play Anno, but at this point, I don’t want to give up…

Extending a data class consumed in various places because one of the consumers requires a new field

I have a class A that is the return type of a process (let’s call it M.getData()) that is consumed by several procedures throughout the code base (let’s call them P1, P2 and P3). Class A is part of a third-party library that I can’t modify, and it is populated with data coming from an XML response from an external service. Up until now the XML file came with some extra data that was ignored, but now that data is required, though only in procedure P1. Since I can’t touch the A class, would it be good practice to extend class A (A_Ext), change the return type of M from A to A_Ext, and handle the new requirement in P1 while letting P2 and P3 still use class A through an implicit upcast like A var1 = M.getData(), which now returns A_Ext?

The code base is in Java.
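For reference, a minimal sketch of the shape being proposed; the stub for A stands in for the third-party class, and the field and method names are placeholders:

    // Stand-in for the third-party class (not modifiable in reality).
    class A { /* existing fields and getters populated from the XML */ }

    // The extension only P1 knows about.
    class A_Ext extends A {
        private final String extraField;      // the value that used to be ignored in the XML

        A_Ext(String extraField) {
            this.extraField = extraField;
        }

        String getExtraField() {
            return extraField;
        }
    }

    class M {
        // The declared return type changes to A_Ext; existing callers that only
        // expect an A still compile unchanged, because an A_Ext is-an A.
        static A_Ext getData() {
            // ... parse the XML response, populate the inherited A state,
            // and additionally capture the extra field ...
            return new A_Ext("value-from-xml");
        }
    }

    class P1 {
        void run() {
            A_Ext data = M.getData();          // P1 uses the new field
            System.out.println(data.getExtraField());
        }
    }

    class P2 {
        void run() {
            A data = M.getData();              // P2 and P3 keep treating it as plain A
            // ... existing behaviour unchanged ...
        }
    }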

Google+ closing for consumers, back up your data now

I would hope that most of you already know that Google+ is closing down for consumers on April 2, 2019.
If you value the content you’ve posted you should download the data now, which includes Google+ circles, Communities, Streams, and +1’s.

Here’s how to do it.

https://support.google.com/plus/answer/1045788

Note, Google is not deleting the profile, just the data.

Using multiple consumers with CosmosDB change feed

I am trying to use cosmos db change feed (I’m referring to https://docs.microsoft.com/en-us/azure/cosmos-db/change-feed-processor and https://github.com/Azure/azure-cosmos-dotnet-v2/tree/master/samples/code-samples/ChangeFeedProcessorV2).

When I start multiple instances of a consumer, the observer seems to see only one partition key range. I only see a message – “Observer opened for partition Key Range 0” – and it starts receiving the change feed. So, the feed is received by only one consumer at any given point. If I close one consumer, the next one picks up happily.

I can’t seem to understand the partition keys / ranges in cosmos db. In cosmos db, I’ve created a database and a collection within it. I’ve defined a partition key – /myId. I store a unique guid in myId. I’ve saved about 10000 transactions in the collection.

When I look at partition key ranges using the API (/dbs/db-name/colls/coll-name/pkranges), I see only one node under PartitionKeyRanges. Below is the output I see:

    {
        "_rid": "LEAgAL7tmKM=",
        "PartitionKeyRanges": [
            {
                "_rid": "LEAgAL7tmKMCAAAAAAAAUA==",
                "id": "0",
                "_etag": "\"00007d00-0000-0000-0000-5c3645e70000\"",
                "minInclusive": "",
                "maxExclusive": "FF",
                "ridPrefix": 0,
                "_self": "dbs/LAEgAA==/colls/LEAgAL7tmKM=/pkranges/LEAgAL7tmKMCAAAAAAAAUA==/",
                "throughputFraction": 1,
                "status": "online",
                "parents": [],
                "_ts": 1547060711
            }
        ],
        "_count": 1
    }

Shouldn’t this show more partition key ranges? Is this behavior expected?

How do I get multiple consumers to receive data as shown under https://docs.microsoft.com/en-us/azure/cosmos-db/change-feed-processor?

Maintaining order of events with multiple consumers in a typical pub sub scenario

I am using Kafka. I am developing a simple e-commerce solution. I have a non-scalable catalog admin portal where products, categories, attributes, variants of products, channels, etc are updated. For each update, an event is fired which is sent to Kafka.

There can be multiple consumers deployed on different machines and they can scale up or down as per load. The consumers consume and process the events and save changes in a scalable and efficient database.
Order of events is important for me. For example, I get a product-create event. A product P is created and lies in category C. It is important that the event for the creation of category C is processed before the product-create event for product P. Now if there are two consumers, and one consumer picks up the product-create event for product P while the other consumer picks up the event for the creation of category C, it may happen that the product-create event is processed first, which will lead to data inconsistency.
There can be multiple such dependencies. How do I ensure the ordered processing or some alternative to ensure data consistency?

Two solutions that are right now in my mind:

  1. We can re-queue an event until its dependent event is successfully processed.
  2. We can wait for the dependent event to get processed and retry processing the event at some interval, say 1 second, with some maximum number of retries.

Requeuing has the issue that the event may by then be stale and no longer required. Example:

  • Initial Order = Create-Event(Dependent on event X), Event X, Delete-Event .
  • After Requeuing, Order = Event X, Delete-Event, Create-Event(Dependent on event X).
    The create event is processed after the delete event, again leading to inconsistent data.

The same issue is applicable to the second solution (waiting and retrying).

The above issues can be solved by maintaining versions for events and ignoring an event if the targeted object (the one the event would modify) already has a higher version than the event.
But I am very unsure of the pitfalls and challenges of these solutions that might not be very obvious right now.
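A small sketch of that version check, under my own assumptions: every event carries a monotonically increasing version for its target entity, the datastore remembers the last applied version per entity, and the consumer silently drops anything older. All class and field names are illustrative, and in a real database the read-compare-write would need to be atomic (e.g. a conditional update).

    // Hypothetical consumer-side guard: ignore events older than what has
    // already been applied for the target entity, so requeued or out-of-order
    // events cannot overwrite newer state.
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    record CatalogEvent(String entityId, long version, String type, String payload) {}

    class CatalogProjection {
        // Stand-in for "last applied version per entity" in the real database.
        private final Map<String, Long> appliedVersions = new ConcurrentHashMap<>();

        void handle(CatalogEvent event) {
            long current = appliedVersions.getOrDefault(event.entityId(), -1L);
            if (event.version() <= current) {
                // Stale event (e.g. a requeued create arriving after a delete): skip it.
                return;
            }
            // ... apply the change to the database here ...
            appliedVersions.put(event.entityId(), event.version());
        }
    }

A related lever that is often used with Kafka: choose the message key so that all events for the same entity (or the same dependency chain, such as a category and its products) land on the same partition, since ordering is only guaranteed within a partition.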

PS: Stale data works for me but there should be no inconsistencies.