Synchronization of data across microservices

We have about two to three dozen microservices that serve our customers. These services are deployed in a Kubernetes cluster and are accessible to the outside world only through three or four API gateways.

We realized that sometimes the same data is needed by two or more microservices. We have evaluated a couple of strategies to solve the problem, and we have implemented them in pieces. Like any design, we are not 100% sure this is the right approach, or whether we are missing pitfalls in the design. Comments, suggestions, and thoughts from people who are experienced in this area will be helpful.

Case 1: When a service of lesser business importance (say ServiceL) needs data from a service of higher business importance (say ServiceH), then ServiceL calls ServiceH directly to get the necessary data.
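Roughly, Case 1 looks like the sketch below; the internal URL and the response shape are made up for illustration.

```typescript
// Case 1 sketch: ServiceL synchronously fetching data from ServiceH.
// The endpoint and the CustomerRecord shape are illustrative only.
interface CustomerRecord {
  id: string;
  name: string;
}

async function fetchCustomerFromServiceH(id: string): Promise<CustomerRecord> {
  const res = await fetch(`http://serviceh.internal/api/customers/${id}`);
  if (!res.ok) {
    throw new Error(`ServiceH returned ${res.status}`);
  }
  return (await res.json()) as CustomerRecord;
}
```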

Case 2: When a service of lesser business importance (say ServiceL) needs data from many important services (say ServiceH1, ServiceH2, and so on), then ServiceH1, ServiceH2, etc. publish messages via RabbitMQ. The publishing is a non-blocking, fire-and-forget mechanism, so that these services are not stalled. ServiceL consumes these messages and stores the data in its own data store. We are okay with a delay before the data becomes available to ServiceL.
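A simplified sketch of the publishing side in Case 2 (the exchange name and message shape are illustrative, not our exact setup):

```typescript
import amqp from "amqplib";

// Case 2 sketch: ServiceH1/ServiceH2 publish updates to a fanout exchange.
// Fire-and-forget: we don't wait for any consumer, so the publisher
// is never stalled by slow or unavailable consumers.
async function publishCustomerUpdated(update: { id: string; name: string }) {
  const conn = await amqp.connect("amqp://rabbitmq.internal");
  const ch = await conn.createChannel();
  await ch.assertExchange("customer.updated", "fanout", { durable: true });

  ch.publish("customer.updated", "", Buffer.from(JSON.stringify(update)), {
    persistent: true, // survive a broker restart
  });

  await ch.close();
  await conn.close();
}
```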

Case 3: When a business-important service (say ServiceH) needs data from a less important service (say ServiceL), then ServiceL publishes messages via RabbitMQ, either fire-and-forget or blocking (depending on the urgency of syncing the data). ServiceH consumes the messages and stores them in its own data store. Often the data is needed by ServiceH for reports and summaries, and we are okay with the summary not being perfectly up to date immediately.
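The consuming side (used in both Case 2 and Case 3) is sketched below; the queue name and saveToOwnStore() are placeholders:

```typescript
import amqp from "amqplib";

// Sketch of the consumer: read messages off a queue bound to the
// publisher's exchange and upsert them into the local data store.
declare function saveToOwnStore(update: unknown): Promise<void>; // placeholder

async function consumeUpdates() {
  const conn = await amqp.connect("amqp://rabbitmq.internal");
  const ch = await conn.createChannel();
  const q = await ch.assertQueue("servicel.customer.updates", { durable: true });
  await ch.bindQueue(q.queue, "customer.updated", "");

  await ch.consume(q.queue, async (msg) => {
    if (msg === null) return;
    const update = JSON.parse(msg.content.toString());
    await saveToOwnStore(update);
    ch.ack(msg); // acknowledge only after the local write succeeds
  });
}
```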

Case 4: When data is needed by two services and both of them not only read but also modify it, then we believe the domain identification is wrong, and we redesign. (Often this ends with merging the two microservices into one.)

Now, when we use a messaging framework like RabbitMQ for syncing data across services, we observed that over a period of time the data goes out of sync. When that happens, we can inspect the statistics from RabbitMQ and replay messages, but we believe this brings in unnecessary complexity. So instead we have jobs that run once a day to sync the data from the source service to the destination service. (The sync job accesses data via the services, not directly from the data stores; a conceptual sketch follows below.) Is this a good practice for syncing data? Any pitfalls?
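For reference, the daily sync job is conceptually something like this (the API paths, paging, and idempotent PUT are placeholders, not our exact code):

```typescript
// Conceptual sketch of the daily reconciliation job. It reads through
// the source service's API (never its data store directly) and upserts
// every record into the destination service.
async function reconcile(sourceBaseUrl: string, destBaseUrl: string) {
  for (let page = 0; ; page++) {
    const res = await fetch(`${sourceBaseUrl}/api/customers?page=${page}`);
    const batch: { id: string }[] = await res.json();
    if (batch.length === 0) break;

    for (const record of batch) {
      // The destination exposes an idempotent upsert, so syncing the
      // same record twice is harmless.
      await fetch(`${destBaseUrl}/api/customers/${record.id}`, {
        method: "PUT",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(record),
      });
    }
  }
}
```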

How to schedule export and import of Aurora snapshots across AWS accounts

I am trying to build the following process, to be run on a schedule:

  1. Export, from an AWS account A, the content of an Aurora database
  2. Transfer that dump to another AWS account B
  3. Import the dump into another Aurora database in account B

Here are my constraints:

  • I need to be able to put the system in place for multiple databases with different target DBs
  • The import DB might not have the same name as the export DB
  • Both databases are in the same region

I looked online quite a bit and I was able to find the following article: Automating Cross-Region and Cross-Account Snapshot Copies with the Snapshot Tool for Amazon Aurora

However, it doesn’t cover automatically restoring the backup, and my attempts at finding a way to do so (using a Lambda, for example) have not yielded great results.
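For context, what I imagined the restore side in account B doing (in a Lambda, once the snapshot has been shared by account A) is roughly the following. All identifiers are placeholders, and I haven’t gotten this to work end to end:

```typescript
import {
  RDSClient,
  CopyDBClusterSnapshotCommand,
  RestoreDBClusterFromSnapshotCommand,
  CreateDBInstanceCommand,
} from "@aws-sdk/client-rds";

const rds = new RDSClient({ region: "eu-west-1" }); // placeholder region

export async function handler(): Promise<void> {
  // 1. Copy the snapshot shared by account A into account B.
  //    (Waiting for the copy to become available is omitted here.)
  await rds.send(
    new CopyDBClusterSnapshotCommand({
      SourceDBClusterSnapshotIdentifier:
        "arn:aws:rds:eu-west-1:111111111111:cluster-snapshot:shared-snap",
      TargetDBClusterSnapshotIdentifier: "local-copy-snap",
    })
  );

  // 2. Restore a new cluster from the local copy; the restored cluster
  //    does not need to have the same name as the source database.
  await rds.send(
    new RestoreDBClusterFromSnapshotCommand({
      DBClusterIdentifier: "restored-cluster-b",
      SnapshotIdentifier: "local-copy-snap",
      Engine: "aurora-mysql",
    })
  );

  // 3. Aurora restores the cluster without any instances, so at least
  //    one instance must be created before the database is usable.
  await rds.send(
    new CreateDBInstanceCommand({
      DBInstanceIdentifier: "restored-cluster-b-instance-1",
      DBClusterIdentifier: "restored-cluster-b",
      DBInstanceClass: "db.r5.large",
      Engine: "aurora-mysql",
    })
  );
}
```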

Is there a way to do so?

Thanks

Differentiating and comparing MacBook Pro models across different generations

What details (such as dates/years, identifying numbers, etc.) distinguish 2nd generation 13″ MacBook Pro models from 3rd generation models?

These 2nd generation models featured the older, more reliable keyboard, as well as the power adapter, memory card slot, and USB-A ports. I am shopping for a used device and would like to know how to identify one of the most recent devices of this generation (i.e., prior to the more problematic keyboard design, the Touch Bar, etc.).

Best strategy for OAuth2.0 across browsers and across tabs within the same browser?

I have developed a login system using OAuth 2.0 that currently works within one tab in one browser. Without diving into the code: the user enters their credentials to log in, the credentials are sent to the API (WebAPI2, .NET) from the client (Angular), the user is authenticated against Active Directory, and an access token and refresh token are returned to the client.

I am currently storing these in cookies. When an access token expires, the refresh token is grabbed from the cookie and used to get a new access token. All of this works within one browser tab. However, we want to allow the user to be logged in to multiple tabs within a browser, and/or multiple browsers, at the same time without affecting each other (like Facebook, etc.). Cookies seem like the wrong solution for storing these tokens because they don’t communicate between different tabs, so I did some research and found localStorage, which probably solves the between-tabs problem.
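To illustrate, this is the kind of thing I have in mind for the cross-tab part (simplified; the key names are illustrative):

```typescript
// Store tokens where every tab of the same origin can read them.
function storeTokens(accessToken: string, refreshToken: string): void {
  localStorage.setItem("access_token", accessToken);
  localStorage.setItem("refresh_token", refreshToken);
}

// The "storage" event fires in every *other* tab of the same origin
// whenever one tab writes to localStorage, so all tabs can pick up a
// refreshed access token without doing their own refresh call.
window.addEventListener("storage", (event) => {
  if (event.key === "access_token" && event.newValue !== null) {
    // Update this tab's in-memory copy of the token here.
    console.log("Access token refreshed in another tab");
  }
});
```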

However, I wanted to get some opinions on the best strategy for being logged in across multiple tabs/browsers with the same login using OAuth 2.0. I don’t think you can store the tokens in a SQL database, because then the API call to retrieve the refresh tokens across browsers wouldn’t be secure. Do you create a separate token in each browser with the same login, and then use localStorage in each browser to share that browser’s token? Or is there a way to share refresh tokens safely across multiple browsers?

Data propagation across components

I frequently have relatively contained components (services) that do specialized things. They are also almost always immutable, to make changing them less error prone. Note that the below is within one relatively big project, not small services in separate processes, but that doesn’t change much.

Take this system as an example:

  1. Input: cardboard package that contains red books
  2. Processing
    1. Take a book
    2. Do some complex processing such as:
      1. If book has less than 50 pages, reject with message “Too short”
      2. Otherwise, read pages 15, 21 and 38
      3. Find all words that start with X on pages 15 and 21
      4. If we have more X-words on page 15, find the number of occurrences of the word “play” on page 38; otherwise, read the last word on that page. Call the result A
      5. Write a new book where you change all occurrences of the word “some” to A. If there are no occurrences of “some” at all, issue the warning message “No matching words found”
      6. Print it and color the covers blue
  3. Output: plastic package that now contains blue books and various rejection or warning messages as needed

Consider that we now have a lot of services that perform the things in part 2.2 above. E.g. there would be a service that writes a book in step 2.2.5.

Consider also that the rejection or warning messages are to be propagated across all services. In the simple case above that might not sound like much, but consider that the chain is usually much deeper than in this example. E.g. we might have something like:

[one thread] book service -> chapter service -> title service -> paragraph service -> sentence service -> word service -> footnote service
[another thread] printing service -> font chooser service -> cover coloring service

This means that either we keep global state (yuck) or create gazillions of separate objects (SentenceServiceOutput, PrintingServiceOutput, …) and make each of the services return all the pieces, i.e. the write-book service would return a (book, reject messages, warning messages) tuple, and then wire them up as needed in some parent service.
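To make the 2nd option concrete, each service ends up looking roughly like this (a simplified sketch; the names and logic are invented for illustration):

```typescript
// Every service returns its payload plus the accumulated messages...
interface WordServiceOutput {
  words: string[];
  rejectMessages: string[];
  warningMessages: string[];
}

function findXWords(pageText: string): WordServiceOutput {
  const words = pageText.split(/\s+/).filter((w) => w.startsWith("X"));
  const warningMessages = words.length === 0 ? ["No X-words found on this page"] : [];
  return { words, rejectMessages: [], warningMessages };
}

// ...and every parent service has to unpack each child's output, do its
// own small piece of work, and re-pack everything into yet another type.
interface SentenceServiceOutput {
  sentences: string[];
  rejectMessages: string[];
  warningMessages: string[];
}

function buildSentences(pageText: string): SentenceServiceOutput {
  const child = findXWords(pageText);
  return {
    sentences: child.words.map((w) => `${w}.`),
    rejectMessages: child.rejectMessages,
    warningMessages: child.warningMessages,
  };
}
```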

The 2nd solution has some drawbacks in that it:

  • Pollutes the code significantly and makes it very hard to read and trace through. E.g. instead of a service (2.2.3 above) that returns a list of words found, it would return an object ([words], [reject messages], [warning messages]).
  • Makes the code significantly longer and more convoluted: basically, wherever we had [words] we now have to unpack ([words], [reject messages], [warning messages]), change a small piece, and then re-pack into a new ([words], [reject messages], [warning messages]).
  • Makes it significantly harder to keep the overall project structure in your head, since the number of names (e.g. class names) increases dramatically, and because of their number the names become longer (you cannot call it just Output, since you would mix up SentenceServiceOutput and FontChoosingServiceOutput), making the code even longer and harder to read.
  • While it still makes it significantly easier to reason about smaller pieces than using global state, it makes code changes significantly slower and harder to audit, test, and so on.

Are there any patterns which allow for mitigation of any or all of the above issues?

What open source tool can someone use to manage multiple servers scattered across different providers in 2019?

…and preferably with a web GUI?

For example, solutions like Proxmox are really nice and polished, but only if you’re dealing with Proxmox instances, obviously. Sure, you can hook them all up together and manage them from a single instance. And it’s no wonder that dedicated server companies have been providing ready-made ISOs of Proxmox alongside ESXi as the top two solutions to split a dedicated server into multiple VMs.

But what if you manage VMs, containers, or dedicated servers scattered across different public clouds or hosting providers? E.g. KVM-based VMs on DigitalOcean/Linode, LXC containers on Proxmox hosts, bare-metal dedicated servers at OVH/Hetzner, and so on? Ideally agentless, although I wouldn’t mind installing an agent if it made life easier overall.

The goal is to have something that’s easy to set up and maintain: use a single server as the “master” node and then connect to and manage any other slave server (virtual, container, or dedicated) using SSH commands or some agent. A simple web GUI could be used to report health and resource usage and (why not) perhaps execute shell commands remotely.

There is no need for compute/storage/networking separation obviously.

So far I’ve looked at:

a) configuration management and orchestration tools like Ansible, Chef, Puppet, etc., or Kubernetes/Swarm (albeit geared primarily, if not exclusively, toward containers), but they are either limited to the CLI or truly overkill to set up.

b) OpenStack = shoot me now (sorry OpenStack users)

c) OpenNebula probably comes closest to such a need, but it also seems kind of overkill to set up (though not overly complex if you’ve tried Kubernetes).

d) oVirt looks really nice, but it requires CentOS as the base OS (for the slave nodes) – that’s an issue if a remote VM is already set up with Ubuntu, obviously.

Thanks in advance for any suggestions/pointers.

Tracing out failed “nested” delegated domains across BIND and AD

Have a parent domain: hours.com, hosted on BIND.

Have a delegated subdomain: minutes.hours.com, hosted on AD DNS.

Have yet another subdomain, delegated in the above zone: seconds.minutes.hours.com. This is also defined and hosted on the same AD DNS.

Queries using either the BIND or the AD DNS servers for hours.com and minutes.hours.com work flawlessly. All glue records are defined in the correct context.

Queries using the AD DNS servers for seconds.minutes.hours.com also work flawlessly.

Queries using the BIND server for seconds.minutes.hours.com fail with NXDOMAIN.

I can’t seem to trace this out. I have confirmed that the glue records for seconds.minutes.hours.com exist and are resolvable if I query the actual A records from either nameserver. I thought the resolver would hit the BIND nameserver, get “passed off” to the AD DNS server, which would then respond with the NS records for the second delegated zone, but this is not taking place.
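For reference, the delegation I expect to exist in the minutes.hours.com zone on the AD side looks roughly like this (the host name and address are made up):

```
; Delegation of seconds.minutes.hours.com inside the minutes.hours.com
; zone (hosted on AD DNS). Names and addresses are illustrative.
seconds.minutes.hours.com.    IN  NS  addc1.minutes.hours.com.
addc1.minutes.hours.com.      IN  A   192.0.2.53
```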

I’d appreciate any direction on what I should try, or a pointer if there is something I am missing.

How to pull last value in a Google Sheets column and populate it across another worksheet column?

This is my first post here – I hope this is the right place for it; apologies in advance if it isn’t.

Context:

I have a password-protected presentation that I need to give users access to, but only after they have submitted their information to me via a Google Form. There is only a single password used to access the presentation, and administrators will want to update it every now and then.

I created a Google Spreadsheet with two worksheets in it that I’ve set up to accept data from two separate Google Forms. The worksheet “User information” stores user data; the other worksheet, “Presentation password”, stores the latest password to the presentation (each row in this worksheet contains a password that was used to access the presentation, and only the most recent row is the current password).

I’ve set up a Zapier “Zap” (automation) on the Google Form so that when a new user submits their information to gain access to the presentation, an email is automatically sent to the user with the link to the presentation, which is hosted online. I want to be able to send the most current presentation password to the user in this email, and in order to do this, I need to find a way to constantly and automatically pull the most recent password from the “Presentation password” worksheet into a column in the “User information” worksheet, even as new rows are added to “Presentation password”.

A separate Google Form, used by administrators of this presentation, adds a new row to the “Presentation password” worksheet. I want them to use this form as an easy way to update the password that appears in the Zapier email.

Question

Whenever a new password is submitted to the “Presentation password” worksheet, a new row is added below the most recent data. I need to pull the password from the last row of password data and automatically replicate it across all rows in column F of the “User information” worksheet (or at least all rows that have user information in them). My hope is that this way, Zapier (which can only populate emails with data from a single worksheet) will have the data it needs to email users the latest password, and administrators of this presentation will be able to use a Google Form to update the password that appears in this email, rather than having to log in to Zapier and manually type it in.

Is what I’m trying to do possible? I’ve tried some ARRAYFORMULA and INDEX/COUNT approaches that I found while searching around, but wasn’t able to get any of them to do exactly what I’m looking for (the ARRAYFORMULA that I set up with INDEX/COUNT didn’t automatically populate across the entire column, or for new rows added when new data is entered).
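For example, the kind of formula I was experimenting with in F2 of “User information” looked like the following (assuming the passwords land in column B of “Presentation password”; those column letters may be the part I have wrong):

```
=ARRAYFORMULA(IF(A2:A = "", "",
  INDEX('Presentation password'!B:B, COUNTA('Presentation password'!B:B))))
```

The intent is that INDEX/COUNTA picks up the last non-empty password row, and the ARRAYFORMULA/IF fills it down column F only for rows that actually contain user information.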

Please see this sample worksheet that I put together for reference, and feel free to edit it.