Applying different architecture lenses in software systems

In software, unlike physical construction, a software engineer faces a virtually unlimited set of choices when building a system. The concept of a design pattern emerged to describe common solutions, the value they add, and when they are useful. Of course, not every pattern is valuable in every situation. Design patterns let us, as software engineers, work with higher-level constructs that are widely recognized across the industry.

Similarly, in the field of distributed systems, different architecture patterns come with characteristic advantages as well as downsides. There is a lot to learn from assessing these patterns and their common trade-offs across different architecture lenses. Two lenses in particular:

  • Application Architecture: How you organize classes and objects in an application codebase as a single deployable unit.
  • Distributed Systems Architecture: How you organize many applications, microservices, and other system components, and how they interact with one another to achieve the overall desired outcome.

Of course, the dynamics of these two dimensions are different. We therefore cannot generalize and claim that a pattern that works well in one dimension will work equally well in the other. For example, within a single codebase one would avoid duplication by extracting shared logic into common packages and reusing it across the codebase. The same principle should not always be applied in a distributed architecture, since it introduces undesired coupling between otherwise independent microservices. In distributed systems, a little duplication is better than a little coupling.
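
To make this concrete, here is a minimal sketch in Java of what "a little duplication" can look like in practice. The service names and the validation rule are hypothetical, not anything from a real codebase:

```java
// A minimal sketch of "a little duplication" between two services.
// Service names and the validation rule are hypothetical.

// In rides-service (its own codebase and deployable):
final class TripPhoneValidator {
    // Duplicated on purpose: rides-service can evolve this rule
    // without coordinating a shared-library release with anyone.
    static boolean isValid(String phone) {
        return phone != null && phone.matches("\\+?[0-9]{8,15}");
    }
}

// In payments-service (a separate codebase and deployable):
final class PayoutPhoneValidator {
    // The same few lines today, free to diverge tomorrow. A shared
    // library here would couple the two services' release cycles
    // for the sake of a handful of lines.
    static boolean isValid(String phone) {
        return phone != null && phone.matches("\\+?[0-9]{8,15}");
    }
}
```

The duplication costs a few lines; a shared library would cost a coordinated release every time either service needs the rule to change.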

As long as we stay conscious of such differences in dynamics, the thought process can be quite useful nonetheless. Let’s take a look at some examples:

The Big Ball of Mud

The Big Ball of Mud is a state where a software system lacks any perceivable architecture. The system is full of spaghetti code, where every piece of code can talk directly to any other piece. The nodes of interaction, if graphed, form what we call “the big ball of mud.” This is a clear sign of unregulated growth and a lack of refactoring in the application codebase.

However, the same can happen in the interactions of microservices. Imagine hundreds of services where each service can talk directly to any other service. Guess what you see when you draw such a dependency graph? Yup, a big nasty ball of mud.

When engineering teams obsess over the physical boundaries of their services, it is quite easy to grow into such a situation. After all, the path there is simply unregulated growth, a lack of refactoring, and patching in quick fixes and capabilities over a long enough period. But unlike application code, there is no static code analysis to flag the increased complexity or the cyclic dependencies; a check like the one sketched below has to be built deliberately.
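
As an illustration, here is a minimal sketch of the kind of cycle check you lose when you move from one codebase to many services: a depth-first search over a service dependency graph. The service names are hypothetical, and in practice the graph might be derived from tracing data or service-mesh metadata; that is an assumption, not a description of any particular tooling:

```java
import java.util.*;

// A minimal sketch of cycle detection over a service dependency graph,
// i.e. the check a compiler or static analyzer gives you for free
// inside a single codebase. Service names are hypothetical.
public class ServiceCycleCheck {

    static boolean hasCycle(Map<String, List<String>> callsTo) {
        Set<String> done = new HashSet<>();   // fully explored services
        Set<String> onPath = new HashSet<>(); // services on the current DFS path
        for (String svc : callsTo.keySet()) {
            if (dfs(svc, callsTo, done, onPath)) return true;
        }
        return false;
    }

    static boolean dfs(String svc, Map<String, List<String>> callsTo,
                       Set<String> done, Set<String> onPath) {
        if (onPath.contains(svc)) return true;  // back edge => cycle
        if (done.contains(svc)) return false;   // already cleared
        onPath.add(svc);
        for (String dep : callsTo.getOrDefault(svc, List.of())) {
            if (dfs(dep, callsTo, done, onPath)) return true;
        }
        onPath.remove(svc);
        done.add(svc);
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<String>> callsTo = Map.of(
            "orders",  List.of("pricing", "users"),
            "pricing", List.of("users"),
            "users",   List.of("orders")   // users -> orders closes a cycle
        );
        System.out.println("Cycle detected: " + hasCycle(callsTo)); // true
    }
}
```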

On the other hand, a modular monolith gives more structure to the code. Even though it is still a monolith and does too much compared to the world of microservices, it remains quite maintainable and possible to evolve; it is not an outright anti-pattern unless it grows into a ball of mud. A similar concept in distributed systems is to encapsulate a group of services that act together to deliver a high-level capability and recognize them as a system responsible for a certain business domain. In such systems, you do not deploy all components at the same time, so re-architecting happens by gradually adjusting the scope and touch points of internal components while preserving the overall outcome of the system, rather than that of a single service or component. This way of thinking encourages refactoring, or re-architecting in this context.

Thinking of systems as first-class citizens

At Careem, we acknowledge the importance of going beyond services and thinking in systems. This is one of our core design principles: the building blocks of our infrastructure revolve around systems. But what does that mean? Well, a few things:

  • Just as continuous refactoring of a codebase improves readability and maintainability, re-architecting a system for better cohesion, efficiency, and maintainability is encouraged and has a great ROI.
  • It is typical to fall into the trap of setting SLIs and SLOs only on services, but there is great value in setting business-meaningful SLOs on the overall system. In fact, this is where the meaningful metrics need to live, not on a specific internal implementation point. For example: a system is composed of four services, and we track the error rate of each one, which is a good thing to do. But if those internal error rates are our first-level metrics, they will discourage re-architecting, because any restructuring changes them. An SLO on each service is still useful as an internal metric and objective, but overall reliability should be measured at the system level (see the sketch after this list).
  • A single application should have a clear purpose and value proposition. Similarly, a system must have a high-level purpose, a promise, and ways to confirm whether it delivered on that promise. It should also exhibit a high level of cohesion.
  • While the rate of change in the internal structure of a system is much slower than the rate of change in its codebases, it should be much higher than the rate of change in the system’s overall purpose and external contracts.
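
As a rough illustration of the SLO point above, the sketch below contrasts per-service error rates with an error rate measured at the system boundary, where an internally retried request that eventually succeeds still counts as a success. All numbers, names, and the objective are hypothetical:

```java
// A minimal sketch of measuring reliability at the system boundary
// rather than at each internal service. Numbers are illustrative.
public class SystemSlo {

    record Window(long requests, long failures) {
        double errorRate() {
            return requests == 0 ? 0.0 : (double) failures / requests;
        }
    }

    public static void main(String[] args) {
        // Internal per-service views: useful for debugging, and free to
        // change when the system is re-architected.
        Window serviceA = new Window(100_000, 900);
        Window serviceB = new Window(40_000, 50);

        // The system-level SLI is measured at the facade: a first-attempt
        // failure inside service-a that is retried successfully does not
        // count against the system's promise to its clients.
        Window system = new Window(60_000, 30);

        double objective = 0.001; // hypothetical 99.9% success SLO
        System.out.printf("service-a error rate: %.4f%n", serviceA.errorRate());
        System.out.printf("service-b error rate: %.4f%n", serviceB.errorRate());
        System.out.printf("system error rate: %.4f (SLO met: %b)%n",
                system.errorRate(), system.errorRate() <= objective);
    }
}
```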

Hiding internal details

Another interesting concept is hiding internal details. When writing a library or a module in a codebase, software engineers are encouraged to hide internals through private and package-private visibility. This ensures that internal classes will not be used unexpectedly by other parts of the application. It also regulates access to the library through predefined touch points, thereby hiding the internal implementation details. Hiding complexity and preventing undesired access is crucial. Yet many engineering teams do not do a similarly good job when it comes to the architecture of distributed systems.
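
In Java, for instance, this is simply a matter of choosing visibility deliberately. The sketch below exposes a single public facade while keeping the parsing internals package-private; the package and class names are made up for illustration:

```java
// A minimal sketch of hiding internals within a single codebase.
// Only the facade is public; other packages in the application
// cannot reach the parser or the internal model directly.
package com.example.billing; // hypothetical package name

public class InvoiceFacade {          // the predefined touch point
    public String renderInvoice(String rawOrder) {
        Order order = OrderParser.parse(rawOrder);
        return "Invoice for order " + order.id();
    }
}

record Order(String id) { }           // package-private: an internal detail

class OrderParser {                   // package-private: free to change
    static Order parse(String raw) {
        return new Order(raw.trim());
    }
}
```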

Even when backend systems are encapsulated behind a gateway, many engineers still use a domain or URI prefix to decide which service a request is routed to, for example /service-a/uri-1 and /service-b/uri-2. This way, the gateway does regulate and restrict traffic to the internal services service-a and service-b, but it does little to hide internal details from external clients. Details of the system’s internal components get exposed to the public. A refactoring (where service-a and service-b are deprecated and their scope moves to a new service-c) cannot be done internally without the clients knowing, which breaks the encapsulation concept. Instead, there is a strong case for creating a public API layer, a facade, without exposing which services or actual endpoints will be hit under the hood. All APIs exposed by the system’s facade (experience layer) need to be coarse APIs that implement the exact use cases of the system, i.e. the use-case coordination needs to be implemented there.
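
Sketching that idea in code: the coarse facade API below implements one use case by coordinating internal touch points, without leaking any service name or internal endpoint into the public contract. Everything here (the clients, the booking flow) is hypothetical and kept framework-free for brevity:

```java
// A minimal sketch of a coarse, use-case-level facade API. The public
// contract says nothing about which internal services exist; only the
// use case ("create a booking") is exposed.
public class BookingFacade {

    private final InventoryClient inventory; // internal, replaceable
    private final PaymentsClient payments;   // internal, replaceable

    BookingFacade(InventoryClient inventory, PaymentsClient payments) {
        this.inventory = inventory;
        this.payments = payments;
    }

    // Public contract, e.g. exposed as POST /bookings. The coordination
    // across internal services lives here, so deprecating service-a and
    // service-b in favor of service-c never changes what clients see.
    public String createBooking(String riderId, String rideId) {
        String holdRef = inventory.reserve(rideId);
        payments.charge(riderId, holdRef);
        return holdRef; // booking reference returned to the client
    }
}

// Internal touch points behind the facade; their implementations can move
// between services without the public contract changing.
interface InventoryClient { String reserve(String rideId); }
interface PaymentsClient { void charge(String riderId, String holdRef); }
```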

Such a construct gets missed if you just hide hundreds of microservices behind a gateway as a functional layer of the stack, without taking the semantics and business-meaningful APIs, i.e. the public interface, into consideration.

Restricting access to internal system modules can be achieved through network restrictions or service-to-service authentication. Either way, the main point to consider is the emphasis on the acceptable touch points of a system versus its internal details.
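
As a rough sketch of the authentication route, the check below rejects callers that are not on an internal allow-list. The header name and allow-list are placeholders; a real setup would verify the caller identity cryptographically (mTLS, signed service tokens, or a service-mesh policy) rather than trusting a plain header:

```java
import java.util.Map;
import java.util.Set;

// A minimal sketch of gating an internal endpoint so that only sibling
// services inside the system can call it. All names are placeholders.
public class InternalOnlyFilter {

    // Hypothetical allow-list of internal callers within the system.
    private static final Set<String> ALLOWED_CALLERS =
            Set.of("service-a", "service-b");

    static void requireInternalCaller(Map<String, String> headers) {
        String caller = headers.get("x-internal-service"); // assumed header
        if (caller == null || !ALLOWED_CALLERS.contains(caller)) {
            throw new SecurityException("caller is not an internal service");
        }
        // In practice the identity would be verified cryptographically,
        // not read from an unauthenticated header as done here.
    }
}
```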

At Careem, as at other tech companies in the industry, such considerations are a matter of continuous learning: our systems keep evolving, and so do our discipline and judgment while building them.

Conclusion

The ability to shift gears and see problems through different lenses is a great opportunity for software engineering. It challenges some existing norms and pushes engineering teams to draw on learnings from different domains and known patterns, as well as common pitfalls. More importantly, it lets teams apply lessons from a fast-changing domain, the codebase, to the world of distributed systems, and so speed up the evolution of those systems.
