Unconditional Code for Mapping of Protobuf in Golang

Unconditional code is code that just works, without errors or exceptions, without funny edge cases represented with tedious IF statements. In this article I will share how this can be applied to situations in Golang where you need to map incoming protobuf messages to internal or external (i.e. outgoing) structures.

A Word about Golang

Golang is particularly sensitive to “ify” code due to the existence of nil and the decision of having errors as values returned by functions. If you have a function like:

func Foo() (*Result, error) {
  ...
}

func UseFoo() error {
  res, err := Foo()
  if err != nil {
    // handle, wrap or at least return `err`
  }
  if res == nil {
    // handle nil
  }
  // Do something with *res
  return nil
}

You are forced by the author of Foo to handle both an error and a nil case.

If you are dealing with data intensive applications, you might be in a situation where you want to map some upstream/incoming data to some internal representations for further processing.

Example

I will use eCommerce as an example. Imagine we have a system composed of several services communicating over gRPC. We have a Checkout service that calls the Document service, passing an order.Order object. The order contains all the information necessary to issue a purchase document, and the Document service needs to map the upstream data to its internal model.Document struct to render it and send it to the customer.

Here is a segment that deals with the user part of the order.Order:

func FromUser(u *user.User) (*model.Customer, error) {
  if u == nil {
    return nil, nil
  }
  var customer model.Customer
  customer.Email = u.Email
  if u.FirstName == "" && u.LastName == "" {
    return nil, fmt.Errorf("customer must have first or last name")
  }
  customer.Name = strings.TrimSpace(strings.Join([]string{u.FirstName, u.LastName}, " "))
  if u.BillingAddress == nil {
    return nil, fmt.Errorf("customer must have billing address")
  }
  customer.Country = u.BillingAddress.Country
  customer.City = u.BillingAddress.City
  return &customer, nil
}

There are several problems with this code, the most obvious being it is difficult to read. It might be convenient to put the blame on the language. However, in my experience programming languages have rarely been the cause of trouble.

I see the following issues with this code:

  • It does mapping and validation at the same time.
  • The calling function has to deal with both the error and the nil case.
  • It is really hard to read.
  • It is hard to test – we always need to provide a complete user.User object and assert the entire mapped structure, instead of writing a table test that asserts only the mapping of the name.

The last problem can easily explode throughout the codebase. Consider a FromOrder function that maps the deep structure of order.Order to an internal Document one:

func FromOrder(order *order.Order) (*model.Document, error) {
  if order == nil {
    return nil, fmt.Errorf("order is nil")
  }
  if order.User == nil {
    return nil, fmt.Errorf("user is missing")
  }
  customer, err := FromUser(order.User)
  if err != nil {
    // ...
  }
  // ...
}

func TestFromOrder(t *testing.T) {
  testCases := []struct {
    desc string
    in   *order.Order
    want *model.Document
  }{
    {
      desc: "should create document",
      in:   &order.Order{
        // ...
      },
      want: &model.Document{
        // ...
      },
    },
  }
  for _, tC := range testCases {
    t.Run(tC.desc, func(t *testing.T) {
      // ...
    })
  }
}

We have three options to make the test work:

  • Initialize the complete order.Order and user.User objects to get past the mapper validations. This will make our tests long and hard to maintain – any change to the smaller mappers (e.g. FromUser) will break TestFromOrder.
  • Use indirection – hide the mapping of different parts behind interfaces that can be mocked. This will create clutter in our mapper package and we will take a performance hit.
  • Not test the FromOrder function.

Luckily, there is a fourth option – unconditional code.

Unconditional Go

We will take advantage of two very nice features of Golang and protobuf:

  • nil receivers. I fell in love with the idea of sending a message to an empty (nil) object in Objective-C and I was thrilled to see this in Golang. Since methods in Golang are functions with receivers (which can be considered just another parameter), nothing prevents the receiver from being nil when it is a pointer. Protobuf code generation takes full advantage of this fact. It is quite simple actually (as Golang is always aiming for simplicity):
func (s *Struct) GetData() string {
  if s == nil {
    return ""
  }
  return s.Data
}

var ptr *Struct // nil
fmt.Println(ptr.GetData()) // this works fine
  • Range over nil. Iterating over an empty or nil slice would render the same result – the body of the loop will not get executed at all. In Golang, you don’t need to explicitly check for nil in those situations. Moreover, I can append to a nil slice.
func get() []string {
  return nil
}

func main() {
  for _, s := range get() {
    fmt.Println("never", s)
  }
  var slice []string // nil
  slice = append(slice, "first element")
}

Armed with those tools, we can now apply unconditional code to our mapping, i.e. make the code always work, without explicit checks, errors, nil returned values or panics:

func FromOrder(order *order.Order) model.Document {
  return model.Document{
    Customer:  FromUser(order.GetUser()),
    LineItems: FromItems(order.GetItems()),
  }
}
func FromItems(items []*order.Item) []model.LineItem {
  var lineItems []model.LineItem
  for _, item := range items {
    lineItems = append(lineItems, FromItem(item))
  }
  return lineItems
}
func FromItem(item *order.Item) model.LineItem {
  price := decimal.New(item.GetProduct().GetPriceE2(), -2)
  qty := decimal.New(item.GetQuantity(), 0)
  return model.LineItem{
    Description: item.GetProduct().GetDescription(),
    Total:       price.Mul(qty),
  }
}

func FromUser(u *user.User) model.Customer {
  return model.Customer{
    Name:    CombineNames(u.GetFirstName(), u.GetLastName()),
    Email:   u.GetEmail(),
    Country: u.GetBillingAddress().GetCountry(),
    City:    u.GetBillingAddress().GetCity(),
  }
}
func CombineNames(names ...string) string {
  return strings.TrimSpace(
    strings.Join(names, " "))
}

Now the code is much cleaner (and shorter), as it is doing one thing – mapping. It is also much easier to read and test, and faster to write. Notice how the lack of an error return value liberates us from all the clutter of Golang’s idiomatic error handling.
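To make the testing claim concrete, here is a minimal, self-contained sketch. The User and Address types are hand-written stand-ins for what the protobuf generator would produce, and Customer stands in for model.Customer; all three are assumptions made for illustration, trimmed to the fields needed here. The point is that each case only sets the fields it cares about, and a nil input is just another row in the table.

```go
package main

import (
	"fmt"
	"strings"
)

// Address and User are hypothetical stand-ins for protobuf-generated types.
// Their getters are nil-safe, in the same way generated getters are.
type Address struct{ Country string }

func (a *Address) GetCountry() string {
	if a == nil {
		return ""
	}
	return a.Country
}

type User struct {
	FirstName, LastName string
	BillingAddress      *Address
}

func (u *User) GetFirstName() string {
	if u == nil {
		return ""
	}
	return u.FirstName
}

func (u *User) GetLastName() string {
	if u == nil {
		return ""
	}
	return u.LastName
}

func (u *User) GetBillingAddress() *Address {
	if u == nil {
		return nil
	}
	return u.BillingAddress
}

// Customer is a hypothetical stand-in for model.Customer.
type Customer struct{ Name, Country string }

// CombineNames joins the names with spaces and trims the result.
func CombineNames(names ...string) string {
	return strings.TrimSpace(strings.Join(names, " "))
}

// FromUser maps unconditionally: no nil checks, no error return.
func FromUser(u *User) Customer {
	return Customer{
		Name:    CombineNames(u.GetFirstName(), u.GetLastName()),
		Country: u.GetBillingAddress().GetCountry(),
	}
}

func main() {
	cases := []struct {
		desc string
		in   *User
		want Customer
	}{
		{"nil user", nil, Customer{}},
		{"first name only", &User{FirstName: "Ada"}, Customer{Name: "Ada"}},
		{"address only", &User{BillingAddress: &Address{Country: "UK"}}, Customer{Country: "UK"}},
	}
	for _, c := range cases {
		got := FromUser(c.in)
		fmt.Printf("%-16s match=%t got=%+v\n", c.desc, got == c.want, got)
	}
}
```

In a real project the loop in main would live in a TestFromUser function and use t.Run, as in the TestFromOrder skeleton earlier.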

Invariants

So far so good, but what can we do about the validation rules that used to be embedded in the mapper? We can’t just delete them, right?

An invariant is a property of an object that is always true. In our example, we might say that an order.Item always has an order.Product inside, or that a model.LineItem always has a description.

The mapper is a bridge between two models and usually both models would have some invariants. Putting the invariants logic in that bridge might seem like a good idea, but is it really?

Let’s look at a segment of the original FromUser example:

if u.BillingAddress == nil {
  return nil, fmt.Errorf("customer must have billing address")
}
customer.Country = u.BillingAddress.Country
customer.City = u.BillingAddress.City

What is the invariant here? Is it that every order.User always has a billing address, or that a model.Customer will always have a country and a city? If it is the second one, it is a very weak invariant, because both Country and City can be empty strings, so by checking for nil we are enforcing a technical (i.e. internal) invariant, not a domain one.

The only advantage of having the validation logic in the mapper is that we can see why we cannot satisfy the output model’s invariants, e.g. “Line item should always have description, but it needs to come from the product description, which is missing”. Wow, this is complex in English, how can it be simple in Golang?

I think that a much better option is to use either pre checks or post checks.

Pre Checks

I prefer keeping the mapping code clean and moving all assumptions about the input data to one place.

In our example, the Document service accepts an order.Order over gRPC. We have two options, the first being to add a validator to the handler that accepts the gRPC request (a.k.a. a pre check):

func (h *OrderHandler) handle(ctx context.Context, order *order.Order) error {
  err := h.validator.Validate(order)
  if err != nil {
    return fmt.Errorf("cannot handle invalid order: %w", err)
  }

  err = h.controller.CreateDocument(order)
  if err != nil {
    return fmt.Errorf("cannot create document from order: %w", err)
  }

  return nil
}
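The validator itself is not shown in this article. As a rough sketch of what it could look like, the snippet below collects every violated assumption about the incoming order in one place. The Order, Item and Product types are simplified stand-ins for the generated ones, and OrderValidator and its rules are hypothetical, chosen only for illustration. It uses errors.Join, which requires Go 1.20 or newer.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical, simplified stand-ins for the generated order types.
type Product struct{ Description string }

type Item struct {
	Product  *Product
	Quantity int64
}

type Order struct{ Items []*Item }

// OrderValidator is a hypothetical pre check: every assumption the
// Document service makes about an incoming order lives here, not in the mapper.
type OrderValidator struct{}

// Validate returns nil for an acceptable order, otherwise an error
// that joins all violated rules.
func (OrderValidator) Validate(o *Order) error {
	if o == nil {
		return errors.New("order is nil")
	}
	var errs []error
	if len(o.Items) == 0 {
		errs = append(errs, errors.New("order has no items"))
	}
	for i, item := range o.Items {
		if item == nil || item.Product == nil {
			errs = append(errs, fmt.Errorf("item %d has no product", i))
			continue
		}
		if item.Quantity <= 0 {
			errs = append(errs, fmt.Errorf("item %d has a non-positive quantity", i))
		}
	}
	// errors.Join returns nil when errs is empty.
	return errors.Join(errs...)
}

func main() {
	v := OrderValidator{}
	good := &Order{Items: []*Item{{Product: &Product{Description: "Book"}, Quantity: 1}}}
	bad := &Order{Items: []*Item{{Quantity: 0}}}
	fmt.Println("good order:", v.Validate(good))
	fmt.Println("bad order: ", v.Validate(bad))
}
```

The conditionals have not disappeared, but they now sit in a single component that can be tested, revised or removed without touching the mapping code.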

This is a good approach if the Document service has specific, non-universal requirements about “orders that can have a purchase document”.

For example, the business might dictate that people receiving the purchased goods is more important than them receiving a purchase document, and the Delivery service in our system might take a more relaxed view of the order.Order data.

However, in many other circumstances the order.Order invariants must be owned by the Checkout service. In those situations the Document or Delivery services shouldn’t make further assumptions, but rely on the invariants guaranteed by the upstream.

Which brings us to the next topic – post checks.

Post Checks

Now that each service is responsible for contributing the data along with the consistency (i.e. invariants) associated with it, we can have a validation at the end of the pipeline.

This means that the Document service should guarantee that all line items have a description:

func (c controller) CreateDocument(o *order.Order) error {
  doc := mapper.FromOrder(o)
  err := c.validator.Validate(doc)
  if err != nil {
    return fmt.Errorf("cannot issue invalid document: %w", err)
  }
  // ...
  return nil
}

You can imagine a similar piece of code in the Checkout service with regard to the order.Order structs.

The post check guarantees that we will not issue a “bad” document because of “bad” upstream data and provokes the right question. Instead of asking “why do we see products without description” we will be asking “how should we issue a document for products without description”. Maybe we can have a default description?
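A post check of this kind could be sketched as follows. Document and LineItem are simplified stand-ins for the model types, and both Validate and WithDefaultDescriptions are hypothetical helpers; the fallback text is an assumption, since the right default is a product decision rather than a technical one.

```go
package main

import (
	"errors"
	"fmt"
)

// Simplified, hypothetical stand-ins for model.LineItem and model.Document.
type LineItem struct{ Description string }

type Document struct{ LineItems []LineItem }

// Validate enforces the Document invariant "every line item has a description".
func Validate(doc Document) error {
	var errs []error
	for i, li := range doc.LineItems {
		if li.Description == "" {
			errs = append(errs, fmt.Errorf("line item %d has no description", i))
		}
	}
	// errors.Join (Go 1.20+) returns nil when errs is empty.
	return errors.Join(errs...)
}

// WithDefaultDescriptions returns a copy of doc in which empty
// descriptions are replaced by fallback. It is one possible answer to
// "how should we issue a document for products without description".
func WithDefaultDescriptions(doc Document, fallback string) Document {
	out := Document{LineItems: make([]LineItem, len(doc.LineItems))}
	copy(out.LineItems, doc.LineItems)
	for i := range out.LineItems {
		if out.LineItems[i].Description == "" {
			out.LineItems[i].Description = fallback
		}
	}
	return out
}

func main() {
	doc := Document{LineItems: []LineItem{{Description: "Book"}, {}}}
	fmt.Println("as mapped:    ", Validate(doc))
	fmt.Println("with defaults:", Validate(WithDefaultDescriptions(doc, "Item")))
}
```

Whether a default description is acceptable is exactly the kind of decision the post check brings to the surface.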

Outcome

One might argue that having the Document service rely on the invariants of the Checkout service is coupling.

I would argue the opposite. Having pre checks in the Document service, or embedding them in the mapping, creates coupling of the worst kind. The Checkout service might accidentally break the Document service. Yes, the Document service will break with a mapping error, but it would still be broken, wouldn’t it?

Now imagine the Checkout service is publishing orders to a message broker (e.g. Kafka) and there are several other products relying on that stream of events. We have a situation where the Checkout team can break any of those downstream services at any time and the only way to understand if a change is breaking is to go through the code of each one. The team owning the Checkout service has a choice – be careful and slow or be faster and break things.

I believe sometimes it is better to forget about microservices and tap into the decades of experience in the industry. If I have a monolith with components for orders, documents and delivery, would I repeat the invariants of the orders in each dependency or try to minimize duplication?

Regardless of whether you agree with the statement above, the Document team must make a decision about checking incoming orders. Having this decision spread through the mapping code and potentially other parts of the service would make revising it later very, very expensive.

Conclusion

We make decisions about our systems every day and there is no formula on how to always make the right one. Therefore, we as engineers must make sure that our decisions are as cheap as possible and we can easily revise them in the future.

Unconditional code is one way to get there.

References

Decision Hiding – Make Complex Simple Again

Have you ever had your head explode while trying to trace the execution of a use case through a complex codebase? Well, I certainly have. Sometimes the code I am trying to decipher is my own, making the process not only painful, but shameful as well.

In this article, I will explore the concept of complexity and how to tackle it using decision hiding (aka information hiding).

What is complexity?

The state of having many parts and being difficult to understand or find an answer to.

Cambridge Dictionary

I believe this to be a better definition than anything our industry can produce. Language has been the tool of scientists, philosophers and many others for centuries. Our industry has existed for a few decades.

Let’s try to contextualize the definition.

What is complexity in software?

A “Hello, World!” program is usually something we consider simple. It:

  • does one thing;
  • uses one standard library function;
  • has one entry point (e.g. the terminal);
  • is written in one language and so on;

Notice the emphasis on one.

In contrast, we consider ERP systems complex. They are doing… Nobody really knows what they are doing, but it is a lot. Sometimes they integrate hundreds of other systems and technologies, driven by thousands of configurations. Some ERPs come with their own IDE, so they are not just configurable, but programmable as well.

Google Search might serve as a more interesting example because it does one thing – it finds relevant results based on a search term. So it is simple. However, you want to have personalized search results, account for SEO, distribute the system so it can serve millions of requests per second, etc. The way the search works, the “how” is much more complex.

Borrowing from the Lean Architecture [1] book, the software has a form (what the system is) and a function (what the system does). Complexity may affect either of them differently.

All that being said, it would seem that we cannot have production software that is truly simple. Production software will always have many parts. Even for the “Hello, World!” example, if we imagine it production-ready, we need to figure out installation, updates, localization, multiple target operating systems, etc. It could be shipped as a Docker container, adding more parts.

Luckily, this is not completely true. We can have simplicity as a property of production software.

Is math complex?

Several years ago, while I was at the University, I had difficulties understanding the math behind a particular computer science paper. I discussed it with a few colleagues, and everybody told me it is very complex. So I went to one of my math professors. He sat down, looked at the paper and told me it is very simple. He explained. I understood.

How did that happen?

Where my colleagues saw a big question mark, my professor saw structure:

[Figure: Problem composition, an example of hierarchical problem composition]

The professor had to explain three distinct things, and then I was able to understand the original problem.

How does math battle complexity?

Lemmas are one example of how mathematical proofs deal with complexity.

Every lemma has a single result (i.e. a conclusion) that is a stepping stone for the larger problem. However, the lemma has no particular knowledge of the context it is used in. Similarly, the final proof depends only on the conclusion of the lemma.

This is essentially decision hiding – the lemma hides the decision of how to prove it: what theorems and lemmas to combine, what methodology to follow. Only the conclusion matters to others.

Coupling and dependencies in software

As software engineers, we often talk and think about coupling. We consider loose coupling a good thing, whereas tight coupling should be avoided at all costs.

But the devil is in the details.

For example, it is common in Java to reduce coupling by using interfaces and factories.

However, some people do so in a very mechanical way, following a broken principle: “If you don’t have a reference to a concrete class, you only depend on the abstraction, therefore you have loose coupling.” The issue with this line of thinking is that it is focused on code: “If I don’t depend directly on a piece of code, I am not coupled with it.” This could not be further from the truth.

Imagine a University scheduling system, which models students, classes, rooms and workstations.

[Figure: High-level diagram]

Let’s assume we have the system up and running, we have nice clean interfaces and dependency injection.

There is a problem, though. Students cannot find their place quickly in the room, so we decide to show the faculty number on the screen – we implement it in the workstation setup.

We have coupling.

Hidden dependencies

Inside our workstation setup module, we depend on the decision that every student will have a faculty number. We know there are constraints in the database that enforce this, so we depend on it.

If you do not agree that this is coupling, imagine the University wants to use the system for community workshops where anyone can sign up with an email. Domain experts say they are students, but they don’t have a faculty number. All of a sudden, there are dozens of places in our codebase that need to change (workstation setup included), because they were all developed knowing the faculty number is required.

Making the faculty number required was a decision we took at some point, and dependencies on that decision could easily infest our entire codebase.

There is a lot of data flowing through our systems and we sometimes forget that abstractions do not produce the data – concrete implementations do. The concrete leaves its mark on the data, allowing us to implicitly depend on code that was executed “miles away”, hidden under abstract factories and configuration-based dependency injection.
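As an illustration of how such a dependency could be contained, here is a small sketch. The Student type, the ScreenLabel method and SetupWorkstation are all hypothetical names, and falling back to the email address is an assumption, not something the scenario prescribes. The workstation setup asks the student how it should be shown on screen instead of reading the faculty number directly, so the decision "every student has a faculty number" stays in one place.

```go
package main

import "fmt"

// Student is a hypothetical model. FacultyNumber is empty for
// community workshop attendees who signed up with an email only.
type Student struct {
	Name          string
	Email         string
	FacultyNumber string
}

// ScreenLabel hides the decision of how a student is identified on a
// workstation screen. Callers do not know whether a faculty number exists.
func (s Student) ScreenLabel() string {
	if s.FacultyNumber != "" {
		return s.FacultyNumber
	}
	return s.Email
}

// SetupWorkstation depends only on ScreenLabel, so revising the
// "faculty number is required" decision does not touch this function.
func SetupWorkstation(seat int, s Student) string {
	return fmt.Sprintf("seat %d: %s", seat, s.ScreenLabel())
}

func main() {
	enrolled := Student{Name: "Ana", Email: "ana@example.com", FacultyNumber: "F12345"}
	guest := Student{Name: "Bob", Email: "bob@example.com"}
	fmt.Println(SetupWorkstation(1, enrolled))
	fmt.Println(SetupWorkstation(2, guest))
}
```

The conditional still exists, but it is owned by the type that holds the decision rather than repeated in every module that consumes student data.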

Decision hiding

Production software as a whole has many parts and is therefore complex. But this complexity can be managed – simplified – by dealing with one problem at a time. This is what math is doing, this is what civil engineering is doing. Furthermore, this is what we are usually trying to do in software using abstractions.

However, when you work on a problem, you are forced to deal with all its dependencies – implicit ones included.

Let’s look at another example.

According to Robert Martin in Clean Code [2]:

  • A function should do one thing and do it well
  • Error handling is one thing

Take our university system, more specifically, the placement module:

  • Students should be distributed across the room
  • Should send an email to the class administrator if there is no room

It may be intuitive to some that we need two separate functions – one to deal with the placement itself and another to deal with the error:

def place_all_students(university_class):
    try: 
        placement = place_students(university_class.room, university_class.students)
        setup_room(placement)
        return True
    except NotEnoughRoom as ex:
        email = create_not_enough_room_email(university_class.signature, ex.extra_students)
        send_email(university_class.administrator, email)
        return False

We could move the orchestration logic to another function, but apart from that, we have very focused code.

However, we failed to apply proper decision hiding. The place_all_students function knows “how” students are placed – using some greedy algorithm that will leave a bunch of extra students at the end. If we revise this decision in the future, we will have to change a lot of code (e.g. place_all_students and create_not_enough_room_email).

Decision hiding, when applied properly, does two things for us:

  • Protect us from change. When we revise decisions, we would change only that part of the system that contains those decisions [3]. We can simulate this to test our design.
  • Help us focus on one thing at a time. When there is no knowledge of or dependency on another decision, we can ignore it and focus on the task at hand.

Getting back to our previous example, a better design would be to abstract the error, allowing us to change the placement algorithm without touching the rest of the system:

def place_all_students(university_class):
    try: 
        placement = place_students(university_class.room, university_class.students)
        setup_room(placement)
        return True
    except PlacementNotPossible as ex:
        email = create_placement_error_email(university_class.signature, ex.reason)
        send_email(university_class.administrator, email)
        return False
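For readers coming from the Go examples in the first part, the same idea can be sketched with an error type that exposes only a human-readable reason. Everything here is hypothetical: PlacementError mirrors PlacementNotPossible, placeStudents is a toy greedy placement, and the email is reduced to a string. The caller never learns which students were left over or which algorithm was used.

```go
package main

import (
	"errors"
	"fmt"
)

// PlacementError is a hypothetical counterpart of PlacementNotPossible.
// It carries a reason, not algorithm-specific details such as "extra students".
type PlacementError struct{ Reason string }

func (e *PlacementError) Error() string { return "placement not possible: " + e.Reason }

// placeStudents is a toy greedy placement: student i gets seat i+1.
// The algorithm can change without affecting callers.
func placeStudents(seats int, students []string) (map[string]int, error) {
	if len(students) > seats {
		return nil, &PlacementError{
			Reason: fmt.Sprintf("%d students do not fit in %d seats", len(students), seats),
		}
	}
	placement := make(map[string]int, len(students))
	for i, s := range students {
		placement[s] = i + 1
	}
	return placement, nil
}

// PlaceAllStudents reports success and, on a placement failure,
// the text of the email for the class administrator.
func PlaceAllStudents(seats int, students []string) (ok bool, email string) {
	_, err := placeStudents(seats, students)
	var perr *PlacementError
	if errors.As(err, &perr) {
		return false, "Placement failed: " + perr.Reason
	}
	return err == nil, ""
}

func main() {
	fmt.Println(PlaceAllStudents(2, []string{"ana", "bob"}))
	fmt.Println(PlaceAllStudents(2, []string{"ana", "bob", "cid"}))
}
```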

Don’t blame the requirements

You can argue that in the example above, it is the requirement that we should send an email with the extra students. Requirements are forcing us to have this dependency. You would be right.

However, you would also be wrong. “It is the requirements” is the ultimate excuse I have heard from engineers delivering complex spaghetti solutions. They say it is complex because this is how product/customers want it. I haven’t done a survey, but I can’t imagine any customer would explicitly want complex software.

Requirements are an integral part of the software. They shouldn’t be just thrown over the fence to engineering – they should be a product of discussion, engineering and careful considerations.

Everybody, all together, from early on.

The Lean Secret [1]

A lot of decisions are taken as requirements and later just implemented as code. If we leak decisions by definition, we cannot later hide them in our code.

Tips for decision hiding

Even though decision hiding is not a recipe you can memorize and use, you can have a checklist to remind you of its principles.

Here is my checklist:

  • Be mindful of the facts and decisions you depend on.
  • Prefer depending on hard domain decisions over solution decisions. For example:
    • “All students will have faculty number” is a domain decision.
    • “Greedy placement of students” is a solution decision.
    • “We will ship software as Docker container” is a solution decision.
  • Challenge requirements, especially ones that revolve around the solution rather than the problem.
  • Be careful what you ask and share. If you know internal details about another team or component, you might accidentally depend on otherwise hidden facts and decisions.
  • Think about the data you share between components – it is the highway for implicit dependencies.
  • Test your decision hiding – pick a decision, change it and see how it affects your code. If the change is contained, your design is solid.

Conclusion

Decision hiding is not just a way of structuring code or building software. It is a way of thinking that has influenced many software engineering “best practices” like encapsulation, indirection, SOLID, clean coding, etc.

If we want to have software that is both rich and simple, we must apply decision hiding to every aspect of its creation. Otherwise, we will always end up entangled in complexity.

References

  1. Lean Architecture: for Agile Software Development by James O. Coplien, Gertrud Bjørnvig
  2. Clean Code: A Handbook of Agile Software Craftsmanship by Robert C. Martin
  3. On the criteria to be used in decomposing systems into modules by D. L. Parnas
