Unconditional Code for Mapping of Protobuf in Golang

Unconditional code is code that just works, without errors or exceptions, without funny edge cases represented with tedious IF statements. In this article I will share how this can be applied to situations in Golang where you need to map incoming protobuf messages to internal or external (i.e. outgoing) structures.

A Word about Golang

Golang is particularly sensitive to “if-y” code due to the existence of nil and the decision to have errors as values returned by functions. If you have a function like:

func Foo() (*Result, error) {
  // ...
}

func UseFoo() error {
  res, err := Foo()
  if err != nil {
    // handle, wrap or at least return `err`
  }
  if res == nil {
    // handle nil
  }
  // Do something with *res
  return nil
}

You are forced by the author of Foo to handle both an error and a nil case.

If you are dealing with data intensive applications, you might be in a situation where you want to map some upstream/incoming data to some internal representations for further processing.

Example

I will use eCommerce as an example. Imagine we have a system composed of several services communicating over gRPC. We have a Checkout service that calls the Document service, passing an order.Order object. The order contains all the necessary information to issue a purchase document, and the Document service needs to map the upstream data to its internal model.Document struct in order to render it and send it to the customer.
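The exact type definitions are not important for the argument, but for orientation, here is a rough sketch of the internal model I will assume in the examples below. The field names are illustrative, order.Order and user.User are the protobuf-generated types, and I am assuming github.com/shopspring/decimal for money values based on the decimal calls used later on:

// A sketch of the Document service's internal model used in the examples.
// Field names are illustrative, not taken from a real service.
package model

import "github.com/shopspring/decimal"

type Customer struct {
  Name    string
  Email   string
  Country string
  City    string
}

type LineItem struct {
  Description string
  Total       decimal.Decimal
}

type Document struct {
  Customer  Customer
  LineItems []LineItem
}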

Here is a segment that deals with the user part of the order.Order:

func FromUser(u *user.User) (*model.Customer, error) {
  if u == nil {
    return nil, nil
  }
  var customer model.Customer
  customer.Email = u.Email
  if u.FirstName == "" && u.LastName == "" {
    return nil, fmt.Errorf("customer must have first or last name")
  }
  customer.Name = strings.TrimSpace(strings.Join([]string{u.FirstName, u.LastName}, " "))
  if u.BillingAddress == nil {
    return nil, fmt.Errorf("customer must have billing address")
  }
  customer.Country = u.BillingAddress.Country
  customer.City = u.BillingAddress.City
  return &customer, nil
}

There are several problems with this code, the most obvious being that it is difficult to read. It might be convenient to put the blame on the language. However, in my experience programming languages have rarely been the cause of trouble.

I see the following issues with this code:

  • It does mapping and validation at the same time.
  • The calling function will have to deal with both the error and the nil case.
  • It is really hard to read.
  • It is hard to test – we always need to provide a complete user.User object and assert the entire mapped structure, instead of having a table-driven test case that asserts just the mapping of the name.

The last problem can easily explode throughout the codebase. Consider a FromOrder function that maps the deep structure of order.Order to the internal model.Document:

func FromOrder(order *order.Order) (*model.Document, error) {
  if order == nil {
    return nil, fmt.Errorf("order is nil")
  }
  if order.User == nil {
    return nil, fmt.Errorf("user is missing")
  }
  customer, err := FromUser(order.User)
  if err != nil {
    // ...
  }
  // ...
}

func TestFromOrder(t *testing.T) {
  testCases := []struct {
    desc string
    in   *order.Order
    want *model.Document
  }{
    {
      desc: "should create document",
      in:   &order.Order{
        // ...
      },
      want: &model.Document{
        // ...
      },
    },
  }
  for _, tC := range testCases {
    t.Run(tC.desc, func(t *testing.T) {
      // ...
    })
  }
}

We have three options to make the test work:

  • Initialize the complete order.Order and user.User objects to get past the mapper validations. This will make our tests long and hard to maintain – any change to the smaller mappers (e.g. FromUser) will break TestFromOrder.
  • Use indirection – hide the mapping of different parts behind interfaces that can be mocked. This will create clutter in our mapper package and we will take a performance hit.
  • Not test the FromOrder function.

Luckily, there is a fourth option – unconditional code.

Unconditional Go

We will take advantage of two very nice features of Golang and protobuf:

  • nil receivers. I fell in love with the idea of sending a message to a nil object back in Objective-C, and I was thrilled to see the same in Golang. Since methods in Golang are functions with a receiver (which can be considered just another parameter), nothing prevents the receiver from being nil when it is a pointer. Protobuf code generation takes full advantage of this fact. It is quite simple actually (as Golang is always aiming for simplicity):
func (s *Struct) GetData() string {
  if s == nil {
    return ""
  }
  return s.Data
}

var ptr *Struct // nil
fmt.Println(ptr.GetData()) // this works fine
  • Range over nil. Iterating over a nil slice yields the same result as iterating over an empty one – the body of the loop will not get executed at all. In Golang, you don’t need to explicitly check for nil in those situations. Moreover, I can append to a nil slice.
func get() []string {
  return nil
}

func main() {
  for _, s := range get() {
    fmt.Println("never", s)
  }
  var slice []string // nil
  slice = append(slice, "first element")
}

Armed with those tools, we can now apply unconditional code to our mapping, i.e. make the code always work, without explicit checks, errors, returned nil values or panics:

func FromOrder(order *order.Order) model.Document {
  return model.Document{
    Customer:  FromUser(order.GetUser()),
    LineItems: FromItems(order.GetItems()),
  }
}

func FromItems(items []*order.Item) []model.LineItem {
  var lineItems []model.LineItem
  for _, item := range items {
    lineItems = append(lineItems, FromItem(item))
  }
  return lineItems
}

func FromItem(item *order.Item) model.LineItem {
  price := decimal.New(item.GetProduct().GetPriceE2(), -2)
  qty := decimal.New(item.GetQuantity(), 0)
  return model.LineItem{
    Description: item.GetProduct().GetDescription(),
    Total:       price.Mul(qty),
  }
}

func FromUser(u *user.User) model.Customer {
  return model.Customer{
    Name:    CombineNames(u.GetFirstName(), u.GetLastName()),
    Email:   u.GetEmail(),
    Country: u.GetBillingAddress().GetCountry(),
    City:    u.GetBillingAddress().GetCity(),
  }
}

func CombineNames(names ...string) string {
  return strings.TrimSpace(
    strings.Join(names, " "))
}

Now the code is much cleaner (and shorter), as it is doing one thing – mapping. It is also much easier to read and test, and faster to write. Notice how the lack of an error return value liberates us from all the clutter of Golang’s idiomatic error handling.
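To illustrate the testing point, here is a sketch of what a table-driven test for the name mapping could look like now. It needs no user.User or order.Order fixtures at all, and the test cases are, of course, made up:

// A sketch of a table-driven test for the name mapping; no protobuf
// fixtures are needed, only plain strings.
func TestCombineNames(t *testing.T) {
  testCases := []struct {
    desc string
    in   []string
    want string
  }{
    {desc: "first and last name", in: []string{"Jane", "Doe"}, want: "Jane Doe"},
    {desc: "only first name", in: []string{"Jane", ""}, want: "Jane"},
    {desc: "no names at all", in: []string{"", ""}, want: ""},
  }
  for _, tC := range testCases {
    t.Run(tC.desc, func(t *testing.T) {
      if got := CombineNames(tC.in...); got != tC.want {
        t.Errorf("CombineNames(%q) = %q, want %q", tC.in, got, tC.want)
      }
    })
  }
}

Mapping a nil user is equally uneventful – FromUser(nil) simply returns a zero model.Customer, which a one-line test can assert.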

Invariants

So far so good, but what can we do about the validation rules that used to be embedded in the mapper? We can’t just delete them, right?

An invariant is a property of an object that is always true. In our example, we might say that an order.Item always has an order.Product inside, or that a model.LineItem always has a description.

The mapper is a bridge between two models and usually both models would have some invariants. Putting the invariants logic in that bridge might seem like a good idea, but is it really?

Let’s look at a segment of the original FromUser example:

if u.BillingAddress == nil {
  return nil, fmt.Errorf("customer must have billing address")
}
customer.Country = u.BillingAddress.Country
customer.City = u.BillingAddress.City

What is the invariant here? Is it that an order.User always has a billing address, or that a model.Customer always has a country and city? If it is the second one, it is a very weak invariant, because both Country and City can be empty strings, so by checking for nil we are enforcing a technical (i.e. internal) invariant, not a domain one.

The only advantage of having the validation logic in the mapper is that we can see why we cannot satisfy the output model’s invariants, e.g. “a line item should always have a description, but it needs to come from the product description, which is missing”. Wow, this is complex in English, how can it be simple in Golang?

I think that a much better option is to use either pre checks or post checks.

Pre Checks

I prefer keeping the mapping code clean and moving all assumptions about the input data to one place.

In our example, where the Document service accepts an order.Order over gRPC, we have two options. The first one is to add a validator to the handler accepting the gRPC request (aka a pre check):

func (h *OrderHandler) handle(ctx context.Context, order *order.Order) error {
  err := h.validator.Validate(order)
  if err != nil {
    return fmt.Errorf("cannot handle invalid order: %w", err)
  }

  err = h.controller.CreateDocument(order)
  if err != nil {
    return fmt.Errorf("cannot create document from order: %w", err)
  }

  return nil
}

This is a good approach if the Document service has specific, non-universal requirements about “orders that can have a purchase document”.

For example, the business might dictate that people receiving the purchased goods is more important than them receiving a purchase document, and the Delivery service in our system might have a more relaxed view of the order.Order data.
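The validator behind h.validator is not spelled out above. A minimal sketch of a Document-specific pre check – the OrderValidator type and its two rules are my assumptions, not something prescribed by the services – could look like this:

// OrderValidator is a hypothetical pre check owned by the Document service,
// encoding its own requirements towards incoming orders.
type OrderValidator struct{}

func (OrderValidator) Validate(o *order.Order) error {
  if o.GetUser().GetBillingAddress() == nil {
    return fmt.Errorf("order has no billing address")
  }
  for _, item := range o.GetItems() {
    if item.GetProduct().GetDescription() == "" {
      return fmt.Errorf("order contains a product without a description")
    }
  }
  return nil
}

Note that, thanks to the generated getters, even the validator does not need explicit nil checks on the nested messages.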

However, in many other circumstances the order.Order invariants must be owned by the Checkout service. In those situations the Document or Delivery services shouldn’t make further assumptions, but rely on the invariants guaranteed by the upstream.

Which brings us to the next topic – post checks.

Post Checks

Now that each service is responsible for contributing the data along with the consistency (i.e. the invariants) associated with it, we can have a validation at the end of the pipeline.

This means that the Document service should guarantee that all line items have a description:

func (c controller) CreateDocument(o *order.Order) error {
  doc := mapper.FromOrder(o)
  err := c.validator.Validate(doc)
  if err != nil {
    return fmt.Errorf("cannot issue invalid document: %w", err)
  }
  // ...
  return nil
}
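Again, the validator itself is not shown above; a minimal sketch, assuming the model types from earlier (the exact rules are, of course, up to the Document team), might be:

// DocumentValidator is a hypothetical post check guarding the invariants
// the Document service promises about the documents it issues.
type DocumentValidator struct{}

func (DocumentValidator) Validate(doc model.Document) error {
  if doc.Customer.Name == "" {
    return fmt.Errorf("document has no customer name")
  }
  for i, item := range doc.LineItems {
    if item.Description == "" {
      return fmt.Errorf("line item %d has no description", i)
    }
  }
  return nil
}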

You can imagine a similar piece of code in the Checkout service with regard to the order.Order structs.

The post check guarantees that we will not issue a “bad” document because of “bad” upstream data, and it provokes the right question. Instead of asking “why do we see products without a description”, we will be asking “how should we issue a document for products without a description”. Maybe we can have a default description?
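If the answer turns out to be yes, the change stays inside the mapper and the code can stay unconditional. Here is a sketch, assuming Go 1.22+ for cmp.Or and a fallback text that is obviously a hypothetical product decision, not mine:

// FromItem with a hypothetical default description. cmp.Or (standard
// library "cmp", Go 1.22+) returns its first non-zero argument, so no
// if statement is needed.
func FromItem(item *order.Item) model.LineItem {
  price := decimal.New(item.GetProduct().GetPriceE2(), -2)
  qty := decimal.New(item.GetQuantity(), 0)
  return model.LineItem{
    Description: cmp.Or(item.GetProduct().GetDescription(), "Miscellaneous item"),
    Total:       price.Mul(qty),
  }
}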

Outcome

One might argue that having the Document service relying on the invariants of the Checkout service is coupling.

I would argue the opposite. Having pre-checks in the Document service, or having them embedded in the mapping, creates coupling of the worst kind. The Checkout service might accidentally break the Document service. Yes, the Document service will break with a mapping error, but it would still be broken, wouldn’t it?

Now imagine the Checkout service is publishing orders to a message broker (e.g. Kafka) and there are several other products relying on that stream of events. We have a situation where the Checkout team can break any of those downstream services at any time and the only way to understand if a change is breaking is to go through the code of each one. The team owning the Checkout service has a choice – be careful and slow or be faster and break things.

I believe sometimes it is better to forget about microservices and tap into the decades of experience in the industry. If I have a monolith with components for orders, documents and delivery, would I repeat the invariants of the orders in each dependency or try to minimize duplication?

Regardless of whether you agree with the statement above or not, the Document team must make a decision about checking incoming orders. Having this decision spread through the mapping code and potentially other parts of the service would make revising it later very, very expensive.

Conclusion

We make decisions about our systems every day and there is no formula on how to always make the right one. Therefore, we as engineers must make sure that our decisions are as cheap as possible and we can easily revise them in the future.

Unconditional code is one way to get there.

Code Review – Why Nobody Wants to Read your Code?

Well, can you blame them?

Code reviews might be the only thing in software engineering that is universally considered “good” and “the right thing to do”. Furthermore, it is one of the few things that are empirically proven to improve our software [1].

And yet, I oftentimes hear complaints about reviews. Nobody complains about the concept, but a lot of people (I am tempted to say “everybody”) are not 100% satisfied with the implementation.

The most common complaint I have heard is about the lack of timely reviews.

In this article, I will try to answer the question in the title – why some diffs are neglected while others get more attention. And more importantly – what can the author do to make his code a “bestseller”.

Note that I will be using diff and pull request interchangeably throughout the article. Also, I assume there is already a code review practice in your team. If there isn’t – this article will not be particularly useful and you must first establish code reviews before making them better.

Top 3 reasons why people avoid reviews

My insights suggest that there are 3 top reasons why people tend to avoid reviewing particular diffs:

  1. Volume. Not surprisingly, since large code changes require lots of time and focus to review thoroughly (sometimes more than it took to write them). Many engineers use the time between meetings, the morning coffee or some other convenient slot to do reviews.
  2. Complexity. Much like volume, code with high complexity requires time and mental energy to review properly. It is of high density – a small piece of a solution to a complex distributed problem has the same “mass” as a much larger mundane diff.
  3. Lack of context. If you are not familiar with a piece of code and the reason behind the changes, you cannot actually review it. Reviewers acting as living linters, although not uncommon, don’t bring much value to the table – the purpose of the code review is to verify the correctness of the proposed change above anything else.
  4. Value. It wouldn’t be a top 3 without a number 4, would it? Engineers are particularly lazy (in a good sense). If your diff doesn’t bring value or it is not clear why it is important, people will tend to neglect it.

How to make your pull requests better?

The heart of my advice is simple – prepare your diffs for review with empathy.

Empathetic writing is a well-established paradigm, one where you constantly try to put yourself in your reader’s shoes. You can start exploring it on your own – there are tons of materials (e.g. https://www.grammarly.com/blog/empathy-writing).

Start by asking yourself:

  • Who is going to review my diff?
  • What is her background?
  • What context does she have for the given task/problem/code?
  • What is the best/easiest way to read through the changes?

Every communications training urges you to think about the audience. While some may argue that you write code for the compiler (I strongly disagree with this statement), nobody can deny that the intended audience of a pull request is software engineers.

Think of ways you can help or guide your reviewers. Here are some things I like to do:

  • Discuss a change in advance whenever possible. Before you have written a single line of code the level of ego is at its lowest and you can really benefit from a fresh perspective. This also gives the other engineer(s) a sense of co-authorship which is priceless during the review process.
  • Review your own code before giving it to somebody else. If you are not already doing this, you will be surprised how effective a self-review might be.
  • Start with why (Simon Sinek reference intended) – explain why your diff is important. Don’t forget to be empathetic with your reader – who is she, would your explanation make sense for someone without your context, how is the change important to her, etc.
  • Write a short summary, explaining the entire diff so that the reviewer can read your code while keeping the general idea in her head. I sometimes use this to guide the reviewer to read the files in a particular order that makes most sense. Avoid “retelling” your code (e.g. explain why you are committing the transaction at line 356 rather than stating you are doing it).
  • Give enough context of the problem and code/components under review. Try not to give too much. You can put the ticket (e.g. Jira), but, please, don’t stop there – I have rarely seen tickets that are good enough as just references. Extract what is important from the ticket and put it in the pull request description.
  • Give references to documentation, especially if understanding the change requires deep knowledge about some library/third-party component. This has the added benefit of making you look at what the documentation is actually saying and verify some of your assumptions.
  • Don’t be clever. A subject of its own article, clever is not the same as smart and being clever in your code is rarely good for your reader.

As you can see, all things revolve around thinking about your reviewer and saving her precious time. Also, in order to make the best out of your reviews, you must start thinking about them early.

Conclusion

I hope you don’t treat this article as a checklist, but rather as an inspiration – ask yourself questions, be empathetic with the readers of your code and pull requests, think of ways you can help them help you.

References

  1. The impact of code review coverage and code review participation on software quality: a case study of the Qt, VTK, and ITK projects by McIntosh, Kamei, Adams, Hassan
  2. Peer Reviews in Software: A Practical Guide by Karl Wiegers
  3. Code Review at Cisco Systems by SmartBear
  4. The CL author’s guide to getting through code review by Google

 
