Reiterating the Importance of Building Trustworthy and Highly Dependable Systems
In 2019, a glitch in the UberEats payment integration with a payment partner, PayTM, cost them about $14k worth of free food, because orders that were never charged were still delivered to customers.
As a backend engineer who usually implements business logic, I find this problem very relatable, because mistakes like this are not unusual among engineers.
Some API endpoints do not even return a valid status code, and you start wondering what the engineer on the other end was trying to achieve.
This is why it’s usually safer to treat every unknown response as a failure.
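To make that concrete, here is a minimal sketch in Python of what "unknown means failure" could look like. The names `PaymentOutcome`, `KNOWN_SUCCESS` and `classify_payment_response`, as well as the `(200, "SUCCESS")` response shape, are assumptions made up for illustration. They are not taken from the UberEats or PayTM APIs.

```python
from enum import Enum


class PaymentOutcome(Enum):
    CHARGED = "charged"
    FAILED = "failed"


# Hypothetical allowlist: only responses we explicitly recognise count as success.
KNOWN_SUCCESS = {(200, "SUCCESS")}


def classify_payment_response(status_code, body):
    """Return CHARGED only for an explicitly recognised success response."""
    status = body.get("status") if isinstance(body, dict) else None
    if (
        isinstance(status_code, int)
        and isinstance(status, str)
        and (status_code, status) in KNOWN_SUCCESS
    ):
        return PaymentOutcome.CHARGED
    # Missing status code, malformed body, or unrecognised status:
    # everything else is treated as a failure.
    return PaymentOutcome.FAILED
```

The point of the allowlist is that a new or malformed response from the partner falls through to `FAILED` without anyone having to anticipate it, so the order is not released until the charge is confirmed.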
I once enrolled in a course on “Building Trustworthy Systems” and later discovered a video about how much Excel, the popular Microsoft spreadsheet software, had cost businesses.
Video title: When Spreadsheets Attack!
It talks about a reported $25m budget underestimate by Utah’s state department of education due to a “faulty reference in a spreadsheet”.
A village in West Baraboo ended up borrowing $400k, more than it should have, because of errors when summing a list of numbers in Excel.
In 2012, JPMorgan Chase reportedly lost about $6bn as a result of related miscalculations in Excel.
Likewise, the glitch between UberEats and PayTM ended up costing about $14k worth of food, because orders were delivered without the customers being charged.
These losses reiterate how important it is to build trustworthy and highly dependable systems, and to put processes in place that ensure such systems are well tested and verified before they are deployed to production.
Agreed, human errors abound, and product development alone is not enough to deal with these issues!
Building dependable systems is also the ethical thing to do, as these kinds of errors may have real-world consequences.
In other scenarios, glitches like the one between PayTM and UberEats could put human lives at risk.
As such, depending on your situation, it’s recommended to follow some of the best practices below to ensure a trusted and highly dependable system.
- Always communicate. Improve communication between stakeholders (highly important, as this could have prevented the UberEats losses).
- Don’t assume an unknown API response means success. Treat unknown responses as failures until they are understood (implementations and use cases may differ, but the concept is the same).
- Give users access to only the resources/services they need at any given time (all permissions should be denied by default, and access should be granted only when explicitly stated).
- Depending on risk evaluation, consider making your services idempotent.
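The last two recommendations can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `PaymentService`, `GRANTS` and `is_allowed` are hypothetical names, the idempotency key is assumed to be supplied by the caller, and everything is kept in memory, whereas a real service would need a durable store and concurrency control.

```python
class PaymentService:
    """In-memory sketch of an idempotent charge operation (hypothetical)."""

    def __init__(self):
        self._results = {}  # idempotency_key -> result of the first call
        self.charges = []   # charges that were actually made

    def charge(self, idempotency_key, customer_id, amount_cents):
        # A repeated key returns the stored result instead of charging again.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.charges.append((customer_id, amount_cents))
        result = {"charge_id": len(self.charges), "amount_cents": amount_cents}
        self._results[idempotency_key] = result
        return result


# Deny by default: only explicitly granted (user, action) pairs are allowed.
GRANTS = {("alice", "orders:read")}


def is_allowed(user, action):
    return (user, action) in GRANTS


service = PaymentService()
first = service.charge("order-42", "cust-1", 1500)
retry = service.charge("order-42", "cust-1", 1500)  # e.g. a client retry after a timeout
assert first == retry and len(service.charges) == 1
assert not is_allowed("alice", "orders:refund")  # never granted, so denied
```

With an idempotency key, a retry after a timeout cannot double-charge the customer, and with an explicit grant list, a forgotten permission results in a denied request rather than accidental access.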
I hope these recommendations help someone.