Chaos Engineering Summary (A practice I should adopt)

On 26th Dec 2022, I attended a chaos engineering event hosted by Harness, and it was a very cool topic.

Generally, in software development, developers and testers use testing tools like Jasmine, Mocha, and Selenium to test their product, but chaos engineering goes way beyond that.

Chaos Engineering is different from general unit tests or integration tests. It’s a process of testing massive systems for bugs and failures in production.

Why in the world would someone want to make their own system fail?

Because once you know how the system fails, you can put effort into fixing those failures and making the system resilient.

Chaos engineering isn’t something that can be done by one person. It requires the dev team, test team, DevOps, and sometimes management to be present to test the application.

Companies like Intuit, which owns Mailchimp, Credit Karma, QuickBooks, and a list of other popular products, use chaos engineering to test their applications.

The funny part is that the real test is done in production.

Sounds scary? I felt the same.

So what really happens in a chaos engineering test?

Well, the test can range from something as simple as disconnecting the network to deleting a complete node (server) from the cluster. It can also be a memory injection or a network attack.
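To get a feel for the "disconnect the network" kind of test at toy scale, here is a minimal sketch in plain Python. It is not how Litmus or any real chaos tool works; it only imitates a network failure inside one process. The names `inject_network_failure`, `call_with_retries`, and `fetch_cart` are made up for illustration.

```python
import functools


def inject_network_failure(failures: int):
    """Decorator: make the first `failures` calls raise ConnectionError."""
    def decorator(func):
        remaining = {"count": failures}

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if remaining["count"] > 0:
                remaining["count"] -= 1
                # Simulated fault: pretend the network is unreachable.
                raise ConnectionError("injected: network unreachable")
            return func(*args, **kwargs)
        return wrapper
    return decorator


def call_with_retries(func, attempts: int):
    """Call func up to `attempts` times and return (result, tries_used)."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for tries in range(1, attempts + 1):
        try:
            return func(), tries
        except ConnectionError as err:
            last_error = err  # remember the failure and try again
    raise last_error


@inject_network_failure(failures=2)
def fetch_cart():
    # Hypothetical backend call; stands in for a real HTTP request.
    return {"items": 3}


result, tries = call_with_retries(fetch_cart, attempts=3)
print(result, tries)
```

The point of the sketch is the question it asks: does the caller survive two failed calls in a row? A real chaos test asks the same thing, only against real infrastructure.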

Chaos engineering is generally done with predefined resources, a list of attacks with their expected results, and the ability to roll back the changes if something goes wrong.

That means no two chaos engineering events are the same, but in each one the scope is predefined and the expectations are very clearly defined.
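The idea of "a list of attacks, expected results, and a rollback" can be sketched in a few lines of Python. This is my own toy model, not code from the event or from any tool; the `Attack` and `Experiment` classes and the three-replica "system" are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Attack:
    """One predefined fault: how to inject it and how to undo it."""
    name: str
    inject: Callable[[], None]
    rollback: Callable[[], None]


@dataclass
class Experiment:
    """A predefined list of attacks plus the expected result (steady state)."""
    steady_state: Callable[[], bool]  # True while the system is healthy
    attacks: List[Attack] = field(default_factory=list)

    def run(self) -> Dict[str, str]:
        results = {}
        for attack in self.attacks:
            if not self.steady_state():
                # Never start an attack on an already unhealthy system.
                results[attack.name] = "skipped"
                continue
            attack.inject()
            try:
                healthy = self.steady_state()
                results[attack.name] = "passed" if healthy else "failed"
            finally:
                # Always roll back, whether the expectation held or not.
                attack.rollback()
        return results


# Toy system: three replicas, considered healthy while at least two are up.
replicas = {"up": 3}


def kill(n: int) -> Attack:
    def _inject():
        replicas["up"] -= n

    def _rollback():
        replicas["up"] += n

    return Attack(f"kill-{n}", _inject, _rollback)


experiment = Experiment(
    steady_state=lambda: replicas["up"] >= 2,
    attacks=[kill(1), kill(2)],
)
report = experiment.run()
print(report)
```

In this toy run, losing one replica keeps the system healthy and losing two does not, and the rollback restores all three replicas either way. A "failed" entry is the useful outcome here, because it points at the weakness to fix.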

What is Gameday?

Gameday is the day chaos engineering is executed on production. The same tests are generally also run in a staging environment.

Who presented at the event?

  1. An IBM Presentation

A senior tech employee from IBM presented the best practices for doing chaos engineering.

2. A friend presented

A community friend of mine, Avinash, who works with Deloitte as a DevOps Engineer, presented.

Avinash Upadhyay presenting a talk on API resiliency

3. I met Prithvi

I also met Prithvi Raj, who is the technical community manager at Harness.

Prithvi Raj, Technical community manager of Harness

He invited me to be part of an upcoming event, Chaos Carnival, which will be about building resiliency through chaos engineering. The event will be virtual and free for everyone.

 

Here are some resources I captured at the event, which I think I will likely be using in the future! I will ask Prithvi to share the digital version and a link.

  1. chaos engineering gameday checklist
chaos engineering gameday checklist

2. chaos engineering gameday Outcome and action plan

chaos engineering gameday Outcome and action plan

3. tools and frameworks for chaos engineering

BTW, Harness has a big presence in Bangalore (the company itself is headquartered in San Francisco). Most of their products are open source, and companies ranging from big tech like IBM and Intuit to small companies use them to do chaos engineering.

How is Harness used for chaos engineering?

Litmus Chaos (by Harness) is basically a software package that is installed on a Linux machine and connects to all of its child nodes.

If you have a K8s (Kubernetes) cluster, you will be able to install the “harness client” on all the child servers and run your tests.

 

Who writes the test code to do chaos engineering with Litmus?

Once Litmus Chaos is installed, you will have access to the Litmus templates that are freely available. From network tests to memory manipulation, and node killing to shutting down the whole cluster, they have covered most of the common use cases.

In case your use case isn’t covered, you can write your own code, which is mostly YAML, and run custom test cases.

For example, if you want to test the network resilience of your backend API, you can simply create a “network regression testing” chaos engineering project, run the test, and get the result without writing a single line of code. You can write code, mostly YAML, if you want to, but you don’t really need to if you have something like Litmus. Litmus is open source, and anyone can download and use it for free, or modify it and use it.

 

Just for the sake of fun, can I do chaos engineering on my laptop?

Yes, you can. In fact, there was a small demo at the event where they showcased the product, and it was quite fun to watch. The demo tested the service reliability of a small e-commerce app, and everything was running on localhost on a MacBook.

What is Harness and what do they do?

Harness is a software company with a big presence in Bangalore (headquartered in San Francisco) that contributes a lot to the open-source community. They have a very cool product portfolio, including tools for CI/CD pipelines and Litmus for chaos engineering. Their products are open source, and anyone can download and use them for free.

Can a startup do chaos engineering to test its product? 

It depends on the resources you have. If you have a big team, and your product is big enough that a failure can really impact people, then it is worth doing.

If you just have a WordPress website, you don’t need chaos engineering, although you can do it too.

I will be requesting the community lead to share the full deck and digital version of the presentation, as I think it will be a good resource to refer back to.

 

Do I write blogs on all the events I go to? 

No, only the ones that are interesting to me.

 

How can I attend meetups like this one on software engineering?

To find events, just go to meetup.com and search for your interest.

I live in Bangalore, and it’s quite a happening place. But if you live somewhere remote, you can join virtual meets, e.g. Chaos Carnival, and have fun.

 

Note: This is not a sponsored post. I have written this out of my own interest, for the community to refer back to.