A production day is nothing short of a festival. Preparation starts 3-4 weeks before the actual date. Invitations go out to a lot of people. Everyone is busy preparing their dishes (applications) to be served on that day.
It is a very auspicious day. This is the day when a whole year of hard work finally pays off.
And sometimes it doesn't. But today I will talk about a successful production day, because I have a 100% success rate. It never goes south when I'm on the team :p
A Day Before Production
This day is spent entirely on preparation.
You find people collecting all their resources and keeping them in one place so that, when the time comes, they don't have to go looking for them.
Some spend time writing utility scripts to automate time-consuming, error-prone work.
I created a log-monitoring script because I had to watch multiple logs, and I think I can share that script with you.
I wrote it in AppleScript. If you haven't used AppleScript before, then please do. It can be really helpful if you own a Mac.
Here’s the script…
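If you're not on a Mac, the same idea can be sketched in Python. This is a minimal, hypothetical sketch, not the AppleScript mentioned above: it polls a list of log files, remembers how many bytes it has already read from each one, and prints any new complete line that contains one of a few keywords. The function names, the polling interval, and the `ERROR`/`Exception` keywords are all assumptions chosen for illustration.

```python
import os
import time
from typing import Dict, List, Tuple


def read_new_lines(paths: List[str], offsets: Dict[str, int]) -> List[Tuple[str, str]]:
    """Return (path, line) pairs for complete lines appended since the last call.

    `offsets` maps each path to the byte position already consumed and is
    updated in place. A file that shrank (truncated or rotated) is re-read
    from the start. A trailing partial line is left for the next poll.
    """
    new_lines: List[Tuple[str, str]] = []
    for path in paths:
        if not os.path.exists(path):
            continue  # the log may not have been created yet
        start = offsets.get(path, 0)
        if os.path.getsize(path) < start:
            start = 0  # file was truncated or rotated
        offsets[path] = start
        with open(path, "rb") as f:
            f.seek(start)
            data = f.read()
        end = data.rfind(b"\n")
        if end == -1:
            continue  # no complete line yet
        offsets[path] = start + end + 1
        for raw in data[: end + 1].splitlines():
            new_lines.append((path, raw.decode("utf-8", errors="replace")))
    return new_lines


def follow(paths: List[str], interval: float = 1.0,
           keywords: Tuple[str, ...] = ("ERROR", "Exception")) -> None:
    """Poll the logs forever, like `tail -f` on several files with a filter."""
    # Start at the current end of each file so only new entries are shown.
    offsets = {p: os.path.getsize(p) if os.path.exists(p) else 0 for p in paths}
    while True:
        for path, line in read_new_lines(paths, offsets):
            if any(k in line for k in keywords):
                print(f"[{os.path.basename(path)}] {line}")
        time.sleep(interval)
```

Calling something like `follow(["service-a.log", "service-b.log"])` (placeholder file names) would keep printing matching lines, each tagged with its file name, until you stop it with Ctrl+C. Polling by byte offset keeps the sketch dependency-free; a real setup might prefer OS file-change notifications instead.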
Not only that, some people even spend time searching other people's repositories for something useful that can make their work easier.
But yeah… the day before production is all about preparation.
A Few Hours Before, During And After Production
I woke up early. Too early, actually: by mistake, I woke up at 4:49 A.M. that day, and I couldn't go back to sleep 🙁
If you find yourself in a similar situation, then please, please, please go back to sleep. It only gets harder from there.
This is my sleep chart for that day.
Okay, so I woke up early and started my day early.
Did some exercise, took a cold bath, read a book, ate nothing :p
It was 8 A.M. when I opened my laptop and wrote an article on QuickSort.
We were supposed to start at 9:00 A.M., and it was a pretty easy start. I joined the team call at 9:30 A.M. and the process began. One by one, each team worked on its assigned tasks. It was around 1:00 P.M. when our turn came to deploy all our applications.
This is where all the chaos started.
The applications were deployed successfully, but then issues started coming in, and not all of them were related to the applications.
It was all over the place: the database team was going haywire and the monitoring teams were raising alerts. Everyone was silent on the call, but every second I heard another notification sound.
Then someone reported that Service A was down and the system was raising alerts. I looked at the logs, and it was a legit issue, so I spent some time fixing it. Meanwhile, we got alerts for Services B and C. On top of that, while fixing the issues, we had to keep communicating with the different teams, at least 4 people at any given time. All at the same time.
It was the craziest hour for me.
I had never multi-tasked at that level before, and I still don't know how I managed to cope for so long. I'm not a multi-tasker by nature. I prefer to take one task at a time, finish it, and then move on to the next. But this was next-level multi-tasking, even for a multi-tasker.
Somehow everything got resolved and the pressure eased. I took a deep breath and noticed that my head was hot.
Literally hot. I immediately grabbed some ice and put it on my head.
It was already 4:00 P.M., and I had been going non-stop. I hadn't even had lunch. Then Mom put food on the table and insisted that I eat.
So, on one hand, I was monitoring all the applications, and on the other, I was eating. In between, I had to talk as well.
Slowly everything smoothed out, but we were still seeing some errors in the logs. We knew they weren't affecting functionality in any way; if they had been, someone from one of the teams would have told us. But this error was causing the system to raise alerts, and that needed to be fixed. So we started looking into it.
That was one weird error.
After putting 2 hours into that error, we finally gave up. Those 2 hours weren't entirely spent on that issue either; we were handling other requests in between, so it was an on-and-off effort.
It took so much time because we had no way to test it. It worked fine locally and it worked fine on stage, but the issue showed up only in the production environment. And deploying to production is something that takes some thought, because live traffic is flowing.
So we kept looking at the issue from every angle but couldn't figure out the cause.
Eventually, we were so tired that we had to call it a day. It was close to 11 P.M. when we finally left it for the night.
Everyone was cheering the successful deployment and congratulating each other. But we still had this one issue to fix the next day.
The Next Day (Saturday)
Since we had left the issue unresolved, we had to work the next day. We planned to start early, but we were so damn tired that we pushed it to 11 A.M.
So we started at 11, picking up where we had left off the night before. And trust me, it took only 30 minutes. The fix was just removing 2 lines of code. Yes, we removed 2 lines and everything worked. But the cause of that error was crazy, and the debugging was one hell of a job.
And we did a really awesome job. The issue was not at all easy to find, or even to test, because it occurred only on the production servers. And the logs we were getting were not at all descriptive. We had to dig deep and explore every possibility to finally fix that one.
And I’m telling you – We Are Awesome!!!
Well, I would like to end with two things that I learned, and they might help you too:
- Sleep well and long.
- Do not skip your meals.
Missing out on both will make you really tired and miserable, so make sure you get enough sleep and good food.
Let me know about your experiences in the comments below.