Detect errors on time to gauge the success of your launch — Simple concepts, huge benefits!

When you develop your applications, do you put thought into a proper logging mechanism? In a perfect world, there should be no errors in the logs, but we all know that’s not the case in reality. Then how do you know that the project you just launched to a production environment is not introducing new errors or new types of errors on top of already existing errors that your triage team has been investigating?
Maybe on day 1 of your launch, you don’t see customer complaints, but in reality your systems are hurting or bleeding slowly. Yes, a network operations team could be checking the high-level health of the systems through a list of dashboards, but what are we software developers going to do to introduce a new level of detection? This all starts from the enterprise architecture and frameworks you build. Let’s assume you have a very robust logging mechanism and that this mechanism allows you to log the happy path. Let’s also assume that you have very clean guidelines for error and exception handling and utilizing the logging framework where necessary.
Now that you have all of the above in place, at the beginning of a project that implements business requirements within the existing framework, you have the ability to cleanly define the top 20 cases that measure the success of the new code/features. Each developer can use this top-20 list as a guideline while developing the code and logging happy/negative cases. Let’s say your code is now in production and you are scanning through the logs manually to detect the top-20 cases. Is this efficient? Are you supposed to do this manually on a daily basis?
My recommendation is that you develop a lightweight solution that will be able to automatically do the following for you:
  • Scan the logs on a daily/hourly basis, produce the count for each of the top-20 scenarios, and display the results in a table on an internal dashboard website.
  • Detect if the number of errors in any category increases by more than X% (a daily comparison of errors per Y units of work, where a unit of work is defined in some way that is tied to the traffic on your website).
  • Detect new types of errors and exceptions as they start happening, so that the team can manually assess the situation, add each new type of error to the top-20 list, and start tracking it on a daily basis.
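To make the three points above more concrete, here is a minimal sketch in Python. It is only an illustration under assumptions of mine: the log line format (`<timestamp> <LEVEL> <ERROR_CODE> <message>`), the error code names, and all function names are hypothetical, and your logging framework will almost certainly look different.

```python
import re
from collections import Counter

# Assumed (hypothetical) log format: "<timestamp> <LEVEL> <ERROR_CODE> <message>"
LINE_RE = re.compile(r"^\S+\s+(?P<level>[A-Z]+)\s+(?P<code>[A-Z_]+)\b")


def count_errors(lines):
    """Count ERROR-level log lines per error code."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group("level") == "ERROR":
            counts[match.group("code")] += 1
    return counts


def error_rates(counts, units_of_work):
    """Normalize counts to errors per unit of work (e.g. per request)."""
    if units_of_work <= 0:
        raise ValueError("units_of_work must be positive")
    return {code: n / units_of_work for code, n in counts.items()}


def find_regressions(baseline, current, tracked, threshold_pct):
    """Return tracked codes whose rate rose by more than threshold_pct."""
    flagged = {}
    for code in tracked:
        before = baseline.get(code, 0.0)
        after = current.get(code, 0.0)
        if before == 0.0:
            # No baseline to compare against: any occurrence is flagged.
            if after > 0.0:
                flagged[code] = float("inf")
            continue
        change_pct = (after - before) / before * 100.0
        if change_pct > threshold_pct:
            flagged[code] = change_pct
    return flagged


def find_new_error_types(current_counts, tracked):
    """Return error codes seen in the logs that are not on the tracked list."""
    return sorted(set(current_counts) - set(tracked))


if __name__ == "__main__":
    tracked = ["PAYMENT_TIMEOUT", "CART_EMPTY"]  # stand-in for the top-20 list
    yesterday = [
        "2024-01-01T10:00:00 ERROR PAYMENT_TIMEOUT gateway slow",
        "2024-01-01T10:01:00 INFO ORDER_PLACED ok",
        "2024-01-01T10:02:00 ERROR PAYMENT_TIMEOUT gateway slow",
    ]
    today = yesterday + [
        "2024-01-02T09:00:00 ERROR PAYMENT_TIMEOUT gateway slow",
        "2024-01-02T09:05:00 ERROR INVENTORY_MISMATCH sku missing",
    ]
    base = error_rates(count_errors(yesterday), units_of_work=100)
    curr_counts = count_errors(today)
    curr = error_rates(curr_counts, units_of_work=100)
    print(find_regressions(base, curr, tracked, threshold_pct=20))
    print(find_new_error_types(curr_counts, tracked))
```

Dividing by the units of work before comparing is what keeps a busy traffic day from looking like a regression. The counts themselves are what you would show in the dashboard table.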
If you have all of this automated, then there is no manual work needed when you launch something to production. You will be able to tell if your new code is hurting the numbers in the existing top-20 categories, and you will also be able to tell if you started introducing new types of errors that hurt the revenue of your company. Let’s assume that your production deployment involves deploying to a smaller/secondary data center first and then later to the rest of your data centers. Then this type of mechanism can help you decide whether to continue deploying to the rest of the data centers after deploying to that smaller data center.
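The go/no-go decision after the first data center could be as simple as the sketch below. Again, this is an illustration and not a prescription: the `CanaryReport` type, the `should_continue_rollout` function, and the default of tolerating zero new error types are all assumptions of mine.

```python
from dataclasses import dataclass, field


@dataclass
class CanaryReport:
    """Findings from the first (smaller) data center. Hypothetical structure."""
    regressions: dict = field(default_factory=dict)      # error code -> % increase
    new_error_types: list = field(default_factory=list)  # codes not on the top-20 list


def should_continue_rollout(report, max_new_types=0):
    """Return (decision, reasons); decision is True when nothing blocks the rollout."""
    reasons = []
    for code, pct in sorted(report.regressions.items()):
        reasons.append(f"{code} rate up {pct:.0f}%")
    if len(report.new_error_types) > max_new_types:
        reasons.append("new error types: " + ", ".join(report.new_error_types))
    return (not reasons, reasons)


if __name__ == "__main__":
    report = CanaryReport(
        regressions={"PAYMENT_TIMEOUT": 50.0},
        new_error_types=["INVENTORY_MISMATCH"],
    )
    go, why = should_continue_rollout(report)
    print("continue rollout" if go else "hold rollout", why)
```

Returning the reasons along with the decision keeps a human in the loop: the team sees exactly which category or new error type blocked the rollout and can assess it manually.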
These are all simple concepts. You can spend minimal effort building it yourself, or you may decide to buy a solution. The important thing is to always take the “keep it simple” approach in decision making.

Conclusion:

Start by tracking the top-20 errors on a daily/hourly basis and use the percent of change as the gauge for the success of your code being pushed to production environments. Detect the newly introduced low-level engineering errors in production on time to gauge the success of your launch. Don’t over-design this! Keep it simple!
Almir Mustafic
