Skip to main content

Detect errors on time to gauge the success of your launch — Simple concepts, huge benefits!

When you develop your applications, do you put thought into a proper logging mechanism? In a perfect world, there should be no errors in the logs, but we all know that’s not the case in reality. Then how do you know that the project you just launched to a production environment is not introducing new errors or new types of errors on top of already existing errors that your triage team has been investigating?
Maybe on day 1 of your launch, you don’t see customer complaints, but in reality your systems are hurting or bleeding slowly. Yes, a network operations team could be checking the high-level health of the systems through a list of dashboards, but what are we software developers going to do to introduce a new level of detection? This all starts from the enterprise architecture and frameworks you build. Let’s assume you have a very robust logging mechanism and that this mechanism allows you to log the happy path. Let’s also assume that you have very clean guidelines for error and exception handling and utilizing the logging framework where necessary.
Now that you have all of the above in place, at the beginning of your project that is implementing business requirements within the existing framework, you have ability to cleanly define the top 20 cases to measure the success of the new code/features. Each developer can use this top-20 list as a guideline while developing the code and logging happy/negative cases. Let’s say your code is now in production and you are scanning through the logs manually and detecting the top-20 cases. Is this efficient? Are you supposed to do this on daily basis manually?
My recommendation is that you develop a lightweight solution that will be able to automatically do the following for you:
  • * Scan the logs on daily/hourly basis and produce the count of the top-20 scenarios and display the results in a table on some internal dashboard website
  • * Have ability to detect if the number of errors in each category * increases by more than X% (daily comparison of errors per Y units of work and units of work could be somehow defined and tied to the traffic on your website).
  • * Have ability to detect if new types of errors and exceptions that start happening so that the team can manually assess the situation and then add each new type of error to the top-20 list and start tracking it on daily basis.
If you have all of this automated, then there is no manual work needed when you launch something to production. You will be able to tell if your new code is hurting the numbers on existing top-20 categories and you will also be able to tell if you started introducing new types of errors that hurt the revenue of your company. Let’s assume that your production deployment involves deploying to a smaller/secondary data center first and then later to the rest of your data centers. Then this type of mechanisms can help you decide whether you continue deploying to the rest of data centers after deploying to that smaller data center.
These are all simple concepts. You can spend minimal efforts in building it yourself or maybe decide to buy a solution. The importang thing is to always take the “keep it simple” approach in decision making.

Conclusion:

Start by tracking top-20 errors on daily/hourly basis and use the percent of change as the gauge for the success of your code being pushed to production environments. Detect the newly introduced low-level engineering errors in production on time to gauge the success of your launch. Don’t over-design this! Keep it simple!
Almir Mustafic

Comments

Popular posts from this blog

Brand New programming language and one solution OR …

Brand New programming language and one solution OR Two existing programming languages, one solution for EACH? I understand that there is no right or wrong. It all depends on your software architecture, team structure, team skills and other factors, but I still want to explain the scenario as it may look familiar to some. Let me explain. Let’s assume that you have microservices and common libraries in two major programming languages. You have some teams who are experts in one and some teams experts in the other programming language. Now you need to come up with a solution for a scenario that all teams will need to leverage. Let’s assume that your cloud platform has an off-the-shelf approach for this but it is supported by a 3rd programming language that your teams do not have much experience in. What is the right thing for your organization and not just from the technical point of view? A) Do you embrace what your cloud platform gives you off the shelf and implement thi...

10,000 foot level view in technology

10,000 foot level view in technology: How useful is it? What can be done at this level? If the 30,000 foot level is the CTO level, then consider the 10,000 foot level as the level that software engineering managers and directors operate at. To achieve the success at 10,000 foot level, you as a software engineering manager need to dive deep into technical details, help the team lay the foundation from BOTH organizational and technical side. It is the little moves that get the team to this level. Once the team’s applications and results at that level, then you as a manager have ability to perform the high level analysis and troubleshooting without necessarily being in code on daily or weekly basis. Therefore, your ability to troubleshoot technical issues at the 10,000 foot level is a testament to the great work of your team. Go TEAM !! Almir Mustafic

Programming languages to teach students in high-school and university

Python-like or C-like as the language to introduce programming to students in high school and university? The question is: Do you introduce programming concepts to high-school/university students using languages that handle memory and other things for you or do you start introducing all of these concepts in languages like C that require you to understand all aspects. I will tell you what worked for me. I was introduced to programming in grade 10 using Basic programming language. There was a version called Better Basic and also Quick Basic. Then in grade 11 we learned procedural programming in a programming language called Turing (not Turing machine but a Pascal-like language developed by University of Toronto for teaching purposes). Then a year later, I started getting interested in C and C++. As you can see, I eased into the languages that introduced me to NULL exceptions and memory leaks :) With this approach I was not overwhelmed and this set up the foundation for a fun journey ...