Course Overview
The One Day Hack for Monitoring focuses on the observability of cloud-native workloads running in Microsoft Azure. The main goal of this hackathon is to prepare IT professionals of all backgrounds to monitor cloud application workloads, introduce best practices, and work with cutting-edge monitoring technologies. Attendees get hands-on experience with Azure Monitor and Container Insights. They work in teams to complete a set of gated challenges that build their knowledge of gathering and understanding metrics, creating alerts, and automating tasks that promptly act on potential issues in modern cloud applications. They will do that by using Azure Monitor, Container Insights, Application Insights, Grafana, and Prometheus.
Course Content
The attendees will face three connected challenges, each building on the previous one. They will be given an Azure Kubernetes Service (AKS) cluster running a demo application that consists of several microservices developed using Node.js. The challenges are briefly described below.
Challenge 1: Hello, is anybody home?
The introduction to monitoring with Azure Monitor starts with the team creating a sustainable monitoring solution that yields meaningful insights. They will have access to the AKS cluster with the application up and running.
The team will have to verify that the cluster is behaving normally. They will do this by creating a monitoring dashboard that shows the most critical metrics for the containers, the applications running in those containers, the underlying virtual machines, and the Kubernetes API.
Challenge 2: You should dig deeper
The team will be introduced to Azure Container Insights.
They will need to show some system-defined, log-based metrics and then dig deeper by creating custom, user-defined log-based metrics, which they will later add to the monitoring dashboard from the previous challenge.
The team will also need to implement a second monitoring solution using Prometheus, with Grafana for visualization.
Challenge 3: Where did all this traffic come from?
Everything should be up and running correctly. However, the team will be presented with another app that they need to deploy to their cluster, which is why they should check the health of their solution and optimize it. To achieve that, they will have to implement resource limits and add alerts for key metrics to improve their cluster's observability.
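As a rough illustration of the resource-limits part of this challenge, the sketch below writes a container's requests and limits as a Python dict that mirrors the `resources` field of a Kubernetes container spec, then checks that each request does not exceed its limit. The container name, the values, and the helper functions are hypothetical and chosen only for illustration; they do not come from the hack's demo application.

```python
# Hypothetical container spec fragment; the name and values are assumptions
# for illustration and should be tuned from observed usage.
container = {
    "name": "demo-service",
    "resources": {
        "requests": {"cpu": "250m", "memory": "128Mi"},
        "limits": {"cpu": "500m", "memory": "256Mi"},
    },
}

# Binary suffixes used by Kubernetes memory quantities (subset for this sketch).
_MEMORY_UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def cpu_to_cores(quantity: str) -> float:
    """Convert a CPU quantity such as '250m' (millicores) or '1' to cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def memory_to_bytes(quantity: str) -> int:
    """Convert a memory quantity such as '128Mi' (or plain bytes) to bytes."""
    for suffix, factor in _MEMORY_UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


def requests_within_limits(resources: dict) -> bool:
    """Return True when both CPU and memory requests are at or below limits."""
    req, lim = resources["requests"], resources["limits"]
    return (
        cpu_to_cores(req["cpu"]) <= cpu_to_cores(lim["cpu"])
        and memory_to_bytes(req["memory"]) <= memory_to_bytes(lim["memory"])
    )


if __name__ == "__main__":
    print(requests_within_limits(container["resources"]))
```

A check like this could run before a manifest is applied, so that a request larger than its limit is caught early rather than at deployment time.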