Observability with ASP.NET Core
As full stack devs building web applications, we’ve all struggled to find and fix that one bad commit which brought our production instances to their knees. It could be a sudden CPU or memory spike, or an underperforming SQL query.
No matter how robust a CI/CD pipeline we have, bad code can inevitably creep its way into production.
For these scenarios, wouldn’t it be great if we had our own Jon Snow on Night’s Watch duty?
Well, Jon Snow might know nothing 😜
But this is where the observability aspect of our software system comes into play.
Observability lets us understand a system from the outside, by letting us ask questions about that system without knowing its inner workings. It can help us answer all of our “Why is this happening?” questions.
The Observability Stack
For any software system to be observable, we need at least 3 components:
- Data Production
- Data Collection and Storage
- Data Visualization and Actioning
Meet Open Telemetry
OpenTelemetry, aka OTel, is a software framework that lets us produce, collect, manage, and export various pieces of telemetry information like metrics, logs, and traces.
With OpenTelemetry, we are in charge of the information being generated rather than being tied to a vendor-specific format.
What this means is that we have an open standard for instrumenting the collection of various data points throughout our software system.
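This open standard also extends to our own domain metrics: on .NET, OTel can pick up instruments created via the built-in System.Diagnostics.Metrics API. Here is a minimal sketch (the meter and counter names are hypothetical, not from the project):

```csharp
using System.Diagnostics.Metrics;

// A Meter groups related instruments; OTel subscribes to it by name.
var meter = new Meter("Ticketer.WebApp");

// A counter we could increment every time a reservation is made.
var reservations = meter.CreateCounter<long>("ticketer.reservations.count");

// Somewhere inside the reservation handler:
reservations.Add(1, new KeyValuePair<string, object?>("event.name", "Critical Event"));
```

For these measurements to actually be exported, the meter name would also need to be registered (via .AddMeter("Ticketer.WebApp")) in the OTel metrics pipeline we set up later in this write-up.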
This raw data can then be sent to what’s called an “observability backend”, which is in charge of storing and processing it to produce meaningful metrics.
Meet Prometheus
Prometheus is an open-source monitoring system that collects telemetry data from our software system and helps us monitor it.
At its core, Prometheus has the following pieces:
- Data collection agents, aka jobs, which periodically scrape raw metrics data from our application.
- A time series database for storing the scraped data.
- An HTTP server for exposing this data.
Prometheus has its own query language called PromQL for querying the metric data to extract useful insights.
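For example, the per-second rate of incoming HTTP requests over the last five minutes could be computed with a query along these lines (the metric name assumes the _count series that accompanies the http_server_duration_milliseconds histogram we query later in this write-up):

```promql
rate(http_server_duration_milliseconds_count[5m])
```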
As powerful as the PromQL syntax is, to be honest it’s not very friendly for less tech-savvy folks.
Moreover, we can’t leverage the full power of these metrics unless we can visualize them.
Meet Grafana
This is the final piece in building an observable stack.
Grafana is an open-source data visualization framework that lets us build powerful dashboards to query data from various data sources like Prometheus.
Apart from data visualization, it is also a highly configurable monitoring and alerting system.
A practical Example
Let’s take a practical example to better understand the observability ideas.
We have a simple event management system that lets us create events with different ticket pricing tiers and we also allow users to make reservations for our events.
This is what our REST APIs look like.
At a high level, we start by creating an “Event”, and then we create different pricing tiers by creating “Tickets” corresponding to the event.
Finally, we can make “Reservations” for an event based on a particular ticket.
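Concretely, the flow above maps to three endpoints (paths taken from the API calls we make later in this write-up):

```
POST /api/events                                   # create an event
POST /api/events/{eventId}/tickets                 # add a pricing tier (ticket) to an event
POST /api/reservations?ticketId={id}&eventId={id}  # reserve a ticket for an event
```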
To make our app observable, we start by adding the necessary NuGet packages for OpenTelemetry.
<PackageReference Include="OpenTelemetry.Exporter.OpenTelemetryProtocol" Version="1.6.0" />
<PackageReference Include="OpenTelemetry.Exporter.Prometheus.AspNetCore" Version="1.6.0-rc.1" />
<PackageReference Include="OpenTelemetry.Extensions.Hosting" Version="1.6.0" />
<PackageReference Include="OpenTelemetry.Instrumentation.AspNetCore" Version="1.5.1-beta.1" />
<PackageReference Include="OpenTelemetry.Instrumentation.Http" Version="1.5.1-beta.1" />
<PackageReference Include="OpenTelemetry.Instrumentation.Runtime" Version="1.5.1" />
Apart from the core OTLP exporter and the hosting extensions, we also add instrumentation packages for ASP.NET Core, HTTP clients, and the .NET runtime, which provide widely used metrics for web applications like the number of HTTP requests, aggregate metrics on response time, server uptime, etc.
We also add the Prometheus exporter so that our metric data can be exposed over an HTTP endpoint in a Prometheus-friendly format.
Next, we add the relevant services to ASP.NET Core’s DI container and map the /metrics endpoint that the Prometheus scraper expects.
Here is how our Program.cs could be modified to achieve this:
// Add relevant services for OTel to function
services.AddOpenTelemetry()
    .ConfigureResource(resource => resource.AddService(serviceName: environment.ApplicationName))
    .WithMetrics(metrics =>
        metrics
            .AddAspNetCoreInstrumentation() // ASP.NET Core metrics - HTTP requests, durations, etc.
            .AddRuntimeInstrumentation()    // .NET runtime metrics - GC, memory pressure, thread pool, etc.
            .AddPrometheusExporter()        // expose metrics in a Prometheus-friendly format
    );

// Map the /metrics endpoint
app.UseOpenTelemetryPrometheusScrapingEndpoint();
If we run our web app and hit the /metrics endpoint, we will be greeted with the following output:
# TYPE process_runtime_dotnet_gc_collections_count_total counter
# HELP process_runtime_dotnet_gc_collections_count_total Number of garbage collections that have occurred since process start.
process_runtime_dotnet_gc_collections_count_total{generation="gen2"} 1 1698476105878
process_runtime_dotnet_gc_collections_count_total{generation="gen1"} 2 1698476105878
process_runtime_dotnet_gc_collections_count_total{generation="gen0"} 3 1698476105878
...
To take away the complexity of installing and managing Prometheus and Grafana, we can use Podman, which will help us create development containers.
We also need to install the podman-compose package to ease the orchestration of our containers.
With that out of the way, we start by containerizing our ASP.NET Core application. Based on the standard Dockerfile template for ASP.NET Core applications, we could do something like this:
FROM mcr.microsoft.com/dotnet/sdk:7.0 AS build
WORKDIR /source
COPY Ticketer.sln .
COPY Ticketer.Business/Ticketer.Business.csproj ./Ticketer.Business/Ticketer.Business.csproj
COPY Ticketer.WebApp/Ticketer.WebApp.csproj ./Ticketer.WebApp/Ticketer.WebApp.csproj
RUN dotnet restore ./Ticketer.Business/Ticketer.Business.csproj
RUN dotnet restore ./Ticketer.WebApp/Ticketer.WebApp.csproj
COPY Ticketer.Business/. ./Ticketer.Business/
COPY Ticketer.WebApp/. ./Ticketer.WebApp/
WORKDIR /source/Ticketer.WebApp
RUN dotnet publish -c release /p:EnvironmentName=docker-compose-dev -o /app
FROM mcr.microsoft.com/dotnet/aspnet:7.0
WORKDIR /app
COPY --from=build /app ./
ENV ASPNETCORE_ENVIRONMENT=DockerDev
ENV ASPNETCORE_URLS=http://+:5001
EXPOSE 5001
ENTRYPOINT ["dotnet", "Ticketer.WebApp.dll"]
Next, we write a prometheus.yml configuration file to instruct Prometheus where it should obtain the metrics from:
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "ticketer-web-app"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    scrape_interval: 30s # poll frequently for a more responsive demo
    static_configs:
      - targets: ["ticketer-web-app-service:5001"]
This simple config file has just one job, which scrapes from the ticketer-web-app-service:5001/metrics endpoint.
In essence, this means our Prometheus instance hits the above endpoint every 30 seconds and gathers the telemetry information.
If you are wondering why we did not use localhost, it’s because in a Podman compose container set, once we configure our network, we can access our containers by their service names.
Now for the final piece: we must connect our Prometheus server to Grafana.
So we write a Grafana data source configuration:
apiVersion: 1
datasources:
  - name: TicketerPrometheus
    type: prometheus
    access: proxy
    url: http://ticketer-prometheus-service:9090
This is a very simple configuration that tells Grafana it has one Prometheus data source, at http://ticketer-prometheus-service:9090.
Now we write one final docker-compose.yml to orchestrate the creation and management of these containers:
networks:
  ticketer-network:
    name: ticketer-network
    driver: bridge

services:
  ticketer-web-app-service:
    container_name: ticketer-web-app
    build: .
    restart: always
    networks:
      - ticketer-network
    ports:
      - 8080:5001
    depends_on:
      - ticketer-db-service

  ticketer-prometheus-service:
    container_name: ticketer-prometheus
    image: prom/prometheus:latest
    networks:
      - ticketer-network
    ports:
      - 9090:9090
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    depends_on:
      - ticketer-web-app-service

  ticketer-grafana-service:
    container_name: ticketer-grafana
    image: grafana/grafana:latest
    networks:
      - ticketer-network
    ports:
      - 3000:3000
    volumes:
      - ./grafana-provisioning:/etc/grafana/provisioning
    depends_on:
      - ticketer-prometheus-service

  ticketer-db-service:
    container_name: ticketer-db
    image: postgres:latest
    networks:
      - ticketer-network
    ports:
      - 5432:5432
    restart: always
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: develop
      POSTGRES_DB: Ticketer
    volumes:
      # the postgres image keeps its data under /var/lib/postgresql/data
      - ticketer-db-data:/var/lib/postgresql/data

volumes:
  ticketer-db-data:
We have 4 services, with one container each:
- ticketer-web-app-service: The service which hosts our application code.
- ticketer-db-service: The service which hosts our database.
- ticketer-prometheus-service: This service hosts our Prometheus instance, and we supply our Prometheus configuration through a volume mount.
- ticketer-grafana-service: This service hosts our Grafana instance, and we configure it similarly using volume mounts.
Finally, we have all these services bound to the same bridge network, ticketer-network.
With everything set up, let’s see it in action.
All you have to do is run podman compose up from the root of the repository and wait for these services to create their respective containers.
Then let’s make some API calls by hitting localhost:8080.
First, we create an event:
tarun@Taruns-MacBook-Pro ~ % curl -i \
-X POST \
-H "Content-Type: application/json" \
-d '{"name": "Critical Event", "scheduledAt":"2099-10-10T09:41:00Z" }' \
http://localhost:8080/api/events
HTTP/1.1 201 Created
Content-Type: application/json; charset=utf-8
Date: Sat, 28 Oct 2023 07:27:39 GMT
Server: Kestrel
Location:
Transfer-Encoding: chunked
{"id":"d9776c35-710f-48fa-a911-9a859c1a5bfc","createdAt":"2023-10-28T07:27:39.4959008Z","updatedAt":"2023-10-28T07:27:39.4959163Z","name":"Critical Event","time":"2099-10-10T09:41:00Z"}%
Then we create a ticket for the event:
tarun@Taruns-MacBook-Pro ~ % curl -i \
-X POST \
-H "Content-Type: application/json" \
-d '{"tier": "Ultra Premium", "baseFare":100, "tax":25, "discount": 0 }' \
http://localhost:8080/api/events/d9776c35-710f-48fa-a911-9a859c1a5bfc/tickets
HTTP/1.1 201 Created
Content-Type: application/json; charset=utf-8
Date: Sat, 28 Oct 2023 07:44:22 GMT
Server: Kestrel
Location:
Transfer-Encoding: chunked
{"id":"a44232ec-8afc-4175-8674-1cfb3fb15361","createdAt":"2023-10-28T07:44:23.2103948Z","updatedAt":"2023-10-28T07:44:23.2104188Z","eventId":"d9776c35-710f-48fa-a911-9a859c1a5bfc","tier":"Ultra Premium","baseFare":100,"tax":25,"discount":0,"fare":125}%
Finally, we make some reservations for the event:
curl --location --request POST 'http://localhost:8080/api/reservations?ticketId=a44232ec-8afc-4175-8674-1cfb3fb15361&eventId=d9776c35-710f-48fa-a911-9a859c1a5bfc'
Now that we have caused some activity on our web server, we can check how this is reflected in our observability stack.
First, we can visit the Prometheus server at http://localhost:9090.
We can query the http_server_duration_milliseconds_bucket metric to see the telemetry data collected for the requests that came in.
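Since this metric is a histogram, its buckets can also be rolled up into latency percentiles; for example, an approximate 95th percentile of server response time over the last five minutes:

```promql
histogram_quantile(0.95, sum(rate(http_server_duration_milliseconds_bucket[5m])) by (le))
```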
Similarly, we can head over to our Grafana instance to get much richer data visualization dashboards.
Head to http://localhost:3000, where we can use admin as the default username and password.
Navigate to Datasources -> TicketerPrometheus -> Explore Data, and you’ll be greeted with Grafana’s highly configurable explorer.
Change the data resolution time to 5 minutes, select the same metric, and hit Run Query.
You can see that it shows the information as a bar graph.
If you head to Grafana’s alerting module at http://localhost:3000/alerting/new/alerting, you’ll see a highly customizable alerting system 🚨
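As a starting point, an alert rule could be driven by a PromQL expression that fires whenever any 5xx responses show up, along these lines (the http_status_code label name is an assumption based on the semantic conventions used by these instrumentation package versions; check the labels in your own /metrics output):

```promql
sum(rate(http_server_duration_milliseconds_count{http_status_code=~"5.."}[5m])) > 0
```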
The observability stack covered here has a lot more configurability than I can possibly explain in a simple write-up.
All the code mentioned in this write-up can be found on my github repository here:
But I hope that you’ve gotten a good introduction that scratches the surface of what’s called observability.
Anyway, I thought I’d share something I found very interesting. That’s it for this story, thanks for making it this far 😄
Connect with me on LinkedIn and X for more such interesting reads.