Test using OpenTelemetry traces in ASP.NET

Taking advantage of our OpenTelemetry tracing to easily test behavior that is otherwise very hard to observe

A man searching through binoculars, surrounded by books

Traces and other telemetry exist to make your application more observable. We normally think about that in terms of production: it comes up when debugging, investigating performance issues, or responding to incidents. But observability matters during development, too. If your application is already instrumented for tracing, then collecting those traces during tests can make certain behavior dramatically easier to verify.

This came out of work I did for Letterbook. Specifically, there are a lot of instances where the immediate action taken by a user should trigger side effects out of band with the request that prompted it. For instance, when you post something, that should send the post to your followers. And you can have an unbounded number of followers, so this is potentially a very large amount of work; way more than is feasible within the scope of a request-response cycle. Beyond the sheer volume of work, delivery failures, back-offs, and retries can potentially run for hours. So, Letterbook punts that work into a job queue and completes the response. But I still want to assert that it happened as part of my integration tests. This will be a common dynamic in Letterbook, so I want to be sure it's well tested. But with integration tests, only the API is readily observable, and in many cases that would be insufficient. Traces to the rescue!

Observing Side Effects

For the purposes of this blog post, consider a simplified hypothetical application. Let's say this application has a similar dynamic: there is some out-of-band side effect that is triggered by calling POST /api/perform/side-effect. Maybe there's another endpoint that would give some information about the side effect; in this case, GET /api/observe/side-effect/{id}. We can try to poll it and wait for our expected resource to appear.
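Before we get to the tests, here's a minimal sketch of what such an application might look like. Everything in it is an assumption made for illustration: the channel-based job queue, the worker, and the "side-effect" ActivitySource name, which the tests below will filter on. I've left out the GET /api/observe/side-effect/{id} endpoint for brevity; imagine it reports on completed jobs.

// A hypothetical Program.cs; every name here is made up for this example
using System.Diagnostics;
using System.Threading.Channels;

var builder = WebApplication.CreateBuilder(args);

// The app's normal OpenTelemetry setup: listen to the built-in ASP.NET Core
// instrumentation and our own custom source
builder.Services.AddOpenTelemetry()
    .WithTracing(tracing => tracing
        .AddAspNetCoreInstrumentation()
        .AddSource("side-effect"));

// A toy in-process job queue; a real app would likely use something durable
var channel = Channel.CreateUnbounded<(Guid Id, ActivityContext Parent)>();
builder.Services.AddSingleton(channel.Reader);
builder.Services.AddSingleton(channel.Writer);
builder.Services.AddHostedService<SideEffectWorker>();

var app = builder.Build();

app.MapPost("/api/perform/side-effect",
    async (SideEffectRequest body, ChannelWriter<(Guid Id, ActivityContext Parent)> queue) =>
{
    // Capture the current trace context, so the worker's span joins this trace
    var parent = Activity.Current?.Context ?? default;
    await queue.WriteAsync((body.Id, parent));
    return Results.Accepted();
});

app.Run();

public record SideEffectRequest(Guid Id);

// Background worker that performs the side effect, out of band with the request
public class SideEffectWorker : BackgroundService
{
    private static readonly ActivitySource Source = new("side-effect");
    private readonly ChannelReader<(Guid Id, ActivityContext Parent)> _queue;

    public SideEffectWorker(ChannelReader<(Guid Id, ActivityContext Parent)> queue)
        => _queue = queue;

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        await foreach (var job in _queue.ReadAllAsync(stoppingToken))
        {
            // Parent this span to the originating request's trace
            using var activity = Source.StartActivity("deliver", ActivityKind.Internal, job.Parent);
            // ... do the actual work, e.g. fan out messages to followers ...
        }
    }
}

With that picture in mind, here's a hypothetical test class for our hypothetical scenario: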

// XUnit test class, but the basic idea should work for any test runner
public class SomeTest : IClassFixture<HostFixture>
{
    private readonly HostFixture _host;
    private readonly HttpClient _client;

    public SomeTest(HostFixture host)
    {
        _host = host;
        _client = _host.CreateClient();
    }

    [Fact]
    public async Task PollingSideEffectTest()
    {
        var id = Guid.NewGuid();
        var payload = JsonContent.Create(new { id });
        await _client.PostAsync("/api/perform/side-effect", payload);

        var tries = 5;
        while (tries > 0)
        {
            var response = await _client
              .GetAsync($"/api/observe/side-effect/{id}");
            if (response.IsSuccessStatusCode)
                break;
            await Task.Delay(200);
            tries--;
        }

        Assert.True(tries > 0, "Side effect was not observed");
    }
}

And then the hypothetical HostFixture, which provides access and manages the lifecycle of our system under test:

public class HostFixture : WebApplicationFactory<Program>
{
    public HostFixture()
    {

    }

    protected override void ConfigureWebHost(IWebHostBuilder builder)
    {
        builder.ConfigureServices(services => {
          // customize services in DI to provide test doubles or other
          // changes, as necessary
        });

        base.ConfigureWebHost(builder);
    }
}

That probably works, assuming such an endpoint actually exists. But what if it doesn't, like in our original scenario of sending potentially thousands of messages? It's not like Letterbook is going to provide anything like a delivery receipt. That wouldn't be at all feasible; that's what telemetry is for. So let's use our telemetry.

Letterbook is already instrumented for tracing with OpenTelemetry. We'll assume the same is true for our hypothetical application. So, we're already producing the spans we'll need; the next step is to collect them. OpenTelemetry has a composable design, and the component that sends spans wherever you want to use them is the exporter. We'll add an in-memory exporter so the spans are easily accessible to our tests:

 public class HostFixture : WebApplicationFactory<Program>
 {
+    private readonly BlockingCollection<Activity> _spans = new();
+    // The consuming enumerable waits for new spans, instead of ending at a snapshot
+    public IAsyncEnumerable<Activity> Spans => _spans.GetConsumingEnumerable().ToAsyncEnumerable();
+
     public HostFixture()
     {

     }

     protected override void ConfigureWebHost(IWebHostBuilder builder)
     {
         builder.ConfigureServices(services => {
-
+            services.AddOpenTelemetry()
+                .WithTracing(tracer =>
+                    {
+                        // SpanCollection is a thin adapter, defined below
+                        tracer.AddInMemoryExporter(new SpanCollection(_spans));
+                    });
         });
 
         base.ConfigureWebHost(builder);
     }
 }
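One wrinkle worth calling out: as far as I can tell, AddInMemoryExporter expects an ICollection<Activity>, which BlockingCollection<Activity> doesn't implement. The diff above routes through SpanCollection to bridge that gap; it's a small adapter (and a name I made up), sketched here:

using System.Collections;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Linq;

// A thin ICollection<Activity> wrapper over the BlockingCollection. The
// exporter only ever adds items, so the members it won't use just throw.
public class SpanCollection : ICollection<Activity>
{
    private readonly BlockingCollection<Activity> _inner;
    public SpanCollection(BlockingCollection<Activity> inner) => _inner = inner;

    public void Add(Activity item) => _inner.Add(item);
    public int Count => _inner.Count;
    public bool IsReadOnly => false;
    public bool Contains(Activity item) => _inner.ToArray().Contains(item);
    public void CopyTo(Activity[] array, int arrayIndex) => _inner.CopyTo(array, arrayIndex);
    public IEnumerator<Activity> GetEnumerator() => ((IEnumerable<Activity>)_inner).GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    public void Clear() => throw new NotSupportedException();
    public bool Remove(Activity item) => throw new NotSupportedException();
}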

I hope the fixture changes are mostly self explanatory, but there's one important feature here to mention. Notice that the Spans property wraps the BlockingCollection's consuming enumerable in an IAsyncEnumerable. This allows us to easily enumerate new spans as they arrive. If this were a regular enumerable over a snapshot, the enumeration would reach the end of the list and then exit. This way, we can await new arrivals without any fuss.
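To make that concrete, enumerating Spans behaves more like a subscription than a list; a quick sketch:

// This loop doesn't exit when it catches up with the spans collected so far;
// it pauses and resumes each time the application exports a new span
await foreach (var span in _host.Spans)
{
    Console.WriteLine($"{span.Source.Name}: {span.DisplayName}");
}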

Then, we can query for the spans we want, and make assertions on the result. Like so:

 public class SomeTest : IClassFixture<HostFixture>
 {
     private readonly HostFixture _host;
     private readonly HttpClient _client;
 
     public SomeTest(HostFixture host)
     {
         _host = host;
         _client = _host.CreateClient();
     }

     [Fact]
-    public async Task PollingSideEffectTest()
+    public async Task TraceSideEffectTest()
     {
         var id = Guid.NewGuid();
         var payload = JsonContent.Create(new { id });
         await _client.PostAsync("/api/perform/side-effect", payload);
 
-        var tries = 5;
-        while (tries > 0)
-        {
-            var response = await _client
-              .GetAsync($"/api/observe/side-effect/{id}");
-            if (response.IsSuccessStatusCode)
-                break;
-            await Task.Delay(200);
-            tries--;
-        }
-
-        Assert.True(tries > 0, "Side effect was not observed");
+        var cts = new CancellationTokenSource(1000);
+        var sideEffect = await _host.Spans
+          .FirstOrDefaultAsync(span => span.Source.Name == "side-effect", cts.Token);
+        Assert.True(sideEffect is not null, "Side effect was not observed");
+        Assert.True(sideEffect.Status != ActivityStatusCode.Error, "Side effect encountered an error");
     }
 }

That's much better. No awkward polling. No extra calls to the system under test that aren't actually part of the test action. In fact, that second endpoint doesn't even need to exist. And it's easier to make assertions, too. We can succinctly assert that the side effect occurred and didn't error. If we need to make more detailed assertions, we have the entire span, and even the entire trace at our disposal. We can add custom instrumentation to expose important details and use those in assertions as well.
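For example, suppose the hypothetical worker recorded how many messages it delivered with activity?.SetTag("deliveries.count", count); the tag name is made up for this illustration. The test could then read that detail back off the captured Activity:

// GetTagItem returns the boxed value the worker recorded, or null if the tag is absent
var delivered = sideEffect.GetTagItem("deliveries.count");
// Assuming the test arranged exactly 3 followers to deliver to
Assert.Equal(3, delivered);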

These changes depend on some NuGet packages, so add these to your project if you don't have them already:

System.Linq.Async
OpenTelemetry.Exporter.InMemory

Isolating Tests

This is all great so far, but thanks to the ClassFixture feature, that web application actually persists across every test in this class. Which means the in-memory span collection also persists across those tests, and we can expect it to contain spans from tests other than the one we're evaluating. That's not great.

But, there's a straightforward solution to that. The whole theory of traces is that they correlate back to their triggering event and represent all of the actions which flowed from that. That correlation is most of what makes them traces, as opposed to logs. Which means we can isolate the spans from a single trace, and only consider those in our assertions. To do that, we need to know what the trace id is for our test. And the easiest way to do that is to set one ourselves. We'll set a trace id on the initial request from the client, and that id (and related context) will propagate throughout our application, so all the related spans will refer to the same top level trace. This method will allow us to set a known trace id on our request:

public static ActivityTraceId TraceRequest(HttpContent request)
{
    var traceId = ActivityTraceId.CreateRandom();
    var activityContext = new ActivityContext(traceId, ActivitySpanId.CreateRandom(), ActivityTraceFlags.Recorded, traceState: null);
    var propagationContext = new PropagationContext(activityContext, default);
    var carrier = request.Headers;
    var propagator = new TraceContextPropagator();

    propagator.Inject(propagationContext, carrier, SetHeaders);

    return traceId;
}

// The setter delegate the propagator uses to write each header onto the carrier
private static void SetHeaders(HttpContentHeaders carrier, string key, string value)
    => carrier.Add(key, value);

Under the hood, this is standard W3C Trace Context propagation: the propagator writes a traceparent header carrying our chosen trace ID, and the server-side instrumentation picks it up and continues the trace. And then a small update to our test:

     [Fact]
     public async Task TraceSideEffectTest()
     {
         var id = Guid.NewGuid();
         var payload = JsonContent.Create(new { id });
+        var traceId = TraceRequest(payload);
         await _client.PostAsync("/api/perform/side-effect", payload);
 
         var cts = new CancellationTokenSource(1000);
         var sideEffect = await _host.Spans
+          .Where(span => span.TraceId == traceId)
           .FirstOrDefaultAsync(span => span.Source.Name == "side-effect", cts.Token);
         Assert.True(sideEffect is not null, "Side effect was not observed");
         Assert.True(sideEffect.Status != ActivityStatusCode.Error, "Side effect encountered an error");
     }

Takeaway

And there we go! We're taking advantage of our OpenTelemetry tracing to easily test behavior that is otherwise very hard to observe.

The kind of indirect and after-the-fact effects that I'm concerned about are pretty common, and they're often very hard to test. So hard, in fact, that they routinely spawn entire manual QA organizations and protracted regression testing procedures. And clearly, it doesn't have to be that way. Over the course of developing Letterbook, I've spent less than 10 hours, total, on adding and configuring telemetry. Maybe less than 5. This specific enhancement to the integration tests took about 30 minutes, and most of that was looking up examples for context propagation so I could predict and isolate the correct trace ID for a test. And now Letterbook will have reliable, fast, maintainable, and automated tests of its most critical features.

Many times in the past, I've been blocked from incorporating distributed tracing into applications on the grounds that it would take too long or be too complicated. This experience directly contradicts that. But more to the point: too long and complicated for what? Without this capability, I would have had to set up some convoluted system of virtual dependencies in order to make the relevant behavior visible to the tests. Or, failing that, commit to repeated and extensive manual testing, which would almost certainly kill the project. Even if setting up tracing were a long or complicated process, it would still easily be worth doing. The confidence and capabilities it creates are unbelievably valuable.


Cover photo by Andrea Piacquadio