
Azure Functions SRE on Azure DevOps, The First Cut

Justin Yoo

You may have heard of the term "Walking Skeleton" if you work with agile methodologies. Alistair Cockburn defines the term in his article:

A Walking Skeleton is a tiny implementation of the system that performs a small end-to-end function. It need not use the final architecture, but it should link together the main architectural components. The architecture and the functionality can then evolve in parallel.

This concept is very important from the DevOps or SRE (Site Reliability Engineering) point of view, because it lets us fail fast and fail often while building and testing the system, before going live. Through this process, we can also secure the system's stability and reliability.

Let me interpret the quote above in a different way. The idea of the Walking Skeleton is to build the system/application in a working condition first, even if parts of it are hard-coded. When the Walking Skeleton is ready, it has to run through the entire ALM (Application Lifecycle Management) process, which includes unit tests, integration tests and end-to-end tests in CI/CD pipelines. Once everything is OK in the pipelines, the Walking Skeleton gets more "Flesh" through "Continuous Improvement".

It sounds straightforward. In practice, however, it doesn't go well unless the other supporting parties have everything ready for us. For example, setting up CI/CD pipelines requires system access permissions, service principal impersonations, cloud resource access permissions, and so on. Even if our system/application is developed and ready, we can't deliver it without those non-functional requirements. Therefore, to minimise these hassles, all the DevOps/SRE related requirements have to be sorted out, together with the Walking Skeleton, at the very early stage of the delivery.

As stated above, the first step of bringing up the Walking Skeleton is to run all the different types of tests consistently in the CI/CD pipelines. This post shows how to build the Walking Skeleton of an Azure Functions app with all testing scenarios, set up CI/CD pipelines on Azure DevOps, and complete the first cut of the SRE requirements.

The sample code used in this post can be downloaded from this GitHub repository.

System High-level Architecture

In my previous post, we already developed an Azure Functions app and performed unit tests and integration tests against it. Here's the high-level architecture diagram of the Azure Functions app.

Writing Unit Tests and Integration Tests

I showed both unit tests and integration tests in my previous post. Unit testing uses the dependency injection feature from the Azure Functions library, and integration testing uses Mountebank to perform the tests. I'm not going to repeat them here. Let's move on to end-to-end (E2E) testing.

Writing End-to-End Tests

Generally speaking, E2E testing goes hand in hand with functional testing. Functional testing validates the acceptance criteria by manually running the application. If we capture those functional testing scenarios in code, they become part of E2E testing, and we can run them in the CI/CD pipelines. And if we mock the external API dependencies, the same scenarios can also become part of integration testing.
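For context, the endpoint all of these tests exercise is a simple health-check trigger. The sketch below is my simplified reconstruction, not the actual implementation from the previous post; assume the downstream health endpoint URL comes from an environment variable (HealthCheckUrl is a hypothetical name), which the integration tests point at the Mountebank imposter.

public class HealthCheckHttpTrigger
{
    private readonly HttpClient _http;

    public HealthCheckHttpTrigger(HttpClient http)
    {
        this._http = http;
    }

    [FunctionName("HealthCheckHttpTrigger")]
    public async Task<IActionResult> PingAsync(
        [HttpTrigger(AuthorizationLevel.Function, "get", Route = "ping")] HttpRequest req)
    {
        // Hypothetical setting: in integration tests this points to the Mountebank imposter.
        var healthCheckUrl = Environment.GetEnvironmentVariable("HealthCheckUrl");

        // Relays the downstream status code, so the tests can assert 200 or 500.
        using (var res = await this._http.GetAsync(healthCheckUrl))
        {
            return new StatusCodeResult((int)res.StatusCode);
        }
    }
}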

Now, here's the question: it sounds like we want to re-use the same code base for both integration testing and E2E testing without modifying it. How can we achieve this? Let's have a look at the code below. HealthCheckHttpTriggerTests uses LocalhostServerFixture for integration testing.

[TestClass]
public class HealthCheckHttpTriggerTests
{
    private const string CategoryIntegration = "Integration";

    private LocalhostServerFixture _fixture;

    [TestInitialize]
    public void Init()
    {
        this._fixture = new LocalhostServerFixture();
    }

    [TestMethod]
    [TestCategory(CategoryIntegration)]
    public async Task Given_Url_When_Invoked_Then_Trigger_Should_Return_Healthy()
    {
        // Arrange
        var uri = this._fixture.GetHealthCheckUrl();

        using (var http = new HttpClient())
        // Act
        using (var res = await http.GetAsync(uri))
        {
            // Assert
            res.StatusCode.Should().Be(HttpStatusCode.OK);
        }
    }

    [TestMethod]
    [TestCategory(CategoryIntegration)]
    public async Task Given_Url_When_Invoked_Then_Trigger_Should_Return_Unhealthy()
    {
        // Arrange
        var uri = this._fixture.GetHealthCheckUrl(HttpStatusCode.InternalServerError);

        using (var http = new HttpClient())
        // Act
        using (var res = await http.GetAsync(uri))
        {
            // Assert
            res.StatusCode.Should().Be(HttpStatusCode.InternalServerError);
        }
    }
}

Let's have a look at the LocalhostServerFixture class.

public class LocalhostServerFixture
{
    private readonly MountebankClient _client;

    public LocalhostServerFixture()
    {
        this._client = new MountebankClient();
    }

    public string GetHealthCheckUrl(HttpStatusCode statusCode = HttpStatusCode.OK)
    {
        this._client.DeleteImposter(8080);

        var imposter = this._client
                           .CreateHttpImposter(8080, statusCode.ToString());
        imposter.AddStub()
                .OnPathAndMethodEqual("/api/ping", Method.Get)
                .ReturnsStatus(statusCode);

        this._client.Submit(imposter);

        return "http://localhost:7071/api/ping";
    }
}

As you can see from the code above, it only works for the integration testing scenario. If we want to re-use this code for both integration and E2E testing, we need to refactor the LocalhostServerFixture class.

ServerFixture

I use a simple factory method pattern for this refactoring exercise. Let's create an abstract ServerFixture class. It declares a static factory method, CreateInstance(serverName), which creates a fixture instance based on the server name passed in.

public abstract class ServerFixture
{
    public static ServerFixture CreateInstance(string serverName)
    {
        var type = Type.GetType($"FunctionApp.Tests.Fixtures.{serverName}ServerFixture");
        var instance = (ServerFixture)Activator.CreateInstance(type);

        return instance;
    }

    public abstract string GetHealthCheckUrl(HttpStatusCode statusCode = HttpStatusCode.OK);
}
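For example, the factory method composes the type name by convention, so the fixture class name must follow the {serverName}ServerFixture pattern in the FunctionApp.Tests.Fixtures namespace, or Type.GetType() returns null:

// Resolves FunctionApp.Tests.Fixtures.LocalhostServerFixture.
var localhost = ServerFixture.CreateInstance("Localhost");

// Resolves FunctionApp.Tests.Fixtures.FunctionAppServerFixture.
var functionApp = ServerFixture.CreateInstance("FunctionApp");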

Let's refactor the existing LocalhostServerFixture class.

LocalhostServerFixture

LocalhostServerFixture now inherits ServerFixture. The existing GetHealthCheckUrl() method now has the override modifier.

public class LocalhostServerFixture : ServerFixture
{
    ...

    public override string GetHealthCheckUrl(HttpStatusCode statusCode = HttpStatusCode.OK)
    {
        ...
    }
}
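Putting it all together, the fully refactored class looks like the following, with the same Mountebank logic as before:

public class LocalhostServerFixture : ServerFixture
{
    private readonly MountebankClient _client;

    public LocalhostServerFixture()
    {
        this._client = new MountebankClient();
    }

    public override string GetHealthCheckUrl(HttpStatusCode statusCode = HttpStatusCode.OK)
    {
        // Resets the imposter on port 8080 and stubs the downstream health
        // endpoint to return the requested status code.
        this._client.DeleteImposter(8080);

        var imposter = this._client
                           .CreateHttpImposter(8080, statusCode.ToString());
        imposter.AddStub()
                .OnPathAndMethodEqual("/api/ping", Method.Get)
                .ReturnsStatus(statusCode);

        this._client.Submit(imposter);

        return "http://localhost:7071/api/ping";
    }
}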

We've just completed refactoring LocalhostServerFixture. It's time to create another fixture class for the E2E tests.

FunctionAppServerFixture

Let's create the FunctionAppServerFixture class, which inherits ServerFixture, and implement its GetHealthCheckUrl() method to simply return the endpoint URL. As you can see, the information needed to compose the endpoint URL comes from environment variables.

public class FunctionAppServerFixture : ServerFixture
{
    private const string FunctionAppNameKey = "FunctionAppName";
    private const string FunctionAuthKeyKey = "FunctionAuthKey";

    private readonly string _functionAppName;
    private readonly string _functionAuthKey;

    public FunctionAppServerFixture()
    {
        this._functionAppName = Environment.GetEnvironmentVariable(FunctionAppNameKey);
        this._functionAuthKey = Environment.GetEnvironmentVariable(FunctionAuthKeyKey);
    }

    public override string GetHealthCheckUrl(HttpStatusCode statusCode = HttpStatusCode.OK)
    {
        return $"https://{this._functionAppName}.azurewebsites.net/api/ping?code={this._functionAuthKey}";
    }
}
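One caveat: if those environment variables are missing, the fixture silently composes a broken URL and the tests fail with a confusing error. A small guard in the constructor makes the failure obvious. This is a defensive sketch, not part of the original code:

public FunctionAppServerFixture()
{
    // Fails fast with a clear message when the environment isn't set up.
    this._functionAppName = Environment.GetEnvironmentVariable(FunctionAppNameKey)
        ?? throw new InvalidOperationException($"Environment variable '{FunctionAppNameKey}' is not set.");
    this._functionAuthKey = Environment.GetEnvironmentVariable(FunctionAuthKeyKey)
        ?? throw new InvalidOperationException($"Environment variable '{FunctionAuthKeyKey}' is not set.");
}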

Now we've got FunctionAppServerFixture for E2E testing. Let's refactor the test code!

HealthCheckHttpTriggerTests

First of all, we need to modify the Init() method of the HealthCheckHttpTriggerTests class. It gets the server name from an environment variable, which determines whether to create LocalhostServerFixture for integration testing or FunctionAppServerFixture for E2E testing. In addition, we add another attribute, TestCategory("E2E"), to the test method used for E2E testing.

[TestClass]
public class HealthCheckHttpTriggerTests
{
    private const string CategoryIntegration = "Integration";
    private const string CategoryE2E = "E2E";
    private const string DefaultServerName = "Localhost";
    private const string ServerNameKey = "ServerName";

    private ServerFixture _fixture;

    [TestInitialize]
    public void Init()
    {
        // Gets the server name from the environment variable.
        var serverName = Environment.GetEnvironmentVariable(ServerNameKey);
        if (string.IsNullOrWhiteSpace(serverName))
        {
            serverName = DefaultServerName;
        }

        // Creates the fixture through the factory method pattern.
        this._fixture = ServerFixture.CreateInstance(serverName);
    }

    ...

    [TestMethod]
    [TestCategory(CategoryIntegration)]
    // Adds the test category for end-to-end testing.
    [TestCategory(CategoryE2E)]
    public async Task Given_Url_When_Invoked_Then_Trigger_Should_Return_Healthy()
    {
        ...
    }
}

Now, we've got the test code that is used for both integration tests and E2E tests. Let's run the tests in our local development environment.

Run Integration Tests

Following the instructions from my previous post, run the Mountebank server and the Azure Functions runtime locally, then execute the command below for the integration tests.

dotnet test [Test_Project_Name].csproj -c Release --filter:"TestCategory=Integration"

Our refactored integration tests work just fine, and here's the result.

Run End-to-End Tests

This time, we're running the E2E tests from our local machine. To do this, we assume that the Azure Functions app has already been deployed to Azure. Set up the environment variables as shown below. Don't worry about the key and the app name; they are not real.

set ServerName=FunctionApp
set FunctionAuthKey=uCQISIGoaMYz6d/6yR/Q2aw6PVdxLhzOh1gy9IjfoRbTN9OgmOFgTQ==
set FunctionAppName=fncapp-mountebank
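If you run the tests from a macOS or Linux shell instead, the equivalent would be:

export ServerName=FunctionApp
export FunctionAuthKey=uCQISIGoaMYz6d/6yR/Q2aw6PVdxLhzOh1gy9IjfoRbTN9OgmOFgTQ==
export FunctionAppName=fncapp-mountebank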

The following command runs the E2E tests.

dotnet test [Test_Project_Name].csproj -c Release --filter:"TestCategory=E2E"

As we implemented above, the E2E tests use FunctionAppServerFixture and here's the result.

Now, we've got the LocalhostServerFixture class refactored, and both integration tests and E2E tests successfully run on our local machine.

Compose Azure DevOps CI/CD Pipelines

Based on our successful local test runs, we're going to compose the Azure DevOps CI/CD pipelines as the last step of building the Walking Skeleton. Both unit tests and integration tests sit in the build stage, and the E2E tests sit in the release stage. The pipeline is written in YAML and uses the Azure DevOps Multi-Stage Pipelines feature. You can see the whole pipeline structure in the source code; I'm extracting some bits and pieces here for discussion.

Unit Tests

This is the extraction of the unit tests steps from build.yaml.

stages:
# Build Pipeline
- stage: Build
  jobs:
  - job: HostedVs2017
    ...
    variables:
    - name: UnitTestRunStatus
      value: ''
    steps:
    ...
    - task: CmdLine@2
      displayName: 'Unit Test Function App'
      inputs:
        script: 'dotnet test $(Build.SourcesDirectory)\test\FunctionApp.Tests\FunctionApp.Tests.csproj -c $(BuildConfiguration) --filter:"TestCategory!=Integration&TestCategory!=E2E" --logger:trx --results-directory:$(System.DefaultWorkingDirectory)\UnitTestResults /p:CollectCoverage=true /p:CoverletOutputFormat=cobertura /p:CoverletOutput=$(System.DefaultWorkingDirectory)\CoverageResults\coverage'
      env:
        ServerName: Localhost
      continueOnError: true
    - task: PowerShell@2
      displayName: 'Save Unit Test Run Status'
      inputs:
        targetType: Inline
        script: 'Write-Host "##vso[task.setvariable variable=UnitTestRunStatus]$(Agent.JobStatus)"'
    - task: PublishTestResults@2
      displayName: 'Publish Unit Test Results'
      inputs:
        testRunTitle: 'Unit Tests'
        testResultsFormat: VSTest
        testResultsFiles: '$(System.DefaultWorkingDirectory)/UnitTestResults/*.trx'
        mergeTestResults: true
    - task: PublishCodeCoverageResults@1
      displayName: 'Publish Code Coverage Results'
      inputs:
        codeCoverageTool: 'cobertura'
        summaryFileLocation: '$(System.DefaultWorkingDirectory)/CoverageResults/*.xml'
        failIfCoverageEmpty: false
    ...
  1. The Unit Test Function App task looks overwhelming, but overall it's not that different from the test command above; it just comes with a few more options.

    • --filter: This option excludes every test categorised as either Integration or E2E. In other words, with this filter, the task only runs unit tests.
    • --logger: This option has a value of trx, which exports the test results in the .trx format used by Visual Studio.
    • --results-directory: This option sets the output directory for the test results.
    • /p:CollectCoverage: This option enables the code coverage analysis.
    • /p:CoverletOutputFormat: This option has a value of cobertura, which defines the output format of the code coverage report.
    • /p:CoverletOutput: This option sets the output directory for the code coverage analysis result.

    In addition, this task sets the continueOnError attribute to true, which forces the pipeline to continue even if the task fails (i.e. a test fails).

  2. The Save Unit Test Run Status task stores the previous task's status, whether it's Succeeded, Failed or SucceededWithIssues, in the UnitTestRunStatus variable.
  3. The Publish Unit Test Results task uploads the test results. As trx is the test result format, this task should set VSTest as the testResultsFormat attribute.
  4. The Publish Code Coverage Results task uploads the code coverage analysis report. As it uses the cobertura format, the codeCoverageTool attribute gets cobertura as its value.

That's it for the unit test pipeline setup. Let's move on to the integration test steps in the pipeline.

Integration Tests

This is the extraction of the integration test steps from build.yaml.

stages:
# Build Pipeline
- stage: Build
  jobs:
  - job: HostedVs2017
    ...
    variables:
    - name: IntegrationTestRunStatus
      value: ''
    steps:
    ...
    - task: CmdLine@2
      displayName: 'Integration Test Function App'
      inputs:
        script: 'dotnet test $(Build.SourcesDirectory)\test\FunctionApp.Tests\FunctionApp.Tests.csproj -c $(BuildConfiguration) --filter:"TestCategory=Integration" --logger:trx --results-directory:$(System.DefaultWorkingDirectory)\IntegrationTestResults'
      env:
        ServerName: Localhost
      continueOnError: true
    - task: PowerShell@2
      displayName: 'Save Integration Test Run Status'
      inputs:
        targetType: Inline
        script: 'Write-Host "##vso[task.setvariable variable=IntegrationTestRunStatus]$(Agent.JobStatus)"'
    - task: PublishTestResults@2
      displayName: 'Publish Integration Test Results'
      inputs:
        testRunTitle: 'Integration Tests'
        testResultsFormat: VSTest
        testResultsFiles: '$(System.DefaultWorkingDirectory)/IntegrationTestResults/*.trx'
        mergeTestResults: true
    - task: PowerShell@2
      displayName: 'Cancel Pipeline on Test Run Failure'
      condition: and(succeeded(), or(eq(variables['UnitTestRunStatus'], 'Failed'), eq(variables['UnitTestRunStatus'], 'SucceededWithIssues'), eq(variables['IntegrationTestRunStatus'], 'Failed'), eq(variables['IntegrationTestRunStatus'], 'SucceededWithIssues')))
      inputs:
        targetType: Inline
        script: |
          Write-Host "##vso[task.setvariable variable=Agent.JobStatus]Failed"
          Write-Host "##vso[task.complete result=Failed]DONE"
    ...
  1. The Integration Test Function App task is the same as the one for unit tests, except for the code coverage analysis options. It also has the filter TestCategory=Integration so that this task only runs the integration test methods.
  2. The Save Integration Test Run Status task stores the integration test status in IntegrationTestRunStatus.
  3. The Publish Integration Test Results task uploads the test results to the pipeline.
  4. The Cancel Pipeline on Test Run Failure task is important. It checks the values of UnitTestRunStatus and IntegrationTestRunStatus. If both unit tests and integration tests were successful, it lets the pipeline continue so that all artifacts are generated for release. If either test run failed, it stops the pipeline and marks it as Failed.

Now we've got the integration test pipeline setup. The E2E tests are up next.

End-to-End Tests

E2E tests are performed after the application is deployed to Azure. Here are the steps for E2E testing, extracted from the pipeline. To use YAML for the release stage of the pipeline, the Multi-Stage Pipelines feature MUST be turned on. If you're interested in this feature, please have a look at my other post.

# Release Pipeline
- stage: Release
  jobs:
  - deployment: HostedVs2017
    ...
    variables:
    - name: TestRunStatus
      value: ''
    strategy:
      runOnce:
        deploy:
          steps:
          ...
          - task: CmdLine@2
            displayName: 'Run E2E Tests'
            inputs:
              script: 'dotnet vstest $(Pipeline.Workspace)\e2e\FunctionApp.Tests\FunctionApp.Tests.dll --testCaseFilter:TestCategory=E2E --logger:trx --resultsDirectory:$(Pipeline.Workspace)\TestResults'
            continueOnError: true
            env:
              ServerName: FunctionApp
              FunctionAuthKey: $(FunctionAuthKey)
              FunctionAppName: $(FunctionAppName)
          - task: PowerShell@2
            displayName: 'Save Test Run Status'
            inputs:
              targetType: Inline
              script: 'Write-Host "##vso[task.setvariable variable=TestRunStatus]$(Agent.JobStatus)"'
          - task: PublishTestResults@2
            displayName: 'Publish E2E Test Results'
            inputs:
              testRunTitle: 'E2E Tests'
              testResultsFormat: VSTest
              testResultsFiles: '$(Pipeline.Workspace)/TestResults/*.trx'
              mergeTestResults: true
          - task: PowerShell@2
            displayName: 'Cancel Pipeline on Test Run Failure'
            condition: and(succeeded(), or(eq(variables['TestRunStatus'], 'Failed'), eq(variables['TestRunStatus'], 'SucceededWithIssues')))
            inputs:
              targetType: Inline
              script: |
                Write-Host "##vso[task.setvariable variable=Agent.JobStatus]Failed"
                Write-Host "##vso[task.complete result=Failed]DONE"
  1. The Run E2E Tests task is similar to the other two test tasks. In the build stage, tests run against the .csproj projects, so we use the dotnet test command; in the release stage, we test against the compiled .dll files, so we use the dotnet vstest command instead. Because of the command change, the option names change too, even though they do the same thing.

    • --testCaseFilter: This is the same as the --filter option. Set its value to TestCategory=E2E so that only E2E tests are run.
    • --resultsDirectory: This is the same as the --results-directory option.

    Let's have a look at the environment variables. E2E testing requires extra environment variables, so this task declares a few more entries under the env attribute.

  2. The Save Test Run Status task stores the test run status in the TestRunStatus variable.
  3. The Publish E2E Test Results task uploads the E2E test results to the pipeline.
  4. The Cancel Pipeline on Test Run Failure task checks the TestRunStatus value to determine whether the release stage has succeeded. If the value is either Failed or SucceededWithIssues, the task fails the pipeline.

Now we've got all the pipeline details, including three different test runs. Let's run the pipeline. Once it's done, you can see the high-level result of each stage on the Summary tab.

The Tests tab shows the test results from unit tests, integration tests and E2E tests.

The Code Coverage tab shows the code coverage analysis results. From this analysis, it's clear that more unit tests need to be written (oops).


So far, we've composed the Walking Skeleton for an Azure Functions API through Azure DevOps. As mentioned at the beginning of this post, the Walking Skeleton is the minimal implementation of the system in a working condition, and all of its testing scenarios run in the automated CI/CD pipeline. Therefore, when we add more "Flesh" onto the Walking Skeleton over time, growing the system/application with extra testing scenarios requires only minimal effort. From the SRE perspective, automation is essential, and that automation should cover almost everything. Now, our Walking Skeleton has got everything it needs.

In fact, SRE is not only about automation; it's a broader practice that also includes monitoring, scaling and resiliency. Those topics are beyond the scope of this post, but I'll discuss them when I have another chance. I hope this post helps you start thinking about SRE practices.

What's Next?