The Governance Toolkit 365 (GT365) is a software-as-a-service solution that offers insights into a Microsoft 365 tenant and is part of the Microsoft Applied Cloud Stories initiative (#AppliedCloudStories). GT365 reads cloud data, stores it, and offers IT administrators monitoring and reporting functions to get a tenant overview at a glance, for example as a newsletter, in Power BI, as an API, or as a bot that can be integrated into Microsoft Teams. The GT365 workloads run entirely in Microsoft Azure using serverless computing. Here are some architectural insights on how to avoid throttling and how to use fan-out functions.
We received the requirements for the Governance Toolkit 365 from our Microsoft 365 customers. They want to see at a glance how many users and groups they are managing, which groups have guest users, how groups are used, how many users have licenses, and so on. To answer these questions, we developed GT365 as a SaaS solution, which enables us as developers to manage and update the solution quickly. Further information on the GT365 functionality can be found at www.governancetoolkit365.com, and technical details at https://github.com/delegate365/GovernanceToolkit365.
From the business side, the service should of course be operated at low cost. Since large amounts of data can be expected in some cases, the technical implementation requires a good architecture as well. We therefore decided to operate the solution as an Azure Function app in the Consumption plan. This is ideal for the workload, since GT365 only has to run once a day rather than permanently. In addition, Azure Functions offer the option of encapsulating individual functions and running them in parallel. These features and the pay-per-use principle are optimal for our application.
So let's start ... at the end. Microsoft 365 administrators should see nice, interactive graphics of users, licenses, groups, members, group owners, and guests. Some of our customers' tenants include tens of thousands of users and thousands of groups with hundreds or thousands of members each. We receive the data via the Microsoft Graph API, but we have to plan for the amount of data we need to process. The application must not be interrupted or stopped before the work is done every day. The following screenshot, created with Power BI from GT365 data, shows such an example. This Microsoft 365 tenant has about 62,000 users and about 17,000 groups.
Various interactive Power BI dashboards therefore display the results stored in Azure Storage. To ensure that the collected data can also be used by other clients and services, GT365 offers an API, a bot, a newsletter system, and much more.
How does the data get into the storage tables? Well, that's the job of some Azure Functions. Along the way, we learned a few lessons about how best to use Azure Functions. Here we show the architecture from a bird's-eye view.
Azure Functions as solution
When building such a solution, you often start with a simple function that grows gradually. An Azure Function app can contain several functions. Basically, the Start() function contains some functionality (in our case, C# code). The function is executed at a certain time and runs until the task is completed.
You can find samples for developing code for Azure functions at https://github.com/Azure-Samples/azure-functions-tests.
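As a minimal sketch of such an entry function (assuming the in-process C# model with the Microsoft.Azure.WebJobs attributes; the function name and schedule here are illustrative, not GT365's actual code):

```csharp
using System;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class StartFunction
{
    // A timer-triggered function that runs once per night at 01:00 UTC.
    // CRON format: {second} {minute} {hour} {day} {month} {day-of-week}
    [FunctionName("Start")]
    public static void Run(
        [TimerTrigger("0 0 1 * * *")] TimerInfo timer,
        ILogger log)
    {
        log.LogInformation($"GT365 nightly run started at {DateTime.UtcNow:O}");
        // ... work runs here until the task is completed ...
    }
}
```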
In GT365, we have the following code blocks. We want to run through all customers (stored in a database). For every customer, we want to read all users, as well as all groups of that customer. Then, for every group, we want to read the members, guests, owners, and other group metadata.
So, is it a good idea to run each function (F1 to F5) after the previous one, sequentially? No! That would not work with these amounts of data within the available time window, and it would not meet our goal.
Design the (Function) app architecture
With Azure Functions, we can benefit from the microservices architecture, automatic management, and scaling (fan-out). We also opted for the Consumption plan, which means that each function execution can take at most 10 minutes. We have to split the workload into different parts and make sure that each function works on its own. So we designed the architecture as follows. (This article describes the basic functions and a very simplified version.)
We use Azure Storage queues as a messaging system between the functions to distribute the workload. Information that needs to be processed is queued, e.g. a tenant ID or a group ID. In the graphic, the functions are shown as blue boxes and the queues as yellow boxes. The orange boxes are clients that work with the collected data, as mentioned above. A messaging system (a queue store) delivers asynchronous messages for communication between application components.
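Enqueueing work items is straightforward with the Azure.Storage.Queues SDK. The following sketch (queue name, connection string, and method name are assumptions for illustration) shows how customer IDs could be pushed into a queue for downstream functions to pick up:

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Azure.Storage.Queues; // Azure.Storage.Queues NuGet package

public static class TenantFanOut
{
    // Sketch: put each customer (tenant) ID into queue Q1 so that a
    // queue-triggered function can process the customers in parallel.
    public static async Task EnqueueTenantsAsync(
        string connectionString, IEnumerable<string> tenantIds)
    {
        // Base64 encoding keeps messages compatible with the default
        // expectations of the Functions queue trigger.
        var options = new QueueClientOptions
        {
            MessageEncoding = QueueMessageEncoding.Base64
        };
        var queue = new QueueClient(connectionString, "q1", options);
        await queue.CreateIfNotExistsAsync();

        foreach (var tenantId in tenantIds)
            await queue.SendMessageAsync(tenantId);
    }
}
```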
So here is what happens:
- F1 is started by a schedule that runs once every night. Each customer ID is put into queue Q1.
- F2 listens to Q1 and runs for each customer in parallel; this is the fan-out. The tenant ID is also put into Q2 and Q3.
- F3 listens to Q2. All users in the tenant are read page by page (100 users per page). Every record is written to the central Azure storage (grey box).
- F4 listens to Q3. This function gets a link to the next page of groups to read (100 groups per page). All groups in the tenant are read page by page, and every group is put into Q4 with its metadata. After a page is processed, F4 calls itself by putting the link to the next page into Q3. This continues until no pages are left.
- F5 listens to Q4. This function reads all members, owners, guests, etc. of one specific group (100 users per page). Every record is written to the central Azure storage.
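The self-paging pattern of F4 can be sketched as a queue-triggered function that re-enqueues the link to the next page (a minimal sketch assuming the in-process WebJobs bindings; ReadGroupPageAsync and GroupPage are hypothetical helpers standing in for the Microsoft Graph paging call):

```csharp
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class GroupPager
{
    // Sketch of the F4 pattern: process one page of groups, then
    // put the next-page link back into Q3 so the function "calls itself".
    [FunctionName("F4")]
    public static async Task Run(
        [QueueTrigger("q3")] string pageUrl,       // link to the current page of groups
        [Queue("q3")] IAsyncCollector<string> q3,  // next page goes back into Q3
        [Queue("q4")] IAsyncCollector<string> q4,  // one message per group, consumed by F5
        ILogger log)
    {
        // Hypothetical helper: calls the Microsoft Graph API and returns
        // up to 100 groups plus the @odata.nextLink of the following page.
        GroupPage page = await ReadGroupPageAsync(pageUrl);

        foreach (var groupId in page.GroupIds)
            await q4.AddAsync(groupId);            // fan out: F5 handles each group

        if (page.NextLink != null)
            await q3.AddAsync(page.NextLink);      // continue until no pages are left
        else
            log.LogInformation("All group pages processed.");
    }

    // Placeholder types/stubs so the sketch is self-contained.
    public class GroupPage
    {
        public string[] GroupIds { get; set; }
        public string NextLink { get; set; }
    }

    private static Task<GroupPage> ReadGroupPageAsync(string pageUrl) =>
        Task.FromResult(new GroupPage { GroupIds = new string[0], NextLink = null });
}
```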
With that Azure Functions architecture, we ensure that
- each function runs independently and as a black box
- each function runs within a maximum of 10 minutes
- execution continues even if an error occurs
- an inexpensive approach is used to obtain data
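Some of these guarantees map directly to host settings. A hedged example of how a host.json could look (the concrete values here are illustrative defaults, not GT365's actual configuration): `functionTimeout` caps each execution, and the queue extension settings control how many messages one host instance processes in parallel and how often a failing message is retried before it is moved to the poison queue.

```json
{
  "version": "2.0",
  "functionTimeout": "00:10:00",
  "extensions": {
    "queues": {
      "batchSize": 16,
      "maxDequeueCount": 5
    }
  }
}
```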
So, let's fill that model with some sample data. Let's say we have 20 customers to go through. Each customer has an average of 5,000 users and 1,000 groups with a few hundred members each. The light blue boxes show the number of function runs.
In F2, F4, and F5, the functions get 100 items per page and process that data, visualized by the green boxes with 100 in them. This also optimizes the number of external requests against the Graph API (and we have to avoid throttling/HTTP 429, but that's a different story). We see that F5 carries the greatest load. In detail, the calculation looks like this:
For one customer: 1 + 1 + 50 + 10 + 1,000 = 1,062 function calls
For 20 customers: 1 + 20 + 1,000 + 200 + 20,000 = 21,221 function calls - that's a lot of computing power required
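The calculation above can be expressed as a small helper (names and parameters are illustrative; the formula is one scheduler run, one F2 run per customer, one F3 run per page of users, one F4 run per page of groups, and one F5 run per group):

```csharp
using System;

public static class CallCountEstimate
{
    // Estimates the total number of function executions for one nightly run.
    public static int CallsForCustomers(
        int customers, int usersPerCustomer, int groupsPerCustomer, int pageSize = 100)
    {
        int f1 = 1;                                          // nightly scheduler, runs once
        int f2 = customers;                                  // one run per customer
        int f3 = customers * (usersPerCustomer / pageSize);  // one run per page of users
        int f4 = customers * (groupsPerCustomer / pageSize); // one run per page of groups
        int f5 = customers * groupsPerCustomer;              // one run per group
        return f1 + f2 + f3 + f4 + f5;
    }

    public static void Main()
    {
        Console.WriteLine(CallsForCustomers(1, 5000, 1000));  // prints 1062
        Console.WriteLine(CallsForCustomers(20, 5000, 1000)); // prints 21221
    }
}
```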
The ingenious part here is that Azure Functions are executed automatically by our queue triggers. For example, 20 queue entries can invoke 20 functions simultaneously. The functions are executed independently of each other on 20 different machines (containers) without any special programming. That's awesome. This enormously reduces the overall runtime and optimizes the solution. All operations are completed in a matter of minutes. The generated GT365 data can then be used by various clients and is refreshed every day.
With the right architecture and the right choice of technical implementation, big problems can quickly be broken down into small, solvable ones. Serverless computing and Azure Functions provide such a platform at low cost. This is the power of serverless computing, shown using the example of GT365. I hope I could convey the idea of how such solutions can be designed. Find further solutions at www.cloudstories.dev, on Twitter under #AppliedCloudStories, and check out the ready-to-use Governance Toolkit 365. Try it out for free today!