Green Software Foundation Reference Guide Manual
The purpose of the Green Software Foundation Software Carbon Intensity (SCI) specification is to enable standardization across the industry, empowering individuals and organizations to make more informed choices about the software solutions they pick. By providing a standardized and practical methodology, the specification can be applied to any software to measure and reduce its carbon emissions.
In this guide, our intent is to demonstrate how to apply the SCI specification to a software application. As an example, we take a fintech application that provides ESG investment opportunities to an end user, who can be a retail investor or an advisor. All the business functionality is delivered as a set of APIs. The mobile application (available to retail users) and the web-based portal (for advisors) use the APIs to interact with the fintech application.
The software stack comprises the following components
The process for creating the SCI score of the application can be broken down into the following high-level steps:
- Define the software boundary.
- Choose R: define the functional unit for your application.
- Define what components need to be measured and how they would be measured.
The first step in calculating the SCI of the application is to define the software boundary for your application. In our fintech application, the software stack defines our software boundary.
The software boundary of the application would help you determine what components to include to capture the carbon footprint of the application. The maximum size of the boundary should be everything within the realm of influence of your software application. If there is a choice you can make regarding the software which can reduce emissions, that should be included in the software boundary. For the fintech application, this includes the following -
- Carbon footprint for developing the software application - This includes the carbon footprint of your development, testing or pre-production environments. Examples include application servers, databases, environments used for training the models and environments used for delivering the application.
- Carbon footprint for running the software application - This includes the carbon footprint of your production environments. Examples include servers, databases, security and monitoring infrastructure, failover and redundancy infrastructure and backups for running the production application.
- Carbon footprint for delivering the software application - This includes the carbon footprint of how your application is delivered: as a web application, a mobile application and so on. For a user interface delivered on a mobile device, for instance, this includes the energy consumption of the device, the size of the page (i.e., bytes transferred over the network to display the entire page) and the backend serving the request. The backend footprint comes from the second item above (carbon footprint for running the software application).
Apart from the above approach, the embodied emissions of the software application should also be factored in (wherever applicable) for the SCI calculation.
Our fintech application also relies on third-party APIs to get financial data as part of data acquisition, and in some cases the carbon footprint of the software/services providing the financial data may not be published. As part of an SCI implementation, it’s good practice to call out the places where data is not available for the calculation, or to use a proxy to fill in the carbon footprint details. The same may be true for other components, such as embodied carbon or shared hosting infrastructure.
Next, we need to choose R for the fintech application
Choose R for the application
As per the SCI specification, the SCI is a rate: carbon emissions per one unit of R. From the specification, the Software Carbon Intensity equation is

SCI = ((E * I) + M) per R

Where:
- E = Energy consumed by a software system
- I = Location-based marginal carbon emissions
- M = Embodied emissions of a software system
- R = Functional unit (e.g. carbon per additional user, API-call, ML job, etc)
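As a minimal sketch, the equation can be written as a small Python function. The function name, the parameter names and the units (kWh for E, gCO2e/kWh for I, gCO2e for M) are assumptions chosen for illustration, and the numbers in the usage line are hypothetical.

```python
def sci_score(energy_kwh: float,
              intensity_g_per_kwh: float,
              embodied_g: float,
              functional_units: float) -> float:
    """Return SCI = ((E * I) + M) per R, in gCO2e per functional unit.

    energy_kwh          -- E, energy consumed by the software system
    intensity_g_per_kwh -- I, location-based marginal carbon emissions
    embodied_g          -- M, embodied emissions attributed to the software
    functional_units    -- R, e.g. number of users, API calls or ML jobs
    """
    if functional_units <= 0:
        raise ValueError("functional_units (R) must be positive")
    operational_g = energy_kwh * intensity_g_per_kwh
    return (operational_g + embodied_g) / functional_units


# Hypothetical values: 10 kWh, 400 gCO2e/kWh, 1000 gCO2e embodied, 1000 users
print(sci_score(10.0, 400.0, 1000.0, 1000))
```

Keeping R as an explicit argument makes it easy to report the same measurements per user, per API call or per transaction.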
It’s a score rather than a total: the carbon emissions of your software per a baseline R. R can be defined based on your application and how it scales - by user, by APIs, by transactions and so on.
In the fintech application, we define R as per user, hence the metric is Carbon per User. In order to calculate R, we need to define the baseline. For the fintech application, the baseline can come from a load test/stress test carried out prior to the production release (or equivalent benchmarking techniques). So, if you run an initial load test for 1000 users, you can derive an average for one user. When your application is deployed in production, you can then calculate the carbon delta over a period of time.
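The baseline-and-delta idea can be sketched as follows. This is an illustration only: the function names are hypothetical, emissions are assumed to be in gCO2e, and the load-test and production figures are made up.

```python
from typing import List


def per_user_baseline(load_test_emissions_g: float, simulated_users: int) -> float:
    """Average emissions per user observed during a load test."""
    if simulated_users <= 0:
        raise ValueError("simulated_users must be positive")
    return load_test_emissions_g / simulated_users


def carbon_deltas(baseline_g_per_user: float,
                  observed_g_per_user: List[float]) -> List[float]:
    """Difference between each production reading and the baseline."""
    return [observed - baseline_g_per_user for observed in observed_g_per_user]


# Hypothetical load test: 5000 gCO2e measured while simulating 1000 users
baseline = per_user_baseline(5000.0, 1000)
# Hypothetical per-user readings from three production reporting periods
print(baseline, carbon_deltas(baseline, [5.0, 5.5, 4.5]))
```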
Define what components need to be measured and how it would be measured
Given the above strategy, let’s look at how to capture the carbon footprint for each component.
The data capture component of a fintech application consists of getting structured financial data (i.e., technical and fundamental data) from third-party vendors through APIs and FTP upload.
The data capture component also contains a custom in-house solution that crawls the internet daily for unstructured financial content (i.e., news sources and documents) and uses it for market predictions and analysis. Both data sources (structured financial data + unstructured financial content) are later used by the Analytics environment for building the ML models.
The SCI for our data acquisition would include an average of the following -
= SCI score provided by the third-party vendor (if available) + SCI score of the fintech data acquisition component
The SCI score of the fintech data acquisition component should include the supporting infrastructure. For instance, for the data acquisition component that is deployed on the cloud, this would include the following -
- Carbon footprint of the computing resources (i.e., VMs or Serverless based on your architecture) running the data acquisition component.
- Carbon footprint of the database server (i.e., managed database infrastructure) which stores the acquired data.
- Carbon footprint for continuous integration and continuous deployment environment and pipeline code.
- Carbon footprint for creating backups and storage
- Carbon footprint of computing resources to support redundancy and failover.
- Carbon footprint of the network used for data transmission (i.e. fetching the documents from the internet)
The data acquisition application might have used third-party libraries to build out the code functionality. The data acquisition application should also include the SCI of the libraries (if available)
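One way to combine the per-item footprints listed above, while calling out items whose data is not available (such as an unpublished vendor or library score), is sketched below. The function name, the dictionary keys and all values are hypothetical; each value is assumed to be already expressed in the same unit.

```python
from typing import Dict, List, Optional, Tuple


def combine_footprints(
    footprints: Dict[str, Optional[float]]
) -> Tuple[float, List[str]]:
    """Sum the known footprints and list the items with no data.

    A value of None means the footprint is not published or not measured,
    so it is reported as missing instead of being silently treated as zero.
    """
    total = sum(value for value in footprints.values() if value is not None)
    missing = sorted(name for name, value in footprints.items() if value is None)
    return total, missing


# Hypothetical figures for the data acquisition component
total, missing = combine_footprints({
    "compute": 120.0,
    "database": 80.0,
    "ci_cd": 15.0,
    "third_party_vendor": None,   # vendor does not publish a score
})
print(total, missing)
```

Reporting the missing items next to the total keeps the gap visible, and a proxy value can later replace `None` for any item.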
The data platform environment consists of the data pipeline and database infrastructure for the fintech application.
The data pipeline component takes the raw data (structured and unstructured) stored by the data acquisition component and converts it to a format that can be used for downstream processing (i.e., creating ML features, variations in stock price changes etc.). The processed data is stored in the database.
The data pipeline environment runs in a cloud region with a lower carbon footprint.
To calculate the SCI for the data pipeline component, the supporting infrastructure includes
- Carbon footprint of the computing resources (i.e., we assume serverless architecture) running the data pipeline component.
- Carbon footprint of the database server (i.e., managed database infrastructure) which stores the processed data.
- Carbon footprint for creating backups and storage
- Carbon footprint of computing resources to support redundancy and failover (i.e., standby, or active-passive database instances)
- Carbon footprint of the network (i.e., data transferred from data acquisition database component to data pipeline)
Analytics and insights
The analytics and insight platform environment consists of generating investment insights using deep learning and machine learning.
We assume 2 ML models, followed by a step that combines their outputs
- Derive equity sentiment from unstructured data (i.e., daily news, quarterly reports)
- Equity recommendations based on structured data (i.e., technical, and fundamental data)
- Augment equity recommendations with sentiments for daily and long-term recommendations.
To calculate the SCI for ML model, the supporting infrastructure includes –
- Carbon footprint of servers (CPU, GPU or TPU) used for training the model
- Carbon footprint for training and validating the model
- Carbon footprint for fine-tuning pre-trained model (if the pre-trained model is used)
- Carbon footprint of updating/re-training the model every X interval (i.e., after it is deployed in production, based on feedback)
- Carbon footprint of data acquisition and data pipeline (i.e., calculated earlier from data acquisition and data platform environment)
- Carbon footprint of storing the model and its version
- Carbon footprint for MLOps infrastructure and pipeline
- Carbon footprint of database which stores the insights generated from ML model
- Carbon footprint for creating backups and storage.
Please note that, as you keep accumulating data, downstream processing requires more computation over time, which changes the SCI score incrementally. How to design sustainable applications and keep the SCI score under control will be covered in later articles.
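To illustrate how the training and periodic re-training items might be estimated, here is a rough sketch. It assumes energy is approximated as device count × average power × hours, and that re-training happens at a fixed interval; the function names and all figures are hypothetical.

```python
def training_energy_kwh(device_count: int,
                        avg_power_watts: float,
                        hours: float) -> float:
    """Approximate energy of one training run on CPU/GPU/TPU devices."""
    return device_count * avg_power_watts * hours / 1000.0


def retraining_emissions_g(energy_per_run_kwh: float,
                           intensity_g_per_kwh: float,
                           period_days: int,
                           interval_days: int) -> float:
    """Emissions of all re-training runs that fit in a reporting period."""
    if interval_days <= 0:
        raise ValueError("interval_days must be positive")
    runs = period_days // interval_days  # completed re-training runs
    return runs * energy_per_run_kwh * intensity_g_per_kwh


# Hypothetical: 4 GPUs at 300 W for 10 hours, re-trained weekly over 30 days
run_kwh = training_energy_kwh(4, 300.0, 10.0)
print(run_kwh, retraining_emissions_g(run_kwh, 400.0, 30, 7))
```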
The computing environment runs the application backend and the implementation of the public-facing fintech APIs. The backend functionalities and the ML prediction service are available as a set of microservices, which are packaged as containers.
The containers are deployed into a managed container solution such as a Kubernetes cluster. Multiple regional clusters are used for high availability and redundancy, and the containers are auto-scaled by the Kubernetes cluster based on load/usage.
To calculate the SCI for the computing environment, the supporting infrastructure includes –
- Carbon footprint of Kubernetes environment running the containers (i.e., master for management and nodes for running the containers).
- Carbon footprint of the auto-scaling environment (i.e., new nodes added for the autoscale duration)
- Carbon footprint for managing redundancy (i.e., multi-region Kubernetes clusters)
- Carbon footprint for Kubernetes supporting infrastructure (ingress, egress etc.)
- Carbon footprint of notification services (i.e., sending push notifications to devices). These are implemented as serverless architecture.
- Carbon footprint of database which stores the user data for the fintech application and personalized investing recommendations
- Carbon footprint for backups (i.e., databases and environments)
- Carbon footprint for CI/CD environment and pipeline
- Carbon footprint for caching servers, CDN and edge environments for high performance.
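The always-on and auto-scaled node items above can be estimated from node-hours. The sketch below is an assumption-laden illustration: it treats every node as drawing the same average power, and the function name, the `(extra_nodes, hours)` event format and the figures are hypothetical.

```python
from typing import List, Tuple


def cluster_energy_kwh(base_nodes: int,
                       period_hours: float,
                       node_power_watts: float,
                       scale_events: List[Tuple[int, float]]) -> float:
    """Energy of a cluster over a period, including auto-scaled nodes.

    scale_events holds (extra_nodes, hours) pairs, one per autoscale event,
    so extra nodes only count for the duration they were actually running.
    """
    node_hours = base_nodes * period_hours
    node_hours += sum(extra * hours for extra, hours in scale_events)
    return node_hours * node_power_watts / 1000.0


# Hypothetical: 3 nodes for 24 h at 200 W, plus two autoscale events
print(cluster_energy_kwh(3, 24.0, 200.0, [(2, 3.0), (1, 6.0)]))
```

For multi-region clusters, the same calculation would be repeated per region so that each region's energy can be multiplied by its own carbon intensity.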
The fintech application functionality is available as a set of APIs. The API implementation is provided by the microservices applications running in the computing environment as discussed in the computing environment section.
The Carbon footprint of supporting infrastructure for API includes
- Carbon footprint of the API gateway (e.g., services like Apigee, Amazon API Gateway)
- Carbon footprint of the API implementation (i.e., this value comes from the computing environment discussed in the previous section)
- Carbon footprint for caching servers, CDN and edge environments for high performance.
The tools component consists of the web application and the mobile application for the fintech application. The web and mobile applications use APIs to communicate with the fintech application.
The web application is built with web frameworks like React or Angular, and Bootstrap for responsive design. The mobile application is developed using Flutter.
To calculate the SCI for web applications, we also need to consider the data transferred over the network while loading the web page, along with the energy intensity of the user’s device and the network. As part of SCI, one of our projects is looking at how to provide average values (for commonly used devices and systems) which can be used by the application.
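A rough sketch of a per-page-view estimate follows. It assumes network energy can be approximated as bytes transferred times an energy-per-gigabyte coefficient and device energy as average power times viewing time; the function name, both coefficients and all figures are hypothetical placeholders, not published averages.

```python
def page_view_emissions_g(bytes_transferred: int,
                          network_kwh_per_gb: float,
                          device_power_watts: float,
                          view_seconds: float,
                          intensity_g_per_kwh: float) -> float:
    """Estimate gCO2e for one page view (network transfer + user device)."""
    network_kwh = bytes_transferred / 1e9 * network_kwh_per_gb
    device_kwh = device_power_watts * view_seconds / 3600.0 / 1000.0
    return (network_kwh + device_kwh) * intensity_g_per_kwh


# Hypothetical: 2 MB page, 0.5 kWh/GB network, 3.6 W device for 1000 s,
# and a grid intensity of 500 gCO2e/kWh
print(page_view_emissions_g(2_000_000, 0.5, 3.6, 1000.0, 500.0))
```

The backend share of the request is not included here, since it is accounted for in the API implementation.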
Apart from the above, the SCI of fintech web/mobile applications also includes the SCI for the API implementation, which was calculated in the API section.
The security environment consists of providing security and threat detection services/solutions to the fintech application. The monitoring environment provides audit and visibility into the application and services being used in terms of central logging, performance, alerts and SRE principles.
The fintech application should consider all the security and monitoring infrastructure for calculating the SCI score.
The Carbon footprint of supporting infrastructure includes -
- Carbon footprint of security solutions (e.g., services like AWS GuardDuty, Google Cloud Armor, Azure advanced threat protection)
- Carbon footprint of the monitoring environment (e.g., services like Cloud Monitoring)
- Carbon footprint of storage for logs and security/monitoring data (including archives)
- Carbon footprint for generating and sending alerts
You can calculate the SCI score of each of the above software components (i.e., data platform, analytics and insight, etc.) as part of your stress/load testing. Once you deploy the application in production, you can compare these scores against the real values based on actual usage and use the real values as a benchmark for reporting.
For the computing environment, depending on your architecture design, you can also calculate the SCI for each of your microservices. At a minimum, you should provide the SCI for the total computing environment.
For the tools component, as the mobile application would be installed globally (and electricity intensity and the network would vary), you might need to calculate the SCI score across regions and average it. You need telemetry data (for devices, regions etc.) to calculate the score.
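One possible way to average across regions is to weight each region's score by its number of users, taken from telemetry. The weighting choice is an assumption (the guide only says to average), and the function name and figures are hypothetical.

```python
from typing import List, Tuple


def weighted_average_sci(regions: List[Tuple[int, float]]) -> float:
    """User-weighted average of per-region SCI scores.

    regions holds (active_users, sci_per_user) pairs, one per region.
    """
    total_users = sum(users for users, _ in regions)
    if total_users <= 0:
        raise ValueError("at least one region must have users")
    return sum(users * sci for users, sci in regions) / total_users


# Hypothetical telemetry: 100 users at 2.0 and 300 users at 4.0 gCO2e/user
print(weighted_average_sci([(100, 2.0), (300, 4.0)]))
```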
Once the application is up and running, monitoring the application for SCI deviation should be part of the SRE strategy. You would define a baseline SCI score based on your initial deployment and use it to report the baseline score and improvements/deviations over a period of time.
The SCI score can be computed for each of the components as shown above, and you can set up alerts based on the deviation. For instance, the SCI score of the data capture component should not deviate by more than 10% from the base value.
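A deviation check of this kind could be sketched as below. The function name is hypothetical, the 10% default mirrors the example in the text, and treating deviations in either direction as alert-worthy is an assumption.

```python
def sci_deviation_exceeded(baseline: float,
                           current: float,
                           threshold: float = 0.10) -> bool:
    """Return True when the score deviates from baseline by more than threshold."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    relative_deviation = abs(current - baseline) / baseline
    return relative_deviation > threshold


# Hypothetical scores for the data capture component
print(sci_deviation_exceeded(5.0, 5.4))  # 8% deviation
print(sci_deviation_exceeded(5.0, 5.6))  # 12% deviation
```

In practice the boolean would feed whatever alerting mechanism the monitoring environment already provides.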
For an application that does not provide SCI calculation at the component level, you should at a minimum define an SLI at the overall environment level (i.e., production instance, development, pre-production etc.) or at the overall application level, depending on how granular the data available for calculating the SCI score is.
This completes our reference guide on how to calculate the SCI score for a real-world application
- Add embodied carbon in the SCI equation wherever applicable.
- Add on-premises examples that include embodied carbon (racks, HVAC etc.).
- Some of the carbon footprint calculations are common across components and can be consolidated. Currently, they are kept separate so a reader can go directly to the relevant component and find separate calculations for each component.