
Scaling the AWS Console

Steven Gassert


In large-scale cloud environments, delivering seamless user experiences is no easy feat, especially when vast amounts of configuration data are required for user interactions. This article dives into a project I led at Amazon Web Services, where our team addressed performance and scalability issues in the AWS Console, improving the customer experience while cutting costs.
The Challenge: Managing 700MB of Configuration Data on Every Request
Before this project, the AWS Console team faced a unique challenge: every page load required 700MB of configuration information to be included in the <meta /> tag of the HTML response. This data, critical to how users could search and interact with AWS services, had to be refetched each time a page loaded. The 700MB payload included details like which AWS services were publicly accessible (such as S3 and DynamoDB) versus those restricted to internal access (like AWS Bedrock before its public launch).
This setup presented several issues:
High Latency: Because the massive payload had to be transferred on every page load, time to first contentful paint suffered, degrading the user experience.
Inefficiency in Resource Usage: Transmitting 700MB per request incurred considerable network costs, especially given AWS's global user base (roughly 4 billion requests per day).
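To make the trade-off concrete, here is a minimal sketch of embedding configuration directly in the HTML versus referencing it as a separate asset. The tag names, URL, and configuration shape are illustrative assumptions, not the Console's actual markup:

```python
import json
from html import escape

# Illustrative configuration snippet; the real payload described service
# availability across every AWS service and region.
config = {"services": {"s3": "public", "dynamodb": "public"}}

# Before: the full configuration was serialized into the HTML response,
# so every page load carried the entire payload.
embedded = (
    '<meta name="console-config" '
    f'content="{escape(json.dumps(config))}" />'
)

# After: the HTML carries only a reference to a fingerprinted CDN asset
# (hypothetical URL), which the browser can cache and reuse.
referenced = (
    '<meta name="console-config-url" '
    'content="https://d111111abcdef8.cloudfront.net/'
    'console-config.ab12cd34.json" />'
)
```

With the reference approach, the HTML response stays a small, constant size no matter how large the configuration grows.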
The Solution: A Two-Pronged Approach to Data Distribution and Access Control
To address these issues, I implemented a two-part solution: a CDN-based approach for static configuration data and a high-performance API layer for user access control.
Delivering Configuration Data via CDN
We moved the 700MB of configuration data out of the HTML meta tags and into a CDN-served asset. By using CloudFront, we could ensure this data was globally distributed, reducing latency and achieving high availability. Here’s how it worked:
Minimal Latency: Storing configuration data as a CDN asset allowed us to leverage CloudFront’s caching, reducing the time to access the data significantly. Now, the data only needed to be fetched from the CDN, which had points of presence around the world.
Infinite Caching with Fingerprinting: To avoid unnecessary reloads, we configured CloudFront to cache the CDN asset indefinitely. Each update to the configuration data generated a new fingerprinted filename, ensuring that browsers would only fetch new data when required, without frequent invalidations.
Availability and Scalability: Using a CDN enabled us to handle high request volumes and distribute data efficiently, even during peak periods.
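The fingerprinting step above can be sketched as a content hash folded into the filename, so the name changes exactly when the data does. The function name, prefix, and hash length here are illustrative assumptions:

```python
import hashlib
import json

def fingerprinted_name(config: dict, prefix: str = "console-config") -> str:
    """Derive a content-addressed filename for a configuration snapshot.

    Serializing with sorted keys makes the hash deterministic, so the
    filename only changes when the configuration itself changes.
    """
    payload = json.dumps(config, sort_keys=True).encode("utf-8")
    digest = hashlib.sha256(payload).hexdigest()[:16]
    return f"{prefix}.{digest}.json"
```

Because the name encodes the content, the CDN object can be served with an effectively infinite cache lifetime (e.g. Cache-Control: public, max-age=31536000, immutable); a deployment simply publishes a new file and updates the reference to it, with no cache invalidation required.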
Creating a High-Performance API for Access Control
To handle 600 transactions per second across every AWS region, we introduced a dedicated API layer to check user permissions for specific AWS services. This API played a crucial role in maintaining the security and access control of AWS resources, ensuring users could only view publicly available services like S3 and DynamoDB, while restricting access to internally available services like AWS Bedrock (before its public launch).
Scalable, Global Access Control: The API was designed to support every AWS region, providing a consistent experience to users regardless of location. This solution ensured compliance with internal security policies while maintaining high performance.
Optimized for Speed: By offloading access control checks to a dedicated API, we could reduce the burden on the HTML payload, further improving time to first contentful paint.
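A toy model of the visibility check this API performs is sketched below. The service names and the internal/external flag are simplified placeholders for the real permission model, which is far richer:

```python
# Hypothetical in-memory model of the visibility check; real service
# metadata lives behind the access-control API, not in constants.
PUBLIC_SERVICES = {"s3", "dynamodb"}
INTERNAL_SERVICES = {"bedrock"}  # internal-only before its public launch

def visible_services(user_is_internal: bool) -> set:
    """Return the set of services a given user may see in the Console."""
    if user_is_internal:
        return PUBLIC_SERVICES | INTERNAL_SERVICES
    return set(PUBLIC_SERVICES)

def can_view(service: str, user_is_internal: bool) -> bool:
    """Check whether a single service should be shown to this user."""
    return service in visible_services(user_is_internal)
```

Keeping this check in a dedicated service, rather than baking the answer into each HTML response, means the static configuration asset can stay identical for every user while visibility decisions remain per-request.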
Results: Faster Load Times and Cost Savings
This solution had a significant impact on both performance and cost:
Reduced Latency: Removing the 700MB payload from every request led to a noticeable decrease in load times. With the CDN caching mechanism, the configuration data could be served almost instantly to users, improving their overall experience.
Cost Savings: By eliminating the need to send large amounts of configuration data with every request, we reduced data transfer costs considerably. Furthermore, the caching strategy allowed us to serve this data efficiently without needing constant updates.
Reliable Updates: The CDN asset’s fingerprinting ensured updates were smoothly managed without disrupting users. Each configuration change led to a new asset version, making updates seamless while maintaining the cache's integrity.
Conclusion
Leading this project at AWS underscored the importance of designing for both scalability and user experience. By transitioning configuration data to a CDN asset and introducing a robust API for access control, we optimized the AWS Console's performance, cut costs, and ensured a secure, streamlined experience for global users. This project exemplifies how thoughtful infrastructure design can solve complex challenges at scale, benefiting both users and the business.

Posted Nov 4, 2024

Reduced the latency of loading the AWS Console (console.aws.amazon.com) for billions of daily requests and saved money for Amazon Web Services