Several months ago, my company's costs for Amazon Web Services increased unexpectedly by several hundred USD. It's not huge, but something is amiss.
I check the reason. The costs are coming from a small landing page with 2 images. It is hosted through CloudFront. I thought this was a nifty, low-maintenance, high-performance solution. Amazon takes care of everything.
Turns out I didn't configure rate limits using the AWS WAF & Shield feature. Some anonymous guy who hates this landing page performs a burst DDoS attack, and I'm eating the cost. Whoops.
I configure the WAF rate limit rules. Why doesn't Amazon take care of this? Rate limits should be turned on by default. They're actually quite complicated to set up! I spent several hours. It's an investment in learning.
Another month goes by. The costs spike even higher. It's the same kind of DDoS spike. What, aren't my rate limit rules working? I check them. Duh, this is even less intuitive than I thought. I made a wrong choice and the limits were ineffective. Fix that. This should work now.
Another month. The costs are even higher! What? It's the same kind of DDoS spike. Except, now the rate limits are working. Most of the requests are blocked. But not in the first few seconds! Amazon says so itself: rate limits don't take effect immediately. The service takes time to determine that the limits were exceeded. During those first few seconds, millions of requests are served, and I pay the bill.
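To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The attack rate, the detection delay and the limit below are made-up illustrative values, not figures from my bill or from Amazon's documentation, and `burst_leak` is a hypothetical helper.

```python
def burst_leak(requests_per_second, detection_delay_seconds, limit):
    """Estimate what gets served before a delayed rate limit starts blocking.

    Returns (served, excess): every request arriving during the detection
    delay is served, and `excess` is how many of those were above the limit.
    """
    served = int(requests_per_second * detection_delay_seconds)
    excess = max(0, served - limit)
    return served, excess


# Hypothetical burst: 200,000 requests/second, 10 seconds until blocking
# starts, and a configured limit of 100 requests.
served, excess = burst_leak(200_000, 10, 100)
print(f"served before blocking: {served:,} (over the limit: {excess:,})")
```

However low the limit is set, the number served during the delay barely changes. That would explain why lowering the limit doesn't help against short bursts.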
OK. I configure even lower rate limits. Maybe this will help...? Another month goes by. Costs are even higher. Same DDoS burst attacks continue.
I contact Amazon support. I point out this is a problem. They say, we have Shield Advanced, it has better protection against spike attacks. Well, maybe. But the cost is $3,000 per month. For a landing page with 2 images.
So I create a budget EC2 instance and move the page to that. Low-tech solutions for high-tech problems.
As far as I can tell, and Amazon support confirms this, there is no cost-effective solution. CloudFront is dangerous to use. Even with WAF rate limits.
If you have $3,000 to spare for Shield Advanced each month, maybe that works better. Maybe even that is not quite there. (Support says Shield Advanced gives a full refund in case of a spike attack. They wouldn't need refunds if rate limits were effective!)
Conceptually, the problem is easy to solve. There should be a rate limit configurable per CloudFront node. Then each node can enforce its individual rate limit, and it can do so immediately. Then, the existing distribution-wide rate limits can kick in on top of that.
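Here is a minimal sketch of what I mean by a per-node limit, written as a plain token bucket in Python. This is not how CloudFront or WAF works internally, and `TokenBucket`, `simulate_burst` and all the numbers are hypothetical. It only shows that a limit kept in the node's own memory can reject requests from the very first moment of a burst, with no detection delay. The distribution-wide limit that would sit on top is not modeled.

```python
class TokenBucket:
    """Local rate limiter: one instance per edge node (hypothetical)."""

    def __init__(self, rate, capacity):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst the node will serve
        self.tokens = capacity    # start with a full bucket
        self.updated = 0.0        # timestamp of the last refill

    def allow(self, now):
        """Return True if a request arriving at time `now` may be served."""
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def simulate_burst(bucket, requests, duration):
    """Send `requests` evenly spaced over `duration` seconds; count served."""
    served = 0
    for i in range(requests):
        now = duration * i / requests
        if bucket.allow(now):
            served += 1
    return served


# Hypothetical node limit: 10 requests/second with a burst allowance of 20.
node_bucket = TokenBucket(rate=10, capacity=20)
served = simulate_burst(node_bucket, requests=100_000, duration=1.0)
print(f"served {served} of 100,000 burst requests")
```

With these values the node serves roughly its burst allowance plus one second of refill, and rejects everything else on the spot. The decision needs no coordination between nodes, which is why it can be immediate.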
I heard Cloudflare has better solutions for this. If someone keeps picking on this poor landing page, maybe I'll try that.