On self-hosting a static website: Trying out a DIY serverless approach
11 Mar 2024 • ~2700 words • ~13 minute read
I've used a single VPS to serve numerous static sites for years. It runs an nginx server. It's simple and dependable. But in the last year or so it's been a tad underused, serving only this site. That prompted me to consider alternatives, starting with a serverless approach.
I thought I'd document my findings. (TLDR, I'm still using the same VPS with nginx.)
This is all about fulfilling a desire to self-host to some degree. I suppose the best way to describe that from a serverless point of view is DIY-serverless, or BYOI-serverless (bring your own infrastructure). I like to self-host where possible; I like getting my hands dirty. I know there are hundreds of online services that will do exactly what I want, quite often for free (Netlify, Cloudflare Pages, Vercel, etc), but I like to tinker.
Preamble
The brief is pretty simple, really, but I suppose the "requirements" I am working with are these:
- The aim is to serve a static site (this static site, generated by Hugo)
- The solution should save on cost
- The solution should have TLS/SSL baked in
- The solution should use a CDN or similar
- The solution should not be complex
The following will outline my experiences using serverless object storage for the above. Breaking from the rules above, I did also try out Kubernetes. That's a beast of its own and a write-up on that will come shortly.
I'll be using the 'Quick start' project from Hugo's documentation throughout.
Linode Object Storage
I'm a longtime user and lover of Linode, so this was the first place I looked for a solution. Under the hood, Linode's object storage service uses Ceph object storage, which maintains a high level of API compatibility with AWS S3.
It's super quick and easy to create a static website in Linode's object storage using the s3cmd CLI tool. From within my Hugo project, I just run the following commands:
$ hugo
$ s3cmd mb s3://<BUCKET_NAME>
$ s3cmd ws-create \
--ws-index=index.html \
--ws-error=404.html \
s3://<BUCKET_NAME>
$ s3cmd \
--no-mime-magic \
--acl-public \
--delete-removed \
--delete-after \
sync public/ s3://<BUCKET_NAME>
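Worth noting: for the above to target Linode rather than AWS, s3cmd needs pointing at Linode's endpoints. In ~/.s3cfg that looks roughly like this (I'm assuming the eu-central-1 cluster here; swap in whichever region the bucket lives in):
access_key = <ACCESS_KEY>
secret_key = <SECRET_KEY>
host_base = eu-central-1.linodeobjects.com
host_bucket = %(bucket)s.eu-central-1.linodeobjects.com
website_endpoint = http://%(bucket)s.website-eu-central-1.linodeobjects.com/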
<BUCKET_NAME> can be anything, I believe, though in order to use it with domains the bucket name is supposed to match the full domain you intend to use. I'll go with obj-ws-1a.peterkeenan.net.
See the static site at the bucket's URL here. All good - so far. So, ideally at this stage all we want to do is slap a CDN in front of it by adding a proxied DNS record to Cloudflare. So let's now visit obj-ws-1a.peterkeenan.net. And… it doesn't work!
Problem #1: Objects and ugly URLs
Based on the message being returned, the DNS record is routing to the bucket just fine - except it doesn't serve the content the way the bucket's own URL does. This can be remedied by appending /index.html to the URL.
Something about proxying with Cloudflare doesn't play well with object-based static site hosting (remember in object storage, it's not a file system - everything is an object with a key, metadata and data). It's fine hitting actual objects themselves, but not the 'pretty URL'. So /posts/my-first-post/index.html is okay, but /posts/my-first-post is not.
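A quick way to see the difference is to compare the response for the explicit object key with the one for the 'pretty' URL (same subdomain as above; I'm not asserting exact status codes here):
# explicit object key - served fine through the Cloudflare proxy
$ curl -I https://obj-ws-1a.peterkeenan.net/posts/my-first-post/index.html
# 'pretty' URL - not resolved to an object when proxied
$ curl -I https://obj-ws-1a.peterkeenan.net/posts/my-first-post/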
Problem #2: No caching via Cloudflare
Contrast this to a non-proxied record obj-ws-1b.peterkeenan.net, which behaves as expected. You can hit the pretty URL and get the underlying object just fine.
The non-proxied version doesn't benefit from Cloudflare's caching. This is not ideal.
At this stage, it seems we're choosing between two kinds of user experience: do we want low latency, or nicer, user-friendly URLs?
Amazon S3
Having tried the Linode/Cloudflare pairing, I was curious to play around with Amazon S3, given that it has - for better or for worse - become 'the standard' object storage API.
Creating buckets with website config is well documented, so I'll skip the step-by-step. I created an S3 bucket obj-ws-2-1.peterkeenan.net and enabled the static website settings, making the website publicly available. Just as with the obj-ws-1 sites, we can navigate around the website just fine. So far, much the same result as our initial step with Linode.
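For reference, the rough AWS CLI equivalent of the earlier s3cmd steps looks like this (it assumes the bucket's public-access settings have already been relaxed to allow a public website, which these commands don't cover):
$ aws s3 mb s3://obj-ws-2-1.peterkeenan.net
$ aws s3 website s3://obj-ws-2-1.peterkeenan.net \
    --index-document index.html \
    --error-document 404.html
$ aws s3 sync public/ s3://obj-ws-2-1.peterkeenan.net --delete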
Problem #3: Certificate mismatch
Adding the DNS record to my Cloudflare zone for peterkeenan.net to create obj-ws-2-1.peterkeenan.net yielded a slightly new result: it benefits from being proxied by Cloudflare, so gets caching, but there's a little certificate configuration to be done.
Cloudflare provides four options for TLS/SSL, but this is applied at 'zone' level (ie, for the entire domain), rather than isolated subdomains. I tend to opt for 'Full', but I believe we could opt for 'Flexible' here and be up and running, sort of - but let's press on.
Possible solution #1: S3 website + CloudFront
Amazon has their own service for caching called CloudFront. I set up a CloudFront distribution pointing at the bucket's website URL [1]. This gives us a new URL to hit. Here we enjoy:
- caching
- TLS/SSL
- pretty URLs
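For the record, setting the distribution up can be close to a one-liner with the AWS CLI - a minimal sketch, pointing at the bucket's website endpoint:
$ aws cloudfront create-distribution \
    --origin-domain-name obj-ws-2-1.peterkeenan.net.s3-website-eu-west-1.amazonaws.com \
    --default-root-object index.html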
Great! Job done? Not quite. There's a bit of extra leg work involved in using our own domain. Again, this is well documented, so I'm not doing a step-by-step, but in summary we have to:
- request a certificate for the domain(s) in ACM
- add unique records to our DNS in order to confirm ownership of the domain(s)
- add the domain(s) we want to use to the CloudFront distribution, and associate them with the certificate
Only once the above is done can we then add a DNS record for the desired subdomain to the CloudFront distribution. Here it is in all its glory: obj-ws-2-1a.peterkeenan.net
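In CLI terms, the first two steps look roughly like this (note that certificates used by CloudFront have to live in us-east-1):
# request a certificate with DNS validation
$ aws acm request-certificate \
    --domain-name obj-ws-2-1a.peterkeenan.net \
    --validation-method DNS \
    --region us-east-1
# this shows the CNAME record to add to the zone to prove ownership
$ aws acm describe-certificate \
    --certificate-arn <CERT_ARN> \
    --region us-east-1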
Problem #4: Multiple 'entrypoints'
There's nothing wrong with the above solution, and I have a feeling it's quite commonly used. I really don't like the trail of unused 'entrypoints' we have left though…
http://obj-ws-2-1.peterkeenan.net.s3-website-eu-west-1.amazonaws.com/
https://d2w1tge9bnpkc.cloudfront.net/
https://obj-ws-2-1a.peterkeenan.net/
One is borderline acceptable. Two feels messy… There's nothing to be done about the CloudFront URL, but we should be able to remove the S3 website URL from the mix: we're in AWS world, so we can be specific and grant internal resources access to each other.
Problem #5: Objects and ugly URLs (again)
I created a new bucket, this time not configuring it as a website. This bucket is entirely locked down and private. I created a new CloudFront distribution, this time using Origin Access Control and a bucket policy allowing the distribution to access objects. This is, like all the other AWS stuff, pretty well documented.
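The bucket policy that pairs with Origin Access Control is roughly the standard one from AWS's documentation - CloudFront as the service principal, scoped to the distribution's ARN (account and distribution IDs are placeholders):
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowCloudFrontServicePrincipal",
            "Effect": "Allow",
            "Principal": {
                "Service": "cloudfront.amazonaws.com"
            },
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::<BUCKET_NAME>/*",
            "Condition": {
                "StringEquals": {
                    "AWS:SourceArn": "arn:aws:cloudfront::<ACCOUNT_ID>:distribution/<DISTRIBUTION_ID>"
                }
            }
        }
    ]
}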
The certificate I created in the previous step was good for a handful of subdomains, so I added this new subdomain: https://obj-ws-2-2.peterkeenan.net/
Landing on the home page actually serves content, unlike previously where it would give an access error, but this is purely because we get the option to define a default root object - index.html in this case. When navigating to the first post, we are back at square one: we get access denied on the pretty URL, but are fine if we explicitly hit the underlying object, /posts/my-first-post/index.html. This is one instance where I actually expected this to occur, as the bucket is not configured to serve websites, so requests are purely looking for an object at a path (key). It shouldn't serve an object from /posts/my-first-post because there is no object there.
Possible solution #2: S3 website with locked down resource policy
The issue outlined in problem #4 can be remedied, but it is a bit hacky. Returning to the bucket that has been set up to serve website content, we can add a bucket policy that only allows traffic containing a Referer header with a pre-defined secret. Something like this:
{
"Version": "2008-10-17",
"Id": "PolicyForRefererHeader",
"Statement": [
{
"Sid": "AllowTrafficWithRefererSecret",
"Effect": "Allow",
"Principal": "*",
"Action": "s3:GetObject",
"Resource": "arn:aws:s3:::<BUCKET_NAME>/*",
"Condition": {
"StringEquals": {
"aws:Referer": "<YOUR_SECRET_VALUE>"
}
}
}
]
}
We can then set CloudFront to add this header and secret value when it requests content from the origin. While it doesn't remove the S3 website URL from the equation, it basically makes it useless to everything but the CloudFront distribution itself, provided the secret remains … uh … secret.
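In the distribution's origin configuration (whether set through the console's 'Origin custom headers' or the API), the relevant fragment looks something like this:
"CustomHeaders": {
    "Quantity": 1,
    "Items": [
        {
            "HeaderName": "Referer",
            "HeaderValue": "<YOUR_SECRET_VALUE>"
        }
    ]
}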
Some thoughts
Cast back to the initial Linode/Cloudflare examples:
- proxied, where pretty URLs did not work
- direct, where pretty URLs did work
I would have expected these to behave the same, because the buckets were configured to serve website content, and the DNS records were pointing to website endpoints. I assumed the unexpected behaviour with the proxied version was something to do with how Cloudflare works, rather than the bucket itself. I found a number of threads on Linode's community forum about this, but it's a bit odd. I think if I had been able to get this combination of Linode and Cloudflare up and running, I would have been quite happy. Or if there's another CDN I could use, that could suit me too.
It's worth mentioning that Linode's implementation of object storage does allow for access policies. In theory, we could lock down our object-storage website endpoint to only permit access from Cloudflare IPs.
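I haven't tested this, but a sketch of such a policy would look something like the following, with Cloudflare's published ranges (cloudflare.com/ips) in place of the placeholders:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowCloudflareOnly",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::<BUCKET_NAME>/*",
            "Condition": {
                "IpAddress": {
                    "aws:SourceIp": ["<CLOUDFLARE_IP_RANGE_1>", "<CLOUDFLARE_IP_RANGE_2>"]
                }
            }
        }
    ]
}
It could be applied with s3cmd setpolicy policy.json s3://<BUCKET_NAME> - though whether Linode's Ceph deployment honours the IP condition on the website endpoint is exactly the kind of thing that would need verifying.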
AWS, as expected, offers way more options and customisability: certificate management, caching, detailed access policies. I could even go further and use Route 53 for DNS management.
On being all in
I haven't ruled out moving things to S3/CloudFront. It's probably the best option for a DIY serverless approach. I think there's something to be said for not being all in on cloud stuff though. It's good to mix and match and not become overly reliant on one cloud provider.
I'm not a big fan of Cloudflare - I pretty much only use it for its CDN-like feature. I could manage my DNS records anywhere, but in order to benefit from Cloudflare's caching, you basically have to use their DNS service. You also have to use their proxy. I do question whether I even need to serve content 'globally', and wouldn't be fussed about abandoning Cloudflare, but the classic serverless site would typically combine object storage and a CDN.
On my reluctance for multiple endpoints
Am I a weirdo for not liking that trail of endpoints? Probably… I reckon this is common. For my simple static website, it's harmless provided there's clarity over the canonical URL. But as a pattern, it feels messy and needless.
This is how Cloudflare Pages, Netlify and such operate. They create a URL of their own and the customer optionally adds their own domain. Whether using a custom domain or not, that service URL doesn't disappear. This is unavoidable when using managed services online. I just don't like it.
VPSs have a single IPv4 and a single IPv6 address. I create a thing. I decide what meaningful domain I want that thing to have. I set up DNS records to point that meaningful domain to those source IP addresses. That's it. It's simple, elegant, clean. My thing has a single user-friendly URL. The only other way to access that thing is by using the IP address itself (which the URL resolves to for you).
Returning to the requirements
Hosting a Hugo site
Let's just quickly return to the requirements I put out earlier. I was quite specific about wanting to serve a Hugo static website, because I knew it could potentially flag up issues relating to pretty URLs.
The problems I faced with pretty URLs and object access would easily be solved if this were a React app or similar. So I can see object storage websites being a great match for a SPA.
Hugo has the option of building with 'ugly' URLs. So, instead of hitting /posts/my-first-post you would hit /posts/my-first-post.html. So if combining Hugo with an object-storage-hosted website were unavoidable, there is a way. It's just a little old-school (nothing wrong with that!).
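For reference, that's a single setting - uglyURLs = true in the site config - and Hugo can also pick up config from HUGO_-prefixed environment variables, so a one-off build along these lines should do it (I haven't verified this exact variable, so treat it as a sketch):
$ HUGO_UGLYURLS=true hugo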
Cost
In terms of cost, it's hard to truly compare solutions, but this is the situation:
The current Linode VPS I use costs me $5 per month. Linode object storage costs $5 per month up to the first 250GB of storage capacity used. So if you are starting afresh and wanting to use Linode's object storage - there's no saving over a VPS, but the more you use Linode's object storage, the more cost effective it becomes. Once you hit more than 250GB, you're charged at $0.02 per additional GB.
Cloudflare have a generous free tier, though I am always suspicious of those… Anyways, in this example there is no cost associated with Cloudflare.
AWS complicate things a little by having various levels of free tier. CloudFront is a service that has a pretty good 'always free' quota, so the CDN element of the AWS solution is free for the purposes of this example.
S3 is pretty affordable. For the scenario outlined here, it'd probably be less than $0.05 per month, if that.
AWS Certificate Manager (ACM) is free for public certificates.
So if I really wanted to save my $5 VPS money, Linode/Cloudflare would probably be an entirely free solution, because I already use Linode's object storage so am already paying their flat-rate for the first 250GB. Except, it didn't work 😂
TLS/SSL
This is a standard offering on all serverless hosting services. Because it wasn't part of the actual issues, I skipped over the fact that I created certificates myself for the Linode object storage websites. It's quick and easy and requires the certbot CLI tool. This was fine for me because I already use that on my VPSs.
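For what it's worth, generating such a certificate is roughly this, followed by uploading the resulting certificate and key to the bucket through Cloud Manager or the API:
# manual DNS challenge - add the TXT record certbot asks for, then continue
$ certbot certonly --manual --preferred-challenges dns -d obj-ws-1a.peterkeenan.net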
AWS of course have a pretty easy-to-use service for this. The main advantage there is that AWS automatically renew public certificates (I believe), while the certbot ones I created for Linode were manual, and will need recreating and reassociating with the bucket once the current certificates expire.
Then there's of course Cloudflare's own take on certificates, with its various tiers of encryption between the client, the proxy and the server. I'd maybe use this in a situation where I'm POC-ing something and want to just get up and running quickly, but definitely not for something long-lived.
Complexity
Where do we begin?! DIY serverless hosting of a static website should be pretty simple, but as this article outlines, there've been a few 'gotchas' along the way. Yes, these are specific to the technologies I've chosen. But even mighty Amazon's solution required a bit of hackiness.
The main working solution from this little spike involved IAM policies, certificate management, and setting up CloudFront with custom domains. Yes, there's less server involved than with a VPS running nginx, but I would say there's just as much upfront work involved. Whether that makes it more or less complex to get up and running than a classic VPS, I will leave up to you.
Overall, I feel there's possibly more architectural complexity with the serverless pattern than the simplicity of: here's a virtual server, with an IP address, with some stuff on it.
Conclusion
This has been a fun little exercise, but I'm sticking to my VPS a little longer… I do think there's something to the DIY serverless approach: as I mentioned earlier, had things worked with Linode I think I would have gone that route.
It surprises me that AWS don't make it simpler to really restrict traffic between their own services (CloudFront and S3, for example). I may have missed something here, so might come back to look again. I assumed that I would be able to write a bucket policy using CloudFront as a service principal, and defining the distribution ARN - but having to resort to HTTP header secrets seems a bit bizarre!
[1] It's important to note that S3 uses two endpoints: the S3 one and the website one. When setting up a distribution, it suggests using the website one (if the bucket has it enabled), but you can use the S3 one.