
Let’s Talk About Lambdas

Hey kids, how goes it? I’m back in San Francisco enjoying the 50 degree walks to my WeWork in the morning. Sweatshirts in the summer? How whimsical!

Anyway, I was scrolling through Hacker News the other day (as one does) and came across this gem: The Demise of the Mildly Dynamic Website. In the article, Hugo makes some interesting observations about the evolution of web development over the last 25 years or so. They don’t come across as critical of this evolution (at least to me) but more as “we should notice what we’ve lost”:

Even more interestingly, these two attributes combined to create something very special in PHP: support for the “hackish state”, an environment optimal for casual tinkering with server-side scripting — without having to bring out a whole framework and take the time to set it up. If you had an idea for some kind of dynamic, server-side behaviour, you could probably have it prototyped in PHP in 15 minutes. It might be crude, but it would be real. But the “hackish state” of web development that PHP supported went beyond this in a way that, having experienced it, I find hard to describe; perhaps the closest analogy is environments designed to enable programmatic art, like the interesting recent development of live music programming environments. You could call it a kind of jamming. Though there are countless server-side web frameworks, PHP is uniquely adapted to do server-side development in a jamming state of mind.

My own involvement with PHP goes back to the late 90’s. I had a friend Ben who I met during my first year at “uni” (as the Brits say) who had decided to postpone his education to join a startup. It was the 90’s, this happened a lot. Anyway, he got into web development, with PHP being his tool of choice. Back then that was actually considered the state of the art, as the alternative was usually CGI scripts in Perl. One day he sat me down at his computer (a desktop with a single massive 19″ CRT, of course) to show me how he did things. There was the HTML, which you could edit and then immediately see in the browser, but then there were these blocks of weird scripts that needed a server side interpreter to render correctly. He saved the file in Dreamweaver (of course) and then FTPed it up to the server, and voila! everything worked.

I didn’t quite appreciate it at the time, but it was all sort of magical how simple it was, going from just playing in HTML to dropping in these little bits of scripts inline and having things happen. Not necessarily small things either. You could hit a database and get some data, or fetch a fragment of some other page and just kinda jam it in there. All by editing text files and uploading them to a server somewhere.

I brought this up with the Brain Trust in our call this morning, and it turns out that nostalgia for this time isn’t universal. “Sure it was simple, but it was so unmaintainable!” and “All of our crazy build systems are needed to support our complex modern applications.” Fair points all. But I guess you never forget your first, the one that turned you on to the possibilities the world contained before you got old and jaded.

Sigh.

Anyhoo, the Hacker News discussion of this article focused less on this aspect and more on a comment made about Lambdas (quoted in its entirety):

AWS Lambda: CGI But It’s Trendy. Recently we’ve seen the rise in popularity of AWS Lambda, a “functions as a service” provider. From my perspective this is literally a reinvention of CGI, except a) much more complicated for essentially the same functionality, b) with vendor lock-in, c) with a much more complex and bespoke deployment process which requires the use of special tools.

What captured people’s imaginations about AWS Lambda is that it lets you a) give any piece of code an URL, and b) that code doesn’t consume resources when it’s not being used. Yet these are also exactly the attributes possessed by PHP or CGI scripts. In fact, it’s far easier for me to write a PHP script and rsync it to a web server of mine than for me to figure out the extensive and complex tooling for creating, maintaining and deploying AWS Lambda functions — and it comes without the lock-in to boot. Moreover, the former allows me to give an URL to a piece of code instantly, whereas with the latter I have to figure out how to setup AWS API Gateway plumbing correctly. I’m genuinely curious how many people find AWS Lambda interesting because they’ve never encountered, or never properly looked at, CGI.

Of course, it’s fair to say if Lambda does offer anything that PHP/CGI doesn’t it is high availability. That’s something you can’t say about a solitary PHP server. But it’s not as though it’s infeasible to create a highly available PHP cluster either — and I can still deploy a script and give it an URL in 30 seconds with nought but rsync. Moreover, the average server is reliable enough that a lack of HA just doesn’t seem to matter in many circumstances. This is especially true when one is talking about niche dynamic functionality which is only occasionally used. These are often the same circumstances where Lambda might be used — things called rarely enough it’s not worth keeping a daemon spun up. The PHP websites I’ve published have never been highly available and I’ve never lost sleep over that; the ability to just give a piece of code an URL in 30 seconds, without complex deployment tooling, proprietary APIs or vendor lock-in seems to me a lot more valuable for the things I do.

For the uninitiated, Lambdas are Amazon’s version of the “serverless” paradigm that has become very popular in recent years. The idea here is that you can have bits of functionality (scripts essentially) that can just live in the cloud and wait for you to invoke them, often via an HTTP(S) request or API call from some other process somewhere. You don’t need to set up a server or pay for it hourly or monthly; you just register your script with Amazon (or some other provider) and it runs when called.
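To make that concrete, here is a minimal sketch of what one of these scripts can look like in Python. It assumes the function sits behind API Gateway's proxy integration, which hands the request over as a dict and expects a dict shaped like an HTTP response back. The name `handler` and the `name` query parameter are my own choices for illustration.

```python
import json


def handler(event, context):
    """Entry point the platform calls once per request."""
    # With the proxy integration, query parameters arrive under this key;
    # the value is None when the request has no query string.
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")

    # The proxy integration expects a dict shaped like an HTTP response.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }


if __name__ == "__main__":
    # Local smoke test with a hand-built event; no AWS account needed.
    print(handler({"queryStringParameters": {"name": "Ben"}}, None))
```

Because the handler is just a function that takes a dict, you can poke at it locally before any of the deployment plumbing exists.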

This idea of “serverless” has really captured the imagination of many in the developer community. Being (almost) entirely on demand, it scratches that “infinite scalability” itch many of us seem to have, where we’ll deal with any amount of complexity if it guarantees we don’t have to think about how to scale things. It also appeals to people who have an affinity for simplicity. It allows you to break down a complex problem into discrete functions, and then deploy these one at a time. Sure, you have to figure out how to maintain state (when that’s needed) but it lets you conceptualize the problem in a different way. Some particular problems really lend themselves well to this process.

For one startup, I had a need to produce ZIP files on demand, some of which could be quite large. As the files were already in S3, it was easy to just shove this process out into a lambda where it could download things into temporary storage, bundle them up, push the resulting ZIP back to S3, and let me know when it’s done so I could let the user know.
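The core of that job might look something like the sketch below. This is not the code I actually ran: `fetch` and `store` are hypothetical callables standing in for the S3 download and upload, and the archive is built in memory rather than in temporary storage to keep the example short. For genuinely large bundles you would want to write to a temp file instead.

```python
import io
import zipfile
from typing import Callable, Iterable


def bundle_to_zip(
    keys: Iterable[str],
    fetch: Callable[[str], bytes],
    store: Callable[[str, bytes], None],
    output_key: str,
) -> int:
    """Fetch each key, add it to one ZIP archive, then store the archive.

    Returns the size of the finished archive in bytes.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for key in keys:
            # Use the object key as the file name inside the archive.
            archive.writestr(key, fetch(key))
    data = buffer.getvalue()
    store(output_key, data)
    return len(data)


if __name__ == "__main__":
    # A dict stands in for the bucket so the sketch runs without AWS.
    bucket = {"reports/a.txt": b"alpha", "reports/b.txt": b"beta"}
    size = bundle_to_zip(
        ["reports/a.txt", "reports/b.txt"],
        fetch=bucket.__getitem__,
        store=bucket.__setitem__,
        output_key="bundles/reports.zip",
    )
    print(f"wrote {size} bytes to bundles/reports.zip")
```

Passing the storage calls in as parameters keeps the bundling logic testable on a laptop, which takes some of the sting out of the debugging story discussed later.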

I’ve also seen cases where Lambdas have been abused, or made to do things that aren’t a natural fit, typically in the name of the aforementioned scalability. There are definite costs (financial and otherwise) associated with using serverless technologies that need to be considered:

  1. They add significant application complexity. As Hugo said, serverless is simple conceptually but complex in practice. Creating the lambda involves either uploading a zip with the function or using the API to update it from S3, then setting some parameters. If you want it to be accessible via HTTP(S) you’ll need to set up an API gateway and get that wired up. Debugging can be challenging, as you’ll need to throw in logging to see where things are failing and then pore over logs to figure out what’s happening (once you are able to identify the actual request that failed, which is not always a simple task). These things often fail for random reasons like running out of memory or timeouts, and their asynchronous nature often means you aren’t alerted for a while, compounding the log search dilemma.
  2. They can be redundant. Most web applications need a server somewhere. This has become more nuanced recently as the trend has been to move to Single Page Applications or page generators that pump out static HTML sites that can be dropped into an S3 bucket or some service with no dynamic capabilities, but I think generally speaking most web applications (as opposed to sites) are deployed to servers or containers. If you already have a well-engineered app with predictable scalability characteristics deployed behind a load balancer with the ability to launch new containers as needed, lambdas don’t necessarily bring anything new to the table. Hugo calling Lambdas “CGI but trendy” got some heat, but in a sense he was right. There are plenty of software platforms that don’t use resources until they are invoked, and unless your needs fall into a very specific band (discussed below) you may be better served through other means using the hardware you’re already using.
  3. They invite vendor lock-in. As with all hot technologies, many different cloud vendors were quick to push out their own version of serverless/worker type tools. This left developers who happened to live in their ecosystems to figure out the intricacies of how they worked, and how they didn’t. The reward for this work is a new dependency that you can’t easily move somewhere else.
  4. They break useful abstractions. The web is stateless, kinda. Over the last 25 years we’ve come up with numerous ways to work around this, to make our lives easier by letting users “log in,” for instance, and have certain bits of context follow them around to make our applications more responsive. With serverless we lose a lot of that, as each request coming in acts in isolation. Sure you can pass in a session key or JWT and do stuff with that, but then you aren’t really enjoying the scalability benefits since you’re still using the database to populate your context. It also feels a whole lot more kludgy than your elegant middleware layer that the rest of your app uses to decorate the User object as the website customer clicks around.
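To show what “pass in a session key or JWT” means in practice, here is a rough sketch of a handler that checks a signed token on every request. Treat everything in it as an assumption: the token is a simplified HMAC-signed payload in the spirit of a JWT (not a spec-compliant one), `SECRET` is a placeholder, and the lower-case `authorization` header key is a guess at how the event is shaped.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

SECRET = b"replace-me"  # hypothetical; load from configuration in practice


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _unb64(text: str) -> bytes:
    # Restore the padding that _b64 stripped before decoding.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign(claims: dict) -> str:
    """Encode the claims and append an HMAC-SHA256 signature."""
    payload = _b64(json.dumps(claims, separators=(",", ":")).encode("utf-8"))
    mac = hmac.new(SECRET, payload.encode("ascii"), hashlib.sha256).digest()
    return f"{payload}.{_b64(mac)}"


def verify(token: str, now: Optional[float] = None) -> Optional[dict]:
    """Return the claims if signature and expiry check out, else None."""
    try:
        payload, signature = token.split(".")
        expected = hmac.new(SECRET, payload.encode("ascii"), hashlib.sha256).digest()
        if not hmac.compare_digest(expected, _unb64(signature)):
            return None
        claims = json.loads(_unb64(payload))
    except ValueError:
        # Covers malformed tokens, bad base64, and bad JSON.
        return None
    if claims.get("exp", 0) < (time.time() if now is None else now):
        return None
    return claims


def handler(event, context):
    headers = event.get("headers") or {}
    claims = verify(headers.get("authorization", ""))
    if claims is None:
        return {"statusCode": 401, "body": "unauthorized"}
    # Identity comes from the token. Anything richer (cart, preferences)
    # still needs a database or cache lookup on every invocation.
    return {"statusCode": 200, "body": f"hello user {claims['sub']}"}
```

The token tells you who is calling without a lookup, but that is about all it gives you. The rest of the context your middleware used to carry around has to be rebuilt on each call.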

Keeping these in mind, there are also many cases where Lambdas make sense:

  1. The application is used in overwhelming bursts. Think of something like a flash sale site for something really in demand, or Ticketmaster the moment TSwift tickets go on sale. Like really sudden spikes in traffic. In these cases, you’ll likely need to build your whole architecture around these spikes, and avoid things like traditional cookie-based sessions for logins in favor of JWT or similar. That said, I think people underestimate how far they can go with a properly load-balanced server setup.
  2. The application is hardly used at all. On the other end of the spectrum, you may have some bit of functionality that is very important, but doesn’t make sense to implement on a long-lived server as it’s used very infrequently. A serverless function could be ideal since you don’t have to pay for it until it’s used, and it’s often very cheap for each run.
  3. The application needs to perform work non-interactively. Most applications have an interactive/real time portion (for example, the website itself) as well as an offline/worker portion that takes care of a lot of the plumbing. For example, you may need to rebuild a search index periodically, or update user records in a batch format, or sync data with another application where the only API available involves polling. These are great cases for lambdas.
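For the batch-style worker case, the body of the function is often little more than “walk the records in chunks and push each chunk somewhere.” A minimal sketch, assuming a hypothetical `push_batch` callable that stands in for whatever the other system exposes (a search index, a partner API, and so on):

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def batched(items: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


def run_sync(
    records: Iterable[dict],
    push_batch: Callable[[List[dict]], None],
    batch_size: int = 100,
) -> int:
    """Send records to `push_batch` in chunks; return how many were sent."""
    sent = 0
    for batch in batched(records, batch_size):
        push_batch(batch)
        sent += len(batch)
    return sent


if __name__ == "__main__":
    # Stand-in data and sink so the sketch runs on its own.
    users = ({"id": i} for i in range(250))
    total = run_sync(users, push_batch=lambda batch: None, batch_size=100)
    print(f"synced {total} records")
```

A scheduled invocation would call something like `run_sync` from its handler. Nobody is waiting on the response, so cold starts and the occasional retry matter a lot less here than on the interactive side.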

Like any technology, Lambdas/serverless are a useful tool when used correctly. I think in this case Hugo’s analysis was pretty much spot on, with the caveat that his examples tend to be more for personal (presumably small scale) services rather than the 1M-requests-per-second-type services modern engineers picture themselves building. Maybe the problem is that we’ve inverted things a bit, and instead of hacking a solution together and fixing bottlenecks in the application when we find them, we now assume everything will need to scale to Facebook levels on Day 1 and then choose technologies accordingly? It’s hard to say.

So what do you think? Did I miss anything? What’s your favorite thing about serverless?