
Discovering Pulumi

Once we realized Serverless Components wasn’t an ideal solution for Webiny, we started looking for an alternative.

The key features that we were looking for were the following:

  • open-source
  • use code (preferably TypeScript) to create cloud infrastructure resources
  • cloud infrastructure code should be flexible, meaning users should be able to adjust it to their needs
  • support for multiple cloud providers is a must
  • no vendor lock-in



Once we understood how to use Pulumi concepts with Webiny’s project organization, the next step was integrating the Pulumi CLI with the Webiny CLI.

For starters, we didn’t want our users to install the Pulumi CLI manually. We wanted it to happen automatically.

We’ve created our own version of the Pulumi SDK, which lets us use the Pulumi CLI programmatically. All of the necessary Pulumi CLI binaries and plugins are downloaded and stored inside the project’s node_modules folder.


Another useful feature is the automatic tagging of the deployed cloud infrastructure resources. In other words, every taggable cloud infrastructure resource will be tagged with WbyProjectName and WbyEnvironment tags. For developers, this makes it much easier to see all of the deployed resources within their Webiny project.

To achieve this, we created a tagResources function, which registers a global stack transformation via the pulumi.runtime.registerStackTransformation function.
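A minimal sketch of how such a transformation might look. The helper name and the way tags are merged are assumptions for illustration; only pulumi.runtime.registerStackTransformation itself is Pulumi's actual API:

```typescript
// Sketch of a tag-merging stack transformation. The pure merge logic is
// separated out so it can run without the Pulumi runtime; the registration
// call (commented at the bottom) is where Pulumi comes in.

type ResourceArgs = {
  type: string;
  props: Record<string, any>;
  opts?: Record<string, any>;
};

// Merge the Webiny tags into the resource's existing `tags` property,
// preserving any tags the user already set.
function withWebinyTags(
  args: ResourceArgs,
  projectName: string,
  environment: string
): ResourceArgs {
  return {
    ...args,
    props: {
      ...args.props,
      tags: {
        ...(args.props.tags ?? {}),
        WbyProjectName: projectName,
        WbyEnvironment: environment,
      },
    },
  };
}

// Registration (requires @pulumi/pulumi), roughly:
// import * as pulumi from "@pulumi/pulumi";
// pulumi.runtime.registerStackTransformation((args) =>
//   withWebinyTags(args, "my-project", "dev")
// );
```

Because the transformation is global, it runs for every resource in the stack, so taggable resources pick up the tags without any per-resource code.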


At Webiny, we believe cloud engineering and infrastructure as code are the future, so, naturally, we wanted our users to define their cloud infrastructure with familiar code (ideally TypeScript) and development tools. Within Serverless Components, components are configured via YAML files, which we were never big fans of: writing code instead of configuration gives developers more flexibility.


Integrating Pulumi’s Programming Model With Webiny

In terms of project organization, every Webiny project consists of two key concepts: packages and project applications (or just applications).

As an example, a default Webiny project includes three project applications:

  • API - essentially, your GraphQL HTTP API
  • Admin Area - the Admin Area (React) application
  • Website - the public website, a (React) application with static site generation (SSG) in the cloud


Building a serverless framework certainly has its challenges. One of them is the deployment of cloud infrastructure, which, in the world of serverless, is one of the fundamental operations developers need to perform, even while the application is still in development.

Before the version 5 release, Webiny relied on an infrastructure provisioning technology called Serverless Components (not to be confused with the Serverless Framework).


Finally, to protect our users from accidental deletions of mission-critical cloud infrastructure resources, we’ve used Pulumi’s protect resource option.

The protect option marks a resource as protected. A protected resource cannot be deleted directly. Instead, you must first set protect: false and run pulumi up. Then you can delete the resource by removing the line of code or by running pulumi destroy.

The default is to inherit this value from the parent resource and false for resources without a parent.

Within Webiny, this feature is automatically enabled for resources like DynamoDB tables, Cognito User Pools, and similar.
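A hedged sketch of how such automatic protection could be applied. The type list and helper are assumptions; only the protect resource option and stack transformations are Pulumi's actual API:

```typescript
// Sketch: decide which resource types count as mission-critical and
// should receive { protect: true } automatically.

const PROTECTED_TYPES = [
  "aws:dynamodb/table:Table",
  "aws:cognito/userPool:UserPool",
];

// Returns true for resource types that must not be deleted directly.
function shouldProtect(resourceType: string): boolean {
  return PROTECTED_TYPES.includes(resourceType);
}

// This could be combined with a global stack transformation so every
// matching resource is protected without per-resource code, roughly:
// import * as pulumi from "@pulumi/pulumi";
// pulumi.runtime.registerStackTransformation((args) =>
//   shouldProtect(args.type)
//     ? { props: args.props, opts: { ...args.opts, protect: true } }
//     : undefined
// );
```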


When it comes to active development, we don’t have to unleash the full potential of the cloud, like we do in staging and production environments.

For example, in most cases it makes no sense to deploy an Amazon Elasticsearch cluster into multiple availability zones (AZs). A single AZ is enough for development purposes.

Other good examples are VPCs and, potentially, NAT Gateways.

Even more interesting is the fact that this can be achieved with a simple if statement, which we placed in the index.ts entry file.
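A minimal sketch of such an if statement. The helper name, environment values, and parameter names are assumptions for illustration (though instanceCount and zoneAwarenessEnabled do mirror the cluster options exposed by Pulumi's AWS provider):

```typescript
// Sketch: scale the Elasticsearch cluster down for development
// environments with a plain if statement in index.ts.

interface ElasticsearchClusterConfig {
  instanceCount: number;
  zoneAwarenessEnabled: boolean;
}

function elasticsearchConfig(env: string): ElasticsearchClusterConfig {
  if (env === "dev") {
    // A single instance in a single AZ is enough for development.
    return { instanceCount: 1, zoneAwarenessEnabled: false };
  }
  // Staging and production spread the cluster across multiple AZs.
  return { instanceCount: 2, zoneAwarenessEnabled: true };
}

// The result would then be passed on when creating the domain, e.g.:
// new aws.elasticsearch.Domain("webiny-es", {
//   clusterConfig: elasticsearchConfig(env),
//   /* ... */
// });
```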


As mentioned, by default, every Webiny project comes with three project applications: API, Admin Area, and Website, which are located in the api, apps/admin, and apps/website folders, respectively.

As we can see, every project application follows the same general organization. The two folders in each project application are:

  • code - contains application code (one or more packages)
  • pulumi - contains cloud infrastructure (Pulumi) code
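Taking the api application as an example, the layout looks roughly like this (a sketch, not an exhaustive listing):

```
api/
├── code/     # application code (one or more packages)
└── pulumi/   # cloud infrastructure (Pulumi) code
```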


For us, the integration of Pulumi with Webiny consisted of three steps:

  1. figure out how to integrate Pulumi’s programming model with Webiny
  2. integrate Pulumi CLI into Webiny CLI
  3. figure out the optimal way of handling cloud infrastructure state files



Vendor lock-in was introduced at a later stage of the Serverless Components product development. Essentially, to deploy a component, the user is now forced to use a proprietary service that comes with it. And while, among other things, the service enables much faster deployments, from our perspective, we saw this as an additional point of friction for our users. Ideally, users should be able to set up Webiny with only an AWS account.


While the idea around components for different use-cases certainly sounded interesting, ultimately, it was not ideal for Webiny. We frequently received questions regarding further component configuration and customization, which was not easy to perform.

“How do I configure a different database?”, “How do I set a VPC?”, “How do I set up a specific configuration parameter for my S3 bucket?” were just some of the questions we received.


The last piece of the puzzle was storing cloud infrastructure state files. Here we went with the following approach.

For local development, users’ cloud infrastructure state files are stored locally within their Webiny project using the Local Filesystem Backend, which we’ve found works great for developers.

On the other hand, for ephemeral environments spawned in CI/CD, or for long-lived environments like staging or production, our documentation advises users to use centralized and remote storage via backends like Amazon S3, or even the Pulumi Service.
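For reference, Pulumi lets a project pin its state backend in Pulumi.yaml. A sketch of what that could look like with an S3 backend (the project and bucket names are assumptions; only the backend.url key is Pulumi's actual configuration):

```yaml
# Pulumi.yaml (sketch)
name: my-webiny-project
runtime: nodejs
backend:
  # Remote state storage for CI/CD and long-lived environments.
  url: s3://my-webiny-state-bucket
```

The same choice can also be made interactively with `pulumi login --local` for the filesystem backend, or `pulumi login s3://my-webiny-state-bucket` for S3.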





A Jamstack application consists of a static UI (in HTML and JavaScript) and a set of serverless functions to support dynamic UI elements via JavaScript. There are many benefits to the Jamstack approach. But perhaps one of the most significant benefits is performance. Since the UI is no longer generated at runtime from a central server, there is much less load on the server and we can now deploy the UI via edge networks such as CDNs.


Cloud computing is on-demand access, via the internet, to computing resources—applications, servers (physical servers and virtual servers), data storage, development tools, networking capabilities, and more—hosted at a remote data center managed by a cloud services provider (or CSP). The CSP makes these resources available for a monthly subscription fee or bills them according to usage.

Cloud computing has the following benefits:

  1. Lower IT costs
  2. Improved agility and time-to-value
  3. Better scaling



Big Data: Hadoop’s key benefits
  • Ability to store and process huge amounts of any kind of data, quickly. With data volumes and varieties constantly increasing, especially from social media and the Internet of Things (IoT) , that's a key consideration.
  • Computing power. Hadoop's distributed computing model processes big data fast. The more computing nodes you use, the more processing power you have.
  • Fault tolerance. Data and application processing are protected against hardware failure. If a node goes down, jobs are automatically redirected to other nodes to make sure the distributed computing does not fail. Multiple copies of all data are stored automatically.
  • Flexibility. Unlike traditional relational databases, you don’t have to preprocess data before storing it. You can store as much data as you want and decide how to use it later. That includes unstructured data like text, images and videos.
  • Low cost. The open-source framework is free and uses commodity hardware to store large quantities of data.
  • Scalability. You can easily grow your system to handle more data simply by adding nodes. Little administration is required.

MapReduce programming is not a good match for all problems. It’s good for simple information requests and problems that can be divided into independent units, but it's not efficient for iterative and interactive analytic tasks. MapReduce is file-intensive. Because the nodes don’t intercommunicate except through sorts and shuffles, iterative algorithms require multiple map-shuffle/sort-reduce phases to complete. This creates multiple files between MapReduce phases and is inefficient for advanced analytic computing.
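To make the phase structure concrete, here is an illustrative in-memory sketch of a single map-shuffle-reduce pass (word count). In Hadoop, each such pass reads from and writes to files, which is why algorithms needing many passes become I/O-bound:

```typescript
// Map phase: emit a (word, 1) pair for every word in a line.
function mapLine(line: string): Array<[string, number]> {
  return line
    .split(/\s+/)
    .filter(Boolean)
    .map((w): [string, number] => [w, 1]);
}

// Shuffle phase: group the emitted values by key (word).
function shuffle(pairs: Array<[string, number]>): Map<string, number[]> {
  const groups = new Map<string, number[]>();
  for (const [word, count] of pairs) {
    groups.set(word, [...(groups.get(word) ?? []), count]);
  }
  return groups;
}

// Reduce phase: sum the counts for each word.
function reduce(groups: Map<string, number[]>): Map<string, number> {
  const totals = new Map<string, number>();
  for (const [word, counts] of groups) {
    totals.set(word, counts.reduce((a, b) => a + b, 0));
  }
  return totals;
}

const counts = reduce(shuffle(["a b a", "b b"].flatMap(mapLine)));
// counts: a → 2, b → 3
```

An iterative algorithm would need to repeat this whole pipeline per iteration, materializing intermediate files between phases each time.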

There’s a widely acknowledged talent gap. It can be difficult to find entry-level programmers who have sufficient Java skills to be productive with MapReduce. That's one reason distribution providers are racing to put relational (SQL) technology on top of Hadoop. It is much easier to find programmers with SQL skills than MapReduce skills. And, Hadoop administration seems part art and part science, requiring low-level knowledge of operating systems, hardware and Hadoop kernel settings.