The Linux kernel is the heart of many modern production systems. It decides when any code is allowed to run and which programs/users can access which resources. It manages memory, mediates access to hardware, and does the bulk of the work under the hood on behalf of programs running on top. Since the kernel is always involved in any code execution, it is in the best position to protect the system from malicious programs, enforce the desired system security policy, and provide security features for safer production environments.
In this post, we will review some Linux kernel security configurations we use at Cloudflare and how they help to block or minimize a potential system compromise.
When a machine (either a laptop or a server) boots, it goes through several boot stages:
Within a secure boot architecture each stage from the above diagram verifies the integrity of the next stage before passing execution to it, thus forming a so-called secure boot chain. This way “trustworthiness” is extended to every component in the boot chain, because if we verified the code integrity of a particular stage, we can trust this code to verify the integrity of the next stage.
We have previously covered how Cloudflare implements secure boot in the initial stages of the boot process. In this post, we will focus on the Linux kernel.
Secure boot is the cornerstone of any operating system security mechanism. The Linux kernel is the primary enforcer of the operating system security configuration and policy, so we have to be sure that the Linux kernel itself has not been tampered with. In our previous post about secure boot we showed how we use UEFI Secure Boot to ensure the integrity of the Linux kernel.
But what happens next? After the kernel gets executed, it may try to load additional drivers, or as they are called in the Linux world, kernel modules. And kernel module loading is not confined just to the boot process. A module can be loaded at any time during runtime: when a new device is plugged in and a driver is needed, when additional extensions in the networking stack are required (for example, for fine-grained firewall rules), or simply manually by the system administrator.
However, uncontrolled kernel module loading might pose a significant risk to system integrity. Unlike regular programs, which get executed as user space processes, kernel modules are pieces of code which get injected and executed directly in the Linux kernel address space. There is no separation between the code and data in different kernel modules and core kernel subsystems, so everything can access everything. This means that a rogue kernel module can completely nullify the trustworthiness of the operating system and make secure boot useless. As an example, consider a simple Debian 12 (Bookworm) installation, but with SELinux configured and in enforcing mode:
ignat@dev:~$ lsb_release --all
No LSB modules are available.
Distributor ID: Debian
Description: Debian GNU/Linux 12 (bookworm)
Release: 12
Codename: bookworm
ignat@dev:~$ uname -a
Linux dev 6.1.0-18-cloud-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.76-1 (2024-02-01) x86_64 GNU/Linux
ignat@dev:~$ sudo getenforce
Enforcing
Now we need to do some research. First, we see that we’re running Linux kernel 6.1.76. If we explore the source code, we will see that inside the kernel the SELinux configuration is stored in a singleton structure, which is defined as follows:
struct selinux_state {
#ifdef CONFIG_SECURITY_SELINUX_DISABLE
    bool disabled;
#endif
#ifdef CONFIG_SECURITY_SELINUX_DEVELOP
    bool enforcing;
#endif
    bool checkreqprot;
    bool initialized;
    bool policycap[__POLICYDB_CAP_MAX];
    struct page *status_page;
    struct mutex status_lock;
    struct selinux_avc *avc;
    struct selinux_policy __rcu *policy;
    struct mutex policy_mutex;
} __randomize_layout;
From the above, we can see that if the kernel configuration has CONFIG_SECURITY_SELINUX_DEVELOP enabled, the structure would have a boolean variable enforcing, which controls the enforcement status of SELinux at runtime. This is exactly what the above sudo getenforce command returns. We can double check that the Debian kernel indeed has the configuration option enabled:
ignat@dev:~$ grep CONFIG_SECURITY_SELINUX_DEVELOP /boot/config-`uname -r`
CONFIG_SECURITY_SELINUX_DEVELOP=y
Good! Now that we have a variable in the kernel which is responsible for some security enforcement, we can try to attack it. One problem though is the __randomize_layout attribute: since CONFIG_SECURITY_SELINUX_DISABLE is actually not set for our Debian kernel, normally enforcing would be the first member of the struct. Thus, if we know where the struct is, we immediately know the position of the enforcing flag. With __randomize_layout, during kernel compilation the compiler might place members at arbitrary positions within the struct, so it is harder to create generic exploits. But arbitrary struct randomization within the kernel may introduce a performance impact, so it is often disabled, and it is disabled for the Debian kernel:
ignat@dev:~$ grep RANDSTRUCT /boot/config-`uname -r`
CONFIG_RANDSTRUCT_NONE=y
We can also confirm the compiled position of the enforcing flag using the pahole tool and either kernel debug symbols, if available, or (on modern kernels, if enabled) in-kernel BTF information. We will use the latter:
ignat@dev:~$ pahole -C selinux_state /sys/kernel/btf/vmlinux
struct selinux_state {
bool enforcing; /* 0 1 */
bool checkreqprot; /* 1 1 */
bool initialized; /* 2 1 */
bool policycap[8]; /* 3 8 */
/* XXX 5 bytes hole, try to pack */
struct page * status_page; /* 16 8 */
struct mutex status_lock; /* 24 32 */
struct selinux_avc * avc; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
struct selinux_policy * policy; /* 64 8 */
struct mutex policy_mutex; /* 72 32 */
/* size: 104, cachelines: 2, members: 9 */
/* sum members: 99, holes: 1, sum holes: 5 */
/* last cacheline: 40 bytes */
};
So enforcing is indeed located at the start of the structure, and we don’t even have to be a privileged user to confirm this.
Great! All we need is the runtime address of the selinux_state variable inside the kernel:
ignat@dev:~$ sudo grep selinux_state /proc/kallsyms
ffffffffbc3bcae0 B selinux_state
With all this information, we can write an almost textbook-simple kernel module to manipulate the SELinux state:
mymod.c:
#include <linux/module.h>

static int __init mod_init(void)
{
    /* hardcoded runtime address of selinux_state, taken from /proc/kallsyms */
    bool *selinux_enforce = (bool *)0xffffffffbc3bcae0;

    /* enforcing is the first member of the struct, so just flip it */
    *selinux_enforce = false;
    return 0;
}

static void mod_fini(void)
{
}

module_init(mod_init);
module_exit(mod_fini);

MODULE_DESCRIPTION("A somewhat malicious module");
MODULE_AUTHOR("Ignat Korchagin <ignat@cloudflare.com>");
MODULE_LICENSE("GPL");
And the respective Kbuild file:
obj-m := mymod.o
With these two files we can build a full-fledged kernel module according to the official kernel docs:
ignat@dev:~$ cd mymod/
ignat@dev:~/mymod$ ls
Kbuild mymod.c
ignat@dev:~/mymod$ make -C /lib/modules/`uname -r`/build M=$PWD
make: Entering directory '/usr/src/linux-headers-6.1.0-18-cloud-amd64'
CC [M] /home/ignat/mymod/mymod.o
MODPOST /home/ignat/mymod/Module.symvers
CC [M] /home/ignat/mymod/mymod.mod.o
LD [M] /home/ignat/mymod/mymod.ko
BTF [M] /home/ignat/mymod/mymod.ko
Skipping BTF generation for /home/ignat/mymod/mymod.ko due to unavailability of vmlinux
make: Leaving directory '/usr/src/linux-headers-6.1.0-18-cloud-amd64'
If we try to load this module now, the system may not allow it due to the SELinux policy:
ignat@dev:~/mymod$ sudo insmod mymod.ko
insmod: ERROR: could not load module mymod.ko: Permission denied
We can work around this by copying the module somewhere into the standard module path:
ignat@dev:~/mymod$ sudo cp mymod.ko /lib/modules/`uname -r`/kernel/crypto/
Now let’s try it out:
ignat@dev:~/mymod$ sudo getenforce
Enforcing
ignat@dev:~/mymod$ sudo insmod /lib/modules/`uname -r`/kernel/crypto/mymod.ko
ignat@dev:~/mymod$ sudo getenforce
Permissive
Not only did we disable the SELinux protection via a malicious kernel module, we did it quietly. A normal sudo setenforce 0, even if allowed, would go through the official selinuxfs interface and would emit an audit message. Our code manipulated the kernel memory directly, so no one was alerted. This illustrates why uncontrolled kernel module loading is very dangerous, and it is why most security standards and commercial security monitoring products advocate for close monitoring of kernel module loading.
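As a toy illustration of what such monitoring could look like (a minimal sketch only: real-world monitoring would typically use the audit subsystem or eBPF rather than polling), a script could watch /proc/modules for newly appearing modules:

import time

def loaded_modules():
    # each line of /proc/modules starts with the module name
    with open("/proc/modules") as f:
        return {line.split()[0] for line in f}

known = loaded_modules()
while True:
    time.sleep(5)
    current = loaded_modules()
    for name in sorted(current - known):
        print(f"new kernel module loaded: {name}")
    known = current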
But we don’t need to monitor kernel modules at Cloudflare. Let’s repeat the exercise on a Cloudflare production kernel (module recompilation skipped for brevity):
ignat@dev:~/mymod$ uname -a
Linux dev 6.6.17-cloudflare-2024.2.9 #1 SMP PREEMPT_DYNAMIC Mon Sep 27 00:00:00 UTC 2010 x86_64 GNU/Linux
ignat@dev:~/mymod$ sudo insmod /lib/modules/`uname -r`/kernel/crypto/mymod.ko
insmod: ERROR: could not insert module /lib/modules/6.6.17-cloudflare-2024.2.9/kernel/crypto/mymod.ko: Key was rejected by service
We get a Key was rejected by service error when trying to load a module, and the kernel log will have the following message:
ignat@dev:~/mymod$ sudo dmesg | tail -n 1
[41515.037031] Loading of unsigned module is rejected
This is because the Cloudflare kernel requires all the kernel modules to have a valid signature, so we don’t even have to worry about a malicious module being loaded at some point:
ignat@dev:~$ grep MODULE_SIG_FORCE /boot/config-`uname -r`
CONFIG_MODULE_SIG_FORCE=y
For completeness, it is worth noting that the stock Debian kernel also supports module signatures, but does not enforce them:
ignat@dev:~$ grep MODULE_SIG /boot/config-6.1.0-18-cloud-amd64
CONFIG_MODULE_SIG_FORMAT=y
CONFIG_MODULE_SIG=y
# CONFIG_MODULE_SIG_FORCE is not set
…
The above configuration means that the kernel will validate a module signature, if one is present. But if not, the module will be loaded anyway: a warning message is emitted and the kernel is tainted.
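As a quick way to tell whether an unsigned module has already been loaded on a running system, you can inspect the kernel taint mask (a small sketch; it assumes the documented taint bit numbering, where bit 13, flag 'E', means an unsigned module was loaded):

# read the kernel taint mask and test the "unsigned module loaded" bit
TAINT_UNSIGNED_MODULE = 1 << 13  # flag 'E' in the kernel's taint-flag documentation

with open("/proc/sys/kernel/tainted") as f:
    taint_mask = int(f.read().strip())

if taint_mask & TAINT_UNSIGNED_MODULE:
    print("an unsigned kernel module has been loaded")
else:
    print("no unsigned-module taint recorded")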
Signed kernel modules are great, but they create a key management problem: to sign a module we need a signing keypair that is trusted by the kernel. The public key of the keypair is usually directly embedded into the kernel binary, so the kernel can easily use it to verify module signatures. The private key of the pair needs to be protected and kept secure, because if it is leaked, anyone could compile and sign a potentially malicious kernel module which would be accepted by our kernel.
But what is the best way to eliminate the risk of losing something? Not to have it in the first place! Luckily, the kernel build system will generate a random keypair for module signing if none is provided. At Cloudflare, we use that feature to sign all the kernel modules during the kernel compilation stage. When the compilation and signing are done though, instead of storing the key in a secure place, we just destroy the private key.
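For illustration, the relevant part of such a build configuration looks roughly like the sketch below (option names are taken from upstream kernel Kconfig and may differ slightly between versions, so treat this as an assumption rather than our exact production config); if the referenced key file does not exist at build time, the build system generates a fresh keypair and uses it to sign every module:

CONFIG_MODULE_SIG=y
CONFIG_MODULE_SIG_FORCE=y
CONFIG_MODULE_SIG_ALL=y
CONFIG_MODULE_SIG_SHA256=y
# if this file is absent, an ephemeral keypair is generated during the build
CONFIG_MODULE_SIG_KEY="certs/signing_key.pem"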
With this process, the kernel modules are signed at build time with a freshly generated keypair, the corresponding public key is embedded into the kernel image, and the private key is destroyed as soon as the build completes.
With this scheme, not only do we not have to worry about module signing key management, we also use a different key for each kernel we release to production. So even if a particular build process is hijacked and the signing key is not destroyed and potentially leaked, the key will no longer be valid when a kernel update is released.
There are some flexibility downsides though, as we can’t “retrofit” a new kernel module for an already released kernel (for example, for a new piece of hardware we are adopting). However, it is not a practical limitation for us as we release kernels often (roughly every week) to keep up with a steady stream of bug fixes and vulnerability patches in the Linux Kernel.
KEXEC (or kexec_load()) is an interesting system call in Linux, which allows one kernel to directly execute (or jump to) another kernel. The idea behind this is to switch/update/downgrade kernels faster without going through a full reboot cycle, to minimize the potential system downtime. However, it was developed quite a while ago, when secure boot and system integrity were not yet much of a concern. Therefore, its original design has security flaws and is known to be able to bypass secure boot and potentially compromise system integrity.
We can see the problems just based on the definition of the system call itself:
struct kexec_segment {
    const void *buf;
    size_t bufsz;
    const void *mem;
    size_t memsz;
};
...
long kexec_load(unsigned long entry, unsigned long nr_segments, struct kexec_segment *segments, unsigned long flags);
So the kernel expects just a collection of buffers with code to execute. Back in those days there was not much desire to do a lot of data parsing inside the kernel, so the idea was to parse the to-be-executed kernel image in user space and provide the kernel with only the data it needs. Also, to switch kernels live, we need an intermediate program which would take over while the old kernel is shutting down and the new kernel has not yet been executed. In the kexec world this program is called purgatory. Thus the problem is evident: we give the kernel a bunch of code and it will happily execute it at the highest privilege level. But instead of the original kernel or purgatory code, we can easily provide code similar to the one demonstrated earlier in this post, which disables SELinux (or does something else to the kernel).
At Cloudflare we have had kexec_load() disabled for some time now, just because of this. The advantage of faster reboots with kexec comes with a (small) risk of improperly initialized hardware, so it was not worth using even without the security concerns. However, kexec does provide one useful feature: it is the foundation of the Linux kernel crashdumping solution. In a nutshell, if a kernel crashes in production (due to a bug or some other error), a backup kernel (previously loaded with kexec) can take over, collect and save the memory dump for further investigation. This allows us to investigate kernel and other issues in production more effectively, so it is a powerful tool to have.
Luckily, since the original problems with kexec were outlined, Linux developed an alternative secure interface for kexec: instead of buffers with code it expects file descriptors with the to-be-executed kernel image and initrd and does parsing inside the kernel. Thus, only a valid kernel image can be supplied. On top of this, we can configure and require kexec to ensure the provided images are properly signed, so only authorized code can be executed in the kexec scenario. A secure configuration for kexec looks something like this:
ignat@dev:~$ grep KEXEC /boot/config-`uname -r`
CONFIG_KEXEC_CORE=y
CONFIG_HAVE_IMA_KEXEC=y
# CONFIG_KEXEC is not set
CONFIG_KEXEC_FILE=y
CONFIG_KEXEC_SIG=y
CONFIG_KEXEC_SIG_FORCE=y
CONFIG_KEXEC_BZIMAGE_VERIFY_SIG=y
…
Above we ensure that the legacy kexec_load() system call is disabled by disabling CONFIG_KEXEC, but we can still configure Linux kernel crashdumping via the new kexec_file_load() system call (CONFIG_KEXEC_FILE=y) with enforced signature checks (CONFIG_KEXEC_SIG=y and CONFIG_KEXEC_SIG_FORCE=y).
Note that the stock Debian kernel has the legacy kexec_load() system call enabled and does not enforce signature checks for kexec_file_load() (similar to module signature checks):
ignat@dev:~$ grep KEXEC /boot/config-6.1.0-18-cloud-amd64
CONFIG_KEXEC=y
CONFIG_KEXEC_FILE=y
CONFIG_ARCH_HAS_KEXEC_PURGATORY=y
CONFIG_KEXEC_SIG=y
# CONFIG_KEXEC_SIG_FORCE is not set
CONFIG_KEXEC_BZIMAGE_VERIFY_SIG=y
…
Even on the stock Debian kernel, if you try to repeat the exercise we described in the “Secure boot” section of this post after a system reboot, you will likely see that it fails to disable SELinux now. This is because we hardcoded the kernel address of the selinux_state structure in our malicious kernel module, but the address has now changed:
ignat@dev:~$ sudo grep selinux_state /proc/kallsyms
ffffffffb41bcae0 B selinux_state
Kernel Address Space Layout Randomization (or KASLR) is a simple concept: it slightly and randomly shifts the kernel code and data on each boot.
This is to combat targeted exploitation (like the malicious module in this post) based on knowledge of the location of internal kernel structures and code. It is especially useful for popular Linux distribution kernels, like the Debian one, because most users use the same binary and anyone can download the debug symbols and the System.map file with all the addresses of the kernel internals. Just to note: it will not prevent the module from loading and doing harm, but the module will likely not achieve the targeted effect of disabling SELinux. Instead, it will modify a random piece of kernel memory, potentially causing the kernel to crash.
Both the Cloudflare kernel and the Debian one have this feature enabled:
ignat@dev:~$ grep RANDOMIZE_BASE /boot/config-`uname -r`
CONFIG_RANDOMIZE_BASE=y
While KASLR helps with targeted exploits, it is quite easy to bypass, since everything is shifted by a single random offset, as shown on the diagram above. Thus, if the attacker knows at least one runtime kernel address, they can recover this offset by subtracting the compile-time address of the same symbol (function or data structure), taken from the kernel’s System.map file, from its runtime address. Once they know the offset, they can recover the addresses of all other symbols by adjusting them by this offset.
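As a toy illustration of this arithmetic (a sketch with made-up compile-time addresses; in practice they would come from the distribution’s System.map, and the runtime value from some pointer leak):

# hypothetical compile-time addresses, as they would appear in System.map
SYSTEM_MAP = {
    "selinux_state": 0xffffffff835bcae0,
    "some_other_symbol": 0xffffffff810c92e0,
}

# runtime address of one symbol, obtained via a leak (value from the example above)
leaked_selinux_state = 0xffffffffb41bcae0

# KASLR shifts everything by one constant, so a single leak reveals the offset
kaslr_offset = leaked_selinux_state - SYSTEM_MAP["selinux_state"]

# every other symbol can now be relocated by the same offset
runtime_other = SYSTEM_MAP["some_other_symbol"] + kaslr_offset

print(f"KASLR offset: {kaslr_offset:#x}")
print(f"some_other_symbol at runtime: {runtime_other:#x}")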
Therefore, modern kernels take precautions not to leak kernel addresses, at least to unprivileged users. One of the main tunables for this is the kptr_restrict sysctl. It is a good idea to set it to at least 1, so that regular users cannot see kernel pointers:
ignat@dev:~$ sudo sysctl -w kernel.kptr_restrict=1
kernel.kptr_restrict = 1
ignat@dev:~$ grep selinux_state /proc/kallsyms
0000000000000000 B selinux_state
Privileged users can still see the pointers:
ignat@dev:~$ sudo grep selinux_state /proc/kallsyms
ffffffffb41bcae0 B selinux_state
Similar to the kptr_restrict sysctl, there is also dmesg_restrict, which, if set, prevents regular users from reading the kernel log (which may also leak kernel pointers via its messages). While you need to explicitly set the kptr_restrict sysctl to a non-zero value on each boot (or use some system sysctl configuration utility, like this one), you can configure the initial value of dmesg_restrict via the CONFIG_SECURITY_DMESG_RESTRICT kernel configuration option. Both the Cloudflare kernel and the Debian one enforce dmesg_restrict this way:
ignat@dev:~$ grep CONFIG_SECURITY_DMESG_RESTRICT /boot/config-`uname -r`
CONFIG_SECURITY_DMESG_RESTRICT=y
It is worth noting that /proc/kallsyms and the kernel log are not the only sources of potential kernel pointer leaks. There is a lot of legacy code in the Linux kernel, and new sources are continuously being found and patched. That’s why it is very important to stay up to date with the latest kernel bugfix releases.
Linux Security Modules (LSM) is a hook-based framework for implementing security policies and Mandatory Access Control in the Linux kernel. We have covered our usage of another LSM module, BPF-LSM, previously.
BPF-LSM is a useful foundational piece of our kernel security, but in this post we want to mention another useful LSM module we use: the Lockdown LSM. Lockdown can be in one of three states (controlled by the /sys/kernel/security/lockdown special file):
ignat@dev:~$ cat /sys/kernel/security/lockdown
[none] integrity confidentiality
none is the state where nothing is enforced and the module is effectively disabled. When Lockdown is in the integrity state, the kernel tries to prevent any operation which may compromise its integrity. We already covered some examples of these in this post: loading unsigned modules and executing unsigned code via KEXEC. But there are other potential ways (which are mentioned in the LSM’s man page), all of which this LSM tries to block. confidentiality is the most restrictive mode, where Lockdown will also try to prevent any information leakage from the kernel. In practice this may be too restrictive for server workloads, as it blocks all runtime debugging capabilities, like perf or eBPF.
Let’s see the Lockdown LSM in action. On a barebones Debian system the initial state is none, meaning nothing is locked down:
ignat@dev:~$ uname -a
Linux dev 6.1.0-18-cloud-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.76-1 (2024-02-01) x86_64 GNU/Linux
ignat@dev:~$ cat /sys/kernel/security/lockdown
[none] integrity confidentiality
We can switch the system into the integrity mode:
ignat@dev:~$ echo integrity | sudo tee /sys/kernel/security/lockdown
integrity
ignat@dev:~$ cat /sys/kernel/security/lockdown
none [integrity] confidentiality
It is worth noting that we can only put the system into a more restrictive state, but not back. That is, once in integrity mode we can only switch to confidentiality mode, but not back to none:
ignat@dev:~$ echo none | sudo tee /sys/kernel/security/lockdown
none
tee: /sys/kernel/security/lockdown: Operation not permitted
Now we can see that even on a stock Debian kernel, which, as we discovered above, does not enforce module signatures by default, we cannot load a potentially malicious unsigned kernel module anymore:
ignat@dev:~$ sudo insmod mymod/mymod.ko
insmod: ERROR: could not insert module mymod/mymod.ko: Operation not permitted
And the kernel log will helpfully point out that this is due to Lockdown LSM:
ignat@dev:~$ sudo dmesg | tail -n 1
[21728.820129] Lockdown: insmod: unsigned module loading is restricted; see man kernel_lockdown.7
As we can see, Lockdown LSM helps to tighten the security of a kernel, which otherwise may not have other enforcing bits enabled, like the stock Debian one.
If you compile your own kernel, you can go one step further and set the initial state of the Lockdown LSM to be more restrictive than none from the start. This is exactly what we did for the Cloudflare production kernel:
ignat@dev:~$ grep LOCK_DOWN /boot/config-6.6.17-cloudflare-2024.2.9
# CONFIG_LOCK_DOWN_KERNEL_FORCE_NONE is not set
CONFIG_LOCK_DOWN_KERNEL_FORCE_INTEGRITY=y
# CONFIG_LOCK_DOWN_KERNEL_FORCE_CONFIDENTIALITY is not set
In this post we reviewed some useful Linux kernel security configuration options we use at Cloudflare. This is only a small subset, and there are many more available and even more are being constantly developed, reviewed, and improved by the Linux kernel community. We hope that this post will shed some light on these security features and that, if you haven’t already, you may consider enabling them in your Linux systems.
Tune in for more news, announcements and thought-provoking discussions! Don't miss the full Security Week hub page.
JUST OVER A DECADE AGO, Bitcoin appeared to many of its adherents to be the crypto-anarchist holy grail: truly private digital cash for the Internet.
Satoshi Nakamoto, the cryptocurrency’s mysterious and unidentifiable inventor, had stated in an email introducing Bitcoin that “participants can be anonymous.” And the Silk Road dark-web drug market seemed like living proof of that potential, enabling the sale of hundreds of millions of dollars in illegal drugs and other contraband for bitcoin while flaunting its impunity from law enforcement.
This is the story of the revelation in late 2013 that Bitcoin was, in fact, the opposite of untraceable—that its blockchain would actually allow researchers, tech companies, and law enforcement to trace and identify users with even more transparency than the existing financial system. That discovery would upend the world of cybercrime. Bitcoin tracing would, over the next few years, solve the mystery of the theft of a half-billion dollar stash of bitcoins from the world’s first crypto exchange, help enable the biggest dark-web drug market takedown in history, lead to the arrest of hundreds of pedophiles around the world in the bust of the dark web’s largest child sexual abuse video site, and result in the first-, second-, and third-biggest law enforcement monetary seizures in the history of the US Justice Department.
Before SaaS Quick Launch, configuring and launching third-party SaaS products could be time-consuming and costly, especially in certain categories like security and monitoring. Some products require hours of engineering time to manually set up permissions policies and cloud infrastructure. Manual multistep configuration processes also introduce risks when buyers rely on unvetted deployment templates and instructions from third-party resources.
SaaS Quick Launch helps buyers make the deployment process easy, fast, and secure by offering step-by-step instructions and resource deployment using preconfigured AWS CloudFormation templates. The software vendor and AWS validate these templates to ensure that the configuration adheres to the latest AWS security standards.
Getting started with SaaS Quick Launch
It’s easy to find which SaaS products have Quick Launch enabled when you are browsing in AWS Marketplace. Products that have this feature configured have a Quick Launch tag in their description.
After completing the purchase process for a Quick Launch–enabled product, you will see a button to set up your account. That button will take you to the Configure and launch page, where you can complete the registration to set up your SaaS account, deploy any required AWS resources, and launch the SaaS product.
The first step ensures that your account has the required AWS permissions to configure the software.
The second step involves configuring the vendor account, either to sign in to an existing account or to create a new account on the vendor website. After signing in, the vendor site may pass essential keys and parameters that are needed in the next step to configure the integration.
The third step allows you to configure the software and AWS integration. In this step, the vendor provides one or more CloudFormation templates that provision the required AWS resources to configure and use the product.
The final step is to launch the software once everything is configured.
Availability
Sellers can enable this feature in their SaaS product. If you are a seller and want to learn how to set this up in your product, check the Seller Guide for detailed instructions.
To learn more about SaaS in AWS Marketplace, visit the service page and view all the available SaaS products currently in AWS Marketplace.
— Marcia
Background
Late last year we announced the AWS Digital Sovereignty Pledge and made a commitment to offer you (and all AWS customers) the most advanced set of sovereignty controls and features available in the cloud. Since that announcement we have taken several important steps forward in fulfillment of that pledge:
May 2023 – We announced that AWS Nitro System had been validated by an independent third-party to confirm that it contains no mechanism that allows anyone at AWS to access your data on AWS hosts. At the same time we announced that the AWS Key Management Service (KMS) External Key Store allows you to store keys outside of AWS and use them to encrypt data stored in AWS.
August 2023 – We announced AWS Dedicated Local Zones, infrastructure that is fully managed by AWS and built for exclusive use by a customer or community, and placed in a customer-specified location or data center.
AWS European Sovereign Cloud
The upcoming AWS European Sovereign Cloud will be separate from, and independent of, the eight existing AWS Regions already open in Frankfurt, Ireland, London, Milan, Paris, Stockholm, Spain, and Zurich. It will give you additional options for deployment, while providing AWS services, APIs, and tools that you are already familiar with. The design will help you meet your data residency, operational autonomy, and resiliency needs.
In order to maintain separation between this cloud and the existing AWS Global Cloud you will need to create a fresh AWS account. The metadata you create such as data labels, categories, permissions, and configurations will be stored within the EU. This does not apply to AWS account information such as spend and billing data, which will be aggregated and used to ensure that you get favorable pricing within any applicable volume usage tiers.
As I mentioned earlier, this cloud will be operated and supported by AWS employees located in and residents of the EU, with support available 24/7/365.
The AWS European Sovereign Cloud will be operationally independent of the other regions, with separate in-Region billing and usage metering systems.
Initial Region
The initial region will be located in Germany. It will launch with multiple Availability Zones, each in separate and distinct geographic locations, with enough distance between them to significantly reduce the risk of a single event impacting your business continuity. We will have additional details on the list of available services, instance types, and so forth as we get closer to the launch.
Over time, this and other regions in this cloud will also function as parent regions for AWS Outposts and Dedicated Local Zones. These options give you even more flexibility with regard to isolation and in-country data residency. If you would like to express your interest in Dedicated Local Zones in your country, please contact your AWS account manager.
Get Ready
You can start to build applications today in any of the existing regions and move them to the AWS European Sovereign Cloud when the region launches. You can also initiate conversations with your local regulatory authorities in order to better understand any issues that are specific to your particular location.
— Jeff;
In the face of rapid digital transformation, a positive organizational culture and user-centric design are the backbone of successful software delivery. And while Artificial Intelligence (AI) is the center of so many contemporary technical conversations, the impact of AI development tools on teams is still in its infancy.
These are just some of the findings from the 2023 Accelerate State of DevOps Report, the annual report from Google Cloud’s DevOps Research and Assessment (DORA) team.
For nine years, the State of DevOps survey has assembled data from more than 36,000 professionals worldwide, making it the largest and longest-running research of its kind. This year, we took a deep dive into how high-performing teams bake technical, process, and cultural capabilities into their development practices to drive success. Specifically, we explored three key outcomes of having a DevOps practice and the capabilities that contribute to achieving them:
This year, we were working with a particularly robust data set: the total number of organic respondents increased by 3.6x compared to last year, allowing us to perform a deeper analysis of the relationship between ways of working and outcomes. Thank you to everyone who took the survey this year!
Our research shows that an organization’s level of software delivery performance predicts overall performance, team performance, and employee well-being. In turn, we use the following measures to understand the throughput and stability of software changes:
Our analysis revealed four performance levels, including the return of the Elite performance level, which we did not detect in last year’s cohort. Elite performers around the world are able to achieve both throughput and stability.
There are several key takeaways for teams who want to understand how to improve their software delivery capabilities. Here are some of the key insights from this year’s report:
1. Establish a healthy culture
Culture is foundational to building technical capabilities, igniting technical performance, reaching organizational performance goals, and helping employees be successful. A healthy culture can help reduce burnout, increase productivity, and increase job satisfaction. Teams with generative cultures, composed of people who felt included and like they belonged on their team, have 30% higher organizational performance than organizations without a generative culture.
2. Build with users in mind
Teams can deploy as fast and successfully as they'd like, but without the user in mind, it might be for naught. Our research shows that a user-centric approach to building applications and services is one of the strongest predictors of overall organizational performance. In fact, building with the user in mind appears to inform and drive improvements across all of the technical, process, and cultural capabilities we explore in the DORA research. Teams that focus on the user have 40% higher organizational performance than teams that don’t.
3. Amplify technical capabilities with quality documentation
High-quality documentation amplifies the impact that DevOps technical capabilities (for example, continuous integration and trunk-based development) have on organizational performance. This means that quality documentation not only helps establish these technical capabilities, but helps them matter. For example, SRE practices are estimated to have 1.4x more impact on organizational performance when high-quality documentation is in place. Overall, high-quality documentation leads to 25% higher team performance relative to low-quality documentation.
4. Distribute work fairly
People who identify as underrepresented, and women or those who chose to self-describe their gender, have higher levels of burnout. There are likely multiple systemic and environmental factors that cause this. Unsurprisingly, we find that respondents who take on more repetitive work are more likely to experience higher levels of burnout, and members of underrepresented groups are more likely to take on more repetitive work: underrepresented respondents report 24% more burnout than those who are not underrepresented; underrepresented respondents do 29% more repetitive work than those who are not underrepresented; and women or those who self-described their gender do 40% more repetitive work than men.
5. Increase infrastructure flexibility with cloud
Teams can get the most value out of the cloud by leveraging the characteristics of cloud like rapid elasticity and on-demand self-service. These characteristics predict a more flexible infrastructure. Using a public cloud, for example, leads to a 22% increase in infrastructure flexibility relative to not using the cloud. This flexibility, in turn, leads to teams with 30% higher organizational performance than those with inflexible infrastructures.
There is a lot of enthusiasm about the potential of AI development tools. We saw this in this year’s results — in fact a majority of respondents are incorporating at least some AI into the tasks we included in our survey. But we anticipate that it will take some time for AI-powered tools to come into widespread and coordinated use in the industry. We are very interested in seeing how adoption grows over time and the impact that growth will have on performance measures and outcomes that are important to organizations. Here’s where we are seeing the adoption of AI tools today:
The key takeaway from DORA’s research is that high performance requires continuous improvement. Regularly measure outcomes across your organization, teams, and employees. Identify areas for optimization and make incremental changes to dial up performance.
Don't let these insights sit on a shelf — put them into action. Contextualize the findings based on your team's current practices and pain points. Have open conversations about your bottlenecks. Comparing your metrics year-over-year is more meaningful than comparing yourself to other companies. Sustainable success comes from repeatedly finding and fixing your weaknesses. DORA's framework can help you determine which capabilities to focus on next for the biggest performance boost.
We hope the Accelerate State of DevOps Report helps organizations of all sizes, industries, and regions improve their DevOps capabilities, and we look forward to hearing your thoughts and feedback. To learn more about the report and implementing DevOps with Google Cloud:
Last year, we announced the Browser Rendering API, letting users run Puppeteer, a browser automation library, directly in Workers. Puppeteer is one of the most popular libraries used to interact with a headless browser instance to accomplish tasks like taking screenshots, generating PDFs, crawling web pages, and testing web applications. We’ve heard from developers that configuring and maintaining their own serverless browser automation systems can be quite painful.
The Workers Browser Rendering API solves this. It makes the Puppeteer library available directly in your Worker, connected to a real web browser, without the need to configure and manage infrastructure or keep browser sessions warm yourself. You can use @cloudflare/puppeteer to run the full Puppeteer API directly on Workers!
We’ve seen so much interest from the developer community since launching last year. While the Browser Rendering API is still in beta (sign up to our waitlist to get access), we wanted to share a way to get more out of our current limits by using the Browser Rendering API with Durable Objects. We’ll also be sharing pricing for the Rendering API, so you can build knowing exactly what you’ll pay for.
As a designer or frontend developer, you want to make sure that content is well-designed for visitors browsing on different screen sizes. With the number of possible devices that users browse on growing, it becomes difficult to test all the possibilities manually. While there are many testing tools on the market, we want to show how easy it is to create your own Chromium-based tool with the Workers Browser Rendering API and Durable Objects.
We’ll be using the Worker to handle any incoming requests, pass them to the Durable Object to take screenshots and store them in an R2 bucket. The Durable Object is used to create a browser session that’s persistent. By using Durable Object Alarms we can keep browsers open for longer and reuse browser sessions across requests.
Let’s dive into how we can build this application:
1. Define the configuration and bindings in wrangler.toml
name = "rendering-api-demo"
main = "src/index.js"
compatibility_date = "2023-09-04"
compatibility_flags = [ "nodejs_compat"]
account_id = "c05e6a39aa4ccdd53ad17032f8a4dc10"
# Browser Rendering API binding
browser = { binding = "MYBROWSER" }
# Bind an R2 Bucket
[[r2_buckets]]
binding = "BUCKET"
bucket_name = "screenshots"
# Binding to a Durable Object
[[durable_objects.bindings]]
name = "BROWSER"
class_name = "Browser"
[[migrations]]
tag = "v1" # Should be unique for each entry
new_classes = ["Browser"] # Array of new classes
2. Define the Worker
This Worker simply passes the request onto the Durable Object.
export default {
async fetch(request, env) {
let id = env.BROWSER.idFromName("browser");
let obj = env.BROWSER.get(id);
// Send a request to the Durable Object, then await its response.
let resp = await obj.fetch(request.url);
let count = await resp.text();
return new Response("success");
}
};
3. Define the Durable Object class
import puppeteer from "@cloudflare/puppeteer";

const KEEP_BROWSER_ALIVE_IN_SECONDS = 60;
export class Browser {
constructor(state, env) {
this.state = state;
this.env = env;
this.keptAliveInSeconds = 0;
this.storage = this.state.storage;
}
async fetch(request) {
// screen resolutions to test out
const width = [1920, 1366, 1536, 360, 414]
const height = [1080, 768, 864, 640, 896]
// use the current date and time to create a folder structure for R2
const nowDate = new Date()
var coeff = 1000 * 60 * 5
var roundedDate = (new Date(Math.round(nowDate.getTime() / coeff) * coeff)).toString();
var folder = roundedDate.split(" GMT")[0]
//if there's a browser session open, re-use it
if (!this.browser) {
console.log(`Browser DO: Starting new instance`);
try {
this.browser = await puppeteer.launch(this.env.MYBROWSER);
} catch (e) {
console.log(`Browser DO: Could not start browser instance. Error: ${e}`);
}
}
// Reset keptAlive after each call to the DO
this.keptAliveInSeconds = 0;
const page = await this.browser.newPage();
// take screenshots of each screen size
for (let i = 0; i < width.length; i++) {
await page.setViewport({ width: width[i], height: height[i] });
await page.goto("https://workers.cloudflare.com/");
const fileName = "screenshot_" + width[i] + "x" + height[i]
const sc = await page.screenshot({
path: fileName + ".jpg"
}
);
this.env.BUCKET.put(folder + "/"+ fileName + ".jpg", sc);
}
// Reset keptAlive after performing tasks to the DO.
this.keptAliveInSeconds = 0;
// set the first alarm to keep DO alive
let currentAlarm = await this.storage.getAlarm();
if (currentAlarm == null) {
console.log(`Browser DO: setting alarm`);
const TEN_SECONDS = 10 * 1000;
this.storage.setAlarm(Date.now() + TEN_SECONDS);
}
await this.browser.close();
return new Response("success");
}
async alarm() {
this.keptAliveInSeconds += 10;
// Extend browser DO life
if (this.keptAliveInSeconds < KEEP_BROWSER_ALIVE_IN_SECONDS) {
console.log(`Browser DO: has been kept alive for ${this.keptAliveInSeconds} seconds. Extending lifespan.`);
this.storage.setAlarm(Date.now() + 10 * 1000);
} else console.log(`Browser DO: exceeded life of ${KEEP_BROWSER_ALIVE_IN_SECONDS}. Browser DO will be shut down in 10 seconds.`);
}
}
That’s it! With less than a hundred lines of code, you can fully customize a powerful tool to automate responsive web design testing. You can even incorporate it into your CI pipeline to automatically test different window sizes with each build and verify the result is as expected by using an automated library like pixelmatch.
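As a sketch of what such an automated check could look like (using Pillow here instead of pixelmatch, purely for illustration; the file names are hypothetical and would correspond to the screenshots the Worker stores in R2):

from PIL import Image, ImageChops

# baseline screenshot from a known-good build vs. the latest capture (hypothetical paths);
# both images must have the same dimensions for the comparison below
baseline = Image.open("baseline/screenshot_1920x1080.jpg").convert("RGB")
latest = Image.open("latest/screenshot_1920x1080.jpg").convert("RGB")

# pixel-wise difference; getbbox() is None when the images are identical
diff = ImageChops.difference(baseline, latest)

# note: lossy JPEGs rarely match exactly, so a real check would allow a small tolerance
if diff.getbbox() is None:
    print("screenshots match")
else:
    print("screenshots differ, review the layout change")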
We’ve spoken to many customers deploying a Puppeteer service on their own infrastructure, on public cloud containers or functions or using managed services. The common theme that we’ve heard is that these services are costly – costly to maintain and expensive to run.
While you won’t be billed for the Browser Rendering API yet, we want to be transparent with you about costs before you start building. We know it’s important to understand the pricing structure so that you don’t get a surprise bill and so that you can design your application efficiently.
You pay based on two usage metrics:
Using Durable Objects to persist browser sessions improves performance by eliminating the time that it takes to spin up a new browser session. Since it re-uses sessions, it cuts down on the number of concurrent sessions needed. We highly encourage this model of session re-use if you expect to see consistent traffic for applications that you build on the Browser Rendering API.
If you have feedback about this pricing, we’re all ears. Feel free to reach out through Discord (channel name: browser-rendering-api-beta) and share your thoughts.
Sign up to our waitlist to get access to the Workers Browser Rendering API. We’re so excited to see what you build! Share your creations with us on Twitter/X @CloudflareDev or on our Discord community.
Vercel builds a front-end cloud that makes it easier for engineers to deploy and run their front-end applications. With more than 100 million deployments in Vercel in the last two years, Vercel helps users take advantage of best-in-class AWS infrastructure with zero configuration by relying heavily on serverless technology. Vercel provides a lot of features that help developers host their front-end applications. However, until the beginning of this year, they hadn’t built Cron Jobs yet.
A cron job is a scheduled task that automates running specific commands or scripts at predetermined intervals or fixed times. It enables users to set up regular, repetitive actions, such as backups, sending notification emails to customers, or processing payments when a subscription needs to be renewed. Cron jobs are widely used in computing environments to improve efficiency and automate routine operations, and they were a commonly requested feature from Vercel’s customers.
In December 2022, Vercel hosted an internal hackathon to foster innovation. That’s where Vincent Voyer and Andreas Schneider joined forces to build a prototype cron job feature for the Vercel platform. They formed a team of five people and worked on the feature for a week. The team worked on different tasks, from building a user interface to display the cron jobs to creating the backend implementation of the feature.
Amazon EventBridge Scheduler
When the hackathon team started thinking about solving the cron job problem, their first idea was to use Amazon EventBridge rules that run on a schedule. However, they realized quickly that this feature has a limit of 300 rules per account per AWS Region, which wasn’t enough for their intended use. Luckily, one of the team members had read the announcement of Amazon EventBridge Scheduler in the AWS Compute blog and they thought this would be a perfect tool for their problem.
By using EventBridge Scheduler, they could schedule millions of one-time or recurring tasks across over 270 AWS services without provisioning or managing the underlying infrastructure.
For creating a new cron job in Vercel, a customer needs to define the frequency in which this task will run and the API they want to invoke. Vercel, in the backend, uses EventBridge Scheduler and creates a new schedule when a new cron job is created.
To call the endpoint, the team used an AWS Lambda function that receives the path that needs to be invoked as input parameters.
When the time comes for the cron job to run, EventBridge Scheduler invokes the function, which then calls the customer website endpoint that was configured.
By the end of the week, Vincent and his team had a working prototype version of the cron jobs feature, and they won a prize at the hackathon.
Building Vercel Cron Jobs
After working for one week on this prototype in December, the hackathon ended, and Vincent and his team returned to their regular jobs. In early January 2023, Vincent and the Vercel team decided to take the project and turn it into a real product.
During the hackathon, the team built the fundamental parts of the feature, but there were some details that they needed to polish to make it production ready. Vincent and Andreas worked on the feature, and in less than two months, on February 22, 2023, they announced Vercel Cron Jobs to the public. The announcement tweet got over 400 thousand views, and the community loved the launch.
The adoption of this feature was very rapid. Within a few months of launching Cron Jobs, Vercel reached over 7 million cron invocations per week, and they expect the adoption to continue growing.
How Vercel Cron Jobs Handles Scale
With this pace of adoption, scaling this feature is crucial for Vercel. In order to scale the amount of cron invocations at this pace, they had to make some business and architectural decisions.
From the business perspective, they defined limits for their free-tier customers. Free-tier customers can create a maximum of two cron jobs in their account, and they can only have hourly schedules. This means that free customers cannot run a cron job every 30 minutes; instead, they can do it at most every hour. Only customers on Vercel paid tiers can take advantage of EventBridge Scheduler minute granularity for scheduling tasks.
Also, for free customers, minute precision isn’t guaranteed. To achieve this, Vincent took advantage of the time window configuration from EventBridge Scheduler. The flexible time window configuration allows you to start a schedule within a window of time. This means that the scheduled tasks are dispersed across the time window to reduce the impact of multiple requests on downstream services. This is very useful if, for example, many customers want to run their jobs at midnight. By using the flexible time window, the load can spread across a set window of time.
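To make the flexible time window idea concrete, here is a minimal sketch of creating such a schedule with the AWS SDK for Python (this is not Vercel’s actual implementation; the names, ARNs, and payload are placeholders):

import boto3

scheduler = boto3.client("scheduler")

# hypothetical hourly cron job whose invocation may be dispersed
# anywhere within a 15-minute window to smooth out load spikes
scheduler.create_schedule(
    Name="customer-123-hourly-cron",  # placeholder schedule name
    ScheduleExpression="rate(1 hour)",
    FlexibleTimeWindow={
        "Mode": "FLEXIBLE",
        "MaximumWindowInMinutes": 15,
    },
    Target={
        "Arn": "arn:aws:lambda:us-east-1:123456789012:function:invoke-cron",  # placeholder
        "RoleArn": "arn:aws:iam::123456789012:role/scheduler-invoke-role",  # placeholder
        "Input": '{"path": "/api/cron/renew-subscriptions"}',  # path for the Lambda to call
    },
)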
From the architectural perspective, Vercel took advantage of hosting the APIs and owning the functions that the cron jobs invoke.
This means that when the Lambda function is started by EventBridge Scheduler, the function ends its run without waiting for a response from the API. Then Vercel validates if the cron job ran by checking if the API and Vercel function ran correctly from its observability mechanisms. In this way, the function duration is very short, less than 400 milliseconds. This allows Vercel to run a lot of functions per second without affecting their concurrency limits.
What Was The Impact?
Vercel’s implementation of Cron Jobs is an excellent example of what serverless technologies enable. In two months, with two people working full time, they were able to launch a feature that their community needed and enthusiastically adopted. This feature shows the completeness of Vercel’s platform and is an important feature to convince their customers to move to a paid account.
If you want to get started with EventBridge Scheduler, see Serverless Land patterns for EventBridge Scheduler, where you’ll find a broad range of examples to help you.
— Marcia
There’s something for everyone in the weather app category. There are incredibly technical, complex apps, apps with a narrow focus, ones junked up with ads that don’t respect your privacy, and everything in between.
One of my favorite newer entrants in the category that I’ve been keeping an eye on for a while is Mercury Weather, a weather app that’s available as a universal purchase on all of Apple’s platforms. The app, by Triple Glazed Studios, is a pleasure to use, combining a clear, simple design with coverage of all of Apple’s platforms.
In some ways, Mercury Weather is a spiritual successor to Weather Line, a graph-centric weather app that was sold to an unnamed purchaser a couple of years ago, which some suspect was Fox Weather based on the app’s 2023 redesign. The comparison is apt but sells Mercury Weather short because its design is superior to what Weather Line’s ever was. The app uses beautiful gradient backgrounds to convey the temperature and conditions, along with a modern layout and clear typography to make it fast and easy to check current conditions and the forecast.
Mercury Weather uses its colorful backgrounds to convey information about the current conditions.
On the iPhone, Mercury Weather is divided into three primary sections. The current temperature, conditions, humidity, wind speed, and UV index occupy the top section of the screen, backed by a gradient that conveys temperature or cloud cover. That section is followed by an hourly forecast that scrolls horizontally to show a full 24 hours of data. The next section is a daily forecast for the next eight days with high and low temperatures and forecast conditions. The daily forecast also includes the selected day’s sunrise and sunset times and the expected maximum UV index, wind, and rainfall. At the very bottom, there’s also a button that shows monthly averages, which slide up from the bottom of the screen when tapped with temperature, sunshine, and precipitation averages.
Apple’s Weather app (left) versus Mercury Weather (right).
The globe button at the top of Mercury Weather lets you search for, save, and switch between the weather forecasts for multiple locations. Also, the Settings button at the bottom of the screen offers options for displaying the actual or ‘feels like’ temperature, specific (Apple Weather or OpenWeather) or dynamic weather data sources, and whether warnings are shown.
Examples of Mercury Weather’s iPad widget options.
Mercury Weather’s iPhone and iPad apps include a collection of small and medium-sized Home Screen widgets. There are options to display the current weather conditions, hourly forecast, and daily forecast for your location, and others that allow you to show the same weather conditions and forecasts for any saved location. The iPhone app has Lock Screen widgets in all available sizes with options for current, hourly, and daily data, too.
Mercury Weather on iPad.
The iPad version of Mercury Weather covers the same ground as the iPhone version, but the larger display allows the app’s tiled interface to be spread out and seen all at once. Also, the daily forecast section of the app adds a text-based forecast to the mix on the iPad and locations are included in an expandable left sidebar.
Looks like a nice day ahead.
Mercury Weather’s Watch app takes a similar approach to the iPhone app, but the forecast is limited to your current location. Like the other versions of the app, the app’s design is terrific on both the iPad and Apple Watch.
Mercury Weather in light and dark mode on the Mac.
The latest addition to the Mercury Weather mix is a Mac version of the app. The interface is essentially the same as the iPad version but with the addition of a menu bar app that can be turned off if you prefer.
Mercury Weather’s menu bar app.
I’ve been using my iPhone in StandBy mode when I’m at my desk, where I use Apple’s Weather widget to monitor local conditions. However, I can’t do that if I’m not at home. For those times, I’ve been running Mercury Weather, which adds an icon for the current conditions, plus the temperature to my menu bar. Clicking on the menu bar item adds a text description of the conditions, the forecast high and low temperature for the day, four-hour and four-day forecast graphs, and a button to open the full app. It’s the perfect amount of information for a quick check of the weather.
Mercury Weather handles the basics of weather apps extremely well. If you’re looking for radar and other more advanced features, you should try a different app. However, if all you want is the current conditions and hourly and daily forecasts presented in an easy-to-read, refined app that is consistent across all of Apple’s platforms, Mercury Weather is a great choice.
Mercury Weather is free to download on the App Store, with Home and Lock Screen widgets, the Apple Watch app, historical data, and more than one saved location available to subscribers for $1.99/month or $9.99/year, with a $34.99 lifetime purchase option. A Family Sharing subscription is $3.49/month, $16.99/year, or $59.99 for a lifetime purchase.
Founded in 2015, Club MacStories has delivered exclusive content every week for over six years.
In that time, members have enjoyed nearly 400 weekly and monthly newsletters packed with more of your favorite MacStories writing as well as Club-only podcasts, eBooks, discounts on apps, icons, and services. Join today, and you’ll get everything new that we publish every week, plus access to our entire archive of back issues and downloadable perks.
The Club expanded in 2021 with Club MacStories+ and Club Premier. Club MacStories+ members enjoy even more exclusive stories, a vibrant Discord community, a rotating roster of app discounts, and more. And, with Club Premier, you get everything we offer at every Club level plus an extended, ad-free version of our podcast AppStories that is delivered early each week in high-bitrate audio.
Over the last couple of months, Workers KV has suffered from a series of incidents, culminating in three back-to-back incidents during the week of July 17th, 2023. These incidents have directly impacted customers that rely on KV — and this isn’t good enough.
We’re going to share the work we have done to understand why KV has had such a spate of incidents and, more importantly, share in depth what we’re doing to dramatically improve how we deploy changes to KV going forward.
Workers KV — or just “KV” — is a key-value service for storing data: specifically, data with high read throughput requirements. It’s especially useful for user configuration, service routing, small assets and/or authentication data.
We use KV extensively inside Cloudflare too, with Cloudflare Access (part of our Zero Trust suite) and Cloudflare Pages being some of our highest profile internal customers. Both teams benefit from KV’s ability to keep regularly accessed key-value pairs close to where they’re accessed, as well its ability to scale out horizontally without any need to become an expert in operating KV.
Given Cloudflare’s extensive use of KV, it wasn’t just external customers impacted. Our own internal teams felt the pain of these incidents, too.
Back in June 2023, we announced the move to a new architecture for KV, which is designed to address two major points of customer feedback we’ve had around KV: high latency for infrequently accessed keys (or a key accessed in different regions), and working to ensure the upper bound on KV’s eventual consistency model for writes is 60 seconds — not “mostly 60 seconds”.
At the time of the blog, we’d already been testing this internally, including early access with our community champions and running a small % of production traffic to validate stability and performance expectations beyond what we could emulate within a staging environment.
However, in the weeks between mid-June and culminating in the series of incidents during the week of July 17th, we would continue to increase the volume of new traffic onto the new architecture. When we did this, we would encounter previously unseen problems (many of these customer-impacting) — then immediately roll back, fix bugs, and repeat. Internally, we’d begun to identify that this pattern was becoming unsustainable — each attempt to cut traffic onto the new architecture would surface errors or behaviors we hadn’t seen before and couldn’t immediately explain, and thus we would roll back and assess.
The issues at the root of this series of incidents proved particularly challenging to track down and observe. Once identified, the two causes themselves were quick to fix, but (1) an observability gap in our error reporting and (2) a mutation to local state that resulted in an unexpected mutation of global state were both hard to observe and reproduce in the days following the end of the customer-facing impact.
One important piece of context to understand before we go into detail on the post-mortem: Workers KV is composed of two separate Workers scripts – internally referred to as the Storage Gateway Worker and SuperCache. SuperCache is an optional path in the Storage Gateway Worker workflow, and is the basis for KV's new (faster) backend (refer to the blog).
Here is a timeline of events:
Time | Description |
---|---|
2023-07-17 21:52 UTC | Cloudflare observes alerts showing 500 HTTP status codes in the MEL01 data-center (Melbourne, AU) and begins investigating. We also begin to see a small set of customers reporting HTTP 500s being returned via multiple channels. It is not immediately clear if this is a data-center-wide issue or KV specific, as there had not been a recent KV deployment, and the issue directly correlated with three data-centers being brought back online. |
2023-07-18 00:09 UTC | We disable the new backend for KV in MEL01 in an attempt to mitigate the issue (noting that there had not been a recent deployment or change to the % of users on the new backend). |
2023-07-18 05:42 UTC | Investigating alerts showing 500 HTTP status codes in VIE02 (Vienna, AT) and JNB01 (Johannesburg, SA). |
2023-07-18 13:51 UTC | The new backend is disabled globally after seeing issues in VIE02 (Vienna, AT) and JNB01 (Johannesburg, SA) data-centers, similar to MEL01. In both cases, they had also recently come back online after maintenance, but it remained unclear as to why KV was failing. |
2023-07-20 19:12 UTC | The new backend is inadvertently re-enabled while deploying the update due to a misconfiguration in a deployment script. |
2023-07-20 19:33 UTC | The new backend is (re-) disabled globally as HTTP 500 errors return. |
2023-07-20 23:46 UTC | Broken Workers script pipeline deployed as part of gradual rollout due to incorrectly defined pipeline configuration in the deployment script. Metrics begin to report that a subset of traffic is being black-holed. |
2023-07-20 23:56 UTC | Broken pipeline rolled back; error rates return to pre-incident (normal) levels. |
All timestamps referenced are in Coordinated Universal Time (UTC).
We initially observed alerts showing 500 HTTP status codes in the MEL01 data-center (Melbourne, AU) at 21:52 UTC on July 17th, and began investigating. We also received reports from a small set of customers, via multiple channels, of HTTP 500s being returned. This correlated with three data centers being brought back online, and it was not immediately clear whether the issue related to those data centers or was KV-specific — especially given there had not been a recent KV deployment. At 05:42 UTC, we began investigating alerts showing 500 HTTP status codes in the VIE02 (Vienna) and JNB02 (Johannesburg) data-centers; while both had recently come back online after maintenance, it was still unclear why KV was failing. At 13:51 UTC, we made the decision to disable the new backend globally.
Following the incident on July 18th, we attempted to deploy an allow-list configuration to reduce the scope of impacted accounts. However, while attempting to roll out a change for the Storage Gateway Worker at 19:12 UTC on July 20th, an older configuration was progressed, causing the new backend to be enabled again and leading to the third event. As the team worked to fix this and deploy the intended configuration, they attempted to manually progress the deployment at 23:46 UTC, which passed a malformed configuration value that caused traffic to be sent to an invalid Workers script configuration.
After all deployments and the broken Workers configuration (pipeline) had been rolled back at 23:56 UTC on July 20th, we spent the following three days working to identify the root cause of the issue. We lacked observability because KV's Worker script (responsible for much of KV's logic) was throwing an unhandled exception very early in the request handling process. This was further exacerbated by prior work to disable error reporting in a disabled data-center due to the noise generated, which had previously resulted in logs being rate-limited upstream from our service.
This previous mitigation prevented us from capturing meaningful logs from the Worker, including identifying the exception itself, as an uncaught exception terminates request processing. This has raised the priority of improving how unhandled exceptions are reported and surfaced in a Worker (see Recommendations, below, for further details). This issue was exacerbated by the fact that KV's Worker script would fail to re-enter its "healthy" state when a Cloudflare data center was brought back online, as the Worker was mutating an environment variable perceived to be in request scope, but that was in global scope and persisted across requests. This effectively left the Worker “frozen” with the previous, invalid configuration for the affected locations.
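To make the second root cause more concrete, here is a minimal, hypothetical Worker sketch (not KV's actual code) showing how state that looks request-scoped can actually live in global scope and persist across requests served by the same isolate:

// Hypothetical illustration only — not Workers KV's actual code.
// `currentConfig` lives at module (global) scope, so it survives across
// requests handled by the same Worker isolate.
let currentConfig = null;

// Stand-in for logic that reads configuration (e.g. from an env var).
async function loadConfig(env) {
  return { backendEnabled: env.BACKEND_ENABLED === "true" };
}

export default {
  async fetch(request, env) {
    if (currentConfig === null) {
      // This looks like per-request initialization, but the assignment is
      // global: a value loaded once while the data center was in a bad state
      // is reused by every later request until the isolate restarts.
      currentConfig = await loadConfig(env);
    }
    return new Response(JSON.stringify(currentConfig), {
      headers: { "content-type": "application/json" },
    });
  },
};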
Further, the introduction of a new progressive release process for Workers KV, designed to de-risk rollouts (as an action from a prior incident), prolonged the incident. We found a bug in the deployment logic that led to a broader outage due to an incorrectly defined configuration.
This configuration effectively caused us to drop a single-digit % of traffic until it was rolled back 10 minutes later. This code is untested at scale, and we need to spend more time hardening it before using it as the default path in production.
Additionally, although the root cause of the incidents was limited to three Cloudflare data-centers (Melbourne, Vienna, and Johannesburg), traffic across their regions still uses these data centers to route reads and writes to our system of record. Because these three data centers participate in KV’s new backend as regional tiers, a portion of traffic across the Oceania, Europe, and Africa regions was affected. Only a portion of keys from enrolled namespaces use any given data center as a regional tier, in order to limit a single (regional) point of failure, so while traffic across all data centers in each region was impacted, no data center saw all of its traffic affected.
Based on our error reporting, we estimated the affected traffic to be 0.2–0.5% of KV's global traffic; however, we observed some customers with error rates approaching 20% of their total KV operations. The impact was spread across KV namespaces and keys for customers within the scope of this incident.
Both KV’s high total traffic volume and its role as a critical dependency for many customers amplify the impact of even small error rates. In all cases, once the changes were rolled back, errors returned to normal levels and did not persist.
Before we dive into what we’re doing to significantly improve how we build, test, deploy and observe Workers KV going forward, we think there are lessons from the real world that can equally apply to how we improve the safety factor of the software we ship.
In traditional engineering and construction, there is an extremely common procedure known as a “JSEA”, or Job Safety and Environmental Analysis (sometimes just “JSA”). A JSEA is designed to help you iterate through a list of tasks, the potential hazards, and most importantly, the controls that will be applied to prevent those hazards from damaging equipment, injuring people, or worse.
One of the most critical concepts is the “hierarchy of controls” — that is, what controls should be applied to mitigate these hazards. In most practices, these are elimination, substitution, engineering, administration and personal protective equipment. Elimination and substitution are fairly self-explanatory: is there a different way to achieve this goal? Can we eliminate that task completely? Engineering and administration ask us whether there is additional engineering work, such as changing the placement of a panel, or using a horizontal boring machine to lay an underground pipe vs. opening up a trench that people can fall into.
The last and lowest on the hierarchy, is personal protective equipment (PPE). A hard hat can protect you from severe injury from something falling from above, but it’s a last resort, and it certainly isn’t guaranteed. In engineering practice, any hazard that only lists PPE as a mitigating factor is unsatisfactory: there must be additional controls in place. For example, instead of only wearing a hard hat, we should engineer the floor of scaffolding so that large objects (such as a wrench) cannot fall through in the first place. Further, if we require that all tools are attached to the wearer, then it significantly reduces the chance the tool can be dropped in the first place. These controls ensure that there are multiple degrees of mitigation — defense in depth — before your hard hat has to come into play.
Coming back to software, we can draw parallels between these controls: engineering can be likened to improving automation, gradual rollouts, and detailed metrics. Similarly, personal protective equipment can be likened to code review: useful, but code review cannot be the only thing protecting you from shipping bugs or untested code. Automation with linters, more robust testing, and new metrics are all vastly safer ways of shipping software.
As we spent time assessing where to improve our existing controls and how to put new controls in place to mitigate risks and improve the reliability (safety) of Workers KV, we took a similar approach: eliminating unnecessary changes, engineering more resilience into our codebase, improving automation and deployment tooling, and only then looking at human processes.
Cloudflare is undertaking a larger, more structured review of KV's observability tooling, release infrastructure and processes to mitigate not only the contributing factors to the incidents within this report, but recent incidents related to KV. Critically, we see tooling and automation as the most powerful mechanisms for preventing incidents, with process improvements designed to provide an additional layer of protection. Process improvements alone cannot be the only mitigation.
Specifically, we have identified and prioritized the efforts below as the most important next steps towards meeting our own availability SLOs and, above all, making KV a service that customers building on Workers can rely on for storing configuration and service data in the hot path of their traffic:
This is not an exhaustive list: we're continuing to expand on preventative measures associated with these and other incidents. These changes will improve not only KV's reliability, but also that of other services across Cloudflare that KV relies on, or that rely on KV.
We recognize that KV hasn’t lived up to our customers’ expectations recently. Because we rely on KV so heavily internally, we’ve felt that pain first hand as well. The work to fix the issues that led to this cycle of incidents is already underway. That work will not only improve KV’s reliability but also improve the reliability of any software written on the Cloudflare Workers developer platform, whether by our customers or by ourselves.
Public IPv4 Charge
As you may know, IPv4 addresses are an increasingly scarce resource and the cost to acquire a single public IPv4 address has risen more than 300% over the past 5 years. This change reflects our own costs and is also intended to encourage you to be a bit more frugal with your use of public IPv4 addresses and to think about accelerating your adoption of IPv6 as a modernization and conservation measure.
This change applies to all AWS services including Amazon Elastic Compute Cloud (Amazon EC2), Amazon Relational Database Service (RDS) database instances, Amazon Elastic Kubernetes Service (EKS) nodes, and other AWS services that can have a public IPv4 address allocated and attached, in all AWS regions (commercial, AWS China, and GovCloud). Here’s a summary in tabular form:
Public IP Address Type | Current Price/Hour (USD) | New Price/Hour (USD) (Effective February 1, 2024) |
---|---|---|
In-use Public IPv4 address (including Amazon provided public IPv4 and Elastic IP) assigned to resources in your VPC, Amazon Global Accelerator, and AWS Site-to-site VPN tunnel | No charge | $0.005 |
Additional (secondary) Elastic IP Address on a running EC2 instance | $0.005 | $0.005 |
Idle Elastic IP Address in account | $0.005 | $0.005 |
The AWS Free Tier for EC2 will include 750 hours of public IPv4 address usage per month for the first 12 months, effective February 1, 2024. You will not be charged for IP addresses that you own and bring to AWS using Amazon BYOIP.
Starting today, your AWS Cost and Usage Reports automatically include public IPv4 address usage. When this price change goes into effect next year, you will also be able to use AWS Cost Explorer to see and better understand your usage.
As I noted earlier in this post, I would like to encourage you to consider accelerating your adoption of IPv6. A new blog post shows you how to use Elastic Load Balancers and NAT Gateways for ingress and egress traffic, while avoiding the use of a public IPv4 address for each instance that you launch. Here are some resources to show you how you can use IPv6 with widely used services such as EC2, Amazon Virtual Private Cloud (Amazon VPC), Amazon Elastic Kubernetes Service (EKS), Elastic Load Balancing, and Amazon Relational Database Service (RDS):
Earlier this year we enhanced EC2 Instance Connect and gave it the ability to connect to your instances using private IPv4 addresses. As a result, you no longer need to use public IPv4 addresses for administrative purposes (generally using SSH or RDP).
Public IP Insights
In order to make it easier for you to monitor, analyze, and audit your use of public IPv4 addresses, today we are launching Public IP Insights, a new feature of Amazon VPC IP Address Manager that is available to you at no cost. In addition to helping you to make efficient use of public IPv4 addresses, Public IP Insights will give you a better understanding of your security profile. You can see the breakdown of public IP types and EIP usage, with multiple filtering options:
You can also see, sort, filter, and learn more about each of the public IPv4 addresses that you are using:
Using IPv4 Addresses Efficiently
By using the new IP Insights tool and following the guidance that I shared above, you should be ready to update your application to minimize the effect of the new charge. You may also want to consider using AWS Direct Connect to set up a dedicated network connection to AWS.
Finally, be sure to read our new blog post, Identify and Optimize Public IPv4 Address Usage on AWS, for more information on how to make the best use of public IPv4 addresses.
— Jeff;
Workforce Identity Federation allows use of an external identity provider (IdP) to authenticate and authorize users (including employees, partners, and contractors) to Google Cloud resources without provisioning identities in Cloud Identity. Before its introduction, only identities existing within Cloud Identity could be used with Cloud Identity Access Management (IAM).
Here’s how to configure an example JavaScript web application hosted in Google Cloud to call Google Cloud APIs after being authenticated with Azure AD using Workforce Identity Federation.
Workforce Identity can be used with IdPs supporting OpenID Connect (OIDC) or SAML 2.0. You can read more about it in our blog post and product documentation page.
There will be three high level configuration steps required:
Prepare your external IdP and get required configuration parameters.
Create a logical container for your external identities in Google Cloud in the form of Workforce Identity Pool.
Establish the relationship between your Workforce Identity Pool and the external IdP by configuring a Workforce Identity Pool Provider, using the information gathered in the first step.
Before Workforce Identity Federation can be used, a one-way trust relationship must be established between your Google Cloud environment and external IdP. This is achieved by configuring the following resources in Google Cloud: 1) a Workforce Identity Pool, which is a logical container for external identities and 2) a corresponding Workforce Identity Pool Provider, encapsulating technical details of external IdP integration.
An understanding of the OIDC flow may be helpful for understanding how this integrates with your application code. We will focus on a single-page web application which calls Google Cloud APIs. For simplicity, we omit details of the protocol, such as audiences and claims, as they are not essential to understanding the flow:
Client downloads a web app with JS code. In our example, static content is exposed from the GCS storage bucket.
The unauthenticated user is redirected to an external IdP login page for authentication.
On successful login, the external IdP returns the authentication result including an ID Token.
The ID Token contains information about identity and can be exchanged into an “access token”. This is accomplished on the Google Cloud side with a service called Secure Token Service (STS) (API documentation).
STS verifies the ID Token and, if successful, returns a Google Identity access token.
The access token can be used as a bearer token in subsequent Google Cloud API calls. Please note: by default, access tokens are valid for one hour (3,600 seconds). When the access token has expired, your token management code must obtain a new one.
As an example of incorporating this flow within your application we will use Azure Active Directory as an external IdP.
Azure AD requires the following steps to act as an IdP for Workforce Identity Federation:
Registering an application
Assigning users (and groups) to the enterprise application
Azure AD performs identity management only for registered applications, which is why the first thing we need to do is create a new application registration (go to Azure Active Directory and select App registrations). An important parameter to assign is the type of redirect URI; in our case we choose “Single-page application (SPA)”. If the goal is to provide integration with the Google Cloud Federated Console (console.cloud.google), we need to choose the “Web” type instead. More information about configuration and the federated version of the Cloud Console is provided in Configure Azure AD-based workforce identity federation.
The next step is to choose “ID Tokens” from the list of tokens issued by the authorization endpoint. For details, see OpenID Connect (OIDC).
From the information screen shown after you finish registration, note the “Application (client) ID” (this is the client ID needed for the Workforce Identity Pool Provider configuration in Google Cloud), and click on “Endpoints” above.
From the endpoints window, copy the “OpenID Connect metadata document” URL, navigate to it, and look for the “issuer” field. The value of this field (in the form https://login.microsoftonline.com/TENANT_ID/v2.0) will be required for the Workforce Identity Federation configuration.
The last step is to go to “Enterprise applications”, find your application, and assign the users and groups that you want to give access to your application.
Configuration steps to be executed on Google Cloud:
Specify billing project
Enable APIs
Create Workforce Identity Pool
Create Workforce Identity Pool Provider
Assign required permissions to external identities from the Pool
Before you begin, make sure the following APIs are enabled on the billing project (as Workforce Identity Federation artifacts live at the organization level, you need to specify the project which will be used for billing associated with those resources); a sketch of these commands follows below:
IAM API
Security Token Service API
You can find detailed information in the “Before you begin” section of product documentation.
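As a rough sketch (assuming the gcloud CLI and a placeholder project name, my-billing-project), enabling the APIs and setting the quota/billing project might look like this:

# Enable the required APIs on the project used for billing/quota purposes.
gcloud services enable iam.googleapis.com sts.googleapis.com \
    --project=my-billing-project

# Use this project as the quota/billing project for subsequent gcloud calls.
gcloud config set billing/quota_project my-billing-project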
Please consult the corresponding section of the Cloud SDK documentation for details of the configuration parameters.
This is a crucial step in the configuration, as here we establish the one-way trust to our IdP. We will need information from the Azure environment — the issuer URI and client ID — which we gathered in the previous step.
Note the attribute mapping parameter, where we decide which attribute (assertion) from the IdP is used for the required google.subject attribute. In our example we use the preferred_username assertion, which in the case of Azure AD carries the email of the authenticated user. This determines the syntax we will use for referencing external identities in Google Cloud, as described in Represent workforce pool users in IAM policies.
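A minimal sketch of creating the pool and its OIDC provider with the gcloud CLI might look like the following (placeholder resource names; additional flags may be required depending on your setup):

# Create the Workforce Identity Pool — a container for external identities.
gcloud iam workforce-pools create azure-ad-pool \
    --organization=ORGANIZATION_ID \
    --location=global \
    --display-name="Azure AD pool"

# Create the OIDC provider inside the pool, using the Azure AD issuer URI,
# the application (client) ID, and the attribute mapping discussed above.
gcloud iam workforce-pools providers create-oidc azure-ad-provider \
    --workforce-pool=azure-ad-pool \
    --location=global \
    --issuer-uri="https://login.microsoftonline.com/TENANT_ID/v2.0" \
    --client-id=APPLICATION_CLIENT_ID \
    --attribute-mapping="google.subject=assertion.preferred_username"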
Now we just need to assign the correct set of roles to our external identities. In the following example we assign the serviceUsageConsumer role, which is required to consume any Google Cloud API and so will also be necessary for your external identities.
$TEST_SUBJECT in our case is the email of one of the Azure AD users we assigned to our enterprise application.
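A hedged sketch of such a role grant (placeholder project and pool names), using the principal:// identifier format for workforce pool users:

# Grant the Service Usage Consumer role to a single external identity.
gcloud projects add-iam-policy-binding my-project \
    --role="roles/serviceusage.serviceUsageConsumer" \
    --member="principal://iam.googleapis.com/locations/global/workforcePools/azure-ad-pool/subject/${TEST_SUBJECT}"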
Now our Workforce Identity Federation should be ready to rock’n’roll!
To use Azure AD ID Tokens in a Web Application, Microsoft recommends using the Microsoft Authentication Library for JavaScript (MSAL.js). Several wrappers exist for this library to be used in e.g. Node.js, React and Angular.
To demonstrate using the Workforce Identity Federation, the following example is provided in Javascript using the Microsoft Authentication Library for JavaScript (MSAL.js) 2.0 for Browser-Based Single-Page Applications.
As a first step, the MSAL.js library must be loaded and initialized. For simplicity we are using the CDN version of the library; in most cases the NPM package should be used instead. The script is loaded in the HTML body with the async and defer options to ensure that it does not interfere with page loading time.
MSAL.js can use a popup or redirect login. Without a backend to handle the redirect, the popup method is the recommended method. To show a popup without triggering the popup blocker in modern browsers, the popup needs to be triggered by a user action (e.g. a button click) and must happen within a short period of time after the button was clicked.
As a result, an HTML button is added to trigger the login popup, along with inline JavaScript containing a setup function that is called after the MSAL.js library is loaded; it enables the login button and initializes a PublicClientApplication from the MSAL.js library. The PublicClientApplication initialization requires the clientId and authority as defined in the Azure AD single-page application setup above.
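As a rough sketch (assuming the msal global exposed by an MSAL.js 2.x CDN build, and placeholder client and tenant values), the setup might look like this; the login() handler it references is sketched a few paragraphs below:

<!-- Load MSAL.js from the CDN; async/defer keeps it off the critical rendering path. -->
<script src="https://alcdn.msauth.net/browser/2.38.0/js/msal-browser.min.js"
        async defer onload="setup()"></script>

<button id="login" disabled onclick="login()">Sign in</button>

<script>
  let msalApp;
  function setup() {
    // Placeholder values: the Application (client) ID and tenant come from
    // the Azure AD app registration created earlier.
    msalApp = new msal.PublicClientApplication({
      auth: {
        clientId: "APPLICATION_CLIENT_ID",
        authority: "https://login.microsoftonline.com/TENANT_ID",
      },
    });
    document.getElementById("login").disabled = false;
  }
</script>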
Now the login functionality is added to open the login popup and handle the login response. The response contains an access token and an ID token. The ID token will be sent to STS to be exchanged for a Google Identity access token. The access token returned after successful login is intended to be used with Azure.
In our example, it is used to query the Graph API to retrieve the user information from AD. To access Google Cloud resources, we need to exchange the ID token from Azure for an access token using Google Cloud STS, based on the trust established by the Workforce Identity Federation setup. The resulting Google Identity access token can then be used to access Google APIs, such as the Resource Manager API, to retrieve the list of projects visible to the current user (this requires that the user has the IAM permission resourcemanager.projects.get).
Please note that in calling Google Cloud STS we need to provide a proper audience, which is the URI of our Workforce Identity Federation pool provider.
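Below is a hedged JavaScript sketch of that flow — popup login, STS token exchange, then a Google Cloud API call — with placeholder pool, provider, and project values (it completes the setup() sketch above):

async function login() {
  // 1) Pop up the Azure AD login. The response contains both an access token
  //    (usable against Microsoft APIs such as Graph) and an ID token.
  const result = await msalApp.loginPopup({
    scopes: ["openid", "profile", "User.Read"],
  });

  // 2) Exchange the Azure AD ID token for a Google Cloud access token via STS.
  //    The audience must reference the Workforce Identity Pool Provider.
  const audience =
    "//iam.googleapis.com/locations/global/workforcePools/azure-ad-pool/providers/azure-ad-provider";
  const stsResponse = await fetch("https://sts.googleapis.com/v1/token", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      grantType: "urn:ietf:params:oauth:grant-type:token-exchange",
      audience: audience,
      scope: "https://www.googleapis.com/auth/cloud-platform",
      requestedTokenType: "urn:ietf:params:oauth:token-type:access_token",
      subjectToken: result.idToken,
      subjectTokenType: "urn:ietf:params:oauth:token-type:id_token",
      // Workforce federation needs a billing/quota project for the exchange.
      options: JSON.stringify({ userProject: "my-billing-project" }),
    }),
  });
  const { access_token } = await stsResponse.json();

  // 3) Use the Google access token as a bearer token against Google Cloud APIs,
  //    e.g. listing projects visible to the authenticated external identity.
  const projects = await fetch(
    "https://cloudresourcemanager.googleapis.com/v1/projects",
    { headers: { Authorization: `Bearer ${access_token}` } }
  );
  console.log(await projects.json());
}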
The access token can then be used to authenticate against all APIs and services which support Workforce Identity Federation.
The code shared above must be hosted on a valid HTTPS endpoint which is configured in AD as an endpoint for the single page application. This can be achieved with GCLB and a public GCS bucket.
Users and Groups from Active Directory can be represented in IAM policies using the following format:
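The exact identifier formats are documented in Represent workforce pool users in IAM policies; roughly, a single user and a mapped group look like this (placeholder pool and values):

# A single external identity (matches the mapped google.subject value):
principal://iam.googleapis.com/locations/global/workforcePools/POOL_ID/subject/SUBJECT_VALUE

# All identities in a group (requires a google.groups attribute mapping):
principalSet://iam.googleapis.com/locations/global/workforcePools/POOL_ID/group/GROUP_ID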
Important: All principals must have the roles/serviceusage.serviceUsageConsumer role, which contains the IAM permission serviceusage.services.use.
As with every software development exercise, sooner or later something goes wrong. Here is a list of hints we think may be useful when things are not going as planned.
Familiarize yourself with IAM logging: Read the Example logs for Workforce Identity Federation section of IAM documentation.
Use Cloud Logging: This should always be the first thing to do, check the logs in Logs Explorer looking for errors, unauthorized calls, and so on. Remember, to view logs you need corresponding permissions on the project.
Check permissions: In the text we mentioned permissions required for the external identity to call Google Cloud APIs. Check that your external identity has been assigned the necessary permissions, including the roles/serviceusage.serviceUsageConsumer role and the roles associated with the APIs being called. Check the preliminary requirements again.
Check your audience: When calling STS for token exchange you must have the correct audience set, which is referring to your Workforce Identity Pool Provider, check our code example for syntax.
A research paper shows that container image downloads account for 76 percent of container startup time, but on average only 6.4 percent of the data is needed for the container to start doing useful work. Starting and scaling out containerized applications requires downloading container images from a remote container registry. This may introduce a non-trivial latency, as the entire image must be downloaded and unpacked before the applications can be started.
One solution to this problem is lazy loading (also known as asynchronous loading) of container images, which downloads data from the container registry in parallel with the application startup. One example is stargz-snapshotter, a project that aims to improve the overall container start time.
Last year, we introduced Seekable OCI (SOCI), a technology open sourced by Amazon Web Services (AWS) that enables container runtimes to implement lazy loading the container image to start applications faster without modifying the container images. As part of that effort, we open sourced SOCI Snapshotter, a snapshotter plugin that enables lazy loading with SOCI in containerd.
AWS Fargate Support for SOCI
Today, I’m excited to share that AWS Fargate now supports Seekable OCI (SOCI), which helps applications deploy and scale out faster by enabling containers to start without waiting to download the entire container image. At launch, this new capability is available for Amazon Elastic Container Service (Amazon ECS) applications running on AWS Fargate.
Here’s a quick look to show how AWS Fargate support for SOCI works:
SOCI works by creating an index (SOCI index) of the files within an existing container image. This index is a key enabler to launching containers faster, providing the capability to extract an individual file from a container image without having to download the entire image. Your applications no longer need to wait to complete pulling and unpacking a container image before your applications start running. This allows you to deploy and scale out applications more quickly and reduce the rollout time for application updates.
A SOCI index is generated and stored separately from the container images. This means that your container images don’t need to be converted to use SOCI, therefore not breaking secure hash algorithm (SHA)-based security, such as container image signing. The index is then stored in the registry alongside the container image. At release, AWS Fargate support for SOCI works with Amazon Elastic Container Registry (Amazon ECR).
When you use Amazon ECS with AWS Fargate to run your SOCI-indexed containerized images, AWS Fargate automatically detects if a SOCI index for the image exists and starts the container without waiting for the entire image to be pulled. This also means that AWS Fargate will still continue to run container images that don’t have SOCI indexes.
Let’s Get Started
There are two ways to create SOCI indexes for container images: the AWS SOCI Index Builder, and the soci CLI provided by the soci-snapshotter project. The AWS SOCI Index Builder provides you with an automated process to get started and build SOCI indexes for your container images. The soci CLI provides you with more flexibility around index generation and the ability to natively integrate index generation in your CI/CD pipelines.
In this article, I manually generate SOCI indexes using the soci CLI from the soci-snapshotter project.
Create a Repository and Push Container Images
First, I create an Amazon ECR repository called pytorch-soci for my container image using the AWS CLI.
$ aws ecr create-repository --region us-east-1 --repository-name pytorch-soci
I keep the Amazon ECR URI output and define it as a variable to make it easier for me to refer to the repository in the next step.
$ ECRSOCIURI=xyz.dkr.ecr.us-east-1.amazonaws.com/pytorch-soci:latest
For the sample application, I use a PyTorch training (CPU-based) container image from AWS Deep Learning Containers. I use the nerdctl CLI to pull the container image because, by default, the Docker Engine stores the container image in the Docker Engine image store, not the containerd image store.
$ SAMPLE_IMAGE="763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:1.5.1-cpu-py36-ubuntu16.04"
$ aws ecr get-login-password --region us-east-1 | sudo nerdctl login --username AWS --password-stdin xyz.dkr.ecr.us-east-1.amazonaws.com
$ sudo nerdctl pull --platform linux/amd64 $SAMPLE_IMAGE
Then, I tag the container image for the repository that I created in the previous step.
$ sudo nerdctl tag $SAMPLE_IMAGE $ECRSOCIURI
Next, I need to push the container image into the ECR repository.
$ sudo nerdctl push $ECRSOCIURI
At this point, my container image is already in my Amazon ECR repository.
Create SOCI Indexes
Next, I need to create a SOCI index.
A SOCI index is an artifact that enables lazy loading of container images. A SOCI index consists of 1) a SOCI index manifest and 2) a set of zTOCs. The following image illustrates the components in a SOCI index manifest, and how it refers to a container image manifest.
The SOCI index manifest contains the list of zTOCs and a reference to the image for which the manifest was generated. A zTOC, or table of contents for compressed data, consists of two parts:
To learn more about these concepts and terms, please visit the soci-snapshotter Terminology page.
Before I can create SOCI indexes, I need to install the soci CLI. To learn more about how to install it, visit Getting Started with soci-snapshotter.
To create SOCI indexes, I use the soci create command.
$ sudo soci create $ECRSOCIURI
layer sha256:4c6ec688ebe374ea7d89ce967576d221a177ebd2c02ca9f053197f954102e30b -> ztoc skipped
layer sha256:ab09082b308205f9bf973c4b887132374f34ec64b923deef7e2f7ea1a34c1dad -> ztoc skipped
layer sha256:cd413555f0d1643e96fe0d4da7f5ed5e8dc9c6004b0731a0a810acab381d8c61 -> ztoc skipped
layer sha256:eee85b8a173b8fde0e319d42ae4adb7990ed2a0ce97ca5563cf85f529879a301 -> ztoc skipped
layer sha256:3a1b659108d7aaa52a58355c7f5704fcd6ab1b348ec9b61da925f3c3affa7efc -> ztoc skipped
layer sha256:d8f520dcac6d926130409c7b3a8f77aea639642ba1347359aaf81a8b43ce1f99 -> ztoc skipped
layer sha256:d75d26599d366ecd2aa1bfa72926948ce821815f89604b6a0a49cfca100570a0 -> ztoc skipped
layer sha256:a429d26ed72a85a6588f4b2af0049ae75761dac1bb8ba8017b8830878fb51124 -> ztoc skipped
layer sha256:5bebf55933a382e053394e285accaecb1dec9e215a5c7da0b9962a2d09a579bc -> ztoc skipped
layer sha256:5dfa26c6b9c9d1ccbcb1eaa65befa376805d9324174ac580ca76fdedc3575f54 -> ztoc skipped
layer sha256:0ba7bf18aa406cb7dc372ac732de222b04d1c824ff1705d8900831c3d1361ff5 -> ztoc skipped
layer sha256:4007a89234b4f56c03e6831dc220550d2e5fba935d9f5f5bcea64857ac4f4888 -> ztoc sha256:0b4d78c856b7e9e3d507ac6ba64e2e2468997639608ef43c088637f379bb47e4
layer sha256:089632f60d8cfe243c5bc355a77401c9a8d2f415d730f00f6f91d44bb96c251b -> ztoc sha256:f6a16d3d07326fe3bddbdb1aab5fbd4e924ec357b4292a6933158cc7cc33605b
layer sha256:f18dd99041c3095ade3d5013a61a00eeab8b878ba9be8545c2eabfbca3f3a7f3 -> ztoc sha256:95d7966c964dabb54cb110a1a8373d7b88cfc479336d473f6ba0f275afa629dd
layer sha256:69e1edcfbd217582677d4636de8be2a25a24775469d677664c8714ed64f557c3 -> ztoc sha256:ac0e18bd39d398917942c4b87ac75b90240df1e5cb13999869158877b400b865
From the above output, I can see that the soci CLI created zTOCs for four layers, which means that only these four layers will be lazily pulled, while the other container image layers will be downloaded in full before the container image starts. This is because lazily loading very small container image layers has little impact on launch time. However, you can configure this behavior using the --min-layer-size flag when you run soci create.
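For example (hypothetical threshold, assuming the flag takes a size in bytes), indexing only layers of 10 MiB or more might look like this:
$ sudo soci create --min-layer-size 10485760 $ECRSOCIURI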
Verify and Push SOCI Indexes
The soci CLI also provides several commands that can help you review the SOCI indexes that have been generated.
To see a list of all index manifests, I can run the following command.
$ sudo soci index list
DIGEST SIZE IMAGE REF PLATFORM MEDIA TYPE CREATED
sha256:ea5c3489622d4e97d4ad5e300c8482c3d30b2be44a12c68779776014b15c5822 1931 xyz.dkr.ecr.us-east-1.amazonaws.com/pytorch-soci:latest linux/amd64 application/vnd.oci.image.manifest.v1+json 10m4s ago
sha256:ea5c3489622d4e97d4ad5e300c8482c3d30b2be44a12c68779776014b15c5822 1931 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:1.5.1-cpu-py36-ubuntu16.04 linux/amd64 application/vnd.oci.image.manifest.v1+json 10m4s ago
Optionally, if I need to see the list of zTOCs, I can use the following command.
$ sudo soci ztoc list
DIGEST SIZE LAYER DIGEST
sha256:0b4d78c856b7e9e3d507ac6ba64e2e2468997639608ef43c088637f379bb47e4 2038072 sha256:4007a89234b4f56c03e6831dc220550d2e5fba935d9f5f5bcea64857ac4f4888
sha256:95d7966c964dabb54cb110a1a8373d7b88cfc479336d473f6ba0f275afa629dd 11442416 sha256:f18dd99041c3095ade3d5013a61a00eeab8b878ba9be8545c2eabfbca3f3a7f3
sha256:ac0e18bd39d398917942c4b87ac75b90240df1e5cb13999869158877b400b865 36277264 sha256:69e1edcfbd217582677d4636de8be2a25a24775469d677664c8714ed64f557c3
sha256:f6a16d3d07326fe3bddbdb1aab5fbd4e924ec357b4292a6933158cc7cc33605b 10152696 sha256:089632f60d8cfe243c5bc355a77401c9a8d2f415d730f00f6f91d44bb96c251b
This series of zTOCs contains all of the information that SOCI needs to find a given file in a layer. To review the zTOC for each layer, I can use one of the digest sums from the preceding output and use the following command.
$ sudo soci ztoc info sha256:0b4d78c856b7e9e3d507ac6ba64e2e2468997639608ef43c088637f379bb47e4
{
"version": "0.9",
"build_tool": "AWS SOCI CLI v0.1",
"size": 2038072,
"span_size": 4194304,
"num_spans": 33,
"num_files": 5552,
"num_multi_span_files": 26,
"files": [
{
"filename": "bin/",
"offset": 512,
"size": 0,
"type": "dir",
"start_span": 0,
"end_span": 0
},
{
"filename": "bin/bash",
"offset": 1024,
"size": 1037528,
"type": "reg",
"start_span": 0,
"end_span": 0
}
---Trimmed for brevity---
Now, I need to use the following command to push all SOCI-related artifacts into the Amazon ECR.
$ PASSWORD=$(aws ecr get-login-password --region us-east-1)
$ sudo soci push --user AWS:$PASSWORD $ECRSOCIURI
If I go to my Amazon ECR repository, I can verify the index is created. Here, I can see that two additional objects are listed alongside my container image: a SOCI Index and an Image index. The image index allows AWS Fargate to look up SOCI indexes associated with my container image.
Understanding SOCI Performance
The main objective of SOCI is to minimize the required time to start containerized applications. To measure the performance of AWS Fargate lazy loading container images using SOCI, I need to understand how long it takes for my container images to start with SOCI and without SOCI.
To understand the duration needed for each container image to start, I can use metrics available from the DescribeTasks API on Amazon ECS. The first metric is createdAt, the timestamp for when the task was created and entered the PENDING state. The second metric is startedAt, the time when the task transitioned from the PENDING state to the RUNNING state.
For this, I have created another Amazon ECR repository using the same container image but without generating a SOCI index, called pytorch-without-soci. If I compare these container images, pytorch-soci has two additional objects (an image index and a SOCI index) that don’t exist in pytorch-without-soci.
Deploy and Run Applications
To run the applications, I have created an Amazon ECS cluster called demo-pytorch-soci-cluster, a VPC, and the required ECS task execution role. If you’re new to Amazon ECS, you can follow Getting started with Amazon ECS to be more familiar with how to deploy and run your containerized applications.
Now, let’s deploy and run both container images with FARGATE as the launch type. I define five tasks each for pytorch-soci and pytorch-without-soci.
$ aws ecs \
--region us-east-1 \
run-task \
--count 5 \
--launch-type FARGATE \
--task-definition arn:aws:ecs:us-east-1:XYZ:task-definition/pytorch-soci \
--cluster socidemo
$ aws ecs \
--region us-east-1 \
run-task \
--count 5 \
--launch-type FARGATE \
--task-definition arn:aws:ecs:us-east-1:XYZ:task-definition/pytorch-without-soci \
--cluster socidemo
After a few minutes, there are 10 running tasks on my ECS cluster.
After verifying that all my tasks are running, I run the following script to get two metrics: createdAt and startedAt.
#!/bin/bash
CLUSTER=<CLUSTER_NAME>
TASKDEF=<TASK_DEFINITION>
REGION="us-east-1"
TASKS=$(aws ecs list-tasks \
--cluster $CLUSTER \
--family $TASKDEF \
--region $REGION \
--query 'taskArns[*]' \
--output text)
aws ecs describe-tasks \
--tasks $TASKS \
--region $REGION \
--cluster $CLUSTER \
--query "tasks[] | reverse(sort_by(@, &createdAt)) | [].[{startedAt: startedAt, createdAt: createdAt, taskArn: taskArn}]" \
--output table
Running the above command for the container image without SOCI indexes — pytorch-without-soci — produces the following output:
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| DescribeTasks |
+----------------------------------+-----------------------------------+------------------------------------------------------------------------------------------------------------+
| createdAt | startedAt | taskArn |
+----------------------------------+-----------------------------------+------------------------------------------------------------------------------------------------------------+
| 2023-07-07T17:43:59.233000+00:00| 2023-07-07T17:46:09.856000+00:00 | arn:aws:ecs:ap-southeast-1:xyz:task/demo-pytorch-soci-cluster/dcdf19b6e66444aeb3bc607a3114fae0 |
| 2023-07-07T17:43:59.233000+00:00| 2023-07-07T17:46:09.459000+00:00 | arn:aws:ecs:ap-southeast-1:xyz:task/demo-pytorch-soci-cluster/9178b75c98ee4c4e8d9c681ddb26f2ca |
| 2023-07-07T17:43:59.233000+00:00| 2023-07-07T17:46:21.645000+00:00 | arn:aws:ecs:ap-southeast-1:xyz:task/demo-pytorch-soci-cluster/7da51e036c414cbab7690409ce08cc99 |
| 2023-07-07T17:43:59.233000+00:00| 2023-07-07T17:46:00.606000+00:00 | arn:aws:ecs:ap-southeast-1:xyz:task/demo-pytorch-soci-cluster/5ee8f48194874e6dbba75a5ef753cad2 |
| 2023-07-07T17:43:59.233000+00:00| 2023-07-07T17:46:02.461000+00:00 | arn:aws:ecs:ap-southeast-1:xyz:task/demo-pytorch-soci-cluster/58531a9e94ed44deb5377fa997caec36 |
+----------------------------------+-----------------------------------+------------------------------------------------------------------------------------------------------------+
From the average aggregated delta time (between startedAt and createdAt) for each task, pytorch-without-soci (without SOCI indexes) successfully ran after 129 seconds.
Next, I run the same command, but for pytorch-soci, which comes with SOCI indexes.
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| DescribeTasks |
+----------------------------------+-----------------------------------+------------------------------------------------------------------------------------------------------------+
| createdAt | startedAt | taskArn |
+----------------------------------+-----------------------------------+------------------------------------------------------------------------------------------------------------+
| 2023-07-07T17:43:53.318000+00:00| 2023-07-07T17:44:51.076000+00:00 | arn:aws:ecs:ap-southeast-1:xyz:task/demo-pytorch-soci-cluster/c57d8cff6033494b97f6fd0e1b797b8f |
| 2023-07-07T17:43:53.318000+00:00| 2023-07-07T17:44:52.212000+00:00 | arn:aws:ecs:ap-southeast-1:xyz:task/demo-pytorch-soci-cluster/6d168f9e99324a59bd6e28de36289456 |
| 2023-07-07T17:43:53.318000+00:00| 2023-07-07T17:45:05.443000+00:00 | arn:aws:ecs:ap-southeast-1:xyz:task/demo-pytorch-soci-cluster/4bdc43b4c1f84f8d9d40dbd1a41645da |
| 2023-07-07T17:43:53.318000+00:00| 2023-07-07T17:44:50.618000+00:00 | arn:aws:ecs:ap-southeast-1:xyz:task/demo-pytorch-soci-cluster/43ea53ea84154d5aa90f8fdd7414c6df |
| 2023-07-07T17:43:53.318000+00:00| 2023-07-07T17:44:50.777000+00:00 | arn:aws:ecs:ap-southeast-1:xyz:task/demo-pytorch-soci-cluster/0731bea30d42449e9006a5d8902756d5 |
+----------------------------------+-----------------------------------+------------------------------------------------------------------------------------------------------------+
Here, I see that my SOCI-enabled container image — pytorch-soci — started 60 seconds after being created.
This means that running my sample application with SOCI indexes on AWS Fargate is approximately 50 percent faster compared to running without SOCI indexes.
It’s recommended to benchmark the startup and scaling-out time of your application with and without SOCI. This helps you to have a better understanding of how your application behaves and if your applications benefit from AWS Fargate support for SOCI.
Customer Voices
During the private preview period, we heard lots of feedback from our customers about AWS Fargate support for SOCI. Here’s what our customers say:
Autodesk provides critical design, make, and operate software solutions across the architecture, engineering, construction, manufacturing, media, and entertainment industries. “SOCI has given us a 50% improvement in startup performance for our time-sensitive simulation workloads running on Amazon ECS with AWS Fargate. This allows our application to scale out faster, enabling us to quickly serve increased user demand and save on costs by reducing idle compute capacity. The AWS Partner Solution for creating the SOCI index is easy to configure and deploy.” – Boaz Brudner, Head of Innovyze SaaS Engineering, AI and Architecture, Autodesk.
Flywire is a global payments enablement and software company, on a mission to deliver the world’s most important and complex payments. “We run multi-step deployment pipelines on Amazon ECS with AWS Fargate which can take several minutes to complete. With SOCI, the total pipeline duration is reduced by over 50% without making any changes to our applications, or the deployment process. This allowed us to drastically reduce the rollout time for our application updates. For some of our larger images of over 750MB, SOCI improved the task startup time by more than 60%.”, Samuel Burgos, Sr. Cloud Security Engineer, Flywire.
Virtuoso is a leading software corporation that makes functional UI and end-to-end testing software. “SOCI has helped us reduce the lag between demand and availability of compute. We have very bursty workloads which our customers expect to start as fast as possible. SOCI helps our ECS tasks spin-up 40% faster, allowing us to quickly scale our application and reduce the pool of idle compute capacity, enabling us to deliver value more efficiently. Setting up SOCI was really easy. We opted to use the quick-start AWS Partner’s solution with which we could leave our build and deployment pipelines untouched.”, Mathew Hall, Head of Site Reliability Engineering, Virtuoso.
Things to Know
Availability — AWS Fargate support for SOCI is available in all AWS Regions where Amazon ECS, AWS Fargate, and Amazon ECR are available.
Pricing — AWS Fargate support for SOCI is available at no additional cost and you will only be charged for storing the SOCI indexes in Amazon ECR.
Get Started — Learn more about benefits and how to get started on the AWS Fargate Support for SOCI page.
Happy building.
— Donnie
Last month, Microsoft announced that it would continue its put-ChatGPT-in-everything adventure with a new Windows 11 feature called Copilot. The company added generative AI to Edge and to the Bing-powered taskbar Search field months ago, but Copilot promises to be the most visible and hard-to-ignore version of Microsoft's big AI push in its most visible and hard-to-ignore product.
This week's Windows Insider Preview build for Dev channel users, build 23493, will be the first to enable Copilot for public testers. After installing the update, preview users can press Windows + C to open a Copilot column on the right side of the screen. It will use the same Microsoft account you use for the rest of the OS (it's unclear whether it will work without a Microsoft account, though, to date, the preview has required sign-up and sign-in). And like the other Bing Chat implementations, it has three different "conversation style" settings that either try to rein the chatbot in and keep its answers straightforward and factual or allow it to get "more creative" but more prone to confabulations.
In addition to chatting, Copilot will also support creating AI images using OpenAI's DALL-E 2 model, the same technology used for the Bing Image Creator. Some features announced last month, including third-party plugin support, aren't included in this initial preview, and later versions will also be able to adjust a wider range of Windows settings.
Did you read the news about the Windows XP activation algorithm getting cracked and suddenly get nostalgic for the blue skies and bluer taskbar of that old Windows release? Or maybe you just like attractive, high-resolution desktop wallpapers and you want to make a change? It turns out that Microsoft's design team has rendered an updated 4K version of the default Windows XP wallpaper—you might know it by its name, "Bliss."
It's one of several retro-themed wallpapers on this Microsoft Design site, including photorealistic renderings of Solitaire, Paint, and (of course) Clippy. The site has been around for a while and hasn't been updated since December 2022, but Windows engineer Jennifer Gentleman tweeted about it yesterday—it's new to me and maybe to you, too. The most recent wallpapers appear to be products of Microsoft's Design Week event.
Among others, the Microsoft Design site also hosts the default wallpapers that have come with several Surface PCs, quite a few Pride Month-themed wallpaper designs, and several images focused on the company's recent emoji redesigns and the icons for the Microsoft 365 apps.
CI/CD servers are high-value targets for attackers because of their central role in critical development processes. They provide access to source code, a valuable asset for software companies, and can deploy code to production environments, creating serious risks if not adequately secured. Even a single vulnerability can enable attackers to compromise the supply chain, inject malware, and seize control of systems.
According to “The State of Software Supply Chain Security 2023”, these risks have contributed to a rise in supply chain attacks since 2020, and 57% of organizations have suffered security incidents related to DevOps toolchain exposures.
To avoid data breaches and business disruptions, securing CI/CD servers should be a top priority. Furthermore, Google’s “2022 Accelerate State of DevOps Report” suggests that implementing proper security controls can have a positive impact on software delivery performance.
In this whitepaper, we present 9 effective ways to prevent a supply chain attack on your CI/CD server, providing practical guidance and best practices to help you strengthen security and protect critical development processes.
By implementing these strategies, you can minimize the risk of a supply chain attack and ensure the integrity, availability, and confidentiality of your software supply chain.
Ivory, Tapbots’ Mastodon client, is now available on the Mac, and like its iOS and iPadOS counterparts that Federico reviewed in January, Ivory for Mac is every bit as polished.
A lot has changed since Ivory was released on the iPhone and iPad. At the time, there were hardly any native Mastodon apps for the Mac, so I was using Elk in a pinned Safari tab. That’s changed. There are several excellent native apps now, including Mona, which I reviewed earlier this month. What Ivory brings to the growing field of native apps is what we saw with iOS and iPadOS: impeccable taste and snappy performance that few other apps can match.
By now, most MacStories readers are probably familiar with the table stakes features for Mastodon clients. Ivory ticks all of those boxes. Also, if you’ve already tried Ivory for iOS or iPadOS, you’ve got a big head start on the Mac app because they’re very similar. However, if you’re new to Ivory, I encourage you to check out Federico’s review of Ivory for the iPhone and iPad because I’m not going to cover that same ground again. Instead, I want to focus on the Mac version’s unique features and the details that make it such a compelling choice for Mac users.
Opening Ivory as a single, narrow column closely resembles the iPhone version of the app.
The most apparent difference is Ivory for Mac’s multi-column layout. Set to a single, narrow column, Ivory for Mac resembles its iPhone sibling but with the app’s tabs on the left edge of the window. However, in the bottom left corner of the window is the button for expanding the number of Ivory’s columns. You’ll find the same button in the iPad app, where you’re limited to a single additional column. In contrast, the Mac app supports up to six columns.
Ivory’s multi-column design is the most readable of any Mastodon app I’ve used. It’s easy for a window with multiple columns of text and media to look cluttered, so it’s a testament to Ivory’s design that it’s as readable as it is. One of the touches that helps a lot is that instead of including a tab bar for each column, Ivory uses drop-down menus at the top of each column to allow users to pick what the column shows. That eliminates a lot of duplicative interface elements you find in other apps like Mona.
Ivory for Mac supports up to six columns.
I really appreciate the additional columns I can open on the Mac. When I use Ivory on my iPhone, it’s usually to read my own timeline. However, when I’m at my Mac, I’m usually working and want to keep tabs on the mentions coming into our MacStories accounts. With Ivory, I typically open columns for my own timeline and mentions, plus the mentions for MacStories, Club MacStories, and AppStories. If that becomes too distracting, or it’s a quiet day without a lot of activity, though, I can easily close the columns and focus on just my timeline.
Ivory’s new post indicator is much easier to read than many others I’ve seen.
Another design elements of Ivory for Mac that I love are the indicator that shows how many new messages are available in your timeline. It’s a small touch, but it’s very readable at a glance, especially using the bright yellow accent color that I’ve chosen. Similarly, I appreciate the clear label for private mentions. I’m not a fan of mixing private mentions in the main timeline, but that’s how Mastodon works, so at least Ivory makes those messages stand out as different in kind from other posts.
I love how clearly Ivory labels private messages.
Also, if I mute an account in Ivory, it only disappears for the account from which it’s muted, whereas Mona hides muted accounts from columns displaying different accounts as well. I understand the rationale that if you don’t want to see a post in one timeline, you might not want to see it anywhere. However, that’s not always the case and rarely is for me. It’s more work to mute an account in multiple places, but I like that Ivory leaves the choice up to me instead of assuming I don’t want to see a post in any timeline.
There are two features I’d love to see Ivory add: saved or recent searches and the ability to save the local views of particular Mastodon instances. I often repeat searches on Mastodon, and having a saved or recents list would eliminate the friction of retyping searches. I also follow a couple of app development servers that I’d love to be able to save as a pinned view in Ivory.
Those small items aside, though, Ivory for Mac is every bit as elegantly designed and performant as Federico described in his review of the iPhone and iPad versions of the app. Thinking back to when I first started using Mastodon, it’s incredible how far developers have elevated the options users have. Just a few short months ago, there were hardly any native Mac clients on the Mac. Now there are several, with Ivory right there among the very best.
Ivory is available on the Mac App Store as a free download with many of its features unlocked via a subscription, which is $1.99/month, $14.99/year, or $24.99/year for the iOS, iPadOS, and macOS versions as a bundle.
When I searched for the best Mac email clients for Gmail/Google Apps users in September, I was surprised to find that there was an app built specifically for this purpose. You didn't need to customize it, change its settings, or bolt on a bunch of extensions to make it work and feel right; Mimestream was both deeply hooked into Gmail and very much a Mac app.
Mimestream spent more than three years in a free beta period, releasing more than 220 updates for 167,000 users and adding more than 100 features. Now that a 1.0 release is out—and the company has grown from a solo developer to a five-person team—there's a price for the product.
Mimestream is $30 per year if you buy during this launch period, then $50 per year after that (if you were a beta user, check your inbox for a bigger discount code). There's still a 14-day, no-credit-card-required trial period. Individual users can install it on up to five devices, and there's Family Sharing across iCloud accounts.
The Workers Browser Rendering API allows developers to programmatically control and interact with a headless browser instance and create automation flows for their applications and products.
Since the private beta announcement, based on the feedback we've been receiving and our own roadmap, the team has been working on the developer experience and improving the platform architecture for the best possible performance and reliability. Today we enter the open beta and will start onboarding the customers on the wait list.
Starting today, Wrangler, our command-line tool for configuring, building, and deploying applications with Cloudflare developer products, has support for the Browser Rendering API bindings.
You can install Wrangler Beta using npm:
npm install wrangler --save-dev
Bindings allow your Workers to interact with resources on the Cloudflare developer platform. In this case, they will provide your Worker script with an authenticated endpoint to interact with a dedicated Chromium browser instance.
This is all you need in your wrangler.toml once this service is enabled for your account:
browser = { binding = "MYBROWSER", type = "browser" }
Now you can deploy any Worker script that requires Browser Rendering capabilities. You can spawn Chromium instances and interact with them programmatically in any way you typically do manually behind your browser.
Under the hood, the Browser Rendering API gives you access to a WebSocket endpoint that speaks the DevTools Protocol. DevTools is what allows us to instrument a Chromium instance running in our global network, and it's the same protocol that Chrome uses on your computer when you inspect a page.
With enough dedication, you can, in fact, implement your own DevTools client and talk the protocol directly. But that'd be crazy; almost no one does that.
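For the curious, here is a rough sketch of what driving the protocol by hand can look like. This is purely illustrative: it assumes a locally running Chromium with its DevTools port open and uses the Node.js ws package with a placeholder target ID, rather than the Workers binding described above.

import WebSocket from "ws";

// Connect to a page target's DevTools WebSocket (the target ID is a placeholder).
const ws = new WebSocket("ws://127.0.0.1:9222/devtools/page/<targetId>");

ws.on("open", () => {
  // Every DevTools Protocol message is JSON with an id, a method, and optional params.
  ws.send(JSON.stringify({ id: 1, method: "Page.enable" }));
  ws.send(JSON.stringify({ id: 2, method: "Page.navigate", params: { url: "https://example.com" } }));
});

ws.on("message", (data) => {
  // Responses and events come back as JSON messages, matched to requests by id.
  console.log(JSON.parse(data.toString()));
});

Workable, but you quickly end up re-implementing request/response matching, event handling, and higher-level concepts like pages and frames yourself.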
So…
Puppeteer is one of the most popular libraries that abstract the lower-level DevTools protocol from developers and provide a high-level API that you can use to easily instrument Chrome/Chromium and automate browsing sessions. It's widely used for things like creating screenshots, crawling pages, and testing web applications.
Puppeteer typically connects to a local Chrome or Chromium browser using the DevTools port.
We forked a version of Puppeteer and patched it to connect to the Workers Browser Rendering API instead. The changes are minimal; after connecting, developers can use the full Puppeteer API as they would on a standard setup.
Our version is open sourced here, and the package can be installed from npm as @cloudflare/puppeteer. Using it from a Worker is as easy as:
import puppeteer from "@cloudflare/puppeteer";
And then all it takes to launch a browser from your script is:
const browser = await puppeteer.launch(env.MYBROWSER);
In the long term, we will keep updating Puppeteer so that it matches the version of the Chromium instances running in our network.
Following the tradition with other Developer products, we created a dedicated section for the Browser Rendering APIs in our Developer's Documentation site.
You can access this page to learn more about how the service works, Wrangler support, APIs, and limits, and find examples of starter templates for common applications.
Taking screenshots from web pages is one of the typical cases for browser automation.
Let's create a Worker that uses the Browser Rendering API to do just that. This is a perfect example of how to set everything up and get an application running in minutes: it will give you a good overview of the steps involved and the basics of the Puppeteer API, and from there you can move on to more sophisticated use cases.
Step one, start a project, install Wrangler and Cloudflare’s fork of Puppeteer:
npm init -f
npm install wrangler --save-dev
npm install @cloudflare/puppeteer --save-dev
Step two, let’s create the simplest possible wrangler.toml configuration file with the Browser Rendering API binding:
name = "browser-worker"
main = "src/index.ts"
compatibility_date = "2023-03-14"
node_compat = true
workers_dev = true
browser = { binding = "MYBROWSER", type = "browser" }
Step three, create src/index.ts with your Worker code:
import puppeteer from "@cloudflare/puppeteer";

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const { searchParams } = new URL(request.url);
    let url = searchParams.get("url");
    let img: Buffer;
    if (url) {
      const browser = await puppeteer.launch(env.MYBROWSER);
      const page = await browser.newPage();
      await page.goto(url);
      img = (await page.screenshot()) as Buffer;
      await browser.close();
      return new Response(img, {
        headers: {
          "content-type": "image/jpeg",
        },
      });
    } else {
      return new Response(
        "Please add the ?url=https://example.com/ parameter"
      );
    }
  },
};
That's it, no more steps. This Worker instantiates a browser using Puppeteer, opens a new page, navigates to whatever you put in the "url" parameter, takes a screenshot of the page, closes the browser, and responds with the JPEG image of the screenshot. It can't get any easier to get started with the Browser Rendering API.
Run npx wrangler dev --remote to test it and npx wrangler publish when you're done.
You can explore the entire Puppeteer API and implement other functionality and logic from here. And, because it's Workers, you can add other developer products to your code: you might need a relational database, a KV store to cache your screenshots, an R2 bucket to archive your crawled pages and assets, a Durable Object to keep your browser instance alive and share it across multiple requests, or Queues to handle your jobs asynchronously. We have all of this and more.
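As a rough sketch of the KV idea, here is what caching screenshots might look like. The SCREENSHOTS namespace binding and the one-hour TTL are illustrative assumptions layered on top of the example above; they are not part of the Browser Rendering API itself.

import puppeteer from "@cloudflare/puppeteer";

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const url = new URL(request.url).searchParams.get("url");
    if (!url) {
      return new Response("Please add the ?url=https://example.com/ parameter");
    }

    // Serve from the (hypothetical) SCREENSHOTS KV namespace if we already rendered this page.
    const cached = await env.SCREENSHOTS.get(url, { type: "arrayBuffer" });
    if (cached) {
      return new Response(cached, { headers: { "content-type": "image/jpeg" } });
    }

    // Otherwise render it with the Browser Rendering API, as in the example above.
    const browser = await puppeteer.launch(env.MYBROWSER);
    const page = await browser.newPage();
    await page.goto(url);
    const img = (await page.screenshot()) as Buffer;
    await browser.close();

    // Cache the result for an hour so repeat requests skip the browser entirely.
    await env.SCREENSHOTS.put(url, img, { expirationTtl: 3600 });
    return new Response(img, { headers: { "content-type": "image/jpeg" } });
  },
};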
You can also find this and other examples of how to use Browser Rendering in the Developer Documentation.
Dogfooding our products is one of the best ways to test and improve them, and in some cases, our internal needs dictate or influence our roadmap. Workers Browser Rendering is a good example of that; it was born out of our necessities before we realized it could be a product. We've been using it extensively for things like taking screenshots of pages for social sharing or dashboards, testing web software in CI, or gathering page load performance metrics of our applications.
But there's one product we've been using to stress test and push the limits of the Browser Rendering API and drive the engineering sprints that brought us to open the beta to our customers today: The Cloudflare Radar URL Scanner.
The URL Scanner scans any URL and compiles a full report containing technical, performance, privacy, and security details about that page. It's processing thousands of scans per day currently. It was built on top of Workers and uses a combination of the Browser Rendering APIs with Puppeteer to create enriched HAR archives and page screenshots, Durable Objects to reuse browser instances, Queues to handle customers' load and execute jobs asynchronously, and R2 to store the final reports.
This tool will soon have its own "how we built it" blog. Still, we wanted to let you know about it now because it is a good example of how you can build sophisticated applications using Browser Rendering APIs at scale starting today.
The team will keep improving the Browser Rendering API, but a few things are worth mentioning today.
First, we are looking into upstreaming the changes in our Puppeteer fork to the main project so that using the official library with the Cloudflare Workers Browser Rendering API becomes as easy as a configuration option.
Second, one of the reasons we decided to expose the raw DevTools protocol in the Worker binding is so that it can support other browser instrumentation libraries in the future. Playwright is a good example of another popular library that developers want to use.
And last, we are also keeping an eye on and testing WebDriver BiDi, a "new standard browser automation protocol that bridges the gap between the WebDriver Classic and CDP (DevTools) protocols." Click here to know more about the status of WebDriver BiDi.
The Workers Browser Rendering API enters open beta today. We will gradually be enabling the customers in the wait list in batches and sending them emails. We look forward to seeing what you will be building with it and want to hear from you.
As usual, you can talk to us on our Developers Discord or the Community forum; the team will be listening.
We’re excited to announce Secrets Store - Cloudflare’s new secrets management offering!
A secrets store does exactly what the name implies - it stores secrets. Secrets are developer-managed variables that contain sensitive information - information that only authorized users and systems should have access to.
If you're building an application, there are various types of secrets that you need to manage. Every system should be designed with identity and authentication data that verifies some form of identity in order to grant access to a system or application. One example of this is API tokens for making read and write requests to a database. Failure to store these tokens securely could lead to unauthorized access to information - intentional or accidental.
The stakes with secrets management are high. Every gap in the storage of these values has the potential to lead to a data leak or compromise - a security administrator's worst nightmare.
Developers are primarily focused on creating applications: they want to build quickly, they want their systems to be performant, and they want them to scale. For them, secrets management is about ease of use, performance, and reliability. On the other hand, security administrators are tasked with ensuring that these secrets remain secure. It's their responsibility to safeguard sensitive information, ensure that security best practices are met, and manage the fallout of an incident such as a data leak or breach. It's their job to verify that developers at their company are building in a secure and foolproof manner.
In order for developers to build at high velocity and for security administrators to feel at ease, companies need to adopt a highly reliable and secure secrets manager. This should be a system that ensures that sensitive information is stored with the highest security measures, while maintaining ease of use that will allow engineering teams to efficiently build.
Cloudflare's mission is to help build a better Internet - and that means a more secure Internet. We recognize our customers' need for a secure, centralized repository for storing sensitive data. Within the Cloudflare ecosystem, there are various places where customers need to store and access API and authorization tokens, shared secrets, and sensitive information. It's our job to make it easy for customers to manage these values securely.
The need for secrets management goes beyond Cloudflare. Customers have sensitive data that they manage everywhere - at their cloud provider, on their own infrastructure, across machines. Our plan is to make our Secrets Store a one-stop shop for all of our customers' secrets.
In 2020, we launched environment variables and secrets for Cloudflare Workers, allowing customers to create and encrypt variables across their Worker scripts. By doing this, developers can obfuscate the value of a variable so that it’s no longer available in plaintext and can only be accessed by the Worker.
Adoption and use of these secrets is quickly growing. We now have more than three million Workers scripts that reference variables and secrets managed through Cloudflare. One piece of feedback that we continue to hear from customers is that these secrets are scoped too narrowly.
Today, customers can only use a variable or secret within the Worker that it's associated with. In practice, though, customers have secrets that they share across Workers. They don't want to re-create those secrets and spend their time keeping them in sync. They want account-level secrets that are managed in one place but can be referenced across multiple Workers scripts and functions.
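For context, here is roughly what today's per-Worker scoping looks like in code; the API_TOKEN name is an illustrative assumption for a secret uploaded with wrangler secret put:

export default {
  async fetch(request: Request, env: { API_TOKEN: string }): Promise<Response> {
    // The secret is exposed only to this Worker through its env bindings;
    // a second Worker that needs the same token has to define its own copy today.
    const upstream = await fetch("https://api.example.com/data", {
      headers: { Authorization: `Bearer ${env.API_TOKEN}` },
    });
    return new Response(await upstream.text(), { status: upstream.status });
  },
};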
Outside of Workers, there are many use cases for secrets across Cloudflare services.
Inside our Web Application Firewall (WAF), customers can make rules that look for authorization headers in order to grant or deny access to requests. Today, when customers create these rules, they put the authorization header value in plaintext, so that anyone with WAF access in the Cloudflare account can see its value. What we’ve heard from our customers is that even internally, engineers should not have access to this type of information. Instead, what our customers want is one place to manage the value of this header or token, so that only authorized users can see, create, and rotate this value. Then when creating a WAF rule, engineers can just reference the associated secret e.g.“account.mysecretauth”. By doing this, we help our customers secure their system by reducing the access scope and enhance management of this value by keeping it updated in one place.
With new Cloudflare products and features quickly developing, we’re hearing more and more use cases for a centralized secrets manager. One that can be used to store Access Service tokens or shared secrets for Webhooks.
With the new account level Secrets Store, we’re excited to give customers the tools they need to manage secrets across Cloudflare services.
To have a secrets store, there are a number of measures that need to be in place, and we’re committing to providing these for our customers.
First, we're going to give our customers the tools they need to restrict access to secrets. We will have scoped permissions that allow admins to choose which users can view, create, edit, or remove secrets. We also plan to add the same level of granularity to our services - giving customers the ability to say "only allow this Worker to access this secret and only allow this set of Firewall rules to access that secret".
Next, we're going to give our customers extensive audit logs that allow them to track the access and use of their secrets. Audit logs are crucial for security administrators. They can be used to alert team members that a secret was used by an unauthorized service or that a compromised secret is being accessed when it shouldn't be. We will give customers audit logs for every secret-related event, so that they can see exactly who is making changes to secrets, which services are accessing them, and when.
In addition to the built-in security of the Secrets Store, we’re going to give customers the tools to rotate their encryption keys on-demand or at a cadence that fits the right security posture for them.
We're excited to get the Secrets Store into our customers' hands. If you're interested in using this, please fill out this form, and we'll reach out to you when it's ready to use.
By now, you’ve likely heard that passwordless Google accounts have finally arrived. The replacement for passwords is known as "passkeys."
There are many misconceptions about passkeys, both in terms of their usability and the security and privacy benefits they offer compared with current authentication methods. That’s not surprising, given that passwords have been in use for the past 60 years, and passkeys are so new. The long and short of it is that with a few minutes of training, passkeys are easier to use than passwords, and in a matter of months—once a dozen or so industry partners finish rolling out the remaining pieces—using passkeys will be easier still. Passkeys are also vastly more secure and privacy-preserving than passwords, for reasons I'll explain later.
This article provides a primer to get people started with Google's implementation of passkeys and explains the technical underpinnings that make them a much easier and more effective way to protect against account takeovers. A handful of smaller sites—specifically, PayPal, Instacart, Best Buy, Kayak, Robinhood, Shop Pay, and Cardpointers—have rolled out various options for logging in with passkeys, but those choices are more proofs of concept than working solutions. Google is the first major online service to make passkeys available, and its offering is refined and comprehensive enough that I’m recommending people turn them on today.
nix has a reputation for being confusing (it has its whole own programming language!), so I’ve been trying to figure out how to use nix in a way that’s as simple as possible and does not involve managing any configuration files or learning a new programming language. Here’s what I’ve figured out so far! We’ll talk about how to:
As usual I’ve probably gotten some stuff wrong in this post since I’m still pretty new to nix. I’m also still not sure how much I like nix – it’s very confusing! But it’s helped me compile some software that I was struggling to compile otherwise, and in general it seems to install things faster than homebrew.
People often describe nix as “declarative package management”. I don’t care that much about declarative package management, so here are two things that I appreciate about nix:
I think that the reason nix is good at compiling software is that you can have multiple versions of the same package installed at the same time. For example, right now I have one version of nodejs at /nix/store/4ykq0lpvmskdlhrvz1j3kwslgc6c7pnv-nodejs-16.17.1 and one at /nix/store/5y4bd2r99zhdbir95w5pf51bwfg37bwa-nodejs-18.9.1, and because each package is built against its own dependencies there's no LD_LIBRARY_PATH mess! I'll give a couple of examples later in this post of two times nix made it easier for me to compile software.
here's how I got started with nix:
- put ~/.nix-profile/bin on my PATH
- install packages with nix-env -iA nixpkgs.NAME
Basically the idea is to treat nix-env -iA like brew install or apt-get install.
For example, if I want to install fish, I can do that like this:
nix-env -iA nixpkgs.fish
This seems to just download some binaries from cache.nixos.org – pretty simple.
Some people use nix to install their Node and Python and Ruby packages, but I haven’t
been doing that – I just use npm install
and pip install
the same way I
always have.
There are a bunch of nix features/tools that I’m not using, but that I’ll mention. I originally thought that you had to use these features to use nix, because most of the nix tutorials I’ve read talk about them. But you don’t have to use them.
I won’t go into these because I haven’t really used them and there are lots of explanations out there.
I think packages in the main nix package repository are defined in github.com/NixOS/nixpkgs/
It looks like you can search for packages at search.nixos.org/packages. The two official ways to search packages seem to be:
- nix-env -qaP NAME, which is very extremely slow and which I haven't been able to get to actually work
- nix --extra-experimental-features 'nix-command flakes' search nixpkgs NAME, which does seem to work but is kind of a mouthful. Also all of the packages it prints out start with legacyPackages for some reason
I found a way to search nix packages from the command line that I liked better:
- run nix-env -qa '*' > nix-packages.txt to get a list of every package in the Nix repository
- write a little nix-search script that just greps packages.txt (cat ~/bin/nix-packages.txt | awk '{print $1}' | rg "$1")
One of nix's major design choices is that there isn't one single bin with all your packages; instead, you use symlinks. There are a lot of layers of symlinks. A few examples of symlinks:
- ~/.nix-profile on my machine is (indirectly) a symlink to /nix/var/nix/profiles/per-user/bork/profile-111-link/
- ~/.nix-profile/bin/fish is a symlink to /nix/store/afkwn6k8p8g97jiqgx9nd26503s35mgi-fish-3.5.1/bin/fish
When I install something, it creates a new profile-112-link
directory with new symlinks and updates my ~/.nix-profile
to point to that directory.
I think this means that if I install a new version of fish
and I don’t like it, I can
easily go back just by running nix-env --rollback
– it’ll move me to my previous profile directory.
If I uninstall a nix package like this, it doesn’t actually free any hard drive space, it just removes the symlinks.
$ nix-env --uninstall oil
I’m still not sure how to actually delete the package – I ran a garbage collection like this, which seemed to delete some things:
$ nix-collect-garbage
...
85 store paths deleted, 74.90 MiB freed
But I still have oil
on my system at /nix/store/8pjnk6jr54z77jiq5g2dbx8887dnxbda-oil-0.14.0
.
There’s a more aggressive version of nix-collect-garbage
that also deletes old versions of your profiles (so that you can’t rollback)
$ nix-collect-garbage -d --delete-old
That doesn’t delete /nix/store/8pjnk6jr54z77jiq5g2dbx8887dnxbda-oil-0.14.0
either though and I’m not sure why.
It looks like you can upgrade nix packages like this:
nix-channel --update
nix-env --upgrade
(similar to apt-get update && apt-get upgrade
)
I haven’t really upgraded anything yet. I think that if something goes wrong with an upgrade, you can roll back (because everything is immutable in nix!) with
nix-env --rollback
Someone linked me to this post from Ian Henry that
talks about some confusing problems with nix-env --upgrade
– maybe it
doesn’t work the way you’d expect? I guess I’ll be wary around upgrades.
paperjam
After a few months of installing existing packages, I wanted to make a custom package with nix for a program called paperjam that wasn’t already packaged.
I was actually struggling to compile paperjam at all, even without nix, because the version of libiconv I had on my system was wrong. I thought it might be easier to compile it with nix even though I didn't know how to make nix packages yet. And it actually was!
But figuring out how to get there was VERY confusing, so here are some notes about how I did it.
Before I started working on my paperjam
package, I wanted to build an example existing package just to
make sure I understood the process for building a package. I was really
struggling to figure out how to do this, but I asked in Discord and someone
explained to me how I could get a working package from github.com/NixOS/nixpkgs/ and build it. So here
are those instructions:
step 1: Download some arbitrary package from nixpkgs on github, for example the dash
package:
wget https://raw.githubusercontent.com/NixOS/nixpkgs/47993510dcb7713a29591517cb6ce682cc40f0ca/pkgs/shells/dash/default.nix -O dash.nix
step 2: Replace the first statement ({ lib, stdenv, buildPackages, autoreconfHook, pkg-config, fetchurl, fetchpatch, libedit, runCommand, dash }:) with with import <nixpkgs> {};. I don't know why you have to do this, but it works.
step 3: Run nix-build dash.nix
This compiles the package
step 4: Run nix-env -i -f dash.nix
This installs the package into my ~/.nix-profile
That’s all! Once I’d done that, I felt like I could modify the dash
package and make my own package.
paperjam
has one dependency (libpaper
) that also isn’t packaged yet, so I needed to build libpaper
first.
Here’s libpaper.nix
. I basically just wrote this by copying and pasting from
other packages in the nixpkgs repository.
My guess is that what's happening here is that nix has some default rules for compiling C packages (like "run make install"), so the make install happens by default and I don't need to configure it explicitly.
with import <nixpkgs> {};
stdenv.mkDerivation rec {
  pname = "libpaper";
  version = "0.1";

  src = fetchFromGitHub {
    owner = "naota";
    repo = "libpaper";
    rev = "51ca11ec543f2828672d15e4e77b92619b497ccd";
    hash = "sha256-S1pzVQ/ceNsx0vGmzdDWw2TjPVLiRgzR4edFblWsekY=";
  };

  buildInputs = [ ];

  meta = with lib; {
    homepage = "https://github.com/naota/libpaper";
    description = "libpaper";
    platforms = platforms.unix;
    license = with licenses; [ bsd3 gpl2 ];
  };
}
Basically this just tells nix how to download the source from GitHub.
I built this by running nix-build libpaper.nix
Next, I needed to compile paperjam. Here's a link to the nix package I wrote. The main things I needed to do, other than telling it where to download the source, were:
- add a build dependency (asciidoc)
- set some install flags (installFlags = [ "PREFIX=$(out)" ];) so that it installed in the correct directory instead of /usr/local/bin.
I set the hashes by first leaving the hash empty, then running nix-build to get an error message complaining about a mismatched hash. Then I copied the correct hash out of the error message.
I figured out how to set installFlags
just by running rg PREFIX
in the nixpkgs repository – I figured that needing to set a PREFIX
was
pretty common and someone had probably done it before, and I was right. So I
just copied and pasted that line from another package.
Then I ran:
nix-build paperjam.nix
nix-env -i -f paperjam.nix
and then everything worked and I had paperjam
installed! Hooray!
hugo
Right now I build this blog using Hugo 0.40, from 2018. I don’t need any new features so I haven’t felt a need to upgrade. On Linux this is easy: Hugo’s releases are a static binary, so I can just download the 5-year-old binary from the releases page and run it. Easy!
But on this Mac I ran into some complications. Mac hardware has changed in the
last 5 years, so the Mac Hugo binary I downloaded crashed. And when I tried to
build it from source with go build
, that didn’t work either because Go build
norms have changed in the last 5 years as well.
I was working around this by running Hugo in a Linux docker container, but I didn’t love that: it was kind of slow and it felt silly. It shouldn’t be that hard to compile one Go program!
Nix to the rescue! Here’s what I did to install the old version of Hugo with nix.
I wanted to install Hugo 0.40 and put it in my PATH as hugo-0.40
. Here’s how
I did it. I did this in a kind of weird way, but it worked (Searching and installing old versions of Nix packages
describes a probably more normal method).
step 1: Search through the nixpkgs repo to find Hugo 0.40
I found the .nix
file here github.com/NixOS/nixpkgs/blob/17b2ef2/p…
step 2: Download that file and build it
I downloaded that file (and another file called deps.nix in the same directory), replaced the first line with with import <nixpkgs> {};, and built it with nix-build hugo.nix.
That almost worked without any changes, but I had to make two changes:
- change with stdenv.lib to with lib, for some reason
- rename the package to hugo040 so that it wouldn't conflict with the other version of hugo that I had installed
step 3: Rename hugo to hugo-0.40
I wrote a little postInstall script to rename the Hugo binary.
postInstall = ''
mv $out/bin/hugo $out/bin/hugo-0.40
'';
I figured out how to run this by running rg 'mv '
in the nixpkgs repository and just copying and modifying something that seemed related.
step 4: Install it
I installed it into my ~/.nix-profile/bin by running nix-env -i -f hugo.nix.
And it all works! I put the final .nix
file into my own personal nixpkgs repo so that I can use it again later if I
want.
I think it’s worth noting here that this hugo.nix
file isn’t magic – the
reason I can easily compile Hugo 0.40 today is that many people worked for a long time to make it possible to
package that version of Hugo in a reproducible way.
Installing paperjam
and this 5-year-old version of Hugo were both
surprisingly painless and actually much easier than compiling it without nix,
because nix made it much easier for me to compile the paperjam
package with
the right version of libiconv
, and because someone 5 years ago had already
gone to the trouble of listing out the exact dependencies for Hugo.
I don’t have any plans to get much more complicated with nix (and it’s still very possible I’ll get frustrated with it and go back to homebrew!), but we’ll see what happens! I’ve found it much easier to start in a simple way and then start using more features if I feel the need instead of adopting a whole bunch of complicated stuff all at once.
I probably won’t use nix on Linux – I’ve always been happy enough with apt
(on Debian-based distros) and pacman
(on Arch-based distros), and they’re
much less confusing. But on a Mac it seems like it might be worth it. We’ll
see! It’s very possible in 3 months I’ll get frustrated with nix and just go back to homebrew.
Update from 5 months in: nix is still going well, and I’ve only run into 1
problem, which is that every nix-env -iA
package installation started failing
with the error “bad meta.outputsToInstall”.
This script from Ross Light fixes that problem though. It lists every derivation installed in my current profile and creates a new profile with the exact same derivations. This feels like a nix bug (surely creating a new profile with the exact same derivations should be a no-op?) but I haven’t looked into it more yet.
OnePlus is finally ready to detail its first mechanical keyboard. No, we didn't need another company to start making mechanical keyboards. But if you're looking for a new Bluetooth keyboard that plays particularly well with Macs, has a compact layout, and a rotary knob that looks stylish and functional, OnePlus will have one more choice for you come April.
Announced today, OnePlus is jumping into the mechanical keyboard race with a strange name, the Featuring Keyboard 81 Pro. The "81" refers to the key count, while "Pro" is presumably meant to make workers and power users think the keyboard's a good fit; but the name doesn't quite roll off the tongue. The outlier here is the "Featuring" bit, which refers to the OnePlus Featuring "co-creation" platform that builds products based on user feedback. Community users are said to have contributed to the 81 Pro's design, including its proprietary switches. OnePlus' press release today claimed it will release "many" more Featuring products.
Another huge influence on the 81 Pro is keyboard-maker Keychron, which is said to have helped engineer the product. That includes its layout, which matches the layout of the Q1 Pro that Keychron is currently crowdfunding. In addition to macOS, the keyboard is supposed to work with Windows, Linux, and Android, OnePlus' press release said. The keyboard's product page also claims support for iOS. Similar to some wireless Keychron keyboards, like the Keychron K14, there's a toggle on the keyboard's side for switching from Mac to Windows. Considering the lack of USB-A ports among Macs, the Bluetooth 5.1 keyboard charges over a USB-C to USB-C cable (there's also a USB-C to USB-A adapter).
Over the years, when Cloudflare has had an outage that affected our customers, we have very quickly blogged about what happened, why, and what we are doing to address the causes of the outage. Today's post is a little different. It's about a single customer's website not working correctly because of incorrect action taken by Cloudflare.
Although the customer was not in any way banned from Cloudflare, or lost access to their account, their website didn’t work. And it didn’t work because Cloudflare applied a bandwidth throttle between us and their origin server. The effect was that the website was unusable.
Because of this unusual throttle there was some internal confusion for our customer support team about what had happened. They, incorrectly, believed that the customer had been limited because of a breach of section 2.8 of our Self-Serve Subscription Agreement which prohibits use of our self-service CDN to serve excessive non-HTML content, such as images and video, without a paid plan that includes those services (this is, for example, designed to prevent someone building an image-hosting service on Cloudflare and consuming a huge amount of bandwidth; for that sort of use case we have paid image and video plans).
However, this customer wasn’t breaking section 2.8, and they were both a paying customer and a paying customer of Cloudflare Workers through which the throttled traffic was passing. This throttle should not have happened. In addition, there is and was no need for the customer to upgrade to some other plan level.
This incident has set off a number of workstreams inside Cloudflare to ensure better communication between teams, prevent such an incident happening, and to ensure that communications between Cloudflare and our customers are much clearer.
Before we explain our own mistake and how it came to be, we’d like to apologize to the customer. We realize the serious impact this had, and how we fell short of expectations. In this blog post, we want to explain what happened, and more importantly what we’re going to change to make sure it does not happen again.
On February 2, an on-call network engineer received an alert for a congesting interface with Equinix IX in our Ashburn data center. While this is not an unusual alert, this one stood out for two reasons. First, it was the second day in a row that it happened, and second, the congestion was due to a sudden and extreme spike of traffic.
The engineer in charge identified the customer's domain, tardis.dev, as being responsible for this sudden spike of traffic between Cloudflare and their origin network, a storage provider. Because this congestion happens on a physical interface connected to external peers, there was an immediate impact on many of our customers and peers. Port congestion like this typically causes packet loss, slow throughput, and higher than usual latency. While we have automatic mitigation in place for congesting interfaces, in this case the mitigation was unable to resolve the impact completely.
The traffic from this customer went suddenly from an average of 1,500 requests per second, and a 0.5 MB payload per request, to 3,000 requests per second (2x) and more than 12 MB payload per request (25x).
The congestion happened between Cloudflare and the origin network. Caching did not happen because the requests were all unique URLs going to the origin, and therefore we had no ability to serve from cache.
A Cloudflare engineer decided to apply a throttling mechanism to prevent the zone from pulling so much traffic from their origin. Let's be very clear on this action: Cloudflare does not have an established process to throttle customers that consume large amounts of bandwidth, and does not intend to have one. This remediation was a mistake, it was not sanctioned, and we deeply regret it.
We lifted the throttle through internal escalation 12 hours and 53 minutes after having set it up.
To make sure a similar incident does not happen, we are establishing clear rules to mitigate issues like this one. Any action taken against a customer domain, paying or not, will require multiple levels of approval and clear communication to the customer. Our tooling will be improved to reflect this. We have many ways of shaping traffic in situations where a huge spike affects a link, and we could have applied a different mitigation in this instance.
We are in the process of rewriting our terms of service to better reflect the type of services that our customers deliver on our platform today. We are also committed to explaining to our users in plain language what is permitted under self-service plans. As a developer-first company with transparency as one of its core principles, we know we can do better here. We will follow up with a blog post dedicated to these changes later.
Once again, we apologize to the customer for this action and for the confusion it created for other Cloudflare customers.
So you’ve played, or re-played, the epic fantasy saga that is The Witcher 3: Wild Hunt and its two critically acclaimed expansions, Hearts of Stone and Blood and Wine. Now you’re on the hunt for your next fantasy epic, but which to choose?
Millions of users rely on Cloudflare WARP to connect to the Internet through Cloudflare’s network. Individuals download the mobile or desktop application and rely on the Wireguard-based tunnel to make their browser faster and more private. Thousands of enterprises trust Cloudflare WARP to connect employees to our Secure Web Gateway and other Zero Trust services as they navigate the Internet.
We’ve heard from both groups of users that they also want to connect to other devices running WARP. Teams can build a private network on Cloudflare’s network today by connecting WARP on one side to a Cloudflare Tunnel, GRE tunnels, or IPSec tunnels on the other end. However, what if both devices already run WARP?
Starting today, we’re excited to make it even easier to build a network on Cloudflare with the launch of WARP-to-WARP connectivity. With a single click, any device running WARP in your organization can reach any other device running WARP. Developers can connect to a teammate's machine to test a web server. Administrators can reach employee devices to troubleshoot issues. The feature works with our existing private network on-ramps, like the tunnel options listed above. All with Zero Trust rules built in.
To get started, sign up to receive early access to our closed beta. If you’re interested in learning more about how it works and what else we will be launching in the future, keep scrolling.
We understand that adopting a Zero Trust architecture can feel overwhelming at times. With Cloudflare One, our mission is to make Zero Trust prescriptive and approachable regardless of where you are on your journey today. To help users navigate the uncertain, we created resources like our vendor-agnostic Zero Trust Roadmap which lays out a battle-tested path to Zero Trust. Within our own products and services, we’ve launched a number of features to bridge the gap between the networks you manage today and the network you hope to build for your organization in the future.
Ultimately, our goal is to enable you to overlay your network on Cloudflare however you want, whether that be with existing hardware in the field, a carrier you already partner with, through existing technology standards like IPsec tunnels, or more Zero Trust approaches like WARP or Tunnel. It shouldn’t matter which method you chose to start with, the point is that you need the flexibility to get started no matter where you are in this journey. We call these connectivity options on-ramps and off-ramps.
The model laid out above allows users to start by defining their specific needs and then customize their deployment by choosing from a set of fully composable on and offramps to connect their users and devices to Cloudflare. This means that customers are able to leverage any of these solutions together to route traffic seamlessly between devices, offices, data centers, cloud environments, and self-hosted or SaaS applications.
One example of a deployment we’ve seen thousands of customers be successful with is what we call WARP-to-Tunnel. In this deployment, the on-ramp Cloudflare WARP ensures end-user traffic reaches Cloudflare’s global network in a secure and performant manner. The off-ramp Cloudflare Tunnel then ensures that, after your Zero Trust rules have been enforced, we have secure, redundant, and reliable paths to land user traffic back in your distributed, private network.
This is a great example of a deployment that is ideal for users who need to support public-to-private traffic flows (i.e. North-South).
But what happens when you need to support private-to-private traffic flows (i.e. East-West) within this deployment?
Starting today, devices on-ramping to Cloudflare with WARP will also be able to off-ramp to each other. With this announcement, we’re adding yet another tool to leverage in new or existing deployments that provides users with stronger network fabric to connect users, devices, and autonomous systems.
This means any of your Zero Trust-enrolled devices will be able to securely connect to any other device on your Cloudflare-defined network, regardless of physical location or network configuration. This unlocks the ability for you to address any device running WARP in the exact same way you are able to send traffic to services behind a Cloudflare Tunnel today. Naturally, all of this traffic flows through our in-line Zero Trust services, regardless of how it gets to Cloudflare, and this new connectivity announced today is no exception.
To power all of this, we now track where WARP devices are connected to, in Cloudflare’s global network, the same way we do for Cloudflare Tunnel. Traffic meant for a specific WARP device is relayed across our network, using Argo Smart Routing, and piped through the transport that routes IP packets to the appropriate WARP device. Since this traffic goes through our Zero Trust Secure Web Gateway — allowing various types of filtering — it means we upgrade and downgrade traffic from purely routed IP packets to fully proxied TLS connections (as well as other protocols). In the case of using SSH to remotely access a colleague’s WARP device, this means that your traffic is eligible for SSH command auditing as well.
If you already deployed Cloudflare WARP to your organization, then your IT department will be excited to learn they can use this new connectivity to reach out to any device running Cloudflare WARP. Connecting via SSH, RDP, SMB, or any other service running on the device is now simpler than ever. All of this provides Zero Trust access for the IT team members, with their actions being secured in-line, audited, and pushed to your organization’s logs.
Or, maybe you are done with designing a new function of an existing product and want to let your team members check it out at their own convenience. Sending them a link with your private IP — assigned by Cloudflare — will do the job. Their devices will see your machine as if they were in the same physical network, despite being across the other side of the world.
The usefulness doesn't end with humans on both sides of the interaction: the weekend has arrived, and you have finally set out to move your local NAS to a host provider where you run a virtual machine. By running Cloudflare WARP on it, similarly to your laptop, you can now access your photos using the virtual machine's private IP. This was already possible with WARP to Tunnel; but with WARP-to-WARP, you also get connectivity in the reverse direction, where you can have the virtual machine periodically rsync/scp files from your laptop as well. This means you can make any server initiate traffic towards the rest of your Zero Trust organization with this new type of connectivity.
This feature will be available on all plans at no additional cost. To get started with this new feature, add your name to the closed beta, and we’ll notify you once you’ve been enrolled. Then, you’ll simply ensure that at least two devices are enrolled in Cloudflare Zero Trust and have the latest version of Cloudflare WARP installed.
This new feature builds upon the existing benefits of Cloudflare Zero Trust, which include enhanced connectivity, improved performance, and streamlined access controls. With the ability to connect to any other device in their deployment, Zero Trust users will be able to take advantage of even more robust security and connectivity options.
To get started in minutes, create a Zero Trust account, download the WARP agent, enroll these devices into your Zero Trust organization, and start creating Zero Trust policies to establish fast, secure connectivity between these devices. That’s it.
In the bright afternoon hours of New Year’s Day 2023, I squint through hungover eyes at my phone screen. My Twitter feed is blowing up about some movie I’ve never heard of before: Strange Days, a ‘90s Kathryn Bigelow sci-fi flick starring Ralph Fiennes, Angela Bassett, and Juliette Lewis.
Henry Cavill may be out as Geralt in the Netflix Witcher series, but we will always have Doug Cockle as Geralt in The Witcher 3: Wild Hunt. And now, thanks to the latest next-gen update to the Witcher 3, we can have the best of both worlds. The new update brings “In The Eternal Fire’s Shadow,” a Witcher-worthy side…
New General Purpose (M6in/M6idn) Instances
The original general purpose EC2 instance (m1.small) was launched in 2006 and was the one and only instance type for a little over a year, until we launched the m1.large and m1.xlarge in late 2007. After that, we added the m3 in 2012, m4 in 2015, and the first in a very long line of m5 instances starting in 2017. The family tree branched in 2018 with the addition of the m5d instances with local NVMe storage.
And that brings us to today, and to the new m6in and m6idn instances, both available in 9 sizes:
Name | vCPUs | Memory | Local Storage (m6idn only) | Network Bandwidth | EBS Bandwidth | EBS IOPS |
m6in.large / m6idn.large | 2 | 8 GiB | 118 GB | Up to 25 Gbps | Up to 20 Gbps | Up to 87,500 |
m6in.xlarge / m6idn.xlarge | 4 | 16 GiB | 237 GB | Up to 30 Gbps | Up to 20 Gbps | Up to 87,500 |
m6in.2xlarge / m6idn.2xlarge | 8 | 32 GiB | 474 GB | Up to 40 Gbps | Up to 20 Gbps | Up to 87,500 |
m6in.4xlarge / m6idn.4xlarge | 16 | 64 GiB | 950 GB | Up to 50 Gbps | Up to 20 Gbps | Up to 87,500 |
m6in.8xlarge / m6idn.8xlarge | 32 | 128 GiB | 1900 GB | 50 Gbps | 20 Gbps | 87,500 |
m6in.12xlarge / m6idn.12xlarge | 48 | 192 GiB | 2950 GB (2 x 1425) | 75 Gbps | 30 Gbps | 131,250 |
m6in.16xlarge / m6idn.16xlarge | 64 | 256 GiB | 3800 GB (2 x 1900) | 100 Gbps | 40 Gbps | 175,000 |
m6in.24xlarge / m6idn.24xlarge | 96 | 384 GiB | 5700 GB (4 x 1425) | 150 Gbps | 60 Gbps | 262,500 |
m6in.32xlarge / m6idn.32xlarge | 128 | 512 GiB | 7600 GB (4 x 1900) | 200 Gbps | 80 Gbps | 350,000 |
The m6in and m6idn instances are available in the US East (Ohio, N. Virginia) and Europe (Ireland) regions in On-Demand and Spot form. Savings Plans and Reserved Instances are available.
New C6in Instances
Back in 2008 we launched the first in what would prove to be a very long line of Amazon Elastic Compute Cloud (Amazon EC2) instances designed to give you high compute performance and a higher ratio of CPU power to memory than the general purpose instances. Starting with those initial c1 instances, we went on to launch cluster computing instances in 2010 (cc1) and 2011 (cc2), and then (once we got our naming figured out), multiple generations of compute-optimized instances powered by Intel processors: c3 (2013), c4 (2015), and c5 (2016). As our customers put these instances to use in environments where networking performance was starting to become a limiting factor, we introduced c5n instances with 100 Gbps networking in 2018. We also broadened the c5 instance lineup by adding additional sizes (including bare metal), and instances with blazing-fast local NVMe storage.
Today I am happy to announce the latest in our lineup of Intel-powered compute-optimized instances, the c6in, available in 9 sizes:
Name | vCPUs | Memory | Network Bandwidth | EBS Bandwidth | EBS IOPS |
c6in.large | 2 | 4 GiB | Up to 25 Gbps | Up to 20 Gbps | Up to 87,500 |
c6in.xlarge | 4 | 8 GiB | Up to 30 Gbps | Up to 20 Gbps | Up to 87,500 |
c6in.2xlarge | 8 | 16 GiB | Up to 40 Gbps | Up to 20 Gbps | Up to 87,500 |
c6in.4xlarge | 16 | 32 GiB | Up to 50 Gbps | Up to 20 Gbps | Up to 87,500 |
c6in.8xlarge | 32 | 64 GiB | 50 Gbps | 20 Gbps | 87,500 |
c6in.12xlarge | 48 | 96 GiB | 75 Gbps | 30 Gbps | 131,250 |
c6in.16xlarge | 64 | 128 GiB | 100 Gbps | 40 Gbps | 175,000 |
c6in.24xlarge | 96 | 192 GiB | 150 Gbps | 60 Gbps | 262,500 |
c6in.32xlarge | 128 | 256 GiB | 200 Gbps | 80 Gbps | 350,000 |
The c6in instances are available in the US East (Ohio, N. Virginia), US West (Oregon), and Europe (Ireland) Regions.
As I noted earlier, these instances are designed to be able to handle up to twice as many packets per second (PPS) as their predecessors. This allows them to deliver increased performance in situations where they need to handle a large number of small-ish network packets, which will accelerate many applications and use cases, including network virtual appliances (firewalls, virtual routers, load balancers, and appliances that detect and protect against DDoS attacks), telecommunications (Voice over IP (VoIP) and 5G communication), build servers, caches, in-memory databases, and gaming hosts. With more network bandwidth and PPS on tap, heavy-duty analytics applications that retrieve and store massive amounts of data and objects from Amazon Simple Storage Service (Amazon S3) or data lakes will benefit. For workloads that benefit from low-latency local storage, the disk versions of the new instances offer twice as much instance storage as the previous generation.
New Memory-Optimized (R6in/R6idn) Instances
The first memory-optimized instance was the m2, launched in 2009 with the now-quaint Double Extra Large and Quadruple Extra Large names, and a higher ratio of memory to CPU power than the earlier m1 instances. We had yet to learn our naming lesson and launched the High Memory Cluster Eight Extra Large (aka cr1.8xlarge) in 2013, before settling on the r prefix and launching r3 instances in 2013, followed by r4 instances in 2014, and r5 instances in 2018.
And again that brings us to today, and to the new r6in and r6idn instances, also available in 9 sizes:
Name | vCPUs | Memory | Local Storage (r6idn only) | Network Bandwidth | EBS Bandwidth | EBS IOPS |
r6in.large / r6idn.large | 2 | 16 GiB | 118 GB | Up to 25 Gbps | Up to 20 Gbps | Up to 87,500 |
r6in.xlarge / r6idn.xlarge | 4 | 32 GiB | 237 GB | Up to 30 Gbps | Up to 20 Gbps | Up to 87,500 |
r6in.2xlarge / r6idn.2xlarge | 8 | 64 GiB | 474 GB | Up to 40 Gbps | Up to 20 Gbps | Up to 87,500 |
r6in.4xlarge / r6idn.4xlarge | 16 | 128 GiB | 950 GB | Up to 50 Gbps | Up to 20 Gbps | Up to 87,500 |
r6in.8xlarge / r6idn.8xlarge | 32 | 256 GiB | 1900 GB | 50 Gbps | 20 Gbps | 87,500 |
r6in.12xlarge / r6idn.12xlarge | 48 | 384 GiB | 2950 GB (2 x 1425) | 75 Gbps | 30 Gbps | 131,250 |
r6in.16xlarge / r6idn.16xlarge | 64 | 512 GiB | 3800 GB (2 x 1900) | 100 Gbps | 40 Gbps | 175,000 |
r6in.24xlarge / r6idn.24xlarge | 96 | 768 GiB | 5700 GB (4 x 1425) | 150 Gbps | 60 Gbps | 262,500 |
r6in.32xlarge / r6idn.32xlarge | 128 | 1024 GiB | 7600 GB (4 x 1900) | 200 Gbps | 80 Gbps | 350,000 |
The r6in and r6idn instances are available in the US East (Ohio, N. Virginia), US West (Oregon), and Europe (Ireland) regions in On-Demand and Spot form. Savings Plans and Reserved Instances are available.
Inside the Instances
As you can probably guess from these specs and from the blog post that I wrote to launch the c6in instances, all of these new instance types have a lot in common. I’ll do a rare cut-and-paste from that post in order to reiterate all of the other cool features that are available to you:
Ice Lake Processors – The 3rd generation Intel Xeon Scalable processors run at 3.5 GHz, and (according to Intel) offer a 1.46x average performance gain over the prior generation. All-core Intel Turbo Boost mode is enabled on all instance sizes up to and including the 12xlarge. On the larger sizes, you can control the C-states. Intel Total Memory Encryption (TME) is enabled, protecting instance memory with a single, transient 128-bit key generated at boot time within the processor.
NUMA – Short for Non-Uniform Memory Access, this important architectural feature gives you the power to optimize for workloads where the majority of requests for a particular block of memory come from one of the processors, and that block is “closer” (architecturally speaking) to one of the processors. You can control processor affinity (and take advantage of NUMA) on the 24xlarge and 32xlarge instances.
Networking – Elastic Network Adapter (ENA) is available on all sizes of m6in, m6idn, c6in, r6in, and r6idn instances, and Elastic Fabric Adapter (EFA) is available on the 32xlarge instances. In order to make use of these adapters, you will need to make sure that your AMI includes the latest NVMe and ENA drivers. You can also make use of Cluster Placement Groups.
io2 Block Express – You can use all types of EBS volumes with these instances, including the io2 Block Express volumes that we launched earlier this year. As Channy shared in his post (Amazon EBS io2 Block Express Volumes with Amazon EC2 R5b Instances Are Now Generally Available), these volumes can be as large as 64 TiB, and can deliver up to 256,000 IOPS. As you can see from the tables above, you can use a 24xlarge or 32xlarge instance to achieve this level of performance.
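For a rough sense of what provisioning one of these volumes looks like with the AWS SDK for JavaScript v3, here is a hedged sketch; the Region, Availability Zone, size, and IOPS figures below are illustrative assumptions, not recommendations:

import { EC2Client, CreateVolumeCommand } from "@aws-sdk/client-ec2";

const ec2 = new EC2Client({ region: "us-east-1" });

// Create an io2 volume with provisioned IOPS. To actually drive the top of the
// range (up to 256,000 IOPS), attach it to a 24xlarge or 32xlarge instance from
// one of the families above.
const volume = await ec2.send(
  new CreateVolumeCommand({
    AvailabilityZone: "us-east-1a",
    Size: 4096, // GiB
    VolumeType: "io2",
    Iops: 64000,
  })
);

console.log(`Created volume ${volume.VolumeId}`);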
Choosing the Right Instance
Prior to today's launch, you could choose a c5n, m5n, or r5n instance to get the highest network bandwidth on an EC2 instance, or an r5b instance to have access to the highest EBS IOPS performance and high EBS bandwidth. Now, customers who need high networking or EBS performance can choose from a full portfolio of instances with different memory-to-vCPU ratios and instance storage options, by selecting one of the c6in, m6in, m6idn, r6in, or r6idn instances.
The higher performance of the c6in instances will allow you to scale your network-intensive workloads that need a low memory-to-vCPU ratio, such as network virtual appliances, caching servers, and gaming hosts.
The higher performance of m6in instances will allow you to scale your network and/or EBS intensive workloads such as data analytics, and telco applications including 5G User Plane Functions (UPF). You have the option to use the m6idn instance for workloads that benefit from low-latency local storage, such as high-performance file systems, or distributed web-scale in-memory caches.
Similarly, the higher network and EBS performance of the r6in instances will allow you to scale your network-intensive SQL, NoSQL, and in-memory database workloads, with the option to use the r6idn when you need low-latency local storage.
— Jeff;
As always, there’s simply too much for the team to cover and even if a launch doesn’t make this list, that doesn’t mean it’s not noteworthy. Make sure to check out What’s New for a complete rundown of all the AWS re:Invent 2022 announcements.
Here are a few more resources to help you keep up with all the re:Invent news:
Analytics | Business Applications | Artificial Intelligence / Machine Learning | Compute | Containers | Database | Management Tools | Migration & Transfer Services | Security, Identity, & Compliance | Storage |
New — Create and Share Operational Reports at Scale with Amazon QuickSight Paginated Reports
This feature allows customers to create and share highly formatted, personalized reports containing business-critical data to hundreds of thousands of end-users without any infrastructure setup or maintenance, up-front licensing, or long-term commitments.
New Amazon QuickSight API Capabilities to Accelerate Your BI Transformation
New QuickSight API capabilities allow programmatic creation and management of dashboards, analysis, and templates.
New AWS Glue 4.0 – New and Updated Engines, More Data Formats, and More
This version of Glue includes Python 3.10 and Apache Spark 3.3.0, plus native support for the Cloud Shuffle Service Plugin for Spark. It also includes Pandas support, and more.
Announcing AWS Glue for Ray (Preview)
Data engineers can use AWS Glue for Ray to process large datasets with Python and popular Python libraries.
New for Amazon Transcribe – Real-Time Analytics During Live Calls
Real-time call analytics provides APIs for developers to accurately transcribe live calls and at the same time identify customer experience issues and sentiment in real time.
Classifying and Extracting Mortgage Loan Data with Amazon Textract
The new API was created in response to requests from major lenders in the industry to help them process applications faster and reduce errors, which improves the end-customer experience and lowers operating costs.
Amazon CodeWhisperer Adds Enterprise Administrative Controls, Simple Sign-up, and Support for New Languages (Preview)
Administrators can now easily integrate CodeWhisperer with their existing workforce identity solutions, provide access to users and groups, and configure organization-wide settings.
AWS Wickr – A Secure, End-to-End Encrypted Communication Service For Enterprises With Auditing And Regulatory Requirements
Unlike many enterprise communication tools, Wickr uses end-to-end encryption mechanisms to ensure your messages, files, voice, or video calls are solely accessible to their intended recipients.
New – ENA Express: Improved Network Latency and Per-Flow Performance on EC2
Jeff Barr shares how ENA Express gives you a lot more per-flow bandwidth with a lot less variability.
New General Purpose, Compute Optimized, and Memory-Optimized Amazon EC2 Instances with Higher Packet-Processing Performance
The new instance families are designed to support your data-intensive workloads with the highest EBS performance in EC2, and the ability to handle up to twice as many packets per second (PPS) as earlier instances.
New Amazon EC2 Instance Types In the Works – C7gn, R7iz, and Hpc7g
Jeff Barr provides a look at three upcoming and exciting new instance types.
New – Amazon ECS Service Connect Enables Easy Communication Between Microservices
This new capability simplifies building and operating resilient distributed applications. You can add a layer of resilience to your ECS service communication and get traffic insights with no changes to your application code.
Announcing the availability of Microsoft Office Amazon Machine Images (AMIs) on Amazon EC2 with AWS provided licenses
With this offering, customers have the flexibility to run Microsoft Office dependent applications on EC2.
New – AWS Marketplace for Containers Now Supports Direct Deployment to Amazon EKS Clusters
This new launch makes it easier for you to find third-party Kubernetes operation software from the Amazon EKS console and deploy it to your EKS clusters using the same commands used to deploy EKS add-ons.
New – Amazon RDS Optimized Reads and Optimized Writes
These two new features will accelerate your Amazon RDS for MySQL workloads.
New – Fully Managed Blue/Green Deployments in Amazon Aurora and Amazon RDS
This new feature for Amazon Aurora with MySQL compatibility, Amazon RDS for MySQL, and Amazon RDS for MariaDB, enables you to make database updates safer, simpler, and faster.
New – AWS Config Rules Now Support Proactive Compliance
This release extends AWS Config rules to support proactive mode so that they can be run at any time before provisioning and save time spent to implement custom pre-deployment validations.
New for AWS Control Tower – Comprehensive Controls Management (Preview)
You can use the new capability to apply managed preventative, detective, and proactive controls to accounts and organizational units by service, control objective, or compliance framework.
Protect Sensitive Data with Amazon CloudWatch Logs
This new set of capabilities for Amazon CloudWatch Logs leverages pattern matching and machine learning (ML) to detect and protect sensitive log data in transit.
New – Amazon CloudWatch Cross-Account Observability
This new capability lets you search, analyze, and correlate cross-account telemetry data stored in CloudWatch such as metrics, logs, and traces.
New – A Fully Managed Schema Conversion in AWS Database Migration Service
AWS DMS Schema Conversion streamlines database migrations by making schema assessment and conversion available inside AWS DMS. You can now plan, assess, convert and migrate under one central DMS service.
AWS Application Migration Service Major Updates – New Migration Servers Grouping, Updated Launch, and Post-Launch Template
These three major updates will support your migration projects of any size.
Amazon Inspector Now Scans AWS Lambda Functions for Vulnerabilities
Until now, customers who wanted to analyze their mixed workloads (including EC2 instances, container images, and Lambda functions) against common vulnerabilities needed to use AWS and third-party tools.
Automated Data Discovery for Amazon Macie
This new capability allows you to gain visibility into where your sensitive data resides on Amazon Simple Storage Service (Amazon S3) at a fraction of the cost of running a full data inspection across all your S3 buckets.
AWS announces Amazon Verified Permissions (Preview)
This central fine-grained permissions management system simplifies changing and updating permission rules in a single place without needing to change the code.
New – Failover Controls for Amazon S3 Multi-Region Access Points
These controls let you shift S3 data access request traffic routed through an Amazon S3 Multi-Region Access Point to an alternate AWS Region within minutes to test and build highly available applications for business continuity.
New – Announcing Amazon EFS Elastic Throughput
This new throughput mode is designed to provide your applications with as much throughput as they need with pay-as-you-use pricing.
New for AWS Backup – Protect and Restore Your CloudFormation Stacks
You now have an automated solution to create and restore your applications with a simplified experience, eliminating the need to manage custom scripts.
New – Amazon Redshift Support in AWS Backup
AWS Backup allows you to define a central backup policy to manage data protection of your applications and can now also protect your Amazon Redshift clusters.
Announcing Automated in-AWS Failback for AWS Elastic Disaster Recovery
The new automated support provides a simplified and expedited experience to fail back Amazon Elastic Compute Cloud (Amazon EC2) instances to the original Region, and both failover and failback processes (for on-premises or in-AWS recovery) can be conveniently started from the AWS Management Console.
Local development gives you a fully-controllable and easy-to-debug testing environment. At the start of this year, we brought this experience to Workers developers by launching Miniflare 2.0: a local Cloudflare Workers simulator. Miniflare 2 came with features like step-through debugging support, detailed console.logs, pretty source-mapped error pages, live reload and a highly-configurable unit testing environment. Not only that, but we also incorporated Miniflare into Wrangler, our Workers CLI, to enable wrangler dev’s --local mode.
Today, we’re taking local development to the next level! In addition to introducing new support for migrating existing projects to your local development environment, we're making it easier to work with your remote data—locally! Most importantly, we're releasing a much more accurate Miniflare 3, powered by the recently open-sourced workerd runtime—the same runtime used by Cloudflare Workers!
One of the superpowers of having a local development environment is that you can test changes without affecting users in production. A great local environment offers a level of fidelity on par with production.
The way we originally approached local development was with Miniflare 2, which reimplemented Workers runtime APIs in JavaScript. Unfortunately, there were subtle behavior mismatches between these re-implementations and the real Workers runtime. These types of issues are really difficult for developers to debug, as they don’t appear locally, and step-through debugging of deployed Workers isn’t possible yet. For example, the following Worker returns responses successfully in Miniflare 2, so we might assume it’s safe to publish:
let cachedResponsePromise;
export default {
async fetch(request, env, ctx) {
// Let's imagine this fetch takes a few seconds. To speed up our worker, we
// decide to only fetch on the first request, and reuse the result later.
// This works fine in Miniflare 2, so we must be good right?
cachedResponsePromise ??= fetch("https://example.com");
return (await cachedResponsePromise).clone();
},
};
However, as soon as we send multiple requests to our deployed Worker, it fails with Error: Cannot perform I/O on behalf of a different request. The problem here is that response bodies created in one request’s handler cannot be accessed from a different request's handler. This limitation allows Cloudflare to improve overall Worker performance, but it was almost impossible for Miniflare 2 to detect these types of issues locally. In this particular case, the best solution is to cache using fetch itself.
Additionally, because the Workers runtime uses a very recent version of V8, it supports some JavaScript features that aren’t available in all versions of Node.js. This meant a few features implemented in Workers, like Array#findLast, weren’t always available in Miniflare 2.
With the Workers runtime now open-sourced, Miniflare 3 can leverage the same implementations that are deployed on Cloudflare’s network, giving bug-for-bug compatibility and practically eliminating behavior mismatches. 🎉
This radically simplifies our implementation too. We were able to remove over 50,000 lines of code from Miniflare 2. Of course, we still kept all the Miniflare special-sauce that makes development fun like live reload and detailed logging. 🙂
We know that many developers choose to test their Workers remotely on the Cloudflare network as it gives them the ability to test against real data. Testing against fake data in staging and local environments is sometimes difficult, as it never quite matches the real thing.
With Miniflare 3, we’re blurring the lines between local and remote development, by bringing real data to your machine as an experimental opt-in feature. If enabled, Miniflare will read and write data to namespaces on the Cloudflare network, as your Worker would when deployed. This is only supported with Workers KV for now, but we’re exploring similar solutions for R2 and D1.
With Miniflare 3 now effectively as accurate as the real Workers environment, and the ability to access real data locally, we’re revisiting the decision to make remote development the initial Wrangler experience. In a future update, wrangler dev --local will become the default; the --local flag will no longer be required. Benchmarking suggests this will bring an approximate 10x reduction in startup time and a massive 60x reduction in script reload times! Over the next few weeks, we’ll be focusing on further optimizing Wrangler’s performance to bring you the fastest Workers development experience yet!
wrangler init --from-dash
We want all developers to be able to take advantage of the improved local experience, so we’re making it easy to start a local Wrangler project from an existing Worker that’s been developed in the Cloudflare dashboard. With Node.js installed, run npx wrangler init --from-dash <your_worker_name> in your terminal to set up a new project with all your existing code and bindings, such as KV namespaces, configured. You can now seamlessly continue development of your application locally, taking advantage of all the developer experience improvements Wrangler and Miniflare provide. When you’re ready to deploy your worker, run npx wrangler publish.
Over the next few months, the Workers team is planning to further improve the local development experience with a specific focus on automated testing. Already, we’ve released a preliminary API for programmatic end-to-end tests with wrangler dev, but we’re also investigating ways of bringing Miniflare 2’s Jest/Vitest environments to workerd. We’re also considering creating extensions for popular IDEs to make developing workers even easier. 👀
Miniflare 3.0 is now included in Wrangler! Try it out by running npx wrangler@latest dev --experimental-local. Let us know what you think in the #wrangler channel on the Cloudflare Developers Discord, and please open a GitHub issue if you hit any unexpected behavior.
Today, we are announcing the general availability of OpenAPI Schemas for the Cloudflare API. These are published via GitHub and will be updated regularly as Cloudflare adds and updates APIs. OpenAPI is the widely adopted standard for defining APIs in a machine-readable format. OpenAPI Schemas allow for the ability to plug our API into a wide breadth of tooling to accelerate development for ourselves and customers. Internally, it will make it easier for us to maintain and update our APIs. Before getting into those benefits, let’s start with the basics.
Much of the Internet is built upon APIs (Application Programming Interfaces) or provides them as services to clients all around the world. This allows computers to talk to each other in a standardized fashion. OpenAPI is a widely adopted standard for how to define APIs. This allows other machines to reliably parse those definitions and use them in interesting ways. Cloudflare’s own API Shield product uses OpenAPI schemas to provide schema validation to ensure only well-formed API requests are sent to your origin.
Cloudflare itself has an API that customers can use to interface with our security and performance products from other places on the Internet. How do we define our own APIs? In the past we used a standard called JSON Hyper-Schema. That had served us well, but as time went on we wanted to adopt more tooling that could both benefit ourselves internally and make our customers’ lives easier. The OpenAPI community has flourished over the past few years, providing many capabilities, which we will discuss below, that were unavailable while we used JSON Hyper-Schema. As of today, we now use OpenAPI.
You can learn more about OpenAPI itself here. Having an open, well-understood standard for defining our APIs allows for shared tooling and infrastructure to be used that can read these standard definitions. Let’s take a look at a few examples.
Most customers won’t need to use the schemas themselves to see value. The first system leveraging OpenAPI schemas is our new API Docs that were announced today. Because we now have OpenAPI schemas, we leverage the open source tool Stoplight Elements to aid in generating this new doc site. This allowed us to retire our previously custom-built site that was hard to maintain. Additionally, many engineers at Cloudflare are familiar with OpenAPI, so teams can write new schemas more quickly and are less likely to make mistakes when defining new APIs, because they are working with a standard they already understand.
There are ways to leverage the schemas directly, however. The OpenAPI community has a huge number of tools that only require a set of schemas to be able to use. Two such examples are mocking APIs and library generation.
Say you have code that calls Cloudflare’s API and you want to be able to easily run unit tests locally or integration tests in your CI/CD pipeline. While you could just call Cloudflare’s API in each run, you may not want to for a few reasons. First, you may want to run tests frequently enough that managing the creation and tear down of resources becomes a pain. Also, in many of these tests you aren’t trying to validate logic in Cloudflare necessarily, but your own system’s behavior. In this case, mocking Cloudflare’s API would be ideal since you can gain confidence that you aren’t violating Cloudflare’s API contract, but without needing to worry about specifics of managing real resources. Additionally, mocking allows you to simulate different scenarios, like being rate limited or receiving 500 errors. This allows you to test your code for typically rare circumstances that can end up having a serious impact.
As an example, Stoplight Prism could be used to mock Cloudflare’s API for testing purposes. With a local copy of Cloudflare’s API Schemas you can run the following command to spin up a local mock server:
$ docker run --init --rm \
-v /home/user/git/api-schemas/openapi.yaml:/tmp/openapi.yaml \
-p 4010:4010 stoplight/prism:4 \
mock -h 0.0.0.0 /tmp/openapi.yaml
Then you can send requests to the mock server in order to validate that your use of Cloudflare’s API doesn’t violate the API contract locally:
$ curl -sX PUT localhost:4010/zones/f00/activation_check \
-Hx-auth-email:foo@bar.com -Hx-auth-key:foobarbaz | jq
{
"success": true,
"errors": [],
"messages": [],
"result": {
"id": "023e105f4ecef8ad9ca31a8372d0c353"
}
}
This means faster development and shorter test runs while still catching API contract issues early before they get merged or deployed.
Cloudflare maintains libraries for many tools and languages, like Terraform and Go, but we don’t support every possible programming language. Fortunately, using a tool like OpenAPI Generator, you can feed in Cloudflare’s API schemas and generate a client library in a wide range of languages to then use in your code to talk to Cloudflare’s API. For example, you could generate a Java library using the following commands:
git clone https://github.com/openapitools/openapi-generator
cd openapi-generator
mvn clean package
java -jar modules/openapi-generator-cli/target/openapi-generator-cli.jar generate \
-i https://raw.githubusercontent.com/cloudflare/api-schemas/main/openapi.yaml \
-g java \
-o /var/tmp/java_api_client
And then start using that client in your Java code to talk to Cloudflare’s API.
As mentioned earlier, we previously used JSON Hyper-Schema to define our APIs. We have roughly 600 endpoints that were already defined in the schemas. Here is a snippet of what one endpoint looks like in JSON Hyper-Schema:
{
"title": "List Zones",
"description": "List, search, sort, and filter your zones.",
"rel": "collection",
"href": "zones",
"method": "GET",
"schema": {
"$ref": "definitions/zone.json#/definitions/collection_query"
},
"targetSchema": {
"$ref": "#/definitions/response_collection"
},
"cfOwnership": "www",
"cfPlanAvailability": {
"free": true,
"pro": true,
"business": true,
"enterprise": true
},
"cfPermissionsRequired": {
"enum": [
"#zone:read"
]
}
}
Let’s look at the same endpoint in OpenAPI:
/zones:
get:
description: List, search, sort, and filter your zones.
operationId: zone-list-zones
responses:
4xx:
content:
application/json:
schema:
allOf:
- $ref: '#/components/schemas/components-schemas-response_collection'
- $ref: '#/components/schemas/api-response-common-failure'
description: List Zones response failure
"200":
content:
application/json:
schema:
$ref: '#/components/schemas/components-schemas-response_collection'
description: List Zones response
security:
- api_email: []
api_key: []
summary: List Zones
tags:
- Zone
x-cfPermissionsRequired:
enum:
- '#zone:read'
x-cfPlanAvailability:
business: true
enterprise: true
free: true
pro: true
You can see that the two look fairly similar, and for the most part the same information is contained in each, including the method type, a description, and request and response definitions (although those are linked via $refs). The value of migrating from one to the other isn’t the change in how we define the schemas themselves, but in what we can do with these schemas. Numerous tools can parse the latter, OpenAPI, while far fewer can parse the former, JSON Hyper-Schema.
If this one API was all that made up the Cloudflare API, it would be easy to just convert the JSON Hyper-Schema into the OpenAPI Schema by hand and call it a day. Doing this 600 times, however, was going to be a huge undertaking. When considering that teams are constantly adding new endpoints, it would be impossible to keep up. It was also the case that our existing API docs used the existing JSON Hyper-Schema, so that meant that we would need to keep both schemas up to date during any transition period. There had to be a better way.
Given that both JSON Hyper-Schema and OpenAPI are standards, it stands to reason that it should be possible to take a file in one format and convert it to the other, right? Luckily the answer is yes! We built a tool that took all existing JSON Hyper-Schemas and output fully compliant OpenAPI schemas. This of course didn’t happen overnight, but because of existing OpenAPI tooling, we could iteratively improve the auto-converter and run OpenAPI validation tooling over the output schemas to see what issues the conversion tool still had.
After many iterations and improvements to the conversion tool, we finally had fully compliant OpenAPI Spec schemas being auto-generated from our existing JSON Hyper-Schema. While we were building this tool, teams kept adding and updating the existing schemas and our Product Content team was also updating text in the schemas to make our API docs easier to use. The benefit of this process is we didn’t have to slow any of that work down since anything that changed in the old schemas was automatically reflected in the new schemas!
Once the tool was ready, the remaining step was to decide when and how we would stop making updates to the JSON Hyper-Schemas and move all teams to the OpenAPI Schemas. The (now old) API docs were the biggest concern, given they only understood JSON Hyper-Schema. Thanks to the help of our Developer Experience and Product Content teams, we were able to launch the new API docs today and can officially cut over to OpenAPI today as well!
Now that we have fully moved over to OpenAPI, more opportunities become available. Internally, we will be investigating what tooling we can adopt in order to help reduce the effort of individual teams and speed up API development. One idea we are exploring is automatically creating OpenAPI schemas from code annotations. Externally, we now have the foundational tools necessary to begin exploring how to auto-generate and support more programming language libraries for customers to use. We are also excited to see what you may do with the schemas yourself, so if you do something cool or have ideas, don’t hesitate to share them with us!
Duel Corp. looks so pretty that you’d be forgiven for thinking that it’s an upcoming RPG from Square Enix. But it’s actually made by some indie developers who decided to make a Soulslike from retro pixel art. And the effect is incredible.
gRPC is a modern open source high performance Remote Procedure Call (RPC) framework that can run in any environment. It plays a critical role in efficiently connecting microservices in and across data centers with pluggable support for load balancing, tracing, health checking, authentication and other cross-cutting features. It may also be applied in the last mile of distributed computing to connect devices, mobile applications and browsers to backend services hosted on the public cloud. This unique position in the software stack can provide a clear end-to-end view of the whole system. A new gRPC observability feature provides this clarity for workloads running on, and/or able to connect to, Google Cloud.
gRPC observability provides three different types of data:
1. Logs for key RPC events, including:
When the client/server sends or receives the metadata of an RPC
When the client/server sends or receives the message payload of an RPC
When the client/server finishes an RPC with a final status (OK, or errors)
2. Metrics (or statistical data) for key RPC events, including:
How many bytes the client/server sent or received
How many RPCs the client/server started or completed
How long RPCs take to complete between the client and server (known as round trip latency)
3. Distributed traces for RPCs and their fanout RPCs across the system. For example, when serving an RPC from upstream, a server may need to create multiple RPCs to its own backends. The distributed trace helps the user understand the relationships between these RPCs, the latency for each of them, and key events happening throughout the system.
When developers enable the gRPC observability feature in their binaries, the gRPC library will report the logging, metrics, and tracing data to Google Cloud’s operations suite. Once the observability data is collected, users can leverage the Google Cloud console to:
Visualize the observability data
Export the observability data out of the operations tools for further analysis with other tools.
Logging
gRPC observability provides logs for key RPC events with information to help developers understand the context when these events occur. This contextual information can include which gRPC service/method is being invoked, whether the events happen on the client side or server side, whether it’s sending metadata or payloads, the size of the corresponding data, and even the concrete content of the metadata and/or payloads. These log entries are then presented in Cloud Logging with helpers to filter and even customize the query to search related logs.
Metrics
gRPC observability provides several metrics: the round trip latency of RPCs, how many RPCs were started and finished during a specific period of time, and even the number of bytes sent/received over the wire. All these metrics can be grouped by a few important parameters, including service/method name and final status. Platform-specific metrics can be included as well, depending on the Google Cloud environment and the gRPC payload actually running. For example, on the Google Kubernetes Engine (GKE) platform, developers can group/filter by namespace, container, and pod information fields to dig into more granular statistical data. With these metrics, Cloud Monitoring enables users to identify problems including:
Which container is having higher than normal latency
Which pod is having higher than normal error rates
And others.
Tracing
gRPC observability also allows developers to configure the sampling rate of RPCs. The sampling decision is propagated across the whole system, thus no matter where the RPCs actually happen, developers can always see a complete, end-to-end distributed trace for their processing logic. Sampled RPCs and any further RPCs triggered by them are displayed in Cloud Trace as parent/children spans.
With gRPC observability, telemetry data (logs, metrics, traces) of gRPC workloads can be collected and reported to the Google Cloud operations suite. It helps developers get a better understanding of their systems and enables them to diagnose problems such as:
Which microservices have suddenly become abnormally slow (long processing latency on the server side)?
Which microservices suddenly process less QPS, and is there a pattern?
Is there a potential network issue for a particular microservice, where high latency is measured on the client side but normal latency on the server side? If so, can we locate the problem in a particular cluster, or even a particular node/pod?
To get started with gRPC observability, see our user guide.
Early on when we learn to program, we get introduced to the concept of recursion. And that it is handy for computing, among other things, sequences defined in terms of recurrences. Such as the famous Fibonacci numbers - F(n) = F(n-1) + F(n-2).
Later on, perhaps when diving into multithreaded programming, we come to terms with the fact that the stack space for call frames is finite. And that there is an “okay” way and a “cool” way to calculate the Fibonacci numbers using recursion:
// fib_okay.c
#include <stdint.h>
uint64_t fib(uint64_t n)
{
if (n == 0 || n == 1)
return 1;
return fib(n - 1) + fib(n - 2);
}
Listing 1. An okay Fibonacci number generator implementation
// fib_cool.c
#include <stdint.h>
static uint64_t fib_tail(uint64_t n, uint64_t a, uint64_t b)
{
if (n == 0)
return a;
if (n == 1)
return b;
return fib_tail(n - 1, b, a + b);
}
uint64_t fib(uint64_t n)
{
return fib_tail(n, 1, 1);
}
Listing 2. A better version of the same
If we take a look at the machine code the compiler produces, the “cool” variant translates to a nice and tight sequence of instructions:
⚠ DISCLAIMER: This blog post is assembly-heavy. We will be looking at assembly code for x86-64, arm64 and BPF architectures. If you need an introduction or a refresher, I can recommend “Low-Level Programming” by Igor Zhirkov for x86-64, and “Programming with 64-Bit ARM Assembly Language” by Stephen Smith for arm64. For BPF, see the Linux kernel documentation.
Listing 3. fib_cool.c compiled for x86-64 and arm64
The “okay” variant, disappointingly, leads to more instructions than a listing can fit. It is a spaghetti of basic blocks.
But more importantly, it is not free of x86 call instructions.
$ objdump -d fib_okay.o | grep call
10c: e8 00 00 00 00 call 111 <fib+0x111>
$ objdump -d fib_cool.o | grep call
$
This has an important consequence - as fib recursively calls itself, the stacks keep growing. We can observe it with a bit of help from the debugger.
$ gdb --quiet --batch --command=trace_rsp.gdb --args ./fib_okay 6
Breakpoint 1 at 0x401188: file fib_okay.c, line 3.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
n = 6, %rsp = 0xffffd920
n = 5, %rsp = 0xffffd900
n = 4, %rsp = 0xffffd8e0
n = 3, %rsp = 0xffffd8c0
n = 2, %rsp = 0xffffd8a0
n = 1, %rsp = 0xffffd880
n = 1, %rsp = 0xffffd8c0
n = 2, %rsp = 0xffffd8e0
n = 1, %rsp = 0xffffd8c0
n = 3, %rsp = 0xffffd900
n = 2, %rsp = 0xffffd8e0
n = 1, %rsp = 0xffffd8c0
n = 1, %rsp = 0xffffd900
13
[Inferior 1 (process 50904) exited normally]
$
While the “cool” variant makes no use of the stack.
$ gdb --quiet --batch --command=trace_rsp.gdb --args ./fib_cool 6
Breakpoint 1 at 0x40118a: file fib_cool.c, line 13.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
n = 6, %rsp = 0xffffd938
13
[Inferior 1 (process 50949) exited normally]
$
Where did the calls go?
The smart compiler turned the last function call in the body into a regular jump. Why was it allowed to do that?
It is the last instruction in the function body we are talking about. The caller stack frame is going to be destroyed right after we return anyway. So why keep it around when we can reuse it for the callee’s stack frame?
This optimization, known as tail call elimination, leaves us with no function calls in the “cool” variant of our fib implementation. There was only one call to eliminate - right at the end.
Once applied, the call becomes a jump (loop). If assembly is not your second language, decompiling the fib_cool.o object file with Ghidra helps see the transformation:
long fib(ulong param_1)
{
long lVar1;
long lVar2;
long lVar3;
if (param_1 < 2) {
lVar3 = 1;
}
else {
lVar3 = 1;
lVar2 = 1;
do {
lVar1 = lVar3;
param_1 = param_1 - 1;
lVar3 = lVar2 + lVar1;
lVar2 = lVar1;
} while (param_1 != 1);
}
return lVar3;
}
Listing 4. fib_cool.o decompiled by Ghidra
This is very much desired. Not only is the generated machine code much shorter. It is also way faster due to lack of calls, which pop up on the profile for fib_okay.
But I am no performance ninja and this blog post is not about compiler optimizations. So why am I telling you about it?
The concept of tail call elimination made its way into the BPF world. Although not in the way you might expect. Yes, the LLVM compiler does get rid of the trailing function calls when building for -target bpf. The transformation happens at the intermediate representation level, so it is backend agnostic. This can save you some BPF-to-BPF function calls, which you can spot by looking for call -N instructions in the BPF assembly.
However, when we talk about tail calls in the BPF context, we usually have something else in mind. And that is a mechanism, built into the BPF JIT compiler, for chaining BPF programs.
We first adopted BPF tail calls when building our XDP-based packet processing pipeline. Thanks to it, we were able to divide the processing logic into several XDP programs. Each responsible for doing one thing.
BPF tail calls have served us well since then. But they do have their caveats. Until recently it was impossible to have both BPF tail calls and BPF-to-BPF function calls in the same XDP program on arm64, which is one of the supported architectures for us.
Why? Before we get to that, we have to clarify what a BPF tail call actually does.
BPF exposes the tail call mechanism through the bpf_tail_call helper, which we can invoke from our BPF code. We don’t directly point out which BPF program we would like to call. Instead, we pass it a BPF map (a container) capable of holding references to BPF programs (BPF_MAP_TYPE_PROG_ARRAY), and an index into the map.
long bpf_tail_call(void *ctx, struct bpf_map *prog_array_map, u32 index)
Description
This special helper is used to trigger a "tail call", or
in other words, to jump into another eBPF program. The
same stack frame is used (but values on stack and in reg‐
isters for the caller are not accessible to the callee).
This mechanism allows for program chaining, either for
raising the maximum number of available eBPF instructions,
or to execute given programs in conditional blocks. For
security reasons, there is an upper limit to the number of
successive tail calls that can be performed.
At first glance, this looks somewhat similar to the execve(2) syscall. It is easy to mistake it for a way to execute a new program from the current program context. To quote the excellent BPF and XDP Reference Guide from the Cilium project documentation:
Tail calls can be seen as a mechanism that allows one BPF program to call another, without returning to the old program. Such a call has minimal overhead as unlike function calls, it is implemented as a long jump, reusing the same stack frame.
But once we add BPF function calls into the mix, it becomes clear that the BPF tail call mechanism is indeed an implementation of tail call elimination, rather than a way to replace one program with another:
Tail calls, before the actual jump to the target program, will unwind only its current stack frame. As we can see in the example above, if a tail call occurs from within the sub-function, the function’s (func1) stack frame will be present on the stack when a program execution is at func2. Once the final function (func3) function terminates, all the previous stack frames will be unwinded and control will get back to the caller of BPF program caller.
Alas, one with sometimes slightly surprising semantics. Consider code like the example below, where a BPF function calls the bpf_tail_call() helper:
struct {
__uint(type, BPF_MAP_TYPE_PROG_ARRAY);
__uint(max_entries, 1);
__uint(key_size, sizeof(__u32));
__uint(value_size, sizeof(__u32));
} bar SEC(".maps");
SEC("tc")
int serve_drink(struct __sk_buff *skb __unused)
{
return 0xcafe;
}
static __noinline
int bring_order(struct __sk_buff *skb)
{
bpf_tail_call(skb, &bar, 0);
return 0xf00d;
}
SEC("tc")
int server1(struct __sk_buff *skb)
{
return bring_order(skb);
}
SEC("tc")
int server2(struct __sk_buff *skb)
{
__attribute__((musttail)) return bring_order(skb);
}
We have two seemingly not so different BPF programs - server1() and server2(). They both call the same BPF function bring_order(). The function tail calls into the serve_drink() program, if the bar[0] map entry points to it (let’s assume that).
Do both server1 and server2 return the same value? Turns out that - no, they don’t. We get a hex 🍔 from server1, and a ☕ from server2. How so?
First thing to notice is that a BPF tail call unwinds just the current function stack frame. Code past the bpf_tail_call() invocation in the function body never executes, providing the tail call is successful (the map entry was set, and the tail call limit has not been reached).
When the tail call finishes, control returns to the caller of the function which made the tail call. Applying this to our example, the control flow is serverX() --> bring_order() --> bpf_tail_call() --> serve_drink() -return-> serverX() for both programs.
The second thing to keep in mind is that the compiler does not know that the bpf_tail_call() helper changes the control flow. Hence, the unsuspecting compiler optimizes the code as if the execution would continue past the BPF tail call.
In our case, the compiler thinks it is okay to propagate the constant which bring_order() returns to server1(). Possibly catching us by surprise, if we didn’t check the generated BPF assembly.
We can prevent it by forcing the compiler to make a tail call to bring_order(). This way we ensure that whatever bring_order() returns will be used as the server2() program result.
🛈 General rule - for least surprising results, use the musttail attribute when calling a function that contains a BPF tail call.
How does the bpf_tail_call() helper work underneath then? And why won’t the BPF verifier let us mix function calls with tail calls on arm64? Time to dig deeper.
What does a bpf_tail_call() helper call translate to after the BPF JIT for x86-64 has compiled it? How does the implementation guarantee that we don’t end up in a tail call loop forever?
To find out we will need to piece together a few things.
First, there is the BPF JIT compiler source code, which lives in arch/x86/net/bpf_jit_comp.c. Its code is annotated with helpful comments. We will focus our attention on the following call chain within the JIT:
do_jit() 🔗
emit_prologue() 🔗
push_callee_regs() 🔗
for (i = 1; i <= insn_cnt; i++, insn++) {
switch (insn->code) {
case BPF_JMP | BPF_CALL:
/* emit function call */ 🔗
case BPF_JMP | BPF_TAIL_CALL:
emit_bpf_tail_call_direct() 🔗
case BPF_JMP | BPF_EXIT:
/* emit epilogue */ 🔗
}
}
It is sometimes hard to visualize the generated instruction stream just from reading the compiler code. Hence, we will also want to inspect the input - BPF instructions - and the output - x86-64 instructions - of the JIT compiler.
To inspect BPF and x86-64 instructions of a loaded BPF program, we can use bpftool prog dump. However, first we must populate the BPF map used as the tail call jump table. Otherwise, we might not be able to see the tail call jump! This is due to optimizations that use instruction patching when the index into the program array is known at load time.
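The source of tail_call_ex1.o is not reproduced in the post. For reference, here is a minimal sketch of what tail_call_ex1.bpf.c plausibly looks like, reconstructed from the program and map names in the commands and dumps below; the section names, map sizing and license line are assumptions:

// tail_call_ex1.bpf.c - a sketch reconstructed from the bpftool dumps below
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
    __uint(type, BPF_MAP_TYPE_PROG_ARRAY);
    __uint(max_entries, 1);              /* assumed size */
    __uint(key_size, sizeof(__u32));
    __uint(value_size, sizeof(__u32));
} jmp_table SEC(".maps");

SEC("tc")                                /* assumed program type; it takes a __sk_buff */
int target_prog(struct __sk_buff *skb)
{
    return 0xcafe;                       /* matches the dump of target_prog further below */
}

SEC("tc")
int entry_prog(struct __sk_buff *skb)
{
    bpf_tail_call(skb, &jmp_table, 0);
    return 0xf00d;                       /* only reached if the tail call does not happen */
}

char __license[] SEC("license") = "GPL";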
# bpftool prog loadall ./tail_call_ex1.o /sys/fs/bpf pinmaps /sys/fs/bpf
# bpftool map update pinned /sys/fs/bpf/jmp_table key 0 0 0 0 value pinned /sys/fs/bpf/target_prog
# bpftool prog dump xlated pinned /sys/fs/bpf/entry_prog
int entry_prog(struct __sk_buff * skb):
; bpf_tail_call(skb, &jmp_table, 0);
0: (18) r2 = map[id:24]
2: (b7) r3 = 0
3: (85) call bpf_tail_call#12
; return 0xf00d;
4: (b7) r0 = 61453
5: (95) exit
# bpftool prog dump jited pinned /sys/fs/bpf/entry_prog
int entry_prog(struct __sk_buff * skb):
bpf_prog_4f697d723aa87765_entry_prog:
; bpf_tail_call(skb, &jmp_table, 0);
0: nopl 0x0(%rax,%rax,1)
5: xor %eax,%eax
7: push %rbp
8: mov %rsp,%rbp
b: push %rax
c: movabs $0xffff888102764800,%rsi
16: xor %edx,%edx
18: mov -0x4(%rbp),%eax
1e: cmp $0x21,%eax
21: jae 0x0000000000000037
23: add $0x1,%eax
26: mov %eax,-0x4(%rbp)
2c: nopl 0x0(%rax,%rax,1)
31: pop %rax
32: jmp 0xffffffffffffffe3 // bug? 🤔
; return 0xf00d;
37: mov $0xf00d,%eax
3c: leave
3d: ret
There is a caveat. The target addresses for tail call jumps in bpftool prog dump jited output will not make any sense. To discover the real jump targets, we have to peek into the kernel memory. That can be done with gdb after we find the address of our JIT’ed BPF programs in /proc/kallsyms:
# tail -2 /proc/kallsyms
ffffffffa0000720 t bpf_prog_f85b2547b00cbbe9_target_prog [bpf]
ffffffffa0000748 t bpf_prog_4f697d723aa87765_entry_prog [bpf]
# gdb -q -c /proc/kcore -ex 'x/18i 0xffffffffa0000748' -ex 'quit'
[New process 1]
Core was generated by `earlyprintk=serial,ttyS0,115200 console=ttyS0 psmouse.proto=exps "virtme_stty_c'.
#0 0x0000000000000000 in ?? ()
0xffffffffa0000748: nopl 0x0(%rax,%rax,1)
0xffffffffa000074d: xor %eax,%eax
0xffffffffa000074f: push %rbp
0xffffffffa0000750: mov %rsp,%rbp
0xffffffffa0000753: push %rax
0xffffffffa0000754: movabs $0xffff888102764800,%rsi
0xffffffffa000075e: xor %edx,%edx
0xffffffffa0000760: mov -0x4(%rbp),%eax
0xffffffffa0000766: cmp $0x21,%eax
0xffffffffa0000769: jae 0xffffffffa000077f
0xffffffffa000076b: add $0x1,%eax
0xffffffffa000076e: mov %eax,-0x4(%rbp)
0xffffffffa0000774: nopl 0x0(%rax,%rax,1)
0xffffffffa0000779: pop %rax
0xffffffffa000077a: jmp 0xffffffffa000072b
0xffffffffa000077f: mov $0xf00d,%eax
0xffffffffa0000784: leave
0xffffffffa0000785: ret
# gdb -q -c /proc/kcore -ex 'x/7i 0xffffffffa0000720' -ex 'quit'
[New process 1]
Core was generated by `earlyprintk=serial,ttyS0,115200 console=ttyS0 psmouse.proto=exps "virtme_stty_c'.
#0 0x0000000000000000 in ?? ()
0xffffffffa0000720: nopl 0x0(%rax,%rax,1)
0xffffffffa0000725: xchg %ax,%ax
0xffffffffa0000727: push %rbp
0xffffffffa0000728: mov %rsp,%rbp
0xffffffffa000072b: mov $0xcafe,%eax
0xffffffffa0000730: leave
0xffffffffa0000731: ret
#
Lastly, it will be handy to have a cheat sheet of the mapping from BPF registers (r0, r1, …) to the hardware registers (rax, rdi, …) that the JIT compiler uses.
BPF | x86-64 |
---|---|
r0 | rax |
r1 | rdi |
r2 | rsi |
r3 | rdx |
r4 | rcx |
r5 | r8 |
r6 | rbx |
r7 | r13 |
r8 | r14 |
r9 | r15 |
r10 | rbp |
internal | r9-r12 |
Now we are prepared to work out what happens when we use a BPF tail call.
In essence, bpf_tail_call() emits a jump into another function, reusing the current stack frame. It is just like a regular optimized tail call, but with a twist.
Because of the BPF security guarantees - execution terminates, no stack overflows - there is a limit on the number of tail calls we can have (MAX_TAIL_CALL_CNT = 33).
Counting the tail calls across BPF programs is not something we can do at load-time. The jump table (BPF program array) contents can change after the program has been verified. Our only option is to keep track of tail calls at run-time. That is why the JIT’ed code for the bpf_tail_call() helper checks and updates the tail_call_cnt counter.
The updated count is then passed from one BPF program to another, and from one BPF function to another, as we will see, through the rax register (r0 in BPF).
Luckily for us, the x86-64 calling convention dictates that the rax register does not partake in passing function arguments, but rather holds the function return value. The JIT can repurpose it to pass an additional - hidden - argument.
The function body is, however, free to make use of the r0/rax register in any way it pleases. This explains why we want to save the tail_call_cnt passed via rax onto stack right after we jump to another program. bpf_tail_call() can later load the value from a known location on the stack.
This way, the code emitted for each bpf_tail_call() invocation, and the BPF function prologue, work in tandem, keeping track of tail call count across BPF program boundaries.
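Putting those pieces together, the run-time logic emitted for each bpf_tail_call() site can be summarized in plain C. This is only a conceptual sketch - the types and names below are made up for illustration, it is not the kernel's code - but it mirrors the checks visible in the jited dump above and in the annotated arm64 listing later on:

#define MAX_TAIL_CALL_CNT 33

/* Illustrative stand-ins for the kernel's bpf_prog / bpf_array structures. */
struct fake_prog { int (*bpf_func)(void *ctx); };
struct fake_prog_array {
    unsigned int max_entries;
    struct fake_prog *ptrs[32];
};

static int fake_tail_call(void *ctx, struct fake_prog_array *array,
                          unsigned int index, unsigned int *tail_call_cnt)
{
    if (index >= array->max_entries)
        return -1;                        /* "goto out": run the code after bpf_tail_call() */
    if (*tail_call_cnt >= MAX_TAIL_CALL_CNT)
        return -1;
    (*tail_call_cnt)++;

    struct fake_prog *prog = array->ptrs[index];
    if (!prog)
        return -1;

    /* The real JIT unwinds the current BPF stack frame and jumps past the
     * target program's prologue; a plain C model can only express it as a call. */
    return prog->bpf_func(ctx);
}

The three early exits correspond to the bounds check, the tail call limit check, and the empty program array slot check that show up as conditional jumps in the JIT output.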
But what if our BPF program is split up into several BPF functions, each with its own stack frame? What if these functions perform BPF tail calls? How is the tail call count tracked then?
BPF has its own terminology when it comes to functions and calling them, which is influenced by the internal implementation. Function calls are referred to as BPF to BPF calls. Also, the main/entry function in your BPF code is called “the program”, while all other functions are known as “subprograms”.
Each call to a subprogram allocates a stack frame for local state, which persists until the function returns. Naturally, BPF subprogram calls can be nested, creating a call chain, just like nested function calls in user space.
BPF subprograms are also allowed to make BPF tail calls. This, effectively, is a mechanism for extending the call chain to another BPF program and its subprograms.
If we cannot track how long the call chain can be, and how much stack space each function uses, we put ourselves at risk of overflowing the stack. We cannot let this happen, so BPF enforces limitations on when and how many BPF tail calls can be done:
static int check_max_stack_depth(struct bpf_verifier_env *env)
{
…
/* protect against potential stack overflow that might happen when
* bpf2bpf calls get combined with tailcalls. Limit the caller's stack
* depth for such case down to 256 so that the worst case scenario
* would result in 8k stack size (32 which is tailcall limit * 256 =
* 8k).
*
* To get the idea what might happen, see an example:
* func1 -> sub rsp, 128
* subfunc1 -> sub rsp, 256
* tailcall1 -> add rsp, 256
* func2 -> sub rsp, 192 (total stack size = 128 + 192 = 320)
* subfunc2 -> sub rsp, 64
* subfunc22 -> sub rsp, 128
* tailcall2 -> add rsp, 128
* func3 -> sub rsp, 32 (total stack size 128 + 192 + 64 + 32 = 416)
*
* tailcall will unwind the current stack frame but it will not get rid
* of caller's stack as shown on the example above.
*/
if (idx && subprog[idx].has_tail_call && depth >= 256) {
verbose(env,
"tail_calls are not allowed when call stack of previous frames is %d bytes. Too large\n",
depth);
return -EACCES;
}
…
}
While the stack depth can be calculated by the BPF verifier at load-time, we still need to keep count of tail call jumps at run-time. Even when subprograms are involved.
This means that we have to pass the tail call count from one BPF subprogram to another, just like we did when making a BPF tail call, so we yet again turn to value passing through the rax register.
🛈 To keep things simple, BPF code in our examples does not allocate anything on stack. I encourage you to check how the JIT’ed code changes when you add some local variables. Just make sure the compiler does not optimize them out.
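For example (an assumed variation, not code from the post), giving a subprogram a volatile local buffer is a simple way to force some stack usage that the compiler will not optimize away:

/* Assumes the same includes and macros as the earlier examples. */
static __noinline int sub_func(struct __sk_buff *skb)
{
    volatile char scratch[16];    /* volatile: keep the stack slot from being optimized out */

    scratch[0] = 42;
    return 0xf00d + scratch[0];
}

With this in place, the sub sp, sp, #0x0 adjustments seen in the arm64 listings below become non-zero, which makes the stack handling easier to spot in the dumps.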
To make it work, we need to:
① load the tail call count saved on stack into rax before call’ing the subprogram,
② adjust the subprogram prologue, so that it does not reset the rax like the main program does,
③ save the passed tail call count on the subprogram’s stack for the bpf_tail_call() helper to consume it.
A bpf_tail_call() within our subprogram will then:
④ load the tail call count from the stack,
⑤ unwind the BPF stack, but keep the current subprogram’s stack frame intact, and
⑥ jump to the target BPF program.
Now we have seen how all the pieces of the puzzle fit together to make BPF tail calls work safely on x86-64. The only open question is: does it work the same way on other platforms, like arm64? Time to shift gears and dive into a completely different BPF JIT implementation.
If you try loading a BPF program that uses both BPF function calls (aka BPF to BPF calls) and BPF tail calls on an arm64 machine running the latest 5.15 LTS kernel, or even the latest 5.19 stable kernel, the BPF verifier will kindly ask you to reconsider your choice:
# uname -rm
5.19.12 aarch64
# bpftool prog loadall tail_call_ex2.o /sys/fs/bpf
libbpf: prog 'entry_prog': BPF program load failed: Invalid argument
libbpf: prog 'entry_prog': -- BEGIN PROG LOAD LOG --
0: R1=ctx(off=0,imm=0) R10=fp0
; __attribute__((musttail)) return sub_func(skb);
0: (85) call pc+1
caller:
R10=fp0
callee:
frame1: R1=ctx(off=0,imm=0) R10=fp0
; bpf_tail_call(skb, &jmp_table, 0);
2: (18) r2 = 0xffffff80c38c7200 ; frame1: R2_w=map_ptr(off=0,ks=4,vs=4,imm=0)
4: (b7) r3 = 0 ; frame1: R3_w=P0
5: (85) call bpf_tail_call#12
tail_calls are not allowed in non-JITed programs with bpf-to-bpf calls
processed 4 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
-- END PROG LOAD LOG --
…
#
That is a pity! We have been looking forward to reaping the benefits of code sharing with BPF to BPF calls in our lengthy machine generated BPF programs. So we asked - how hard could it be to make it work?
After all, BPF JIT for arm64 already can handle BPF tail calls and BPF to BPF calls, when used in isolation.
It is “just” a matter of understanding the existing JIT implementation, which lives in arch/arm64/net/bpf_jit_comp.c, and identifying the missing pieces.
To understand how BPF JIT for arm64 works, we will use the same method as before - look at its code together with sample input (BPF instructions) and output (arm64 instructions).
We don’t have to read the whole source code. It is enough to zero in on a few particular code paths:
bpf_int_jit_compile() 🔗
build_prologue() 🔗
build_body() 🔗
for (i = 0; i < prog->len; i++) {
build_insn() 🔗
switch (code) {
case BPF_JMP | BPF_CALL:
/* emit function call */ 🔗
case BPF_JMP | BPF_TAIL_CALL:
emit_bpf_tail_call() 🔗
}
}
build_epilogue() 🔗
One thing that the arm64 architecture, and RISC architectures in general, are known for is a plethora of general purpose registers (x0-x30). This is a good thing. We have more registers to allocate to JIT internal state, like the tail call count. A cheat sheet of what roles the hardware registers play in the BPF JIT will be helpful:
BPF | arm64 |
---|---|
r0 | x7 |
r1 | x0 |
r2 | x1 |
r3 | x2 |
r4 | x3 |
r5 | x4 |
r6 | x19 |
r7 | x20 |
r8 | x21 |
r9 | x22 |
r10 | x25 |
internal | x9-x12, x26 (tail_call_cnt), x27 |
Now let’s try to understand the state of things by looking at the JIT’s input and output for two particular scenarios: (1) a BPF tail call, and (2) a BPF to BPF call.
It is hard to read assembly code selectively. We will have to go through all instructions one by one, and understand what each one is doing.
⚠ Brace yourself. Time to decipher a bit of ARM64 assembly. If this will be your first time reading ARM64 assembly, you might want to at least skim through this Guide to ARM64 / AArch64 Assembly on Linux before diving in.
Scenario #1: A single BPF tail call - tail_call_ex1.bpf.c
Input: BPF assembly (bpftool prog dump xlated)
0: (18) r2 = map[id:4] // jmp_table map
2: (b7) r3 = 0
3: (85) call bpf_tail_call#12
4: (b7) r0 = 61453 // 0xf00d
5: (95) exit
Output: ARM64 assembly (bpftool prog dump jited)
0: paciasp // Sign LR (ROP protection) ①
4: stp x29, x30, [sp, #-16]! // Save FP and LR registers ②
8: mov x29, sp // Set up Frame Pointer
c: stp x19, x20, [sp, #-16]! // Save callee-saved registers ③
10: stp x21, x22, [sp, #-16]! // ⋮
14: stp x25, x26, [sp, #-16]! // ⋮
18: stp x27, x28, [sp, #-16]! // ⋮
1c: mov x25, sp // Set up BPF stack base register (r10)
20: mov x26, #0x0 // Initialize tail_call_cnt ④
24: sub x27, x25, #0x0 // Calculate FP bottom ⑤
28: sub sp, sp, #0x200 // Set up BPF program stack ⑥
2c: mov x1, #0xffffff80ffffffff // r2 = map[id:4] ⑦
30: movk x1, #0xc38c, lsl #16 // ⋮
34: movk x1, #0x7200 // ⋮
38: mov x2, #0x0 // r3 = 0
3c: mov w10, #0x24 // = offsetof(struct bpf_array, map.max_entries) ⑧
40: ldr w10, [x1, x10] // Load array->map.max_entries
44: add w2, w2, #0x0 // = index (0)
48: cmp w2, w10 // if (index >= array->map.max_entries)
4c: b.cs 0x0000000000000088 // goto out;
50: mov w10, #0x21 // = MAX_TAIL_CALL_CNT (33)
54: cmp x26, x10 // if (tail_call_cnt >= MAX_TAIL_CALL_CNT)
58: b.cs 0x0000000000000088 // goto out;
5c: add x26, x26, #0x1 // tail_call_cnt++;
60: mov w10, #0x110 // = offsetof(struct bpf_array, ptrs)
64: add x10, x1, x10 // = &array->ptrs
68: lsl x11, x2, #3 // = index * sizeof(array->ptrs[0])
6c: ldr x11, [x10, x11] // prog = array->ptrs[index];
70: cbz x11, 0x0000000000000088 // if (prog == NULL) goto out;
74: mov w10, #0x30 // = offsetof(struct bpf_prog, bpf_func)
78: ldr x10, [x11, x10] // Load prog->bpf_func
7c: add x10, x10, #0x24 // += PROLOGUE_OFFSET * AARCH64_INSN_SIZE (4)
80: add sp, sp, #0x200 // Unwind BPF stack
84: br x10 // goto *(prog->bpf_func + prologue_offset)
88: mov x7, #0xf00d // r0 = 0xf00d
8c: add sp, sp, #0x200 // Unwind BPF stack ⑨
90: ldp x27, x28, [sp], #16 // Restore used callee-saved registers
94: ldp x25, x26, [sp], #16 // ⋮
98: ldp x21, x22, [sp], #16 // ⋮
9c: ldp x19, x20, [sp], #16 // ⋮
a0: ldp x29, x30, [sp], #16 // ⋮
a4: add x0, x7, #0x0 // Set return value
a8: autiasp // Authenticate LR
ac: ret // Return to caller
① BPF program prologue starts with Pointer Authentication Code (PAC), which protects against Return Oriented Programming attacks. PAC instructions are emitted by JIT only if CONFIG_ARM64_PTR_AUTH_KERNEL is enabled.
② Arm 64 Architecture Procedure Call Standard mandates that the Frame Pointer (register X29) and the Link Register (register X30), aka the return address, of the caller should be recorded onto the stack.
③ Registers X19 to X28, and X29 (FP) plus X30 (LR), are callee saved. ARM64 BPF JIT does not use registers X23 and X24 currently, so they are not saved.
④ We track the tail call depth in X26. No need to save it onto stack since we use a register dedicated just for this purpose.
⑤ FP bottom is an optimization that allows store/loads to BPF stack with a single instruction and an immediate offset value.
⑥ Reserve space for the BPF program stack. The stack layout is now as shown in a diagram in the build_prologue() source code.
⑦ The BPF function body starts here.
⑧ bpf_tail_call() instructions start here.
⑨ The epilogue starts here.
Whew! That was a handful 😅.
Notice that the BPF tail call implementation on arm64 is not as optimized as on x86-64. There is no code patching to make direct jumps when the target program index is known at the JIT-compilation time. Instead, the target address is always loaded from the BPF program array.
Ready for the second scenario? I promise it will be shorter. Function prologue and epilogue instructions will look familiar, so we are going to keep annotations down to a minimum.
Scenario #2: A BPF to BPF call - sub_call_ex1.bpf.c
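The source for sub_call_ex1.bpf.c is likewise not included in the post; judging from the xlated dump below, it is presumably close to this sketch (section name, includes and license are assumptions):

// sub_call_ex1.bpf.c - a sketch reconstructed from the xlated dump below
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

static __noinline int sub_func(struct __sk_buff *skb)
{
    return 0xf00d;
}

SEC("tc")                                /* assumed program type */
int entry_prog(struct __sk_buff *skb)
{
    return sub_func(skb);
}

char __license[] SEC("license") = "GPL";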
Input: BPF assembly (bpftool prog dump xlated)
int entry_prog(struct __sk_buff * skb):
0: (85) call pc+1#bpf_prog_a84919ecd878b8f3_sub_func
1: (95) exit
int sub_func(struct __sk_buff * skb):
2: (b7) r0 = 61453 // 0xf00d
3: (95) exit
Output: ARM64 assembly
int entry_prog(struct __sk_buff * skb):
bpf_prog_163e74e7188910f2_entry_prog:
0: paciasp // Begin prologue
4: stp x29, x30, [sp, #-16]! // ⋮
8: mov x29, sp // ⋮
c: stp x19, x20, [sp, #-16]! // ⋮
10: stp x21, x22, [sp, #-16]! // ⋮
14: stp x25, x26, [sp, #-16]! // ⋮
18: stp x27, x28, [sp, #-16]! // ⋮
1c: mov x25, sp // ⋮
20: mov x26, #0x0 // ⋮
24: sub x27, x25, #0x0 // ⋮
28: sub sp, sp, #0x0 // End prologue
2c: mov x10, #0xffffffffffff5420 // Build sub_func()+0x0 address
30: movk x10, #0x8ff, lsl #16 // ⋮
34: movk x10, #0xffc0, lsl #32 // ⋮
38: blr x10 ------------------. // Call sub_func()+0x0
3c: add x7, x0, #0x0 <----------. // r0 = sub_func()
40: mov sp, sp | | // Begin epilogue
44: ldp x27, x28, [sp], #16 | | // ⋮
48: ldp x25, x26, [sp], #16 | | // ⋮
4c: ldp x21, x22, [sp], #16 | | // ⋮
50: ldp x19, x20, [sp], #16 | | // ⋮
54: ldp x29, x30, [sp], #16 | | // ⋮
58: add x0, x7, #0x0 | | // ⋮
5c: autiasp | | // ⋮
60: ret | | // End epilogue
| |
int sub_func(struct __sk_buff * skb): | |
bpf_prog_a84919ecd878b8f3_sub_func: | |
0: paciasp <---------------------' | // Begin prologue
4: stp x29, x30, [sp, #-16]! | // ⋮
8: mov x29, sp | // ⋮
c: stp x19, x20, [sp, #-16]! | // ⋮
10: stp x21, x22, [sp, #-16]! | // ⋮
14: stp x25, x26, [sp, #-16]! | // ⋮
18: stp x27, x28, [sp, #-16]! | // ⋮
1c: mov x25, sp | // ⋮
20: mov x26, #0x0 | // ⋮
24: sub x27, x25, #0x0 | // ⋮
28: sub sp, sp, #0x0 | // End prologue
2c: mov x7, #0xf00d | // r0 = 0xf00d
30: mov sp, sp | // Begin epilogue
34: ldp x27, x28, [sp], #16 | // ⋮
38: ldp x25, x26, [sp], #16 | // ⋮
3c: ldp x21, x22, [sp], #16 | // ⋮
40: ldp x19, x20, [sp], #16 | // ⋮
44: ldp x29, x30, [sp], #16 | // ⋮
48: add x0, x7, #0x0 | // ⋮
4c: autiasp | // ⋮
50: ret ----------------------------' // End epilogue
We have now seen what a BPF tail call and a BPF function/subprogram call compiles down to. Can you already spot what would go wrong if mixing the two was allowed?
That’s right! Every time we enter a BPF subprogram, we reset the X26 register, which holds the tail call count, to zero (mov x26, #0x0). This is bad. It would let users create program chains longer than the MAX_TAIL_CALL_CNT limit.
How about we just skip this step when emitting the prologue for BPF subprograms?
@@ -246,6 +246,7 @@ static bool is_lsi_offset(int offset, int scale)
static int build_prologue(struct jit_ctx *ctx, bool ebpf_from_cbpf)
{
const struct bpf_prog *prog = ctx->prog;
+ const bool is_main_prog = prog->aux->func_idx == 0;
const u8 r6 = bpf2a64[BPF_REG_6];
const u8 r7 = bpf2a64[BPF_REG_7];
const u8 r8 = bpf2a64[BPF_REG_8];
@@ -299,7 +300,7 @@ static int build_prologue(struct jit_ctx *ctx, bool ebpf_from_cbpf)
/* Set up BPF prog stack base register */
emit(A64_MOV(1, fp, A64_SP), ctx);
- if (!ebpf_from_cbpf) {
+ if (!ebpf_from_cbpf && is_main_prog) {
/* Initialize tail_call_cnt */
emit(A64_MOVZ(1, tcc, 0, 0), ctx);
Believe it or not. This is everything that was missing to get BPF tail calls working with function calls on arm64. The feature will be enabled in the upcoming Linux 6.0 release.
From recursion to tweaking the BPF JIT. How did we get here? Not important. It’s all about the journey.
Along the way we have unveiled a few secrets behind BPF tail calls, and hopefully quenched your thirst for low-level programming. At least for today.
All that is left is to sit back and watch the fruits of our work. With GDB hooked up to a VM, we can observe how a BPF program calls into a BPF function, and from there tail calls to another BPF program:
demo-gdb-step-thru-bpf.pages.dev/
Until next time 🖖.
From our earliest days, Cloudflare has stood for helping build a better Internet that’s accessible to all. It’s core to our mission that anyone who wants to start building on the Internet should be able to do so easily, and without the barriers of prohibitively expensive or difficult to use infrastructure.
Nowhere is this philosophy more important – and more impactful to the Internet – than with our developer platform, Cloudflare Workers. Workers is, quite simply, where developers and entrepreneurs start on Day 1. It’s a full developer platform that includes cloud storage; website hosting; SQL databases; and of course, the industry’s leading serverless product. The platform’s ease-of-use and accessible pricing (all the way down to free) are critical in advancing our mission. For startups, this translates into fast, easy deployment and iteration, that scales seamlessly with predictable, transparent and cost-effective pricing. Building a great business from scratch is hard enough – we ought to know! – and so we’re aiming to take all the complexity out of your application infrastructure.
Today, we’re taking things a step further and making it easier for startups to build the business of their dreams. We’re announcing a $1.25 billion Workers Launchpad funding program in partnership with some of the world’s leading venture capital firms. Any startup built on Workers can apply. As is the case with the Workers Platform itself, we’ve tried to make applying dead simple: it should take you less than five minutes to submit your application through the Workers Launchpad portal.
How does it work? The only requirement for being eligible for the funding program is that you’ve built your core infrastructure on Workers. If you’re new to Cloudflare and Cloudflare Workers, check out our Startup Plan to get started. We hope these resources will be helpful to all startups and help level the playing field, no matter where in the world you might be.
Once you submit your application, it will be reviewed by our Launchpad team, several of whom are former entrepreneurs and venture capital folks themselves. They’ll match promising applicants with our VC partners who have the most expertise in your space (more on them below). Every quarter, we’ll announce the winners of our Launchpad program. Winners, our “Workers Founders”, will be guaranteed the opportunity to pitch the VC partner(s) that we’ve determined would be a good match for your business. It’s a win-win all around. VCs get the opportunity to invest in businesses they know are being built on a forward-looking, world-class, development platform. Entrepreneurs get connected to world-class VCs. And for the first class of winners, we’ll have a few added perks that we describe in more detail below.
When we approached our friends in the venture community with our vision for the Workers Launchpad, we received incredibly positive feedback and excitement. Many have seen firsthand the competitive advantages of building on Workers through their own portfolio companies. Moreover, Cloudflare is home to one of the largest developer communities on Earth with approximately 20% of the world’s websites on our network. As such, we can play a unique role in matching great entrepreneurs with great VCs to further not only the Workers platform, but also the Internet ecosystem, for everyone.
We’re honored to announce a world-class group of VC Launch Partners supporting this program and the ecosystem of Workers-based startups:
So why are we doing this? The simple answer is we’re proud of our Workers Developer Platform and think that everyone should be using it. Entrepreneurs who develop on Workers can ship faster, more easily, and more cost-effectively, and in a way that future-proofs their infrastructure:
Speed. Development velocity isn’t just a convenience for an entrepreneur. It’s a massive competitive advantage. In fact, development velocity is one of Cloudflare’s competitive advantages – we’re able to develop quickly because we build on Workers. When you develop on Workers, you don’t need to spend time configuring DNS records, maintaining certificates, scaling up clusters, or building complex deployment pipelines. Focus on developing your application, and Cloudflare will handle the rest.
Ease of use. Startup teams and founders are some of the busiest people on earth. You shouldn’t have to think about – or make complicated decisions about – IT infrastructure. Questions like: “Which availability zone should I choose?”, or “Will I be able to scale up my infrastructure in time for our next viral marketing campaign?” shouldn’t have to cross your mind! And on the Workers Platform, they don’t. The code you and your team write automatically deploys quickly and consistently across Cloudflare’s global network in 275+ cities in over 100 countries. Cloudflare securely and scalably connects your users to your applications, regardless of where those applications are hosted or how many users suddenly sign up for your product. Developers can easily manage globally distributed applications with a programmable network that easily connects to whatever services they need to talk to.
Future-proofing your infrastructure and your wallet. Cloudflare’s massive global network – that’s distributed across 275+ cities in over 100 countries – is able to scale with your business, no matter how large it grows to become. We also help you remain compliant with local laws and regulations as you expand around the world, with capabilities like Workers’ Jurisdictional Restrictions for Durable Objects. You can sleep soundly at night instead of worrying about how to level up your infrastructure in the midst of shifting regulations, and equally importantly, knowing that you will not wake up to any surprise bills. Many of us have had the experience of being charged unexpected and / or exorbitant fees from our cloud providers. For example, providers will often make it easy and free to onboard your application or data, but charge exorbitant rates when you want to move them out (i.e. egress fees). Cloudflare will never charge for egress. Our pricing is simple, and we constantly aim to be the low-cost provider, no matter how large your business grows to be.
We’re excited about Workers not only because we’ve built our own infrastructure on it, but also because we’re seeing the incredible things others have built on it. In fact, we acquired a company built entirely on Workers at the end of last year, an Israeli start-up named Zaraz, which secures and accelerates third party web tools. Workers allowed Zaraz to replace the multiple network requests of each tool running on a website with one single request, effectively streamlining a messy web of extensions into a single lightweight application. This acquisition opened our eyes to the power of the global community that’s built on our platform, and left us motivated to help startups built on Workers find the funding, mentorship, and support needed to grow.
To make it even easier for startups to take advantage of all the benefits that Workers has to offer, applicants to the Workers Launchpad program who have raised less than $3 million in total external funding will automatically have the option to receive Cloudflare’s Startup Plan. This plan includes all the elements of Cloudflare’s Pro and Business Plans ($2,400 annual value) plus higher tiers of our Stream video product, our Teams Zero Trust security suite and the Workers platform. To make sure the full range of our developer platform is accessible to startups, we recently more than tripled the number of products available in this plan, which now includes email security, R2, Pages, KV, and many others.
Furthermore, all startups that apply by October 31, 2022, will be eligible to be selected for the Winter 2022 class of Workers Founders, which will unlock additional support, mentorship, and marketing opportunities. Being selected as a Workers Founder will get you a chance to practice your pitch with investors, engage with leaders from Cloudflare, and get advice on how to build a successful business, on topics ranging from recruiting to marketing, sales, and beyond, during a virtual Workers Founders Bootcamp Week. The program will culminate in a virtual Demo Day, so you can show the world what you’ve been building. We’re leaning in to help promising entrepreneurs join us in our mission to help build a better Internet.
Accessibility and ease of use are core to everything we do at Cloudflare. We will always make our products and platforms so easy to use that even the smallest business or hobbyist can easily use them. We hope the Workers Launchpad funding program encourages entrepreneurs from all around the world, and from all backgrounds, to start building on Workers, and makes it easier for you to find the funding you need to build the business of your dreams.
Head to the Workers Launchpad page to apply and join the Cloudflare Developer Discord to engage with the Workers community. If you’re a VC that is interested in supporting the program, reach out to Workers-Launchpad@cloudflare.com.
Cloudflare is not providing any funding or making any funding decisions, and there is no guarantee that any particular company will receive funding through the program. All funding decisions will be made by the venture capital firms that participate in the program. Cloudflare is not a registered broker-dealer, investment adviser, or other similar intermediary.
One way to get started is to raise seed funding.
Startups looking for funding rarely have positive net cash flow, and many only have a few founders and a great plan. Only 2 in 5 startups are profitable at any point. And most startups struggle to find profits early. According to some sources, it takes around 2 to 3 years for the average startup to start generating profits. Many founders do not have the personal wealth to bootstrap their startups for years.
Because traditional funding levers (e.g., private equity firms, investment banks, hedge funds, etc.) aren't willing to fund unproven businesses, startups often look for a special type of funding called "seed" funding. As the name suggests, investors provide a seed (funding) in the hopes that the startup nurtures that seed money into a healthy tree (a successful startup). In return, the startup provides equity (usually between 5 to 20 percent) or convertible debt.
So, where do you find seed funding?
Generally, seed funding comes from one of the following sources:

- friends and family
- angel investors
- incubators and accelerators
- venture capital firms
Each of these funding levers provides unique benefits. Friends and family are often the easiest way to get quick funding, but at the risk of damaged relationships. Angels, incubators, and accelerators all provide a wealth of intangible benefits, but they can be challenging to get into, depending on the startup's business plan. And venture capital usually works best when your startup already has a well-established framework.
Understanding who provides seed funding isn't the same as actually getting seed funding. Many startups exist outside of massive cities, and not all founders are well-connected. Thankfully, a significant amount of funding is now done digitally. There are many websites and services aimed at providing startups with the resources and experience they need to grow a successful company. These include:
Angel-finding websites: There are a wide variety of angel funding websites that directly connect founders with angels across the globe. Some of the most popular include:
Incubators: There are thousands of incubators across the United States alone, but some stand out more than others. The most successful incubators include:
Crowdfunding websites: Crowdfunding is a great way to quickly raise funds for your business. You essentially get funded by many different anonymous people in return for equity. While GoFundMe and Kickstarter are great for product-based startups, most new startups go through seed-based crowdfunding rounds on websites like:
Loans: You always have the option to take out loans and microloans. Small businesses can use SBA loans to get some quick capital, and there are a wide variety of unique loan options for startups. For example, solutions like Pipe give you up-front capital in return for recurring revenue (which may be ideal for some SaaS companies looking for hands-off funding). You can find loan options at your bank or through a variety of different websites.
In addition to these funding options, there are many resources for startups that aren't related to capital. Startup events, networking websites, forums, social media, and a wide variety of websites exist to help founders connect to other founders and business leaders. Growing a successful business takes more than money. You should always look to build connections and gain insights from industry leaders along the way.
DigitalOcean provides founders a variety of resources through Hatch, our global startup program. Sign up to learn how we can help you build and grow your startup through intelligent infrastructure solutions.
As a followup, I thought it might be fun to implement a program that’s like a tiny ssh server, but without the security. You can find it on github here, and I’ll explain how it works in this blog post.
Our goal is to be able to login to a remote computer and run commands, like you do with SSH or telnet.
The biggest difference between this program and SSH is that there’s literally no security (not even a password) – anyone who can make a TCP connection to the server can get a shell and run commands.
Obviously this is not a useful program in real life, but our goal is to learn a little more about how terminals work, not to write a useful program.
(I will run a version of it on the public internet for the next week though, you can see how to connect to it at the end of this blog post)
We’re also going to write a client, but the server is the interesting part, so let’s start there. We’re going to write a server that listens on a TCP port (I picked 7777) and creates remote terminals for any client that connects to it to use.
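As a rough sketch (assuming a handle function like the ones we'll write below), the outer loop of the server is just standard Go TCP boilerplate:

package main

import (
    "log"
    "net"
)

func main() {
    // Listen on TCP port 7777 and give every connection its own remote terminal.
    listener, err := net.Listen("tcp", ":7777")
    if err != nil {
        log.Fatal(err)
    }
    for {
        conn, err := listener.Accept()
        if err != nil {
            log.Fatal(err)
        }
        go handle(conn)
    }
}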
When the server receives a new connection it needs to:

1. create a pseudoterminal
2. start a bash shell process for the client to use
3. connect bash to the pseudoterminal
4. copy data between the TCP connection and the pseudoterminal

I just said the word “pseudoterminal” a lot, so let’s talk about what that means.
Okay, what the heck is a pseudoterminal?
A pseudoterminal is a lot like a bidirectional pipe or a socket – you have two ends, and they can both send and receive information. You can read more about the information being sent and received in what happens if you press a key in your terminal
Basically the idea is that on one end, we have a TCP connection, and on the other end, we have a bash shell. So we need to hook one part of the pseudoterminal up to the TCP connection and the other end to bash.
The two parts of the pseudoterminal are called:

- the “master”: this is the part we hook up to the TCP connection
- the “slave”: the bash process sets its stdout, stderr, and stdin to this

Once they’re connected, we can communicate with bash over our TCP connection and we’ll have a remote shell!
You might be wondering – Julia, if a pseudoterminal is kind of like a socket, why can’t we just set our bash shell’s stdout / stderr / stdin to the TCP socket?
And you can! We could write a TCP connection handler like this that does exactly that, it’s not a lot of code (server-notty.go).
func handle(conn net.Conn) {
    tty, _ := conn.(*net.TCPConn).File()
    // start bash with tcp connection as stdin/stdout/stderr
    cmd := exec.Command("bash")
    cmd.Stdin = tty
    cmd.Stdout = tty
    cmd.Stderr = tty
    cmd.Start()
}
It even kind of works – if we connect to it with nc localhost 7778, we can run commands and look at their output.
problem 1: Ctrl + C doesn’t work
The way Ctrl + C works in a remote login session is:

1. the Ctrl + C keypress gets turned into the byte 0x03 and sent through the TCP connection
2. the terminal on the remote end sees the 0x03 and sends SIGINT to the appropriate process (more on what the “appropriate process” is exactly later)

If the “terminal” is just a TCP connection, this doesn’t work, because when you send 0x03 to a TCP connection, Linux won’t magically send SIGINT to any process.
problem 2: top doesn’t work

When I try to run top in this shell, I get the error message top: failed tty get. If we strace it, we see this system call:

ioctl(2, TCGETS, 0x7ffec4e68d60) = -1 ENOTTY (Inappropriate ioctl for device)

So top is running an ioctl on its output file descriptor (2) to get some information about the terminal. But Linux is like “hey, this isn’t a terminal!” and returns an error.
There are a bunch of other things that go wrong, but hopefully at this point you’re convinced that we actually need to set bash’s stdout/stderr to be a terminal, not some other thing like a socket.
So let’s start looking at the server code and see what creating a pseudoterminal actually looks like.
Here’s some Go code to create a pseudoterminal on Linux. This is copied from github.com/creack/pty, but I removed some of the error handling to make the logic a bit easier to follow:
pty, _ := os.OpenFile("/dev/ptmx", os.O_RDWR, 0)
sname := ptsname(pty)
unlockpt(pty)
tty, _ := os.OpenFile(sname, os.O_RDWR|syscall.O_NOCTTY, 0)
In English, what we’re doing is:

1. open /dev/ptmx to get the “pseudoterminal master”. Again, that’s the part we’re going to hook up to the TCP connection
2. ask the kernel which slave device corresponds to it, which will be /dev/pts/13 or something
3. unlock the slave device
4. open /dev/pts/13 (or whatever number we got from ptsname) to get the “slave pseudoterminal device”

What do those ptsname and unlockpt functions do? They just make some ioctl system calls to the Linux kernel. All of the communication with the Linux kernel about terminals seems to be through various ioctl system calls.
Here’s the code, it’s pretty short: (again, I just copied it from creack/pty)
func ptsname(f *os.File) string {
    var n uint32
    ioctl(f.Fd(), syscall.TIOCGPTN, uintptr(unsafe.Pointer(&n)))
    return "/dev/pts/" + strconv.Itoa(int(n))
}

func unlockpt(f *os.File) {
    var u int32
    // use TIOCSPTLCK with a pointer to zero to clear the lock
    ioctl(f.Fd(), syscall.TIOCSPTLCK, uintptr(unsafe.Pointer(&u)))
}
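(One note: the ioctl function in these snippets isn't from the standard library. It's a tiny helper – roughly what creack/pty defines, give or take error handling – that just forwards the request to the ioctl system call:)

func ioctl(fd, cmd, ptr uintptr) error {
    // Everything terminal-related goes through the ioctl(2) system call.
    _, _, errno := syscall.Syscall(syscall.SYS_IOCTL, fd, cmd, ptr)
    if errno != 0 {
        return errno
    }
    return nil
}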
The next thing we have to do is connect the pseudoterminal to bash. Luckily, that’s really easy – here’s the Go code for it! We just need to start a new process and set the stdin, stdout, and stderr to tty.
cmd := exec.Command("bash")
cmd.Stdin = tty
cmd.Stdout = tty
cmd.Stderr = tty
cmd.SysProcAttr = &syscall.SysProcAttr{
    Setsid: true,
}
cmd.Start()
Easy! Though – why do we need this Setsid: true thing, you might ask? Well, I tried commenting out that code to see what went wrong. It turns out that what goes wrong is – Ctrl + C doesn’t work anymore!

Setsid: true creates a new session for the new bash process. But why does that make Ctrl + C work? How does Linux know which process to send SIGINT to when you press Ctrl + C, and what does that have to do with sessions?
I found this pretty confusing, so I reached for my favourite book for learning about this kind of thing: the linux programming interface, specifically chapter 34 on process groups and sessions.
That chapter contains a few key facts: (#3, #4, and #5 are direct quotes from the book)

- when you press Ctrl+C in a terminal, SIGINT gets sent to all the processes in the foreground process group

What’s a process group? Well, my understanding is that:

- all the processes in a pipeline (like x | y | z) are in the same process group
- processes chained together in a shell (like x && y && z) are in the same process group

I didn’t know most of this (I had no idea processes had a session ID!) so this was kind of a lot to absorb. I tried to draw a sketchy ASCII art diagram of the situation
(maybe) terminal --- session --- process group --- process
| |- process
| |- process
|- process group
|
|- process group
So when we press Ctrl+C in a terminal, here’s what I think happens:

1. the byte \x03 gets written to the “pseudoterminal master” of a terminal
2. the kernel looks at the session associated with that terminal and sends SIGINT to the session’s foreground process group

If we don’t create a new session for our new bash process, our new pseudoterminal actually won’t have any session associated with it, so nothing happens when we press Ctrl+C. But if we do create a new session, then the new pseudoterminal will have the new session associated with it.
As a quick aside, if you want to get a list of all the sessions on your Linux machine, grouped by session, you can run:
$ ps -eo user,pid,pgid,sess,cmd | sort -k3
This includes the PID, process group ID, and session ID. As an example of the output, here are the two processes in the pipeline:
bork 58080 58080 57922 ps -eo user,pid,pgid,sess,cmd
bork 58081 58080 57922 sort -k3
You can see that they share the same process group ID and session ID, but of course they have different PIDs.
That was kind of a lot but that’s all we’re going to say about sessions and process groups in this post. Let’s keep going!
We need to tell the terminal how big to be!
Again, I just copied this from creack/pty. I decided to hardcode the size to 80x24.
Setsize(tty, &Winsize{
    Cols: 80,
    Rows: 24,
})
Like with getting the terminal’s pts filename and unlocking it, setting the size is just one ioctl system call:
func Setsize(t *os.File, ws *Winsize) {
    ioctl(t.Fd(), syscall.TIOCSWINSZ, uintptr(unsafe.Pointer(ws)))
}
Pretty simple! We could do something smarter and get the real window size, but I’m too lazy.
As a reminder, our rough steps to set up this remote login server were:

1. create a pseudoterminal
2. start a bash shell process
3. connect bash to the pseudoterminal
4. ferry data between the TCP connection and the pseudoterminal

We’ve done 1, 2, and 3, now we just need to ferry information between the TCP connection and the pseudoterminal.
There are two io.Copy calls, one to copy the input from the TCP connection, and one to copy the output to the TCP connection. Here’s what the code looks like:
go func() {
    io.Copy(pty, conn)
}()
io.Copy(conn, pty)
The first one is in a goroutine just so they can both run in parallel.
Pretty simple!
I also added a little bit of code to close the TCP connection when the command exits
go func() {
    cmd.Wait()
    conn.Close()
}()
And that’s it for the server! You can see all of the Go code here: server.go.
Next, we have to write a client. This is a lot easier than the server because we don’t need to do quite as much terminal setup. There are just 3 steps:

1. put our local terminal into “raw” mode
2. copy data between the TCP connection and stdin/stdout
3. put the terminal back the way we found it when we’re done
We need to put the client terminal into “raw” mode so that every time you press a key, it gets sent to the TCP connection immediately. If we don’t do this, everything will only get sent when you press enter.
“Raw mode” isn’t actually a single thing, it’s a bunch of flags that you want to turn off. There’s a good tutorial explaining all the flags we have to turn off called Entering raw mode.
Like everything else with terminals, this requires ioctl system calls. In this case we get the terminal’s current settings, modify them, and save the old settings so that we can restore them later.

I figured out how to do this in Go by going to grep.app and typing in syscall.TCSETS to find some other Go code that was doing the same thing.
func MakeRaw(fd uintptr) syscall.Termios {
    // from https://github.com/getlantern/lantern/blob/devel/archive/src/golang.org/x/crypto/ssh/terminal/util.go
    var oldState syscall.Termios
    ioctl(fd, syscall.TCGETS, uintptr(unsafe.Pointer(&oldState)))
    newState := oldState
    newState.Iflag &^= syscall.ISTRIP | syscall.INLCR | syscall.ICRNL | syscall.IGNCR | syscall.IXON | syscall.IXOFF
    newState.Lflag &^= syscall.ECHO | syscall.ICANON | syscall.ISIG
    ioctl(fd, syscall.TCSETS, uintptr(unsafe.Pointer(&newState)))
    return oldState
}
This is exactly like what we did with the server. It’s very little code:
go func() {
    io.Copy(conn, os.Stdin)
}()
io.Copy(os.Stdout, conn)
We can put the terminal back into the mode it started in like this (another ioctl!):
func Restore(fd uintptr, oldState syscall.Termios) {
    ioctl(fd, syscall.TCSETS, uintptr(unsafe.Pointer(&oldState)))
}
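Putting the client together, a minimal sketch (using the MakeRaw and Restore helpers above, and hardcoding the address for brevity) looks something like this:

func main() {
    conn, err := net.Dial("tcp", "localhost:7777")
    if err != nil {
        log.Fatal(err)
    }
    // Step 1: put our terminal into raw mode, and put it back when we exit.
    oldState := MakeRaw(os.Stdin.Fd())
    defer Restore(os.Stdin.Fd(), oldState)
    // Step 2: copy stdin to the connection and the connection to stdout.
    go func() {
        io.Copy(conn, os.Stdin)
    }()
    io.Copy(os.Stdout, conn)
    // Step 3 (restoring the terminal) happens in the deferred Restore call.
}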
We have written a tiny remote login server that lets anyone log in! Hooray!
Obviously this has zero security so I’m not going to talk about that aspect.
For the next week or so I’m going to run a demo of this on the internet at tetris.jvns.ca. It runs tetris instead of a shell because I wanted to avoid abuse, but if you want to try it with a shell you can run it on your own computer :).
If you want to try it out, you can use netcat as a client instead of the custom Go client program we wrote, because copying information to/from a TCP connection is what netcat does. Here’s how:
stty raw -echo && nc tetris.jvns.ca 7777 && stty sane
This will let you play a terminal tetris game called tint.

You can also use the client.go program and run go run client.go tetris.jvns.ca 7777.
This protocol where we just copy bytes from the TCP connection to the terminal and nothing else is not good because it doesn’t allow us to send over information like the terminal type or the actual window size of the terminal.
I thought about implementing telnet’s protocol so that we could use telnet as a client, but I didn’t feel like figuring out how telnet works so I didn’t. (the server 30% works with telnet as is, but a lot of things are broken, I don’t quite know why, and I didn’t feel like figuring it out)
As a warning: using this server to play tetris will probably mess up your terminal a bit because it sets the window size to 80x24. To fix that I just closed the terminal tab after running that command.
If we wanted to fix this for real, we’d need to restore the window size after we’re done, but then we’d need a slightly more real protocol than “just blindly copy bytes back and forth with TCP” and I didn’t feel like doing that.
Also it sometimes takes a second to disconnect after the program exits for some reason, I’m not sure why that is.
That’s all! There are a couple of other similar toy implementations of programs I’ve written here:
Cloudflare has been using Kafka in production since 2014. We have come a long way since then, and currently run 14 distinct Kafka clusters, across multiple data centers, with roughly 330 nodes. Between them, over a trillion messages have been processed over the last eight years.
Cloudflare uses Kafka to decouple microservices and communicate the creation, change or deletion of various resources via a common data format in a fault-tolerant manner. This decoupling is one of many factors that enables Cloudflare engineering teams to work on multiple features and products concurrently.
We learnt a lot about Kafka on the way to one trillion messages, and built some interesting internal tools to ease adoption that will be explored in this blog post. The focus in this blog post is on inter-application communication use cases alone and not logging (we have other Kafka clusters that power the dashboards where customers view statistics that handle more than one trillion messages each day). I am an engineer on the Application Services team and our team has a charter to provide tools/services to product teams, so they can focus on their core competency which is delivering value to our customers.
In this blog I’d like to recount some of our experiences in the hope that it helps other engineering teams who are on a similar journey of adopting Kafka widely.
One of our Kafka clusters is creatively named Messagebus. It is the most general purpose cluster we run, and was created to:
To make it as easy to use as possible and to encourage adoption, the Application Services team created two internal projects. The first is unimaginatively named Messagebus-Client. Messagebus-Client is a Go library that wraps the fantastic Shopify Sarama library with an opinionated set of configuration options and the ability to manage the rotation of mTLS certificates.
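The Messagebus-Client itself is internal, but to make the idea concrete, an "opinionated wrapper" around Sarama might look roughly like the sketch below. The function name and defaults here are illustrative assumptions; only the Sarama calls are the real library API.

// Hypothetical sketch – not the actual Messagebus-Client API.
// Assumes imports of "crypto/tls" and "github.com/Shopify/sarama".
func NewProducer(brokers []string, tlsConfig *tls.Config) (sarama.SyncProducer, error) {
    cfg := sarama.NewConfig()
    // Opinionated defaults so every team doesn't have to rediscover them.
    cfg.Producer.RequiredAcks = sarama.WaitForAll // wait for all in-sync replicas
    cfg.Producer.Return.Successes = true          // required by the SyncProducer
    cfg.Net.TLS.Enable = true                     // mTLS everywhere
    cfg.Net.TLS.Config = tlsConfig                // rotated certificates plug in here
    return sarama.NewSyncProducer(brokers, cfg)
}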
The success of this project is also somewhat its downfall. By providing a ready-to-go Kafka client, we ensured teams got up and running quickly, but we also abstracted some core concepts of Kafka a little too much, meaning that small unassuming configuration changes could have a big impact.
One such example led to partition skew (a large portion of messages being directed towards a single partition, meaning we were not processing messages in real time; see the chart below). One drawback of Kafka is you can only have one consumer per partition, so when incidents do occur, you can’t trivially scale your way to faster throughput.
That also means before your service hits production it is wise to do some back of the napkin math to figure out what throughput might look like, otherwise you will need to add partitions later. We have since amended our library to make events like the below less likely.
The reception for the Messagebus-Client has been largely positive. We spent time as a team to understand what the predominant use cases were, and took the concept one step further to build out what we call the connector framework.
The connector framework is based on Kafka-connectors and allows our engineers to easily spin up a service that can read from a system of record and push it somewhere else (such as Kafka, or even Cloudflare’s own Quicksilver). To make this as easy as possible, we use Cookiecutter templating to allow engineers to enter a few parameters into a CLI and in return receive a ready to deploy service.
We provide the ability to configure data pipelines via environment variables. For simple use cases, we provide the functionality out of the box. However, extending the readers, writers and transformations is as simple as satisfying an interface and “registering” the new entry.
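The connector framework is also internal, so the snippet below is only a sketch of what "satisfying an interface and registering the new entry" could look like; the type and function names are assumptions, not the framework's real API.

// Illustrative sketch only – names are hypothetical. Assumes the "context" import.
type Message []byte

type Reader interface {
    Read(ctx context.Context) (Message, error)
}

type Transformation interface {
    Apply(msg Message) (Message, error)
}

type Writer interface {
    Write(ctx context.Context, msg Message) error
}

var writers = map[string]Writer{}

// RegisterWriter makes a Writer selectable via the WRITER environment variable.
func RegisterWriter(name string, w Writer) {
    writers[name] = w
}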
For example, adding the environment variables:
READER=kafka
TRANSFORMATIONS=topic_router:topic1,topic2|pf_edge
WRITER=quicksilver
will:

- configure the connector to read messages from Kafka
- run each message through the topic_router (configured with topic1 and topic2) and pf_edge transformations
- write the results to Quicksilver
Connectors come readily baked with basic metrics and alerts, so teams know they can move to production quickly but with confidence.
Below is a diagram of how one team used our connector framework to read from the Messagebus cluster and write to various other systems. This is orchestrated by a system the Application Service team runs called Communication Preferences Service (CPS). Whenever a user opts in/out of marketing emails or changes their language preferences on cloudflare.com, they are calling CPS which ensures those settings are reflected in all the relevant systems.
Alongside the Messagebus-Client library, we also provide a repo called Messagebus Schema. This is a schema registry for all message types that will be sent over our Messagebus cluster. For message format, we use protobuf and have been very happy with that decision. Previously, our team had used JSON for some of our kafka schemas, but we found it much harder to enforce forward and backwards compatibility, as well as message sizes being substantially larger than the protobuf equivalent. Protobuf provides strict message schemas (including type safety), the forward and backwards compatibility we desired, the ability to generate code in multiple languages as well as the files being very human-readable.
We encourage heavy commentary before approving a merge. Once merged, we use prototool to do breaking change detection, enforce some stylistic rules and to generate code for various languages (at time of writing it's just Go and Rust, but it is trivial to add more).
Furthermore, in Messagebus Schema we store a mapping of proto messages to a team, alongside that team’s chat room in our internal communication tool. This allows us to escalate issues to the correct team easily when necessary.
One important decision we made for the Messagebus cluster is to only allow one proto message per topic. This is configured in Messagebus Schema and enforced by the Messagebus-Client. This was a good decision to enable easy adoption, but it has led to numerous topics existing. When you consider that for each topic we create, we add numerous partitions and replicate them with a replication factor of at least three for resilience, there is a lot of potential to optimize compute for our lower throughput topics.
Making it easy for teams to observe Kafka is essential for our decoupled engineering model to be successful. We therefore have automated metrics and alert creation wherever we can to ensure that all the engineering teams have a wealth of information available to them to respond to any issues that arise in a timely manner.
We use Salt to manage our infrastructure configuration and follow a Gitops style model, where our repo holds the source of truth for the state of our infrastructure. To add a new Kafka topic, our engineers make a pull request into this repo and add a couple of lines of YAML. Upon merge, the topic and an alert for high lag (where lag is defined as the difference in time between the last committed offset being read and the last produced offset being produced) will be created. Other alerts can (and should) be created, but this is left to the discretion of application teams. The reason we automatically generate alerts for high lag is that this simple alert is a great proxy for catching a high amount of issues including:
For metrics, we use Prometheus and display them with Grafana. For each new topic created, we automatically provide a view into production rate, consumption rate and partition skew by producer/consumer. If an engineering team is called out, within the alert message is a link to this Grafana view.
In our Messagebus-Client, we expose some metrics automatically and users get the ability to extend them further. The metrics we expose by default are:
For producers:
For consumers:
Some teams use these for alerting on a significant change in throughput, others use them to alert if no messages are produced/consumed in a given time frame.
As well as providing the Messagebus framework, the Application Services team looks for common concerns within Engineering and looks to solve them in a scalable, extensible way which means other engineering teams can utilize the system and not have to build their own (thus meaning we are not building lots of disparate systems that are only slightly different).
One example is the Alert Notification System (ANS). ANS is the backend service for the “Notifications” tab in the Cloudflare dashboard. You may have noticed over the past 12 months that new alert and policy types have been made available to customers very regularly. This is because we have made it very easy for other teams to do this. The approach is:
That’s it! The producer team now has a means for customers to configure granular alerting policies for their new alert that includes being able to dispatch them via Slack, Google Chat or a custom webhook, PagerDuty or email (by both API and dashboard). Retrying and dead letter messages are managed for them, and a whole host of metrics are made available, all by making some very small changes.
Usage of Kafka (and our Messagebus tools) is only going to increase at Cloudflare as we continue to grow, and as a team we are committed to making the tooling around Messagebus easy to use, customizable where necessary and (perhaps most importantly) easy to observe. We regularly take feedback from other engineers to help improve the Messagebus-Client (we are on the fifth version now) and are currently experimenting with abstracting the intricacies of Kafka away completely and allowing teams to use gRPC to stream messages to Kafka. Blog post on the success/failure of this to follow!
If you're interested in building scalable services and solving interesting technical problems, we are hiring engineers on our team in Austin, and Remote US.
Today, we are announcing experimental support for WASI (the WebAssembly System Interface) on Cloudflare Workers and support within wrangler2 to make it a joy to work with. We continue to be incredibly excited about the entire WebAssembly ecosystem and are eager to adopt the standards as they are developed.
So what is WASI anyway? To understand WASI, and why we’re excited about it, it’s worth a quick recap of WebAssembly, and the ecosystem around it.
WebAssembly promised us a future in which code written in compiled languages could be compiled to a common binary format and run in a secure sandbox, at near native speeds. While WebAssembly was designed with the browser in mind, the model rapidly extended to server-side platforms such as Cloudflare Workers (which has supported WebAssembly since 2017).
WebAssembly was originally designed to run alongside Javascript, and requires developers to interface directly with Javascript in order to access the world outside the sandbox. To put it another way, WebAssembly does not provide any standard interface for I/O tasks such as interacting with files, accessing the network, or reading the system clock. This means if you want to respond to an event from the outside world, it's up to the developer to handle that event in JavaScript, and directly call functions exported from the WebAssembly module. Similarly, if you want to perform I/O from within WebAssembly, you need to implement that logic in Javascript and import it into the WebAssembly module.
Custom toolchains such as Emscripten or libraries such as wasm-bindgen have emerged to make this easier, but they are language specific and add a tremendous amount of complexity and bloat. We've even built our own library, workers-rs, using wasm-bindgen that attempts to make writing applications in Rust feel native within a Worker – but this has proven not only difficult to maintain, but requires developers to write code that is Workers specific, and is not portable outside the Workers ecosystem.
We need more.
WASI aims to provide a standard interface that any language compiling to WebAssembly can target. You can read the original post by Lin Clark here, which gives an excellent introduction – code cartoons and all. In a nutshell, Lin describes WebAssembly as an assembly language for a 'conceptual machine', whereas WASI is a systems interface for a ‘conceptual operating system.’
This standardization of the system interface has paved the way for existing toolchains to cross-compile existing codebases to the wasm32-wasi target. A tremendous amount of progress has already been made, specifically within Clang/LLVM via the wasi-sdk and Rust toolchains. These toolchains leverage a version of Libc, which provides POSIX standard API calls, that is built on top of WASI 'system calls.' There are even basic implementations in more fringe toolchains such as TinyGo and SwiftWasm.
Practically speaking, this means that you can now write applications that not only interoperate with any WebAssembly runtime implementing the standard, but also any POSIX compliant system! This means the exact same ‘Hello World!’ that runs on your local Linux/Mac/Windows WSL machine can also run, unchanged, on any of those runtimes.
WASI sounds great, but does it actually make my life easier? You tell us. Let’s run through an example of how this would work in practice.
First, let’s generate a basic Rust “Hello, world!” application, compile, and run it.
$ cargo new hello_world
$ cd ./hello_world
$ cargo build --release
Compiling hello_world v0.1.0 (/Users/benyule/hello_world)
Finished release [optimized] target(s) in 0.28s
$ ./target/release/hello_world
Hello, world!
It doesn’t get much simpler than this. You’ll notice we only define a main() function followed by a println to stdout.
fn main() {
    println!("Hello, world!");
}
Now, let’s take the exact same program and compile against the wasm32-wasi target, and run it in an ‘off the shelf’ wasm runtime such as Wasmtime.
$ cargo build --target wasm32-wasi --release
$ wasmtime target/wasm32-wasi/release/hello_world.wasm
Hello, world!
Neat! The same code compiles and runs in multiple POSIX environments.
Finally, let’s take the binary we just generated for Wasmtime, but instead publish it to Workers using Wrangler2.
$ npx wrangler@wasm dev target/wasm32-wasi/release/hello_world.wasm
$ curl http://localhost:8787/
Hello, world!
Unsurprisingly, it works! The same code is compatible in multiple POSIX environments and the same binary is compatible across multiple WASM runtimes.
The attentive reader may notice that we played a small trick with the HTTP request made via cURL. In this example, we actually stream stdin and stdout to/from the Worker using the HTTP request and response body respectively. This pattern enables some really interesting use cases, specifically, programs designed to run on the command line can be deployed as 'services' to the cloud.
‘Hexyl’ is an example that works completely out of the box. Here, we ‘cat’ a binary file on our local machine and ‘pipe’ the output to curl, which will then POST that output to our service and stream the result back. Following the steps we used to compile our 'Hello World!', we can compile hexyl.
$ git clone git@github.com:sharkdp/hexyl.git
$ cd ./hexyl
$ cargo build --target wasm32-wasi --release
And without further modification we were able to take a real-world program and create something we can now run or deploy. Again, let's tell wrangler2 to preview hexyl, but this time give it some input.
$ npx wrangler@wasm dev target/wasm32-wasi/release/hexyl.wasm
$ echo "Hello, world\!" | curl -X POST --data-binary @- http://localhost:8787
┌────────┬─────────────────────────┬─────────────────────────┬────────┬────────┐
│00000000│ 48 65 6c 6c 6f 20 77 6f ┊ 72 6c 64 21 0a │Hello wo┊rld!_ │
└────────┴─────────────────────────┴─────────────────────────┴────────┴────────┘
Give it a try yourself by hitting https://hexyl.examples.workers.dev.
echo "Hello world\!" | curl https://hexyl.examples.workers.dev/ -X POST --data-binary @- --output -
A more useful example, but requires a bit more work, would be to deploy a utility such as swc (swc.rs), to the cloud and use it as an on demand JavaScript/TypeScript transpilation service. Here, we have a few extra steps to ensure that the compiled output is as small as possible, but it otherwise runs out-of-the-box. Those steps are detailed in github.com/zebp/wasi-example-swc, but for now let’s gloss over that and interact with the hosted example.
$ echo "const x = (x, y) => x * y;" | curl -X POST --data-binary @- https://swc-wasi.examples.workers.dev/ --output -
var x=function(a,b){return a*b}
Finally, we can also do the same with C/C++, but requires a little more lifting to get our Makefile right. Here we show an example of compiling zstd and uploading it as a streaming compression service.
github.com/zebp/wasi-example-zstd
$ echo "Hello world\!" | curl https://zstd.examples.workers.dev/ -s -X POST --data-binary @- | file -
Wrangler can make it really easy to deploy code without having to worry about the Workers ecosystem, but in some cases you may actually want to invoke your WASI based WASM module from Javascript. This can be achieved with the following simple boilerplate. An updated README will be kept at github.com/cloudflare/workers-wasi.
import { WASI } from "@cloudflare/workers-wasi";
import demoWasm from "./demo.wasm";
export default {
  async fetch(request, _env, ctx) {
    // Creates a TransformStream we can use to pipe our stdout to our response body.
    const stdout = new TransformStream();
    const wasi = new WASI({
      args: [],
      stdin: request.body,
      stdout: stdout.writable,
    });
    // Instantiate our WASM with our demo module and our configured WASI import.
    const instance = new WebAssembly.Instance(demoWasm, {
      wasi_snapshot_preview1: wasi.wasiImport,
    });
    // Keep our worker alive until the WASM has finished executing.
    ctx.waitUntil(wasi.start(instance));
    // Finally, let's reply with the WASM's output.
    return new Response(stdout.readable);
  },
};
Now with our JavaScript boilerplate and wasm, we can easily deploy our worker with Wrangler’s WASM feature.
$ npx wrangler publish
Total Upload: 473.89 KiB / gzip: 163.79 KiB
Uploaded wasi-javascript (2.75 sec)
Published wasi-javascript (0.30 sec)
wasi-javascript.zeb.workers.dev
For those of you who have been around for the better part of the past couple of decades, you may notice this looks very similar to RFC3875, better known as CGI (The Common Gateway Interface). While our example here certainly does not conform to the specification, you can imagine how this can be extended to turn the stdin of a basic 'command line' application into a full-blown http handler.
We are thrilled to learn where developers take this from here. Share what you build with us on Discord or Twitter!
...
Here at Cloudflare we're constantly working on improving our service. Our engineers are looking at hundreds of parameters of our traffic, making sure that we get better all the time.
One of the core numbers we keep a close eye on is HTTP request latency, which is important for many of our products. We regard latency spikes as bugs to be fixed. One example is the 2017 story of "Why does one NGINX worker take all the load?", where we optimized our TCP Accept queues to improve overall latency of TCP sockets waiting for accept().
Performance tuning is a holistic endeavor, and we monitor and continuously improve a range of other performance metrics as well, including throughput. Sometimes, tradeoffs have to be made. Such a case occurred in 2015, when a latency spike was discovered in our processing of HTTP requests. The solution at the time was to set tcp_rmem to 4 MiB, which minimizes the amount of time the kernel spends on TCP collapse processing. It was this collapse processing that was causing the latency spikes. Later in this post we discuss TCP collapse processing in more detail.
The tradeoff is that using a low value for tcp_rmem limits TCP throughput over high latency links. The following graph shows the maximum throughput as a function of network latency for a window size of 2 MiB. Note that the 2 MiB corresponds to a tcp_rmem value of 4 MiB due to the tcp_adv_win_scale setting in effect at the time.
For the Cloudflare products then in existence, this was not a major problem, as connections terminate and content is served from nearby servers due to our BGP anycast routing.
Since then, we have added new products, such as Magic WAN, WARP, Spectrum, Gateway, and others. These represent new types of use cases and traffic flows.
For example, imagine you're a typical Magic WAN customer. You have connected all of your worldwide offices together using the Cloudflare global network. While Time to First Byte still matters, Magic WAN office-to-office traffic also needs good throughput. For example, a lot of traffic over these corporate connections will be file sharing using protocols such as SMB. These are elephant flows over long fat networks. Throughput is the metric every eyeball watches as they are downloading files.
We need to continue to provide world-class low latency while simultaneously providing high throughput over high-latency connections.
Before we begin, let’s introduce the players in our game.
TCP receive window is the maximum number of unacknowledged user payload bytes the sender should transmit (bytes-in-flight) at any point in time. The size of the receive window can and does go up and down during the course of a TCP session. It is a mechanism whereby the receiver can tell the sender to stop sending if the sent packets cannot be successfully received because the receive buffers are full. It is this receive window that often limits throughput over high-latency networks.
net.ipv4.tcp_adv_win_scale is a (non-intuitive) number used to account for the overhead needed by Linux to process packets. The receive window is specified in terms of user payload bytes. Linux needs additional memory beyond that to track other data associated with packets it is processing.
The value of the receive window changes during the lifetime of a TCP session, depending on a number of factors. The maximum value that the receive window can be is limited by the amount of free memory available in the receive buffer, according to this table:
tcp_adv_win_scale | TCP window size |
---|---|
4 | 15/16 * available memory in receive buffer |
3 | ⅞ * available memory in receive buffer |
2 | ¾ * available memory in receive buffer |
1 | ½ * available memory in receive buffer |
0 | available memory in receive buffer |
-1 | ½ * available memory in receive buffer |
-2 | ¼ * available memory in receive buffer |
-3 | ⅛ * available memory in receive buffer |
We can intuitively (and correctly) understand that the amount of available memory in the receive buffer is the difference between the used memory and the maximum limit. But what is the maximum size a receive buffer can be? The answer is sk_rcvbuf.
sk_rcvbuf is a per-socket field that specifies the maximum amount of memory that a receive buffer can allocate. This can be set programmatically with the socket option SO_RCVBUF. This can sometimes be useful to do, for localhost TCP sessions, for example, but in general the use of SO_RCVBUF is not recommended.
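For reference, here is roughly how an application would set SO_RCVBUF from Go; net.TCPConn.SetReadBuffer issues the setsockopt call for you. The 4 MiB value below is only an example, and doing this disables receive-buffer autotuning for that socket:

conn, err := net.Dial("tcp", "example.com:443")
if err != nil {
    log.Fatal(err)
}
// SetReadBuffer sets SO_RCVBUF; the kernel caps the request at net.core.rmem_max.
if err := conn.(*net.TCPConn).SetReadBuffer(4 << 20); err != nil {
    log.Fatal(err)
}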
So how is sk_rcvbuf set? The most appropriate value for that depends on the latency of the TCP session and other factors. This makes it difficult for L7 applications to know how to set these values correctly, as they will be different for every TCP session. The solution to this problem is Linux autotuning.
Linux autotuning is logic in the Linux kernel that adjusts the buffer size limits and the receive window based on actual packet processing. It takes into consideration a number of things including TCP session RTT, L7 read rates, and the amount of available host memory.
Autotuning can sometimes seem mysterious, but it is actually fairly straightforward.
The central idea is that Linux can track the rate at which the local application is reading data off of the receive queue. It also knows the session RTT. Because Linux knows these things, it can automatically increase the buffers and receive window until it reaches the point at which the application layer or network bottleneck links are the constraint on throughput (and not host buffer settings). At the same time, autotuning prevents slow local readers from having excessively large receive queues. The way autotuning does that is by limiting the receive window and its corresponding receive buffer to an appropriate size for each socket.
The values set by autotuning can be seen via the Linux “ss” command from the iproute package (e.g. “ss -tmi”). The relevant output fields from that command are:
Recv-Q is the number of user payload bytes not yet read by the local application.
rcv_ssthresh is the window clamp, a.k.a. the maximum receive window size. This value is not known to the sender. The sender receives only the current window size, via the TCP header field. A closely-related field in the kernel, tp->window_clamp, is the maximum window size allowable based on the amount of available memory. rcv_ssthresh is the receiver-side slow-start threshold value.
skmem_r is the actual amount of memory that is allocated, which includes not only user payload (Recv-Q) but also additional memory needed by Linux to process the packet (packet metadata). This is known within the kernel as sk_rmem_alloc.
Note that there are other buffers associated with a socket, so skmem_r does not represent the total memory that a socket might have allocated. Those other buffers are not involved in the issues presented in this post.
skmem_rb is the maximum amount of memory that could be allocated by the socket for the receive buffer. This is higher than rcv_ssthresh to account for memory needed for packet processing that is not packet data. Autotuning can increase this value (up to tcp_rmem max) based on how fast the L7 application is able to read data from the socket and the RTT of the session. This is known within the kernel as sk_rcvbuf.
rcv_space is the high water mark of the rate of the local application reading from the receive buffer during any RTT. This is used internally within the kernel to adjust sk_rcvbuf.
Earlier we mentioned a setting called tcp_rmem. net.ipv4.tcp_rmem consists of three values, but in this document we are always referring to the third value (except where noted). It is a global setting that specifies the maximum amount of memory that any TCP receive buffer can allocate, i.e. the maximum permissible value that autotuning can use for sk_rcvbuf. This is essentially just a failsafe for autotuning, and under normal circumstances should play only a minor role in TCP memory management.
It’s worth mentioning that receive buffer memory is not preallocated. Memory is allocated based on actual packets arriving and sitting in the receive queue. It’s also important to realize that filling up a receive queue is not one of the criteria that autotuning uses to increase sk_rcvbuf. Indeed, preventing this type of excessive buffering (bufferbloat) is one of the benefits of autotuning.
The problem is that we must have a large TCP receive window for high BDP sessions. This is directly at odds with the latency spike problem mentioned above.
Something has to give. The laws of physics (speed of light in glass, etc.) dictate that we must use large window sizes. There is no way to get around that. So we are forced to solve the latency spikes differently.
Sometimes a TCP session will fill up its receive buffers. When that happens, the Linux kernel will attempt to reduce the amount of memory the receive queue is using by performing what amounts to a “defragmentation” of memory. This is called collapsing the queue. Collapsing the queue takes time, which is what drives up HTTP request latency.
We do not want to spend time collapsing TCP queues.
Why do receive queues fill up to the point where they hit the maximum memory limit? The usual situation is when the local application starts out reading data from the receive queue at one rate (triggering autotuning to raise the max receive window), followed by the local application slowing down its reading from the receive queue. This is valid behavior, and we need to handle it correctly.
Before exploring solutions, let’s first decide what we need as the maximum TCP window size.
As we have seen above in the discussion about BDP, the window size is determined based upon the RTT and desired throughput of the connection.
Because Linux autotuning will adjust correctly for sessions with lower RTTs and bottleneck links with lower throughput, all we need to be concerned about are the maximums.
For latency, we have chosen 300 ms as the maximum expected latency, as that is the measured latency between our Zurich and Sydney facilities. It seems reasonable enough as a worst-case latency under normal circumstances.
For throughput, although we have very fast and modern hardware on the Cloudflare global network, we don’t expect a single TCP session to saturate the hardware. We have arbitrarily chosen 3500 mbps as the highest supported throughput for our highest latency TCP sessions.
The calculation for those numbers results in a BDP of 131MB, which we round to the more aesthetic value of 128 MiB.
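The arithmetic behind that figure is simply bandwidth × round-trip time; a quick sanity check:

package main

import "fmt"

func main() {
    const (
        throughputBits = 3500e6 // 3,500 Mbit/s target throughput
        rttSeconds     = 0.300  // 300 ms worst-case RTT
    )
    bdpBytes := throughputBits * rttSeconds / 8
    // Prints 131250000, i.e. roughly 131 MB, which rounds to 128 MiB.
    fmt.Printf("%.0f\n", bdpBytes)
}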
Recall that allocation of TCP memory includes metadata overhead in addition to packet data. The ratio of actual amount of memory allocated to user payload size varies, depending on NIC driver settings, packet size, and other factors. For full-sized packets on some of our hardware, we have measured average allocations up to 3 times the packet data size. In order to reduce the frequency of TCP collapse on our servers, we set tcp_adv_win_scale to -2. From the table above, we know that the max window size will be ¼ of the max buffer space.
We end up with the following sysctl values:
net.ipv4.tcp_rmem = 8192 262144 536870912
net.ipv4.tcp_wmem = 4096 16384 536870912
net.ipv4.tcp_adv_win_scale = -2
A tcp_rmem of 512MiB and tcp_adv_win_scale of -2 results in a maximum window size that autotuning can set of 128 MiB, our desired value.
Patient: Doctor, it hurts when we collapse the TCP receive queue.
Doctor: Then don’t do that!
Generally speaking, when a packet arrives at a buffer when the buffer is full, the packet gets dropped. In the case of these receive buffers, Linux tries to “save the packet” when the buffer is full by collapsing the receive queue. Frequently this is successful, but it is not guaranteed to be, and it takes time.
There are no problems created by immediately just dropping the packet instead of trying to save it. The receive queue is full anyway, so the local receiver application still has data to read. The sender’s congestion control will notice the drop and/or ZeroWindow and will respond appropriately. Everything will continue working as designed.
At present, there is no setting provided by Linux to disable the TCP collapse. We developed an in-house patch to the kernel to disable the TCP collapse logic.
The kernel patch for our first attempt was straightforward. At the top of tcp_try_rmem_schedule(), if the memory allocation fails, we simply return (after pred_flag = 0 and tcp_sack_reset()), thus completely skipping the tcp_collapse and related logic.
It didn’t work.
Although we eliminated the latency spikes while using large buffer limits, we did not observe the throughput we expected.
One of the realizations we made as we investigated the situation was that standard network benchmarking tools such as iperf3 and similar do not expose the problem we are trying to solve. iperf3 does not fill the receive queue. Linux autotuning does not open the TCP window large enough. Autotuning is working perfectly for our well-behaved benchmarking program.
We need application-layer software that is slightly less well-behaved, one that exercises the autotuning logic under test. So we wrote one.
Anomalies were seen during our “Attempt #1” that negatively impacted throughput. The anomalies were seen only under certain specific conditions, and we realized we needed a better benchmarking tool to detect and measure the performance impact of those anomalies.
This tool has turned into an invaluable resource during the development of this patch and raised confidence in our solution.
It consists of two Python programs. The reader opens a TCP session to the daemon, at which point the daemon starts sending user payload as fast as it can, and never stops sending.
The reader, on the other hand, starts and stops reading in a way to open up the TCP receive window wide open and then repeatedly causes the buffers to fill up completely. More specifically, the reader implemented this logic:
This has the effect of highlighting any issues in the handling of packets when the buffers repeatedly hit the limit.
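The actual tool is a pair of Python programs and is not public, but a Go-flavored sketch of the reader's oscillating behavior would look something like this (the address, buffer size, and timings here are made up for illustration):

package main

import (
    "log"
    "net"
    "time"
)

func main() {
    conn, err := net.Dial("tcp", "sender.example:9000") // hypothetical daemon address
    if err != nil {
        log.Fatal(err)
    }
    buf := make([]byte, 1<<20)
    for {
        // Read as fast as possible for a while, so autotuning opens the window...
        deadline := time.Now().Add(2 * time.Second)
        for time.Now().Before(deadline) {
            if _, err := conn.Read(buf); err != nil {
                return
            }
        }
        // ...then stop reading long enough for the receive queue to fill completely.
        time.Sleep(5 * time.Second)
    }
}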
Taking a step back, let’s look at the default Linux behavior. The following is kernel v5.15.16.
The Linux kernel is effective at freeing up space in order to make room for incoming packets when the receive buffer memory limit is hit. As documented previously, the cost for saving these packets (i.e. not dropping them) is latency.
However, the latency spikes, in milliseconds, for tcp_try_rmem_schedule(), are:
tcp_rmem 170 MiB, tcp_adv_win_scale +2 (170p2):
@ms:
[0] 27093 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[1] 0 |
[2, 4) 0 |
[4, 8) 0 |
[8, 16) 0 |
[16, 32) 0 |
[32, 64) 16 |
tcp_rmem 146 MiB, tcp_adv_win_scale +3 (146p3):
@ms:
(..., 16) 25984 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[16, 20) 0 |
[20, 24) 0 |
[24, 28) 0 |
[28, 32) 0 |
[32, 36) 0 |
[36, 40) 0 |
[40, 44) 1 |
[44, 48) 6 |
[48, 52) 6 |
[52, 56) 3 |
tcp_rmem 137 MiB, tcp_adv_win_scale +4 (137p4):
@ms:
(..., 16) 37222 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[16, 20) 0 |
[20, 24) 0 |
[24, 28) 0 |
[28, 32) 0 |
[32, 36) 0 |
[36, 40) 1 |
[40, 44) 8 |
[44, 48) 2 |
These are the latency spikes we cannot have on the Cloudflare global network.
So the “something” that was not working in Attempt #1 was that the receive queue memory limit was hit early on as the flow was just ramping up (when the values for sk_rmem_alloc and sk_rcvbuf were small, ~800KB). This occurred at about the two second mark for 137p4 test (about 2.25 seconds for 170p2).
In hindsight, we should have noticed that tcp_prune_queue() actually raises sk_rcvbuf when it can. So we modified the patch in response to that, adding a guard to allow the collapse to execute when sk_rmem_alloc is less than the threshold value.
net.ipv4.tcp_collapse_max_bytes = 6291456
The next section discusses how we arrived at this value for tcp_collapse_max_bytes.
The patch is available here.
The results with the new patch are as follows:
oscil – 300ms tests
oscil – 20ms tests
oscil – 0ms tests
iperf3 – 300 ms tests
iperf3 – 20 ms tests
iperf3 – 0ms tests
All tests are successful.
In order to determine this setting, we need to understand the largest queue we can collapse without incurring unacceptable latency.
Using 6 MiB should result in a maximum latency of no more than 2 ms.
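As a rough sanity check on that figure (assuming, purely for illustration, that collapse time scales linearly with the amount of queued data), the worst-case spikes measured earlier imply a similar bound:

# Back-of-envelope estimate, assuming collapse cost is roughly linear in queue size.
worst_case_ms = 60     # worst spikes above landed in the 32-64 ms bucket at ~170 MiB of rmem
measured_mib = 170
limit_mib = 6          # proposed tcp_collapse_max_bytes of 6 MiB
print(worst_case_ms * limit_mib / measured_mib)  # roughly 2 ms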
For reference, the current production settings are:
net.ipv4.tcp_rmem = 8192 2097152 16777216
net.ipv4.tcp_wmem = 4096 16384 33554432
net.ipv4.tcp_adv_win_scale = -2
net.ipv4.tcp_collapse_max_bytes = 0
net.ipv4.tcp_notsent_lowat = 4294967295
tcp_collapse_max_bytes of 0 means that the custom feature is disabled and that the vanilla kernel logic is used for TCP collapse processing.
The new settings are:
net.ipv4.tcp_rmem = 8192 262144 536870912
net.ipv4.tcp_wmem = 4096 16384 536870912
net.ipv4.tcp_adv_win_scale = -2
net.ipv4.tcp_collapse_max_bytes = 6291456
net.ipv4.tcp_notsent_lowat = 131072
The tcp_notsent_lowat setting is discussed in the last section of this post.
The middle value of tcp_rmem was changed as a result of separate work that found that Linux autotuning was setting receive buffers too high for localhost sessions. This updated setting reduces TCP memory usage for those sessions, but does not change anything about the type of TCP session that is the focus of this post.
For the following benchmarks, we used non-Cloudflare host machines in Iowa, US, and Melbourne, Australia, performing data transfers to the Cloudflare data center in Marseille, France. In Marseille, we have some hosts configured with the existing production settings, and others with the settings described in this post. The software used is iperf3 version 3.9 on kernel 5.15.32.
| Route | RTT (ms) | Throughput with Current Settings (Mbps) | Throughput with New Settings (Mbps) | Increase Factor |
|---|---|---|---|---|
| Iowa to Marseille | 121 | 276 | 6600 | 24x |
| Melbourne to Marseille | 282 | 120 | 3800 | 32x |
Iowa-Marseille throughput
Iowa-Marseille receive window and bytes-in-flight
Melbourne-Marseille throughput
Melbourne-Marseille receive window and bytes-in-flight
Even with the new settings in place, the Melbourne to Marseille performance is limited by the receive window on the Cloudflare host. This means that further adjustments to these settings could yield even higher throughput.
The Y-axis on these charts is the 99th percentile time for TCP collapse, in seconds.
Cloudflare hosts in Marseille running the current production settings
Cloudflare hosts in Marseille running the new settings
The takeaway from these graphs is that the maximum TCP collapse time with the new settings is no worse than with the current production settings. This is the desired result.
What we have shown so far is that the receiver side seems to be working well, but what about the sender side?
As part of this work, we are setting tcp_wmem max to 512 MiB. For oscillating reader flows, this can cause the send buffer to become quite large. This represents bufferbloat and wasted kernel memory, both things that nobody likes or wants.
Fortunately, there is already a solution: tcp_notsent_lowat. This setting limits the size of unsent bytes in the write queue. More details can be found at https://lwn.net/Articles/560082.
The results are significant:
The RTT for these tests was 466 ms. Throughput is not negatively affected; it remains at full wire speed in all cases (1 Gbps). Memory usage is as reported by /proc/net/sockstat, TCP mem.
Our web servers already set tcp_notsent_lowat to 131072 for their sockets. All other senders use 4 GiB, the default value. We are changing the sysctl so that 131072 is in effect for all senders running on the server.
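For an individual sender that wants this behaviour regardless of the sysctl, the same limit can be applied per socket with the TCP_NOTSENT_LOWAT socket option. A minimal Python sketch follows; the numeric fallback value 25 comes from linux/tcp.h, since older Python releases do not export the constant:

import socket

# TCP_NOTSENT_LOWAT is 25 in linux/tcp.h; fall back to that value if the
# running Python version does not expose the constant.
TCP_NOTSENT_LOWAT = getattr(socket, "TCP_NOTSENT_LOWAT", 25)

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Keep at most 128 KiB of not-yet-sent data queued for this socket,
# mirroring the 131072 value mentioned above.
sock.setsockopt(socket.IPPROTO_TCP, TCP_NOTSENT_LOWAT, 131072)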
The goal of this work is to open the throughput floodgates for high BDP connections while simultaneously ensuring very low HTTP request latency.
We have accomplished that goal.
For example, the global function fetch() (which downloads online resources) asynchronously returns a Response which has a property .body with a web stream.
This blog post covers web streams on Node.js, but most of what we learn applies to all web platforms that support them.
In an interview published today, Bethesda’s creative director Todd Howard revealed the studio’s future plans, explaining that after the Elder Scrolls 6 comes out, the studio’s next game will be Fallout 5, the next main entry in the company’s post-apocalyptic open-world RPG franchise. However, considering how long it…
I have found my happy place! Escape Simulator is such a lovely thing, a first-person simulacrum of escape rooms, built in 3D, with realistic physics. It is, as its title suggests, a simulation of attending a real-world escape room, in a way that almost all room-escape video games are not. Apart from when it’s in space.
Surviving in Horizon Forbidden West, an open-world game about what happens when Elon Musk funds one too many projects, is only partially contingent on skill. But if you want to thrive in the robo-dino apocalypse, you’ll need to equip yourself with the best gear around.
As we develop new products, we often push our operating system - Linux - beyond what is commonly possible. A common theme has been relying on eBPF to build technology that would otherwise have required modifying the kernel. For example, we’ve built DDoS mitigation and a load balancer and use it to monitor our fleet of servers.
This software usually consists of a small-ish eBPF program written in C, executed in the context of the kernel, and a larger user space component that loads the eBPF into the kernel and manages its lifecycle. We’ve found that the ratio of eBPF code to userspace code differs by an order of magnitude or more. We want to shed some light on the issues that a developer has to tackle when dealing with eBPF and present our solutions for building rock-solid production ready applications which contain eBPF.
For this purpose we are open sourcing the production tooling we’ve built for the sk_lookup hook we contributed to the Linux kernel, called tubular. It exists because we’ve outgrown the BSD sockets API. To deliver some products we need features that are just not possible using the standard API.
The source code for tubular is at github.com/cloudflare/tubular, and it allows you to do all the things mentioned above. Maybe the most interesting feature is that you can change the addresses of a service on the fly:
tubular sits at a critical point in the Cloudflare stack, since it has to inspect every connection terminated by a server and decide which application should receive it.
Failing to do so would drop or misdirect connections hundreds of times per second, so it has to be incredibly robust during day-to-day operations. We set our goals for tubular accordingly.
In the past we had built a proof-of-concept control plane for sk_lookup called inet-tool, which proved that we could get away without a persistent service managing the eBPF. Similarly, tubular has tubectl: short-lived invocations make the necessary changes, and persisting state is handled by the kernel in the form of eBPF maps. Following this design gave us crash resiliency by default, but left us with the task of mapping the user interface we wanted to the tools available in the eBPF ecosystem.
tubular consists of a BPF program that attaches to the sk_lookup hook in the kernel and userspace Go code which manages the BPF program. The tubectl command wraps both in a way that is easy to distribute.
tubectl manages two kinds of objects: bindings and sockets. A binding encodes a rule against which an incoming packet is matched. A socket is a reference to a TCP or UDP socket that can accept new connections or packets.
Bindings and sockets are "glued" together via arbitrary strings called labels. Conceptually, a binding assigns a label to some traffic. The label is then used to find the correct socket.
To create a binding that steers port 80 (aka HTTP) traffic destined for 127.0.0.1 to the label “foo”, we use tubectl bind:
$ sudo tubectl bind "foo" tcp 127.0.0.1 80
Due to the power of sk_lookup we can have much more powerful constructs than the BSD API. For example, we can redirect connections to all IPs in 127.0.0.0/24 to a single socket:
$ sudo tubectl bind "bar" tcp 127.0.0.0/24 80
A side effect of this power is that it's possible to create bindings that "overlap":
1: tcp 127.0.0.1/32 80 -> "foo"
2: tcp 127.0.0.0/24 80 -> "bar"
The first binding says that HTTP traffic to localhost should go to “foo”, while the second asserts that HTTP traffic in the localhost subnet should go to “bar”. This creates a contradiction: which binding should we choose? tubular resolves this by giving more specific bindings precedence over less specific ones.
Applying this to our example, HTTP traffic to 127.0.0.1 will be directed to “foo”, while traffic to all other IPs in 127.0.0.0/24 goes to “bar”.
sk_lookup needs a reference to a TCP or a UDP socket to redirect traffic to it. However, a socket is usually accessible only by the process which created it with the socket syscall. For example, an HTTP server creates a TCP listening socket bound to port 80. How can we gain access to the listening socket?
A fairly well known solution is to make processes cooperate by passing socket file descriptors via SCM_RIGHTS messages to a tubular daemon. That daemon can then take the necessary steps to hook up the socket with sk_lookup. This approach has several drawbacks, chief among them that applications have to be modified to cooperate and that it requires a long-running daemon to receive the sockets.
There is another way of getting at sockets, using systemd, provided socket activation is used. It works by creating an additional service unit with the correct Sockets setting. In other words, we can use a systemd oneshot service that executes when a systemd socket unit is created, registering the socket with tubular. For example:
[Unit]
Requisite=foo.socket
[Service]
Type=oneshot
Sockets=foo.socket
ExecStart=tubectl register "foo"
Since we can rely on systemd to execute tubectl at the correct times, we don't need a daemon of any kind. However, the reality is that a lot of popular software doesn't use systemd socket activation. Dealing with systemd sockets is complicated and doesn't invite experimentation. Which brings us to the final trick: pidfd_getfd:
The pidfd_getfd() system call allocates a new file descriptor in the calling process. This new file descriptor is a duplicate of an existing file descriptor, targetfd, in the process referred to by the PID file descriptor pidfd.
We can use it to iterate all file descriptors of a foreign process, and pick the socket we are interested in. To return to our example, we can use the following command to find the TCP socket bound to 127.0.0.1 port 8080 in the httpd process and register it under the "foo" label:
$ sudo tubectl register-pid "foo" $(pidof httpd) tcp 127.0.0.1 8080
It's easy to wire this up using systemd's ExecStartPost if the need arises.
[Service]
Type=forking # or notify
ExecStart=/path/to/some/command
ExecStartPost=tubectl register-pid $MAINPID foo tcp 127.0.0.1 8080
As mentioned previously, tubular relies on the kernel to store state, using BPF key / value data structures also known as maps. Using the BPF_OBJ_PIN syscall we can persist them in /sys/fs/bpf:
/sys/fs/bpf/4026532024_dispatcher
├── bindings
├── destination_metrics
├── destinations
├── sockets
└── ...
The way the state is structured differs from how the command line interface presents it to users. Labels like “foo” are convenient for humans, but they are of variable length. Dealing with variable length data in BPF is cumbersome and slow, so the BPF program never references labels at all. Instead, the user space code allocates numeric IDs, which are then used in the BPF. Each ID represents a (label, domain, protocol) tuple, internally called a destination.
For example, adding a binding for "foo" tcp 127.0.0.1 … allocates an ID for ("foo", AF_INET, TCP). Including domain and protocol in the destination allows simpler data structures in the BPF. Each allocation also tracks how many bindings reference a destination so that we can recycle unused IDs. This data is persisted into the destinations hash table, which is keyed by (Label, Domain, Protocol) and contains (ID, Count). Metrics for each destination are tracked in destination_metrics in the form of per-CPU counters.
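To make the idea concrete, here is a toy Python model of that allocation scheme; the names and structure are illustrative and are not tubular's actual Go implementation:

class Destinations:
    # Toy model: hand out small numeric IDs for (label, domain, protocol)
    # tuples and reference-count them so that IDs of unused destinations
    # can be recycled, as described above.
    def __init__(self):
        self.table = {}      # (label, domain, protocol) -> [id, refcount]
        self.free_ids = []   # IDs whose destinations no longer have bindings
        self.next_id = 0

    def acquire(self, label, domain, protocol):
        dest = (label, domain, protocol)
        if dest in self.table:
            self.table[dest][1] += 1
        else:
            if self.free_ids:
                new_id = self.free_ids.pop()
            else:
                new_id = self.next_id
                self.next_id += 1
            self.table[dest] = [new_id, 1]
        return self.table[dest][0]

    def release(self, label, domain, protocol):
        dest = (label, domain, protocol)
        entry = self.table[dest]
        entry[1] -= 1
        if entry[1] == 0:    # last binding gone: recycle the ID
            self.free_ids.append(entry[0])
            del self.table[dest]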
bindings is a longest prefix match (LPM) trie which stores a mapping from (protocol, port, prefix) to (ID, prefix length). The ID is used as a key to the sockets map which contains pointers to kernel socket structures. IDs are allocated in a way that makes them suitable as an array index, which allows using the simpler BPF sockmap (an array) instead of a socket hash table. The prefix length is duplicated in the value to work around shortcomings in the BPF API.
As discussed, bindings have a precedence associated with them. To repeat the earlier example:
1: tcp 127.0.0.1/32 80 -> "foo"
2: tcp 127.0.0.0/24 80 -> "bar"
The first binding should be matched before the second one. We need to encode this in the BPF somehow. One idea is to generate some code that executes the bindings in order of specificity, a technique we’ve used to great effect in l4drop:
1: if (mask(ip, 32) == 127.0.0.1) return "foo"
2: if (mask(ip, 24) == 127.0.0.0) return "bar"
...
This has the downside that the program gets longer the more bindings are added, which slows down execution. It's also difficult to introspect and debug such long programs. Instead, we use a specialised BPF longest prefix match (LPM) map to do the hard work. This allows inspecting the contents from user space to figure out which bindings are active, which is very difficult if we had compiled bindings into BPF. The LPM map uses a trie behind the scenes, so lookup has complexity proportional to the length of the key instead of linear complexity for the “naive” solution.
However, using a map requires a trick for encoding the precedence of bindings into a key that we can look up. Here is a simplified version of this encoding, which ignores IPv6 and uses labels instead of IDs. To insert the binding tcp 127.0.0.0/24 80 into a trie, we first convert the IP address into a number.
127.0.0.0 = 0x7f 00 00 00
Since we're only interested in the first 24 bits of the address, we can write the whole prefix as
127.0.0.0/24 = 0x7f 00 00 ??
where “?” means that the value is not specified. We choose the number 0x01 to represent TCP and prepend it and the port number (80 decimal is 0x50 hex) to create the full key:
tcp 127.0.0.0/24 80 = 0x01 50 7f 00 00 ??
Converting tcp 127.0.0.1/32 80 happens in exactly the same way. Once the converted values are inserted into the trie, the LPM trie conceptually contains the following keys and values.
LPM trie:
0x01 50 7f 00 00 ?? = "bar"
0x01 50 7f 00 00 01 = "foo"
To find the binding for a TCP packet destined for 127.0.0.1:80, we again encode a key and perform a lookup.
input: 0x01 50 7f 00 00 01 TCP packet to 127.0.0.1:80
---------------------------
LPM trie:
0x01 50 7f 00 00 ?? = "bar"
y y y y y
0x01 50 7f 00 00 01 = "foo"
y y y y y y
---------------------------
result: "foo"
y = byte matches
The trie returns “foo” since its key shares the longest prefix with the input. Note that we stop comparing keys once we reach unspecified “?” bytes, but conceptually “bar” is still a valid result. The distinction becomes clear when looking up the binding for a TCP packet to 127.0.0.255:80.
input: 0x01 50 7f 00 00 ff TCP packet to 127.0.0.255:80
---------------------------
LPM trie:
0x01 50 7f 00 00 ?? = "bar"
y y y y y
0x01 50 7f 00 00 01 = "foo"
y y y y y n
---------------------------
result: "bar"
n = byte doesn't match
In this case "foo" is discarded since the last byte doesn't match the input. However, "bar" is returned since its last byte is unspecified and therefore considered to be a valid match.
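The same simplified encoding and lookup can be modelled in a few lines of Python. This is only a model of the worked example above; the real lookup happens inside the BPF program against a kernel LPM trie and uses numeric IDs rather than label strings:

import ipaddress

TCP = 0x01

def encode(proto, port, prefix):
    # Simplified key layout from the example: 1 protocol byte, 1 port byte,
    # 4 address bytes. Returns the key plus the number of significant bits.
    net = ipaddress.ip_network(prefix)
    key = bytes([proto, port]) + net.network_address.packed
    return key, 16 + net.prefixlen

def lpm_lookup(trie, key):
    # Return the value whose specified prefix is longest while still matching key.
    best, best_bits = None, -1
    for (tkey, tbits), value in trie.items():
        whole, rem = divmod(tbits, 8)
        if tkey[:whole] != key[:whole]:
            continue
        if rem and (tkey[whole] >> (8 - rem)) != (key[whole] >> (8 - rem)):
            continue
        if tbits > best_bits:
            best, best_bits = value, tbits
    return best

trie = {
    encode(TCP, 80, "127.0.0.0/24"): "bar",
    encode(TCP, 80, "127.0.0.1/32"): "foo",
}
print(lpm_lookup(trie, encode(TCP, 80, "127.0.0.1/32")[0]))    # -> foo
print(lpm_lookup(trie, encode(TCP, 80, "127.0.0.255/32")[0]))  # -> bar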
Linux has the powerful ss tool (part of iproute2) available to inspect socket state:
$ ss -tl src 127.0.0.1
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 128 127.0.0.1:ipp 0.0.0.0:*
With tubular in the picture this output is not accurate anymore. tubectl bindings makes up for this shortcoming:
$ sudo tubectl bindings tcp 127.0.0.1
Bindings:
protocol prefix port label
tcp 127.0.0.1/32 80 foo
Running this command requires super-user privileges, despite in theory being safe for any user to run. While this is acceptable for casual inspection by a human operator, it's a dealbreaker for observability via pull-based monitoring systems like Prometheus. The usual approach is to expose metrics via an HTTP server, which would have to run with elevated privileges and be accessible to the Prometheus server somehow. Instead, BPF gives us the tools to enable read-only access to tubular state with minimal privileges.
The key is to carefully set file ownership and mode for state in /sys/fs/bpf. Creating and opening files in /sys/fs/bpf uses BPF_OBJ_PIN and BPF_OBJ_GET. Calling BPF_OBJ_GET with BPF_F_RDONLY is roughly equivalent to open(O_RDONLY) and allows accessing state in a read-only fashion, provided the file permissions are correct. tubular gives the owner full access but restricts read-only access to the group:
$ sudo ls -l /sys/fs/bpf/4026532024_dispatcher | head -n 3
total 0
-rw-r----- 1 root root 0 Feb 2 13:19 bindings
-rw-r----- 1 root root 0 Feb 2 13:19 destination_metrics
It's easy to choose which user and group should own state when loading tubular:
$ sudo -u root -g tubular tubectl load
created dispatcher in /sys/fs/bpf/4026532024_dispatcher
loaded dispatcher into /proc/self/ns/net
$ sudo ls -l /sys/fs/bpf/4026532024_dispatcher | head -n 3
total 0
-rw-r----- 1 root tubular 0 Feb 2 13:42 bindings
-rw-r----- 1 root tubular 0 Feb 2 13:42 destination_metrics
There is one more obstacle: systemd mounts /sys/fs/bpf in a way that makes it inaccessible to anyone but root. Adding the executable bit to the directory fixes this.
$ sudo chmod -v o+x /sys/fs/bpf
mode of '/sys/fs/bpf' changed from 0700 (rwx------) to 0701 (rwx-----x)
Finally, we can export metrics without privileges:
$ sudo -u nobody -g tubular tubectl metrics 127.0.0.1 8080
Listening on 127.0.0.1:8080
^C
There is a caveat, unfortunately: truly unprivileged access requires unprivileged BPF to be enabled. Many distros have taken to disabling it via the unprivileged_bpf_disabled sysctl, in which case scraping metrics does require CAP_BPF.
tubular is distributed as a single binary, but really consists of two pieces of code with widely differing lifetimes. The BPF program is loaded into the kernel once and then may be active for weeks or months, until it is explicitly replaced. In fact, a reference to the program (and link, see below) is persisted into /sys/fs/bpf:
/sys/fs/bpf/4026532024_dispatcher
├── link
├── program
└── ...
The user space code is executed for seconds at a time and is replaced whenever the binary on disk changes. This means that user space has to be able to deal with an "old" BPF program in the kernel somehow. The simplest way to achieve this is to compare what is loaded into the kernel with the BPF shipped as part of tubectl. If the two don't match we return an error:
$ sudo tubectl bind foo tcp 127.0.0.1 80
Error: bind: can't open dispatcher: loaded program #158 has differing tag: "938c70b5a8956ff2" doesn't match "e007bfbbf37171f0"
tag is the truncated hash of the instructions making up a BPF program, which the kernel makes available for every loaded program:
$ sudo bpftool prog list id 158
158: sk_lookup name dispatcher tag 938c70b5a8956ff2
...
By comparing the tag tubular asserts that it is dealing with a supported version of the BPF program. Of course, just returning an error isn't enough. There needs to be a way to update the kernel program so that it's once again safe to make changes. This is where the persisted link in /sys/fs/bpf comes into play. bpf_links are used to attach programs to various BPF hooks. "Enabling" a BPF program is a two-step process: first, load the BPF program; next, attach it to a hook using a bpf_link. Afterwards the program will execute the next time the hook is executed. By updating the link we can change the program on the fly, in an atomic manner.
$ sudo tubectl upgrade
Upgraded dispatcher to 2022.1.0-dev, program ID #159
$ sudo bpftool prog list id 159
159: sk_lookup name dispatcher tag e007bfbbf37171f0
…
$ sudo tubectl bind foo tcp 127.0.0.1 80
bound foo#tcp:[127.0.0.1/32]:80
Behind the scenes the upgrade procedure is slightly more complicated, since we have to update the pinned program reference in addition to the link. We pin the new program into /sys/fs/bpf:
/sys/fs/bpf/4026532024_dispatcher
├── link
├── program
├── program-upgrade
└── ...
Once the link is updated we atomically rename program-upgrade to replace program. In the future we may be able to use RENAME_EXCHANGE to make upgrades even safer.
So far we’ve completely neglected the fact that multiple invocations of tubectl could modify the state in /sys/fs/bpf at the same time. It’s very hard to reason about what would happen in this case, so in general it’s best to prevent this from ever occurring. A common solution to this is advisory file locks. Unfortunately it seems like BPF maps don't support locking.
$ sudo flock /sys/fs/bpf/4026532024_dispatcher/bindings echo works!
flock: cannot open lock file /sys/fs/bpf/4026532024_dispatcher/bindings: Input/output error
This led to a bit of head scratching on our part. Luckily it is possible to flock the directory instead of individual maps:
$ sudo flock --exclusive /sys/fs/bpf/foo echo works!
works!
Each tubectl invocation likewise invokes flock(), thereby guaranteeing that only ever a single process is making changes.
tubular is in production at Cloudflare today and has simplified the deployment of Spectrum and our authoritative DNS. It allowed us to leave behind limitations of the BSD socket API. However, its most powerful feature is that the addresses a service is available on can be changed on the fly. In fact, we have built tooling that automates this process across our global network. Need to listen on another million IPs on thousands of machines? No problem, it’s just an HTTP POST away.
Interested in working on tubular and our L4 load balancer unimog? We are hiring in our European offices.
Linux users on Tuesday got a major dose of bad news—a 12-year-old vulnerability in a system tool called Polkit gives attackers unfettered root privileges on machines running any major distribution of the open source operating system.
Previously called PolicyKit, Polkit manages system-wide privileges in Unix-like OSes. It provides a mechanism for nonprivileged processes to safely interact with privileged processes. It also allows users to execute commands with high privileges by using a component called pkexec, followed by the command.
Like most OSes, Linux provides a hierarchy of permission levels that controls when and what apps or users can interact with sensitive system resources. The design is intended to limit the damage that can happen if the app is hacked or malicious or if a user isn’t trusted to have administrative control of a network.
Recently, we made an optimization to the Cloudflare Workers runtime which reduces the amount of time Workers need to spend in memory. We're passing the savings on to you for all your Unbound Workers.
Workers are often used to implement HTTP proxies, where JavaScript is used to rewrite an HTTP request before sending it on to an origin server, and then to rewrite the response before sending it back to the client. You can implement any kind of rewrite in a Worker, including both rewriting headers and bodies.
Many Workers, though, do not actually modify the response body, but instead simply allow the bytes to pass through from the origin to the client. In this case, the Worker's application code has finished executing as soon as the response headers are sent, before the body bytes have passed through. Historically, the Worker was nevertheless considered to be "in use" until the response body had fully finished streaming.
For billing purposes, under the Workers Unbound pricing model, we charge duration-memory (gigabyte-seconds) for the time in which the Worker is in use.
On December 15-16, we made a change to the way we handle requests that are streaming through the response without modifying the content. This change means that we can mark application code as “idle” as soon as the response headers are returned.
Since no further application code will execute on behalf of the request, the system does not need to keep the request state in memory – it only needs to track the low-level native sockets and pump the bytes through. So now, during this time, the Worker will be considered idle, and could even be evicted before the stream completes (though this would be unlikely unless the stream lasts for a very long time).
Visualized it looks something like this:
As a result of this change, we've seen that the time a Worker is considered "in use" by any particular request has dropped by an average of 70%. Of course, this number varies a lot depending on the details of each Worker. Some may see no benefit, others may see an even larger benefit.
This change is totally invisible to the application. To any external observer, everything behaves as it did before. But, since the system now considers a Worker to be idle during response streaming, the response streaming time will no longer be billed. So, if you saw a drop in your bill, this is why!
The change also applies to a few other frequently used scenarios, namely Websocket proxying, reading from the cache and streaming from KV.
WebSockets: once a Worker has arranged to proxy through a WebSocket, as long as it isn't handling individual messages in your Worker code, the Worker does not remain in use during the proxying. The change applies to regular stateless Workers, but not to Durable Objects, which are not usually used for proxying.
export default {
  async fetch(request: Request) {
    // Do anything before
    const upgradeHeader = request.headers.get('Upgrade')
    if (upgradeHeader === 'websocket') {
      return await fetch(request)
    }
    // Or with other requests
  }
}
Reading from Cache: If you return the response from a cache.match call, the Worker is considered idle as soon as the response headers are returned.
export default {
  async fetch(request: Request) {
    let response = await caches.default.match('https://example.com')
    if (response) {
      return response
    }
    // get/create response and put into cache
  }
}
Streaming from KV: And lastly, when you stream from KV. This one is a bit trickier to get right, because often people retrieve the value from KV as a string or JSON object and then create a response with that value. But if you fetch the value as a stream, as done in the example below, you can create a Response with the ReadableStream.
interface Env {
  MY_KV_NAME: KVNamespace
}

export default {
  async fetch(request: Request, env: Env) {
    const readableStream = await env.MY_KV_NAME.get('hello_world.pdf', { type: 'stream' })
    if (readableStream) {
      return new Response(readableStream, { headers: { 'content-type': 'application/pdf' } })
    }
  },
}
If you are already using Unbound, your bill will have dropped automatically.
Now is a great time to check out Unbound if you haven’t already, especially since recently, we’ve also removed the egress fees. Unbound allows you to build more complex workloads on our platform and only pay for what you use.
We are always looking for opportunities to make Workers better. Often that improvement takes the form of powerful new features such as the soon-to-be released Service Bindings and, of course, performance enhancements. This time, we are delighted to make Cloudflare Workers even cheaper than they already were.
In the world of video games, 2021 may forever be remembered as the year of COVID's great reckoning. 2020 was already rough, but many of its biggest games were mostly completed in a normal development cycle. Projects slated for the following year weren't as lucky.
Thus, this year's gaming news was rich with delays, piping-hot launches, unfinished messes, and game publishers scrambling to fill their schedules with undercooked backup plans. And that says nothing about gamers themselves, wondering if crucial chips and parts might ever be plentiful enough again so they can buy the latest in console and PC gear.
Yet against all odds, fantastic games still crossed 2021's finish line, ranging from big-budget behemoths to surprising indies. This year, in an effort to reduce ranking-based ire and celebrate every game on our list, we're removing numbered rankings, with the exception of crowning a formal Ars Technica pick for Best Video Game of 2021 at the list's very end.
Before Karpenter, Kubernetes users needed to dynamically adjust the compute capacity of their clusters to support applications using Amazon EC2 Auto Scaling groups and the Kubernetes Cluster Autoscaler. Nearly half of Kubernetes customers on AWS report that configuring cluster auto scaling using the Kubernetes Cluster Autoscaler is challenging and restrictive.
When Karpenter is installed in your cluster, Karpenter observes the aggregate resource requests of unscheduled pods and makes decisions to launch new nodes and terminate them to reduce scheduling latencies and infrastructure costs. Karpenter does this by observing events within the Kubernetes cluster and then sending commands to the underlying cloud provider’s compute service, such as Amazon EC2.
Karpenter is an open-source project licensed under the Apache License 2.0. It is designed to work with any Kubernetes cluster running in any environment, including all major cloud providers and on-premises environments. We welcome contributions to build additional cloud providers or to improve core project functionality. If you find a bug, have a suggestion, or have something to contribute, please engage with us on GitHub.
Getting Started with Karpenter on AWS
To get started with Karpenter in any Kubernetes cluster, ensure there is some compute capacity available, and install it using the Helm charts provided in the public repository. Karpenter also requires permissions to provision compute resources on the provider of your choice.
Once installed in your cluster, the default Karpenter provisioner will observe incoming Kubernetes pods, which cannot be scheduled due to insufficient compute resources in the cluster and automatically launch new resources to meet their scheduling and resource requirements.
I want to show a quick start using Karpenter in an Amazon EKS cluster based on Getting Started with Karpenter on AWS. It requires the installation of AWS Command Line Interface (AWS CLI), kubectl, eksctl, and Helm (the package manager for Kubernetes). After setting up these tools, create a cluster with eksctl. This example configuration file specifies a basic cluster with one initial node.
cat <<EOF > cluster.yaml
---
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: eks-karpenter-demo
  region: us-east-1
  version: "1.20"
managedNodeGroups:
  - instanceType: m5.large
    amiFamily: AmazonLinux2
    name: eks-karpenter-demo-ng
    desiredCapacity: 1
    minSize: 1
    maxSize: 5
EOF
$ eksctl create cluster -f cluster.yaml
Karpenter itself can run anywhere, including on self-managed node groups, managed node groups, or AWS Fargate. Karpenter will provision EC2 instances in your account.
Next, you need to create necessary AWS Identity and Access Management (IAM) resources using the AWS CloudFormation template and IAM Roles for Service Accounts (IRSA) for the Karpenter controller to get permissions like launching instances following the documentation. You also need to install the Helm chart to deploy Karpenter to your cluster.
$ helm repo add karpenter https://charts.karpenter.sh
$ helm repo update
$ helm upgrade --install --skip-crds karpenter karpenter/karpenter --namespace karpenter \
--create-namespace --set serviceAccount.create=false --version 0.5.0 \
--set controller.clusterName=eks-karpenter-demo \
--set controller.clusterEndpoint=$(aws eks describe-cluster --name eks-karpenter-demo --query "cluster.endpoint" --output json) \
--wait # for the defaulting webhook to install before creating a Provisioner
Karpenter provisioners are a Kubernetes resource that enables you to configure the behavior of Karpenter in your cluster. When you create a default provisioner, without further customization besides what is needed for Karpenter to provision compute resources in your cluster, Karpenter automatically discovers node properties such as instance types, zones, architectures, operating systems, and purchase types of instances. You don’t need to define these spec:requirements if there is no explicit business requirement.
cat <<EOF | kubectl apply -f -
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  # Requirements that constrain the parameters of provisioned nodes.
  # Operators { In, NotIn } are supported to enable including or excluding values
  requirements:
    - key: node.k8s.aws/instance-type # If not included, all instance types are considered
      operator: In
      values: ["m5.large", "m5.2xlarge"]
    - key: "topology.kubernetes.io/zone" # If not included, all zones are considered
      operator: In
      values: ["us-east-1a", "us-east-1b"]
    - key: "kubernetes.io/arch" # If not included, all architectures are considered
      operator: In
      values: ["arm64", "amd64"]
    - key: "karpenter.sh/capacity-type" # If not included, the webhook for the AWS cloud provider will default to on-demand
      operator: In
      values: ["spot", "on-demand"]
  provider:
    instanceProfile: KarpenterNodeInstanceProfile-eks-karpenter-demo
  ttlSecondsAfterEmpty: 30
EOF
The ttlSecondsAfterEmpty value configures Karpenter to terminate empty nodes. If this value is disabled, nodes will never scale down due to low utilization. To learn more, see Provisioner custom resource definitions (CRDs) on the Karpenter site.
Karpenter is now active and ready to begin provisioning nodes in your cluster. Create some pods using a deployment, and watch Karpenter provision nodes in response.
$ kubectl create deployment inflate \
  --image=public.ecr.aws/eks-distro/kubernetes/pause:3.2
Let’s scale the deployment and check out the logs of the Karpenter controller.
$ kubectl scale deployment inflate --replicas 10
$ kubectl logs -f -n karpenter $(kubectl get pods -n karpenter -l karpenter=controller -o name)
2021-11-23T04:46:11.280Z INFO controller.allocation.provisioner/default Starting provisioning loop {"commit": "abc12345"}
2021-11-23T04:46:11.280Z INFO controller.allocation.provisioner/default Waiting to batch additional pods {"commit": "abc123456"}
2021-11-23T04:46:12.452Z INFO controller.allocation.provisioner/default Found 9 provisionable pods {"commit": "abc12345"}
2021-11-23T04:46:13.689Z INFO controller.allocation.provisioner/default Computed packing for 10 pod(s) with instance type option(s) [m5.large] {"commit": " abc123456"}
2021-11-23T04:46:16.228Z INFO controller.allocation.provisioner/default Launched instance: i-01234abcdef, type: m5.large, zone: us-east-1a, hostname: ip-192-168-0-0.ec2.internal {"commit": "abc12345"}
2021-11-23T04:46:16.265Z INFO controller.allocation.provisioner/default Bound 9 pod(s) to node ip-192-168-0-0.ec2.internal {"commit": "abc12345"}
2021-11-23T04:46:16.265Z INFO controller.allocation.provisioner/default Watching for pod events {"commit": "abc12345"}
The provisioner’s controller listens for Pod changes; here it launched a new instance and bound the provisionable Pods to the new node.
Now, delete the deployment. After 30 seconds (ttlSecondsAfterEmpty = 30), Karpenter should terminate the empty nodes.
$ kubectl delete deployment inflate
$ kubectl logs -f -n karpenter $(kubectl get pods -n karpenter -l karpenter=controller -o name)
2021-11-23T04:46:18.953Z INFO controller.allocation.provisioner/default Watching for pod events {"commit": "abc12345"}
2021-11-23T04:49:05.805Z INFO controller.Node Added TTL to empty node ip-192-168-0-0.ec2.internal {"commit": "abc12345"}
2021-11-23T04:49:35.823Z INFO controller.Node Triggering termination after 30s for empty node ip-192-168-0-0.ec2.internal {"commit": "abc12345"}
2021-11-23T04:49:35.849Z INFO controller.Termination Cordoned node ip-192-168-116-109.ec2.internal {"commit": "abc12345"}
2021-11-23T04:49:36.521Z INFO controller.Termination Deleted node ip-192-168-0-0.ec2.internal {"commit": "abc12345"}
If you delete a node with kubectl, Karpenter will gracefully cordon, drain, and shut down the corresponding instance. Under the hood, Karpenter adds a finalizer to the node object, which blocks deletion until all pods are drained, and the instance is terminated.
Things to Know
Here are a couple of things to keep in mind about Karpenter features:
Accelerated Computing: Karpenter works with all kinds of Kubernetes applications, but it performs particularly well for use cases that require rapid provisioning and deprovisioning of large numbers of diverse compute resources. For example, this includes batch jobs to train machine learning models, run simulations, or perform complex financial calculations. You can leverage custom resources of nvidia.com/gpu, amd.com/gpu, and aws.amazon.com/neuron for use cases that require accelerated EC2 instances.
Provisioners Compatibility: Karpenter provisioners are designed to work alongside static capacity management solutions like Amazon EKS managed node groups and EC2 Auto Scaling groups. You may choose to manage the entirety of your capacity using provisioners, a mixed model with both dynamic and statically managed capacity, or a fully static approach. We recommend not using Kubernetes Cluster Autoscaler at the same time as Karpenter because both systems scale up nodes in response to unschedulable pods. If configured together, both systems will race to launch or terminate instances for these pods.
Join our Karpenter Community
Karpenter’s community is open to everyone. Give it a try, and join our working group meeting for future releases that interest you. As I said, we welcome any contributions such as bug reports, new features, corrections, or additional documentation.
To learn more about Karpenter, see the documentation and demo video from AWS Container Day.
– Channy
Defunctland’s new video goes deeper on crowd management than you ever thought possible
‘[Adult animation] is ripe for new kinds of formats and new types of stories’
A few shagedelic recommendations from across Netflix, HBO Max, and Amazon
Earlier this year, we announced Cloud Load Balancer support for Cloud Run. You might wonder, aren't Cloud Run services already load-balanced? Yes, each *.run.app endpoint load balances traffic between an autoscaling set of containers. However, with the Cloud Load Balancing integration for serverless platforms, you can now fine-tune lower levels of your networking stack. In this article, we will explain the use cases for this type of setup and build an HTTPS load balancer from the ground up for Cloud Run using Terraform.
Every Cloud Run service comes with a load-balanced *.run.app endpoint that’s secured with HTTPS. Furthermore, Cloud Run also lets you map your custom domains to your services. However, if you want to customize other details about how your load balancing works, you need to provision a Cloud HTTP load balancer yourself.
Here are a few reasons to run your Cloud Run service behind a Cloud Load Balancer:
The list goes on, Cloud HTTP Load Balancing has quite a lot of features.
The short answer is that a Cloud HTTP Load Balancer consists of many networking resources that you need to create and connect to each other. There’s no single "load balancer" object in GCP APIs.
To understand the upcoming task, let's take a look at the resources involved:
As you might imagine, it is very tedious to provision and connect these resources just to achieve a simple task like enabling CDN.
You could write a bash script with the gcloud command-line tool to create these resources; however, it will be cumbersome to check corner cases like if a resource already exists, or modified manually later. You would also need to write a cleanup script to delete what you provisioned.
This is where Terraform shines. It lets you declaratively configure cloud resources and create/destroy your stack in different GCP projects efficiently with just a few commands.
The goal of this article is to intentionally show you the hard way for each resource involved in creating a load balancer using Terraform configuration language.
We'll start with a few Terraform variables:
First, let's define our Terraform providers:
Then, let's deploy a new Cloud Run service named "hello" with the sample image, and allow unauthenticated access to it:
If you manage your Cloud Run deployments outside Terraform, that’s perfectly fine: You can still import the equivalent data source to reference that service in your configuration file.
Next, we’ll reserve a global IPv4 address for our global load balancer:
Next, let's create a managed SSL certificate that's issued and renewed by Google for you:
If you want to bring your own SSL certificates, you can create your own google_compute_ssl_certificate resource instead.
Then, make a network endpoint group (NEG) out of your serverless service:
Now, let's create a backend service that'll keep track of these network endpoints:
If you want to configure load balancing features such as CDN, Cloud Armor or custom headers, the google_compute_backend_service resource is the right place.
Then, create an empty URL map that doesn't have any routing rules and sends the traffic to this backend service we created earlier:
Next, configure an HTTPS proxy to terminate the traffic with the Google-managed certificate and route it to the URL map:
Finally, configure a global forwarding rule to route the HTTPS traffic on the IP address to the target HTTPS proxy:
After writing this module, create an output variable that lists your IP address:
When you apply these resources and set your domain’s DNS records to point to this IP address, a huge machinery starts rolling its wheels.
Soon, Google Cloud will verify your domain name ownership and start to issue a managed TLS certificate for your domain. After the certificate is issued, the load balancer configuration will propagate to all of Google’s edge locations around the globe. This might take a while, but once it completes, your load balancer will be serving traffic for your domain at Google's edge.
Astute readers will notice that so far this setup cannot handle the unencrypted HTTP traffic. Therefore, any requests that come over port 80 are dropped, which is not great for usability. To mitigate this, you need to create a new set of URL map, target HTTP proxy, and a forwarding rule with these:
As we are nearing 150 lines of Terraform configuration, you probably have realized by now, this is indeed the hard way to get a load balancer for your serverless applications.
If you like to try out this example, feel free to obtain a copy of this Terraform configuration file from this gist and adopt it for your needs.
To address the complexity in this experience, we have been designing a new Terraform module specifically to skip the hard parts of deploying serverless applications behind a Cloud HTTPS Load Balancer.
Stay tuned for the next article where we take a closer look at this new Terraform module and show you how much easier this can get.
I’ve been updating the static version after each release, and it got busier and busier over time, so I decided it was time to create a more in-depth version of it.
So here is the interactive version of PostgreSQL Observability, updated for PostgreSQL 13: Pgstats.dev.
It now has additional information about each internal component and each stat. Just follow the links and you can discover more about each of Postgres’ components and the ways they interact.
Share it!
The tracks arrive on the 6th anniversary of the miniseries
When does a model own her own image?
It was May 2015, and the Hotel Surya in Varanasi, India, was hosting “Inner Awakening,” the flagship spiritual training program of Paramahamsa Nithyananda. To his followers, he is a god incarnate, “His Divine Holiness Bhagavan Sri Nithyananda Paramashivam,” the living avatar of Shiva, or, as one devotee put it, “the…
Editor's note: Zach is one of the chairs for the Kubernetes documentation special interest group (SIG Docs).
I'm pleased to announce that the Kubernetes website now features the Docsy Hugo theme.
The Docsy theme improves the site's organization and navigability, and opens a path to improved API references. After over 4 years with few meaningful UX improvements, Docsy implements some best practices for technical content. The theme makes the Kubernetes site easier to read and makes individual pages easier to navigate. It gives the site a much-needed facelift.
For example: adding a right-hand rail for navigating topics on the page. No more scrolling up to navigate!
The theme opens a path for future improvements to the website. The Docsy functionality I'm most excited about is the theme's swaggerui shortcode, which provides native support for generating API references from an OpenAPI spec. The CNCF is partnering with Google Season of Docs (GSoD) for staffing to make better API references a reality in Q4 this year. We're hopeful to be chosen, and we're looking forward to Google's list of announced projects on August 16th. Better API references have been a personal goal since I first started working with SIG Docs in 2017. It's exciting to see the goal within reach.
One of SIG Docs' tech leads, Karen Bradshaw did a lot of heavy lifting to fix a wide range of site compatibility issues, including a fix to the last of our legacy pieces when we migrated from Jekyll to Hugo in 2018. Our other tech leads, Tim Bannister and Taylor Dolezal provided extensive reviews.
Thanks also to Björn-Erik Pedersen, who provided invaluable advice about how to navigate a Hugo upgrade beyond version 0.60.0.
The CNCF contracted with Gearbox in Victoria, BC to apply the theme to the site. Thanks to Aidan, Troy, and the rest of the team for all their work!
Meta for Mac.
For the past year, I’ve been using a high-res Sony music player to listen to my personal music collection. I detailed the entire story in the December 2019 episode of our Club-exclusive MacStories Unplugged podcast, but in short: I still use Apple Music to stream music every day and discover new artists; however, for those times when I want to more intentionally listen to music without doing anything else, I like to sit down, put on my good Sony headphones, and try to enjoy all the sonic details of my favorite songs that wouldn’t normally be revealed by AirPods or my iPad Pro’s speakers. But this post isn’t about how I’ve been dipping my toes into the wild world of audiophiles and high-resolution music; rather, I want to highlight an excellent Mac app I’ve been using to organize and edit the metadata of the FLAC music library I’ve been assembling over the past year.
These days, when I think of an old album I want to repurchase in high resolution (either 16-bit or 24-bit FLAC), or if I come across a new release I instantly fall in love with, I go ahead and buy it as a standalone FLAC digital download.1 I then organize albums with a standard Artist ⇾ Album folder hierarchy in the Mac’s Finder, as pictured below:
My FLAC music collection.
Before you ask: yes, I could do this file organization with my iPad Pro alone because the Sony music player I use (this Walkman model) can be connected via USB to the iPad (with this adapter) and comes with a standard SD card for expandable storage. However, I prefer to purchase and download FLAC music on my Mac mini because my music collection is also backed up and mirrored to Plex, and the Mac mini – as you might imagine – is running a Plex media server instance in the background at all times. The music library is stored on a 1 TB Samsung T5 external SSD that’s connected via USB-C to the Mac mini; whenever I purchase new music, I manually copy it into the T5 as well as the Sony Walkman’s SD card via the Finder.
Most of the time, FLAC music I purchase online comes with correct built-in metadata for fields such as track number, year, disc, and album artwork. But sometimes it doesn’t, which leads to the unfortunate situation of ending up with songs on my Walkman that lack album artwork or feature extra text in their titles such as “Remastered” or “Explicit”. It’s particularly annoying when artwork is missing because it ruins the experience of looking at the now playing screen while I’m focused on enjoying music. Plex doesn’t have this issue: by virtue of being an online service, Plex can search various databases for correct metadata and automatically fill missing fields in my library. One of the reasons I enjoy listening to my personal music library the old-school way is that the Walkman is completely offline, but that comes with the disadvantage of being unable to fix incorrect metadata via the web.
An example of songs with incorrect metadata on my Walkman. The word "Remastered" shouldn't be part of the song title field.
No album artwork makes me sad.
This is where Meta, an advanced music tag editor for Mac developed by French indie developer Benjamin Jaeger, comes in. I came across Meta a few months ago when, frustrated with the ugliness and bloated nature of other desktop metadata editors, I took it upon myself to find a polished, modern tag editor designed specifically with Mac users in mind. Meta is exactly what I was looking for: the app is a modern Mac utility that supports all popular audio formats (from standard MP3 and MP4 to FLAC, DSF, and AIFF) and can write metadata formats such as ID3v1, ID3v2, MP4, and APE tags. Unlike other cross-platform or open-source tag editors, Meta’s feature set is focused on one area – editing metadata for your digital music collection – and supports all the common interface paradigms you’d expect from a professional Mac app running on Catalina.
Meta can do a lot of different things, and this is by no means a comprehensive review of the app: after all, my needs are fairly basic – I drop an album into the app, fix the tags that are incorrect, and close the app. The Meta website has details and screenshots that cover all of the app’s features, and there’s also a full 3-day free trial you can use without limitations to take Meta for a spin. Having used the app for a while, I would describe it as follows: Meta is based on a powerful tag engine (called TagLib) and customizable, native Mac UI that supports a resizable sidebar, popovers, dark mode, keyboard shortcuts, and multiple view options; in addition to this flexible UI, Meta sports an incredible text-matching engine that lets you perform tasks such as batch renaming multiple files, creating new tags by mixing text and metadata tokens, finding and replacing text, and even converting file paths or metadata to new tags using regex patterns.
As I noted above, that’s a lot to take in, so let me walk through my typical use of Meta. Once I’ve identified an album that needs some of its metadata fixed, I drop the entire folder from the Finder into Meta. If I want to add artwork to it, all I need to do is select all songs in Meta’s main list, then drag the artwork image I previously downloaded from Google Images into the artwork section of Meta’s sidebar. The artwork tag will be instantly written to all FLAC files at once, and that’s it.
You can drag and drop album covers from the Finder.
But what if you don’t want to manually find and download a high-quality album artwork image beforehand? Meta has you covered there as well: by purchasing the additional, one-time-only Cover Finder feature, you’ll be able to select multiple tracks and hit ⌘⇧K to let Meta find artwork images for you. In my experience, results provided by Meta have been perfect (images fetched by Meta are high-definition covers with the correct color saturation and no watermarks); at least for me, they’re worth the extra purchase so I don’t have to manually search for each album artwork and clutter my desktop with (often low-res) images downloaded from random websites found via Google.
Meta can find album covers for you by searching online databases.
Besides album artwork, I’ve also been using Meta to fix text-based metadata by taking advantage of the app’s powerful text engine. I often come across remastered editions of albums where each song has the word “Remastered” included in the title field of its metadata – e.g. instead of “Wonderwall”, the song is called “Wonderwall Remastered”. Meta makes these kinds of batch edits extremely easy: after selecting all songs, I can select Edit ⇾ Find ⇾ Find & Replace from the menu bar (or hit ⌥⌘F) to activate the app’s find and replace UI. From there, I can type the text I want to get rid of in the ‘Find’ field and select the ‘All’ button next to ‘Replace’ (a common UI element in Mac apps) to clear the unwanted text from all title fields at once. In a nice touch, Meta visually confirms text matches found in the selected items and even lets you add special tokens such as tab characters or white space if you want to further refine your searches.
Meta's find and replace UI.
You can insert special characters in the find and replace UI with this menu.
I could also edit each song’s metadata manually by clicking into the appropriate field in the sidebar (also pictured above), but for these batch operations, Meta’s find and replace system is ideal. If you don’t want to replace an existing tag’s value but compose a new one altogether, Meta can do that as well: after selecting an item (or multiple ones), hit the pencil button in the toolbar to open the ‘Compose Tag’ popup, which lets you use arbitrary plain text as well as built-in tokens to overwrite any existing tag value. In the screenshot below, you can see how I edited the ‘Year’ field for all songs contained in The Cure’s Greatest Hits album by manually typing “2001” into the Compose Tag box.
Composing a metadata tag in Meta.
Meta can also rename files in the Finder: hit ⌘⇧R and, as with tags, you can mix and match existing tags and plain text to rename files however you see fit. In the example below, I used the track number tag, a hyphen, and the title tag to come up with a file name pattern I like to see in the Finder. Meta even remembers your preferences for file renaming, so once you’ve found a style you like, you won’t have to recreate the naming pattern every single time.
You can also let Meta rename files for you.
There are dozens of advanced features in Meta I haven’t used myself, but which contribute to making this app the premier utility for managing music collections. In the app’s preferences, you can set different options for album artwork including format, compression, and max size; you can let the app move and organize files in subfolders for you using the ‘Create Directory’ feature, which, again, uses patterns to let you craft a custom file path; there’s support for editing built-in lyrics, dates (with a visual date picker), and even track number sequences. The only thing Meta can’t do for me is add lyrics in the LRC format Sony uses for the Walkman, but I can’t blame the developer for not supporting this particular aspect of Sony’s music ecosystem.
Meta offers several customization options.
The greatest compliment I can pay to Meta is that, despite its abundance of features, it never feels overwhelming, and every functionality is always where I expect it to be. Thanks to Meta, I’ve been able to clean up and reorganize my FLAC music collection so that every album now has correct metadata on my Walkman, and I’m happier because of it.
Meta is a remarkable example of the kind of thoughtful, powerful professional tools indie developers can still create on macOS these days, and it’s one of my favorite Mac app discoveries in a long time. If you also like to manage your music library the old-fashioned way and can’t stand incorrect metadata, I can’t recommend Meta enough.
The American Psychological Association is on the defensive over its newly released clinical guidance (PDF) for treating boys and men, which links traditional masculinity ideology to a range of harms, including sexism, violence, mental health issues, suicide, and homophobia. Critics contend that the guidelines attack traditional values and innate characteristics of males.
The APA’s 10-point guidance, released last week, is intended to help practicing psychologists address the varied yet gendered experience of men and boys with whom they work. It fits into the APA’s set of other clinical guidelines for working with specific groups, including older adults, people with disabilities, and one for girls and women, which was released in 2007. The association began working on the guidance for boys and men in 2005—well before the current #MeToo era—and drew from more than four decades of research for its framing and recommendations.
That research showed that “some masculine social norms can have negative consequences for the health of boys and men,” the APA said in a statement released January 14 amid backlash. Key among these harmful norms is pressure for boys to suppress their emotions (the “common ‘boys don’t cry’ refrain”), the APA said. This has been documented to lead to “increased negative risk-taking and inappropriate aggression among men and boys, factors that can put some males at greater risk for psychological and physical health problems.” It can also make males “less willing to seek help for psychological distress.”
Fallout: New California, an enormously ambitious mod for Fallout: New Vegas, is now out, finished and ready to play.
New California—built from the bones of an older mod called Brazil—is set between the events of Fallout 2 and Fallout: New Vegas, and to simply call it a “mod” is to sell it short. This is practically a brand-new, fan-made Fallout game, one that even has voice acting and takes place on a new map set in the Black Bear Mountain National Forest in California.
Its release is timed well; Fallout 76 is out soon, and anyone disappointed that it’s a multiplayer affair, and not the traditional epic singleplayer RPG, can just try this out instead.
You can download the mod here. As for how good it is, Nathan is playing it right now, and will have some impressions up on Kotaku soon!
UPDATE: This post’s headline earlier referred to the mod as being nine years old. It’s actually been in active development since 2012.
The Environmental Working Group has released their latest “Dirty Dozen” list of supposedly pesticide-laden fruits and vegetables. (This is a misleading list, as we’ve explained before.) You may be tempted to buy organic produce, as the EWG suggests, but guess what—organic produce is not pesticide-free.
Organic farmers may use pesticides, so long as they choose from a list of approved options. The USDA organic program does not disallow all pesticides, just “synthetic” ones. (By the way, the term “pesticides” includes both bug sprays and weed killers.)
So what remains on our vegetables? The USDA periodically tests produce for pesticide residues; this is the Pesticide Data Program. (The EWG repurposes this data to create their Dirty Dozen and Clean Fifteen lists.) But the USDA does not test for the presence of organic-allowed pesticides. So the EWG is reporting the stuff on conventional crops without considering what’s present on organic crops.
So, will you lower your pesticide exposure by switching to organic? We don’t know, but the answer may very well be no. Even looking at the synthetic, non-organic pesticides in the USDA’s tests, conventional crops don’t always have the lowest amounts. Take strawberries, for example, the “dirtiest” item on the 2018 list: 75 percent of organic strawberries, and 76 percent of conventional strawberries, had pesticide levels that were under 5 percent of the allowable levels.
In other words, buying organic strawberries might expose you to more pesticide residues than buying conventional. We recommend ignoring the Dirty Dozen list entirely, and buying whichever fruits and veggies work for your diet and your budget.
Adult Swim’s hit animated show Rick and Morty finished its third season back in October, and we might not get season 4 until 2019, according to one of the show’s writers. To tide us over, Adult Swim just released a music video for Run The Jewels’ song “Oh Mama,” featuring the mad scientist and his grandson.
The video is directed by Rick and Morty director Juan Meza-León (he was responsible for episodes like “The Rickshank Rickdemption,” “The Whirly Dirly Conspiracy,” and “The ABC’s of Beth”), and it plays out a bit like a regular episode of the show: Rick and Morty fly off to a random planet, crash a club full of insectoid Gromflamites, and carnage ensues. They steal a briefcase and head out, only to encounter some additional problems,...
It’s well-known that Nintendo was originally founded in Kyoto, Japan as a maker of playing cards in 1889. But a recent historical project by the city of Kyoto has turned up, for the first time, a photograph of what the company’s headquarters looked like in that year.
The photo, and an accompanying blog post, were published in December on “Memories of Kyoto, 150 Years After The Meiji Period,” an ongoing historical project documenting the city’s history during the reign of Emperor Meiji from 1868 to 1912. Nintendo historians Florent Gorges and Isao Yamazaki shared them around today, noting that the blog post was full of fascinating little-known information about Nintendo’s founding.
Nintendo’s founder Fusajiro Yamauchi originally ran a company called Haiko, which at the time specialized in cement, the post says. His name was originally Fusajiro Fukui, but he was adopted as an adult by his boss Naoshichi Yamauchi. This is actually extremely common in Japan—government records show that the vast majority of adoptions are adult men adopting other adult men, so that their companies can remain a “family business” even when there is no biological son to inherit the company.
Nintendo stayed a family business until the retirement of Fusajiro’s great-grandson Hiroshi Yamauchi in 2002, after which Satoru Iwata took the helm. Haiko is still in operation and is run by Kazumasa Yamauchi, who wrote the City of Kyoto’s blog post.
In 1889, Fusajiro Yamauchi struck out on his own to form Marufuku Nintendo Card Co., producing traditional Japanese playing cards called hanafuda and eventually Western playing cards as well. Here’s a map of where that original headquarters used to be located, in case you want to make a pilgrimage.
If you want to see the oldest Nintendo building that’s still actually standing, there’s one very close to downtown Kyoto. According to Gorges and Yamazaki’s book, the building that housed the original headquarters was demolished in 2004, and replaced with a parking lot.
The Red Strings Club starts, and ends, with a character falling out of a high-rise window. There’s no indication that you could do anything to stop this from happening, or even that you’re supposed to. The game is more interested in how you get to that point—and the web of lies, manipulation, and tough choices you leave in your wake.
This cyberpunk point-and-click game, out today on PC, is by Deconstructeam, the developers behind 2014’s Gods Will Be Watching. Gods was a series of scenarios in which the player had to manage characters’ needs and unclear emotional states. The Red Strings Club is set up roughly the same way, but it’s not the brutal tightrope walk of life and death its predecessor was. It’s more forgiving, but it’s also more complicated. Its decisions open up into possibilities, nuances, and outcomes that don’t have clear rights and wrongs.
The game mostly takes place in the titular Red Strings Club, a bar in a cyberpunk dystopia owned by a man named Donovan. In this future, people have cybernetic implants, but Donovan has a medical condition that prevents him from getting them. This gives him a fairly anti-implant view, in contrast to Brandeis, an implant-sporting hacker.
In addition to being a bartender, Donovan is also an information broker, luring secrets out of clients through mixing signature cocktails. Brandeis and Donovan stumble upon a plan by megacorporation Supercontinent that involves making secret changes to people’s implants—or rather, the plan stumbles upon them, in the form of an AI who crashes through the bar’s door one night. From there, the game becomes a tangle of hacking, social engineering, and heavy drinking.
You spend most of your time in The Red Strings Club mixing cocktails. When a character comes into the bar, they’ll have several possible emotions—pride, depression, lust, euphoria—represented by icons in different positions on the screen. The player has to mix alcohol to move an indicator over the emotion they want to access. The labels on the bottles helpfully integrate arrows to remind you what moves where, and you later gain the ability to make the indicator move diagonally or to tilt its orientation. The controls are imprecise and uncomfortable, though it’s fun to spill booze everywhere. Nevertheless, it’s an unusual, enjoyable minigame, enriched with satisfying sound design. I regularly hurled ice cubes to the floor just to listen to them clatter.
Donovan uses these cocktails to manipulate characters into telling him what he wants to know. You have a series of objectives to uncover before taking on Supercontinent. Questions like who their CEO really is, or what role the android you found plays, can be teased out of a prideful inventor or scared out of a depressed marketing executive if you read their starting state right and adjust it accordingly. Manipulating characters through alcohol might seem deceitful, but everyone who comes into the bar wants to get drunk to change how they feel. I struggled with the idea of forcing patrons to feel unpleasant emotions, but Donovan’s intentionality felt less like selfishness and more like acknowledging the truth behind why we drink.
You’ll need whatever information you can get for the game’s climax, an epic tangle of social engineering done over a landline phone while the evil corporation closes in. It feels like the perfect combination of what you’ve been doing all along: teasing, lying, considering and exploiting the connections between people. The Red Strings Club humanizes each of its villains and protagonists, and as I called up one person pretending to be their crush in order to trick them into giving me their computer password, I felt ashamed of how clever I thought I was being, and guilty about how easy it was to get what I needed.
Your choices are tracked by a screen that gets progressively more covered in red lines as the game goes on. These “red strings” are another recurring theme: the web of secrets, lies, desires, and dreams that Donovan manipulates to get what he wants.
Things started out fairly linear for me, though they soon became an intriguing mess. The Red Strings Club has a Telltale-style indicator for when a choice you’ve made has an impact, but what that impact would be was seldom immediately apparent. Unlike Gods Will Be Watching, conversations never felt like they had a hard fail state. The game would tell me I’d done something impactful and a string would be added to my running tally, but it mostly felt like I was following my natural inclinations. With the exception of one moment, in which Brandeis has to approach another character in a methodical, clearly laid-out dance, The Red Strings Club’s choices are open-ended and vague, made in dialogue trees full of lines that run the gamut of options. It feels messy and mysterious in a very human way.
The Red Strings Club deals with all the heavy issues you’d expect from cyberpunk: free will, humanity, technology, immortality. On occasion it felt a bit sophomoric—at one point two characters argue about how depression is vital to making good art—but it was just as quick to disagree with itself and come back down to earth. It’s a deeply emotional game, with a queer love story at its center. It’s also unexpectedly diverse, considering issues related to race, gender, and sexuality all as tacit parts of its future. One playthrough took me about three and a half hours, and I’m curious to see the consequences of different choices on another playthrough. A character will still probably fall out of a window, but it will mean something different depending how I get there.
Quick question: During which season does Japan look most beautiful? Spring is probably the number one choice, followed by fall. But winter certainly is no slouch.
Twitter user Naagaoshi has been uploading a series of stunning Japanese winter photos, showing off just how lovely the country can look covered in snow and ice.
Have a look for yourself!
The original symphonic treatment for the game Destiny was long thought lost, thanks to it being shelved after a major staffing shake-up at developer Bungie. But Destiny fans received quite the Christmas miracle—albeit a legally dubious one—when an apparent rip of the album in question, titled Destiny: Music of the Spheres, was discovered and posted online.
The 8-track, 48-minute album leak, which is live as of press time at more than a few mirrors, was quickly confirmed as legitimate by two major contributors to the project: former Bungie composer Marty O'Donnell and former Bungie creative director Joe Staten. O'Donnell offered an "I think this is it" on Monday via Twitter, followed by an emphatic post of "Finally! #NeverForgotMotS." Staten followed up with acknowledgement that Sir Paul McCartney himself sang a lyric Staten had suggested, then added, "Glad #MOTS is finally out for all to hear."
In late 2013, veteran Bungie composer Marty O’Donnell finished Music of the Spheres, an eight-part musical work designed to be released alongside Destiny. It never came out. But today, thanks to an anonymous leaker, the elusive work is finally on the internet—at least until the copyright strikes hit.
Composed by O’Donnell, his partner Michael Salvatori, and former Beatle Paul McCartney, Music of the Spheres was envisioned as a musical companion to Bungie’s ambitious Destiny. But Bungie and O’Donnell spent nearly a year battling over, among other things, publisher Activision’s failure to use O’Donnell’s music in a trailer at E3 2013. In April 2014, Bungie fired O’Donnell, and despite O’Donnell’s hopes, the company indefinitely shelved Music of the Spheres. He has made several public comments on the work since, and last month, he implicitly encouraged people to share it.
“Years ago, when I was Audio Director at Bungie, I gave away nearly 100 copies of Music of the Spheres,” O’Donnell tweeted on November 30, 2017. “I don’t have the authority to give you permission to share MotS. However, no one in the world can prevent me from giving you my blessing.”
In late 2016, teenager Owen Spence started his own independent project to recreate Music of the Spheres (as well-documented in this Eurogamer piece) using publicly available material. Spence has been in touch with me since then, and yesterday he told me via Twitter DM that he and a friend, Tlohtzin Espinosa, were contacted by someone with a copy of Music of the Spheres who wanted it to be public.
So they put it on Soundcloud, where you can now listen to it all (until Activision takes it down).
O’Donnell has not yet publicly commented on the leak—I’ve reached out for his thoughts—but it’s clear that he’s wanted this to happen for years now. Must be one hell of a Christmas gift.
UPDATE (7:50pm): In an e-mail, O’Donnell told me that he’s thrilled about this development:
I’m quite relieved and happy. This was the way it was supposed to have been heard 5 years ago.
My wife and I spent the afternoon with my now 93 year old father and we showed him that people were finally able to hear this work. It made our Christmas even better. My mother, his wife of over 60 years died a couple years ago and although she loved listening and shared it with some of her friends (she was a musician) she never understood why it wasn’t released.
I don’t know who actually did it but they have my blessing. I honestly don’t know how anyone could begrudge this any longer.
Right after reinventing existing public services with private apps, hacking death may be the ultimate dream of Silicon Valley’s elite. Death is truly the final boss for anyone who thinks enough money and lines of code can solve anything, and boy are they attacking it hard. In 2016, Mark Zuckerberg and his wife Priscilla Chan pledged $3 billion toward a plan to cure all diseases by the end of the century.
“By the time we get to the end of this century, it will be pretty normal for people to live past 100,” Zuckerberg said in 2016.
And to be sure, science, medicine and unlocking more about how the body functions have already worked what would look like a miracle to someone living centuries ago: the average life expectancy for someone born in the United States doubled in just 130 years, from 39 years in 1880 to 78 years in 2011. So Zuckerberg’s prediction may actually be easier than ridding his platform of Russian bots. Longevity—and potential immortality—is a particularly popular obsession with the tech world and Silicon Valley billionaires, who seem to be offended that death would ever get the better of them, and that somehow future generations MUST be able to bask in their immortal wisdom, even if their bodies are just throbbing electric impulses in a jar sustained by regular infusions of monkey testicles (yes, a real thing people tried for a while).
The ultimate problem is that human bodies, these sad, slumping, failure-prone products of evolution, just aren’t cut out for living forever. People throughout history have tried, but the garbage body always gets in the way.
“We humans, as we are now, messy bags of blood and bone, are not really fit for immortality,” Stephen Cave, a philosopher at the University of Cambridge and author of the book Immortality: The Quest to Live Forever and How It Drives Civilization, told me. “So some really profound thing has to happen if we’re going to [change that].”
But if you’re interested in trying, oligarchs, rich lunatics and scientists throughout history provide something of a framework, and a lot more is in the works at this very moment. Below, a rundown of the different approaches that have been taken up in the never-ending quest for life to never end.
Zuckerberg, along with his Silicon Valley pals from Google and 23andMe, set up the Breakthrough Prize in 2012 to celebrate and promote science innovations, including fighting disease and living longer.
He also set up The Chan Zuckerberg Initiative, which will donate $3 billion over a decade to basic medical research with the goal of curing disease. Some have argued this approach isn’t the most efficient and the money would be better spent targeting single diseases at a time instead of an across-the-board assault. For instance, eradicating smallpox cost just $300 million in less than 10 years.
There is a problem with this approach, said Brian Kennedy, the director of the Center for Healthy Aging at the National University of Singapore: even if you treat diseases, you still haven’t cured aging itself.
“We don’t do healthcare [in the medical community], we do sick care,” he said, pointing out that the goal shouldn’t be just giving rich people access to cures for any disease but rather fundamentally attacking “aging” itself as a threat.
“Aging is the biggest risk factor to all these diseases that go out of control,” he said. “This is not just about a few billionaires living longer. This is about a million people living longer.”
Aging itself creates risks, he said, because organs and body systems inevitably break down over time. His center is researching ways to halt aging at the enzyme level. One of the most promising is the TOR pathway, a kind of cellular signaling that tells a cell to grow and divide or hunker down and turn up stress responses. Scientists believe that manipulating that pathway could slow down aging.
“It’s a really robust effect,” Kennedy said.
Once people realize that, he hopes his cause will be as flashy and imagination-capturing as Zuckerberg’s longevity quest.
“The most important thing we can do right now is to validate [the idea that] we can affect the aging,” he said. “Once that happens, I think interest level will go way up.”
Biohacking will also open up new avenues—and intense ethical debates—about how far people can go to change their genetic code. Scientists, for instance, are still carefully exploring CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) technology, which acts like a homing missile that tracks down a specific DNA strand, then cuts and pastes a new strand in its place. It can be used to alter just about every aspect of DNA. In August, scientists for the first time in the United States used the gene editing technology on a human embryo to erase a heritable heart condition.
Throughout history, people have seized on the idea that you can essentially patch or infuse the human body with parts of other bodies and cheat death, kinda like jailbreaking your iPhone so it can accept any software.
Take, for instance, Serge Voronoff, a Russian-born scientist who in the early 20th century believed animal sex glands held the secret to prolonging life. In 1920, he tried it out, taking a piece of monkey testicle and sewing it to a human’s (although, it should be noted, not his own) scrotum. The idea seemed to catch on: by the mid-1920s, according to Atlas Obscura, 300 people underwent his procedure; at least one woman received a graft of monkey ovary.
“The sex gland stimulates cerebral activity as well as muscular energy and amorous passion,” Voronoff wrote in his 1920 book, Life; a Study of the Means of Restoring Vital Energy and Prolonging Life. “It pours into the stream of the blood a species of vital fluid which restores the energy of all the cells, and spreads happiness.”
Voronoff eventually built his own monkey enclosure on his property and claimed he was able to restore 70-year-olds to their youthful vigor. Some could live to 140, he claimed. He was able to charge as much as an average year’s salary at the time for the procedure.
Voronoff died in 1951, apparently never having rejuvenated himself.
Monkey testicles have fallen out of style, but, unlike the good doctor Voronoff, the idea of harvesting body parts is still very much alive.
Trump surrogate, Gawker killer and overall too-rich person Peter Thiel has talked about his interest in parabiosis, the process of getting transfusions of blood from a younger person, to reverse aging.
“I’m looking into parabiosis stuff, which I think is really interesting. This is where they did the young blood into older mice and they found that had a massive rejuvenating effect,” he told Inc. “It’s one of these very odd things where people had done these studies in the 1950s and then it got dropped altogether. I think there are a lot of these things that have been strangely under-explored.”
Studies have shown this may just be the latest snake oil tactic, though targeted to lunatic rich people who can’t help but be fascinated by the idea of literally feeding off the young.
It certainly didn’t work out for Alexander Bogdanov, a science fiction writer, doctor, and pioneer of cybernetics who dabbled in blood transfusions in the 1920s. He thought that if he gave himself a steady series of blood transfusions, he could become functionally immortal. This thirst for blood met a hubristic end: he eventually took a blood transfusion from a malaria patient. The patient survived, but Bogdanov did not.
Cave’s book breaks up immortality schemes throughout history into four classifications: the first one, staying alive in the body, involves all those life extending medicines and life hacking gene therapies discussed above. The second one involves resurrection, an idea that has fascinated people throughout history, from Luigi Galvani’s 18th century experiments running electricity through a dead frog’s legs to more recent efforts at cryonics, the process of freezing your body with the hope that future medicine or technology will be able to restore you to health. Some in Silicon Valley are interested in new versions of cryonics, but so far it doesn’t seem to be getting that much attention.
Cave’s third path involves finding immortality through the soul, something that has driven religious wars and controlled populations for eons. It takes as a fact that your physical body is a degrading mess that will one day betray you, but that doesn’t matter, since the soul is the real, eternal essence of who you are. But it’s best left to religious discussions nowadays, as science can’t seem to prove it exists.
“If bits of your brain are damaged, then bits of you, the fundamental deepest idea of who you are, have disappeared,” Cave said. He’s talking about the idea that if the soul is the indestructible essence of you that can survive eternity, why does our essence change when we suffer brain damage or other personality altering maladies? If your soul lets you live forever, which version of you exactly is the one that lives forever?
“That leads us to wonder if your soul is somehow supposed to be maintaining all these things, why can’t your soul do that for you? If it can do that when your whole brain is gone, why can’t that do that when a part of your brain is gone?”
But some techies argue the nature of these projects will redefine what a soul is entirely: not so much a ghostly essence of your being connected to a higher power, but more a specific set of brain signatures unique to you, a code that can be hacked like any other code.
“Consider, then, the modern soul as the unique neuronal-synaptic signature integrating brain and body through a complex electrochemical flow of neurotransmitters. Each person has one, and they are all different,” Marcelo Gleiser, a theoretical physicist, writer, and a professor of natural philosophy, physics and astronomy at Dartmouth College, wrote for NPR in April. “Can all this be reduced to information, such as to be replicated or uploaded into other-than-you substrates? That is, can we obtain sufficient information about this brain-body map so as to replicate it in other devices, be they machines or cloned biological replicas of your body?”
Google’s lifespan-extending project Calico launched in 2013 with a mission statement that calls aging “one of life’s greatest mysteries.” Also a great mystery is exactly what Calico has been up to: the company’s work has been shrouded in secrecy, which has led to lots of curiosity and frustration from others in the anti-aging field. So far, according to a New Yorker piece in April, all that’s known is that the company is tracking a thousand mice from birth to death to find “biomarkers” of aging, which can be described as biochemical substances whose levels predict death. The company has invested in drugs that may help fight diabetes and Alzheimer’s.
The tech side of things brings us to Cave’s fourth path to immortality: legacy. For ancient civilizations, that meant creating monuments, having your living relatives chant your name after you’re gone or carving names on tomb walls.
“If your name was spoken and your monuments still stood, they thought,” he wrote in his book, “then at least a part of you still lived.”
Today’s legacies look different than giant stone shrines, but the ego behind them is probably comparable. The idea of uploading consciousness to the cloud has crossed from science fiction into science possible: Russian web mogul Dmitry Itskov in 2011 launched the 2045 Initiative, an experiment to make himself immortal within the next 30 years by creating a robot that can store a human personality.
“Different scientists call it uploading or they call it mind transfer. I prefer to call it personality transfer,” Itskov told the BBC last year.
So here is one of the obvious main problems with Silicon Valley-led innovations, like many other tech-based lurches into the advanced future: it could be too expensive for everyone to afford. Which in turn could mean that we’ll have a class of near-immortals, or cloud-based consciousnesses, ruling over people bound to their horrifying analog bodies. The meshing of human/computer/nanotech parts will also open up a whole new thinkpiece industry about when someone stops being a “person” altogether and is just lines of code.
Kennedy said opening these options up to everyone will depend on what avenue of research proves the most effective. If aging is treated as a disease (and healthcare in general somehow becomes affordable to everyone), there’s hope.
“The challenge is to figure out ways to improve health span and get it to everybody as quickly as possible,” he said. “If it’s drugs, it’s achievable. If it’s a bunch of transfusions of young blood, that’s less achievable.”
If all this has you bristling at the thought of techies creating their own super race of “disruptors” impervious to the torments of time and the limits of flesh, that’s understandable. But Cave said you may be encouraged by the entire history of people who’ve chased extended lifespans, from ancient Egypt to the people clinging to their diets and exercise throughout the 21st century.
“The one thing that everyone who has pursued immortality has in common,” he said, “is that they’re now six foot under, pushing up daisies.”
From an outsider’s perspective, what’s a cult and what’s not a cult can seem obvious. Not a cult: your new book group. Cult: that group your second cousin joined where all the women are renamed Meadow and are betrothed to Jeremy, their unshowered leader. Simple!
In reality, cults aren’t always so obvious; sometimes the cult-like aspects of an organization reveal themselves to you slowly, when you’re already fully invested.
In our latest podcast episode, Rick Alan Ross outlined the three criteria for establishing whether or not a group is a cult—as identified by psychiatrist Robert Jay Lifton.
1. There is an authoritarian figure in charge of the group who is revered like a god. Everyone and every decision revolves around said figure.
2. People in the group act against their own best interests, but in the best interest of the group (and the charismatic leader). This occurs through a process called “thought reform.”
3. The group exploits its members. The degree of harm inflicted on each member varies wildly depending on the group—some may take your money, others might inflict physical and sexual abuse.
Here’s the paper by Robert Jay Lifton in its entirety—it’s fascinating and worth a read, especially if there are any organizations in your life you have doubts about. (Your book group leader is strangely charismatic, and everyone always loves her book suggestions...)
In this episode we discussed cults: how they operate, how you identify one, what it’s like to be in one, and how to get out. To that end, we spoke with author Rebecca Stott, whose book In the Days of Rain: A Father, a Daughter, a Cult details her childhood in the Exclusive Brethren, a cult that believed the world is ruled by Satan. We also talked to Rick Alan Ross, the founder and Executive Director of The Cult Education Institute. And we talked with Elizabeth Yuko, a bioethicist and journalist who’s written extensively about cults.
Listen to The Upgrade above or find us in all the usual places where podcasts are served, including Apple Podcasts, Google Play, Spotify, iHeartRadio, Stitcher, and NPR One. Please subscribe, rate, and review!
We reached out to every organization mentioned by Rick Ross that he either referred to as a cult or described having received complaints about. We received responses from two organizations. Jehovah’s Witnesses declined to comment directly, but sent us these links: Are Jehovah’s Witnesses a Cult? Are Jehovah’s Witnesses an American Sect?
We also heard from Landmark, who commented, in part:
Landmark is a global personal and professional growth, training, and development company that delivers programs and courses that empower and develop people to fulfill on what’s really important to them. Recognized for their leading-edge material and methodology, Landmark’s programs equip people to produce breakthrough results in areas such as career, relationships, productivity, and overall quality of life. More than 2.4 million people in more than 20 countries have taken Landmark’s programs and Landmark is considered one of the leading companies in its field.
Every week we like to let you in on the upgrades we’ve made in our own lives. This week we talked about bloomers, killing fruit flies, and this mesmerizing podcast.
There are two ways to reach out:
We look forward to hearing from you!
Every business has a starting point. For Japanese arcades, one of them was on department store rooftops.
Yes, Japanese arcades have their roots in the carnival-type games often seen at local religious festivals. But way before coffee houses installed tabletop cabinets to capitalize on the late-’70s Space Invaders craze, there were game machines atop department stores in cities like Osaka and Tokyo.
Case in point: Namco, which still operates a large number of arcades in Japan. The company got its start making attractions, rides, and games for department store roofs. In 1955, when Namco was still Nakamura Manufacturing, the nascent company built two wooden horses for a Yokohama department store. This was not the first of its kind. The maiden rooftop amusement area opened on a Tokyo department store in 1903, with wooden horses, a seesaw and an indoor play area.
By the time Nakamura Manufacturing entered the scene in the mid-1950s, this was a well established tradition in Japan, with a history of buildings covered with ropeways, Ferris Wheels and other attractions. No wonder they’re known as okujou yuuenchi (屋上遊園地), literally “rooftop amusement park.”
But with cities rebuilding after World War II, these rooftop play areas offered new business opportunities.
In the early 1960s, Nakamura Manufacturing constructed a kiddy attraction called “Roadway Ride” on top of Mitsukoshi Department Store, with children “driving” small cars on a railroad-type track.
Not everything was a ride, as there were also coin-operated and carnival-type games. Other companies, like Sega, also made mechanical games that were enjoyed on these rooftops.
Website Tetsugaku News recently published a series of old photos of Japanese department store roofs, showing the different attractions.
Via website Nippon Sumizumi Kanko, here are some of the retro games found on the rooftop play area of Nagasaki’s Hamaya Department Store.
Around this same time, there were also bowling alleys with mechanical games, but everything radically changed with Space Invaders. Dedicated establishments, called “inbeedaa hausu” (“invader house”), started popping up across the country, evolving into the Japanese gaming arcades of today.
When I came to Japan in 2001, rooftop arcades were still fairly common. But these days, more and more of them are being shuttered instead of getting needed updates and repairs.
For example, the previously mentioned Hamaya Department Store’s rooftop amusement park is no more, and after 56 years in operation, the okujou yuuenchi on Hanshin Department Store in Osaka was shut down, as documented by Man-san’s Photo Gallery. These are just two of many, and it’s increasingly rare for department stores to have them. Sadly, the age of rooftop amusement parks is drawing to a close.
But many modern Japanese arcades still have this rooftop amusement park DNA, offering small rides for children. Some, like Joypolis, are modern throwbacks to the okujou yuuenchi of yore.
The only thing that is missing is the rooftop setting.
For more photos, check out Nippon Sumizumi Kanko and Man-san’s Photo Gallery.
This article was originally posted on April 6, 2017.
A big part of Your Name takes place in Tokyo. Over on Tofugu.com, Kanae Nakamine visited the real-world locations depicted in the movie.
The resulting comparisons between the anime and real life are fascinating.
Nakamine also created a helpful itinerary in case you want to make a Your Name seichijunrei (聖地巡礼) or pilgrimage, visiting the places depicted in the hit anime.
Be sure to check Tofugu for more comparisons and info regarding where these Tokyo spots are located.
In case you missed it, you can read Kotaku’s Your Name review right here.
Here’s a cool video tribute to Half-Life 2: Episode 3 by Anomalous Materials. Oh, what could have been.
There’s an official Evangelion x New Balance line of sneakers coming out in Japan and China this weekend.
There’ll be four colours available, each in a very subtle scheme, and they’ll also come in nice padded zip-up boxes that feature Evangelion imagery.
They’ll go on sale for around USD$100. More pics at Hypebeast.
I’ve been reviewing Razer accessories and hardware for ages, and aside from the odd license tie-in and those rainbow headsets, they’ve mostly had one thing in common—they’ve been black. Well now we’ve got the Mercury line, so bright and white they look like the restless spirits of real Razer products.
Razer’s actually got two new color schemes going on, Mercury (white) and Gunmetal (gray), but gray is really just light black, baby steps compared to the jarring contrast between Razer’s jet black lineup and these ivory beauties.
First off we have the Invicta “gaming surface.” I put gaming surface in quotes because it’s generally just a fancy term for mouse pad, but in this case it might be appropriate. For one, this has a case.
The Invicta is a dual-sided gaming surface (it’s growing on me). One side is smooth for speed. The other is rough, for control. It costs $60 and weighs around 1.5 pounds, not counting the case.
The weight is because the Invicta pad comes with an aluminum base plate.
That’s one serious mouse pad. And here’s a serious mouse for it. The Lancehead Tournament Edition is a mouse with a 16,000 DPI 5G optical sensor, tracking at 450 inches per second. As a man who generally gets by with a trackball, that’s ridiculous.
The $80 Lancehead is completely symmetrical, with side buttons on the left and right so anyone can use it, regardless of dominant hand. Plug it in and place it on top of the Invicta and it looks like a racing pod from some utopian future.
The Kraken 7.1 v2 is Razer’s middle-of-the-road headset, priced at $99.99 (the $200 Razer Tiamat 7.1 V2 is their latest top-of-the-line wired set). It’s a great set of cans with a pretty good retractable mic. The switch to white isn’t as dramatic for this one, mainly because I never see it once it’s on my head. It’s still very pretty.
The $150 Blackwidow X, on the other hand, barely looks like a Razer product anymore. Rather than go straight-up white everything, Razer made the exposed metal plate silver, giving the board a lovely contrast. The Razer logo on the front is barely visible when the unit’s LEDs are off.
There is something otherworldly about a set of shine-through white keycaps with soft RGB lighting piping through. It’s dreamy.
Razer’s even gone as far as running a batch of their custom switches (this one has the clicky greens) with a pale gray housing as opposed to the normal black. The company’s made a real commitment here.
The whole set comes together nicely. Razer is known for creating aggressive-looking gaming hardware with bold (some would say obnoxious) branding. The Mercury line softens those rough edges considerably.
Nier: Automata audio engineer Masami Ueda has written a cool blog post detailing how he implemented composer Keiichi Okabe’s secondary hacking soundtrack. It’s easy to overlook how much work something like that takes, and fun to get a look at Ueda’s process.
Released this week on Steam by Sand Sailor Studio, Black The Fall is a 2D side-scrolling puzzle platformer that shares many similarities with last year’s indie hit, Inside. Dark and moody atmosphere? Clever puzzles requiring plenty of trial and error? It’s got all that, plus a lil’ robot friend.
Comparisons between Black The Fall and Playdead’s dark and thoughtful platformer are unavoidable. The game takes place in a ruined world ravaged by communism. The player is a tired worker trying to escape years of toil under the regime. Though the game eventually opens up to the brighter outside world, its opening moments take place within the gray walls and black shadows of a vast factory. Early in the game the player gains access to a laser pointer, a tool that manipulates mechanical devices as well as the brains of mind-controlled workers.
It’s all very dark and hopeless, until the player makes a new friend.
The introduction of this little robot pupper makes a dark game a bit brighter. The robodog can help the player reach higher spots, activate electronic switches, and even act as a brace, as seen in the GIF atop this post.
I streamed an hour and a half of Black The Fall earlier this week. Check out the stream archive below to see how my escape attempt panned out.