High-availability Namespaces
Temporal Cloud's replicated Namespaces provide disaster-tolerant deployment for workloads where availability is critical to your operations. When you enable high availability, Temporal Cloud automatically keeps a primary and a fallback Namespace synchronized. Should an incident occur, Temporal will fail over your Namespace. This allows your Workflow Executions and Schedules to seamlessly shift from the active availability zone to the synchronized replica in the fallback availability zone.
Advantages of using Temporal Cloud’s High Availability features:
- No manual deployment or configuration needed, just simple push-button operations.
- Existing Workflows resume seamlessly in the replica with minimal interruption and data loss.
- No changes needed for Worker and Workflow code during setup or failover.
- 99.99% contractual SLA.
High availability options
Temporal currently offers the following High Availability features, which you configure at a Namespace level:
- Replication: Workflows are seamlessly replicated to a different isolation domain within the same region as the Namespace, such as "us-east-1". Choose this option for applications architected for a single region. You will fail over within the same region to a separate isolation domain.
- Multi-region Replication: Workflows are seamlessly replicated to a different region that you choose. Choose this option when your business requires multi-regional availability and the higher level of resilience that separated locations offer. You will fail over from one region to a separate region.
Please note that replication charges apply when enabling High Availability features. For pricing details, visit Temporal Cloud's Pricing page.
Replication and replicas
High Availability features in Temporal Cloud simplify deployment and ensure operational continuity and data integrity even during unexpected events that impact an availability zone (AZ) or a region. They rely on a process called Replication, which asynchronously replicates Workflow Executions from an active region to its replica. Using Temporal Cloud’s High Availability features, you can create a replica in the same region or in a different region. In the event of network service or performance issues in the active region, your replica is ready to take over. Temporal Cloud smoothly transitions control from the active region to the replica via a "failover".
Availability zones and replicas
An availability zone is a physically isolated data center within a deployment region for a given cloud provider. Regions consist of multiple availability zones, providing redundancy and fault tolerance. Depending on your deployment configuration, the fallback zone may be in the same region as the primary zone or in a different region altogether.
High availability simplifies deployment, ensuring operational continuity and data integrity even during unexpected events. Regional disruptions or other issues that affect the data centers within a specific availability zone may occur. High availability allows processing to shift from the affected zone to an already-synchronized fallback zone.
This synchronized zone is called a "replica." The process of duplicating all Workflow data ensures that your replica, which serves as the standby zone, is always available and ready to take on the active role.
In the event of network service or performance issues in the active zone, your replica is ready to take over. When necessary, Temporal Cloud smoothly transitions control from the active to the standby zone using a process called "failover".
High availability and business continuity
For many organizations, ensuring high availability is critical to maintaining business continuity. Temporal Cloud's high availability Namespace feature includes a 99.99% contractual Service Level Agreement (SLA). It provides 99.99% availability and a 99.99% guarantee against service errors.
A high availability Namespace (HAN) creates a single logical Namespace that operates across two physical zones: one active and one standby. HANs streamline access for both zones to a unified Namespace endpoint. As Workflows progress in the active zone, history events are asynchronously replicated to the standby zone, ensuring continuity and data integrity.
In the event of an incident or outage in the active zone, Temporal Cloud will seamlessly fail over to your standby zone. Failovers allow existing Workflow Executions to continue running and new Workflow Executions to be started. Once failover occurs, the roles of the active and standby zones switch. The standby zone becomes active, and the previous active zone becomes the standby. After the issue is resolved, the Namespace "fails back" from the replica to the original zone.
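The role switch and asynchronous replication described above can be illustrated with a small toy model. This is only a sketch, not Temporal Cloud's implementation: the `Zone` and `ToyNamespace` classes, the zone names, and the `pending` queue are hypothetical names introduced for illustration, and the event names are plain strings.

```python
from dataclasses import dataclass, field


@dataclass
class Zone:
    name: str
    history: list[str] = field(default_factory=list)  # events this zone holds


@dataclass
class ToyNamespace:
    """Hypothetical model: one logical Namespace over an active and a standby zone."""
    active: Zone
    standby: Zone
    pending: list[str] = field(default_factory=list)  # written but not yet replicated

    def record(self, event: str) -> None:
        # Workflows progress in the active zone only.
        self.active.history.append(event)
        self.pending.append(event)

    def replicate(self) -> None:
        # Asynchronous replication: runs some time after record().
        self.standby.history.extend(self.pending)
        self.pending.clear()

    def failover(self) -> list[str]:
        # Swap roles and return the events that had not reached the standby.
        stranded, self.pending = self.pending, []
        self.active, self.standby = self.standby, self.active
        return stranded


ns = ToyNamespace(active=Zone("zone-a"), standby=Zone("zone-b"))
ns.record("WorkflowExecutionStarted")
ns.replicate()
ns.record("ActivityTaskScheduled")  # not yet replicated when the failover happens
stranded = ns.failover()
print(ns.active.name, ns.active.history, stranded)
```

In this model the former standby becomes active with only the replicated events, while the unreplicated event stays behind in the former active zone until it can be reconciled. Reconciliation itself is out of scope for the sketch.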
Types of high availability
Temporal currently offers the following high availability options, which you select when upgrading your Namespace to use high availability:
- In-region replication - Data is replicated to a separate zone in the same region, such as "us-east-1". This option offers near-instantaneous failovers but does not protect against regional disasters like hurricanes where both the primary and the fallback zones are affected.
- Multi-region replication - Data is replicated to a separate region on the same continent. This option offers the greatest protection against weather events and other possible external causes for regional outages, as the regions are physically separated by large distances. Failover may experience some minor latency.
As Namespace pairing is currently limited to regions within the same continent, South America is excluded as only one region is available.
Should you choose high availability?
Should you be using high availability Namespaces? It depends on your availability requirements:
- High availability Namespaces offer a 99.99% contractual SLA for workloads with strict high availability needs. HANs use two Namespaces in two deployment zones to support standby recovery. In the event of a zone failure, Temporal Cloud automatically fails over the HAN to the standby replica.
- Single-zone Namespaces include a 99.9% contractual Service Level Agreement (SLA). In single-zone use, Temporal clients connect to a single Namespace in one deployment zone. For many applications, this offers sufficient availability.
Temporal Cloud provides 99.99% service availability for all Namespaces, both single-region and high availability.
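To put the two SLA figures in perspective, the following sketch converts an availability percentage into a downtime budget. It assumes a simple time-based reading of availability over a 30-day period, which may differ from how the contractual SLA is actually measured. The function name `downtime_budget_minutes` is hypothetical.

```python
def downtime_budget_minutes(sla_percent: float, period_days: float = 30) -> float:
    """Maximum downtime, in minutes, that still meets the SLA over the period.

    Assumption: availability is measured as uptime divided by total time.
    """
    period_minutes = period_days * 24 * 60
    return period_minutes * (1 - sla_percent / 100)


for sla in (99.9, 99.99):
    print(f"{sla}%: {downtime_budget_minutes(sla):.2f} min per 30 days")
# 99.9%: 43.20 min per 30 days
# 99.99%: 4.32 min per 30 days
```

Under this reading, moving from 99.9% to 99.99% shrinks the monthly budget by a factor of ten.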
SLA guarantees
High availability Namespaces offer 99.99% availability, enforced by Temporal Cloud's service error rates SLA. Our system is designed to limit data loss after recovery when the incident triggering the failover is resolved.
Our recovery point objective (RPO) is near-zero. There may be a short period of time during an incident or forced failover when some data is unavailable in the standby region. Some Workflow History data won't arrive until network issues are fixed, enabling the History to finish replicating and the divergent History branches to reconcile.
Temporal Cloud proactively responds to incidents by triggering failovers. Our recovery time objective (RTO) is 20 minutes or less per incident.
During a disaster scenario in which the data on the hard drives in the active region cannot be recovered, the duration of data loss may be as high as the replication lag at the time of disaster.
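The relationship between replication lag and potential data loss can be sketched as follows. This is an illustration under stated assumptions, not a description of Temporal Cloud internals: it assumes a constant replication lag, and the function name `events_at_risk` and the sample timestamps are hypothetical.

```python
def events_at_risk(
    event_times: list[float], disaster_time: float, replication_lag: float
) -> list[float]:
    """Return timestamps of events written within the lag window before the disaster.

    Assumption: every event reaches the replica exactly `replication_lag`
    seconds after it is written in the active region.
    """
    cutoff = disaster_time - replication_lag
    return [t for t in event_times if cutoff < t <= disaster_time]


writes = [100.0, 100.4, 100.9, 101.2]  # seconds, hypothetical sample data
print(events_at_risk(writes, disaster_time=101.5, replication_lag=0.5))
# [101.2]
```

With a lag of 0.5 seconds, only the write at 101.2 falls inside the window and has not yet reached the replica. A smaller lag narrows the window accordingly.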
Regional availability
Multi-region Namespaces are available in all existing Temporal Cloud regions.
Namespace pairing is currently limited to regions within the same continent. South America is excluded as only one region is available.