On the Fault-tolerant Online Bin Packing Problem
(Server Consolidation)
Shahin Kamali, Pooya Nikbakht - July 2021


Abstract: Bin packing is a well-known problem in optimization theory with many variants and applications in practice. In many cases, each variant of the problem can be studied under two separate settings: offline or online. An input to the offline problem is a multi-set of items with different sizes in the range (0,1], and the target is to pack them into the minimum number of bins of uniform capacity. In the online setting, the input is not revealed all at once; instead, the items are revealed one by one. An online algorithm can either place a revealed item in an open bin that has enough space, or open a new bin for the item. Crucially, the online algorithm must make irrevocable decisions: after placing an item into a bin, the algorithm cannot move the item into another bin.
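The abstract does not name a specific online algorithm; as an illustration, the classic First Fit strategy, which places each arriving item into the first open bin with enough space, can be sketched as follows:

```python
def first_fit(items, capacity=1.0):
    """Pack items online in the First Fit manner: place each arriving item
    into the first open bin with enough space, or open a new bin.
    Decisions are irrevocable: items are never moved between bins."""
    bins = []        # remaining free capacity of each open bin
    assignment = []  # assignment[i] = index of the bin holding item i
    for size in items:
        for i, free in enumerate(bins):
            if size <= free:
                bins[i] -= size
                assignment.append(i)
                break
        else:  # no open bin fits the item: open a new one
            bins.append(capacity - size)
            assignment.append(len(bins) - 1)
    return assignment, len(bins)
```

For example, on the sequence 0.5, 0.7, 0.5, 0.3, First Fit opens a second bin for the 0.7-item (the first bin has only 0.5 free), then fits the remaining items into the leftover space, using two bins in total.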

The fault-tolerant variant of online bin packing is one of the more recent settings of the bin packing problem, in which items are packed into faulty bins. When a bin fails, we need to transfer the items of the failed bin into other bins such that no bin overflows.

Server consolidation in the Cloud is an important application of the fault-tolerant online bin packing problem. Cloud service providers such as Amazon host client applications, called tenants, on their cloud servers. The service provider commits to a Service Level Agreement (SLA) that defines the minimum performance requirement for the tenant. The goal of the service provider is to satisfy the SLA requirements and, at the same time, minimize the operational costs. For that, cloud providers often consolidate tenants on shared computing machines to improve utilization and ultimately reduce their cost. Such consolidation can be modeled via online bin packing, where each item represents a tenant, and each bin represents a computing machine (a server). Upon the arrival of a tenant, an online algorithm assigns it to a server (a bin) that has enough capacity left for the item. Tenants have different loads that are defined through their required bandwidth, memory, CPU usage, and so on, as indicated in the SLA. Each server has a certain capacity with respect to these requirements. In a typical setting, 50 to 90 percent of the capacity of the active servers is unused. Reducing the number of active servers by better assignments of tenants to servers (a better packing) is important in a data center for saving energy-related costs, which constitute 70 to 80 percent of maintenance costs in a typical data center. As such, a good bin packing algorithm that minimizes the number of used bins can help in decreasing the operational cost of the data center.

In cloud systems, client applications are replicated in multiple servers. Such replication is necessary to keep the service available and uninterrupted in case of server failures. In order to keep services uninterrupted against the failure of up to f servers, it is necessary to replicate each tenant in at least f+1 different servers. In many practical scenarios, each tenant has a primary replica and multiple secondary (or backup) replicas. In the case of database tenants, read queries can be answered by any replica, while write queries must be handled by the primary replica before being propagated to all copies. Primary replicas are also responsible for redirecting/answering queries issued to the failed replicas. As such, the load of the primary replica is higher than that of the other replicas.

We are continuing a line of research on the primary-secondary scheme for the fault-tolerant bin packing problem to achieve more efficient algorithms. The goal is to pack an online sequence of items (tenants) into a minimum number of bins (servers) such that each item of size x has a primary replica of size x and f secondary replicas, each of size x/𝜂. Here, 𝜂 > 1 is a parameter that captures the higher load of primary replicas relative to secondary replicas. To ensure the service is tolerant against the failure of up to f servers, the primary replica of each item should be available at any given time. That is, in case of failure of a server that hosts the primary replica of an item x, a secondary replica of x should be selected to become its primary replica. The subsequent increase in the load of such a replica should not cause an overflow in the bin.
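The constraints above can be made concrete with a small validity check. The sketch below is illustrative and not from the paper; it assumes uniform bin capacity 1, and for simplicity checks only single-server failures (promoting one secondary per affected item), not all combinations of up to f simultaneous failures. The function name and the `placement` encoding are hypothetical.

```python
def check_packing(items, placement, f, eta, capacity=1.0):
    """Check a fault-tolerant packing under the primary-secondary model.

    items[i]      -- size x of item i, in (0, 1].
    placement[i]  -- list of f+1 distinct bin indices for item i;
                     placement[i][0] hosts the primary replica (size x),
                     the rest host secondary replicas (size x / eta).
    """
    loads = {}
    for i, x in enumerate(items):
        bins = placement[i]
        assert len(set(bins)) == f + 1, "replicas must sit in distinct bins"
        loads[bins[0]] = loads.get(bins[0], 0.0) + x         # primary replica
        for b in bins[1:]:
            loads[b] = loads.get(b, 0.0) + x / eta           # secondary replicas
    # no bin may overflow in the failure-free state
    if any(load > capacity + 1e-9 for load in loads.values()):
        return False
    # single-failure check: if the bin hosting the primary of item i fails,
    # some surviving secondary must absorb the load increase x - x/eta
    for i, x in enumerate(items):
        primary_bin, *secondary_bins = placement[i]
        if not any(loads[b] + x - x / eta <= capacity + 1e-9
                   for b in secondary_bins):
            return False
    return True
```

For instance, with f = 1 and 𝜂 = 2, two items of size 0.4 whose primaries sit in opposite bins (each bin carrying one primary of 0.4 and one secondary of 0.2) survive any single failure, whereas the same arrangement with items of size 0.6 does not: promoting a secondary would push the surviving bin to 1.2.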

Full Paper: On the Fault-Tolerant Online Bin Packing Problem