Amazon Server Throughput

easy

Software engineer

You are given an array hostThroughput with $N$ elements, where each element represents the throughput of a host server. Your task is to form as many clusters as possible, with each cluster consisting of exactly three distinct servers.

The throughput of a cluster is defined as the median of the three throughput values. Each server can belong to at most one cluster, and any leftover servers will remain unused.

The goal is to maximize the overall system throughput, which is the sum of the medians from all clusters.

Example 1

Input

hostThroughput = [8, 6, 3, 4, 4, 5, 6]

Output

Explanation

To maximize throughput, we want the medians to be as large as possible. 
1. Sort the throughputs: [3, 4, 4, 5, 6, 6, 8]. 
2. With 7 servers, we can form at most floor(7/3) = 2 clusters.
3. Cluster 1: [3, 6, 8] -> Median = 6.
4. Cluster 2: [4, 5, 6] -> Median = 5.
5. Total throughput = 6 + 5 = 11. One server (4) is left unused.

Example 2

Input

hostThroughput = [9, 8, 7, 6, 5, 4]

Output

Explanation

Clusters: [4, 8, 9] (median 8) and [5, 6, 7] (median 6). Sum = 14.

Example 3

Input

hostThroughput = [100, 1, 10, 5, 50, 20]

Output

Explanation

Clusters: [1, 50, 100] (median 50) and [5, 10, 20] (median 10). Sum = 60.

Constraints

3 <= hostThroughput.length <= 10⁵
1 <= hostThroughput[i] <= 10⁹
Total number of clusters is floor(hostThroughput.length / 3).

ArrayGreedySorting

Amazon Server Throughput Editorial

Overview

The problem asks us to group $N$ servers into clusters of three, where the "score" of each cluster is the median value of the three servers. We want to maximize the sum of these medians.

The core intuition lies in how a median is calculated for a set of three: it is the middle value when sorted. To maximize the sum, we want to pick the highest possible values to serve as medians. However, every median requires one larger value to its right to "protect" it and one smaller value (or any available value) to its left. By sorting the servers, we can greedily pick the largest available values to be our medians by pairing them with the absolute largest values as their "right-side" buffers.

Brute Force

A brute-force approach would involve generating all possible partitions of the array into clusters of three, calculating the sum of medians for each partition, and keeping the maximum.

Since we need to choose $k = \lfloor N/3 \rfloor$ clusters, we are essentially looking at combinations of permutations. The number of ways to group $N$ elements into triplets is roughly $\frac{N!}{(3!)^{N/3} \cdot (N/3)!}$. For $N=10^5$, this is astronomically large and computationally impossible.

def getMaxThroughput(hostThroughput: list[int]) -> int:
    # A brute force approach would involve generating all permutations
    # or combinations of clusters, which is O(N!) or exponential.
    # This is not feasible given the constraint N <= 10^5.
    # Below is a placeholder representing the logic.
    pass

Time complexity: O((N!)) — Attempting to partition the array into all possible combinations of triplets is factorial in complexity. Space complexity: O(N) — Required to store the permutations or recursion stack.

Optimal Approach

To maximize the sum of medians, we should use a greedy strategy after sorting the throughputs.

Sort the hostThroughput array in ascending order.
We can form $k = \lfloor N/3 \rfloor$ clusters.
To get the largest possible median, we look at the end of the sorted array. The largest value (at index $n-1$) can never be a median (as no value is larger than it to make it the middle). Thus, the best role for the largest value is to be the "upper bound" for a cluster where the second-largest value (at index $n-2$) is the median.
We repeat this logic: the third-largest and fourth-largest values can form another cluster where the fourth-largest is the median, and the third-largest is the "upper bound".
For each such pair (Median, Upper Bound), we conceptually take the smallest available value from the start of the array to complete the triplet. Since we only care about the sum of medians, we don't actually need to track which small value is used.

Visualizing the selection from a sorted array: If we have sorted values and need $k$ clusters, we start from the second-to-last element and skip every other element moving backwards until we have collected $k$ medians.

def getMaxThroughput(hostThroughput: list[int]) -> int:
    # 1. Sort the throughputs to easily identify the largest values
    hostThroughput.sort()
    
    n = len(hostThroughput)
    num_clusters = n // 3
    total_throughput = 0
    
    # 2. We start from the second to last element (index n-2)
    # 3. We pick 'num_clusters' elements, skipping one each time
    # This ensures each median has a larger element to its right.
    current_index = n - 2
    for _ in range(num_clusters):
        total_throughput += hostThroughput[current_index]
        current_index -= 2
        
    return total_throughput

Time complexity: O(N log N) — The bottleneck of the algorithm is sorting the hostThroughput array. Space complexity: O(1) or O(N) — Depending on the sorting implementation (Python's Timsort uses O(N) auxiliary space in the worst case).

Why this works

By sorting the array, we ensure that for every median we pick at hostThroughput[i], there is at least one element at hostThroughput[i+1] that is greater than or equal to it. By picking medians from the right side of the array and skipping only one element (the "buffer"), we ensure we are using the largest possible values available to be medians. We skip the smallest values entirely because they serve best as the "third" element in each cluster, which does not contribute to the sum.

Fast

Comments on the solution

No comments on the solution yet. Start a thread about the editorial or your own approach.

Language

Code editor loads in the browser.

Output

Input

hostThroughput[8,6,3,4,4,5,6]

Expected

Your output

Run to see your output.

Example 1

Example 2

Example 3

Constraints

Interview stages

Topics

Amazon Server Throughput Editorial

Overview

Brute Force

Optimal Approach

Why this works

Comments on the solution

Output