Thoughts about software craftsmanship: December 2012

I recently learned that a common interview question for software engineering positions is the Skyline problem, and people struggle a lot to solve it. It also took me a few tries before I had the final solution, but once I got the idea the problem became a lot easier than I thought it would be. Below I will present and explain the different approaches one can take to solve the problem. I am also going to give running time analysis for each of them. Before we go into details, let's start with the problem statement first.

Problem statement

We have to design a program which helps drawing the skyline of a two-dimensional city given the locations of the rectangular buildings in the city. All of the buildings are built on a flat ground (that is they share a common bottom) and each building B_i is represented by a triplet of (l_i, r_i, h_i) where l_i and r_i are the left and right coordinates, respectively, of the ith building, and h_i is the height of the building. In the diagram below there are 8 buildings, represented from left to right by the triplets (1, 5, 11), (2, 7, 6), (3, 9, 13), (12, 16, 7), (14, 25, 3), (19, 22, 18), (23, 29, 13) and (24, 28, 4).

Input

The input of the program is a sequence of building triplets. In some versions of the problem further restrictions are applied on the input, like all coordinates are non-negative integers less than 10,000, there is at least 1 and at most 5,000 buildings in the input sequence, and the triplets are sorted by l_i (the left x-coordinate of the building) in ascending order. Sometimes the input source is a file and the format of this file is also described, but we're going to ignore it because pre-processing the input sequence of buildings is completely irrelevant to the problem.

Output

The output is a collection of points which describe the path of the skyline. In some versions of the problem this collection of points is represented by a sequence of numbers p₁, p₂, ..., p_n, such that the point p_i represents a horizontal line drawn at height p_i if i is even, and it represents a vertical line drawn at position p_i if i is odd. In our case the collection of points will be a sequence of P₁, P₂, ..., P_n pairs of (x_i, h_i) where P_i(x_i, h_i) represents the h_i height of the skyline at position x_i. Note that technically it is the exact same representation as the previous one, except having pairs of numbers representing the height at a given position is more meaningful than a parity-based interpretation of numbers. In the diagram above the skyline is drawn with a yellow line around the gray buildings and it is represented by the sequence of position-height pairs (1, 11), (3, 13), (9, 0), (12, 7), (16, 3), (19, 18), (22, 3), (23, 13) and (29, 0).

The brute-force solution

Let's simplify the problem by not ignoring all the restrictions mentioned in the Input paragraph. The most important piece of information is that we know that the left and right coordinates of each and every building are non-negative integers less than 10,000. Now why is this important? Because we can assign a height-value to every distinct x_i coordinate where i is between 0 and 9,999. How do we do that? It's simple.

Allocate an array for 10,000 elements and initialize all of the elements to 0. Let's call this array H.
Iterate over all of the buildings and for every B_i building iterate on the range of [l_i..r_i) where l_i is the left, r_i is the right coordinate of the building B_i.
For every x_j element of this range check if h_i > H[x_j], that is if building B_i is taller than the current height-value at position x_j. If so, replace H[x_j] with h_i.

Once we checked all the buildings, the H array stores the heights of the tallest buildings at every position. There is one more thing to do: convert the H array to the expected output format, that is to a sequence of position-height pairs. It's also easy: just map each and every i index to an (i, H[i]) pair. That's it, we just solved the Skyline problem!

Notice that the result is not an optimal skyline (a skyline with minimum number of position-height pairs) because it will likely contain redundant information. What points in the skyline are considered redundant? Those points where the height didn't change compared to the preceding point's height. More formally, given the sequence of P₁, P₂, ..., P_i-1, P_i, ..., P_n pairs of (x, h), the point P_i can be left out from the skyline if h_i = h_i-1. To get an optimal skyline we need to map only those i indexes where the height changes. Here's a working implementation in Scala which - instead of allocating an array for 10,000 elements - creates an array that is exactly as big as needed to store all possible building coordinates.

The implementation

First we're going to define some data structures to represent a building, a point in the skyline and the skyline:

case class Building(left: Int, right: Int, height: Int) {

  require(left <= right, "left must not be greater than right")
  require(height >= 0, "height must be non-negative")
}

case class Point(x: Int, height: Int)

class Skyline private (val points: List[Point])

Now we can implement our first solution:

object Skyline {

  def apply(buildings: Building*): Skyline = if (buildings.length == 0) {
    new Skyline(List.empty)
  } else {
    val (min, max) = buildings.foldLeft((Integer.MAX_VALUE, Integer.MIN_VALUE))((minMax: (Int, Int), b: Building) => (math.min(minMax._1, b.left), math.max(minMax._2, b.right)))
    val heights = Array.ofDim[Int](max - min + 1)
    buildings.foreach(b => Range(b.left, b.right).foreach(i => if (heights(i - min) < b.height) heights(i - min) = b.height))
    val points = ListBuffer.empty[Point]
    heights.foldLeft((0, 0)) { (res: (Int, Int), height: Int) => val (idx, prevHeight) = res
      if (height != prevHeight) points += Point(idx + min, height)
      (idx + 1, height)
    }
    new Skyline(points.toList)
  }
}

The brute force algorithm in action

Computed skyline

Time and space complexity

Now that we've solved the Skyline problem, let's have a look at the resources necessary to execute this algorithm. Even though the problem statement may define the maximum number of buildings and the biggest coordinate, we assume that the above algorithm is designed to work with inputs of arbitrary number of buildings and arbitrary (positive) integer coordinates. From now on n indicates the number of buildings in the input sequence and m indicates the maximum coordinate.

The best case is clearly when n = 0, that is the input sequence is empty (lines 3 and 4). The running time is O(1) (we have to check if the input sequence is empty) and the used space is also O(1) (we create an empty skyline).

But what's the running time when there are actually some buildings in the input sequence? In lines 6 and 7 we first find the minimum and maximum coordinate values then allocate an array of size max - min + 1. To do so, we have to search the whole input sequence once, which means linear running time, that is O(n) for n buildings^*, and the allocated array uses O(m) space in the worst case (when min = 0 and max = m - 1). Next (in line 8) we iterate over the input buildings and save the height value for the coordinates the actual building spans over. In the worst case we have n equal-size buildings, each having l = 0 left and r = m - 1 right coordinates, that is every building spans over the whole [0..m) interval. Thus the running time of setting the height of every position is O(n * m). Finally - in lines 9-13 - we go once more over the elements of this array to find those positions where the height changes and include them in the final result (in the ListBuffer defined in line 9). This is again O(n) running time. Therefore, the overall time-complexity is O(n) + O(n * m) + O(n) = O(n * (m + 2)), which is a lot larger than O(n²) if m > n. It seems that our algorithm performs pretty badly (worse than quadratic running time) and it also uses a lot of space to calculate the final solution.

* If we know in advance that all coordinates are less than 10,000, we can skip this min-max search and just simply allocate an array for 10,000 elements. What this means is that there is no O(n) penalty on the running time but we may have a bigger array than needed.

Notice that even in this simplest solution we can completely ignore whether or not the buildings are sorted in the input sequence. But something is not quite right besides the very bad running time and used space. This solution only works if the building coordinates are positive integers (so that we can use an array to calculate the heights). And what if the maximum coordinate is way bigger than 10,000? Can we be sure that we can always allocate a big enough array to store all the heights? And if the coordinates are not even integers or perhaps negative? And finally, can we come up with a more time- and space-efficient algorithm? Can we do better?

The inductive solution

Sure we can! It'd be a huge speed-up if somehow we could determine the skyline by calculating the height for those coordinates only where it matters, wouldn't it? Intuition tells us that if we can insert a building into an existing skyline then instead of all the coordinates the building spans over we only need to check the height at the left and right coordinates of the building plus those coordinates of the skyline the building overlaps with and may modify. Similarly to the mathematical induction we can construct an algorithm that does exactly that.

The base case is trivial, every building can be converted into a skyline: given B(l, r, h) the skyline is P(l, h), P(r, 0). Now the inductive step is that we already have a skyline into which we need to insert a new building. Obviously, if we can insert a building into the skyline then we can insert all the buildings to get the final solution. The algorithm is as follows:

If the current skyline is empty add the two P(l₁, h₁), P(r₁, 0) points representing B₁ to the skyline.
Otherwise find the first P_i(x, h) point of the skyline which immediately precedes l_j, the left coordinate of the building B_j, that is x_i < l_j ≤ x_{i + 1}. Save h_i to H if there is such a P_i point, otherwise set H to 0.
Prepend the skyline with P(l_j, h_j) if there is no such P_i point or insert P after P_i if h_i < h_j.
For every P_k(x, h) point where l_j ≤ x_k < r_j replace P_k with P(x_k, h_j) if h_j > h_k and h_j ≠ h_{k - 1}, or remove P_k if h_j ≥ h_k and h_j = h_{k - 1}. Save h_k to H.
Let P_l(x, h) denote the first point in the skyline such that r_j ≤ x_l. Append P(r_j, 0) to the skyline if there is no such point or insert P(r_j, H) before P_l if r_j < x_l and H < h_j.

This algorithm always produces an optimal skyline since in step 4 it removes those points which would be duplicates. This is done for every inserted building, so the final skyline is described with the minimum number of points. Notice that there is no need to store all the heights for all the coordinates of the tallest buildings in an array, so technically the coordinates can be negative or real numbers. Nevertheless, we are going to work with the previously defined Building and Point data structures which use integers. This, however, is not a requirement.

The implementation

Now let's see some code, a functional, recursive implementation. There are some minor differences between those few lines of code below and the algorithm described above which are worth noting, even though they are only implementation details. While the algorithm above suggests an implementation where the actual skyline is being modified on the fly in every iteration by inserting and removing points, the Scala code below, however, emits a new skyline when a building is inserted and uses that as the input skyline for the next building. Another difference is that the described algorithm works with a building and a skyline, while the code below first converts the actual building into two Points as if it was a skyline. This is only done so that it's easier to work with the building's coordinates recursively.

object Skyline {

  def apply(buildings: Building*): Skyline = new Skyline(
    buildings.foldLeft(List.empty[Point]) { (points: List[Point], b: Building) =>
      mergePoints(List(Point(b.left, b.height), Point(b.right, 0)), points, 0, 0, 0, ListBuffer.empty[Point])
    }
  )

  @tailrec private def mergePoints(p1: List[Point], p2: List[Point], prevHgt1: Int, prevHgt2: Int, currHgt: Int, res: ListBuffer[Point]): List[Point] = if (p2.isEmpty) {
    (res ++= p1).toList
  } else if (p1.isEmpty) {
    (res ++= p2).toList
  } else {
    val (firstPt1, firstPt2) = (p1.head, p2.head)
    val (hgt1, hgt2) = if (firstPt1.x == firstPt2.x) (firstPt1.height, firstPt2.height) else if (firstPt1.x < firstPt2.x) (firstPt1.height, prevHgt2) else (prevHgt1, firstPt2.height)
    val maxHgt = math.max(hgt1, hgt2)
    val (points1, points2) = if (firstPt1.x == firstPt2.x) (p1.tail, p2.tail) else if (firstPt1.x < firstPt2.x) (p1.tail, p2) else (p1, p2.tail)
    mergePoints(points1, points2, hgt1, hgt2, maxHgt, if (maxHgt != currHgt) res += (Point(math.min(firstPt1.x, firstPt2.x), maxHgt)) else res)
  }
}

The inductive algorithm in action

Building
Existing skyline
New skyline

Time and space complexity

How much did we improve the brute force solution with this more sophisticated approach? The main work is done inside the mergePoints function. The trivial case, when the skyline is empty, is handled in lines 9-10 by simply adding all the points from the p1 list - which represents the building as skyline points and as such it contains maximum 2 points at any moment - to the result. This clearly runs in O(1) time. If the algorithm already inserted n - 1 buildings and it's about to insert the nth, things get a bit more complicated, but it'll become clear in a moment as I explain.

Since there is an existing skyline and there is a building to be inserted, the mergePoints function skips the first few lines of code and continues in line 14 where it selects the first point (left-hand side) of the building and the first point of the skyline. Because of it always creates a new skyline from scratch when it inserts a building, the algorithm has to determine the height of the point to be added to the skyline (lines 15-16). There are three different cases: the building starts right where the skyline does, or it is before or after the first skyline point. If the two coordinates are the same, then the taller one is selected. If the left-hand side of the building comes before the skyline, then the building's height is selected, otherwise the algorithm has not yet found the building's position in the skyline and the first skyline point's height is selected. In line 17 the algorithm decides how to proceed with the unprocessed points. If the building starts where the skyline does then both points are processed and only the right hand side of the building and the rest of the skyline points need to be merged in the next recursive call. If the building's left-hand side is before the skyline then only the building's left coordinate is processed, the skyline is still to be merged with the right-hand side of the building. Finally, if the whole building is positioned after the first skyline point then only this first skyline point is processed and the building still needs to be merged with the rest of the skyline. The last step is adding the point that comes first with the selected height to the newly built skyline (if its height is different than the previously added point) and recursively call the function with the unprocessed points, the actual heights of the building and the skyline and the current height of the newly built skyline. By doing so the algorithm finds the positions where the building's points are to be inserted and it also maintains the correct height throughout the recursion. If during the algorithm either the building or every point of the skyline is fully processed then it just appends the rest of the building / skyline points to the new skyline in lines 10 or 12.

In the worst case none of the previous n buildings are overlapping so there is a gap between each building in the skyline, and the new building is also distinct. A building is described by 2 points in the skyline (P(l, h), and P(r, 0)) which means that the p2 list contains 2 * n elements. Since the new building is also not overlapping with any of the previous buildings and the above algorithm consumes one point at a time, its running time is O(2 * n + 2), where the +2 is coming from the two points representing the building to be inserted. For the sake of simplicity let's swap the number of buildings by the number of points such that we have n / 2 buildings and n points. For the analysis we can ignore the +2, since this is only a negligible constant. Therefore, the running time of the mergePoints function is linear: O(n).

There is one more thing to consider, though: lines 4-6 where the algorithm iterates through all the buildings. Now that we know that mergePoints is linear in terms of the number of buildings, we can say that when the algorithm inserts the (n + 1)th building, it has to scan through the skyline points of 0, then 1, then 2, ..., then n buildings. Hence the overall running time is 0 + 1 + 2 + ... + n = n * (n + 1) / 2 = O((n² + n) / 2) ≈ O(n²), which is still quadratic.

The used space is always linear, though, in terms of the number of buildings, and it does not depend on the size of the range the buildings span over. The actual skyline is stored in the p2 list and is garbage-collected at the end of every iteration where a new skyline is created from the points of the skyline and the inserted building. Even though the algorithm does not depend on m and the coordinates need not to be non-negative integers anymore, unfortunately it seems we didn't really improve the brute force solution. The question remains: can we do better?

The divide and conquer solution

As a matter of fact, yes we can! We already represented a building in the previous algorithm as if it was a skyline so you might ask if merging two arbitrary skylines is a significantly different problem than merging a building and a skyline? No, it's not. Luckily, we already have everything in place! The above mergePoints function has no idea about what the p1 list represents, be it only a building or a skyline of multiple buildings. The function merges the two skylines in linear time: if p1 contains n₁, p2 contains n₂ points, then the time complexity is O(n₁ + n₂). This is as good as it gets, it can't be improved because we need to check every point in both skylines to decide which one to keep and which one to drop. If we want to improve, we have to find it elsewhere.

Inserting the buildings one after the other is not the fastest way to solve this problem as we've seen it above. If, however, we first merge pairs of buildings into skylines, then we merge pairs of these skylines into bigger skylines, and then merge pairs of these bigger skylines into even bigger ones, then - since the problem size is halved in every step - after log(n) steps we can compute the final skyline. We can still use the mergePoints function to merge two skylines, which eventually runs in O(n) time in every step (since in the worst case none of the buildings are overlapping), so the overall running time of the divide and conquer approach will be - as expected - O(n * log(n)), which is much-much faster than any of the previous two algorithms. And all this is achieved by merging partial skylines instead of inserting the buildings one by one. Let's see how.

The implementation

Compared to the previous solution only the implementation of the apply method changes, so I am not going to repeat the mergePoints function here.

def apply(buildings: Building*): Skyline = {
  def toSkyline(from: Int, to: Int): List[Point] =
    if (to - from == 1) List(Point(buildings(from).left, buildings(from).height), Point(buildings(from).right, 0))
    else {
      val middle = (from + to) / 2
      mergePoints(toSkyline(from, middle), toSkyline(middle, to), 0, 0, 0, ListBuffer.empty[Point])
    }

  new Skyline(if (buildings.length == 0) List.empty else toSkyline(0, buildings.length))
}

Unlike the mergePoint function, toSkyline is not tail recursive. This should not be a problem, though, for we don't expect this method to go very deeply by the recursive calls: it reaches maximum log(n) depth which is very small even for a very a large n.

The divide and conquer algorithm in action

Skyline #1
Skyline #2
New skyline

Final notes on the used Scala collections

In case you are not or less familiar with Scala, you may be interested in the performance characteristics of the different collection types used throughout the different approaches, namely the mutable Array and ListBuffer and the immutable List and Range. There is an implicit 5th type, the WrappedArray, which is the actual type of the vararg method parameter buildings: Building*, and is actually just an Array, "victim" of Scala's implicit type-conversions.

The methods used on the arrays and the range are foreach and foldLeft. Both iterate through all the elements of the underlying collection, thus their running time is O(n).

The methods used on the lists are isEmpty, head and tail. All of these methods are blazing fast, they run in O(1) time and basically have no impact on the performance of the algorithms.

Lastly, the methods used on the list buffer are += and toList. Both of these methods require constant running time so they also don't change significantly the overall performance of the Skyline algorithm. We must be careful, though, with modifying a ListBuffer after it has been exported by the toList method: after an export, the next modifying method will run in O(n) time because all of the elements in the buffer must first be copied in order to not to modify the exported list. Fortunately, we don't have to be worried about it, because the toList method is always the last operation after which the ListBuffer will be garbage-collected.

You can download the complete source code and test cases from here.

Thoughts about software craftsmanship

Wednesday, December 26, 2012

The Skyline problem

Problem statement

Input

Output

The brute-force solution

The implementation

Time and space complexity

The inductive solution

The implementation

Time and space complexity

The divide and conquer solution

The implementation

Final notes on the used Scala collections