Comprehending the Problem

Paul Brown @ 2006-04-02T18:48:00Z

List comprehensions are a programming language convenience but also an instance of a good way to think about solving some kinds of programming problems. For example, consider the problem:

Given a set, define a function that returns its powerset, i.e., the set of all subsets.

For my purposes herein, the datatype for a set has a notion of membership, size, and enumeration.

Et tu, Bruté?

The brute force solution using recursion is roughly:

Set powerset(Set s) {
  if s == EmptySet {
    return {EmptySet}
  } else {
    x = first element of s
    y = powerset (s \ x)
    return y union (apply (__ union {x}) to y) 
}

And this is far from efficient, since the powerset of a set of n elements has 2n elements which have a total of n2n-1 elements between them:

# elements in union over powerset =
                       = sum (j*(n choose j),j=0 to n)
                       = sum(j * n!/((n-j!)*j!),j=1 to n)
                       = sum(n!/((n-(k+1))!*k!,k=0 to n-1)
                       = n sum( (n-1)!/((n-1)-k)!*k!),k=0 to n-1)
                       = n (1+1)n-1 = n 2n-1

Thinking a Little

So how could thinking about comprehensions help things? The idea is to use the definition (“set of all subsets”) to influence the implementation. For example, to implement the size method on the powerset:

Powerset implements Set {

  Set _s

  Powerset(Set s) {
    _s = immutable copy of s
  }

  int size() {
    return 2^_s.size
  }
}

The same goes for membership:

boolean contains(e) {
  return (e is a Set && e is a subset of _s)
}

The only item left would be generating an iterator that visits all elements of the powerset, but thinking in terms of comprehensions again suggests a good approach. A subset of the source set is defined by a function that returns true if an element is in the set and false if not. With the source set indexed by 0,..,n-1, this suggests an approach where each subset corresponds to a number between 0 and 2n in binary notation — 1 in the kth bit if the kth element is in the set.

The bit vector encoding approach can be used to get a generator-style, recursion-free solution without necessarily encapsulating the powerset, but the encapsulation is equally important because of the frame of mind that it creates. Thinking about what "is a subset" means is what actually leads to the encoding concept.

Meta

No tags.

(comment bubbles) 0 comments
426 direct views