Combinatorics

Combinatorics is a branch of mathematics dedicated to figuring out how to count things! Specifically, the number of elements contained in – i.e., the cardinality of – event \(A\):

\[\mathbf{card}(A) = |A|\]

Counting’s not so hard, you say? Think again: when it comes to ordering and choosing sets in complicated and clever ways counting can get tricky in heartbeat. Be that as it may, combinatorics plays a fundamental and foundational role in probability as it forms the basis for assigning probabilities to events in many contexts. Beyond that, the need to count carefully and correctly is obviously a hugely important part of Data Science, and it’s important to have few of the “standard counting tricks” handy or you might make something a whole lot harder than it needs to be. For a good “industry perspective” on the importance and challenge of counting in data science check out Counting in Data Science.

Factorials

Factorials count the number of ways to order a set of objects.

E.g., if there are 10 lottery balls (labeled 1-10) and we draw them all, how many possible orderings could be drawn? The answer to this question is

  • there are 10 choices for the first ball

  • 9 choices for the second ball (because we’ve already drawn the first lottery ball)

  • 8 choices for the third ball (because we’ve already drawn the first two lottery balls)

  • and so on…

until there is only one ball left and we must pick it. That is, there are \(10*9*8*\cdots*1 = 10!\), i.e. 10 factorial, possible orderings. This is a demonstration of the so-called product rule for counting things, and it forshadows the incredibly fundamental rule in probability known as the chain rule (which we will return to later).

The number 10 factorial can be calculated in Python, but watch out: factorials get really big really fast…

>>> import math
>>> math.factorial(10)
3628800

Combinations

Combinations count the number of ways to choose things when order does not matter. Here’s an example of all the two character combinations that can be made from the letters A, B, and C:

>>> from itertools import combinations
>>> list(combinations("ABC",2))
[('A', 'B'), ('A', 'C'), ('B', 'C')]

The “number of combinations” problem – i.e., counting the number of all possible unordered collections of size K from a pool of N objects — is often referred to as the “N choose K” problem, and the solution to the problem is commonly notated as

\[\begin{split}\left(\begin{array}{c}N\\K\end{array}\right) = \displaystyle \frac{N!}{(N-K)!K!}\end{split}\]

Note

EXERCISE

If you think about the “N choose K” formula carefully you can actually see that it makes sense: the \(K!\) in the denominator is the number of ways to order a list of length \(K\), and the \(\frac{N!}{(N-K)!}\) is all possible lists of length \(K\) where order matters. Do you see why? Knowing this, can you articulate why the complete formula counts the right thing?

If you’re feeling like this seems like an awful lot of multiplication and division, don’t worry, Python can do all the necessary calculations for you:

>>> from math import factorial
>>> def comb(n, k):
...     return factorial(n) / (factorial(k) * factorial(n - k))
>>> from scipy.misc import comb
>>> comb(3,2)
3.0

Note

EXERCISE

>>> lefthand_beers = ["Milk Stout", "Good Juju", "Fade to Black", "Polestar Pilsner"]
>>> lefthand_beers += ["Black Jack Porter", "Wake Up Dead Imperial Stout","Warrior IPA"]
  1. We have sampler plates that hold 4 beers. How many different ways can we combine these beers?

  2. Print a list of these pairs so we can identify the bad ones?

Source: lefthandbrewing.com/beers

Permutations

Permutations counts the number of ways subsets can be chosen when order does matter. If you followed the “N choose K” thought exercise above then you won’t be surprised to learn (i.e., be reminded) that the number of ways to choose K things out of N things when order matters is

\[\displaystyle \frac{N!}{(N-K)!}\]

Explicitly writing out the formula makes it clear that permutations are just a slight variation on what we did for the product rule above. I.e., they are just a special case of factorial multiplication. And of course, once again, it’s easy to Python take care of the permutation calculations for you.

from math import factorial
def permu(n, k):
    return factorial(n) / factorial(n - k)
>>> from itertools import permutations
>>> list(permutations("ABC",2))
[('A', 'B'), ('A', 'C'), ('B', 'A'), ('B', 'C'), ('C', 'A'), ('C', 'B')]

Note

EXERCISE

On a baseball team with 12 players, how many different batting lineups are there?

Hint: there are 9 players in a lineup.