Consider these base-10 digits: 123456789. If you insert spaces between them, you get various sequences of numbers:

1 2 3 4 5 6 7 8 9

12 3 4 5 67 8 9

1 2 34 5 6 7 89

12 34 56 78 9

1 23456 78 9

12345 6789

etc.

1) Write a program that generates all possible combinations of those digits.

How many are there?

Now let's insert a maximum of 8 addition or subtraction operators between the numbers, like this:

1+2+3+4+5+6+7-8+9

12-3+4+5-67-8+9

1+2+34+5-6-7-89

12-34+56+78+9

1+23456-78-9

12345+6789

etc.

Notice that those arithmetic expressions equate to different values:

1+2+3+4+5+6+7-8+9 = 29

12-3+4+5-67-8+9 = -48

1+2+34+5-6-7-89 = -60

12-34+56+78+9 = 121

1+23456-78-9 = 23370

12345+6789 = 19134

etc.

2) Write a program that generates all possible expressions in this way.

How many sum to 100?

3) Write a program that finds all such expressions for any sum.

Which sum is the most popular, i.e. has the most expressions?

I put my solution in an iPython notebook, which you can view here. Some conclusions:4) Bonus: We can haz pretty data viz?

Like how about a histogram of the number of expressions with sums from -23456788 to 123456789. (A log scale might help. Maybe binning, too.)

- The distribution of values you can generate by arranging these digits and operators has a power-law tail.
- The distribution is a mixture of (approximately) uniform distributions, where each element of the mixture is characterized by the length of the largest term in the expression. That is, if the biggest term has 3 digits, the values tend to be in the 100s. If the biggest number has 4 digits, the value is usually in the 1000s, etc.
- The reason for the power law is that each element of the mixture is an order of magnitude bigger, but about 0.6 orders of magnitude more rare. So the parameter of the power tail is about 0.6.

For future exploration: what do you think happens if you add multiplication to the operator mix? Download the notebook and try it!