Thursday, May 3, 2018

Some people hate custom libraries

For most of my books, I provide a Python module that defines the functions and objects I use in the book.  That makes some people angry.

The following Amazon review does a nice job of summarizing the objections, and it demonstrates the surprising passion this issue evokes:

March 29, 2018
Format: Paperback
Echoing another reviewer, the custom code requirement means you learn their custom code rather than, you know, the standard modules numpy and scipy. For example, at least four separate classes are required, representing hundreds of lines of code, are required just to execute the first six lines of code in the book. All those lines do is define two signals, a cosine and a sine, sums them, then plots them. This, infuriatingly, hides some basic steps. Here's how you can create a cosine wave with frequency 440Hz:

duration = 0.5
framerate = 11025
n = round(duration*framerate)
ts = np.arange(n)/framerate
amp = 1.0
freq = 440
offset = 0.0
cos_sig = amp * numpy.cos( 2*numpy.pi*ts*freq + offset)
freq = 880
sin_sig = amp * numpy.sin( 2*numpy.pi*ts*freq + offset)

cos_sig = thinkdsp.CosSignal(freq=440,amp=1.0,offset=0)
sin_sig = thinkdsp.SinSignal(freq=440,amp=1.0,offset=0)
mix = cos_sig + sin_sig

where CosSignal and SinSignal are custom classes, not functions, which inherits four separate classes, NONE of which are necessary, and all of which serve to make things more complex than necessary, on the pretense this makes things easier. The classes these class inherit are a generic Sinusoid and SumSignal classes, which inherits a Signal class, which depends on a Wave class, which performs plotting using pyplot in matplotlib. None of which make anything really any easier, but does serve to hide a lot of basic functionality, like hiding how to use numpy, matplotlib, and pyplot.

In short, just to get through the first two pages, you have to have access to github to import their ridiculous thinkdsp, thinkplot, and thinkstats, totalling around 5500 lines of code, or you are just screwed and can't use this book. All decent teaching books develops code you need as necessary and do NOT require half a dozen files with thousands of lines of custom code just to get to page 2. What kind of clown does this when trying to write a book to show how to do basic signal processing? Someone not interested in teaching you DSP, but trying to show off their subpar programming skills by adding unnecessary complexity (a sure sign of a basic programmer, not a good).

The authors openly admit their custom code is nothing more than wrappers in numpy and scipy, so the authors KNEW they were writing a crappy book and filling it with a LOT of unnecessary complexity. Bad code is bad code. Using bad code to teach makes bad teaching. It's obvious Allen B. Downey has spent his career in academia, where writing quality code doesn't matter.

Well, at least he spelled my name right.

Maybe I should explain why I think it's a good idea to provide a custom library along with a book like Think DSP.  Importantly, the goal of the book is to help people learn the core ideas of signal processing; the software is a means to this end.

Here's what I said in the preface:
The premise of this book is that if you know how to program, you can use that skill to learn other things, and have fun doing it.
With a programming-based approach, I can present the most important ideas right away. By the end of the first chapter, you can analyze sound recordings and other signals, and generate new sounds. Each chapter introduces a new technique and an application you can apply to real signals. At each step you learn how to use a technique first, and then how it works.
For example, in the first chapter, I introduce two objects defined in thinkdsp.py: Wave and Spectrum.  Wave provides a method called make_spectrum that creates a Spectrum object, and Spectrum provides make_wave, which creates a Wave.

When readers use these objects and methods, they are implicitly learning one of the fundamental ideas of signal processing: that a Wave and its Spectrum are equivalent representations of the same information -- given one, you can always compute the other.

This example demonstrates one reason I use custom libraries in my books: The API is the lesson.  As you learn about these objects and how they interact, you are also learning the core ideas of the topic.

Another reason I think these libraries are a good idea is that they let me introduce ideas top-down: that is, I can show what a method does -- and why it is useful -- first; then I can present details when they necessary or most useful.

For example, I introduce the Spectrum object in Chapter 1.  I use it to apply a low pass filter, and the reader can hear what that sounds like.  You can too, by running the Chapter 1 notebook on Binder.

In Chapter 2, I reveal that my make_spectrum function is a thin wrapper on two NumPy functions, and present the source code:

from np.fft import rfft, rfftfreq

# class Wave:
def make_spectrum(self):
n = len(self.ys)
d = 1 / self.framerate

hs = rfft(self.ys)
fs = rfftfreq(n, d)

return Spectrum(hs, fs, self.framerate)

At this point, anyone who prefers to use NumPy directly, rather than my wrappers, knows how.

In Chapter 7, I unwrap one more layer and show how the FFT algorithm works.  Why Chapter 7?  Because I introduce correlation in Chapter 5, which helps me explain the Discrete Cosine Transform in Chapter 6, which helps me explain the Discrete Fourier Transform.

Using custom libraries lets me organize the material in the way I think works best, based on my experience working with students and seeing how they learn.

This example demonstrates another benefit of defining my own objects: data encapsulation.  When you use NumPy's rfft to compute a spectrum, you get an array of amplitudes, but not the frequencies they correspond to.  You can call rfftfreq to get the frequencies, and that's fine, but now you have two arrays that represent one spectrum.  Wouldn't it be nice to wrap them up in an object?  That's what a Spectrum object is.

Finally, I think these examples demonstrate good software engineering practice, particularly bottom-up design.  When you work with libraries like NumPy, it is common and generally considered a good idea to define functions and objects that encapsulate data, hide details, eliminate repeated code, and create new abstractions.  Paul Graham wrote about this idea in one of his essays on software:
[...] you don't just write your program down toward the language, you also build the language up toward your program. [...] the boundary between language and program is drawn and redrawn, until eventually it comes to rest along [...] the natural frontiers of your problem. In the end your program will look as if the language had been designed for it.
That's why, in the example that makes my correspondent so angry, it takes just three lines to create and add the signals; and more importantly, those lines contain exactly the information relevant to the operations and no more.  I think that's good quality code.

In summary, I provide custom libraries for my books because:

1) They demonstrate good software engineering practice, including bottom-up design and data encapsulation.

2) They let me present ideas top-down, showing how they are used before how they are implemented.

3) And as readers learn the APIs I defined, they are implicitly learning the key ideas.

I understand that not everyone agrees with this design decision, and maybe it doesn't work for everyone.  But I am still surprised that it makes people so angry.