Manifold

Listen

Digital audio, 29′05″. Source code

This is really two compositions back to back. The first, 12′00″, is based on a recording at an elementary school. The second, 19′05″, comes from a recording of an outdoor evening party. These were made in Onishi, Gunma, Japan, in the course of an artist’s residency at Shiro Oni Studio, August–September 2017. The piece was exhibited during a regional fall arts festival held in Onishi at the end of September. Visitors were invited to climb a ladder to a lofted storage space in a disused brewery and sit in the dark listening to the piece, which played on loop.

I started by making field recordings using a Sony PCM-M10 with the onboard microphones. Then I sliced the recordings into overlapping segments a half-second long and extracted from each segment a large number of features describing the spectral envelope of the sound. Next I reinterpreted these features using a family of statistical learning techniques known as manifold learning. Finally I reassembled the segments in a new sequence according to where they fell on the manifold. If you sit and listen for five or ten minutes you’ll hear a subtle but definite change in the quality of the sound.
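The steps above can be sketched in a few dozen lines. This is a minimal toy version, not the actual code used for the piece: a coarse log-magnitude spectrum stands in for the real feature set, and the manifold step is reduced to a one-dimensional PCA projection, with segments re-sequenced along that axis and crossfaded back together. All names and parameters here are illustrative assumptions.

```python
import numpy as np

SR = 48_000            # sample rate (the recordings were made at 48 kHz)
SHINGLE = SR // 2      # half-second segments...
HOP = SR // 4          # ...overlapping by half

def slice_shingles(signal):
    """Cut the signal into overlapping half-second shingles."""
    starts = range(0, len(signal) - SHINGLE + 1, HOP)
    return np.stack([signal[s:s + SHINGLE] for s in starts])

def toy_features(shingles):
    """Stand-in spectral-envelope feature: log-magnitude of the low FFT bins."""
    spectra = np.abs(np.fft.rfft(shingles, axis=1))
    return np.log1p(spectra[:, :64])

def order_by_embedding(features):
    """Project onto the first principal component and sort along it."""
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    coord = centered @ vt[0]            # 1-D coordinate on the "manifold"
    return np.argsort(coord)

def reassemble(shingles, order):
    """Overlap-add the shingles in their new order with a triangular crossfade."""
    fade = np.linspace(0.0, 1.0, HOP)
    window = np.concatenate([fade, fade[::-1]])
    out = np.zeros(HOP * (len(order) + 1))
    for i, idx in enumerate(order):
        out[i * HOP : i * HOP + SHINGLE] += shingles[idx] * window
    return out

# Usage with synthetic audio (a noisy ten-second chirp):
t = np.linspace(0, 10, 10 * SR)
audio = np.sin(2 * np.pi * (200 + 50 * t) * t) + 0.1 * np.random.randn(len(t))
shingles = slice_shingles(audio)
order = order_by_embedding(toy_features(shingles))
result = reassemble(shingles, order)
```

Sorting along a single embedded coordinate is what produces the slow drift in timbre: neighboring segments in the output are neighbors in feature space, not in the original recording.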

For nerds. Sources were sampled at 48 kHz/24-bit. Shingles of 500 ms stepped at 250 ms were analyzed to recover mel-frequency cepstral coefficients (MFCCs) and spectral flatness. The feature space comprised the first four statistical moments (mean, variance, skewness, and kurtosis) of the first 13 MFCCs, the same four moments of the first and second derivatives of the MFCCs, and the same four moments of the spectral flatness in binary-log (octave-spaced) frequency bands. This made for a feature space of 180 dimensions. I experimented with a variety of locally linear embedding strategies before concluding that vanilla PCA yielded the best-sounding results, i.e., those that best allowed listeners to detect a trend in timbre over the course of the composition. The feature-space strategy was inspired by Dupont and Ravet, “Improved audio classification using a novel non-linear dimensionality reduction ensemble approach.” Unlike Dupont and Ravet, I did not use t-SNE, in part because, while it is well suited to categorical classification, it does a poor job of preserving neighbor relations. For details on filtering and justification of the embedding strategy, see the source code.
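The bookkeeping behind the 180-dimensional feature space can be checked in a few lines: four moments of 13 MFCCs (52), the same over first and second derivatives (52 + 52), plus four moments of flatness in six binary-log bands (24). The sketch below uses random stand-in values for the per-frame MFCC and flatness matrices; the frame counts and helper names are hypothetical, not taken from the actual source.

```python
import numpy as np

def four_moments(x, axis=0):
    """Mean, variance, skewness, and kurtosis along an axis, concatenated."""
    mu = x.mean(axis=axis)
    centered = x - np.expand_dims(mu, axis)
    var = (centered ** 2).mean(axis=axis)
    std = np.sqrt(var) + 1e-12          # guard against silent frames
    skew = (centered ** 3).mean(axis=axis) / std ** 3
    kurt = (centered ** 4).mean(axis=axis) / std ** 4
    return np.concatenate([mu, var, skew, kurt])

def shingle_features(mfcc, flatness):
    """mfcc: (frames, 13); flatness: (frames, 6), one column per octave band."""
    d1 = np.diff(mfcc, axis=0)          # first derivative (delta)
    d2 = np.diff(d1, axis=0)            # second derivative (delta-delta)
    return np.concatenate(
        [four_moments(m) for m in (mfcc, d1, d2, flatness)]
    )

# Stand-in per-frame analysis for one 500 ms shingle (hypothetical values):
rng = np.random.default_rng(0)
feat = shingle_features(rng.normal(size=(40, 13)), rng.random((40, 6)))
# feat.shape == (180,): 13*4 + 13*4 + 13*4 + 6*4
```

Summarizing each shingle by moments rather than raw frames is what makes the comparison length-independent: every half-second of audio, however busy, collapses to one fixed-size vector.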