Weekend Reading Round-Up (Dec. '17)

Hot off the press! Articles on deep learning, early-warning signs of critical transitions, recommendation engines, support vector machines, and music.

Auto-Tuning Data Science: New Research Streamlines Machine Learning

MIT News reports on a recent research paper that promises automated, faster model selection (including hyperparameter tuning) for machine learning. The essence of auto-tuned models (ATM) is that a search over model options is performed to come up with the best modelling technique and hyperparameters. The current open-source implementation of ATM includes SVMs with various kernels, random forests, extra trees, decision trees, k-nearest neighbours, logistic regression, naive Bayes methods, Gaussian processes, multi-layer perceptrons, and deep belief networks.
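To get a feel for the idea (though not for ATM’s own API, which differs), here is a minimal scikit-learn sketch that searches over both model families and their hyperparameters and keeps the best cross-validated candidate; the dataset is just a stand-in:

```python
# Minimal sketch of the idea behind ATM -- searching over both model type and
# hyperparameters -- using plain scikit-learn rather than the ATM library itself.
from scipy.stats import loguniform, randint
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)  # stand-in dataset

# One hyperparameter search space per candidate model family.
candidates = [
    (SVC(), {"C": loguniform(1e-2, 1e2), "kernel": ["rbf", "linear"]}),
    (RandomForestClassifier(), {"n_estimators": randint(50, 500),
                                "max_depth": randint(2, 20)}),
]

best_score, best_model = -1.0, None
for estimator, space in candidates:
    search = RandomizedSearchCV(estimator, space, n_iter=20, cv=5, n_jobs=-1)
    search.fit(X, y)
    if search.best_score_ > best_score:
        best_score, best_model = search.best_score_, search.best_estimator_

print(best_model, best_score)
```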

A Computational Study on Outliers in World Music

Folk and traditional music are the focus of a fresh study in PLOS ONE. It looks at outliers in world music based on a fairly limited corpus of 8,200 recordings from 137 countries, mostly from 1950–2010, from which the authors took 30-second samples. For speech/music segmentation they use MIREX 2015’s winning algorithm trained on folk music, which relies on Mel-frequency cepstral coefficients (MFCCs), spectral entropy, tonality, and 4 Hz modulation. The authors use country as a proxy for music style.

In terms of feature engineering, they extract onset patterns with the scale transform (rhythm), pitch bi-histograms (melody), chromagrams (harmony), and MFCCs (timbre). Out of the 8,200 recordings they identified 1,706 outliers (roughly 21%), which seems to defy the commonly accepted definition of an outlier, but OK.
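If you fancy poking at similar descriptors yourself, here is a rough librosa sketch in the same spirit (the paper’s actual pipeline is more involved, and the file name is a placeholder):

```python
# Rough sketch of extracting timbre (MFCC), harmony (chromagram), and rhythm
# (onset strength) descriptors from a 30-second clip with librosa. This mirrors
# the flavour of the paper's features, not its exact pipeline.
import librosa
import numpy as np

y, sr = librosa.load("clip.wav", duration=30.0)  # placeholder file name

mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)     # timbre
chroma = librosa.feature.chroma_stft(y=y, sr=sr)       # harmony
onset_env = librosa.onset.onset_strength(y=y, sr=sr)   # rhythm (onset strength)

# Summarise the frame-level features into one fixed-length vector per recording.
features = np.concatenate([mfcc.mean(axis=1), chroma.mean(axis=1),
                           [onset_env.mean()]])
print(features.shape)
```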

Hidden Early-Warning Signals in Scale-Free Networks

A critical transition occurs when a system shifts abruptly from one state to another. When such a system (e.g. the global economy, the Earth’s climate, a human brain) goes from a good (or at least less bad) state to a bad one, it would be handy if we could identify the transition ahead of time. That’s an early-warning signal. A standard indicator of an imminent critical transition is the slowing down of recovery from perturbations, which often manifests itself in time series as increased autocorrelation at low lags (i.e. short-term memory) and changes in the variance, skewness, and kurtosis. In PLOS ONE, researchers study simulated two-state scale-free networks, that is, networks whose degree distribution follows a power law and whose edges have two possible states. Their findings indicate that monitoring the natural state of the system is not enough to spot hidden early-warning signals in all scale-free networks. The so-called effective state, which takes into account the ‘typical’ node’s overall importance and neighbouring topology, is able to foresee impending critical transitions, especially when the links are distributed less uniformly.
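For the curious, the classic time-series indicators mentioned above are easy to compute; a minimal pandas sketch with a toy series (swap in the real observable) looks like this:

```python
# Minimal sketch of classic early-warning indicators on a time series:
# rolling lag-1 autocorrelation, variance, skewness, and kurtosis. Rising
# values can flag critical slowing down ahead of a transition.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
x = pd.Series(rng.normal(size=1000))  # toy stand-in; use the real observable here

window = 100
lag1_autocorr = x.rolling(window).apply(lambda w: w.autocorr(lag=1), raw=False)
variance = x.rolling(window).var()
skewness = x.rolling(window).skew()
kurtosis = x.rolling(window).kurt()

print(lag1_autocorr.tail(), variance.tail())
```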

Deep Rewiring: Training Very Sparse Deep Networks

In an arXiv preprint four researchers propose DEEP R, an algorithm for training very sparse deep neural networks. The algorithm rewires the network’s connections by randomly re-activating dormant connections, so that a fixed number of connections is active at all times. Gradients for stochastic gradient descent (with some noise added) are computed only for active connections. Why add noise to the gradients? It turns out that this is equivalent to solving a constrained optimisation problem that converges to a stationary distribution over network configurations in which the largest probabilities belong to the best-performing configurations.
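As a toy illustration of the rewiring idea (not the authors’ implementation), one can keep a fixed budget of active connections, update only those with noisy gradient steps, and replace any connection whose parameter crosses zero with a randomly chosen dormant one:

```python
# Toy numpy sketch of the rewiring idea behind DEEP R (not the authors' code):
# keep a fixed budget of active connections, update only those with noisy SGD,
# and whenever a parameter crosses zero, retire it and wake a dormant one.
import numpy as np

rng = np.random.default_rng(0)
n_params, n_active = 1000, 100          # fixed connectivity budget
theta = np.zeros(n_params)              # connection parameters (magnitudes)
active = np.zeros(n_params, dtype=bool)
active[rng.choice(n_params, n_active, replace=False)] = True
theta[active] = rng.uniform(0.01, 0.1, n_active)

lr, noise_scale, l1 = 0.01, 0.001, 1e-4

def rewire_step(grad):
    """One noisy SGD step on active connections, followed by rewiring."""
    idx = np.where(active)[0]
    noise = noise_scale * rng.normal(size=idx.size)
    theta[idx] -= lr * (grad[idx] + l1) + noise   # gradient + sparsity prior + noise

    died = idx[theta[idx] <= 0]                   # connections that crossed zero
    active[died] = False
    theta[died] = 0.0
    dormant = np.where(~active)[0]
    reborn = rng.choice(dormant, died.size, replace=False)
    active[reborn] = True                         # keep the active count fixed
    theta[reborn] = 1e-3

# Usage: call rewire_step(gradient_of_loss_wrt_theta) inside the training loop.
```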

Try This! Researchers Devise Better Recommendation Algorithm

MIT News reports progress in recommendation engines for settings where ratings are sparse. People who buy toilet paper off Amazon or watch dolphin documentaries on Netflix may not rate many of these products or films, and many consumers gravitate towards rating only the extremely negative or sublimely positive experiences. For collaborative filtering that is a problem: if there are few people with similar ratings (as a proxy for tastes), the number of recommendations that can be served is limited. Companies often get around this by looking at other signals: Amazon’s ‘People who bought this item also bought…’ draws on purchase histories rather than ratings alone. The assumption at the heart of the researchers’ proposed methodology is that people have their own value functions (e.g. one person prefers action movies, another loves costume dramas) that remain roughly constant, and that these value functions feed off the same features (e.g. genre, lead actors, director, and number of Oscar nominations). The novel approach comes with performance guarantees and should outperform most known recommendation algorithms.
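The modelling premise is easy to illustrate, even if the paper’s actual algorithm (and its guarantees) are far more sophisticated. A toy sketch with made-up features and ratings:

```python
# Toy illustration of the premise: each user has a (roughly constant) linear
# value function over shared item features, fitted from their sparse ratings.
# This is not the paper's algorithm, just the modelling assumption it builds on.
import numpy as np

rng = np.random.default_rng(1)
n_items, n_features = 200, 5
item_features = rng.normal(size=(n_items, n_features))  # e.g. genre, cast, ...

# A user has rated only a handful of items.
rated = rng.choice(n_items, size=8, replace=False)
ratings = rng.uniform(1, 5, size=rated.size)             # placeholder ratings

# Fit the user's value function by least squares on the rated items.
w, *_ = np.linalg.lstsq(item_features[rated], ratings, rcond=None)

# Predict scores for unrated items and recommend the top ones.
scores = item_features @ w
scores[rated] = -np.inf
print(np.argsort(scores)[::-1][:5])   # indices of the top-5 recommendations
```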

IBM Scientists Demonstrate 10x Faster Large-Scale Machine Learning Using GPUs

EPFL and IBM scientists collaborated on a scheme to train linear machine learning models (e.g. SVMs) on TB-and-beyond data sets. The crux is that moving batches of data (sized to fit in the GPU’s memory) between the CPU and the GPU is expensive (i.e. slow). In some cases training on a CPU is actually faster than on a GPU.

The key insight is that not all data is of the same value to the algorithm all the time. Their duality-gap-based heterogeneous learning (DuHL) looks at how much information each training example contributes to the overall progress of the learning algorithm. The 10x speed-up listed in the title is compared to CPUs, by the way.
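For an L2-regularised hinge-loss SVM, the per-example contribution to the duality gap has a closed form, which makes the selection idea easy to sketch (a rough illustration, not IBM’s implementation):

```python
# Rough sketch of the DuHL selection idea for an L2-regularised hinge-loss SVM
# (not IBM's implementation): score each example by its contribution to the
# duality gap and keep the highest scorers in the limited GPU memory.
import numpy as np

def duality_gap_scores(X, y, alpha, lam):
    """Per-example duality-gap contribution; dual variables alpha_i in [0, 1]."""
    n = X.shape[0]
    w = (X.T @ (alpha * y)) / (lam * n)       # primal weights induced by alpha
    margins = y * (X @ w)
    hinge = np.maximum(0.0, 1.0 - margins)
    return hinge + alpha * (margins - 1.0)    # each term is non-negative

def select_for_gpu(X, y, alpha, lam, budget):
    """Indices of the `budget` examples with the largest gap contributions."""
    scores = duality_gap_scores(X, y, alpha, lam)
    return np.argsort(scores)[::-1][:budget]

# Usage: periodically re-score on the CPU, refresh the GPU's working set with
# the selected examples, and keep optimising on that subset.
```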

JamBot: Music Theory Aware Chord-Based Generation of Polyphonic Music with LSTMs

An arXiv preprint introduces JamBot, a neural network that can ‘compose’ music based on elementary music theory. The novel bit is that chord progressions are used as a guide for the melody rather than treated as a separate (independent) entity. The model was trained on MIDI files in major and minor scales (all transposed to C to avoid having too little training data for rare keys), but apparently the authors do not bother with the diatonic modes other than Ionian (the major scale) and Aeolian (the natural minor scale): Dorian, Phrygian, Lydian, Mixolydian, and Locrian. Sure, C Dorian has the same key signature as B♭ major (two flats), but it is harmonically different. Moreover, they assume a bar’s chord consists of the three most common notes in that bar, which ignores inversions and more complex chords, although they do acknowledge this. To their credit, they do look at the melodic minor and blues scales, which are not diatonic scales.
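The chord heuristic itself is simple enough to sketch for a single piano-roll bar (a rough illustration, not the authors’ code):

```python
# Sketch of the chord heuristic described above: take the three most common
# pitch classes in a bar of a piano roll (128 pitches x timesteps) and call
# that the bar's chord. Inversions and richer chords are ignored, as noted.
import numpy as np

def bar_chord(piano_roll_bar):
    """piano_roll_bar: (128, timesteps) array of note activations for one bar."""
    pitch_classes = np.zeros(12)
    for pitch in range(128):
        pitch_classes[pitch % 12] += piano_roll_bar[pitch].sum()
    top3 = np.argsort(pitch_classes)[::-1][:3]
    return tuple(sorted(top3))     # e.g. (0, 4, 7) for a C major triad

# Usage: chords = [bar_chord(bar) for bar in bars]
```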

The polyphonic LSTM uses a piano-roll representation. For the chord LSTM they use a word-embedding-style approach, so that the context of chords can be inferred during training. Interestingly, the researchers recover the circle of fifths from these embeddings! How harmonic the chord progressions sound depends on a sampling temperature (a parameter) and, of course, the ear of the beholder.
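Temperature sampling is a standard trick, so here is a minimal sketch of what that knob does to the model’s output distribution (the logits are assumed to come from the trained chord LSTM):

```python
# Minimal sketch of temperature sampling from the chord model's output logits:
# low temperature -> conservative, 'safe' progressions; high temperature ->
# more adventurous (and more dissonant) choices.
import numpy as np

def sample_with_temperature(logits, temperature=1.0, rng=np.random.default_rng()):
    scaled = np.asarray(logits) / temperature
    probs = np.exp(scaled - scaled.max())       # numerically stable softmax
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)      # index of the sampled chord
```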