
Is this a big deal?


This area is a big deal - ML networks need to be much deeper and denser to provide human-level understanding, and training networks is currently a considerable bottleneck.


Does this method make it easier to spread a neural network over multiple GPUs/machines? I mean, does it reduce the amount of data being communicated between compute nodes, or does it just decouple the updates from the need to wait for the rest of the net to finish?


> Does this method make it easier to spread a neural network over multiple GPUs/machines?

Yes, but this isn't the primary focus of this work.

This is about a method of approximating the error signal (the gradient) that is propagated back up the neural network.

This is important because using approximate gradients means that earlier layers can be trained without waiting for the error to back-propagate from the later layers.

This asynchronous feature helps on a (computer) network too - there is no need to wait for back-propagation across the network.

As they point out, the error will still back-propagate eventually. The analogy with an eventually consistent database system (and the effect that has on scalability) is pretty clear.
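For anyone who wants the flavour of it, here is a minimal sketch in Python/NumPy of the general idea (not the paper's exact method; the layer sizes and the linear form of the predictor are made-up assumptions). Each layer keeps a small "synthetic gradient" model that predicts dL/dh from its own activation h, so the layer can update immediately; the predictor itself is corrected whenever the true gradient eventually arrives:

    import numpy as np

    rng = np.random.default_rng(0)

    W = rng.normal(scale=0.1, size=(10, 10))  # layer weights (hypothetical sizes)
    M = np.zeros((10, 10))                    # linear predictor of dL/dh from h
    lr = 0.01

    x = rng.normal(size=(32, 10))
    h = np.tanh(x @ W)                        # forward pass through the layer

    # 1) Update the layer *now* with the predicted gradient -- no waiting
    #    for the rest of the network to finish its backward pass.
    g_hat = h @ M                             # synthetic gradient w.r.t. h
    W -= lr * x.T @ (g_hat * (1 - h ** 2))    # chain rule through tanh

    # 2) When the true gradient finally arrives from downstream, train the
    #    predictor to match it (the "eventually consistent" part).
    g_true = rng.normal(size=h.shape)         # stand-in for the real backprop signal
    M -= lr * h.T @ (g_hat - g_true) / len(h)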


ANNs are not that big of a deal, IMHO, when you compare them to other machine learning techniques, e.g. Support Vector Machines. Also, see https://en.wikipedia.org/wiki/Artificial_neural_network#Theo...

Though this article is so well presented, it deserves an award for how pretty it is.


My gut says this sort of training alternative to back-propagation has a lot of uses where SVMs have no applicability. The article talks a lot about RNNs (neural nets for sequence prediction), but I would guess it would have uses in online learning as well. Learning twice as fast in those situations seems pretty significant to me.


I can't believe I'm getting down-voted just because I'm not bullish on ANN.

As I said, in my humble (and educated) opinion, NNs don't really have a lot of practical use. So long as they have to be processed in parallel, SVMs will always have the advantage that they can be computed sequentially, meaning they can run much faster and without the need for specialized hardware. SVMs and ANNs are solving the same problem in machine learning; they're both methods used for classifying data. SVMs just do it much faster and by more practical means.


The classical solver used to train kernel SVMs and implemented in libsvm [1] has a time complexity between O(n^2) and O(n^3), where n is the number of labeled samples in the training set. In practice it becomes intractable to train a non-linear kernel SVM as soon as the training set is larger than a few tens of thousands of labeled samples.
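You can watch that superlinear scaling yourself with scikit-learn's SVC, which wraps libsvm (a rough sketch; the data is synthetic and exact timings depend on your machine):

    import time
    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.RandomState(0)

    for n in (1000, 2000, 4000, 8000):
        X = rng.randn(n, 20)
        y = (X[:, 0] + 0.5 * rng.randn(n) > 0).astype(int)
        t0 = time.time()
        SVC(kernel="rbf").fit(X, y)
        # Doubling n much more than doubles the fit time.
        print(n, "samples:", round(time.time() - t0, 2), "s")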

Deep Neural Networks trained with variants of Stochastic Gradient Descent on the other hand have no problem scaling to training sets with millions of labeled samples which makes them suitable to solve large industrial-scale problems (e.g. speech recognition in mobile phones or computer vision to help moderate photos that are posted on social networks).

SVMs can be useful in the small-training-set regime (fewer than 10,000 training examples). But for that class of problems, it's also perfectly reasonable to use a single CPU (with 2 or 4 cores) and a good linear algebra library such as OpenBLAS or MKL to train an equally powerful fully connected neural network with 1 or 2 hidden layers. Hyper-parameter tuning can be easier for beginners with SVMs and the default kernels (e.g. RBF or polynomial), but with modern optimizers like Adam, implemented in well-designed and well-documented high-level libraries like Keras, it has become very easy to train neural networks that just work.
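To illustrate how little code that takes, a minimal Keras sketch of such a network (the input dimension and X_train/y_train are assumptions, stand-ins for your own data):

    from keras.models import Sequential
    from keras.layers import Dense

    # A fully connected net with 2 hidden layers, trained with Adam.
    model = Sequential([
        Dense(64, activation="relu", input_dim=20),
        Dense(64, activation="relu"),
        Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    model.fit(X_train, y_train, epochs=20, batch_size=32)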

Also, for many small to medium-scale problems that are not signal-style problems [2], Random Forests and Gradient Boosted Trees tend to perform better than SVMs. Most Kaggle competitions are won with either linear models (e.g. logistic regression), Gradient Boosting, neural networks, or a mix of those. Very few competitors have used kernel-based SVMs in a winning entry AFAIK (a quick sanity-check sketch follows the footnotes).

[1] https://en.wikipedia.org/wiki/Sequential_minimal_optimizatio...

[2] By "signal-style" I mean problems such as image or audio processing.
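If you want to sanity-check that claim on a tabular problem, a quick scikit-learn comparison looks something like this (toy synthetic data; swap in your own X, y):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import (RandomForestClassifier,
                                  GradientBoostingClassifier)
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

    for clf in (RandomForestClassifier(n_estimators=100, random_state=0),
                GradientBoostingClassifier(random_state=0),
                SVC(kernel="rbf")):
        scores = cross_val_score(clf, X, y, cv=5)
        print(type(clf).__name__, round(scores.mean(), 3))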


> I can't believe I'm getting down-voted just because I'm not bullish on ANN.

Probably it's happening because ANNs definitely do have some advantages over SVMs for modelling real-world phenomena.

Specifically, ANNs are _parametric_ while SVMs are nonparametric, in the sense that for an ANN you have a bunch of hidden layers (of varying sizes, depending on the number of features) plus weight and bias parameters, and that is your model: its size is fixed before you see the training data.

SVMs OTOH (at least in the kernelized case) consist of a set of support vectors selected from the training set, which in the worst case can be as large as the training set itself.

Modelling real-world phenomena - for example, optimal air-conditioning in a data center based on a large number of external inputs - is far more amenable to ANNs than to SVMs. ANNs are, after all, universal approximators; with SVMs you have to guess the kernel...
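The parametric/nonparametric point is easy to see empirically: an SVM's model size (its number of support vectors) grows with the training set, while an ANN's parameter count is fixed up front. A rough sketch with scikit-learn (synthetic noisy data, made up for illustration):

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.RandomState(0)

    for n in (500, 1000, 2000, 4000):
        X = rng.randn(n, 10)
        y = (X.sum(axis=1) + rng.randn(n) > 0).astype(int)  # noisy labels
        svc = SVC(kernel="rbf").fit(X, y)
        # The support set keeps growing with n; an ANN's weight matrices
        # would stay the same size throughout.
        print(n, "training points ->", len(svc.support_), "support vectors")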

edit-001: see, for example, this paper: http://deeplearning.net/wp-content/uploads/2013/03/dlsvm.pdf , where folks try their hand at deep learning via SVMs.


> I can't believe I'm getting down-voted just because I'm not bullish on ANN.

I don't think it's that - I think it's because you are factually wrong. In particular, this part is wrong: "ANNs are not that big of a deal, IMHO, when you compare them to other machine learning techniques, e.g. Support Vector Machines."

(Deep) Neural Networks are a very, very big deal because they work so much better than SVMs in every domain where there is sufficient training data and sufficient time (and enough GPUs!) to train them.

This post is a big deal because it shows a way to cut down that training time.


I believe SVM>ANN might be true if the classification task is relatively simple. Most of the state-of-the-art computer vision algorithms are based on ANNs, though (just as an example). Do you think SVMs will catch up in those areas?

(I upvoted you even though I don't agree with you, just thought getting greyed out was excessive)



