Google Brain AMA Learnings

Last year the Google Brain team organised an Ask-Me-Anything (AMA) on Reddit. It is an amazing AMA and I encourage everyone to read it in full. However, in case you do not have the time to go through the whole thing, here are some of the key takeaways and learnings from the AMA.

“our research directions have definitely shifted and evolved based on what we’ve learned. For example, we’re using reinforcement learning quite a lot more than we were five years ago, especially reinforcement learning combined with deep neural nets. We also have a much stronger emphasis on deep recurrent models than we did when we started the project, as we try to solve more complex language understanding problems.”

“Machine learning is equal parts plumbing, data quality and algorithm development. (That’s optimistic. It’s really a lot of plumbing and data :).“

Underrated methods:

  • Random Forests and Gradient Boosting
  • Evolutionary approaches
  • The general problem of intelligent automated collection of training data
  • Treating neural nets as parametric representations of programs, rather than parametric function approximators
  • NEAT (NeuroEvolution of Augmenting Topologies)
  • Careful cleanup of data, e.g. pouring lots of energy into finding systematic problems with metadata

Exciting Work:

  • The problem of robotics in unconstrained environments is at the perfect almost-but-not-quite-working spot right now, and deep learning might just be the missing ingredient to make it work robustly in the real world.
  • Architecture search is an area we are very excited about. We may soon be getting to the point where it is computationally feasible to deploy evolutionary algorithms at large scale to complement traditional deep learning pipelines.
  • Excited by the potential for new techniques (particularly generative models) to augment human creativity. For example, neural doodle, artistic style transfer, realistic generative models, the music generation work being done by Magenta.
  • All the recent work in unsupervised learning and generative models.
  • Anything related to deep reinforcement learning and low-sample-complexity algorithms for learning policies. We want intelligent agents that can quickly and easily adapt to new tasks.
  • Moving beyond supervised learning. I’m especially excited to see research in domains where we don’t have a clear numeric measure of success. But I’m biased… I’m working on Magenta, a Brain effort to generate art and music using deep learning and reinforcement learning

Resources:

  • https://keras.io/ : Keras is a minimalist, highly modular neural networks library, written in Python and capable of running on top of either TensorFlow or Theano. It was developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research. (A minimal model-definition sketch follows after this list.)
  • http://www.arxiv-sanity.com/ : Get the best of arXiv; also find similar papers according to tf-idf
  • /r/MachineLearning
  • https://nucl.ai/blog/neural-doodles/ : Neural Doodles!!
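To give a flavour of the Keras workflow mentioned above, here is a minimal model-definition sketch. It is my own toy example, not something from the AMA: the layer sizes, the 100-dimensional input and the x_train/y_train placeholders are all arbitrary.

```python
from keras.models import Sequential
from keras.layers import Dense

# A tiny binary classifier: 100-dimensional inputs, one hidden layer.
model = Sequential()
model.add(Dense(32, activation='relu', input_dim=100))
model.add(Dense(1, activation='sigmoid'))

# Compile with an optimizer, a loss and a metric, then train on your data.
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# model.fit(x_train, y_train, batch_size=32)  # x_train, y_train: your own arrays
```

The point is the brevity: a few lines take you from an idea for a model to something you can train, which is exactly the fast-experimentation focus described above.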

Difference between Reinforcement Learning and Supervised Learning

During the introductory class of our Neural Networks course, a classmate asked me this question. It is a really good question. I thought I knew the answer, until I sat down to write it. So I dug a little deeper and came across the paper “Reinforcement Learning and its Relationship to Supervised Learning” by Andrew G. Barto and Thomas G. Dietterich. A tl;dr version, from my understanding, can be found below:

http://www-anw.cs.umass.edu/pubs/2004/barto_d_04.pdf

 

Supervised Learning

In supervised learning, the learner is given training examples of the form (xi, yi), where each input value xi is usually an n-dimensional vector and each output value yi is a scalar (either a discrete-valued quantity or a real-valued quantity). It is assumed that the input values are drawn from some fixed probability distribution D(x) and then the output values yi are assigned to them.

A supervised learning algorithm takes a set of training examples as input and produces a classifier or predictor as output.
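
As a concrete (and entirely hypothetical) illustration of “training examples in, classifier out”, here is a tiny sketch using scikit-learn; the data and the choice of logistic regression are arbitrary.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Four toy training examples (x_i, y_i): each x_i is a 2-dimensional
# vector, each y_i a discrete class label.
X = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 0.0], [3.0, 0.5]])
y = np.array([0, 0, 1, 1])

clf = LogisticRegression().fit(X, y)   # training examples in, classifier out
print(clf.predict([[1.5, 0.5]]))       # the hypothesis h(x) applied to a new input x
```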

The best possible classifier/predictor for data point x would be the true function f(x) that was used to assign the output value y to x. However, the learning algorithm only produces a “hypothesis” h(x). The difference between y and h(x) is measured by a loss function, L(y, h(x)).  The goal of supervised learning is to choose the hypothesis h that minimizes the expected loss.
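
Written out in symbols (my own shorthand, assembled from the definitions above rather than quoted from the paper), the objective is:

```latex
h^{*} = \arg\min_{h} \; \mathbb{E}_{x \sim D}\left[ L\big(f(x),\, h(x)\big) \right]
```

That is, pick the hypothesis whose loss against the true function f is smallest on average over inputs drawn from D.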

 

Reinforcement Learning

RL comes in when examples of desired behaviour are not available, but it is possible to score examples of behaviour according to some performance criterion.

For example, if you are in an area with poor cellular network coverage, you move around and check the signal strength. You keep doing this until you find a place with adequate signal strength, or at least the best place available in the given circumstances. The information you receive does not tell you where to go or in which direction to move to obtain a better signal; each reading only lets you evaluate how good your current spot is. You have to move around and explore in order to determine where you should go.

Given a location x in the world and R(x), the reward at that position, the goal of RL is to determine the location x* that maximizes R, yielding the maximum reward R(x*). An RL system is not given R, nor is it given training examples; instead, it has the ability to take actions (choose values of x) and observe the resulting reward R(x).

RL combines search and long-term memory. Search results are stored in such a way that search effort decreases, and possibly disappears, with continued experience.
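
To make the “take an action, observe the reward, remember what worked” loop concrete, here is a toy sketch in the spirit of the cell-coverage example. It is entirely my own illustration (the names find_best_spot and signal_strength are made up), and real RL deals with sequential decisions and delayed rewards rather than this simple local search.

```python
import random

def find_best_spot(R, start, step=1.0, n_steps=200):
    """Hill-climbing sketch: sample nearby locations, keep the best seen."""
    x, r = start, R(start)        # current location and its observed reward
    x_best, r_best = x, r         # long-term memory of the best spot so far
    for _ in range(n_steps):
        x_new = x + random.uniform(-step, step)   # take an action (move)
        r_new = R(x_new)                          # observe the reward there
        if r_new >= r:                            # move if it is at least as good
            x, r = x_new, r_new
        if r_new > r_best:                        # remember the best found so far
            x_best, r_best = x_new, r_new
    return x_best, r_best

def signal_strength(x):
    """Toy 'signal strength' landscape the agent never sees as a whole;
    it can only sample it by moving around, as in the example above."""
    return 5.0 - (x - 3.0) ** 2

print(find_best_spot(signal_strength, start=0.0))
```

Note how the remembered best location means less and less fresh search is needed as experience accumulates, which is the point made just above.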

 

Differences:

1. In RL there is no fixed distribution D(x) from which the data points x are drawn.

2. The goal in RL is not to predict the output values y for a given input x, but instead to find a single value x* that gives the maximum reward.

 

super tldr;

Reinforcement Learning: Examples of correct behaviour not given, but the ‘goodness of the current situation’ is known -> maximize an unknown reward function.

 

Supervised Learning: Examples of correct behaviour given -> find the hypothesis function h that best maps input to output, while taking care to avoid overfitting.