During the introductory class of our Neural Networks course, a classmate asked me this question. It is a really good question: I thought I knew the answer, until I sat down to write it. So I dug a little deeper and came across the paper “Reinforcement Learning and its Relationship to Supervised Learning” by Andrew G. Barto and Thomas G. Dietterich. A tl;dr version, from my understanding, can be found below:
http://www-anw.cs.umass.edu/pubs/2004/barto_d_04.pdf
Supervised Learning:
In supervised learning, the learner is given training examples of the form (x_i, y_i), where each input value x_i is usually an n-dimensional vector and each output value y_i is a scalar (either a discrete-valued quantity or a real-valued quantity). It is assumed that the input values are drawn from some fixed probability distribution D(x) and then the output values y_i are assigned to them.
A supervised learning algorithm takes a set of training examples as input and produces a classifier or predictor as output.
The best possible classifier/predictor for data point x would be the true function f(x) that was used to assign the output value y to x. However, the learning algorithm only produces a “hypothesis” h(x). The difference between y and h(x) is measured by a loss function, L(y, h(x)). The goal of supervised learning is to choose the hypothesis h that minimizes the expected loss.
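To make the pieces concrete, here is a minimal sketch of that setup in plain Python. Everything specific in it is my own assumption for illustration, not something from the paper: the "true" function `true_f`, the uniform distribution standing in for D(x), the squared loss, and the restriction of h to straight lines.

```python
import random

def true_f(x):
    # Hypothetical true function f(x); the learner never sees this directly.
    return 2.0 * x + 1.0

def squared_loss(y, y_hat):
    # One possible choice of loss function L(y, h(x)).
    return (y - y_hat) ** 2

def sample(rng, n):
    # Draw inputs from a fixed distribution D(x) (assumed uniform on [-1, 1]),
    # then assign outputs using f plus a little noise.
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, true_f(x) + rng.gauss(0.0, 0.1)) for x in xs]

def fit_linear(examples):
    # Choose the hypothesis h(x) = w*x + b that minimizes the average
    # squared loss on the training examples (closed-form least squares).
    n = len(examples)
    mean_x = sum(x for x, _ in examples) / n
    mean_y = sum(y for _, y in examples) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in examples)
    var = sum((x - mean_x) ** 2 for x, _ in examples)
    w = cov / var
    b = mean_y - w * mean_x
    return lambda x: w * x + b

def average_loss(h, examples):
    return sum(squared_loss(y, h(x)) for x, y in examples) / len(examples)

rng = random.Random(0)
train = sample(rng, 200)
test = sample(rng, 200)   # fresh examples from the same D(x)

h = fit_linear(train)
print("train loss:", average_loss(h, train))
print("test loss: ", average_loss(h, test))
```

The loss on the fresh test examples is only an estimate of the expected loss, which cannot be computed exactly because D(x) and f are unknown to the learner.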
Reinforcement Learning:
Reinforcement learning (RL) comes in when examples of desired behaviour are not available, but it is possible to score examples of behaviour according to some performance criterion.
For example, if you are in an area of poor cellular network coverage, you move around and check the signal strength. You keep doing this until you find a place with adequate signal strength, or until you find the best place available in the given circumstances. Here the information we receive does not tell us where we should go or in which direction we should move to obtain a better signal. Each reading just allows us to evaluate the goodness of our current situation. We have to move around and explore in order to determine where we should go.
Given a location x in the world and the reward R(x) at that location, the goal of RL is to determine the location x* that maximizes R and so yields the maximum reward R(x*). An RL system is not given R, nor is it given training examples; instead it has the ability to take actions (choose values of x) and observe the resulting reward R(x).
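The signal-strength story can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the street is a line of positions 0 to 10, the hidden reward `signal_strength` peaks at position 7, and the learner uses simple hill climbing, which is just one of many possible search strategies and can get stuck on a local peak if the reward has several bumps.

```python
def signal_strength(x):
    # Hypothetical hidden reward R(x). The learner never sees this formula,
    # it can only stand at x and read off the value.
    return 5 - abs(x - 7)

def hill_climb(measure, start, lo, hi):
    # Walk along the street, always stepping to the neighbouring position
    # with the better reading, and stop when neither neighbour improves.
    x = start
    reading = measure(x)
    while True:
        neighbours = [n for n in (x - 1, x + 1) if lo <= n <= hi]
        best_n = max(neighbours, key=measure)
        best_r = measure(best_n)
        if best_r <= reading:
            return x, reading
        x, reading = best_n, best_r

best_x, best_reward = hill_climb(signal_strength, start=0, lo=0, hi=10)
print(best_x, best_reward)  # 7 5
```

Notice that nobody ever tells the learner "go to position 7". It only gets a score for wherever it currently stands, and has to work out the direction by trying.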
RL combines search and long-term memory. Search results are stored in such a way that search effort decreases, and possibly disappears, with continued experience.
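A rough way to picture the "search plus memory" idea is the sketch below. It is an assumption-laden toy: the meter `make_meter`, the dictionary used as memory, and the exhaustive sweep over positions are all made up for illustration, and real RL methods store what they learn in far more compact forms (value estimates, policies) than a lookup table of every reading.

```python
def make_meter():
    # Hypothetical signal meter that also counts how many readings were taken.
    calls = {"count": 0}
    def measure(x):
        calls["count"] += 1
        return 5 - abs(x - 7)   # hidden reward, peaked at position 7
    return measure, calls

def find_best_spot(measure, memory, positions):
    # Search: take a reading only at positions we have never visited.
    for x in positions:
        if x not in memory:
            memory[x] = measure(x)
    # Memory: answer from everything remembered so far.
    return max(memory, key=memory.get)

measure, calls = make_meter()
memory = {}
positions = range(0, 11)

first = find_best_spot(measure, memory, positions)
readings_first = calls["count"]

second = find_best_spot(measure, memory, positions)
readings_second = calls["count"] - readings_first

print(first, readings_first)    # 7 11
print(second, readings_second)  # 7 0
```

The first trip needs 11 readings, the second needs none: the search effort has disappeared because the results were remembered.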
Differences:
1. In RL there is no fixed distribution D(x) from which the data points x are drawn.
2. The goal in RL is not to predict the output value y for a given input x, but instead to find a single value x* that gives the maximum reward.
Super tl;dr:
Reinforcement Learning: Examples of correct behaviour are not given, but the ‘goodness of the current situation’ can be observed. -> Maximize an unknown reward function.
Supervised Learning: Examples of correct behaviour are given. -> Find the hypothesis function h that best maps input to output, while taking care to avoid overfitting.