\documentclass[onecolumn]{article}
\usepackage[dvips]{graphicx}
\usepackage{epsfig}
\usepackage[latin1]{inputenc}
\usepackage{amsmath}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{a4}
\author{Max C. Goebel\\ School of Informatics\\University of Edinburgh}
\title{Informatics Research Proposal\\Investigations into On-Demand Teaching for Reinforcement Learning\\}
\date{}
%\addtolength{\hoffset}{-0.5cm} \addtolength{\textwidth}{1cm}


\begin{document}

\maketitle

\addvspace{5.5cm}

\newpage

\section{Proposal Overview}

This proposal investigates the use of supervised learning methods within the reinforcement
learning framework, with a particular focus on on-demand teaching. The paper is structured as
follows. I begin by briefly outlining the research proposal in the context of related studies and by
pointing out its potential value for future work. I then give a more detailed account of the different
approaches that researchers have suggested so far for combining the two naturally competing learning
methodologies of reinforcement and supervised learning, and of the motivation behind them. Next, I describe
the methods and techniques I plan to use, with an emphasis on why I believe they merit further
investigation; the problem domains for this study are also presented there. Once the problem has been
placed in the context of its environmental settings, I discuss the empirical evaluations
planned for gaining further insight into the behaviour of on-demand teaching, and in particular
what I hope to achieve as a meaningful outcome of this research. Finally, the
proposal is assessed for its feasibility within the given time. To this end, it
is decomposed into individual work packages, with an estimate of the duration of each
package and a timeline of the major milestones planned for the individual deliverables.

\section{Introduction}

Reinforcement learning (RL) is a machine learning approach that has been studied in great depth because of its sound convergence
properties in Markovian domains, such as sequential games and idealised ``toy-world'' simulations. Because
this unsupervised trial-and-error learning requires no target data and seems to mimic our own natural way of learning,
RL has recently become the method of choice for a variety of control problems, including the wide field of mobile
robotics. While integration into these new domains often proves straightforward, learning an evaluation
function from experimentation alone turns out to be too time-consuming for most real-world applications. In particular, the slow
convergence of traditional RL algorithms is the primary limitation when they are applied to highly
complex real-world problems, which often exhibit noisy, infrequent reinforcement, non-deterministic state transitions and multiple
goals.
\\

To overcome this limitation, researchers have introduced supervised learning techniques in which an outside expert guides the agent
by giving optimal or near-optimal advice, thus speeding up the learning process. The main motivation for such an approach is that
most real-world tasks come with at least some intrinsic domain knowledge, which we can use to better
exploit the available sources of training information and thereby learn as efficiently as possible. The human analogy is
learning by observing somebody else perform a task or, from the opposite perspective, teaching by demonstration.
\\

In the literature, three different concepts for training agents have emerged: the teacher-driven approach, the learner-driven approach, and a mixture
of the two. This proposal is concerned only with the learner-driven approach, in which the learning agent has the option of asking
a training agent (teacher) for help. In other words, this approach introduces an ``on-demand'' teacher that tells the learning
agent what to do at the learner's request. For such a concept to work within the RL framework, two things need to be clarified.
First, we need to define a strategy for the learning agent that specifies how often and at what cost this additional advice is
supplied by the teacher. Second, we need to decide on the quality of the advice itself, e.g.\ whether the trainer issues
optimal or sub-optimal advice, or whether random noise is incorporated into the advice. This choice in turn determines how
the learning agent is implemented, namely whether it may accept or reject the advice or must always follow it.
[Clouse] in particular has provided a detailed formulation of this on-demand teaching framework.
\\
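
To make these two design choices concrete, the following is a minimal sketch in terms of tabular Q-learning. It is one possible
formalisation under stated assumptions and is not taken from the cited literature. All symbols are introduced here for illustration only:
$Q(s,a)$ is the learner's action-value estimate, $\alpha$ the learning rate, $\gamma$ the discount factor, $\tau \geq 0$ a hypothetical
uncertainty threshold, $c \geq 0$ a hypothetical cost per request, and $\epsilon$ the assumed probability that the teacher's advice is
replaced by a random action. The learner might ask for advice in state $s$ whenever its action values are too similar to discriminate
between actions,
\begin{equation*}
\max_{a} Q(s,a) - \min_{a} Q(s,a) < \tau ,
\end{equation*}
and, having taken action $a$ (advised or self-chosen), received reward $r$ and reached state $s'$, apply the standard Q-learning update
with the request cost subtracted from the reward,
\begin{equation*}
Q(s,a) \leftarrow Q(s,a) + \alpha \Bigl[ r - c\,\delta_{\mathrm{ask}} + \gamma \max_{a'} Q(s',a') - Q(s,a) \Bigr],
\end{equation*}
where $\delta_{\mathrm{ask}} \in \{0,1\}$ equals $1$ if advice was requested in this step and $0$ otherwise. Under these assumptions,
$\tau$ and $c$ govern how often and at what cost advice is requested, while $\epsilon$ models the quality of the advice.
\\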

The following section presents the research proposal, building on the ideas introduced above.

\section{Aims and Objectives}



\section{Background Material}



\section{Methods and Techniques}

\section{Evaluation}

\section{Outputs}

\section{Research Plan}

\end{document}
