\documentclass[onecolumn]{article}
\usepackage[pdftex]{graphicx}
\usepackage{epsfig}
\usepackage[latin1]{inputenc}
\usepackage{amsmath}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{a4}
\author{David A. Foster\\School of Informatics\\University of Edinburgh}
\title{Informatics Research Proposal\\A Filesize Efficient Parallel MPEG Encoder\\}
\date{18 March 2005}
\addtolength{\parskip}{+0.2cm}


\begin{document}

\maketitle

%\addvspace{5.5cm}

%\newpage

\section{Background Information}

The MPEG movie format is a standard encoding method for video data that is widespread.  The raw size of  uncompressed video data makes it prohibitive to store it as such.  MPEG is a lossy compression, meaning that the quality of the video data is reduced in order to save on file size.  It does this through detection of similar color patterns and complicated prediction and estimation techniques.

MPEG compressed movies are compressed in a hierarchical fashion.  This hierarchy can be seen in Figure \ref{fig:hierarchy}.  At the lowest level are the block and macroblocks, of which a single frame can be composed.  In the middle is the slice layer, which are horizontal strips of macroblocks.  A number of slices will make up a single frame, and frames are grouped into the highest level, a Group of Pictures level.  It is the highest level which we are interested in parallelizing.  

\begin{figure}[htb!]
  \begin{center}
    \includegraphics[height=2.5in]{levels}
    \caption{The MPEG Hierarchy \cite{iwata}}
    \label{fig:hierarchy}
  \end{center}
\end{figure}

A Group of Pictures (GOP) contains a sequence of encoded frames stored in a pattern of different frame types.  Three frame types are used; I-frames, which are compressed images, and motion prediction frames, which are P and B-frames.  The motion prediction and compensation is the process by which most MPEGs will be compressed a lot, so their inclusion is essential.  Figure \ref{fig:GOP} is an example of a GOP.  A GOP must always start with an I-frame, and must end with either an I or a P-frame.  However, the data dependencies that are introduced by this method mean that splitting encoding into a parallel algorithm at the frame level would be difficult and inefficient due to the communication overhead that would arise.  These data dependencies are shown in Figure \ref{fig:GOP}.

\begin{figure}[htb!]
  \begin{center}
    \includegraphics[height=1in]{gop}
    \caption{A single GOP showing frame dependencies}
    \label{fig:GOP}
  \end{center}
\end{figure}

\section{Aims and Objectives}

Several parallel implementations already exist for the encoding of MPEG movies at the GOP level, however most deal with a static pattern of frame types and number of frames for the GOPs throughout the entire movie file.  While this makes for an even distribution of workload, it can lead to great inefficiencies in filesize, due to the nature of the encoding process.  Two frames may have trivial changes, which could be stored in a very size efficient P-frame, but if those frames sit on the boundary of two GOPs, at least one would need to be stored as a much larger I-frame.  The other extreme is having two frames which differ so much that the second frame would need to be stored as an I-frame, which is something already built into the standard.  This means that there can be a number of I-frames stored in one GOP when it is not even necessary.

Techniques have been developed for finding where these data based boundaries occur.  This is normally done in a scanning phase prior to the compression of the movie data.  This information can then be used in order to encode the movie in a size optimized manner.  While this has been considered before in a distributed message passing algorithm \cite{moore}, this project aims to be implemented on a shared memory architecture.  In that paper, size efficiency is not the end-result of their methods.  The end result in this project will be comparable in terms of running time, while achieving a non-trivial reduction in the encoded movie's filesize.  This reduction is the focus of the project.

\section{Deliverables}

At the end of the project, a program which will achieve this parallel implementation will be delivered.  Modifying a public-domain cut detection mechanism to produce the necessary information, this implementation will be based on Barbosa et. al.'s method using a bag-of-tasks approach \cite{barbosa}.  It will be designed to run on a shared memory multiprocessor machine, utilized through the use of threading with the common pthreads library.  It will utilize a library for dealing with the video mechanisms, and it will be written in C++.  In addition, a monitoring program will be written in order to show the progress of each worker during the encoding.  This will be console based, using the ncurses library.

As said above, the implementation will be based on a bag-of-tasks parallelism pattern.  Essentially, a fixed number of workers grab similar tasks from a queue of available tasks, process the tasks, and repeat.  This means some synchronization will need to take place in order to coordinate the workers.  In this instance, the tasks will be a full GOP of varying number of frames.  Each encoding worker will be responsible for encoding the entire GOP, including all the frames within that GOP.  After it has completed this task, it will hand the data off to a writing process, which will queue the data for writing out to disk again.  This will also require some synchronization to ensure all completed tasks are entered successfully in the write queue.  Example tasks from the task pool or queue are shown in Figure \ref{fig:tasks}.

\begin{figure}[htb!]
  \begin{center}
    \includegraphics[height=1.5in]{tasks}
    \caption{Differing size GOP tasks from the task pool}
    \label{fig:tasks}
  \end{center}
\end{figure}

\section{Research Plan}

There are a few areas which must be researched before executing the project.  First, a mechanism to do the cut detection will need be found.  It should be possible to find a program that is in the public domain and modify it to meet the requirements of the parallel program.  Since this is a core component and the rest of the implementation's success depends on this, it is the first thing that must be done.  Second, a library for dealing with video data will need to be chosen.  This will be used to do the actaul encoding of the movie, with parameters supplied by data from the cut detection.  Several of these libraries exist, and it would be best to find the one that is portable across different architectures and platforms so that this implementation can be easily ported to other platforms itself.  The target language for the implementation is C++, so C or C++ bindings to the library would be required.

\section{Evaluation}

Parallel algorithms are designed with speed in mind, in that it can greatly improve the execution time of a particular task by splitting the workload among multiple processors.  This speedup is very significant, and this implementation will be measured in running time versus other parallel MPEG encoding implementations on the same hardware.  However, this project's main focus is to show an improvement on the filesize, so it does not attempt to be the fastest implementation.  It is reasonable to estimate that the encoding part will run within 5\% of the time of other implementations.  Note that this does not take into account the cut-detection, which will certainly inflate the execution time.  It may be possible to combine both filesize gain figures and execution time figures, including the cut-detection time, into one figure, which will justify the use of this implementation.

The focus of this implementation is to reduce the filesize of the resulting encoded movie.  The size of the encoded files output by this implementation will be compared to the outputs of other implementations on the same input data.  While it is difficult to say how much of an improvement will be gained, a reduction of around 25 to 50mb in filesize is expected with this implementation as opposed to other implementations which use a uniform-sized GOP.

\section{Relation to Previous Work}

This project is similar to Moore et. al.'s work \cite{moore}, in that it uses a cut detection to improve the efficiency of the MPEG algorithm.  This project differs from their work in a few ways.  First, this project is aimed at shared-memory architectures instead of distributed architectures.  In addition, size efficiency was not the goal of their project, and was an optional switch to turn on.  This project's primary focus is to reduce filesize while gaining speed over a non-parallel implementation.  Barbosa et. al.'s work \cite{barbosa} provides a bag-of-tasks implementation for MPEG encoding, and will serve as a model for this project.  However, the code will be independently developed.

\section{Timetable}

In order to make this project achievable, it is useful to break the project into smaller sections.  There are three main sections, Research, Implementation, and Testing.  Each section culminates in a milestone, but none of the milestones are more important then the completion of the program itself.  From that point, it will be able to performance test it in order to compare it to other works.  Figure \ref{fig:timetable} shows a breakdown of the project tasks and their lengths, and Figure \ref{fig:Gantt} shows a Gantt chart of these tasks.

\begin{figure}[htb!]
  \begin{center}
    \includegraphics[height=2in]{timetable}
    \caption{Timetable}
    \label{fig:timetable}
  \end{center}
\end{figure}

\begin{figure}[htb!]
  \begin{center}
    \includegraphics[width=5in]{gantt}
    \caption{Gantt Chart of timetable}
    \label{fig:Gantt}
  \end{center}
\end{figure}

As shown in Figure \ref{fig:Gantt}, everything but the dissertation should be complete by mid-August, which gives some leeway time should deadlines slip a little.  The dissertation will be written in parallel with development in order to concentrate on the details of each stage.  To aid in dissertation writing and planning, a log will be kept of daily activities, status updates, problems, and plans.

\section{Significance and Impact}

Size efficiency is a key aspect in several developing fields, such as medical imaging, remote video applications, and video archival.  A proper implementation of this project should result in smaller encoded movie files, thus taking less time to transfer and less media to store it on.  The fact that it is being implemented for a shared memory architecture means that home users can take advantage of performance gains with the recent proliferation of dual processor systems, Intel's HyperThreading technology, and in the future introduction of multi-core processors.

\section{Backup Plan}

Should time become a pressing issue, some steps can be taken to ensure completion of the project.  The first would be dropping the ncurses-based monitor application, as that is more of a convenience item anyway.  The application can simply print status updates to the console or a log file.  As a second time saving operation, less time could be allocated for testing on my home machine, as it is not really the target architecture of this parallel implementation, although the intended size-efficiency would be the same as on a much larger multiprocessor machine.

\begin{thebibliography}{9}

\bibitem{iwata}
E. Iwata, K. Olukotun.  \emph{Exploiting Coarse-Grain Parallelism in the MPEG-2 Algorithm}.  Technical Report CSL-TR-98-771, Stanford University.

\bibitem{moore}
J. Moore, W. Lee, S. Dawson.  \emph{Optimal Parallel MPEG Encoding}.  Department of Computer Science, Cornell University.

\bibitem{barbosa}
D.M. Barbosa, J.P. Kitajima, W. Meira Jr.  (1999) \emph{Parallelizing MPEG Video Encoding using Multiprocessors}.  Proceedings of the XII Brazilian Symposium on Computer Graphics and Image Processing, pp. 215-222.



\end{thebibliography}

\end{document}
