\chapter{Background \& Objectives}
%This section should discuss your preparation for the project, including background reading, your analysis of the problem and the process or method you have followed to help structure your work. It is likely that you will reuse part of your outline project specification, but at the end of the project you should have more to discuss.
%
%\textbf{Note}:
%
%\begin{itemize}
% \item All of the sections and text in this example are for illustration purposes. The main Chapters are a good starting point, but the content and actual sections that you include are likely to be different.
%
% \item Look at the document MMP\_S08 Project Report and Technical Work \cite{ProjectReportTechicalWork} for additional guidance.
%
%\end {itemize}
\section{Background}
%What was your background preparation for the project? What similar systems or research techniques did you assess? What was your motivation and interest in this project?
\subsection{Context}
In a world where most people have a camera in their pocket, the number of pictures taken every minute has increased dramatically over the last few decades. Smartphone cameras have reached the point where they can rival their fully-fledged counterparts. The necessity\footnote{Both \textit{Google} and \textit{Apple} accounts are required to use their respective app stores, and \textit{Apple} doesn't support alternative app stores.} of online accounts on \textit{Android} and \textit{iOS}, which both include cloud storage (\textit{Google Drive / Google Photos} and \textit{iCloud} respectively), has made automatic cloud backup of photographs the norm. Together, these aspects have lowered the barrier to entry and have enabled an entire generation to produce large quantities of high-resolution photographs, something previously limited to professional and hobbyist photographers.
The upsurge in data produced and people's reliance on free cloud storage has started to create storage concerns for providers. To combat this, certain providers have taken measures to reduce the load by tightening or introducing service limits\cite{google_photos}\cite{google_workspace}. In turn, this has led people to reconsider how they manage their data. People have had to return to the tedious, manual practice of filtering through their old photographs in order to free up space for new content.
The significance of photo selection isn't limited to storage issues. The rise of social media and influencer culture has given rise to entire businesses built on carefully selected photographs. \textit{Instagram}, a photo-sharing social media platform, has become the most prominent hub for influencer-based marketing. Its popularity has spawned many Instagram curation apps\cite{instagram_planner}\cite{unum}, which help users and businesses plan out their posts to work towards a cohesive profile aesthetic\cite{instagram_aesthetic}. This has also caught the attention of data scientists, who have investigated what makes certain images perform better online than others\cite{intrinsic_image_popularity}.
Image aesthetic quality assessment is the process of automatically assessing the visual aesthetic rating of an image. This area of research has seen significant improvements in the last five years due to advancements in AI, computer vision, and machine learning. On the AVA dataset\footnote{The AVA (Aesthetic Visual Analysis) dataset is the most common dataset used for aesthetic quality assessment.} alone, accuracy rates of up to 83\% have been reported when predicting the aesthetic quality rating of images; see figure \ref{fig:aqa-on-ava}.
\begin{figure}[H]
\centering
\includegraphics[width=\textwidth]{aqa-on-ava}
\caption{Accuracy of models trained on the AVA dataset\cite{aqa-on-ava}}
\label{fig:aqa-on-ava}
\end{figure}
\subsection{Related Work}
\paragraph{Trophy Camera}
In 2017, fine artist Dries Depoorter and professional photographer Max Pinckers developed a camera ``that can only make award winning pictures''\cite{trophy_camera}. The camera was built using a Raspberry Pi and was programmed to only save award-winning photographs, which were subsequently uploaded to a website: \href{https://trophy.camera}{trophy.camera}\cite{trophy_camera_site}. The AI was trained using previous winning photographs from the WPPY (World Press Photo of the Year) contest (\textit{Warning: Website includes images some might find disturbing, including death, extreme violence, and suffering})\cite{wpp}. Comparing the photos featured on \href{https://trophy.camera}{\textit{trophy.camera}} with previous WPPY winners suggests the effectiveness of the project is questionable. This isn't surprising, given the stark difference in subject matter and in the photography skills of the general public versus the award-winning photographers. The majority of the photographs submitted to the WPPY are taken by professional photographers using high-end cameras (see figure \ref{fig:trophy camera train photo}) and mostly depict global conflict. In contrast, the Trophy Camera is limited to the walls of a gallery, uses a low-end camera, and its photos are taken by the general public (see figure \ref{fig:trophy camera photo}). This is an example where the data used for training doesn't match the intended use-case of the project, resulting in images that fail to achieve an ``award winning'' appearance.
\begin{figure}[H]
\begin{subfigure}[b]{0.3\textwidth}
\centering
\includegraphics[width=\textwidth]{trophy-camera}
\caption{Trophy Camera\cite{trophy_camera}}
\label{fig:trophy camera}
\end{subfigure}
\hfill
\begin{subfigure}[b]{0.3\textwidth}
\centering
\includegraphics[width=\textwidth]{trophy-camera-image}
\caption{Photo taken with Trophy Camera\cite{trophy_camera_site}}
\label{fig:trophy camera photo}
\end{subfigure}
\hfill
\begin{subfigure}[b]{0.3\textwidth}
\centering
\includegraphics[width=\textwidth]{wpp}
\caption{Example of photo used to train Trophy Camera\cite{wpp-spencer}}
\label{fig:trophy camera train photo}
\end{subfigure}
\caption{Trophy Camera}
\label{fig:trophy camera photos}
\end{figure}
\paragraph{Archillect} A combination of the words ``archive'' and ``intellect'', Archillect\cite{archillect} is an AI that automatically curates visually stimulating content found online. To do this, Archillect uses an algorithm and a list of keywords to search for posts and pages, then crawls pages linked from the original results to find new images, gain relevant contextual knowledge, and learn new keywords. Ultimately, Archillect learns what images create visual stimulation for people on social media, reposting them and updating its algorithm according to ``likes'' and other user engagement metrics.
\begin{figure}[H]
\centering
\includegraphics[width=\textwidth]{archillect}
\caption{\href{https://archillect.com}{archillect.com} homepage\cite{archillect}}
\label{fig:archillect}
\end{figure}
\subsection{Motivation}
Historically, the most common and successful approaches to grading the aesthetic quality of images have been based on machine learning. This involves creating a neural network, typically a CNN (Convolutional Neural Network), and training it on datasets of professional photography. The most common of these datasets is AVA (Aesthetic Visual Analysis), which comprises a set of professionally taken photographs and aesthetic ratings provided by numerous professional photographers, which act as a ground truth for aesthetic quality.
Many people believe aesthetics are subjective. We can observe this in how the aesthetics of certain art pieces are often contested and controversial. Yet despite the perceived subjective nature of aesthetics, there seem to be certain ideas that are almost universally accepted as aesthetic. In art we often hear about composition rules like the golden ratio, rule of thirds, and symmetry, which are meant to evoke a positive sentiment. See figure \ref{fig:common photographic techniques}. For others, aesthetics cannot be reduced to a set of predefined rules, and many people welcome images that break the conventions of professional photography.
\begin{figure}[H]
\centering
\begin{subfigure}[b]{0.45\textwidth}
\centering
\includegraphics[width=\textwidth]{golden-ratio}
\caption{Golden Ratio\cite{golden_ratio}}
\label{fig:golden ratio}
\end{subfigure}
\hfill
\begin{subfigure}[b]{0.45\textwidth}
\centering
\includegraphics[width=\textwidth]{symmetry}
\caption{Symmetry\cite{symmetry}}
\label{fig:symmetry}
\end{subfigure}
\begin{subfigure}[b]{0.45\textwidth}
\centering
\includegraphics[width=\textwidth]{rule-of-thirds}
\caption{Rule of Thirds\cite{rule-of-thirds}}
\label{fig:rule of thirds}
\end{subfigure}
\hfill
\begin{subfigure}[b]{0.45\textwidth}
\centering
\includegraphics[width=\textwidth]{dof}
\caption{Shallow Depth of Field\cite{dof}}
\label{fig:y equals x}
\end{subfigure}
\caption{Common photographic techniques}
\label{fig:common photographic techniques}
\end{figure}
\subsubsection{Applications}
Automatic image selection is a useful concept that can be used in many applications. The main motivation for the project was to use the system as an accessibility tool, allowing those who are less technically or physically able to take images of good aesthetic quality. A user could wear a head-mounted camera to record a journey or outing, and the tool could automatically reduce the footage into an album of good-looking photos for them to enjoy.
Professional photographers could use automatic image selection to remove technically poor images from the large collection of images produced by a photoshoot. This would help alleviate much of the manual filtering photographers have to do after a shoot.
\subsubsection{Personal motivations}
An important motivation for this project and its direction is my interest in machine learning. The CS36220 Machine Learning\cite{ml-module} module acted as a good introduction to the topic, but I wanted to gain some personal experience with deep learning, as it's an area of computer science I find particularly interesting.
\section{Analysis}
%Taking into account the problem and what you learned from the background work, what was your analysis of the problem? How did your analysis help to decompose the problem into the main tasks that you would undertake? Were there alternative approaches? Why did you choose one approach compared to the alternatives?
%
%There should be a clear statement of the research questions, which you will evaluate at the end of the work.
%
%In most cases, the agreed objectives or requirements will be the result of a compromise between what would ideally have been produced and what was felt to be possible in the time available. A discussion of the process of arriving at the final list is usually appropriate.
%
% When processing video files, each frame should be the same resolution
\subsection{Problem Description}
This project aims to explore the question ``Can we use a computer to make accurate aesthetic judgements on images?''.
In order to create a program to detect and select aesthetic images from a video file, the video first needs to be broken down into individual frames; each frame then needs to be analysed against certain aesthetic features and rules (such as the rule of thirds, contrast, brightness, and focus). Each of these filters will need to be implemented in such a way that any order and combination of filters can be used. Lastly, a machine learning approach will be taken to rate and rank the remaining images. For this we can use a CNN (Convolutional Neural Network), which will need to be trained on professional photographs, as they can be considered a good baseline for ``aesthetic'' quality.
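A minimal sketch of this pipeline is given below, assuming OpenCV for frame extraction; the function names, the frame-sampling step, and the \texttt{filters} and \texttt{score\_fn} placeholders are hypothetical stand-ins rather than the final implementation.
\begin{verbatim}
# Hypothetical sketch of the video -> filters -> CNN-ranking pipeline.
# `filters` and `score_fn` are placeholders for the technical filters
# and the CNN-based aesthetic scorer developed in this project.
import cv2


def extract_frames(video_path, step=30):
    """Yield every `step`-th frame of the video as a BGR image."""
    capture = cv2.VideoCapture(video_path)
    index = 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        if index % step == 0:
            yield frame
        index += 1
    capture.release()


def passes_filters(frame, filters):
    """Apply each technical filter in turn; reject on the first failure."""
    return all(check(frame) for check in filters)


def select_aesthetic_frames(video_path, filters, score_fn, keep=10):
    """Filter out technically poor frames, score the rest, keep the best."""
    candidates = [f for f in extract_frames(video_path)
                  if passes_filters(f, filters)]
    candidates.sort(key=score_fn, reverse=True)
    return candidates[:keep]
\end{verbatim}
The key design point illustrated here is that the cheap technical filters run first, so the comparatively expensive CNN scorer is only evaluated on frames that survive filtering.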
\subsection{Approach}
The specific approach taken for this project is to use a combination of machine learning and conventional image processing techniques. This provides a good balance between performance and accuracy, as the conventional image processing techniques can filter out technically poor images (blurry, low contrast, extreme exposure, etc.), thereby limiting the amount of processing required by the machine learning model at the end of the pipeline.
\subsection{Alternative Approaches}
This project could also be approached using exclusively conventional computer vision techniques, which would require writing specific code to recognise aesthetic photographic techniques (such as the rule of thirds and vanishing points; see figure \ref{fig:common photographic techniques}) and using those to help determine the aesthetic quality of an image.
A machine-learning-only approach could also be taken, but in the context of this project this would be computationally wasteful. Due to the nature of a continuous video recording, many frames will be blurry or underexposed due to lighting changes. These images are technically poor in quality, and processing hundreds of them using a CNN would be wasteful when they could be more easily discarded using conventional image processing techniques.
\subsection{Aim}
The aim of the project is to develop a piece of software that will take a video or a set of images as input, process them through filtering, and output a subset of ``aesthetic'' images or frames. The set will be processed through filters that will be implemented as part of the project.
\subsection{Objectives}
\begin{itemize}
\item Context-specific research into machine learning
\begin{itemize}
\item Research suitable machine learning frameworks
\item Research how to start building a CNN
\end{itemize}
\item Setup tools for development
\begin{itemize}
\item Host a Gitea instance to host the project's repository
\item Create repository for project
\item Mirror repository to GitLab.com for availability
\item Setup WoodpeckerCI for CI/CD workloads and connect it to the project repository
\item Setup pipelines for automated testing in WoodpeckerCI
\end{itemize}
\item Research conventional image processing techniques (a sketch of possible measures follows this list)
\begin{itemize}
\item Quantify brightness
\item Quantify contrast
\item Quantify focus
\item Depth of field detection
\item Vanishing point detection
\end{itemize}
\end{itemize}
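As an indication of how the ``quantify'' objectives above might be approached, the following is a minimal sketch using OpenCV and NumPy; the specific measures (mean intensity, RMS contrast, variance of the Laplacian) and any thresholds built on them are illustrative assumptions rather than the measures finally adopted.
\begin{verbatim}
# Hypothetical per-frame measures for brightness, contrast, and focus.
import cv2
import numpy as np


def brightness(frame):
    """Mean intensity of the greyscale image, in the range 0-255."""
    grey = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return float(np.mean(grey))


def contrast(frame):
    """Standard deviation of greyscale intensities (RMS contrast)."""
    grey = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return float(np.std(grey))


def focus(frame):
    """Variance of the Laplacian; low values suggest a blurry frame."""
    grey = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return float(cv2.Laplacian(grey, cv2.CV_64F).var())
\end{verbatim}
Each measure returns a single number per frame, so a filter can be built by thresholding it; depth-of-field and vanishing-point detection are more involved and are not sketched here.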
\subsubsection{Research Questions}
\begin{itemize}
\item How effective is the machine learning approach compared to a more conventional one?
\item Which order of filtering is most effective?
\item Can a CNN be used to make aesthetic judgements of an image?
\end{itemize}
%You need to describe briefly the life cycle model or research method that you used. You do not need to write about all of the different process models that you are aware of. Focus on the process model or research method that you have used. It is possible that you needed to adapt an existing method to suit your project; clearly identify what you used and how you adapted it for your needs.
%
%For the research-oriented projects, there needs to be a suitable process for the construction of the software elements that support your work.