1 files changed, 56 insertions, 14 deletions
diff --git a/report/report.tex b/report/report.tex
index 6895f89..700585f 100644
--- a/report/report.tex
+++ b/report/report.tex
@@ -6,6 +6,8 @@
 \usepackage[utf8]{inputenc}
 \usepackage{booktabs}
 \usepackage{float}
+\usepackage{listings}
+\usepackage{xcolor}
 
 \usepackage{pgf}
 
@@ -31,12 +33,12 @@ Analyzing Performance of Booth’s Algorithm and Modified Booth’s Algorithm}
 \begin{document}
 \maketitle
 \begin{abstract}
-In this paper, the performance of Booth’s Algorithm is compared to modified Booth's Algorithm. Each multiplier is simulated in Python, and performance is observed by counting the number of add and subtract operations for various inputs. Results are analyzed and discussed to highlight the potential tradeoffs to consider when deciding how hardware multiplication is implimented.
+In this paper, the performance of Booth’s Algorithm is compared to modified Booth's Algorithm. Each multiplier is simulated in Python, and performance is observed by counting the number of add and subtract operations for inputs of various lengths. Results are analyzed and discussed to highlight the potential tradeoffs one should consider when deciding what multiplier is to be used.
 \end{abstract}
 \section*{Introduction}
-Multiplication is among the most time consuming mathematical operations for processors. In many applications, the time it takes to multiply dramatically influences the speed of the program. Applications of digital signal processing (such as audio modification and image processing) require constant multiply and accumulate operations for functions such as fast fourier transformations and convolutions. Other applications are heavily dependent on multiplying large matrices, such as machine learning, 3D graphics and data analysis. In such scenarios, the speed of multiplication is vital. Consequently, most modern processors implement hardware multiplication. However, not all multiplication circuits are equal; there is often a stark contrast between performance and hardware complexity. To further complicate things, multiplication circuits perform differently depending on what numbers are being multiplied.
-\section*{Algorithm Description and Simulation}
-Booth's algorithim computes the product of two signed numbers in two's compliment format. To avoid overflow, the result is placed into a register two times the size of the operands (or two registers the size of a single operand). Additionally, the algorithim must work with a space that is exended one bit more then the result. For the purpose of brevity, the result register and extra bit will be refered to as the workspace, as the algorithim will use this space for its computations. First, the multiplier is placed into the workspace and shifted left by 1. From there, an operation is performed based off the last two bits, as shown by the following table:
+Multiplication is among the most time consuming mathematical operations for processors. In many applications, the time it takes to multiply dramatically influences the speed of the program. Applications of digital signal processing (such as audio modification and image processing) require constant multiply and accumulate operations for functions such as fast fourier transformations and convolutions. Other applications are heavily dependent on multiplying large matrices, such as machine learning, 3D graphics and data analysis. In such scenarios, the speed of multiplication is vital. Consequently, most modern processors implement hardware multiplication. However, not all hardware multiplication schemes are equal; there is often a stark contrast between performance and hardware complexity. To further complicate things, multiplication circuits perform differently depending on what numbers are being multiplied.
+\section*{Algorithm Description}
+Booth's algorithim computes the product of two signed numbers in two's compliment format. To avoid overflow, the result is placed into a register two times the size of the operands (or two registers the size of a single operand). Additionally, the algorithim must work with a space that is exended one bit more then the result. For the purpose of brevity, the result register and extra bit will be refered to as the workspace, as the algorithim uses this space for its computations. First, the multiplier is placed into the workspace and shifted left by 1. From there, the multiplier is used to either add or subtract from the upper half of the workspace. The specific action is dependent on the last two bits of the workspace.
 \begin{table}[H]
   \centering
 \begin{tabular}{lll}
@@ -70,14 +72,56 @@ Bit 2 & Bit 1 & Bit 0 & Action \\
 \bottomrule
 \end{tabular}
 \end{table}
-Because some operations require multiplying the multiplicand by 2, an extra bit is added to the most significant side of the workspace to avoid overflow. After each iteration, the result is arithmaticlly shifted right twice. The number of iterations is only half of the length of the operands. After all iterations, the workspace is shifted right once, and the second most significant bit is set to the first most significant bit as the result register does not include the extra bit.\\
-The purpose of this paper is to analyze and compare the peformance of these two algorithms for various operand lengths and values. As such, all arithmatic bitwise operations had to account for the length of operand sizes. Take for example, the arithmatic shift right functions:
-\begin{center}
-  put phseudo code here
-\end{center}
-Additionally, after each iteration, the bits more significant then the workspace length had to be erased (the bitwise functions purposefully do not account for this). 
-\newpage
-\section*{Results}
+Because some operations require multiplying the multiplicand by 2, an extra bit is added to the most significant side of the workspace to avoid overflow. After each iteration, the result is arithmaticlly shifted right twice. The number of iterations is only half of the length of the operands. After all iterations, the workspace is shifted right once, and the second most significant bit is set to the first most significant bit as the result register does not include the extra bit.
+\par
+\section*{Simulation Implimentation}
+Both algorithims were simulated in Python in attempts to utalize its high level nature for rapid development. The table for Booth's algorithim was preformed with a simple if-then loop, while a switch case was used in modified booth's algorithim. Simple integers were used to represent registers. 
+\par
+One objective of this paper is to analyze and compare the peformance of these two algorithms for various operand lengths. As such, the length of operands had to be constantly accounted for. Aritmatic bitwise operations, including finding two's compliment, were all implimented using functions that took length as an input. Further more, extra bits were cleared after each iteration.
+\par
+To track down issues and test the validity of the multipliers, a debug function was written. To allow Python to natively work with the operands, each value is calculated from its two's compliment format. The converted numbers are then multiplied, and the result is compared to both Booth's Algorithim and Modified Booth's Algorithim. To ensure that the debugging function itself doesn't malfunction, all converted operands and expected results are put into a single large table for checking. The exported version of this table can be seen in table X. % TODO
+\section*{Analysis}
+Modified Booth's algorithim only requires half the iterations as Booth's algorithim. As such, it can be expected that the benifit of modified Booth's algorithim increases two fold with bit length. This can be shown by comparing the two curves in figure X. 
+\par
+Despite this, the nature of both algorithims dictate that modified booth's algorithim is not explicitly faster. Iteration count translates to the \textit{maxiumum} number of additions and subtractions. Figure X shows the performance of the two algorithims given different input lengths, while table x shows the actual data made to generate the plot. There are some interesting things to note. When operands contain repeating zeros or ones, both operations preform similarly, as only shifting is required. Operands containing entirely ones or zeros result in idential preformance. On the contrary, alternating bits within operands demonstrate where the two algorithims differ, as almost no bits can be skipped over. Operands made entirely of alternating bits result in the maximum performance diffrence, in which modified booth's algorithim is potentially two times faster.
+\par
+All of this needs to be considered when designing an ALU. Modified booth's algorithim may improve speed, but requires substantially more hardware to impliment. One must consider if die space is to be allocated to optimize multiplication. In many applications, fast multiplication is unnessesary; many early single-chip processors and microcontrollers didn't impliment multiplication, as they were intended for simple embeded applications.
+\section*{Conclusion}
+Hardware multipliers can help accellerate applications in which multiplication is frequent. When implimenting hardware multipliers, it's important to consider the advantages and disadvantages of various multiplier schemes. Modified Booth's algorithim gives diminishing returns for smaller operands and requires significantly more logic. In applications that depend heavily on fast multiplication of large numbers, modified booth's algorithim is optimal.
+% mba generally but not always faster 
+% application should be considered
+% 
+
+\section*{Appendix}
+\definecolor{codegreen}{rgb}{0,0.6,0}
+\definecolor{codegray}{rgb}{0.5,0.5,0.5}
+\definecolor{codepurple}{rgb}{0.58,0,0.82}
+\definecolor{backcolour}{rgb}{0.95,0.95,0.92}
+
+\lstdefinestyle{mystyle}{
+    backgroundcolor=\color{backcolour},
+    commentstyle=\color{codegreen},
+    keywordstyle=\color{magenta},
+    numberstyle=\tiny\color{codegray},
+    stringstyle=\color{codepurple},
+    basicstyle=\ttfamily\footnotesize,
+    breakatwhitespace=false,
+    breaklines=true,
+    captionpos=b,
+    keepspaces=true,
+    numbers=left,
+    numbersep=5pt,
+    showspaces=false,
+    showstringspaces=false,
+    showtabs=false,
+    tabsize=2
+}
+
+\lstset{style=mystyle}
+\lstinputlisting[language=Python]{../booth_multiplier.py}
+% efficiency gets comparitively better over length
+% not much for the smaller operands
+% lots of repeated 1s and 0s very good for both algorithims
 \begin{center}
   \input{performance.pgf}\\
   \input{iterations.pgf}\\
@@ -85,7 +129,5 @@ Additionally, after each iteration, the bits more significant then the workspace
   \input{result_table.tex}\\
 \end{center}
     
-\section*{Analysis}
-\section*{Conclusion}
 \end{document}