\documentclass[12pt]{article}
\usepackage{geometry}
\usepackage{titling}
\usepackage{helvet}
\usepackage{tgpagella} % text only
\usepackage[utf8]{inputenc}
\usepackage{booktabs}
\usepackage{float}

\usepackage{pgf}

\newcommand{\mathdefault}[1]{#1} % pass math text in matplotlib's pgf output through unchanged


\geometry{
  a4paper,
  lmargin=1in,
  rmargin=1in,
  tmargin=1in,
  bmargin=1in,
}
\setlength{\droptitle}{-3em}   % pull the title block upward




\title{\fontfamily{phv}\selectfont
Analyzing Performance of Booth’s Algorithm and Modified Booth’s Algorithm}
\author{Brett Weiland}

\begin{document}
\maketitle
\begin{abstract}
In this paper, the performance of Booth's algorithm is compared to that of modified Booth's algorithm. Each multiplier is simulated in Python, and performance is measured by counting the number of add and subtract operations for various inputs. The results are analyzed and discussed to highlight the tradeoffs to consider when deciding how hardware multiplication is implemented.
\end{abstract}
\section*{Introduction}
Multiplication is among the most time-consuming mathematical operations for processors. In many applications, the time it takes to multiply dramatically influences the speed of the program. Digital signal processing applications (such as audio modification and image processing) require constant multiply and accumulate operations for functions such as fast Fourier transforms and convolutions. Other applications, such as machine learning, 3D graphics, and data analysis, depend heavily on multiplying large matrices. In such scenarios, the speed of multiplication is vital. Consequently, most modern processors implement hardware multiplication. However, not all multiplication circuits are equal; there is often a stark tradeoff between performance and hardware complexity. To further complicate things, multiplication circuits perform differently depending on which numbers are being multiplied.
\section*{Algorithm Description and Simulation}
Booth's algorithm computes the product of two signed numbers in two's complement format. To avoid overflow, the result is placed into a register twice the size of the operands (or two registers, each the size of a single operand). Additionally, the algorithm must work with a space that is extended one bit beyond the result. For brevity, the result register and the extra bit will be referred to as the workspace, since the algorithm uses this space for its computations. First, the multiplier is placed into the workspace and shifted left by one, leaving the extra bit clear. From there, an operation is performed based on the last two bits, as shown in the following table:
\begin{table}[H]
  \centering
\begin{tabular}{lll}
\toprule
Bit 1 & Bit 0 & Action   \\ 
\midrule
0     & 0     & None     \\
0     & 1     & Add      \\
1     & 0     & Subtract \\
1     & 1     & None     \\
\bottomrule
\end{tabular}
\end{table}
After each operation, the workspace is arithmetically shifted right once, and the process repeats for the number of bits in an operand. Once all iterations are complete, the extra bit is discarded, leaving the product in the result register.\\
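To make the procedure concrete, the sketch below shows one way these steps can be written in Python. The function \texttt{booth\_multiply} and its operation counter are illustrative rather than the simulator's exact code, but they follow the workspace layout described above.
\begin{verbatim}
def booth_multiply(multiplicand, multiplier, n):
    """Booth's algorithm for n-bit two's complement operands.

    The workspace is 2n+1 bits wide: the 2n-bit result plus the
    extra low-order bit.  Returns the product (as a 2n-bit two's
    complement pattern) and the add/subtract count.
    """
    mask = (1 << (2 * n + 1)) - 1              # workspace width
    m = multiplicand & ((1 << n) - 1)          # n-bit pattern of M
    work = (multiplier & ((1 << n) - 1)) << 1  # multiplier; extra bit 0
    ops = 0
    for _ in range(n):
        pair = work & 0b11                     # examine last two bits
        if pair == 0b01:                       # 01: add M to upper half
            work = (work + (m << (n + 1))) & mask
            ops += 1
        elif pair == 0b10:                     # 10: subtract M
            work = (work - (m << (n + 1))) & mask
            ops += 1
        msb = work & (1 << (2 * n))            # arithmetic shift right,
        work = (work >> 1) | msb               # replicating the sign bit
    return (work >> 1) & ((1 << (2 * n)) - 1), ops
\end{verbatim}
For example, \texttt{booth\_multiply(3, -2, 4)} returns \texttt{(250, 1)}, where 250 is the 8-bit two's complement pattern of $-6$.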

Modified Booth's algorithm functions similarly to Booth's algorithm, but it examines the last \textit{three} bits instead. As such, there is a larger selection of actions for each iteration:
\begin{table}[H]
  \centering
\begin{tabular}{llll}
\toprule
Bit 2 & Bit 1 & Bit 0 & Action \\
\midrule
0     & 0     & 0     & None   \\
0     & 0     & 1     & Add    \\
0     & 1     & 0     & Add    \\
0     & 1     & 1     & Add $\times 2$ \\
1     & 0     & 0     & Sub $\times 2$ \\
1     & 0     & 1     & Sub    \\
1     & 1     & 0     & Sub    \\
1     & 1     & 1     & None   \\
\bottomrule
\end{tabular}
\end{table}
Because some operations require multiplying the multiplicand by two, an extra bit is added to the most significant side of the workspace to avoid overflow. After each iteration, the workspace is arithmetically shifted right twice, and only half as many iterations are needed as there are bits in an operand. After all iterations, the workspace is shifted right once, and the second most significant bit is set to the most significant bit, since the result register does not include the extra bit.\\
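The per-iteration selection amounts to a recoding table that maps the last three workspace bits to a signed multiple of the multiplicand. The sketch below is a hypothetical helper in that spirit; the names \texttt{RECODE} and \texttt{booth2\_action} are illustrative, not the simulator's exact code.
\begin{verbatim}
# Radix-4 Booth recoding: the last three bits select a multiple
# of the multiplicand M (negative entries mean subtraction).
RECODE = {0b000:  0, 0b001: +1, 0b010: +1, 0b011: +2,
          0b100: -2, 0b101: -1, 0b110: -1, 0b111:  0}

def booth2_action(bits):
    """Return the signed multiple of M selected by the last
    three workspace bits."""
    return RECODE[bits & 0b111]
\end{verbatim}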
The purpose of this paper is to analyze and compare the performance of these two algorithms for various operand lengths and values. As such, all arithmetic and bitwise operations in the simulator had to account for the operand width. Take, for example, the arithmetic shift right.
The sketch below is a minimal Python version of such a width-aware shift; the name \texttt{asr} and its signature are illustrative rather than the simulator's exact interface:
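\begin{verbatim}
def asr(value, width, amount=1):
    """Arithmetic shift right within a fixed register width.

    The sign (most significant) bit is copied into each vacated
    position, as a hardware arithmetic shift would do.
    """
    for _ in range(amount):
        msb = value & (1 << (width - 1))  # preserve the sign bit
        value = (value >> 1) | msb
    return value
\end{verbatim}
Booth's algorithm shifts by one bit per iteration, while modified Booth's algorithm can use the same helper with \texttt{amount=2}.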
Additionally, after each iteration, the bits more significant than the workspace width had to be erased (the bitwise functions purposefully do not account for this).
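A mask of the workspace width performs that cleanup; the helper below is a sketch with an illustrative name:
\begin{verbatim}
def truncate(value, width):
    """Erase any bits above the workspace width, emulating the
    fixed size of a hardware register."""
    return value & ((1 << width) - 1)
\end{verbatim}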
\newpage
\section*{Results}
\begin{center}
  \input{performance.pgf}\\
  \input{iterations.pgf}\\
  \input{speed_table.tex}\\
  \input{result_table.tex}\\
\end{center}
    
\section*{Analysis}
\section*{Conclusion}
\end{document}