report/report.tex


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

\documentclass[12pt]{article}
\usepackage{geometry}
\usepackage{titling}
\usepackage{helvet}
\usepackage{tgpagella} % text only
\usepackage[utf8]{inputenc}
\usepackage{booktabs}
\usepackage{float}
\usepackage{listings}
\usepackage{xcolor}
\usepackage{caption}
\usepackage{subcaption}
\usepackage{layouts}
\usepackage{pdflscape}
\usepackage{rotating}
\usepackage{longtable}

\usepackage{pgf}

\newcommand{\mathdefault}[1][]{}

\geometry{
  a4paper,
  lmargin=1in,
  rmargin=1in,
  tmargin=1in,
  bmargin=1in,
}
\setlength{\droptitle}{-3em}   % This is your set screw

\definecolor{codegreen}{rgb}{0,0.6,0}
\definecolor{codegray}{rgb}{0.5,0.5,0.5}
\definecolor{codepurple}{rgb}{0.58,0,0.82}
\definecolor{backcolour}{rgb}{0.95,0.95,0.92}

\title{\fontfamily{phv}\selectfont
Analyzing Performance of Booth’s Algorithm and Modified Booth’s Algorithm}
\author{Brett Weiland}

\begin{document}
\maketitle
\begin{abstract}
In this paper, the performance of Booth’s algorithm is compared to modified Booth's algorithm. Each multiplier is simulated in Python. The multipliers are benchmarked by counting the number of add and subtract operations for inputs of various lengths. Results are analyzed and discussed to highlight the potential tradeoffs one should consider when deciding what multiplier is to be used.
\end{abstract}
\section*{Introduction}
Multiplication is among the most time consuming mathematical operations for processors. In many applications, the time it takes to multiply dramatically influences the speed of the program. Applications of digital signal processing (such as audio modification and image processing) require constant multiply and accumulate operations for functions such as fast fourier transformations and convolutions. Other applications are heavily dependent on multiplying large matrices, such as machine learning, 3D graphics and data analysis. In such scenarios, the speed of multiplication is vital. Consequently, most modern processors implement hardware multiplication. However, not all hardware multiplication schemes are equal; there is often a stark contrast between performance and hardware complexity. To further complicate things, multiplication circuits perform differently depending on what numbers are being multiplied.
\section*{Algorithm Description}
Booth's algorithim computes the product of two signed numbers in two's compliment format. To avoid overflow, the result is placed into a register two times the size of the operands (or two registers the size of a single operand). Additionally, the algorithim must work with a space that is exended one bit more then the result. For the purpose of brevity, the result register and extra bit will be refered to as the workspace, as the algorithim uses this space for its computations. First, the multiplier is placed into the workspace and shifted left by 1. From there, the multiplicand is used to either add or subtract from the upper half of the workspace. The specific action is dependent on the last two bits of the workspace.
\begin{table}[H]
  \centering
\begin{tabular}{lll}
\toprule
Bit 1 & Bit 0 & Action   \\ 
\midrule
0     & 0     & None     \\
0     & 1     & Add      \\
1     & 0     & Subtract \\
1     & 1     & None     \\
\bottomrule
\end{tabular}
\end{table}
After all iterations are complete, the result is arithmaticlly shifted once to the right, and the process repeats for the number of bits in an operand. 
\par
Modified booth's algorithim functions similar to Booth's algorithim, but checks the last \textit{three} bits instead. As such, there are a larger selection of actions for each iteration:
\begin{table}[H]
  \centering
\begin{tabular}{llll}
\toprule
Bit 2 & Bit 1 & Bit 0 & Action \\
\midrule
0     & 0     & 0     & None   \\
0     & 0     & 1     & Add    \\
0     & 1     & 0     & Add    \\
0     & 1     & 1     & Add $\times 2$ \\
1     & 0     & 0     & Sub $\times 2$ \\
1     & 0     & 1     & Sub    \\
1     & 1     & 0     & Sub    \\
1     & 1     & 1     & None   \\
\bottomrule
\end{tabular}
\end{table}
Because some operations require doubling the multiplicand, an additional extra bit is added to the most significant side of the workspace to avoid overflow. After each iteration, the result is arithmaticlly shifted right twice. The number of iterations is only half of the length of the operands. After all iterations, the workspace is shifted right once, and the second most significant bit is set to the first most significant bit as the result register does not include the extra bit.

\par
\section*{Simulation Implimentation}
Both algorithims were simulated in Python in attempts to utalize its high level nature for rapid development. The table for Booth's algorithim was preformed with a simple if-then, while a switch case was used in modified booth's algorithim. Simple integers were used to represent registers. 
\par
One objective of this paper is to analyze and compare the peformance of these two algorithms for various operand lengths. As such, the length of operands had to be constantly accounted for. Aritmatic bitwise operations, including finding two's compliment, were all implimented using functions that took length as an input. Further more, extra bits were cleared after each iteration.
\par
To track down issues and test the validity of the multipliers, a debug function was written. To allow Python to natively work with the operands, each value is calculated from its two's compliment format. The converted numbers are then multiplied, and the result is used to verify both Booth's Algorithim and Modified Booth's Algorithim. To ensure that the debugging function itself doesn't malfunction, all converted operands and expected results are put into a single large table for checking. The exported version of this table can be seen on the last page  in table \ref{debug_table}. % TODO

The pseudo code below illustrates how each algorithim was implimented in software. For the full code, refer to the listing at the end of the document.\\
\begin{verbatim}
Booth:
  result = multiplier << 1
  loop (operand length) times:
    if last two bits are 01:
      result(upper half) += multiplicand
    if last two bits are 10:
      result(upper half) += twos_comp(multiplicand)
    remove extra bits from result
    arithmatic shift result right
result >> 1
\end{verbatim}
\begin{verbatim}
Modified booth:
  multiplicand(MSB) = multiplicand(second MSB)
  result = multiplier << 1
  loop (operand length / 2) times:
    if last two bits are 001 or 010:
      result(upper half) += multiplicand
    if last two bits are 011:
      result(upper half) += multiplicand * 2
    if last two bits are 100:
      result(upper half) += twos_comp(multiplicand) * 2
    if last two bits are 101 or 110:
      result(upper half) += twos_comp(multiplicand)
    remove extra bits from result
    arithmatic shift result right twice
  result >> 1
  result(second MSB) = result(MSB)
  result(MSB) = 0
\end{verbatim}

\section*{Analysis}
Modified Booth's algorithim only requires half the iterations of Booth's algorithim. As such, it can be expected that the benifit of modified Booth's algorithim increases two fold with bit length. This can be shown by comparing the two curves in figure \ref{igraph}. 
\begin{figure}[h]
  \centering
  \input{iterations.pgf}\\
    \captionof{figure}{Iteration count of various operand lengths.}
    \label{igraph} 
\end{figure}
  
\par
Despite this, the nature of both algorithims dictate that modified booth's algorithim is not explicitly faster. Iteration count translates to the \textit{maxiumum} number of additions and subtractions. Figure \ref{pgraph} shows the performance of the two algorithims given different input lengths, while table \ref{speed_table} shows the actual data used to generate the plot. There are some interesting things to note. When operands contain repeating zeros or ones, both operations preform similarly, as only shifting is required. Operands containing entirely ones or zeros result in idential preformance. On the contrary, alternating bits within operands demonstrate where the two algorithims differ, as almost no bits can be skipped over. Operands made entirely of alternating bits result in the maximum performance diffrence, in which modified booth's algorithim is up to two times faster.
\begin{figure}[H]
  \centering
    \input{performance.pgf}\\
    \captionof{figure}{Add and Subtract operations of various operand lengths.}
    \label{pgraph} 
\end{figure}
\par
All of this needs to be considered when deciding between the two algorithims. Modified booth's algorithim may improve speed, but requires substantially more hardware to impliment. One must consider if it's worth the cost to optimize multiplication. In many applications, fast multiplication is unnessesary; many early single-chip processors and microcontrollers didn't impliment multiplication, as they were intended for simple embeded applications.
\section*{Conclusion}
Hardware multipliers can help accellerate applications in which multiplication is frequent. When implimenting hardware multipliers, it's important to consider the advantages and disadvantages of various multiplier schemes. Modified Booth's algorithim gives diminishing returns for smaller operands and requires significantly more logic. In applications that depend heavily on fast multiplication of large numbers, modified booth's algorithim is optimal.
% mba generally but not always faster 
% application should be considered
% 

\newpage
\section*{Appendix}

% efficiency gets comparitively better over length
% not much for the smaller operands
% lots of repeated 1s and 0s very good for both algorithims
\begin{figure}[h]
  \centering
  \input{speed_table.tex}
  \captionof{table}{Number of additions and subtractions for various inputs.}
  \label{speed_table}
\end{figure}
\begin{figure}[H]
  \input{result_table.tex}
  \captionof{table}{Results of multiplication according to simulated multipliers.}
  \label{result_table}
\end{figure}

\lstdefinestyle{mystyle}{
    backgroundcolor=\color{backcolour},
    commentstyle=\color{codegreen},
    keywordstyle=\color{magenta},
    numberstyle=\tiny\color{codegray},
    stringstyle=\color{codepurple},
    basicstyle=\ttfamily\footnotesize,
    breakatwhitespace=false,
    breaklines=true,
    captionpos=b,
    keepspaces=true,
    numbers=left,
    numbersep=5pt,
    showspaces=false,
    showstringspaces=false,
    showtabs=false,
    tabsize=2
}


\lstset{style=mystyle}
\newpage
\subsection*{Code listing}
\lstinputlisting[language=Python]{../booth_multiplier.py}
\begin{sidewaysfigure}
  \captionof{table}{Simulator self checking}
  \label{debug_table}
\input{debug_table.tex}
\end{sidewaysfigure}


\end{document}