\documentclass[12pt]{article}
\usepackage{geometry}
\usepackage{titling}
\usepackage{helvet}
\usepackage{tgpagella} % text only
\usepackage[utf8]{inputenc}
\usepackage{booktabs}
\usepackage{float}
\usepackage{listings}
\usepackage{xcolor}
\usepackage{caption}
\usepackage{subcaption}
\usepackage{layouts}
\usepackage{pdflscape}
\usepackage{rotating}
\usepackage{longtable}

\usepackage{pgf}

\newcommand{\mathdefault}[1][]{}

\geometry{
  a4paper,
  lmargin=1in,
  rmargin=1in,
  tmargin=1in,
  bmargin=1in,
}
\setlength{\droptitle}{-3em}   % This is your set screw

\definecolor{codegreen}{rgb}{0,0.6,0}
\definecolor{codegray}{rgb}{0.5,0.5,0.5}
\definecolor{codepurple}{rgb}{0.58,0,0.82}
\definecolor{backcolour}{rgb}{0.95,0.95,0.92}

\title{\fontfamily{phv}\selectfont
Analyzing Performance of Booth’s Algorithm and Modified Booth’s Algorithm}
\author{Brett Weiland}

\begin{document}
\maketitle
\begin{abstract}
In this paper, the performance of Booth's algorithm is compared to that of modified Booth's algorithm. Each multiplier is simulated in Python, and the multipliers are benchmarked by counting the number of add and subtract operations for inputs of various lengths. Results are analyzed and discussed to highlight the tradeoffs one should consider when deciding which multiplier to use.
\end{abstract}
\section*{Introduction}
Multiplication is among the most time-consuming arithmetic operations for processors. In many applications, the time it takes to multiply dramatically influences the speed of a program. Digital signal processing applications (such as audio modification and image processing) require constant multiply-accumulate operations for functions such as fast Fourier transforms and convolutions. Other applications, such as machine learning, 3D graphics, and data analysis, depend heavily on multiplying large matrices. In such scenarios, the speed of multiplication is vital. Consequently, most modern processors implement hardware multiplication. However, not all hardware multiplication schemes are equal; there is often a stark contrast between performance and hardware complexity. To further complicate things, multiplication circuits perform differently depending on which numbers are being multiplied.
\section*{Algorithm Description}
Booth's algorithm computes the product of two signed numbers in two's complement format. To avoid overflow, the result is placed into a register twice the size of the operands (or two registers the size of a single operand). Additionally, the algorithm must work with a space that is extended one bit beyond the result. For brevity, the result register and extra bit will be referred to as the workspace, as the algorithm uses this space for its computations. First, the multiplier is placed into the workspace and shifted left by 1. From there, the multiplicand is used to either add to or subtract from the upper half of the workspace. The specific action depends on the last two bits of the workspace:
\begin{table}[H]
  \centering
\begin{tabular}{lll}
\toprule
Bit 1 & Bit 0 & Action   \\ 
\midrule
0     & 0     & None     \\
0     & 1     & Add      \\
1     & 0     & Subtract \\
1     & 1     & None     \\
\bottomrule
\end{tabular}
\end{table}
After each action, the workspace is arithmetically shifted once to the right, and the process repeats once for each bit in an operand. When all iterations are complete, the workspace is shifted right one final time to discard the extra bit.
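The procedure above can be sketched in Python. This is an illustrative model, not the report's actual code: the function name is invented, plain integers stand in for registers, and the workspace keeps one additional sign bit at the top so that subtracting the most negative multiplicand cannot overflow the addend field.

```python
def booth_multiply(multiplicand, multiplier, n):
    """Register-level sketch of Booth's algorithm for signed n-bit operands.

    The workspace holds the 2n-bit result, the extra low bit, and one
    additional sign bit at the top, all modeled as one Python integer.
    """
    width = 2 * n + 2
    mask = (1 << width) - 1
    field = (1 << (n + 1)) - 1                    # (n+1)-bit addend field
    work = (multiplier & ((1 << n) - 1)) << 1     # multiplier shifted left by 1
    for _ in range(n):
        last_two = work & 0b11
        if last_two == 0b01:                      # 01: add multiplicand
            work += (multiplicand & field) << (n + 1)
        elif last_two == 0b10:                    # 10: add its two's complement
            work += (-multiplicand & field) << (n + 1)
        work &= mask                              # drop carries past the workspace
        sign = work >> (width - 1)                # arithmetic shift right by one
        work = (work >> 1) | (sign << (width - 1))
    product = (work >> 1) & ((1 << (2 * n)) - 1)  # drop the extra low bit
    if product >> (2 * n - 1):                    # reinterpret as signed
        product -= 1 << (2 * n)
    return product
```

For example, \verb|booth_multiply(3, -4, 4)| walks the 4-bit operands through four iterations and returns $-12$.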
\par
Modified Booth's algorithm functions similarly to Booth's algorithm, but examines the last \textit{three} bits instead. As such, there is a larger selection of actions for each iteration:
\begin{table}[H]
  \centering
\begin{tabular}{llll}
\toprule
Bit 2 & Bit 1 & Bit 0 & Action \\
\midrule
0     & 0     & 0     & None   \\
0     & 0     & 1     & Add    \\
0     & 1     & 0     & Add    \\
0     & 1     & 1     & Add $\times 2$ \\
1     & 0     & 0     & Sub $\times 2$ \\
1     & 0     & 1     & Sub    \\
1     & 1     & 0     & Sub    \\
1     & 1     & 1     & None   \\
\bottomrule
\end{tabular}
\end{table}
Because some actions require doubling the multiplicand, an additional extra bit is added to the most significant side of the workspace to avoid overflow. After each iteration, the workspace is arithmetically shifted right twice, so only half as many iterations are needed as there are bits in an operand. After all iterations, the workspace is shifted right once, and the second most significant bit is set equal to the most significant bit, as the result register does not include the extra bit.
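The recoding table can also be exercised at a higher level. The sketch below (illustrative only, not the register-level simulator described later) applies the three-bit windows directly with Python integers, mapping each window to a signed digit in $\{-2,-1,0,1,2\}$ and summing the digit-weighted multiplicand:

```python
# Radix-4 (modified Booth) recoding table: each overlapping 3-bit window
# of the multiplier selects a signed digit.
RECODE = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
          0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}

def modified_booth_multiply(multiplicand, multiplier, n):
    """Multiply two signed n-bit integers (n even) via radix-4 recoding."""
    y = (multiplier & ((1 << n) - 1)) << 1    # append the implicit 0 bit
    product = 0
    for i in range(n // 2):                   # only n/2 iterations are needed
        digit = RECODE[(y >> (2 * i)) & 0b111]
        product += (digit * multiplicand) << (2 * i)
    return product
```

Each pair of multiplier bits contributes one partial product, which is why the iteration count is halved relative to the radix-2 algorithm.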

\par
\section*{Simulation Implementation}
Both algorithms were simulated in Python to take advantage of its high-level nature for rapid development. The action table for Booth's algorithm was implemented with a simple if-then, while a switch-case construct was used for modified Booth's algorithm. Plain integers were used to represent registers.
\par
One objective of this paper is to analyze and compare the performance of these two algorithms for various operand lengths. As such, the length of the operands had to be accounted for throughout. Arithmetic and bitwise operations, including finding the two's complement, were implemented as functions that take the operand length as an input. Furthermore, extra bits were cleared after each iteration.
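For illustration, such length-aware helpers might look like the following (the function names are hypothetical, not necessarily those in the actual code):

```python
def twos_comp(value, length):
    """Two's complement of value within a fixed bit length."""
    mask = (1 << length) - 1
    return (~value + 1) & mask

def to_signed(pattern, length):
    """Interpret a length-bit pattern as a signed Python integer."""
    if pattern & (1 << (length - 1)):
        return pattern - (1 << length)
    return pattern
```

For instance, \verb|twos_comp(3, 4)| yields the 4-bit pattern \verb|0b1101|, and \verb|to_signed| maps that pattern back to $-3$.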
\par
To track down issues and test the validity of the multipliers, a debug function was written. To allow Python to work natively with the operands, each value is converted from its two's complement format. The converted numbers are then multiplied, and the result is used to verify both Booth's algorithm and modified Booth's algorithm. To ensure that the debugging function itself does not malfunction, all converted operands and expected results are put into a single large table for checking. The exported version of this table can be seen on the last page in Table \ref{debug_table}.

The pseudocode below illustrates how each algorithm was implemented in software. For the full code, refer to the listing at the end of the document.\\
\begin{verbatim}
Booth:
  result = multiplier << 1
  loop (operand length) times:
    if last two bits are 01:
      result(upper half) += multiplicand
    if last two bits are 10:
      result(upper half) += twos_comp(multiplicand)
    remove extra bits from result
    arithmetic shift result right
  result = result >> 1
\end{verbatim}
\begin{verbatim}
Modified Booth:
  multiplicand(MSB) = multiplicand(second MSB)
  result = multiplier << 1
  loop (operand length / 2) times:
    if last three bits are 001 or 010:
      result(upper half) += multiplicand
    if last three bits are 011:
      result(upper half) += multiplicand * 2
    if last three bits are 100:
      result(upper half) += twos_comp(multiplicand) * 2
    if last three bits are 101 or 110:
      result(upper half) += twos_comp(multiplicand)
    remove extra bits from result
    arithmetic shift result right twice
  result = result >> 1
  result(second MSB) = result(MSB)
  result(MSB) = 0
\end{verbatim}

\section*{Analysis}
Modified Booth's algorithm requires only half the iterations of Booth's algorithm, so its savings in iterations grows with operand length. This can be seen by comparing the two curves in Figure \ref{igraph}.
\begin{figure}[h]
  \centering
  \input{iterations.pgf}\\
    \captionof{figure}{Iteration count of various operand lengths.}
    \label{igraph} 
\end{figure}
  
\par
Despite this, the nature of both algorithms dictates that modified Booth's algorithm is not always faster. Iteration count only bounds the \textit{maximum} number of additions and subtractions. Figure \ref{pgraph} shows the performance of the two algorithms for different input lengths, while Table \ref{speed_table} shows the data used to generate the plot. There are some interesting things to note. When operands contain long runs of zeros or ones, both algorithms perform similarly, as only shifting is required; operands consisting entirely of ones or of zeros yield identical performance. In contrast, alternating bits within the operands show where the two algorithms differ, as almost no bits can be skipped over. Operands made entirely of alternating bits produce the maximum performance difference, with modified Booth's algorithm up to two times faster.
\begin{figure}[H]
  \centering
    \input{performance.pgf}\\
    \captionof{figure}{Add and Subtract operations of various operand lengths.}
    \label{pgraph} 
\end{figure}
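The run-length effect can be reproduced with a small standalone sketch (illustrative, not the benchmark code itself) that counts how many two-bit windows of a multiplier trigger an add or subtract in Booth's algorithm:

```python
def booth_actions(multiplier, n):
    """Count the add/subtract steps Booth's algorithm performs for an
    n-bit multiplier pattern; runs of identical bits are only shifted."""
    y = (multiplier & ((1 << n) - 1)) << 1    # append the implicit 0 bit
    return sum((y >> i) & 0b11 in (0b01, 0b10) for i in range(n))
```

Alternating bits (\verb|0b01010101|) trigger an action in every one of the eight windows, while a single run boundary (\verb|0b11110000|) triggers only one.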
\par
All of this needs to be considered when deciding between the two algorithms. Modified Booth's algorithm may improve speed, but it requires substantially more hardware to implement. One must consider whether optimizing multiplication is worth the cost. In many applications, fast multiplication is unnecessary; many early single-chip processors and microcontrollers did not implement multiplication at all, as they were intended for simple embedded applications.
\section*{Conclusion}
Hardware multipliers can accelerate applications in which multiplication is frequent. When implementing one, it is important to weigh the advantages and disadvantages of the various multiplier schemes. Modified Booth's algorithm gives diminishing returns for smaller operands and requires significantly more logic, but in applications that depend heavily on fast multiplication of large numbers, it is the better choice.
% mba generally but not always faster 
% application should be considered
% 

\newpage
\section*{Appendix}

% efficiency gets comparatively better with length
% not much for the smaller operands
% lots of repeated 1s and 0s very good for both algorithms
\begin{figure}[h]
  \centering
  \input{speed_table.tex}
  \captionof{table}{Number of additions and subtractions for various inputs.}
  \label{speed_table}
\end{figure}
\begin{figure}[H]
  \input{result_table.tex}
  \captionof{table}{Results of multiplication according to simulated multipliers.}
  \label{result_table}
\end{figure}

\lstdefinestyle{mystyle}{
    backgroundcolor=\color{backcolour},
    commentstyle=\color{codegreen},
    keywordstyle=\color{magenta},
    numberstyle=\tiny\color{codegray},
    stringstyle=\color{codepurple},
    basicstyle=\ttfamily\footnotesize,
    breakatwhitespace=false,
    breaklines=true,
    captionpos=b,
    keepspaces=true,
    numbers=left,
    numbersep=5pt,
    showspaces=false,
    showstringspaces=false,
    showtabs=false,
    tabsize=2
}


\lstset{style=mystyle}
\newpage
\subsection*{Code listing}
\lstinputlisting[language=Python]{../booth_multiplier.py}
\begin{sidewaysfigure}
  \captionof{table}{Simulator self checking}
  \label{debug_table}
\input{debug_table.tex}
\end{sidewaysfigure}



\end{document}