author | Brett Weiland <brett_weiland@gmail.com> | 2024-04-17 18:49:39 -0500 |
---|---|---|
committer | Brett Weiland <brett_weiland@gmail.com> | 2024-04-17 18:49:39 -0500 |
commit | 5c7cf02fd7433c28ad2503e624215c18e944246b (patch) | |
tree | 0d0f65ca40ed6aae30df72f1a9b64464c7cfd8b8 | |
parent | 1f37ade7b2f97d96855316248724d791eec28ad1 (diff) |
-rw-r--r-- | report/report.log | 4 |
-rw-r--r-- | report/report.pdf | bin | 232746 -> 232799 bytes |
-rw-r--r-- | report/report.tex | 29 |
3 files changed, 17 insertions, 16 deletions
diff --git a/report/report.log b/report/report.log
index 2484a5d..1edf67e 100644
--- a/report/report.log
+++ b/report/report.log
@@ -1,4 +1,4 @@
-This is pdfTeX, Version 3.141592653-2.6-1.40.26 (TeX Live 2024/Arch Linux) (preloaded format=pdflatex 2024.4.11) 12 APR 2024 16:13
+This is pdfTeX, Version 3.141592653-2.6-1.40.26 (TeX Live 2024/Arch Linux) (preloaded format=pdflatex 2024.4.11) 12 APR 2024 16:27
 entering extended mode
 restricted \write18 enabled.
 %&-line parsing enabled.
@@ -671,7 +671,7 @@ tt12.pfb></usr/share/texmf-dist/fonts/type1/public/tex-gyre/qplb.pfb></usr/shar
 e/texmf-dist/fonts/type1/public/tex-gyre/qplr.pfb></usr/share/texmf-dist/fonts/
 type1/public/tex-gyre/qplri.pfb></usr/share/texmf-dist/fonts/type1/urw/helvetic
 /uhvr8a.pfb>
-Output written on report.pdf (11 pages, 232746 bytes).
+Output written on report.pdf (11 pages, 232799 bytes).
 PDF statistics:
 90 PDF objects out of 1000 (max. 8388607)
 58 compressed objects within 1 object stream
diff --git a/report/report.pdf b/report/report.pdf
Binary files differ
index 9944465..c132610 100644
--- a/report/report.pdf
+++ b/report/report.pdf
diff --git a/report/report.tex b/report/report.tex
index 5699bab..27b8f8b 100644
--- a/report/report.tex
+++ b/report/report.tex
@@ -40,12 +40,12 @@ Analyzing Performance of Booth’s Algorithm and Modified Booth’s Algorithm}
 \begin{document}
 \maketitle
 \begin{abstract}
-In this paper, the performance of Booth’s algorithm is compared to modified Booth's algorithm. Each multiplier is simulated in Python. The multipliers are benchmarked by counting the number of add and subtract operations for inputs of various lengths. Results are analyzed and discussed to highlight the potential tradeoffs one should consider when deciding what multiplier is to be used.
+In this paper, the performance of Booth’s algorithm is compared to modified Booth's algorithm. Each multiplier is simulated in Python. The multipliers are bench marked by counting the number of add and subtract operations for inputs of various lengths. Results are analyzed and discussed to highlight the potential tradeoffs one should consider when deciding what multiplier is to be used.
 \end{abstract}
 \section*{Introduction}
 Multiplication is among the most time consuming mathematical operations for processors. In many applications, the time it takes to multiply dramatically influences the speed of the program. Applications of digital signal processing (such as audio modification and image processing) require constant multiply and accumulate operations for functions such as fast fourier transformations and convolutions. Other applications are heavily dependent on multiplying large matrices, such as machine learning, 3D graphics and data analysis. In such scenarios, the speed of multiplication is vital. Consequently, most modern processors implement hardware multiplication. However, not all hardware multiplication schemes are equal; there is often a stark contrast between performance and hardware complexity. To further complicate things, multiplication circuits perform differently depending on what numbers are being multiplied.
 \section*{Algorithm Description}
-Booth's algorithim computes the product of two signed numbers in two's compliment format. To avoid overflow, the result is placed into a register two times the size of the operands (or two registers the size of a single operand). Additionally, the algorithim must work with a space that is exended one bit more then the result. For the purpose of brevity, the result register and extra bit will be refered to as the workspace, as the algorithim uses this space for its computations. First, the multiplier is placed into the workspace and shifted left by 1. From there, the multiplicand is used to either add or subtract from the upper half of the workspace. The specific action is dependent on the last two bits of the workspace.
+Booth's algorithm computes the product of two signed numbers in two's compliment format. To avoid overflow, the result is placed into a register two times the size of the operands (or two registers the size of a single operand). Additionally, the algorithm must work with a space that is extended one bit more then the result. For the purpose of brevity, the result register and extra bit will be referred to as the workspace, as the algorithm uses this space for its computations. First, the multiplier is placed into the workspace and shifted left by 1. From there, the multiplicand is used to either add or subtract from the upper half of the workspace. The specific action is dependent on the last two bits of the workspace.
 \begin{table}[H]
 \centering
 \begin{tabular}{lll}
@@ -59,9 +59,9 @@ Bit 1 & Bit 0 & Action \\
 \bottomrule
 \end{tabular}
 \end{table}
-After all iterations are complete, the result is arithmaticlly shifted once to the right, and the process repeats for the number of bits in an operand.
+After all iterations are complete, the result is arithmetically shifted once to the right, and the process repeats for the number of bits in an operand.
 \par
-Modified booth's algorithim functions similar to Booth's algorithim, but checks the last \textit{three} bits instead. As such, there are a larger selection of actions for each iteration:
+Modified Booth's algorithm functions similar to Booth's algorithm, but checks the last \textit{three} bits instead. As such, there are a larger selection of actions for each iteration:
 \begin{table}[H]
 \centering
 \begin{tabular}{llll}
@@ -79,17 +79,17 @@ Bit 2 & Bit 1 & Bit 0 & Action \\
 \bottomrule
 \end{tabular}
 \end{table}
-Because some operations require doubling the multiplicand, an additional extra bit is added to the most significant side of the workspace to avoid overflow. After each iteration, the result is arithmaticlly shifted right twice. The number of iterations is only half of the length of the operands. After all iterations, the workspace is shifted right once, and the second most significant bit is set to the first most significant bit as the result register does not include the extra bit.
+Because some operations require doubling the multiplicand, an additional extra bit is added to the most significant side of the workspace to avoid overflow. After each iteration, the result is arithmetically shifted right twice. The number of iterations is only half of the length of the operands. After all iterations, the workspace is shifted right once, and the second most significant bit is set to the first most significant bit as the result register does not include the extra bit.
 \par
-\section*{Simulation Implimentation}
-Both algorithims were simulated in Python in attempts to utalize its high level nature for rapid development. The table for Booth's algorithim was preformed with a simple if-then, while a switch case was used in modified booth's algorithim. Simple integers were used to represent registers.
+\section*{Simulation Implementation}
+Both algorithms were simulated in Python in attempts to utilize its high level nature for rapid development. The table for Booth's algorithm was preformed with a simple if-then, while a switch case was used in modified Booth's algorithm. Simple integers were used to represent registers.
 \par
-One objective of this paper is to analyze and compare the peformance of these two algorithms for various operand lengths. As such, the length of operands had to be constantly accounted for. Aritmatic bitwise operations, including finding two's compliment, were all implimented using functions that took length as an input. Further more, extra bits were cleared after each iteration.
+One objective of this paper is to analyze and compare the performance of these two algorithms for various operand lengths. As such, the length of operands had to be constantly accounted for. Arithmetic bitwise operations, including finding two's compliment, were all implemented using functions that took length as an input. Further more, extra bits were cleared after each iteration.
 \par
-To track down issues and test the validity of the multipliers, a debug function was written. To allow Python to natively work with the operands, each value is calculated from its two's compliment format. The converted numbers are then multiplied, and the result is used to verify both Booth's Algorithim and Modified Booth's Algorithim. To ensure that the debugging function itself doesn't malfunction, all converted operands and expected results are put into a single large table for checking. The exported version of this table can be seen on the last page in table \ref{debug_table}. % TODO
+To track down issues and test the validity of the multipliers, a debug function was written. To allow Python to natively work with the operands, each value is calculated from its two's compliment format. The converted numbers are then multiplied, and the result is used to verify both Booth's Algorithm and Modified Booth's Algorithm. To ensure that the debugging function itself doesn't malfunction, all converted operands and expected results are put into a single large table for checking. The exported version of this table can be seen on the last page in table \ref{debug_table}. % TODO
-The pseudo code below illustrates how each algorithim was implimented in software. For the full code, refer to the listing at the end of the document.\\
+The pseudo code below illustrates how each algorithm was implemented in software. For the full code, refer to the listing at the end of the document.\\
 \begin{verbatim}
 Booth:
 result = multiplier << 1
@@ -123,7 +123,7 @@ Modified booth:
 \end{verbatim}
 \section*{Analysis}
-Modified Booth's algorithim only requires half the iterations of Booth's algorithim. As such, it can be expected that the benifit of modified Booth's algorithim increases two fold with bit length. This can be shown by comparing the two curves in figure \ref{igraph}.
+Modified Booth's algorithm only requires half the iterations of Booth's algorithm. As such, it can be expected that the benefit of modified Booth's algorithm increases two fold with bit length. This can be shown by comparing the two curves in figure \ref{igraph}.
 \begin{figure}[h]
 \centering
 \input{iterations.pgf}\\
@@ -132,7 +132,7 @@ Modified Booth's algorithim only requires half the iterations of Booth's algorit
 \end{figure}
 \par
-Despite this, the nature of both algorithims dictate that modified booth's algorithim is not explicitly faster. Iteration count translates to the \textit{maxiumum} number of additions and subtractions. Figure \ref{pgraph} shows the performance of the two algorithims given different input lengths, while table \ref{speed_table} shows the actual data used to generate the plot. There are some interesting things to note. When operands contain repeating zeros or ones, both operations preform similarly, as only shifting is required. Operands containing entirely ones or zeros result in idential preformance. On the contrary, alternating bits within operands demonstrate where the two algorithims differ, as almost no bits can be skipped over. Operands made entirely of alternating bits result in the maximum performance diffrence, in which modified booth's algorithim is up to two times faster.
+Despite this, the nature of both algorithms dictate that modified Booth's algorithm is not explicitly faster. Iteration count translates to the \textit{maximum} number of additions and subtractions. Figure \ref{pgraph} shows the performance of the two algorithms given different input lengths, while table \ref{speed_table} shows the actual data used to generate the plot. There are some interesting things to note. When operands contain repeating zeros or ones, both operations preform similarly, as only shifting is required. Operands containing entirely ones or zeros result in identical performance. On the contrary, alternating bits within operands demonstrate where the two algorithms differ, as almost no bits can be skipped over. Operands made entirely of alternating bits result in the maximum performance difference, in which modified Booth's algorithm is up to two times faster.
 \begin{figure}[H]
 \centering
 \input{performance.pgf}\\
@@ -140,9 +140,9 @@ Despite this, the nature of both algorithims dictate that modified booth's algor
 \label{pgraph}
 \end{figure}
 \par
-All of this needs to be considered when deciding between the two algorithims. Modified booth's algorithim may improve speed, but requires substantially more hardware to impliment. One must consider if it's worth the cost to optimize multiplication. In many applications, fast multiplication is unnessesary; many early single-chip processors and microcontrollers didn't impliment multiplication, as they were intended for simple embeded applications.
+All of this needs to be considered when deciding between the two algorithms. Modified Booth's algorithm may improve speed, but requires substantially more hardware to implement. One must consider if it's worth the cost to optimize multiplication. In many applications, fast multiplication is unnecessary; many early single-chip processors and microcontrollers didn't implement multiplication, as they were intended for simple embedded applications.
 \section*{Conclusion}
-Hardware multipliers can help accellerate applications in which multiplication is frequent. When implimenting hardware multipliers, it's important to consider the advantages and disadvantages of various multiplier schemes. Modified Booth's algorithim gives diminishing returns for smaller operands and requires significantly more logic. In applications that depend heavily on fast multiplication of large numbers, modified booth's algorithim is optimal.
+Hardware multipliers can help accelerate applications in which multiplication is frequent. When implementing hardware multipliers, it's important to consider the advantages and disadvantages of various multiplier schemes. Modified Booth's algorithm gives diminishing returns for smaller operands and requires significantly more logic. In applications that depend heavily on fast multiplication of large numbers, modified Booth's algorithm is optimal.
 % mba generally but not always faster
 % application should be considered
 %
@@ -198,3 +198,4 @@ Hardware multipliers can help accellerate applications in which multiplication i
 \end{document}
+
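
For readers of this commit, the Booth procedure described in the report text (a workspace of twice the operand width plus one extra bit, an add or subtract on the upper half chosen by the last two bits, then an arithmetic right shift) can be sketched in Python roughly as follows. This is not the report's own simulation code, which is not part of this diff. The names `to_signed` and `booth_multiply` are hypothetical, and the action rule (01 adds, 10 subtracts, 00 and 11 only shift) is the standard textbook Booth recoding, assumed here because the report's action table is elided from the hunks.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of `value` as a two's complement integer."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value


def booth_multiply(multiplicand, multiplier, bits):
    """Multiply two `bits`-wide signed operands with radix-2 Booth recoding.

    Returns (product, ops), where ops counts the additions and subtractions
    performed, the metric the report uses as its benchmark.
    Assumption: textbook action table (01 -> add, 10 -> subtract).
    Not handled: the most negative multiplicand, which can overflow the
    upper half because that half is only `bits` wide in this sketch.
    """
    operand_mask = (1 << bits) - 1
    width = 2 * bits + 1                      # result register plus the extra bit
    mask = (1 << width) - 1
    sign_bit = 1 << (width - 1)

    # Align the multiplicand with the upper half of the workspace.
    m = (multiplicand & operand_mask) << (bits + 1)
    # Place the multiplier in the workspace, shifted left by one.
    workspace = (multiplier & operand_mask) << 1

    ops = 0
    for _ in range(bits):
        pair = workspace & 0b11               # last two bits select the action
        if pair == 0b01:
            workspace = (workspace + m) & mask
            ops += 1
        elif pair == 0b10:
            workspace = (workspace - m) & mask
            ops += 1
        # Arithmetic shift right by one: keep the sign bit in place.
        workspace = (workspace >> 1) | (workspace & sign_bit)

    # Drop the extra bit and read the 2*bits-wide result as signed.
    return to_signed(workspace >> 1, 2 * bits), ops


if __name__ == "__main__":
    print(booth_multiply(3, -4, 4))
    print(booth_multiply(3, 5, 4))
```

In this sketch a multiplier with long runs of identical bits (such as `-4`, `1100` in four bits) needs a single subtraction, while an alternating pattern (such as `5`, `0101`) needs one operation per bit, which matches the behaviour the Analysis section of the report describes.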