by Raymond Filiatreault
Copyright 2005
Latest revision April 2005


The aad instruction is used to adjust the content of the AX register before that register is used to divide two unpacked BCD digits by another unpacked BCD digit. The CPU uses the following logic:

   al = ah*10 + al
   ah = 0

The BCD digits must all be in their binary format (values 0 to 9) for this instruction. ASCII characters would yield erroneous results.
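
The logic above can be modelled in a few lines of Python. This is only an illustrative sketch: the function name aad and the tuple it returns are assumptions made for this example, and the flags are not modelled.

```python
def aad(ah, al):
    """Model of the aad logic: AL = AH*10 + AL (kept to 8 bits), AH = 0."""
    al = (ah * 10 + al) & 0xFF   # AL can only hold 0..255
    ah = 0
    return ah, al

# Unpacked BCD digits 4 and 7 become the binary value 47.
print(aad(4, 7))          # (0, 47)

# The ASCII characters "4" (34h) and "7" (37h) give a meaningless value.
print(aad(0x34, 0x37))    # (0, 63)
```

The second call shows why ASCII digits must first be converted to their binary values: 52*10 + 55 = 575, which does not even fit in AL.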

The original intent of this instruction was to be able to divide by a single unpacked BCD digit, i.e. values of 9 or lower. Because the remainder of a division of the AX register by a byte is left in the AH register and is always lower than the divisor (8 at most), the maximum value stored in the AL register by the aad instruction before the next division could only be 8*10+9 (=89).

The following example divides only a small number, but the same code could handle a dividend of any length.

   num       db   7,6,4,3,5,8  ;853467 in reverse order
   numsize   dd   6
   divisor   db   7
   answer    db   8 dup(0) ;answer to be stored in string order for display

   mov   edi,offset answer
   mov   esi,offset num
   mov   ecx,numsize
   add   esi,ecx           ;points to byte following most significant digit
   mov   ah,0              ;initialize "remainder"
@@:
   dec   esi
   mov   al,[esi]
   aad                     ;AL = AH*10 + AL, AH = 0
   div   divisor           ;quotient -> AL, remainder -> AH
   add   al,30h            ;convert result to ASCII
   stosb                   ;store answer in string order
   dec   ecx
   jnz   @B
   mov   [edi],cl          ;terminate string with 0
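
The same digit-by-digit division can be followed in a short Python model. It is a sketch only: the function name divide_by_digit is hypothetical, and the registers are represented by ordinary variables named after them.

```python
def divide_by_digit(digits_lsd_first, divisor):
    """Divide an unpacked BCD number (stored least significant digit
    first) by a small divisor, one digit at a time."""
    ah = 0                                # remainder carried to the next digit
    out = []
    for al in reversed(digits_lsd_first): # start with most significant digit
        al = (ah * 10 + al) & 0xFF        # aad
        ah = 0
        al, ah = divmod(al, divisor)      # div: quotient -> AL, remainder -> AH
        out.append(chr(al + 0x30))        # convert result to ASCII
    return "".join(out), ah

quotient, remainder = divide_by_digit([7, 6, 4, 3, 5, 8], 7)
print(quotient, remainder)                # 121923 6
```

The quotient string comes out in display order (most significant digit first), as in the assembly listing, and the final remainder is what would be left in AH after the loop.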

As can be observed from the CPU logic, any value in AH higher than 25, when multiplied by 10, would exceed the 255 limit of the AL register. The otherwise correct product in the AX register would then become erroneous when the AH register gets zeroed. Even a value of 25 in the AH register combined with a value exceeding 5 in the AL register would result in an erroneous value in the AX register. However, this also means that integer values up to 25 inclusive could be used as the divisor (24 being the maximum remainder) and still be able to use the aad instruction.
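
These limits can be checked with a small Python model of the instruction. The helper names aad and aad_is_safe are assumptions made for this sketch; it simply tries every remainder/digit pair that a given divisor can produce.

```python
def aad(ah, al):
    """Model of aad: AL = AH*10 + AL truncated to 8 bits, AH = 0."""
    return 0, (ah * 10 + al) & 0xFF

def aad_is_safe(divisor):
    """True if no remainder (0..divisor-1) combined with a BCD digit
    (0..9) overflows the AL register."""
    return all(aad(r, d)[1] == r * 10 + d
               for r in range(divisor) for d in range(10))

print(aad(25, 5))    # (0, 255) still fits in AL
print(aad(25, 6))    # (0, 0)   256 wraps around to 0
print(max(n for n in range(1, 256) if aad_is_safe(n)))   # 25
```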

The following example takes advantage of that flexibility to compute the value of the Napierian constant with up to 25 significant digits. That constant is defined as:

      e = sum(1/n!) = 1/0! + 1/1! + 1/2! + 1/3! + ...

By convention, 0! is equal to 1. The answer buffer is initialized with the sum of the first 2 members of the equation (1+1) and the decimal delimiter to save some coding. The txtbuf buffer holds the computed 1/n! and is also initialized, along with the divisor, for the same reason. The result of computing the next 1/n! overwrites the previous 1/n! in the same txtbuf. Each computed 1/n! is then added to the answer buffer.

   answer      db    "2.",30 dup(0)
   txtbuf      db    1,31 dup(0)    ;initialize to 1/1!
   divisor     db    2              ;initialize to next n


; Divide the current 1/n! by the next n

start:
   mov   edi,offset txtbuf
   mov   ecx,30
   xor   eax,eax                ;zeroes AH
@@:
   mov   al,[edi]               ;get BCD digit
   aad                          ;convert digits in AH/AL to an integer in AX
   div   divisor                ;divide by a byte quotient->AL remainder->AH
   stosb                        ;store quotient and advance to next digit
   dec   ecx
   jnz   @B

; Add that result to the answer

   mov   edi,offset answer+30   ;point to least significant digit
   mov   esi,offset txtbuf+29   ;   idem
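   mov   ecx,29                 ;for the 29 fractional digits
   clc                          ;no carry into the least significant digit
@@:
   mov   al,[esi]
   adc   al,[edi]               ;add both digits plus any previous carry
   aaa                          ;adjust to an unpacked BCD digit, carry -> CF
   mov   [edi],al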
   dec   esi
   dec   edi
   dec   ecx
   jnz   @B

   inc   divisor
   cmp   divisor,25
   jbe   start                  ;until 1/25! has been computed and added

; Convert the answer to ASCII characters

   mov   edi,offset answer+2
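   mov   ecx,6                  ;6 dwords = 24 fractional digits
@@:
   mov   eax,[edi]
   add   eax,30303030h          ;convert 4 digits at a time
   stosd                        ;store them back and advance to next 4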
   dec   ecx
   jnz   @B
   mov   [edi],cl

; Answer ready for display
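
The whole computation can be cross-checked with a Python model that follows the same three steps (divide txtbuf by n, add it to the answer, convert to text). This is a sketch under stated assumptions: the function name compute_e and its parameters are hypothetical, and the two buffers are modelled as Python lists of digits.

```python
def compute_e(last_divisor=25, frac_digits=29, shown_digits=24):
    answer = [0] * frac_digits          # fractional digits of the sum (2.000...)
    txtbuf = [1] + [0] * frac_digits    # 1/1! = 1.000...

    for divisor in range(2, last_divisor + 1):
        # Divide the current 1/n! by the next n, most significant digit first.
        ah = 0
        for i in range(len(txtbuf)):
            al = (ah * 10 + txtbuf[i]) & 0xFF      # aad
            txtbuf[i], ah = divmod(al, divisor)    # div

        # Add that result to the answer, least significant digit first.
        carry = 0
        for i in range(frac_digits - 1, -1, -1):
            total = answer[i] + txtbuf[i + 1] + carry   # adc
            carry, answer[i] = divmod(total, 10)        # aaa

    # Convert the first shown_digits fractional digits to text.
    return "2." + "".join(str(d) for d in answer[:shown_digits])

print(compute_e())   # begins with 2.71828182845904523536
```

The last few of the 29 stored fractional digits are affected by truncation in each division and by the terms beyond 1/25!, which is why only the first 24 are converted for display.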

When the divisor exceeds 25, the only solution is to simulate the action of the aad instruction but without zeroing any partial register. This would allow integer divisors as large as can be held in 32 bits, i.e. 4,294,967,295. Some additional precautions must also be taken. The following code could be adapted.

; The divisor is a dword integer memory variable
; The dw10 is a dword memory variable set to 10
; "number" is the memory address of the buffer where the BCD number is
; located
; "answer" is the memory address of the buffer where the BCD result must
; be stored

   mov   esi,number+?         ;must point initially to most significant digit
   mov   edi,answer+?         ;must point to large enough buffer
   mov   ecx,number_size
   xor   edx,edx              ;initialize for remainder
@@:
   mov   eax,edx
   mul   dw10
   movzx ebx,byte ptr[esi]    ;get the current BCD digit
   add   eax,ebx
   adc   edx,0                ;for any overflow of the addition
   div   divisor              ;quotient -> EAX, remainder -> EDX
   mov   [edi],al             ;only one digit at a time gets computed
   inc/dec esi                ;depending on how BCDs are stored
   inc/dec edi                ;    idem
   dec   ecx
   jnz   @B

With the mul and div instructions being quite slow, the above code may not be any faster than the subtraction method shown with the aas instruction, especially if the divisor is already in the BCD format and would need to be converted to a binary integer.
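
The logic of that generalized loop can be modelled in Python as follows. This is a sketch only: the function name divide_bcd is hypothetical, the digits are assumed to be supplied most significant digit first, and Python's unbounded integers stand in for the 64-bit EDX:EAX pair.

```python
def divide_bcd(digits_msd_first, divisor):
    """Divide an unpacked BCD number by a divisor that fits in 32 bits."""
    assert 0 < divisor <= 0xFFFFFFFF
    remainder = 0                                # EDX
    quotient = []
    for digit in digits_msd_first:
        value = remainder * 10 + digit           # mul dw10, add, adc
        q, remainder = divmod(value, divisor)    # div: EAX, EDX
        quotient.append(q)                       # always a single digit 0..9
    return quotient, remainder

q, r = divide_bcd([8, 5, 3, 4, 6, 7], 1000)
print(q, r)          # [0, 0, 0, 8, 5, 3] 467
```

Because the remainder is always lower than the divisor, remainder*10 + digit is lower than 10 times the divisor, so each step yields exactly one quotient digit and the div instruction cannot overflow.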

For divisors larger than 4,294,967,295, the subtraction method shown with the aas instruction would be necessary for the division of unpacked BCDs until 64-bit registers become available; the upper limit on the divisor would then become approximately 2×10^19.

The described aad instruction is a special case of the more generalized function available on the CPU. The machine code for aad is:

   D5 0A

The imm8 value (0Ah = 10) following the D5 opcode byte is effectively the factor by which the content of the AH register is multiplied before being added to the content of the AL register. This means that two unpacked digits in any other numeric base could also be converted to a binary value, ready for division or any other purpose.

However, no mnemonics are available for bases other than 10. The instruction would then need to be hand coded. For example, if the unpacked digits were in base 8, hand coding such as follows would be required:

   db   0D5h,8
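
The effect of changing the second byte can be illustrated with a Python model. This is a sketch only: the function name aad_base is hypothetical and the flags are not modelled.

```python
def aad_base(ah, al, imm8):
    """Model of the byte pair D5 imm8: AL = AH*imm8 + AL (8 bits), AH = 0."""
    return 0, (ah * imm8 + al) & 0xFF

print(aad_base(7, 5, 8))     # (0, 61)  octal digits 7,5   -> 61 decimal
print(aad_base(7, 5, 10))    # (0, 75)  the standard aad (D5 0A)
```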