Executing MMX™ Instructions

A programmer must approach the use of MMX instructions differently, based on whether the code being developed is at the system level or at the application level. The details of these differences are discussed in “Programming Considerations” on page 9.

Before using the MMX instructions, the programmer must use the CPUID instruction to determine if the processor supports multimedia technology. See the AMD Processor Recognition Application Note, order# 20734, for more information.

Function 1 (EAX=1) of the AMD-K6 processor CPUID instruction returns the processor feature bits in the EDX register. Software can then test bit 23 of the feature bits to determine if the processor supports the multimedia technology. If bit 23 is set to 1, MMX instructions are supported. All AMD-K6 processors have bit 23 set. Once it is determined that multimedia technology is supported, subsequent code can use the MMX instructions. Alternatively, the AMD extended CPUID function 8000_0001h can be used to test whether the processor supports multimedia technology.

After a module of MMX code has executed, the programmer must empty the MMX state by executing the EMMS instruction. Because the MMX registers share the floating-point registers, an instruction is needed to prevent MMX code from interfering with floating-point code. The EMMS instruction clears the multimedia state by setting all the floating-point tag bits to empty (all ones), which marks the MMX/FP registers as invalid and available.

Register Set

The AMD-K6 processor implements eight new 64-bit MMX registers. These registers are mapped onto the floating-point registers. As shown in Figure 1 on page 5, the new MMX instructions refer to these registers as mmreg0 to mmreg7. Mapping the new MMX registers onto the floating-point stack enables backwards compatibility for the register saving that must occur as a result of task switching.
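The feature test described under Executing MMX Instructions reduces to a single bit check on the value that CPUID function 1 returns in EDX. The following is a minimal sketch in Python, used here only to model the logic. The function name supports_mmx and the sample EDX values are illustrative assumptions, and the code does not execute CPUID itself.

```python
MMX_FEATURE_BIT = 23  # bit position in EDX, as stated in the text


def supports_mmx(edx_features):
    """Return True if bit 23 of the CPUID function 1 EDX value is set.

    edx_features is assumed to be the 32-bit value that software read
    from EDX after executing CPUID with EAX=1.
    """
    mask = 1 << MMX_FEATURE_BIT  # 00800000h
    return (edx_features & mask) != 0


if __name__ == "__main__":
    # Hypothetical feature words, chosen only to exercise both outcomes.
    print(supports_mmx(0x008001BF))  # bit 23 set
    print(supports_mmx(0x000001BF))  # bit 23 clear
```

Note that bit 23 corresponds to the hexadecimal mask 00800000h, so an assembly-level test needs the hexadecimal suffix on the constant.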
AMD-K6™ MMX™ Enhanced Processor Multimedia Technology, 20726D/0, January 2000, Preliminary Information

Figure 1. MMX™ Registers
[Figure: registers mmreg0 through mmreg7, each 64 bits wide (bits 63–0), each with a two-bit tag field (xx) under the heading TAG BITS]

Aliasing the MMX registers onto the floating-point stack registers provides a safe way to introduce this new technology. Instead of needing to modify operating systems, new MMX applications can be supported through device drivers, MMX libraries, or DLL files. See the Programming Considerations section of this document for more information.

Current operating systems already support floating-point operations. Because MMX code uses the floating-point registers, these operating systems support MMX instructions automatically.

Every time the processor executes an MMX instruction, all the floating-point register tag bits are set to zero (00b = valid). Setting the tag bits after every MMX instruction prevents the processor from having to perform the extra tasks that are normally executed on floating-point registers when the tag field is something other than 00b.

If a task switch occurs during an MMX or floating-point instruction, the Control Register 0 (CR0) Task Switch (TS) bit is set to 1. The processor then generates an interrupt 7 (int 7, Device Not Available) when it encounters the next floating-point or MMX instruction, allowing the operating system to save the state of the MMX/FP registers.

If there is a task switch when MMX applications are running with older applications that do not include MMX instructions, the MMX/FP register state is still saved automatically through the int 7 handler.

Data Types

The AMD-K6 processor multimedia technology uses a packed data format.
The data is packed in a single 64-bit MMX register or memory operand as eight bytes, four words, or two doublewords. Each byte, word, doubleword, or quadword is an integer data type. The form of an instruction determines the data type. For example, the MOV instruction comes in two different forms: MOVD moves 32 bits of data and MOVQ moves 64 bits of data. The four new data types are defined as follows:

Packed byte          Eight 8-bit bytes packed into 64 bits
                     Signed integer range: –2^7 to 2^7–1
                     Unsigned integer range: 0 to 2^8–1
Packed word          Four 16-bit words packed into 64 bits
                     Signed integer range: –2^15 to 2^15–1
                     Unsigned integer range: 0 to 2^16–1
Packed doubleword    Two 32-bit doublewords packed into 64 bits
                     Signed integer range: –2^31 to 2^31–1
                     Unsigned integer range: 0 to 2^32–1
Quadword             One 64-bit quadword
                     Signed integer range: –2^63 to 2^63–1
                     Unsigned integer range: 0 to 2^64–1

Figure 2 on page 7 shows the four new data types.

Figure 2. MMX™ Data Types
[Figure: packed bytes (8 bits x 8, B7 through B0), packed words (16 bits x 4, W3 through W0), packed doublewords (32 bits x 2, D1 and D0), and quadword (64 bits x 1, Q0), each drawn with bit 63 at the left and bit 0 at the right]

Instructions

The AMD-K6 processor multimedia technology includes 57 new MMX instructions. These new instructions are organized into the following groups:
• Arithmetic
• Empty MMX registers
• Compare
• Convert (pack/unpack)
• Logical
• Move
• Shift

The following mnemonics are used in the instructions:
• P: Packed data
• B: Byte
• W: Word
• D: Doubleword
• Q: Quadword
• S: Signed
• U: Unsigned
• SS: Signed Saturation
• US: Unsigned Saturation

For example, the mnemonic for the PACK instruction that packs eight signed words (four from each operand) into eight unsigned bytes is PACKUSWB. In this mnemonic, the US designates an unsigned result with saturation, and the WB means that the source is packed words and the result is packed bytes.

The term saturation is commonly used in multimedia applications. Saturation allows mathematical limits to be placed on the data elements. If a result exceeds the boundary of that data type, the result is set to the defined limit for that instruction. A common use of saturation is to prevent color wraparound.

Instruction Formats

All MMX instructions, except the EMMS instruction that uses no operands, are formatted as follows:

    INSTRUCTION mmreg1, mmreg2/mem64

The source operand (mmreg2/mem64) can be either an MMX register or a memory location. The destination operand (mmreg1) can only be an MMX register.

The MOVD and MOVQ instructions also have the following acceptable formats:

    MOVD mmreg1, mreg32/mem32
    MOVD mreg32/mem32, mmreg1
    MOVQ mem64, mmreg1

In the first example, the source operand (mreg32/mem32) can be either an integer register or a 32-bit memory address. The destination operand (mmreg1) can only be an MMX register. The second example has the source operand as an MMX register.
The destination operand (mreg32/mem32) can be either an integer register or a 32-bit memory address. The third example has the source operand as an MMX register and the destination operand as a 64-bit memory location.

The shift instructions can also use an immediate source operand, designated as imm8:

    PSRLW mmreg1, imm8

2 Programming Considerations

This chapter describes considerations for programmers writing operating systems, compilers, and applications that utilize MMX instructions as implemented in the AMD-K6 MMX enhanced processor.

Feature Detection

To use the AMD-K6 processor multimedia technology, the programmer must determine if the processor supports it. The CPUID instruction gives programmers the ability to determine the presence of multimedia technology on the processor. Software must first test to see if the CPUID instruction is supported. For a detailed description of the CPUID instruction, see the AMD Processor Recognition Application Note, order# 20734.

The presence of the CPUID instruction is indicated by the ID bit (21) in the EFLAGS register. If this bit is writable, the CPUID instruction is supported. The following code sample shows how to test for the presence of the CPUID instruction.

    pushfd                  ; save EFLAGS
    pop   eax               ; store EFLAGS in EAX
    mov   ebx, eax          ; save in EBX for later testing
    xor   eax, 00200000h    ; toggle bit 21
    push  eax               ; put to stack
    popfd                   ; save changed EAX to EFLAGS
    pushfd                  ; push EFLAGS to TOS
    pop   eax               ; store EFLAGS in EAX
    cmp   eax, ebx          ; see if bit 21 has changed
    jz    NO_CPUID          ; if no change, no CPUID

If the processor supports the CPUID instruction, the programmer must execute the standard function, EAX=0. The CPUID function returns a 12-character string that identifies the processor’s vendor.
For AMD processors, standard function 0 returns a vendor string of “AuthenticAMD”. This string indicates that software must follow the AMD definitions for subsequent CPUID functions and the values returned for those functions.

The next step is for the programmer to determine if MMX instructions are supported. Function 1 of the CPUID instruction provides this information. Function 1 (EAX=1) of the AMD CPUID instruction returns the feature bits in the EDX register. If bit 23 in the EDX register is set to 1, MMX instructions are supported. The following code sample shows how to test for MMX instruction support.

    mov   eax, 1            ; set up function 1
    cpuid                   ; call the function
    test  edx, 00800000h    ; test bit 23
    jnz   YES_MM            ; multimedia technology supported

Alternatively, the extended function 1 (EAX=8000_0001h) can be used to determine if MMX instructions are supported.

    mov   eax, 80000001h    ; set up extended function 1
    cpuid                   ; call the function
    test  edx, 00800000h    ; test bit 23
    jnz   YES_MM            ; multimedia technology supported

Task Switching

A task switch is an event that occurs within operating systems that allows multiple programs to be executed in parallel. Most modern operating systems use task switching and are called multitasking operating systems. There are two types of multitasking operating systems: cooperative and preemptive.

Cooperative Multitasking

In cooperative multitasking operating systems, applications do not care about other tasks that may be running. Each task assumes that it owns the machine state (processor, registers, I/O, memory, etc.). In addition, these tasks must take care of saving their own information (i.e., registers, stacks, states) in their own memory areas. The cooperative multitasking operating system does not save operating state information for the applications.
There are different types of cooperative multitasking operating systems. Some of these operating systems perform some level of state saves, but this state saving is not always reliable. All software engineers programming for a cooperative multitasking environment must save the MMX or floating-point states before relinquishing control to another task or to the operating system. The FSAVE and FRSTOR instructions are used to perform this task. Figure 3 illustrates this task switching process.

Note: Some cooperative operating systems may have API calls to perform these tasks for the application.

Figure 3. Cooperative Task Switching
[Figure: TASK 1 executes MMX/FP code and must save its state with FSAVE before the task switch to TASK 2. TASK 2 must restore its state with FRSTOR, executes its code, and must save its state with FSAVE when its code module is finished. Control then returns to TASK 1, which must restore its state with FRSTOR before it continues executing.]

Preemptive Multitasking

In preemptive multitasking operating systems like OS/2, Windows NT™, and UNIX, the operating system handles all state and register saves. The application programmer does not need to save states when programming within a preemptive multitasking environment. The preemptive multitasking operating system sets aside a save area for each task.

In a preemptive multitasking operating system, if a task switch occurs, the operating system sets the Control Register 0 (CR0) Task Switch (TS) bit to 1. If the new task encounters a floating-point or MMX instruction, an interrupt 7 (int 7, Device Not Available) is generated. The int 7 handler saves the state of the first task and restores the state of the second task. The int 7 handler then sets CR0.TS to 0 and returns to the original floating-point or MMX instruction in the second task. Figure 4 illustrates this task switching process.

Figure 4. Preemptive Task Switching
[Figure: TASK 1 executes MMX/FP code. On the task switch to TASK 2, CR0.TS is set to 1. When TASK 2 encounters MMX/FP code, TS=1 transfers control to the INT 7 handler, which saves the Task 1 state, restores the Task 2 state, sets CR0.TS=0, and returns to the Task 2 MMX/FP code.]

Exceptions

Table 1 contains a list of exceptions that MMX instructions can generate. The rules for exceptions have not changed in the implementation of MMX instructions. None of the exception handlers need to be modified.

Note:
1. An invalid opcode exception (interrupt 6) occurs if an MMX instruction is executed on a processor that does not support MMX instructions.
2. If a floating-point exception is pending and the processor encounters an MMX instruction, FERR# is asserted and, if CR0.NE = 1, an interrupt 16 is generated.

Table 1. MMX™ Instruction Exceptions

Exception                              Real  Virtual 8086  Protected  Description
Invalid opcode (6)                     X     X             X          The emulate MMX instruction bit (EM) of the control register (CR0) is set to 1.
Device not available (7)               X     X             X          Save the floating-point or MMX state if the task switch bit (TS) of the control register (CR0) is set to 1.
Stack exception (12)                   X     X             X          During instruction execution, the stack segment limit was exceeded.
General protection (13)                -     -             X          During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
Segment overrun (13)                   X     X             -          One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
Page fault (14)                        -     X             X          A page fault resulted from the execution of the instruction.
Floating-point exception pending (16)  X     X             X          An exception is pending due to the floating-point execution unit.
Alignment check (17)                   -     X             X          An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In Protected Mode, CPL = 3.)

Mixing MMX™ and Floating-Point Instructions

The programmer must take care when writing code that contains both MMX and floating-point instructions. The MMX code modules should be separated from the floating-point code modules. All code of one type (MMX or floating-point code) should be grouped together as often as possible. To obtain the highest performance, routines should not contain any conditional branches at the end of loops that jump to code of a different type than the code that is currently being executed.

In certain multimedia environments, floating-point and MMX instructions may be mixed. For example, if a programmer wants to change the viewing perspective of a three-dimensional scene, the perspective can be changed through transformation matrices using floating-point registers. The picture/pixel information is integer-based and requires MMX instructions to manipulate this information. Both MMX and floating-point instructions are required to perform this task.

The software must clean up after itself at the end of an MMX code module. The EMMS instruction must be used at the end of an MMX code module to mark all floating-point registers as empty (11b = empty/invalid). In cooperative multitasking operating systems, the EMMS instruction must be used when switching between tasks.

Note: In some situations, experienced programmers can utilize the MMX registers to pass information between tasks. In these situations, the EMMS instruction is not required.

The tag bits are affected by every MMX and floating-point instruction. After every MMX instruction except EMMS, all the tag bits in the floating-point tag word are set to 0. When the EMMS instruction is executed, all the tag bits in the tag word are set to 1.

Prefixes

All instructions in the x86 architecture translate to a binary value or opcode.
This 1- or 2-byte opcode value is different for each instruction. If an instruction is two bytes long, the second byte is called the Mod R/M byte. The Mod R/M byte is used to further describe the type of instruction that is used.

The x86 opcode and the Mod R/M byte can also be followed by an SIB byte. This byte is used to describe the Scale, Index, and Base forms of 32-bit addressing.

The format of the x86 instruction allows for certain prefixes to be placed before each instruction. These prefixes indicate different types of command overrides. The MMX instructions follow these rules just like all existing instructions, which allows for an easy implementation in the x86 architecture. All of the rules that apply to the x86 architecture apply to MMX instructions, including accessing registers, memory, and I/O.

Most opcode prefixes can be used with MMX instructions. The following prefixes affect MMX instructions as described:
• The Segment Override prefixes (2Eh/CS, 36h/SS, 3Eh/DS, 26h/ES, 64h/FS, and 65h/GS) affect MMX instructions that contain a memory operand.
• The LOCK prefix (F0h) triggers an invalid opcode exception (interrupt 6).
• The Address Size Override prefix (67h) affects MMX instructions that contain a memory operand.

3 MMX™ Instruction Set

The following MMX instruction definitions are in alphabetical order according to the instruction mnemonics.
EMMS

mnemonic    opcode    description
EMMS        0F 77h    Clear the MMX state

Privilege: none
Registers Affected: MMX
Flags Affected: none

The EMMS instruction is used to clear the MMX state following the execution of a block of code using MMX instructions. Because the MMX registers and tag words are shared with the floating-point unit, it is necessary to clear the state before executing code that includes floating-point instructions.

Exceptions Generated:

Exception                              Real  Virtual 8086  Protected  Description
Invalid opcode (6)                     X     X             X          The emulate MMX instruction bit (EM) of the control register (CR0) is set to 1.
Device not available (7)               X     X             X          Save the floating-point or MMX state if the task switch bit (TS) of the control register (CR0) is set to 1.
Floating-point exception pending (16)  X     X             X          An exception is pending due to the floating-point execution unit.

MOVD

mnemonic                    opcode    description
MOVD mmreg1, reg32/mem32    0F 6Eh    Copy a 32-bit value from the general purpose register or memory location into the MMX register
MOVD reg32/mem32, mmreg1    0F 7Eh    Copy a 32-bit value from the MMX register into the general purpose register or memory location

Privilege: none
Registers Affected: MMX
Flags Affected: none

The MOVD instruction moves a 32-bit data value from an MMX register to a general purpose register or memory, or it moves the 32-bit data from a general purpose register or memory into an MMX register. If the 32-bit data to be moved is provided by an MMX register, the instruction moves bits 31–0 of the MMX register into the specified register or memory location. If the 32-bit data is being moved into an MMX register, the instruction moves the 32 bits of data into bits 31–0 of the MMX register and fills bits 63–32 with zeros.
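The two directions of MOVD described above can be modeled with plain integer masking. This is a minimal sketch in Python rather than processor code. The function names movd_to_mmx and movd_from_mmx are hypothetical, and registers are represented as unsigned Python integers.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def movd_to_mmx(value32):
    """Model MOVD mmreg1, reg32/mem32.

    The 32-bit source lands in bits 31-0 of the MMX register and
    bits 63-32 are filled with zeros.
    """
    return value32 & MASK32


def movd_from_mmx(mmreg):
    """Model MOVD reg32/mem32, mmreg1.

    Only bits 31-0 of the 64-bit MMX register are copied out.
    """
    return (mmreg & MASK64) & MASK32


if __name__ == "__main__":
    print(hex(movd_to_mmx(0xDEADBEEF)))
    print(hex(movd_from_mmx(0x0123456789ABCDEF)))
```

The model makes the asymmetry explicit: a move into an MMX register always overwrites the upper doubleword with zeros, while a move out of an MMX register silently discards it.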
Related Instructions

See the MOVQ instruction.

Exceptions Generated:

Exception                              Real  Virtual 8086  Protected  Description
Invalid opcode (6)                     X     X             X          The emulate MMX instruction bit (EM) of the control register (CR0) is set to 1.
Device not available (7)               X     X             X          Save the floating-point or MMX state if the task switch bit (TS) of the control register (CR0) is set to 1.
Stack exception (12)                   -     -             X          During instruction execution, the stack segment limit was exceeded.
General protection (13)                -     -             X          During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
Segment overrun (13)                   X     X             -          One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
Page fault (14)                        -     X             X          A page fault resulted from the execution of the instruction.
Floating-point exception pending (16)  X     X             X          An exception is pending due to the floating-point execution unit.
Alignment check (17)                   -     X             X          An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In Protected Mode, CPL = 3.)

MOVQ

mnemonic                     opcode    description
MOVQ mmreg1, mmreg2/mem64    0F 6Fh    Copy a 64-bit value from an MMX register or memory location into an MMX register
MOVQ mmreg2/mem64, mmreg1    0F 7Fh    Copy a 64-bit value from an MMX register into an MMX register or memory location

Privilege: none
Registers Affected: MMX
Flags Affected: none

The MOVQ instruction moves a 64-bit data value from one MMX register to another MMX register or memory, or it moves the 64-bit data from one MMX register or memory to another MMX register. Copying data from one memory location to another memory location cannot be accomplished with the MOVQ instruction.
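MOVQ transfers all 64 bits unchanged, and the instruction that later operates on the register decides whether those bits are treated as packed bytes, words, doublewords, or a quadword (see Data Types). The following Python sketch illustrates that lane view. The helper names split_lanes and join_lanes are assumptions made for this example, and lane 0 is taken to be the lowest-order element, matching B0, W0, and D0 in Figure 2.

```python
def split_lanes(quadword, lane_bits):
    """Split a 64-bit value into unsigned lanes, lowest-order lane first."""
    if lane_bits not in (8, 16, 32, 64):
        raise ValueError("lane width must be 8, 16, 32, or 64 bits")
    mask = (1 << lane_bits) - 1
    return [(quadword >> shift) & mask for shift in range(0, 64, lane_bits)]


def join_lanes(lanes, lane_bits):
    """Rebuild the 64-bit value from lanes produced by split_lanes."""
    mask = (1 << lane_bits) - 1
    result = 0
    for index, lane in enumerate(lanes):
        result |= (lane & mask) << (index * lane_bits)
    return result


if __name__ == "__main__":
    value = 0x0123456789ABCDEF
    for width in (8, 16, 32, 64):
        print(width, [hex(lane) for lane in split_lanes(value, width)])
```

The same 64-bit pattern yields eight, four, two, or one element depending on the chosen width, which is why the instruction mnemonic (B, W, D, or Q) has to name the data type.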
Related Instructions

See the MOVD instruction.

Exceptions Generated:

Exception                              Real  Virtual 8086  Protected  Description
Invalid opcode (6)                     X     X             X          The emulate MMX instruction bit (EM) of the control register (CR0) is set to 1.
Device not available (7)               X     X             X          Save the floating-point or MMX state if the task switch bit (TS) of the control register (CR0) is set to 1.
Stack exception (12)                   -     -             X          During instruction execution, the stack segment limit was exceeded.
General protection (13)                -     -             X          During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
Segment overrun (13)                   X     X             -          One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
Page fault (14)                        -     X             X          A page fault resulted from the execution of the instruction.
Floating-point exception pending (16)  X     X             X          An exception is pending due to the floating-point execution unit.
Alignment check (17)                   -     X             X          An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In Protected Mode, CPL = 3.)

PACKSSDW

mnemonic                         opcode    description
PACKSSDW mmreg1, mmreg2/mem64    0F 6Bh    Pack with saturation signed 32-bit operands into signed 16-bit results

Privilege: none
Registers Affected: MMX
Flags Affected: none

The PACKSSDW instruction performs a pack and saturate operation on two signed 32-bit values in the first operand and two signed 32-bit values in the second operand. The four signed 16-bit results are placed in the specified MMX register.

The pack operation is a data conversion. The PACKSSDW instruction converts or packs the four signed 32-bit values into four signed 16-bit values, applying saturating arithmetic. If the signed 32-bit value is less than –32768 (8000h), it saturates to –32768 (8000h).
If the signed 32-bit value is greater than 32767 (7FFFh), it saturates to 32767 (7FFFh). All values between –32768 and 32767 are represented with their signed 16-bit value.

The first operand must be an MMX register. In addition to providing the first operand, this MMX register is the location where the result of the pack and saturate operation is stored. The second operand can be an MMX register or a 64-bit memory location.

Exceptions Generated:

Exception                              Real  Virtual 8086  Protected  Description
Invalid opcode (6)                     X     X             X          The emulate MMX instruction bit (EM) of the control register (CR0) is set to 1.
Device not available (7)               X     X             X          Save the floating-point or MMX state if the task switch bit (TS) of the control register (CR0) is set to 1.
Stack exception (12)                   -     -             X          During instruction execution, the stack segment limit was exceeded.
General protection (13)                -     -             X          During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
Segment overrun (13)                   X     X             -          One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
Page fault (14)                        -     X             X          A page fault resulted from the execution of the instruction.
Floating-point exception pending (16)  X     X             X          An exception is pending due to the floating-point execution unit.
Alignment check (17)                   -     X             X          An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In Protected Mode, CPL = 3.)

Functional Illustration of the PACKSSDW Instruction

[Figure: mmreg1 = FFFF_8002_0000_01FCh and mmreg2/mem64 = 8000_0002_0000_8000h produce mmreg1 = 8000_7FFF_8002_01FCh. The two upper result words (8000h and 7FFFh) are marked as saturated values.]

The following list explains the functional illustration of the PACKSSDW instruction:
• Bits 63–32 of the source operand (mmreg2/mem64) are packed into bits 63–48 of the destination operand (mmreg1). The result is saturated to the largest possible 16-bit negative number because the 32-bit negative source operand (8000_0002h) exceeds the capacity of the signed 16-bit destination operand.
• Bits 31–0 of the source operand are packed into bits 47–32 of the destination operand. The result is saturated to the largest possible 16-bit positive number because the 32-bit positive source operand (0000_8000h) exceeds the capacity of the 16-bit destination operand.
• Bits 63–32 of the destination operand are packed into bits 31–16 of the destination operand. The result is not saturated because the 32-bit negative value (FFFF_8002h) does not exceed the capacity of the 16-bit destination operand.
• Bits 31–0 of the destination operand are packed into bits 15–0 of the destination operand. The result is not saturated because the 32-bit positive value (0000_01FCh) does not exceed the capacity of the 16-bit destination operand.

Related Instructions

See the PACKSSWB instruction.
See the PACKUSWB instruction.
See the PUNPCKHWD instruction.
See the PUNPCKLWD instruction.

PACKSSWB

mnemonic                         opcode    description
PACKSSWB mmreg1, mmreg2/mem64    0F 63h    Pack with saturation signed 16-bit operands into signed 8-bit results

Privilege: none
Registers Affected: MMX
Flags Affected: none

The PACKSSWB instruction performs a pack and saturate operation on four signed 16-bit values in the first operand and four signed 16-bit values in the second operand. The eight signed 8-bit results are placed in the specified MMX register. The pack operation is a data conversion.
The PACKSSWB instruction converts or packs the eight signed 16-bit values into eight signed 8-bit values, applying saturating arithmetic. If the signed 16-bit value is less than –128 (80h), it saturates to –128 (80h). If the signed 16-bit value is greater than 127 (7Fh), it saturates to 127 (7Fh). All values between –128 and 127 are represented by their signed 8-bit value.

The first operand must be an MMX register. In addition to providing the first operand, this MMX register is the location where the result of the pack and saturate operation is stored. The second operand can be an MMX register or a 64-bit memory location.

Exceptions Generated:

Exception                              Real  Virtual 8086  Protected  Description
Invalid opcode (6)                     X     X             X          The emulate MMX instruction bit (EM) of the control register (CR0) is set to 1.
Device not available (7)               X     X             X          Save the floating-point or MMX state if the task switch bit (TS) of the control register (CR0) is set to 1.
Stack exception (12)                   -     -             X          During instruction execution, the stack segment limit was exceeded.
General protection (13)                -     -             X          During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
Segment overrun (13)                   X     X             -          One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
Page fault (14)                        -     X             X          A page fault resulted from the execution of the instruction.
Floating-point exception pending (16)  X     X             X          An exception is pending due to the floating-point execution unit.
Alignment check (17)                   -     X             X          An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In Protected Mode, CPL = 3.)
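The signed pack-with-saturation rule used by PACKSSWB (and, with wider elements, by PACKSSDW) can be written as a clamp followed by a repack. The sketch below is a Python model of that rule, not AMD's implementation. The names to_signed and pack_signed_saturate are hypothetical, operands are unsigned 64-bit Python integers, and the element ordering (destination elements in the low half of the result, source elements in the high half) follows the functional illustrations in this document.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value


def pack_signed_saturate(dest, src, src_bits):
    """Model PACKSSWB (src_bits=16) or PACKSSDW (src_bits=32).

    Each signed element is clamped to the signed range of an element
    half as wide, then the narrowed elements are repacked into 64 bits.
    """
    out_bits = src_bits // 2
    low_limit = -(1 << (out_bits - 1))       # e.g. -128 for bytes
    high_limit = (1 << (out_bits - 1)) - 1   # e.g. 127 for bytes
    out_mask = (1 << out_bits) - 1
    result = 0
    position = 0
    # Destination elements fill the low half, source elements the high half.
    for operand in (dest, src):
        for lane in range(64 // src_bits):
            element = to_signed(operand >> (lane * src_bits), src_bits)
            clamped = max(low_limit, min(high_limit, element))
            result |= (clamped & out_mask) << (position * out_bits)
            position += 1
    return result


if __name__ == "__main__":
    # Operand values taken from the functional illustrations.
    print(hex(pack_signed_saturate(0xFF02_0085_007E_81CF,
                                   0x007E_7F00_EF9D_FF88, 16)))
    print(hex(pack_signed_saturate(0xFFFF_8002_0000_01FC,
                                   0x8000_0002_0000_8000, 32)))
```

Parameterizing on the element width keeps one routine for both instructions, since they differ only in the size of the elements and therefore in the saturation limits.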
Functional Illustration of the PACKSSWB Instruction

The following list explains the functional illustration of the PACKSSWB instruction:
- Bits 63–48 of the source operand (mmreg2/mem64) are packed into bits 63–56 of the destination operand (mmreg1). The result is not saturated because the 16-bit positive source operand (007Eh) does not exceed the capacity of a signed 8-bit destination operand.
- Bits 47–32 of the source operand are packed into bits 55–48 of the destination operand. The result is saturated to the largest positive 8-bit value (7Fh) because the 16-bit positive source operand (7F00h) exceeds the capacity of a signed 8-bit destination operand.
- Bits 31–16 of the source operand are packed into bits 47–40 of the destination operand. The result is saturated to the most negative 8-bit value (80h) because the 16-bit negative source operand (EF9Dh) exceeds the capacity of a signed 8-bit destination operand.
- Bits 15–0 of the source operand are packed into bits 39–32 of the destination operand. The result is not saturated because the 16-bit negative source operand (FF88h) does not exceed the capacity of a signed 8-bit destination operand.
- Bits 63–48 of the destination operand are packed into bits 31–24 of the destination operand. The result is saturated to the most negative 8-bit value (80h) because the 16-bit negative value (FF02h) exceeds the capacity of a signed 8-bit destination operand.
- Bits 47–32 of the destination operand are packed into bits 23–16 of the destination operand. The result is saturated to the largest positive 8-bit value (7Fh) because the 16-bit positive value (0085h) exceeds the capacity of a signed 8-bit destination operand.
- Bits 31–16 of the destination operand are packed into bits 15–8 of the destination operand. The result is not saturated because the 16-bit positive value (007Eh) does not exceed the capacity of a signed 8-bit destination operand.
- Bits 15–0 of the destination operand are packed into bits 7–0 of the destination operand. The result is saturated to the most negative 8-bit value (80h) because the 16-bit negative value (81CFh) exceeds the capacity of a signed 8-bit destination operand.

[Figure: functional illustration of the PACKSSWB instruction. Source mmreg2/mem64 = 007Eh, 7F00h, EF9Dh, FF88h; destination mmreg1 before the operation = FF02h, 0085h, 007Eh, 81CFh; mmreg1 after the operation = 7Eh, 7Fh, 80h, 88h, 80h, 7Fh, 7Eh, 80h (bits 63 down to 0, left to right). Saturated values are marked in the figure.]

Related Instructions
See the PACKSSDW instruction.
See the PACKUSWB instruction.
See the PUNPCKHBW instruction.
See the PUNPCKLBW instruction.

PACKUSWB

mnemonic                         opcode    description
PACKUSWB mmreg1, mmreg2/mem64    0F 67h    Pack with saturation signed 16-bit operands into unsigned 8-bit results

Privilege: none
Registers Affected: MMX
Flags Affected: none
Exceptions Generated: see the exception table that follows the description.

The PACKUSWB instruction performs a pack and saturate operation on four signed 16-bit values in the first operand and four signed 16-bit values in the second operand. The eight unsigned 8-bit results are placed in the specified MMX register. The pack operation is a data conversion.
The PACKUSWB instruction converts or packs the eight signed 16-bit values into eight unsigned 8-bit values, applying saturating arithmetic. If the signed 16-bit value is a negative number, it saturates to 0 (00h). If the signed 16-bit value is greater than 255 (FFh), it saturates to 255 (FFh). All values between 0 and 255 are represented with their unsigned 8-bit value.

The first operand must be an MMX register. In addition to providing the first operand, this MMX register is the location where the result of the pack and saturate operation is stored. The second operand can be an MMX register or a 64-bit memory location.

Exception                              Real  Virtual 8086  Protected  Description
Invalid opcode (6)                     X     X             X          The emulate MMX instruction bit (EM) of the control register (CR0) is set to 1.
Device not available (7)               X     X             X          Save the floating-point or MMX state if the task switch bit (TS) of the control register (CR0) is set to 1.
Stack exception (12)                                       X          During instruction execution, the stack segment limit was exceeded.
General protection (13)                                    X          During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
Segment overrun (13)                   X     X                        One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
Page fault (14)                              X             X          A page fault resulted from the execution of the instruction.
Floating-point exception pending (16)  X     X             X          An exception is pending due to the floating-point execution unit.
Alignment check (17)                         X             X          An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In Protected Mode, CPL = 3.)
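The unsigned saturation rule for PACKUSWB can be modeled the same way in portable C. This is a minimal sketch under stated assumptions, not AMD code. The helper names sat_s16_to_u8 and packuswb_model are hypothetical, and the array layout (index 0 holds bits 15 to 0 of each operand, destination words fill the low half of the result) is assumed to mirror the operand ordering described for the other pack instructions.

```c
#include <stdint.h>

/* Hypothetical helper: saturate one signed 16-bit value to the
   unsigned 8-bit range 0..255, as described for PACKUSWB. */
static uint8_t sat_s16_to_u8(int16_t v)
{
    if (v < 0)   return 0;    /* negative input: saturate to 00h */
    if (v > 255) return 255;  /* above range: saturate to FFh    */
    return (uint8_t)v;        /* in range: value is unchanged    */
}

/* Hypothetical scalar model of PACKUSWB mmreg1, mmreg2/mem64.
   dst models mmreg1 before the operation, src models mmreg2/mem64.
   Each array holds four signed 16-bit words; index 0 is bits 15-0.
   out[0..3] (result bits 31-0) come from dst,
   out[4..7] (result bits 63-32) come from src. */
static void packuswb_model(const int16_t dst[4], const int16_t src[4],
                           uint8_t out[8])
{
    for (int i = 0; i < 4; i++) {
        out[i]     = sat_s16_to_u8(dst[i]);
        out[i + 4] = sat_s16_to_u8(src[i]);
    }
}
```

For example, the 16-bit inputs FFFFh (–1) and 8000h both produce 00h, the inputs 0100h (256) and 7FFFh both produce FFh, and 007Fh passes through unchanged as 7Fh.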