MMX Can Speed Image-Processing Software

Capitalizing on MMX technology isn't easy, but here are some tips for how you can take advantage of it.

Fernando Serra, Imaging Technology, Bedford, MA -- Test & Measurement World, 6/1/1999

The MMX technology, which Intel added to its Pentium processors to speed graphics processing, can also speed image-analysis tasks. Unfortunately, capitalizing on MMX technology isn’t easy. First, programming languages such as C, C++, and Basic lack the data types needed to let the code they produce use MMX hardware. Second, existing software cannot automatically benefit from MMX technology. That leaves few choices: You can purchase new or upgraded image-analysis software that includes MMX capabilities, or you can resort to assembly language to add MMX capabilities to your own code. Alternatively, you may find you get reasonably fast processing by simply optimizing existing code, without using MMX.

Intel’s MMX technology adds several new data types and new machine-language instructions to Pentium-class CPUs. The new data types let the CPU handle 64-bit data: The CPU uses 64 bits in each of its eight 80-bit floating-point registers to form MMX registers. Think of the 64-bit MMX registers as containing eight bytes, four words, two double words, or one quadword.
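One way to picture those four views of the same 64 bits is a C union. This is only an illustrative sketch: the type name Packed64 and the helper packed64_fill_bytes are hypothetical, and the compiler still treats this as ordinary memory rather than as an MMX register.

```c
#include <stdint.h>

/* Hypothetical type for illustration: four views of the same 64 bits,
   mirroring the packed data types an MMX register can hold. */
typedef union {
    uint8_t  bytes[8];   /* eight packed bytes */
    uint16_t words[4];   /* four packed words */
    uint32_t dwords[2];  /* two packed double words */
    uint64_t qword;      /* one quadword */
} Packed64;

/* Hypothetical helper: store the same value in all eight byte lanes,
   for example one 8-bit pixel intensity repeated eight times. */
Packed64 packed64_fill_bytes(uint8_t value)
{
    Packed64 r;
    for (int i = 0; i < 8; i++)
        r.bytes[i] = value;
    return r;
}
```

Filling every byte with 0xFF, for instance, makes each word read 0xFFFF and the quadword read as all ones, because every member of the union overlays the same eight bytes.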

Because the MMX and floating-point operations share registers, software cannot mix floating-point and MMX instructions without paying a price. A Pentium takes about 50 clock cycles to toggle the floating-point register set between floating-point use and MMX use, so code that switches the context of the registers frequently wastes valuable time.
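A rough, illustrative estimate shows how quickly that cost can add up. The once-per-pixel switching pattern is a hypothetical worst case, not something we measured; the image size and clock rate are those of the test system described below.

```latex
1023 \times 1023 \times 50 \approx 5.2 \times 10^{7}\ \text{cycles},
\qquad
\frac{5.2 \times 10^{7}\ \text{cycles}}{266 \times 10^{6}\ \text{cycles/s}} \approx 0.20\ \text{s}
```

Under those assumptions, switching alone would take about 0.2 s per image, so it pays to group MMX work into long stretches and switch as rarely as possible.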

MMX defines several new CPU instructions that manipulate a register’s data in parallel. For example, when an operation processes 1 byte in an MMX register, the same operation can take place simultaneously on the other 7 bytes. The added MMX instructions perform the following types of operations: add, subtract, multiply, multiply-and-accumulate, compare, shift, logical, move, and pack-unpack.
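The idea of one operation acting on all eight bytes can be modeled in plain C. The sketch below is a scalar emulation, not MMX code: the loops stand in for what a single packed-add instruction does at once (PADDB for wraparound addition, PADDUSB for unsigned saturating addition), and the function names are hypothetical.

```c
#include <stdint.h>

/* Scalar model of a wraparound packed-byte add: each of the eight
   lanes is added independently, and a carry never crosses lanes. */
void packed_add_wrap(const uint8_t a[8], const uint8_t b[8], uint8_t out[8])
{
    for (int lane = 0; lane < 8; lane++)
        out[lane] = (uint8_t)(a[lane] + b[lane]);
}

/* Scalar model of an unsigned saturating packed-byte add: a lane
   that would overflow is clamped to 255 instead of wrapping. */
void packed_add_sat(const uint8_t a[8], const uint8_t b[8], uint8_t out[8])
{
    for (int lane = 0; lane < 8; lane++) {
        unsigned sum = (unsigned)a[lane] + (unsigned)b[lane];
        out[lane] = (uint8_t)(sum > 255u ? 255u : sum);
    }
}
```

Saturating arithmetic is often the better fit for pixel data, because a bright pixel that overflows stays white rather than wrapping around to a dark value.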

Test a Real Algorithm
To test how much MMX capabilities can speed processing, we optimized a variance algorithm, which determines whether or not an image contains useful information. The algorithm measures the intensity distribution of pixel values in an image. If all values are more or less the same, the variance is small, meaning there is nothing useful in the image. If the values show a large variance, with some dark and some light, “something” exists in the image.

We developed and tested the variance algorithm using Microsoft Visual C++ 5.0 and a Windows NT 4.0-based 266-MHz Intel PentiumPro computer with 128 Mbytes of memory. Our timing information (Table 1) shows results for processing a 1023x1023-pixel 8-bit image. We chose the nonstandard image size simply to verify that the algorithm worked properly. Listing 1 shows our C-language algorithm for variance.

Table 1. Execution times for each version of the variance algorithm (table image not reproduced).
Listing 1
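// BYTE is the Windows unsigned 8-bit type (unsigned char).
// rat[y][x] is the pixel in row y, column x of a dx-by-dy image.
double Variance0(BYTE** rat,int dx,int dy)
{
double sum=0;    // running sum of pixel values
double sumsq=0;  // running sum of squared pixel values
for(int x=0; x<dx; x++)
for(int y=0; y<dy; y++)
{
    double pixel=rat[y][x];
    sum+=pixel;
    sumsq+=pixel*pixel;
}
double n = (double)dx*(double)dy;  // total number of pixels

// sample variance computed from the two running sums
if(n>1) return (n*sumsq - sum*sum)/n/(n-1.);
else return 0;
}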
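The return expression in Listing 1 is the one-pass form of the sample variance. Writing p_i for the pixel values and n = dx*dy for the pixel count (symbols introduced here for illustration), it follows from the textbook definition:

```latex
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(p_i - \bar{p}\right)^2
    = \frac{1}{n-1}\left(\sum_{i=1}^{n} p_i^2 - \frac{1}{n}\Bigl(\sum_{i=1}^{n} p_i\Bigr)^{2}\right)
    = \frac{n\sum_{i=1}^{n} p_i^2 - \Bigl(\sum_{i=1}^{n} p_i\Bigr)^{2}}{n\,(n-1)}
```

Here sum corresponds to the sum of p_i and sumsq to the sum of p_i squared, so the routine needs only one pass over the image and never has to compute the mean first.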

During our optimization experiments, we kept track of the execution times for each new version of the software, as shown in Table 1. You can find all the optimization details in two sections, complete with code, at www.imaging.com/tutorials.html.

We used the first version (Listing 1) as our reference. That routine ran in 147.8 ms. Each version builds on the code of the previous version, unless noted otherwise. First we modified version 1 to use only integer math (version 2). In version 3, we used a more efficient C-pointer construct to replace the innermost for loop. Version 4 added more speed when we “unrolled” the loops, a procedure in which we duplicated the code for the inner for loop to increase the time between CPU branch instructions.
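The three C-level steps described above (integer math, pointer access, and loop unrolling) might look roughly like the following when combined. This is a hedged sketch, not the code we timed: the function name, the row-by-row traversal, and the unroll factor of four are all assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical sketch: rows[y] points to the dx pixels of row y. */
double variance_int_unrolled(const uint8_t *const *rows, int dx, int dy)
{
    /* Integer accumulators; 64 bits because the sum of squares of a
       1023x1023 8-bit image does not fit in 32 bits. */
    uint64_t sum = 0;
    uint64_t sumsq = 0;

    for (int y = 0; y < dy; y++) {
        const uint8_t *p = rows[y];      /* pointer walk replaces rat[y][x] */
        const uint8_t *end = p + dx;

        /* Unrolled by four: one loop branch per four pixels. */
        while (end - p >= 4) {
            uint32_t a = p[0], b = p[1], c = p[2], d = p[3];
            sum   += a + b + c + d;
            sumsq += a * a + b * b + c * c + d * d;
            p += 4;
        }
        /* Leftover pixels when dx is not a multiple of four. */
        while (p < end) {
            uint32_t a = *p++;
            sum   += a;
            sumsq += a * a;
        }
    }

    /* Floating point is used only once, for the final division. */
    double n = (double)dx * (double)dy;
    if (n > 1)
        return (n * (double)sumsq - (double)sum * (double)sum) / n / (n - 1.0);
    return 0;
}
```

The leftover loop matters for a width such as 1023, which is not a multiple of four; an odd image size is a convenient way to check that an unrolled routine still handles every pixel.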

Moving to version 5 required a rewrite in assembly language. This version ran slightly slower than the best C-language version, but it provided the base from which we began code optimization that would eventually include MMX operations. Unrolling the assembly-language code in version 5 produced version 6.

Version 7, the first MMX code, required a drastic recoding of the assembly-language code from version 6. To optimize version 7 of the algorithm so it best used MMX hardware, we turned to Intel’s VTune software-analysis tools (developer.intel.com/vtune/analyzer/). This product analyzes assembly-language code and determines how efficiently the code executes on the CPU.

VTune Optimizes MMX Code
An Intel PentiumPro CPU can execute pairs of operations simultaneously, but many rules govern which instruction pairs work well and which do not.

The VTune program helped us determine which operations we could pair to further optimize the MMX code. We used the resulting MMX code in version 8. Coding algorithms to take advantage of MMX hardware requires careful redesign of algorithms and careful coding in assembly language.

As you may have deduced from the data in Table 1, you frequently can increase processing speeds just by carefully redesigning the code you already have. Although the final MMX version of our test algorithm operates 10 times faster than the original C algorithm, most of the increase came in the first two optimizations.

In most cases, optimizing existing code yields the greatest returns in the first few optimizations. If you don’t need to squeeze out every bit of performance afforded by MMX while you wait for new software tools, try optimizing the code you already have. T&MW

Fernando Serra works as the Vision Group Manager at Imaging Technology and he is responsible for vision algorithms and software tools. He received a B.S.E.E. degree from Wentworth Institute of Technology in 1986. fernando@imaging.com.

©2008 Reed Business Information, a division of Reed Elsevier Inc. All rights reserved.