The challenge of multisite test
Translating the economic benefits of parallel memory test to non-memory devices.
Greg Smith, Consultant -- Test & Measurement World, 2/1/2006
Tables: Memory vs. non-memory devices; Factors affecting test cost
It seems simple. If you want to cut the cost of test for an IC, you should double or even quadruple the number of devices you test in parallel. After all, memory manufacturers have proven the value of this technique beyond a shadow of doubt—it's becoming standard practice to test 128 DRAMs in parallel (Ref. 1). Multisite testing has reduced the capital cost of a DRAM test site from $400k in 1997 to about $27k in 2004, allowing test costs for memories to stay constant or even decrease, even though densities have increased from 64 Mbits to 1 Gbit over the same period.
Why shouldn't the same math apply to all devices? To some extent it does, but most test engineers know that things are never as simple as they seem. Memories and SOCs are very different, and analyzing those differences can help you see why increasing the number of test sites may not necessarily result in cost savings on some testers.
In previous generations of non-memory testers, the combination of pin count, BIST/DFT features, and mixed-signal cores in non-memory devices conspired to keep most production test solutions to a maximum of dual site. Most testers did not provide the specialized features that allow the tester to independently synchronize to multiple devices in parallel, robbing efficiency from multisite solutions. Only recently have some ATE manufacturers delivered testers with digital and mixed-signal instruments and architectures dense enough to support massive multisite for non-memory devices.
The two key differences between memory and non-memory devices are test times and total production volumes. Big production runs and long test times make memories ideally suited for massively parallel testing. A memory tester equipped with 128 sites testing a 1-Gbit memory with a 128-s test time will have an output of about 3600 units per hour (UPH). A non-memory tester with four sites testing a device with a 4-s test time has the same throughput. An attempt to reduce test cost for this device by going to a 16-site test would theoretically produce 14,400 UPH, but other facets of production are likely to limit the payback from creating a massive multisite test solution.
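The throughput figures above follow from simple arithmetic, which can be sketched as below. The function name `ideal_uph` is illustrative, and the model assumes perfect parallelism with no index time, jams, or lot changes.

```python
def ideal_uph(sites: int, test_time_s: float) -> float:
    """Ideal units per hour: every site completes one device per test time."""
    return sites * 3600.0 / test_time_s


print(ideal_uph(128, 128.0))  # memory tester, 128 sites, 128-s test: 3600.0
print(ideal_uph(4, 4.0))      # non-memory tester, 4 sites, 4-s test: 3600.0
print(ideal_uph(16, 4.0))     # same device at 16 sites: 14400.0
```

This is the theoretical ceiling only; the rest of the test cell determines how much of it is actually reachable.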
Multisite efficiency
The relative efficiency of testing memories and SOCs can be very different (Ref. 2). The efficiency of memory testers is largely irrelevant due to the algorithmic nature of memory test and the ability to generate test stimulus and process test results in per-site hardware. The test list for memories consists of a small number of tests that have long execution times, so relatively little time is spent setting up the tester compared with the actual test time. Non-memory devices, conversely, have test lists that can be thousands of tests long, and each test may require only a few milliseconds to perform.
Bottlenecks in the tester architecture become more and more noticeable as the number of sites increases. Every element of the tester design must be optimized for multisite efficiency. The efficiency of DC tests depends upon the ability to quickly sequence these tests under pattern control, eliminating any serial programming of tester hardware. The efficiency of mixed-signal tests depends upon the ability to move and analyze captured data quickly while testing continues in the foreground. The efficiency of many mixed-signal and digital tests depends upon the ability of the tester to independently synchronize (or match) on each site in parallel; otherwise, these tests must be done serially. In other words, the tester must be designed from the ground up with a fully parallel architecture.

Figure 1. High efficiency is crucial to low cost of test. A tester must be more than 75% efficient to provide any real benefit beyond quad site. To be cost-effective beyond eight sites, a tester must be more than 90% efficient.
Figure 1, adapted from Ref. 2, shows that a tester must be more than 75% efficient to provide any real benefit beyond quad site. To be cost-effective beyond eight sites, a tester must be more than 90% efficient. Only a parallel architecture tester will be able to achieve this level of efficiency in production.
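The article does not spell out how efficiency is calculated. A minimal sketch, assuming the commonly used definition in which efficiency E relates the N-site test time to the single-site test time by T_N = T_1 × (1 + (N − 1)(1 − E)), shows why the gain flattens at lower efficiencies. The function names are illustrative, and index time is ignored here.

```python
def multisite_test_time(t1_s: float, sites: int, efficiency: float) -> float:
    """N-site test time under the assumed definition of multisite efficiency."""
    return t1_s * (1.0 + (sites - 1) * (1.0 - efficiency))


def throughput_gain(sites: int, efficiency: float) -> float:
    """Throughput relative to single site (test time only, no index time)."""
    return sites / (1.0 + (sites - 1) * (1.0 - efficiency))


for eff in (0.75, 0.90, 0.95):
    gains = {n: round(throughput_gain(n, eff), 2) for n in (1, 2, 4, 8, 16)}
    print(f"{eff:.0%} efficient: {gains}")
```

Under this assumed definition, a 75% efficient tester gains roughly 2.3 times at quad site but only about 3.4 times at 16 sites, which is in line with the trend described for Figure 1.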
A crucial element of the test cell is the device handler. Pick-and-place handlers (P&P handlers) that can handle a large number of package types and a wide range of device pin counts are often used for non-memory devices. These handlers can be easily changed from one package style to another, and many support testing at ambient, cold, and hot temperatures. Inside the handler, the device goes through four basic stages:
- waiting to be tested in an input tray;
- being loaded into a carrier in the handler and brought to the correct temperature for testing;
- being placed into the test socket, tested, and then placed back into the carrier; and
- being sorted, with good devices and bad devices placed into separate output trays.
P&P handlers are able to perform all four of these processes in parallel. The handler takes into account anticipated thermal soak times and presumed test times to determine how many devices to queue in the soak chamber and how many devices to sort in parallel. Like the tester, the handler represents a reasonable tradeoff between throughput and expense.
Two main factors determine the throughput of a handler:
- Index time, the time required to remove tested devices from the test sockets and install fresh, untested devices. For P&P handlers, index times range from 0.4 to 0.8 s. Index time must be added to the test time of the device when calculating throughput. On some handlers, index times increase with the number of parallel sites.
- Maximum throughput, the maximum number of devices that the P&P handler can process in a given time period, if the actual test time is zero. The maximum throughput gives an indication of how many devices can be accommodated in the thermal soak chamber and how quickly devices can be sorted after testing is completed. Current handlers offer throughput of 5000 to 8000 UPH.
Unfortunately, the index time and the maximum throughput for most handlers differ depending on factors such as the number of parallel test sites, the size and type of the device package, and the dimensions of the device trays. Test temperature also can have a marked effect on the throughput.
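The interaction of the two factors can be sketched with a simple model: throughput is the lesser of what the tester can deliver (test time plus index time per insertion) and the handler's maximum. This is an assumed simplification with illustrative names; real handler throughput curves are more detailed.

```python
def cell_uph(sites: int, test_time_s: float, index_time_s: float,
             handler_max_uph: float) -> float:
    """Test-cell units per hour, capped by the handler's maximum throughput.

    test_time_s is the time to test all sites in one insertion.
    """
    tester_limited = sites * 3600.0 / (test_time_s + index_time_s)
    return min(tester_limited, handler_max_uph)


# Quad site, 4-s test, 0.5-s index, 7000-UPH handler: tester-limited.
print(cell_uph(4, 4.0, 0.5, 7000))   # 3200.0
# 16 sites with the same times: the handler becomes the bottleneck.
print(cell_uph(16, 4.0, 0.5, 7000))  # 7000
```

In the second case the tester could nominally deliver 12,800 UPH, but the handler caps the cell at 7000.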
Handler manufacturers will generally provide throughput curves specific to the handler model and change kit for each device type to help customers calculate expected performance. These curves represent the peak performance of the handler under ideal conditions. On a real production floor, variations in package dimensions and misalignments in carrier trays result in handler jams. Usually, an operator quickly clears these jams with a few keystrokes on the handler control panel, but while the jam condition exists, no material moves through the handler. Also, if the jam occurs in the thermal chamber or in the mechanism that presents the devices to the test sockets, the operator may need to open the handler to clear the jammed devices. Handler jams are specified using two parameters:
- Jam rate, the average number of devices processed between handler jams. For P&P handlers, jam rates will range from 1 in 10,000 to nearly 1 in 5000. How tightly device package dimensions are maintained can affect the jam rate, as can the weight of the device, because heavier devices are less likely to be mishandled. Test temperature also has an effect, with cold temperature testing tending to have the highest jam rates.
- Mean time to assist (MTTA), the amount of time required to clear a jam. Most of the time, a quick key press takes care of the jam in less than a minute, but a few jams require the operator to open the handler or break the setup and can bring down a test cell for as long as an hour. Also, the MTTA assumes that an operator is immediately available to service the test cell, instead of doing other work. For an operator working a few test cells, a reasonable MTTA is 2 to 5 min.
For multisite testing, it is crucial to remember that the jam rate is related purely to the number of devices being handled, so a test cell that moves more devices per hour will also jam more often per hour. Therefore, the selection of a handler with the lowest possible jam rate is critical to maximizing throughput. Also, the constraints will be very different for wafer probe test, where throughput can be much higher and jams are not an issue.
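A rough sketch of the jam penalty, assuming each device carries an average time cost of jam rate × MTTA on top of its share of the raw throughput (function names are illustrative):

```python
def uph_with_jams(raw_uph: float, jam_rate: float, mtta_min: float) -> float:
    """Units per hour after charging the average jam-clearing time per device."""
    hours_per_device = 1.0 / raw_uph + jam_rate * (mtta_min / 60.0)
    return 1.0 / hours_per_device


def jam_downtime_fraction(raw_uph: float, jam_rate: float,
                          mtta_min: float) -> float:
    """Fraction of wall-clock time the cell spends stopped for jams."""
    return 1.0 - uph_with_jams(raw_uph, jam_rate, mtta_min) / raw_uph


for raw in (1750, 3500, 7000):
    lost = jam_downtime_fraction(raw, jam_rate=1 / 5000, mtta_min=2.0)
    print(f"{raw} UPH raw: {lost:.1%} of time lost to jams")
```

With a jam rate of 1 in 5000 and a 2-min MTTA, the same handler loses roughly four times as large a share of its time at 7000 UPH as it does at 1750 UPH.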
Testing device lots
Testing ICs is a batch process. A batch, or "lot," of devices is loaded into a handler, tested, and unloaded. Then, a fresh lot is loaded, and so on. While the loading and unloading takes place, the test cell is idle.
When a lot completes, an operator summarizes the test results, unloads and labels the trays of good and bad devices, and loads fresh material into the handler. The amount of time needed for this end-of-lot (EOL) processing is almost independent of lot size but will vary depending on the level of automation. It also depends on the number of test cells each operator covers. If the operator is doing something else when a lot completes, a test cell will stand idle until loaded with fresh material. Informal manufacturer surveys indicate that a reasonable estimate for EOL processing is 5 to 10 min, mainly depending upon the number of test cells an operator manages.
The impact of this idle time on test cost is a function of the size of the lot and the amount of time required for EOL processing. The larger the lot, the longer it will take to run through the test cell, meaning that the test cell is idled less frequently and is therefore more efficient. If efficiency were the only thing driving lot size, then large lots would be the best choice. Unfortunately, lot size depends on factors that often push manufacturers to make lots smaller, not larger. Customers want to keep work-in-progress inventories low and are unwilling to accept large lots, and semiconductor manufacturers are reluctant to build large quantities and hold them in finished-goods inventory. In general, lot sizes are usually between 1000 and 10,000 devices. At the beginning of a production run, lots tend to be smaller, increasing in size as yields improve.
Consider a case where the throughput of a quad-site solution is 8000 devices per hour. A 2000-device lot could be tested in 15 min. If EOL processing takes an additional 10 min, during which the test cell is idle, then the test cell is idle 40% of the time, driving up the real cost of test dramatically. In contrast, if a single-site solution is implemented, testing the same 2000-unit lot may take 120 min. In this case, with the same 10-min EOL processing time, the test cell is idle only 8% of the time.
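The idle-time arithmetic in this example can be reproduced with a short sketch (the function name is illustrative):

```python
def idle_fraction(lot_size: int, uph: float, eol_min: float) -> float:
    """Share of total lot time that the test cell sits idle for EOL processing."""
    test_min = lot_size / uph * 60.0
    return eol_min / (test_min + eol_min)


# Quad site at 8000 UPH: 15 min of testing plus 10 min of EOL processing.
print(round(idle_fraction(2000, 8000, 10.0), 2))  # 0.4
# Single site taking 120 min for the same lot (equivalent to 1000 UPH).
print(round(idle_fraction(2000, 1000, 10.0), 2))  # 0.08
```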
Figure 2. Device 1, a wireless base-band device with embedded memory, shows steadily decreasing cost of test up to an octal site implementation, but device 2, a similar device but without embedded memory and produced in lower volumes, is 30% more expensive to test at octal site than at dual site.
The challenge is to understand all of these effects and determine what type of setup will be the most cost-effective. Two specific cases help tie the effects together:
- Device 1 is a wireless base-band device with a big embedded memory. It has 80 active pins, DACs, ADCs, and multiple processor cores. Because of the embedded memory, the test time is extremely long at 15 s. Demand for this device is high, and the expected production rate is approximately 1 million devices per month for the next year. Lot sizes are 5000 devices.
- Device 2 is a wireless-networking base-band device with the same pin count. It also has DACs, ADCs, and processor cores, but no embedded memory. Without the memory, and because of some effective DFT, test time is a blistering 5 s. Production is ramping up, and 10,000 devices will be shipped per month for the next year. Volumes are moderate—lot size is 1000 devices.
Our fictional test engineer has a tester that can be configured to test any number of sites from 1 to 16, and the multisite efficiency is a respectable 95%. She has selected a P&P handler with an index time of 0.5 s and a maximum throughput of 7000 devices/hr for quad, octal, and hex (16-site) configurations. For dual site, this handler has a throughput of 3500 devices/hr. For single site, throughput is 1750 devices/hr. The handler jam rate is 1 in 5000, and MTTA is 2 min.
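The scenario can be pulled together in a rough throughput sketch. It is not the cost model behind the figures, which would also need capital and operating costs that are not given here. It assumes the efficiency definition T_N = T_1 × (1 + (N − 1)(1 − E)), a 10-min EOL time (the upper end of the 5 to 10 min estimate quoted earlier), and a simple additive model for jams and lot changes. All names are illustrative.

```python
# Values stated in the scenario.
HANDLER_MAX_UPH = {1: 1750, 2: 3500, 4: 7000, 8: 7000, 16: 7000}
EFFICIENCY = 0.95
INDEX_S = 0.5
JAM_RATE = 1 / 5000
MTTA_MIN = 2.0
# Assumption: not stated for this scenario.
EOL_MIN = 10.0


def effective_uph(sites: int, t1_s: float, lot_size: int) -> float:
    """Devices per wall-clock hour, including jams and end-of-lot idle time."""
    # Multisite test time under the assumed efficiency definition.
    t_n = t1_s * (1 + (sites - 1) * (1 - EFFICIENCY))
    # Raw rate is limited by either the tester or the handler.
    raw = min(sites * 3600.0 / (t_n + INDEX_S), HANDLER_MAX_UPH[sites])
    lot_hours = (lot_size / raw                           # testing and indexing
                 + lot_size * JAM_RATE * MTTA_MIN / 60.0  # clearing jams
                 + EOL_MIN / 60.0)                        # end-of-lot idle time
    return lot_size / lot_hours


for name, t1_s, lot in (("Device 1", 15.0, 5000), ("Device 2", 5.0, 1000)):
    for n in (1, 2, 4, 8, 16):
        print(f"{name}, {n:2d} sites: {effective_uph(n, t1_s, lot):7.0f} UPH")
```

Under these assumptions, each doubling of sites yields well under twice the throughput for Device 2, because the fixed EOL time takes a growing share of each short lot.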
Figure 3. Lot size can have an effect on cost of test. Here, increasing lot size from 1000 to 10,000 cuts test cost 20%.
Minimizing the cost of test for complex non-memory devices requires more thought than just doubling the number of devices tested in parallel. Even though memory devices have shown that massive multisite is a valuable strategy to minimize cost of test, non-memory devices present a different challenge to test cell throughput. Table 2 includes some of the major factors that can influence the economics of multisite solutions.
The one constant in semiconductor test is change. Tester and handler manufacturers are constantly refining technology and working with device manufacturers to explore new technologies to break these barriers. Other types of handlers, including strip-test handlers and matrix handlers, have been developed to handle devices in groups rather than individually. Also, P&P handlers are constantly improving to offer higher throughputs, lower jam rates, and advanced features to minimize MTTA. As these technologies come on line, the economics of multisite testing will evolve, and reductions in cost of test will continue.
Author Information
Greg Smith was a consultant specializing in semiconductor ATE and handling systems when he wrote this article. He has now joined Teradyne in a technical marketing role. He previously held leadership roles in product development, marketing, and applications at LTX.
REFERENCES