
Big servers at Big Blue

IBM engineers perform a myriad of tests on the company's server products, with automation holding the key to success.

Martin Rowe, Senior Technical Editor -- Test & Measurement World, 6/1/2004

RESEARCH TRIANGLE PARK, NC—If you checked your e-mail, browsed the Web, or used eBay today, you probably used an IBM Intel-based eServer. The eServer series, which consists of IBM's xSeries, BladeCenter, and IntelliStation servers, delivers e-mail, hosts Web pages, and lets people around the world share files and printers. There might even be an IBM server in your facility.

Brian Trainor leads nearly 200 design-and-verification engineers and technicians who test IBM's BladeCenter and xSeries servers.


At a sprawling campus near Raleigh, the computer giant employs more than 14,000 people. As you'd expect, the facility is filled with the latest computer technology. Wireless LANs let employees carry their notebook computers (IBM only, of course) throughout a building. They check their e-mail in the halls and in conference rooms.

At any one time, the company has about 60 server designs undergoing design verification. (The sidebar "Serving up the servers" describes the product lines.) Brian Trainor, program director for system validation, leads a staff of about 100 employees and 100 contractors in developing and running hundreds of design-verification tests for the servers. Another 80 people who develop hardware and software test tools work under the direction of Trainor's peer-level manager, Sidney Chow, program director for servers and workstations. Chow's group also qualifies third-party peripherals for use with IBM eServers. "Cutting corners is not an option," says Chow. "Quality still takes priority over time to market." His staff has developed many tools that Trainor's group uses during design verification.

Test engineers take over

After design engineers are convinced that a server's hardware functions properly, the design-validation team takes over. The tests they perform run the gamut from EMC and signal-integrity tests to functional and interoperability tests under simulated traffic conditions.

Figure 1. IBM engineers develop test tools for several operating systems. The number of tools in use has increased in the last several years, with particular growth coming in Linux-based tools. Courtesy of IBM.

Why does IBM need nearly 300 people to perform design validation on 60 products? Each product must work with a myriad of peripherals, I/O cards, and operating systems (OSs). While you may think that IBM servers run just four operating systems (Windows, Linux, NetWare, and Unix), each of the four comes in multiple flavors, all of which need testing. Figure 1 shows that the number of test tools in use at IBM exploded from 1999 through 2003. The number of Microsoft-based tools grew from 2000 to 2002, with Linux-based tools seeing growth in 2003.

When design engineers first power up a prototype, they use NK—an IBM OS used for in-house testing. Through NK's command-line interface, engineers get access to hardware functions that are difficult or impossible to reach through a commercial OS.

Chow's engineers write software tools for Windows, Linux, Unix, and NK to test servers at the bus level, subsystem level, and system level. The test tools emulate processors, exercise I/O buses, and measure performance under high traffic conditions, and they also control test equipment to ensure automated, repeatable tests.

Tool-development lead and test-strategy engineer Will Atherton and some colleagues developed a tool for exercising a server's PCI and PCI-X I/O bus through an Agilent Technologies bus exerciser/analyzer card. Under software control, the card executes read and write functions, measures data-transfer times, and checks for system and parity errors. It also monitors the bus for throughput and detects protocol violations. "Our test scripts communicate to the card through Agilent's application programming interface (API)," notes Atherton. "That lets us add intelligence beyond what we can get from the card's supplied software and hardware."

Atherton also helped develop a custom adapter that connects the Agilent card to IBM's BladeCenter servers. These servers use a nonstandard PCI connector, which lets IBM engineers design servers with smaller-than-usual PCI expansion cards. "The custom adapter lets us leverage our technology investments, yet ensure the quality of our design," says Atherton.

Will Atherton develops test tools that automate design verification.

This adapter isn't the only one that IBM engineers use on the PCI bus. Engineers Candice Coletrane and Chuck Beyer use an adapter in the design-analysis lab where they study I/O bus waveforms for signal integrity.

The adapter, from Nexus Technology, provides bus access for a 136-channel Tektronix logic analyzer so the engineers can view bus timing. They also use the analyzer to measure parameters such as setup time, hold time, rise time, fall time, and overshoot, and they use its four-channel analog-output module to send the analog signals to a scope.

When Coletrane and Beyer need to measure signals at points other than on a server's I/O bus, they use other custom fixtures, one of which holds a card vertically while probe stands hold scope probes in place. Naturally, the two engineers rely on various software tools to automate test equipment, make measurements, and analyze data. Beyer notes that they control test equipment over both IEEE 488 and Ethernet links, but "IEEE 488 is fading out."

Keep cool

Environmental testing is important to IBM because, as Trainor notes, "Cooling is our greatest design challenge. Customers want ever more dense servers that run at the highest possible speeds." In the environmental stress and analysis lab, where temperature chambers line the walls, Raymond Greggs and others test the reliability of new servers.

When testing a server, Greggs uses custom automation software to control the UUT and the temperature chamber. IBM specifies that its servers will function properly at temperatures from 10°C to 35°C, so Greggs and his co-workers test beyond that range to ensure the servers operate as expected. A typical cycle consists of ramping a UUT's temperature from 45°C down to 0°C, powering off the UUT, then applying power and increasing the temperature in 10°C steps while exercising the UUT at each temperature.
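
The cycle described above can be sketched as a sequence of chamber setpoints. This is only an illustration, not IBM's automation software: the function name, the action labels, and the decision to finish with a step at the 45°C upper limit (10°C steps from 0°C do not land on 45°C exactly) are all assumptions.

```python
def thermal_cycle(high_c=45, low_c=0, step_c=10):
    """Return (action, temperature_c) steps for one hypothetical cycle."""
    if step_c <= 0 or high_c <= low_c:
        raise ValueError("need step_c > 0 and high_c > low_c")
    steps = [
        ("ramp_down", low_c),   # ramp the UUT from high_c down to low_c
        ("power_off", low_c),   # remove power at the cold extreme
        ("power_on", low_c),    # re-apply power while still cold
    ]
    temp = low_c
    while temp < high_c:
        steps.append(("exercise", temp))   # run the UUT at this setpoint
        temp += step_c
    steps.append(("exercise", high_c))     # assumed final step at the upper limit
    return steps


if __name__ == "__main__":
    for action, temp in thermal_cycle():
        print(f"{action:10s} {temp:3d} C")
```

A real controller would also need dwell times at each setpoint, which the article does not specify.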

Greggs also runs functional tests at the extremes of a server's power-supply voltages. A BladeCenter chassis, which holds 14 server "blades," uses up to four 1800-W power supplies that run on 220 VAC. For telecom applications, a BladeCenter runs at -48 VDC. (Table 1 shows the voltage ranges used in this test.)

IBM servers also must pass through another "environmental" lab on their way to production: EMC. The company's North Carolina facility has four 10-m chambers where 15 EMC engineers and technicians test for radiated emissions and immunity. Numerous smaller rooms house test facilities for conducted emissions, conducted immunity, and ESD immunity. EMC lab manager Randy Smith says that automated test software written in-house has greatly cut testing time and improved repeatability.

Not your average lab

Most of the testing that IBM engineers perform on servers takes place outside an environmental or EMC chamber. Subsystem and system-level tests take place in the "Superlab," a conglomeration of test sites that design engineers and test engineers share.

In the Superlab's Engineering Maintenance Effectiveness Test (EMET) area, engineers such as Quitisha Underwood test server designs for error handling. One test uses an American Arium in-circuit emulator that emulates the server's Intel processor. The emulator injects errors that the engineers can't inject through hardware, and it lets them see how a server handles those errors.

For example, the emulator can inject an error-correction-code memory error at a specified memory location. This allows the server's power-on self-test or diagnostics to detect the failure. In contrast, hardware-induced memory errors corrupt the whole memory module, preventing diagnostic detection. "An error-handling test takes about 2 to 3 hours," reports Underwood. "Without automated test tools, the same test would take days."

Another test stand in the EMET lab lets engineers evaluate a server's ability to detect and locate memory errors. A BladeCenter server, for example, can diagnose memory errors down to the individual DIMMs located on the blades. When a simulated catastrophic double-bit error occurs, an onboard LED lights to indicate a faulty memory module. Upon notification of an error, an engineer removes the troubled server blade from its chassis (no need to power down). A capacitor keeps enough power on the blade so an engineer can push a button and see the illuminated LED.
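
To see why a double-bit error is treated differently from a single-bit one, consider a single-error-correcting, double-error-detecting (SECDED) code. The article does not say which code IBM's memory subsystem uses, so the sketch below substitutes a textbook extended Hamming (8,4) code on a 4-bit value. The function names are hypothetical.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit extended Hamming (SECDED) codeword."""
    bits = [0] * 8                        # index 0: overall parity; 1..7: Hamming(7,4)
    for pos, shift in zip((3, 5, 6, 7), range(4)):
        bits[pos] = (nibble >> shift) & 1
    for p in (1, 2, 4):                   # parity bit p covers positions with bit p set
        bits[p] = sum(bits[i] for i in range(1, 8) if i & p and i != p) % 2
    bits[0] = sum(bits[1:]) % 2           # make the total parity even
    return bits


def check(bits):
    """Return (status, codeword); status is 'ok', 'corrected', or 'uncorrectable'."""
    syndrome = 0
    for i in range(1, 8):
        if bits[i]:
            syndrome ^= i                 # XOR of the positions of all set bits
    overall = sum(bits) % 2               # 0 while the total parity is still even
    if syndrome == 0 and overall == 0:
        return "ok", bits
    if overall == 1:                      # odd parity: treat as a single flipped bit
        fixed = list(bits)
        fixed[syndrome] ^= 1              # syndrome 0 means the overall parity bit
        return "corrected", fixed
    return "uncorrectable", bits          # even parity, nonzero syndrome: two flips


if __name__ == "__main__":
    word = encode(0b1011)
    single = list(word); single[5] ^= 1
    double = list(word); double[3] ^= 1; double[6] ^= 1
    print(check(word)[0], check(single)[0], check(double)[0])
```

With one flipped bit the syndrome points at the bad position and the word is repaired. With two flipped bits the code can only report that the data is bad, which is why such an error has to be traced to a module and flagged.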

Temperature is another cause of hardware failures, and BladeCenter chassis contain management modules that keep track of these faults. A BladeCenter's blades have temperature sensors on each of their processors (up to four per blade). The management modules monitor each sensor and store the data for analysis. Engineers can access the modules and their data through a browser.

A BladeCenter's management modules also control the chassis cooling fans and monitor the power-supply voltages. Firmware in the management modules stores measurement limits, and the modules must generate the proper error messages when faults occur.
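
The limit checking described above might look something like the following sketch. Everything here is hypothetical: the sensor names, the limit values, and the message wording are placeholders, not the contents of IBM's firmware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limit:
    low: float
    high: float


# Hypothetical limits; the real values are stored in management-module firmware.
LIMITS = {
    "cpu1_temp_c": Limit(0.0, 85.0),
    "psu1_volts": Limit(11.4, 12.6),
}


def check_readings(readings, limits=LIMITS):
    """Return one message per reading that has no limit or is out of range."""
    messages = []
    for sensor, value in sorted(readings.items()):
        limit = limits.get(sensor)
        if limit is None:
            messages.append(f"{sensor}: no limit defined")
        elif value < limit.low:
            messages.append(f"{sensor}: {value} below low limit {limit.low}")
        elif value > limit.high:
            messages.append(f"{sensor}: {value} above high limit {limit.high}")
    return messages


def verify_fault(readings, expected_fragment):
    """Test-tool side: feed in readings and confirm the expected message appears."""
    return any(expected_fragment in m for m in check_readings(readings))


if __name__ == "__main__":
    print(check_readings({"cpu1_temp_c": 95.0, "psu1_volts": 12.0}))
```

A test tool built along these lines would feed in a simulated out-of-range reading and pass only if the matching message comes back.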

A team of engineers uses software tools that simulate error conditions and verify that the server sends the correct error messages. To simulate hardware errors such as those on a PCI bus, IBM engineers use a remarkably simple tool: a manual switch box.

Putting it all together

While these low-level subsystem tests are important, they don't prove that a server is production worthy. An integral functions test (IFT) lets IBM engineers test the performance of major subsystems such as the PCI bus and other I/O ports.

In an IFT, engineers put new server designs through months of functional testing. They test subsystems, processors, memory cables, PCI/PCI-X buses, electrical and optical connections, and networking protocols.

To run an IFT, engineers simulate heavy user traffic conditions on numerous server configurations and operating systems. They use software tools to verify that:

  • a server runs Web servers, e-mail clients, TCP/IP stacks, and graphics applications;
  • device drivers work in multiple operating systems;
  • a server can update its BIOS upon receiving the data from a remote location;
  • a person can remotely view a server's boot-process functions; and
  • the server's PCI and PCI-X buses interact properly with peripherals.

When a new server completes these and other tests, IBM's test engineers are convinced that it works. But a server must also work with peripherals such as hard disk drives, CD-ROM drives, graphics cards, mice, LAN adapters, and more. Here, the marketplace simulation and stress (MSS) test takes over. It simulates a typical installation and brings together the discrete functional areas of the server to perform a final test.

Engineers in Sidney Chow's group test all IBM-manufactured peripherals to ensure they're "server proven." Other peripheral manufacturers want their products to work in IBM servers, too, so they perform their own interoperability testing as part of IBM's quality program.

When IBM certifies a peripheral as "server proven," it places the peripheral into a database that anyone can access from the Web (www.pc.ibm.com/us/compat). The company also maintains an internal compatibility database for operating systems, applications, and device drivers. Customers can use the public test data when specifying a server's configuration.

Table 1. BladeCenter voltage test conditions

Nominal condition      High test limit     Low test limit
240 VAC                265 VAC             180 VAC
–48 VDC                –38 VDC             –55 VDC
50–60-Hz AC cycles     63-Hz AC cycles     47-Hz AC cycles
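
As a worked example, the voltage rows of Table 1 can be written as data, with each test limit expressed as a signed percentage offset from nominal. The dictionary layout and function names are illustrative assumptions. The frequency row is left out because its nominal condition is a range rather than a single value.

```python
# Voltage rows of Table 1; the data layout is an illustrative choice.
TABLE_1 = {
    "ac_volts": {"nominal": 240.0, "high": 265.0, "low": 180.0},
    "dc_volts": {"nominal": -48.0, "high": -38.0, "low": -55.0},
}


def margin_pct(row, limit):
    """Signed offset of a test limit from nominal, as a percentage of |nominal|."""
    return 100.0 * (row[limit] - row["nominal"]) / abs(row["nominal"])


def within_limits(row, value):
    """True if an applied value lies between the low and high test limits."""
    return row["low"] <= value <= row["high"]


if __name__ == "__main__":
    for name, row in TABLE_1.items():
        high = round(margin_pct(row, "high"), 1)
        low = round(margin_pct(row, "low"), 1)
        print(f"{name}: high {high:+}%, low {low:+}%")
```

By this reckoning, the AC test window runs from about 10% above nominal to 25% below it.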


Partners in test
IBM uses test equipment from various companies, including:

Agilent Technologies
www.tm.agilent.com
American Arium
www.arium.com
Amplifier Research (now AR Worldwide)
www.amplifiers.com
Antenna Research
www.ara-inc.com
Computer Access Technology
www.catc.com
ETS-Lindgren
www.ets-lindgren.com
Fluke
www.fluke.com
Futureplus Systems
www.futureplus.com
Haefely
www.haefely.com
Nexus Technology
www.busboards.com
Pacific Power Source
www.pacificpower.com
Rohde & Schwarz
www.rohde-schwarz.com
Tektronix
www.tektronix.com
Thermo Keytek
www.thermokeytek.com
 

 

Serving up the servers

Consisting of several models and form factors, IBM eServer products—BladeCenter, xSeries, and IntelliStation systems—deliver e-mail, host Web pages, and connect users to shared resources such as disk drives and printers (www.pc.ibm.com/us/eserver/xseries/). IBM introduced the BladeCenter (figure) in 2002, making it the company's first chassis-based eServer. Most of my visit to IBM focused on the BladeCenter line.

A BladeCenter consists of a chassis that can hold up to 14 server "blades." Courtesy of IBM.

A BladeCenter consists of a 7U-size chassis that contains management modules that manage conditions for up to 14 server cards, or "blades," containing one or two processors (blades with four processors require two slots). A blade contains as much processing power as a single 1U-size rack-mounted server. Thus, a single 7U-size BladeCenter can handle as much traffic as 14 1U-size rack-mounted servers. A fully loaded BladeCenter chassis can support up to 65,100 users (Ref 1).
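
The density claim reduces to simple arithmetic, sketched here with the figures quoted above. The per-blade user count assumes the 65,100-user maximum divides evenly across 14 blades, which the article does not state.

```python
CHASSIS_HEIGHT_U = 7       # BladeCenter chassis height, from the article
BLADES_PER_CHASSIS = 14    # maximum blades per chassis, from the article
MAX_USERS = 65_100         # fully loaded chassis, per Ref 1

# Fourteen 1U rack-mounted servers would occupy 14U of rack space.
equivalent_1u_space = BLADES_PER_CHASSIS * 1
space_ratio = equivalent_1u_space / CHASSIS_HEIGHT_U

# Assumes users are spread evenly across the blades.
users_per_blade = MAX_USERS / BLADES_PER_CHASSIS

if __name__ == "__main__":
    print(space_ratio, users_per_blade)
```

On these numbers, a chassis delivers the same server count in half the rack space.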

Also in the eServer line, xSeries servers come in rack-mounted boxes, while IntelliStation servers come in stand-alone tower configurations. Depending on the model, an eServer board can support up to eight Intel Xeon processors (scalable to 16), memory up to 64 Gbytes, and hard disks up to 1761 Gbytes. All servers connect to other network components over Gigabit Ethernet links.

Reference
  1. "Worldwide configuration and operations guide," www.pc.ibm.com/us/eserver/xseries/pdf/cogfeb2004-ww.pdf, p. 9.