Protocol Analyzers Test Fibre Channel Systems
A protocol analyzer can help you isolate and analyze problems in Fibre Channel networks.
Paul Levin and Greg Beutler, Xyratex International, Irvine, CA -- Test & Measurement World, 8/1/1999
Fibre Channel systems are used in computer networks that transfer large quantities of information from place to place. Such networks might transfer video information from workstation to workstation, or between workstations and arrays of disks. Fibre Channel provides plenty of bandwidth, operates over long distances, and offers all the “hooks” necessary to let users mix packets of video, audio, graphics, and control information using a variety of protocols, such as Internet protocol (IP) or SCSI. (See “Key Fibre Channel Features.”) Thus, it provides an excellent medium for many types of computer networks.

But someone must ensure that a Fibre Channel network operates properly. The day will come when a Fibre Channel network doesn’t work as well as expected. Response times may start to climb, or occasional video or audio packets may arrive at a destination late or fail to arrive at all. You must be ready to find out why a network isn’t working properly and how to get it back on track.

The most basic tool you can use to test a Fibre Channel system is a protocol analyzer, which may include performance-monitoring software. When you insert a protocol analyzer in a Fibre Channel loop or network, it acts only as an observer. It neither initiates nor terminates any “traffic,” nor does it alter the network’s traffic in any way.

A Fibre Channel network employs unidirectional links that go in a daisy chain, often called a loop, from device to device. To use a protocol analyzer to monitor the performance or behavior of a device in a network, you monitor the information that goes into the device and the information that comes out of it. Consequently, all Fibre Channel protocol analyzers and performance monitors are two-port devices (Fig. 1). In the example shown in the figure, the protocol analyzer uses Port A to monitor the traffic flowing to the storage system and uses Port B to monitor the traffic the storage system transmits back to the workstation.
Stamp Your Traffic

A performance monitor (usually software that runs on a protocol analyzer) indicates traffic levels, traffic statistics, and basic error conditions for the information going past the analyzer (Fig. 2). Specifically, Fibre Channel performance measurements include:

• data rates in bytes/s and frames/s;
• link utilization;
• traffic characteristics;
• error conditions;
• code violations (CV) or “illegal” 10-bit codes;
• cyclic redundancy check (CRC) failures; and
• loop-initialization-procedure events.
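As a rough illustration of how a performance monitor might derive some of these figures, the following Python sketch computes data rates, link utilization, and a CRC-failure count from the frames observed during a measurement window. The FrameRecord type, the summarize function, and the 100-Mbyte/s default capacity are assumptions made for this example; they do not describe any real analyzer's software or capture format.

```python
from dataclasses import dataclass

# Assumed usable capacity of the link in bytes/s (illustrative default only).
ASSUMED_CAPACITY_BYTES_PER_S = 100_000_000


@dataclass
class FrameRecord:
    """Hypothetical record of one observed frame, not a real capture format."""
    length_bytes: int    # total frame length in bytes
    crc_ok: bool = True  # False if the frame failed its CRC check


def summarize(frames, window_s, capacity_bytes_per_s=ASSUMED_CAPACITY_BYTES_PER_S):
    """Summarize the frames seen during a window of window_s seconds."""
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    total_bytes = sum(f.length_bytes for f in frames)
    bytes_per_s = total_bytes / window_s
    return {
        "bytes_per_s": bytes_per_s,
        "frames_per_s": len(frames) / window_s,
        "utilization": bytes_per_s / capacity_bytes_per_s,  # fraction of capacity
        "crc_failures": sum(1 for f in frames if not f.crc_ok),
    }
```

A real monitor would compute these values continuously over successive windows; the sketch handles a single window to keep the arithmetic visible.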
Because Fibre Channel data rates approach 100 Mbytes/s, a protocol analyzer cannot save every piece of information it reads, at least not for long. So, instrument suppliers provide selective triggering that lets users choose the specific pieces of traffic to save. At its simplest, this type of trigger is analogous to a trigger on an oscilloscope. Unlike a scope, though, the protocol analyzer’s trigger lets it start and stop acquiring traffic many times so it can collect separate but similar sections of traffic for later analysis.

Usually, acquisition of traffic information starts when the trigger conditions match either particular fields within frames or protocol signals that indicate specific events. If your protocol analyzer provides performance-monitor functions (Fig. 2), you can have the analyzer trigger acquisitions based on the occurrence of specific throughput values or error conditions.

Filters Select Traffic

To aid in capturing the proper traffic, instruments provide data filters. A filter might limit the instrument to capturing traffic only from a particular source. The filter also can pass only particular types of commands or responses, or it can keep just the first n bytes of each frame.

When a Fibre Channel system includes several sets of computers, workstations, and arrays of disks that operate over a fabric, you’ll need several sets of test gear that must operate in concert. (Fabric is a Fibre Channel term that simply means a system of routers and switches.) In many cases, you won’t have access to the internal fabric connections, so you will have to synchronize your test instruments, usually with a direct cable connection, so you can identify and correlate what happens on one part of a network with what happens on another part (Fig. 3).
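The triggering and filtering behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Frame type, the capture function, and its parameters are hypothetical, and a real Fibre Channel frame header carries many more fields than the three shown here. The sketch filters by source, keeps only the first n bytes of each frame, and restarts acquisition each time the trigger condition matches.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """Hypothetical, heavily simplified frame used only for this sketch."""
    source_id: int
    dest_id: int
    payload: bytes


def capture(frames, source_id=None, first_n_bytes=None,
            trigger=None, frames_per_trigger=2):
    """Return the frames an analyzer would save under these settings.

    source_id: if set, frames from any other source are ignored.
    first_n_bytes: if set, only that many payload bytes are kept per frame.
    trigger: optional function taking a Frame and returning True on a match.
             Each match starts (or restarts) acquisition for
             frames_per_trigger frames, so acquisition can start and stop
             many times over one run.
    """
    saved = []
    remaining = 0  # frames still to save from the most recent trigger
    for frame in frames:
        if source_id is not None and frame.source_id != source_id:
            continue  # filtered out before the trigger sees it
        if trigger is None:
            remaining = 1  # no trigger configured: save every frame
        elif trigger(frame):
            remaining = frames_per_trigger
        if remaining > 0:
            kept = frame.payload[:first_n_bytes]  # None keeps the whole payload
            saved.append(Frame(frame.source_id, frame.dest_id, kept))
            remaining -= 1
    return saved
```

In this sketch the source filter is applied before the trigger, so filtered-out frames can never start an acquisition. That ordering is a design choice made for the example, not a statement about how any particular instrument behaves.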
Now that you know more about the tools available to help solve Fibre Channel problems, how can you apply them? First, you need to check the integrity of the Fibre Channel network itself. Error logs maintained by workstations and smart hubs in the fabric can indicate problems. If your Fibre Channel system already includes a protocol analyzer, the analyzer will provide log information, too. (Some users leave an analyzer constantly connected to a network.) The logged information helps you assess whether the network’s electrical or optical links are operating properly.

Fibre Channel’s stated objective is to operate with a bit error rate of less than 1 in 10^12 bits, or roughly 3 to 4 errors/hr on a fully loaded 1.0625-Gbaud link. Most Fibre Channel users report error rates considerably lower than that. If the error log reports unexpected loop-initialization procedures or more than one or two code-violation or CRC errors in an hour, you’ll need to examine the integrity of the entire network. Loop-initialization procedures (LIPs), required steps that ensure a network is restored and functioning properly, generally don’t occur unless the network contains a defective device or the network has a break. LIPs also occur when someone adds a device to or removes a device from a Fibre Channel network.

Check Network Integrity First

The Accredited Standards Committee ANSI-X3T11 is now working to resolve the problem of identifying sources of errors in a network. The original standard provided for error reporting using a Link Error Status function, but due to ambiguities in the standard, no manufacturer implemented error reporting in a Fibre Channel product. Until manufacturers provide error-reporting functions, you have no easy way to poll the network to determine which port is the first in the network to detect errors. So, as a first step in testing a network, measure the signal power at the receiving end of the network to determine if it’s below the expected value.
If it is, the network probably suffers from a link-integrity problem. If network integrity is fine, yet the network still doesn’t function properly, a protocol analyzer and performance monitor will help locate the source of errors. Start at the network’s origin and use the protocol analyzer to bridge each device on the network, one at a time, until the protocol analyzer captures errors worth analyzing (more than the few-per-hour rate). You’ll see a sharp increase in the error rate when you bridge the offending part of the network.

Look for an Overload

To help analyze the cause of a network overload, set a performance monitor’s threshold to trigger the protocol analyzer at, say, 90% of network capacity. Analysis of events just before and through a period of peak use may indicate why so much traffic was trying to get onto the network at once.

Even if the Fibre Channel network doesn’t reach full capacity overall, an individual device on the network may overload. Overloading can occur when one device gets burdened with so many I/O requests that it cannot properly process them. At the same time, other arrays of disks may remain idle. A protocol analyzer can collect long sequences of frame headers alone, which you can then analyze. If one device seems to be particularly busy, isolate its traffic and study it in more detail. The study may show that the network administrator needs to replace a disk storage system with a faster unit, or that a data structure should be spread across several storage systems to apportion network traffic more evenly among them.

As a preventive measure, monitor the capacity of the network for all the devices at least weekly to start. As you gain confidence in your measurements, you can stretch monitoring periods. Data from monitoring a network’s activity can reveal increased overall Fibre Channel use or increased use of a particular device in the network.
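The two overload checks described above, a utilization threshold for the whole network and a per-device breakdown built from captured frame headers, might look something like the following Python sketch. The function names, the (destination ID, frame length) tuple layout, and the sample figures are all assumptions made for illustration; an actual analyzer would export header data in its own format.

```python
from collections import Counter


def utilization_trigger(bytes_per_s_samples, capacity_bytes_per_s, threshold=0.90):
    """Return the index of the first sample at or above the threshold, else None.

    threshold is a fraction of capacity; 0.90 mirrors the 90% example in the text.
    """
    for index, rate in enumerate(bytes_per_s_samples):
        if rate / capacity_bytes_per_s >= threshold:
            return index
    return None


def load_by_device(headers):
    """Return each device's share of the captured bytes.

    headers: iterable of (dest_id, length_bytes) tuples, a hypothetical
    reduction of captured frame headers to the two fields needed here.
    """
    totals = Counter()
    for dest_id, length_bytes in headers:
        totals[dest_id] += length_bytes
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {device: count / grand_total for device, count in totals.items()}
```

For example, utilization_trigger([50_000_000, 80_000_000, 95_000_000], 100_000_000) picks out the third sample, and a device whose share from load_by_device is far larger than its neighbors' is a candidate for the closer study the text recommends.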
If you notice an increasing response time of storage units (the time it takes to respond to a request for data), or if you notice a missing-data effect (discarded frames during periods of congested traffic), you should capture detailed information continuously. Protocol analyzers provide a wrap mode that captures data continuously by writing the newest information over the oldest. The analyzer stops capturing traffic when a preset condition triggers it. In this way, you can analyze the traffic on the network up to and including the trigger condition. Remember that you can set up a protocol analyzer or performance monitor to trigger on many different conditions or faults to help you determine the cause of problems.

Home in on Errors

If you need still more detailed information, run additional tests, but save only frames corresponding to a particular device ID, thereby capturing more frames over a longer period for the device that you suspect of causing problems. Then, when you find the frames that seem to cause problems, adjust triggering to capture more traffic in the vicinity of those frames. T&MW

FOR FURTHER READING

More information about Fibre Channel is available from the Fibre Channel Loop Community (Saratoga, CA), an organization that supports Fibre Channel education and standards, www.fibrechannel.com.




















