Lost Frames

Definition

A lost frame is one that was sent on the network but was not successfully received by the destination node.  When a frame cannot be received, it is dropped by one of the networking components involved at the receiving node.  Frames may also be dropped inside the network equipment (hubs, routers or switches), but the receiving node has no way of knowing when such frame loss has occurred.

From a trace standpoint, a lost frame is one that is dropped by the receiving node.  A trace of network activity where a frame is dropped inside the network (and NOT at the receiving node) is still a valid trace even without that frame BECAUSE THIS IS EXACTLY WHAT THIS NODE HAS EXPERIENCED.  However, if this node itself cannot receive all frames sent and thus the trace is missing some frames that the node should have received, then this is considered a lost frame.

The only frames in danger of being lost are inbound frames.  This is because an outbound frame is already queued in the memory of the trace machine and won't be dequeued until it is successfully sent to the physical media.  On a receiving node, there is a bottleneck in trying to copy the frame from the physical network into the queues managed by protocol drivers.  If this bottleneck is overrun then the trace will not contain all of the frames it should contain.  This is a situation which this document is attempting to avoid.

Types of Lost Frames

There are 3 types of frame loss.  Each of these can be categorized by its source:

Type 1
  Source: LAN Hardware
  Root Cause: Volume of arriving traffic to the LAN adapter is greater than the adapter can process or queue.
  Constraint: Adapter throughput and/or adapter buffers
  Reportable by NTRACE: Sometimes [1]

Type 2
  Source: NDIS Software - possibly NTRACE
  Root Cause: Volume of arriving traffic to the NDIS software environment is greater than the NDIS drivers can process.  If lost frames are not occurring when tracing is not operational, then the likely component at fault is Network Trace.
  Constraint: CPU
  Reportable by NTRACE: Sometimes [1]

Type 3
  Source: NTRACE
  Root Cause: Volume of arriving traffic to Network Trace is greater than can be queued in its available "intermediate" buffers.
  Constraint: RAM and/or GDT Selectors [2]
  Reportable by NTRACE: Yes

Processing Speed versus Buffering

It should be understood that the only reason to queue traffic stems from the lack of available CPU cycles to process the frames at the time they arrive.  In other words, if the system can process the arriving frames as fast as they arrive, then no intermediate buffering is ever needed!  Buffering (queueing) is a technique for handling a temporary lack of CPU cycles.

Type 3 frame loss is reported by NTRACE during tracing (see below).  If this type of frame loss occurs infrequently, at times of peak traffic volumes, then increasing the intermediate buffers will be an effective way of dealing with this condition.

However, if type 3 frame loss is reported in ever increasing numbers of frames, this would indicate that buffering is NOT an effective way of addressing this issue.  This is indicative of a severe and chronic lack of CPU cycles.  In other words, it indicates that the frame loss is really a type 2 loss and not a true type 3.
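The difference between a temporary peak and a chronic lack of CPU cycles can be illustrated with a small queue model.  The following is only a sketch and is not taken from Network Trace itself: the function name, the per-interval model and all of the numbers are assumptions chosen for illustration.

```python
def simulate(arrivals, service_rate, buffer_slots):
    """Return the cumulative lost-frame count after each interval.

    arrivals      -- frames arriving in each interval (hypothetical values)
    service_rate  -- frames the CPU can process per interval
    buffer_slots  -- frames that can wait in intermediate buffers
    """
    queued = 0
    lost = 0
    history = []
    for arriving in arrivals:
        queued += arriving
        # The CPU drains what it can during this interval.
        queued -= min(queued, service_rate)
        # Whatever does not fit in the buffers is dropped.
        if queued > buffer_slots:
            lost += queued - buffer_slots
            queued = buffer_slots
        history.append(lost)
    return history


burst = [50, 50, 400, 50, 50, 50]   # one short peak, otherwise light traffic
chronic = [150] * 12                # arrivals always exceed the service rate

for name, traffic in (("burst", burst), ("chronic", chronic)):
    for slots in (100, 400):
        print(name, "buffers:", slots, "lost:", simulate(traffic, 100, slots))
```

In this model the larger buffer pool absorbs the short peak completely, which corresponds to infrequent type 3 loss.  Under sustained overload the larger pool only delays the first lost frame, and the cumulative count then grows every interval, which is the pattern that points to a type 2 condition.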

Likely Scenarios for Encountering Frame Loss

All types of frame loss are most likely to be encountered during periods of heavy traffic.  This is also known as high network utilization.  High utilization might be defined as utilization which exceeds 40% for a significant period of time (multiple seconds at a time).  However, this is environment specific and is only to be used as a guideline.  Examples of such conditions:
  1. WSOD/RIPL "boot storms"
  2. LAN segments with:

Determining When Frame Loss Occurs

Both type 1 and type 2 frame loss manifest themselves by the hardware reporting that frames have been lost.  Without help from the hardware, there is no way for Network Trace to know how many frames were dropped because the hardware or software was "busy".  In addition, Network Trace has no way of determining whether the frame loss reported by the hardware is due to "purely hardware" issues or due to the software being too busy to handle new frames.  Likewise, Network Trace is completely dependent upon the LAN hardware for the accuracy of these reported statistics.  It is possible that the LAN hardware could be understating its statistics if it is overrun severely.

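During tracing, Network Trace displays statistics at 15 second intervals.  One column in this report tracks frame loss and is appropriately called "Lost".  This column can indicate both kinds of loss: type 3 loss through its numeric value, and hardware-reported loss (types 1 and 2) through a character displayed to its right.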

If the numeric value in this column (on the "Frames" line) is greater than zero, then type 3 frame loss (due to a lack of intermediate buffers) is occurring.  Network Trace is able to keep an accurate count of this type of frame loss and this number is the cumulative number of lost frames through this interval.   In addition, a summary of the buffer usage will be displayed at the end of the trace, to give an indication of the number of buffers needed to prevent this type of frame loss.

In addition, there may be a character displayed to the right of the lost column (again, on the "Frames" line):
 
Character: a space (no visible character)
Meaning: No hardware frame loss was detected.

Character: ?
Meaning: Network Trace is reporting that in this reporting interval, it cannot obtain updated statistics from the LAN hardware.  This will only occur on hardware that requires statistics to be updated "On Demand" (see the next section).  When Network Trace attempted to force a statistics update, the LAN hardware returned a failure and no valid statistics could be measured in this interval.  This is an indication that frame loss is to be suspected but cannot be confirmed.

Character: *
Meaning: Network Trace has obtained notification of hardware frame loss during this interval.
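The reading of the "Lost" column can be summarized as a small decision function.  This is a sketch only: the function name is hypothetical, and it assumes the numeric value and the trailing character have already been read from the "Frames" line, since the exact report layout is not reproduced here.

```python
def interpret_lost(count, flag):
    """Describe the Lost column of one "Frames" line.

    count -- numeric value in the Lost column (cumulative type 3 loss)
    flag  -- character to the right of the column: " ", "?" or "*"
    """
    findings = []
    if count > 0:
        findings.append(
            f"type 3: {count} frames lost so far (lack of intermediate buffers)"
        )
    if flag == "*":
        findings.append("type 1 or 2: hardware reported frame loss in this interval")
    elif flag == "?":
        findings.append("hardware statistics unavailable: loss suspected, not confirmed")
    elif flag.strip():
        raise ValueError(f"unexpected indicator character: {flag!r}")
    return findings or ["no frame loss reported"]


for sample in ((0, " "), (12, " "), (0, "*"), (12, "?")):
    print(sample, "->", interpret_lost(*sample))
```

The two indicators are independent, so a single line can report type 3 loss and hardware loss at the same time.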

The query option for NTRACE.EXE can be used to determine if the current LAN hardware supports statistics. This is important because Network Trace cannot detect hardware frame loss without assistance from the hardware itself.  By definition, if the hardware is dropping frames then the software will not have any way of tracking this, except by the statistics/counters which the hardware prepares.  By running NTRACE -q, a report is generated with an entry for "Statistics".  There are 3 possible values for this entry:
 
Reported Support: YES
Meaning: The LAN hardware is advertising that it will report on frame loss AND that its statistics are always up to date.
NTRACE Can Report Type 1 and Type 2 Loss: Yes

Reported Support: On Demand
Meaning: The LAN hardware is advertising that it will report on frame loss BUT its statistics must be forced to update.  NTRACE will properly update the statistics every report interval to handle this situation.
NTRACE Can Report Type 1 and Type 2 Loss: Yes

Reported Support: NO
Meaning: The LAN hardware reports that it does not maintain statistics.  There is no way for Network Trace to access this information because this information is not maintained by the hardware.
NTRACE Can Report Type 1 and Type 2 Loss: No

If an '*' is displayed during tracing, a full report will be printed of all reported hardware frame loss.  The following indicators MAY be present IF the hardware supports statistics AND IF the value of the associated statistic was changed while tracing was in operation.

The following is displayed regardless of the type of LAN hardware:

Network hardware errors detected during this session:
        Total frames with CRC error              x
        Total frames discarded - no buffer space x
        Total frames discarded - hardware error  x

If using Ethernet LAN hardware:

        Total frames with alignment error        x
        Total frames with overrun error          x

If using Token-Ring LAN hardware:

        Total frames with FCS or code violation  x
        Frames recognized - no buffer available  x
        5 half-bit time transition absences      x
        A/C errors                               x
        Frame copied errors                      x

Statistics for which there is no change during tracing are not displayed.
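The "only changed statistics are shown" rule amounts to comparing the hardware counters at the start and at the end of tracing.  The sketch below illustrates that idea; it is not Network Trace's implementation.  The function names and the sample counter values are hypothetical, and showing the increase (rather than the final counter value) is an assumption.

```python
def changed_statistics(before, after):
    """Return {name: increase} for counters whose value changed."""
    return {
        name: after[name] - before.get(name, 0)
        for name in after
        if after[name] != before.get(name, 0)
    }


def format_report(changes):
    """Render the changed counters; unchanged ones never reach this point."""
    if not changes:
        return ""
    lines = ["Network hardware errors detected during this session:"]
    for name, delta in changes.items():
        lines.append(f"        {name:<41}{delta}")
    return "\n".join(lines)


# Hypothetical counter snapshots taken before and after a trace session.
before = {
    "Total frames with CRC error": 3,
    "Total frames discarded - no buffer space": 10,
    "Total frames discarded - hardware error": 0,
}
after = {
    "Total frames with CRC error": 3,
    "Total frames discarded - no buffer space": 42,
    "Total frames discarded - hardware error": 0,
}

print(format_report(changed_statistics(before, after)))
```

With these sample values only the "no buffer space" line is printed, because the other two counters did not move during the session.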

Minimizing Frame Loss

  1. Type 1: If the frame loss is due to a purely hardware related condition:
    1. If switching technologies are in use, tracing can be taken from a port for which only a subset of other ports are copied.  This will dramatically reduce the number of frames (i.e. the LAN utilization) at the trace machine.  See the LAN Switch Port Monitoring section for more details.
    2. Ensure that a hardware failure or defect is not hampering tracing.  A faulty network adapter or sometimes even a faulty hub or switch might contribute to lost frames in the network.  If frame loss is occurring inside the network (not in the tracing node) then this is not a problem in the trace itself but is a real "production" problem needing resolution!
    3. While not always practical, replacing the LAN hardware being used can make a big difference.  This pertains to both the LAN adapter and the network hub or switch.
  2. Type 2: If the frame loss is due to a software "busy" condition (Remember: this will manifest itself as hardware loss and so it is not possible to determine the difference between this and purely hardware related frame loss):
    1. If currently using protocol mode, consider using service mode.  This is the single most significant change that can be made to resolve this problem.  The service mode has been specially designed and optimized to minimize or eliminate this type of frame loss.  It has been tested on extremely heavy network volumes without dropping frames.
    2. Stop all applications not required to recreate the problem.  If you are running Network Trace on a server, stop all non-critical services.  This will reduce the CPU utilization on this machine.
    3. While not always practical, increasing the speed of the CPU on the trace machine will help in this case.
    4. Move the Network Trace to a dedicated computer to do the trace. A workstation with adequate memory, a fast processor and minimal network traffic directed at it is preferred.  This is often referenced as a "3rd party install" and represents the optimal conditions for running Network Trace.  Tracing from a machine that is not participating in the communication flows on the local network can reduce or eliminate lost frames.
    5. Consider running in slice mode.  If you do not require the full frames in the resulting trace, the -s parameter may be used to invoke slice mode.  This will reduce the amount of time Network Trace must spend copying any particular frame.  This is especially critical in situations where very large frame sizes are encountered.  In these cases Network Trace will spend relatively more time copying very large frames, and many small frames may be lost during this period if the network is heavily loaded.  There is no specific size which constitutes "large"; this is environment specific.  Since Token-Ring can have frame sizes up to 18,000 bytes, whereas Ethernet is limited to a maximum of 1518 bytes, the problem may be more prevalent in Token-Ring environments.
  3. Type 3: If frame loss is due to lack of buffers:
    1. Increase the number of global memory segments (Segs parameter).
    2. The proper number of segments can be determined using:
    3. If the number you require is greater than the MaxBufSegs value set in PROTOCOL.INI you will need to change this and reboot the system.
    4. Consider running in slice mode. If you do not require the full frames in the resulting trace, the -s parameter may be used to invoke slice mode.  This will reduce the number of buffers needed by reducing the amount of data to be copied.
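The effect of slice mode on copying work and buffer usage can be estimated with simple arithmetic.  The sketch below is illustrative only: the function name, the traffic mix and the 128 byte slice size are assumptions, not values taken from Network Trace.

```python
def bytes_copied(frame_sizes, slice_size=None):
    """Total bytes copied for the given frames.

    slice_size=None models full-frame capture; otherwise at most
    slice_size bytes of each frame are copied.
    """
    if slice_size is None:
        return sum(frame_sizes)
    return sum(min(size, slice_size) for size in frame_sizes)


# Hypothetical mix: a few maximum-size Token-Ring frames among many small ones.
frames = [18000] * 10 + [64] * 1000

full = bytes_copied(frames)
sliced = bytes_copied(frames, slice_size=128)   # assumed slice size

print("full frames :", full, "bytes")
print("sliced      :", sliced, "bytes")
print("reduction   :", round(100 * (1 - sliced / full)), "%")
```

Frames already smaller than the slice size cost the same either way, so the saving comes almost entirely from the large frames.  This is why slice mode helps most where very large frame sizes are encountered.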

© 2000 Golden Code Development Corporation.  ALL RIGHTS RESERVED