Lost Frames
Definition
A lost frame is one that was sent on the network but was not successfully
received by the destination node. When a frame cannot be received,
it is dropped by one of the networking components involved at the receiving
node. Frames may also be dropped inside the network equipment (hubs,
routers or switches), but the receiving node has no way of knowing when
such frame loss has occurred.
From a trace standpoint, a lost frame is one that is dropped by the
receiving node. A trace of network activity in which a frame is dropped
inside the network (and NOT at the receiving node) is still a valid trace
even without that frame, because it records exactly what this node experienced.
However, if this node itself cannot receive all of the frames sent to it, so
that the trace is missing frames that the node should have received, then
those frames are considered lost.
The only frames in danger of being lost are inbound frames. This
is because an outbound frame is already queued in the memory of the trace
machine and won't be dequeued until it is successfully sent to the physical
media. On a receiving node, there is a bottleneck in copying
the frame from the physical network into the queues managed by protocol
drivers. If this bottleneck is overrun, the trace will not contain
all of the frames it should contain. This is the situation that this
document is intended to help avoid.
Types of Lost Frames
There are 3 types of frame loss. Each of these can be categorized
by its source:
Type | Source | Root Cause | Constraint | Reportable by NTRACE
1 | LAN Hardware | Volume of arriving traffic to the LAN adapter is greater than the adapter can process or queue. | adapter throughput and/or adapter buffers | Sometimes [1]
2 | NDIS Software - possibly NTRACE | Volume of arriving traffic to the NDIS software environment is greater than the NDIS drivers can process. If lost frames are not occurring when tracing is not operational, then the likely component at fault is Network Trace. | CPU | Sometimes [1]
3 | NTRACE | Volume of arriving traffic to Network Trace is greater than can be queued in its available "intermediate" buffers. | RAM and/or GDT Selectors [2] | Yes
Processing Speed versus Buffering
It should be understood that the only reason to queue
traffic stems from the lack of available CPU cycles to process the frames
at the time they arrive. In other words, if the system can process
the arriving frames as fast as they arrive, then no intermediate buffering
is ever needed! Buffering (queueing) is a technique for handling
a temporary lack of CPU cycles.
Type 3 frame loss is reported by NTRACE during
tracing (see below). If this type of frame loss occurs infrequently,
at times of peak traffic volumes, then increasing the intermediate buffers
will be an effective way of dealing with this condition.
However, if type 3 frame loss is reported in ever
increasing numbers of frames, this would indicate that buffering is NOT
an effective way of addressing this issue. This is indicative of
a severe and chronic lack of CPU cycles. In other words, it indicates
that the frame loss is really a type 2 loss and not a true type 3.
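The difference between occasional and chronic type 3 loss can be illustrated
with a small simulation. The following Python sketch is not part of Network
Trace. The function name `simulate`, the tick-based model and all of the
numbers are assumptions chosen purely for illustration.

```python
def simulate(arrivals, service_per_tick, buffer_slots):
    """Return the cumulative lost-frame count after each tick.

    arrivals         -- number of frames arriving in each tick (hypothetical)
    service_per_tick -- frames the CPU can process per tick (hypothetical)
    buffer_slots     -- intermediate buffer capacity, in frames (hypothetical)
    """
    queued = 0
    lost = 0
    history = []
    for arriving in arrivals:
        queued += arriving
        # The CPU processes as many frames as it can during this tick.
        queued -= min(queued, service_per_tick)
        # Frames left over must fit in the buffers; the excess is dropped.
        if queued > buffer_slots:
            lost += queued - buffer_slots
            queued = buffer_slots
        history.append(lost)
    return history


if __name__ == "__main__":
    burst = [5, 5, 30, 5, 5, 5]   # one short peak, otherwise light traffic
    chronic = [15] * 12           # arrivals always exceed the processing rate

    print("burst,   10 buffers:", simulate(burst, 10, 10))
    print("burst,   20 buffers:", simulate(burst, 10, 20))
    print("chronic, 20 buffers:", simulate(chronic, 10, 20))
    print("chronic, 40 buffers:", simulate(chronic, 10, 40))
```

In this model, doubling the buffers removes the loss caused by the short
burst. When arrivals permanently exceed the processing rate, doubling the
buffers only delays the first drop and the lost count keeps climbing, which
corresponds to the "really a type 2 loss" case described above.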
Likely Scenarios for Encountering Frame Loss
All types of frame loss are most likely to be encountered
during periods of heavy traffic.
This is also known as high network utilization. High utilization
might be defined as utilization which exceeds 40% for a significant period
of time (multiple seconds at a time). However, this is environment
specific and is only to be used as a guideline. Examples of such
conditions:
- WSOD/RIPL "boot storms"
- LAN segments with:
  - large numbers of nodes
  - heavy traffic involving large frames
- batch oriented large file transfers
- loading of large network applications
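The 40% guideline can be turned into a rough check. The sketch below is an
illustration only: the function names, the 3 second window (one reading of
"multiple seconds") and the 10 Mbit/s link speed in the usage lines are all
assumptions, and the calculation ignores per-frame overhead such as preambles
and inter-frame gaps.

```python
def utilization_percent(bytes_seen, interval_seconds, link_bits_per_second):
    """Approximate utilization of a shared LAN segment over one interval."""
    bits_seen = bytes_seen * 8
    return 100.0 * bits_seen / (interval_seconds * link_bits_per_second)


def sustained_high(samples, threshold=40.0, min_consecutive=3):
    """Return True if utilization exceeds the threshold for several
    consecutive samples (for example, one sample per second)."""
    run = 0
    for value in samples:
        run = run + 1 if value > threshold else 0
        if run >= min_consecutive:
            return True
    return False


if __name__ == "__main__":
    # 500,000 bytes observed in one second on a 10 Mbit/s segment (assumed).
    print(utilization_percent(500_000, 1, 10_000_000))
    # Per-second utilization samples (made-up values).
    print(sustained_high([10, 45, 50, 20, 60]))
    print(sustained_high([10, 45, 50, 55, 20]))
```

Because the guideline is environment specific, both the threshold and the
window length are left as parameters.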
Determining When Frame Loss Occurs
Both type 1 and type 2 frame loss manifest themselves by the hardware reporting
that frames have been lost. Without help from the hardware, there
is no way for Network Trace to know how many frames were dropped because
the hardware or software was "busy". In addition,
Network Trace has no way of determining whether the frame loss reported
by the hardware is due to "purely hardware" issues or due to the software
being too busy to handle new frames. Likewise, Network Trace is completely
dependent upon the LAN hardware for the accuracy of these reported statistics.
It is possible that the LAN hardware could be understating its statistics
if it is overrun severely.
During tracing, Network Trace displays statistics at 15 second intervals.
One column in this report, appropriately called "Lost", tracks frame loss.
This column can indicate both buffer-related (type 3) frame loss and
hardware-reported (type 1 and type 2) frame loss.
If the numeric value in this column (on the "Frames"
line) is greater than zero, then type 3 frame loss (due to a lack of intermediate
buffers) is occurring. Network Trace is able to keep an accurate
count of this type of frame loss and this number is the cumulative number
of lost frames through this interval. In addition, a summary
of the buffer usage will be displayed at the end of the trace, to give
an indication of the number of buffers needed to prevent this type of frame
loss.
In addition, there may be a character displayed to the right of the
lost column (again, on the "Frames" line):
Character | Meaning
(a space/no visible character) | No hardware frame loss was detected.
? | Network Trace is reporting that in this reporting interval, it cannot obtain updated statistics from the LAN hardware. This will only occur on hardware that requires statistics to be updated "On Demand" (see the next section). When Network Trace attempted to force a statistics update, the LAN hardware returned a failure and no valid statistics could be measured in this interval. This is an indication that frame loss is to be suspected but cannot be confirmed.
* | Network Trace has obtained notification of hardware frame loss during this interval.
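The rules for reading the "Lost" column can be summarized in a few lines of
Python. This is only a sketch for readers who post-process the interval
reports by hand or with their own scripts. The function `interpret_lost` is
hypothetical and does not parse real NTRACE output, because the exact report
layout is not described here.

```python
def interpret_lost(lost_count, indicator):
    """Explain one interval's "Lost" value on the "Frames" line.

    lost_count -- the cumulative number shown in the "Lost" column
    indicator  -- the character to the right of the column: " ", "?" or "*"
    """
    findings = []
    if lost_count > 0:
        findings.append(
            "type 3 loss: %d frames dropped so far for lack of "
            "intermediate buffers" % lost_count
        )
    if indicator == "*":
        findings.append(
            "hardware-reported loss (type 1 or type 2) in this interval"
        )
    elif indicator == "?":
        findings.append(
            "hardware statistics unavailable in this interval; "
            "frame loss is suspected but unconfirmed"
        )
    elif indicator.strip():
        raise ValueError("unexpected indicator: %r" % indicator)
    return findings


if __name__ == "__main__":
    for count, flag in [(0, " "), (12, "*"), (0, "?")]:
        print(count, repr(flag), interpret_lost(count, flag))
```

Note that the number and the character are independent: an interval can show
a non-zero count, an indicator character, both, or neither.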
The query option for NTRACE.EXE can
be used to determine if the current LAN hardware supports statistics. This
is important because Network Trace cannot detect hardware frame loss without
assistance from the hardware itself. By definition, if the hardware
is dropping frames then the software will not have any way of tracking
this, except by the statistics/counters which the hardware prepares.
By running NTRACE -q, a report is generated with an entry for "Statistics".
There are 3 possible values for this entry:
Reported Support | Meaning | NTRACE Can Report Type 1 and Type 2 Loss
YES | The LAN hardware is advertising that it will report on frame loss AND that its statistics are always up to date. | Yes
On Demand | The LAN hardware is advertising that it will report on frame loss BUT its statistics must be forced to update. NTRACE will properly update the statistics every report interval to handle this situation. | Yes
NO | The LAN hardware reports that it does not maintain statistics. There is no way for Network Trace to access this information because this information is not maintained by the hardware. | No
If an '*' is displayed during tracing, a full report of all
hardware-reported frame loss will be printed. The following
indicators MAY be present IF the hardware supports statistics AND IF the
value of the associated statistic changed while tracing was in operation.
The following is displayed regardless of the type of LAN hardware:
  Network hardware errors detected during this session:
    Total frames with CRC error                     x
    Total frames discarded - no buffer space        x
    Total frames discarded - hardware error         x
If using Ethernet LAN hardware:
    Total frames with alignment error               x
    Total frames with overrun error                 x
If using Token-Ring LAN hardware:
    Total frames with FCS or code violation         x
    Frames recognized - no buffer available         x
    5 half-bit time transition absences             x
    A/C errors                                      x
    Frame copied errors                             x
Statistics for which there is no change during
tracing are not displayed.
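The "only changed statistics are displayed" behavior amounts to comparing the
hardware counters at the start and at the end of tracing. The sketch below
shows that comparison under stated assumptions: the function name
`changed_counters` and the counter values are made up, and reporting the
increase (rather than the final total) is a choice made for this example, not
a statement about what NTRACE prints in place of "x".

```python
def changed_counters(before, after):
    """Return {label: increase} for counters that changed while tracing.

    before, after -- dicts mapping a statistic label to its counter value,
                     sampled at the start and at the end of tracing.
    """
    return {
        label: after[label] - before.get(label, 0)
        for label in after
        if after[label] != before.get(label, 0)
    }


if __name__ == "__main__":
    # Labels are taken from the report above; the values are invented.
    start = {
        "Total frames with CRC error": 3,
        "Total frames discarded - no buffer space": 0,
        "Total frames with overrun error": 1,
    }
    end = {
        "Total frames with CRC error": 3,
        "Total frames discarded - no buffer space": 17,
        "Total frames with overrun error": 4,
    }
    for label, increase in changed_counters(start, end).items():
        print("%-45s %d" % (label, increase))
```

Counters whose value did not move, such as the CRC error count in this
example, are omitted from the result.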
Minimizing Frame Loss
- Type 1: If the frame loss is due to a purely hardware related condition:
  - If switching technologies are in use, tracing can be taken from a port
    for which only a subset of other ports are copied. This will dramatically
    reduce the number of frames (i.e. the LAN utilization) at the trace
    machine. See the LAN Switch Port Monitoring section for more details.
  - Ensure that a hardware failure or defect is not hampering tracing.
    A faulty network adapter or sometimes even a faulty hub or switch might
    contribute to lost frames in the network. If frame loss is occurring
    inside the network (not in the tracing node) then this is not a problem
    in the trace itself but is a real "production" problem needing resolution!
  - While not always practical, replacing the LAN hardware being used can
    make a big difference. This pertains to both the LAN adapter and the
    network hub or switch.
    - LAN adapters differ greatly in their ability to handle high utilization.
      Busmastering adapters and adapters that use DMA transfers offer more
      robust throughput and queuing capabilities.
    - Moving to a switched LAN will reduce the network utilization to its
      minimum levels. Moving to a switch does not normally help LAN nodes with
      shared services like a server because these devices are almost always
      the end point in every network session.
- Type 2: If the frame loss is due to a software "busy" condition (Remember:
  this will manifest itself as hardware loss and so it is not possible to
  determine the difference between this and purely hardware related frame
  loss):
  - If currently using protocol mode, consider using service mode.
    This is the single most significant change that can be made to resolve
    this problem. The service mode has been specially designed and optimized
    to minimize or eliminate this type of frame loss. It has been tested
    on extremely heavy network volumes without dropping frames.
  - Stop all applications not required to recreate the problem. If you are
    running Network Trace on a server, stop all non-critical services.
    This will reduce the CPU utilization on this machine.
  - While not always practical, increasing the speed of the CPU on the trace
    machine will help in this case.
  - Move Network Trace to a dedicated computer to do the trace. A workstation
    with adequate memory, a fast processor and minimal network traffic
    directed at it is preferred. This is often referred to as a "3rd party
    install" and represents the optimal conditions for running Network Trace.
    Tracing from a machine that is not participating in the communication
    flows on the local network can reduce or eliminate lost frames.
  - Consider running in slice mode. If you do not require the full frames in
    the resulting trace, the -s parameter may be used to invoke slice mode.
    This will reduce the amount of time Network Trace must spend copying any
    particular frame. This is especially critical in situations where
    very large frame sizes are encountered. In these cases Network Trace
    will spend relatively more time copying very large frames, and many small
    frames may be lost during this period if the network is heavily loaded.
    There is no specific size which constitutes "large"; this is environment
    specific. Since Token-Ring can have frame sizes up to 18,000
    bytes, whereas Ethernet is limited to a maximum of 1518 bytes, the problem
    may be more prevalent in Token-Ring environments.
- Type 3: If frame loss is due to lack of buffers:
  - Increase the number of global memory segments (Segs parameter).
    - The proper number of segments can be determined using:
      - The information under the heading of System Memory Usage in this
        manual provides a guide to calculate the proper number of segments.
      - A report of the number of buffers used and an estimate of the number
        of additional buffers needed is displayed at the end of every trace.
        This is "real world" feedback and can be useful, but it is only a
        single data point.
    - If the number you require is greater than the MaxBufSegs value set in
      PROTOCOL.INI, you will need to change this value and reboot the system.
  - Consider running in slice mode. If you do not require the full frames in
    the resulting trace, the -s parameter may be used to invoke slice mode.
    This will reduce the number of buffers needed by reducing the amount of
    data to be copied.
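The effect of slice mode on copying and buffering can be estimated with simple
arithmetic. The sketch below is illustrative only: the function name
`bytes_copied`, the 128 byte slice length and the sample frame sizes are
assumptions, and it further assumes that copy time and buffer consumption
grow roughly in proportion to the number of bytes copied. How the slice
length is actually specified with the -s parameter is not covered here.

```python
def bytes_copied(frame_sizes, slice_length=None):
    """Total bytes copied for a series of frames.

    frame_sizes  -- sizes of the arriving frames, in bytes
    slice_length -- if given, only this many leading bytes of each frame
                    are copied (hypothetical stand-in for slice mode)
    """
    if slice_length is None:
        return sum(frame_sizes)
    return sum(min(size, slice_length) for size in frame_sizes)


if __name__ == "__main__":
    # One maximum-size Token-Ring frame, two small frames and one
    # maximum-size Ethernet frame (sizes taken from the text above).
    frames = [18000, 64, 64, 1518]
    full = bytes_copied(frames)
    sliced = bytes_copied(frames, slice_length=128)
    print("full frames :", full, "bytes")
    print("slice mode  :", sliced, "bytes")
    print("reduction   : %.1f%%" % (100.0 * (full - sliced) / full))
```

The saving comes almost entirely from the large frames, which is why slice
mode matters most where very large frame sizes are common.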
© 2000 Golden Code Development Corporation. ALL RIGHTS RESERVED