7 Warning Signs Your Network is About to Fail – Part 1
I am often shown some very basic information, such as a page of ping or traceroute responses or a slow-reacting PC screen, and asked if the network responsible for this evidence is facing impending doom. Sometimes these are an indication of an underlying problem, but generally they are false positives, for any number of reasons. When network problems become noticeable to your users to the point of causing them regular headaches, you could already be facing a complex problem. Complex problems have a habit of creeping up on us and biting us when we least expect them. Sometimes we can identify a specific event that coincided with the problem appearing, but when we roll back that action the problem is still there. I have witnessed these situations many times. The last event was the straw that broke the camel’s back, and now that it is broken it seems impossible to repair. These situations can be very frustrating and very costly to resolve.
It would be great to have a crystal ball and know exactly when our networks were about to let us down but, as we all know, these do not exist. The next best thing is to have effective network monitoring. The history of our circuits and services gives us a record against which we can compare current trends. This comparison can also be automated, with alarms raised whenever a configured threshold is met or exceeded.
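As a rough illustration of threshold-based alarming, the Python sketch below checks the latest utilisation sample for a link against a fixed threshold and against its own history. The function name, the 80% threshold and the 1.5x baseline factor are assumptions made for this example, not values taken from any particular monitoring product.

```python
from statistics import mean


def check_utilisation(history, current, threshold_pct=80.0, baseline_factor=1.5):
    """Return a list of alarm messages for one link utilisation sample.

    history: past utilisation samples in percent (hypothetical data).
    current: the latest utilisation sample in percent.
    """
    alarms = []

    # Fixed threshold: alarm when the threshold is met or exceeded.
    if current >= threshold_pct:
        alarms.append(
            f"utilisation {current:.0f}% has reached the {threshold_pct:.0f}% threshold"
        )

    # Trend check: alarm when the sample is well above the historical average.
    if history:
        baseline = mean(history)
        if current > baseline * baseline_factor:
            alarms.append(
                f"utilisation {current:.0f}% is well above the baseline of {baseline:.0f}%"
            )

    return alarms


if __name__ == "__main__":
    past_samples = [30, 35, 32, 28]  # illustrative values only
    for sample in (40, 85):
        print(sample, check_utilisation(past_samples, sample))
```

A real monitoring system would take its samples from the devices themselves and would normally look at sustained trends rather than a single reading, but the comparison it makes is essentially this one.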
But what if we don’t have a crystal ball or network monitoring? Are we doomed? No, of course not! But we do need to have a plan ready to action should our network let us down, and we need to keep our eyes and ears peeled, as we may just pick up on some early warnings that our network is under pressure and potentially about to fail. Our network users may be our best sources of performance information in the absence of any network monitoring facilities.
In part 1 of this blog I take a look at some of the potential problems that can and do occur, and in part 2 I look at 7 warning signs your network may be about to fail. I have identified many problems like these when I have been asked to troubleshoot networks on behalf of third parties; the same type of problem often crops up across many diverse networks. Other than definite equipment failures, many problems are the result of congestion: something doesn’t perform as well as expected, so the data gets held up. Therefore, I mainly talk about congestion when referring to network problems.
Local area networks tend to run at high or very high speeds, so even if they are getting very congested this may not become apparent to users. Even when congested, LAN switches can still move frames and packets from one port to another at very high speed. This is exactly what they are designed for: constantly moving data frames and packets from one port to another. However, when a bottleneck causes one or more of the switches to buffer data, that can affect applications and become noticeable to users.
In my experience LAN congestion usually only lasts for short periods. There are exceptions to this, but these are usually very extreme congestion situations or bottlenecks resulting from poor configuration. Data moves at very high speed across LANs so problems tend to dissipate very quickly too. However, that doesn’t make these problems any less frustrating to users.
Symptoms include momentary application freezes that may occur regularly throughout the day or during peak periods, such as the start and end of the morning and afternoon, when the majority of people log in and out of their applications or print. In schools, academies and colleges this may coincide with the beginning and end of every lesson period. Alternatively, the freezes may be random, occurring several times throughout the day but with no obvious tie-in to any business activity.
The most common causes of LAN congestion are usually easy to spot. For example, if you have a host supporting a number of virtual servers connected to the LAN at 1Gbps and you have 200 users across your office, each connected to the LAN switch at 1Gbps, you may have a problem with contention on the switch port connected to the host. I hasten to add, this may not actually be a problem, as so many factors have to be taken into consideration, including the function of the virtual servers, the type of business, the applications, etc. Even so, on the face of it, a physical port speed contention ratio of 200:1 would certainly make me investigate this further.
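The 200:1 figure is simply the total access port speed divided by the speed of the port everyone is trying to reach. A minimal Python sketch of that arithmetic follows; the function and parameter names are my own, chosen for illustration.

```python
def contention_ratio(client_count, client_port_mbps, shared_port_mbps):
    """Worst-case ratio of total client port speed to the shared port speed."""
    return (client_count * client_port_mbps) / shared_port_mbps


if __name__ == "__main__":
    # 200 users at 1Gbps each, all reaching a host connected at 1Gbps.
    ratio = contention_ratio(client_count=200, client_port_mbps=1000, shared_port_mbps=1000)
    print(f"{ratio:.0f}:1")
```

This is a worst-case figure based purely on physical port speeds. It says nothing about how much traffic the users actually generate, which is why a high ratio is a prompt to investigate rather than proof of a problem.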
While on the subject of port contention, some LAN congestion issues occur at the access port. Wireless Access Points have been around for many years and historically they have been considered access type devices. However, wireless access speeds have increased with each generation, to the point where many Access Points are now capable of exceeding the wired port speed. If we have Access Points capable of wireless speeds of over 100Mbps, we are introducing unnecessary contention if we connect them to the LAN via a 100Mbps switch port.
Another cause of momentary apparent LAN congestion is frame flooding. Frame flooding occurs when a switch does not have a record of the destination MAC address. If a switch has no record of the MAC address, it has no option other than to flood the frame to all ports in an attempt to deliver the data. Flooding can also be a symptom of asymmetric routing when the core of the network comprises dual layer three switches and the MAC address table expiration (typically 5 minutes) does not match the ARP table expiration (typically 4 hours). I have seen this on several third party troubleshooting engagements.
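To make the flooding behaviour concrete, here is a deliberately simplified Python model of a learning switch with MAC address ageing. The class, its method names and the port numbers are hypothetical; a real switch also deals with VLANs, broadcasts and hardware table limits, none of which are modelled here.

```python
class LearningSwitch:
    """Toy model of MAC learning, ageing and unknown-destination flooding."""

    def __init__(self, ports, mac_ageing_s=300):
        self.ports = list(ports)
        self.mac_ageing_s = mac_ageing_s  # 5 minutes, the typical value mentioned above
        self.table = {}  # MAC address -> (port, time last seen as a source)

    def forward(self, src_mac, dst_mac, in_port, now):
        """Return the list of ports the frame is sent out of."""
        # Learn (or refresh) the sender's location.
        self.table[src_mac] = (in_port, now)

        entry = self.table.get(dst_mac)
        if entry is not None and now - entry[1] <= self.mac_ageing_s:
            return [entry[0]]  # known destination: deliver to a single port

        # Unknown or aged-out destination: flood to every other port.
        self.table.pop(dst_mac, None)
        return [p for p in self.ports if p != in_port]


if __name__ == "__main__":
    switch = LearningSwitch(ports=[1, 2, 3, 4])
    print(switch.forward("A", "B", in_port=1, now=0))    # B unknown: flooded
    print(switch.forward("B", "A", in_port=2, now=1))    # A known: port 1 only
    print(switch.forward("A", "B", in_port=1, now=2))    # B known: port 2 only
    print(switch.forward("A", "B", in_port=1, now=400))  # B aged out: flooded again
```

The last call shows the timer mismatch in miniature: a sender that still holds a valid ARP entry keeps addressing frames to B, but because B has not been seen as a source for more than five minutes the switch has forgotten where B lives and floods every frame.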
Wireless LAN Problems
Where do I start? Wireless LANs are susceptible to a plethora of airborne interference sources ranging from other Access Points in your own network and external networks, other wireless devices in your network or nearby, microwave ovens, mobile phones, and more. If the Wireless LAN is installed and configured correctly for your specific environment, in most cases you should expect trouble free operation. If it isn’t, you are potentially in trouble!
Establishing whether a problem you are experiencing is caused by the wireless network or the underlying infrastructure should be relatively simple, assuming you have a device that has the option of a wired connection as well as wireless. It is vitally important to understand that the quality of wireless interfaces varies considerably across devices, even between devices from the same manufacturer.
In contrast to the LAN congestion symptoms described above, Wide Area Networks can suffer considerably with congestion, and when they do the congestion tends to last for an extended period. This is because the WAN links are typically much slower than LAN links so the congestion takes a lot longer to clear. I have seen WAN links become congested and remain congested for the entire business day, preventing users from logging in to their systems and accessing mission critical applications.
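A back-of-the-envelope calculation shows why link speed matters so much here. The Python sketch below estimates how long a queued backlog takes to drain at a given link speed, assuming no new traffic arrives in the meantime; the backlog size and link speeds are illustrative figures of my own.

```python
def drain_time_seconds(backlog_megabytes, link_mbps):
    """Time to clear a backlog over a link, ignoring overheads and new arrivals."""
    backlog_megabits = backlog_megabytes * 8
    return backlog_megabits / link_mbps


if __name__ == "__main__":
    backlog_mb = 500  # hypothetical amount of queued data
    for name, speed_mbps in (("1Gbps LAN link", 1000), ("10Mbps WAN link", 10)):
        seconds = drain_time_seconds(backlog_mb, speed_mbps)
        print(f"{name}: {seconds:.0f} seconds")
```

In practice new traffic keeps arriving while the backlog drains, so on a slow link the real recovery time is longer still, which is consistent with congestion that lasts all day.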
WAN congestion can be very expensive to resolve, but it is important to note that simply throwing more money and bandwidth at the problem is rarely the best course of action. The most important thing to do is to get a handle on your traffic. We need to understand the traffic flows intimately. When we know what needs to traverse the WAN, congestion may be reduced by filtering out unnecessary traffic and by applying Quality of Service. It may also be possible to reduce WAN congestion by educating users in appropriate use of IT. In some of my WAN performance troubleshooting engagements I have discovered that the majority of WAN traffic (almost 100% in some instances) has been recreational traffic: users watching videos and playing internet radio channels to the point where the business applications are totally down.
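Getting a handle on traffic usually starts with a simple breakdown of who or what is using the link. As a sketch only, the Python below totals byte counts per traffic category and reports each category's share. The categories and byte counts are invented for the example; in practice they would come from whatever flow or traffic reporting your equipment provides.

```python
from collections import defaultdict


def traffic_share(flows):
    """Return each category's percentage of total bytes.

    flows: iterable of (category, byte_count) pairs (hypothetical records).
    """
    totals = defaultdict(int)
    for category, byte_count in flows:
        totals[category] += byte_count

    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {category: 100 * count / grand_total for category, count in totals.items()}


if __name__ == "__main__":
    sample_flows = [
        ("business application", 150),
        ("video streaming", 600),
        ("internet radio", 200),
        ("business application", 50),
    ]
    shares = traffic_share(sample_flows)
    for category, pct in sorted(shares.items(), key=lambda item: item[1], reverse=True):
        print(f"{category}: {pct:.0f}%")
```

A breakdown like this makes it obvious whether the link needs more bandwidth or whether filtering, Quality of Service or a conversation with users would be the better fix.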
Virtual Private Networks typically carry data in encrypted tunnels. They can suffer with similar problems to WANs (some WANs are created using site to site VPNs) but the additional layers of tunnelling and encryption protocols add to the complexity of the data flow and can introduce problems that would not otherwise be there.
Internet congestion is possibly the most common out of all of the congestion types because there are so many people accessing the Internet for a variety of reasons. As with WAN congestion, if the situation is particularly bad it can take a long time for the congestion to clear. Typical symptoms are slow response, web pages not rendering or web pages failing to open.
As with any problem determination, the most important first step is to localise the problem. This can require a lengthy process of elimination, but time spent on this process is rarely wasted. There is a lot you can do to shorten this process; the obvious time saver is an accurate and detailed diagram. I cannot recall a single occasion when I have been asked to investigate network performance issues and there has been a current (or any) network diagram. Some of the diagrams I have been given have been worse than nothing because they were so misleading I would have been better off just diving straight in and creating the diagram as I progressed. I can remember some very frustrating conversations with Heads of IT in fairly large organisations when they realised their network team were unable to produce a diagram showing their connectivity between two or more of their sites.