Did Teams Go Down For You?

Here’s how to avoid an outage taking out your communications.

For a modern business, keeping operations running through unforeseen disruptions is essential. The recent CrowdStrike incident, and the Microsoft outage just before it, are reminders of how much a robust contingency plan matters. As businesses rely more heavily on digital infrastructure, it becomes crucial to ensure that no single point of failure can cripple their operations.

Note: And again today, Microsoft is in the middle of another 365 and Azure outage.

What is a Single Point of Failure?

A “single point of failure” (SPOF) refers to a component in a system that, if it fails, would cause the entire system to stop functioning. This could be a piece of hardware, software, or even a process or person that is critical to operations. If this single point fails, it can lead to significant downtime, productivity losses, and potentially severe financial consequences or loss of customer trust. Avoiding SPOFs involves implementing redundancy and backup solutions to ensure that no single failure can bring the entire system down.

What Happened?

Even before the CrowdStrike failure on July 19th, Microsoft had its own outage a few hours prior, taking Teams and other 365 apps down due to an “Azure Storage availability event.”

“Microsoft confirmed that the problem stemmed from a configuration change within its back-end workloads. That caused “an interruption” between its compute and storage resources, ultimately leading to connectivity failures and the Azure outage. Those connectivity failures affected its downstream Microsoft 365 apps and services, which hinge on those connections. In other words, Microsoft accidentally broke its cloud.” (UC Today)

That outage, which took out Microsoft Teams along with other 365 applications, highlighted the vulnerability of relying on a single communication platform. For several hours, businesses experienced disruptions that hampered their ability to coordinate and communicate effectively.

With 300 million active Teams users, that’s a huge amount of disruption for organizations around the world.

Graph showing the increase of Microsoft Teams users from 2 million in 2017 to 300 Million in 2023.
Number of Active MS Teams Users (Millions) by year. (Statista.com)

What are ways to avoid single points of failure?

While outages like the Microsoft and CrowdStrike events are going to happen (and more frequently, according to MarketWatch; indeed, Azure went down again while we were uploading this article), there are things you can do to reduce the pain when something fails.

1. Create redundancy in your system

The most effective strategy to mitigate the impact of such failures is the implementation of technology redundancy. By creating alternative pathways and backup systems, organizations can significantly reduce the likelihood of complete service disruption. While you can’t make everything redundant, you can identify the crucial capabilities required to operate your organization and prioritize redundancy for them.
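To see why redundancy pays off, here is a rough back-of-the-envelope sketch in Python. It assumes the primary and backup systems fail independently of each other, which is a simplification, and the 99.9% figures are made-up numbers for illustration, not measurements of any vendor.

```python
def combined_availability(*availabilities: float) -> float:
    """Chance that at least one of several independent systems is up."""
    all_down = 1.0
    for availability in availabilities:
        # Multiply in the chance that this system is down as well.
        all_down *= 1.0 - availability
    return 1.0 - all_down


# Hypothetical figures for illustration only.
primary = 0.999
backup = 0.999

print(f"Primary alone:       {primary:.4%}")
print(f"Primary plus backup: {combined_availability(primary, backup):.6%}")
```

The takeaway is that the outage only reaches your users when every path is down at once, which is far less likely than any one path failing. If both systems share a dependency, such as the same cloud region, the independence assumption breaks and the benefit shrinks.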

2. Perform regular maintenance and updates

Regularly updating software and firmware helps close security vulnerabilities and ensures that systems benefit from the latest performance enhancements. Routine maintenance, such as replacing aging hardware and verifying backup systems, can prevent unexpected failures and ensure operational continuity. In the case of cloud systems, you have less control, so you should investigate closely what a cloud vendor’s policies for such work are and what their track record is like.

3. Back up your data

Regularly creating and testing backups ensures that critical information can be restored in the event of a system failure or data loss. Implementing multiple backup solutions, such as cloud and local storage, enhances data protection and recovery options. Again, with cloud vendors, much of the responsibility falls on their shoulders, so you should dig into what their policies and practices are.
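One simple way to test a backup is to restore a file and confirm it is byte-for-byte identical to the original. The Python sketch below shows the idea using SHA-256 checksums. The function names and the throwaway demo files are hypothetical, chosen for illustration, and a real backup test would cover far more than a single file.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(original: Path, restored: Path) -> bool:
    """True when the restored copy has the same contents as the original."""
    return sha256_of(original) == sha256_of(restored)


# Demo with throwaway files standing in for an original and a restored copy.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "original.txt"
    restored = Path(tmp) / "restored.txt"
    original.write_bytes(b"customer records")
    restored.write_bytes(b"customer records")
    intact = backup_matches(original, restored)

    restored.write_bytes(b"customer recor")  # simulate a truncated restore
    damaged = not backup_matches(original, restored)

print("Good restore verified:", intact)
print("Truncated restore detected:", damaged)
```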

4. Train employees and document processes

This piece is twofold. Regular training ensures that employees know the best practices and procedures for handling potential failures, which reduces human error. And if a single person is your point of failure, thorough documentation of processes and protocols enables quick recovery and continuity of operations in the event of staffing changes or unexpected incidents.

Redundancy is key

The concept of avoiding a single point of failure extends beyond communication platforms. It applies to every aspect of a business’s telephony infrastructure. Whether it’s voice calls, video conferencing, or messaging services, relying on a single solution can expose businesses to significant risks. Implementing redundant systems and backup solutions is essential for maintaining operational continuity in the face of unexpected disruptions.

How could Teams + 8×8 have mitigated this downtime?

The 8×8 unified platform for contact center, business phone, video, chat, and APIs helps companies of any size deliver differentiated customer experiences. Had the 300 million Teams users had 8×8 integrated with Microsoft Teams (besides making 8×8 enormously happy about the monopoly they’d have), the customers affected by the July 19th outage would still have been able to function while Microsoft fixed the issue. In the event of a Teams outage, 8×8 ensures that phone calls and other critical communications can continue without interruption. This integration not only enhances resilience but also boosts overall productivity and reliability with capabilities that Teams alone doesn’t provide, including contact center, SMS, and fax.

By having a multi-layered approach to communications solutions, businesses can ensure that even if one system fails, others can continue to provide essential functions. This is particularly critical in industries where constant communication is vital, such as healthcare, finance, and customer service.

How it works

8×8 integrates closely with Teams to provide capabilities that cost extra or aren’t available on the Teams platform. This includes the ability to make and receive calls outside of Teams. 8×8 provides a full phone system to Teams. Your customers can call you, users can make calls, and you can set up the detailed routing you need for your daily operations. Your end users who are already in Teams can use the Teams client to seamlessly use 8×8 services, including capabilities like contact center, fax, and SMS (which are not available in Teams). Non-Teams users and basic phones don’t need a Teams seat. You can have a phone in the break room or on a warehouse dock without the cost of a Teams account. These phones can still call Teams users or make and receive outside calls because 8×8 is a redundant cloud phone system.

The combination of Teams and 8×8 provides formidable reliability. Microsoft already has a robust infrastructure behind Teams to prevent outages. If Teams is out, 8×8 stays up, and even your Teams users can still make and receive calls using 8×8 resources. This provides communications redundancy. The 8×8 system is designed from the ground up for dependability, with a guaranteed 99.999% reliability.
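To put a figure like 99.999% in perspective, the short Python sketch below converts a percentage into the downtime it would allow over a year. It assumes the percentage refers to availability measured across a full 365-day year. How any vendor actually measures its guarantee is set out in its own service terms, which this article doesn’t cover.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Minutes of downtime per year permitted at a given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for percent in (99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(percent)
    print(f"{percent}% allows about {minutes:.1f} minutes of downtime per year")
```

At 99.999%, that works out to a little over five minutes a year, compared with several hours at 99.9%.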

Conclusion

The recent incidents involving Microsoft Teams and CrowdStrike serve as valuable lessons for businesses. Ensuring that no aspect of your telephony infrastructure relies solely on a single solution is crucial for maintaining stability and flexibility. By integrating systems like 8×8 with Microsoft Teams and adopting multi-layered telephony measures, businesses can safeguard against disruptions and ensure seamless operations. Investing in redundancy and diverse solutions is not just a precautionary measure; it is a strategic imperative for any business aiming to thrive in today’s interconnected world.

Ready to reduce the single points of failure in your communications systems? Vertical Experts are ready and able to help.