2024-02-02 Summary of Outage (Resolved)

Written by Kevin Dutkiewicz

Updated at February 6th, 2024



 

Event Description: Outage caused by DNS resolution failure after a domain registration lapsed

Event Start Time: 2024-02-02 23:22 EST

Event End Time: 2024-02-03 01:05 EST

RFO Issue Date: 2024-02-06

 

Affected Services:

  • Inbound and outbound calling
  • Phone and device registration
  • Access to Manager Portal

Event Summary

The registration for the domain ucaasnetwork.com lapsed, leaving DNS resolvers unable to resolve any records for that domain. Services failed as DNS caches were updated with missing or incorrect information. Once the issue was identified, the domain registration was renewed and services came back online quickly. Over the next 48 hours, some endpoints were still affected because their DNS providers took longer than expected to refresh their caches.
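A health check that tests name resolution separately from the HTTPS request can point to DNS sooner when a failure like this occurs. The following is a minimal sketch under stated assumptions, not the monitoring actually in use: the function name `classify_failure`, the injectable `resolver` parameter, and the placeholder hostname are all hypothetical.

```python
import socket


def classify_failure(hostname, resolver=socket.getaddrinfo):
    """Return "dns" if the name does not resolve, otherwise "ok".

    `resolver` is injectable (an assumption of this sketch) so the check
    can be exercised without network access; by default it uses the
    system resolver via socket.getaddrinfo.
    """
    try:
        results = resolver(hostname, 443)
    except socket.gaierror:
        # The name could not be resolved, as happens when a domain lapses.
        return "dns"
    # An empty answer is treated the same as a resolution failure.
    return "ok" if results else "dns"


if __name__ == "__main__":
    # Placeholder hostname; substitute a real service hostname.
    print(classify_failure("host.example.com"))
```

Running this check alongside an HTTPS health check would let an alert state whether the name failed to resolve or the service itself failed to answer.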

Event Timeline

2024-02-02 23:22 EST - First alert received from Insight regarding a failed HTTPS health check to core1-atl.
2024-02-02 23:36 EST - Steven responded immediately and began checking Apache, NMS, Manager Portal accessibility, and Insight monitoring. Noticed a drop in registrations on all servers.
2024-02-02 23:44 EST - Contacted additional support.
2024-02-03 00:06 EST - Nodeping alerts began coming in. First indication that this was related to DNS. Began checking Constellix.
2024-02-03 00:21 EST - Garrett noticed that the ucaasnetwork.com domain had expired.
2024-02-03 00:25 EST - Steven began attempting to contact Kevin and Ray to declare a major incident.
2024-02-03 00:30 EST - Attempts to log in to the domain registrar were hindered because MFA codes were sent to a non-public email address.
2024-02-03 00:48 EST - Jack retrieved the renewal email and MFA code directly from Exchange.
2024-02-03 00:54 EST - War room/major incident created by Jack.
2024-02-03 00:56 EST - Steven successfully renewed ucaasnetwork.com.
2024-02-03 01:05 EST - Announcement posted to Discord, Uptime Robot, and Partner Central by Jack.

Root Cause

Domain registration for ucaasnetwork.com lapsed due to misconfigured alerting. Renewal alerts were being sent to a legacy email address and were not copied to the newer workflows.

Future Preventative Action

An internal project has been created to isolate the monitoring lapse and develop long-term, scalable solutions. Additional criteria and alerting options will also be explored as part of the project.
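One alerting criterion that could be explored is a periodic check of how many days remain before a domain expires. The sketch below is illustrative only and does not describe the project's actual design: the function names, the 30-day default threshold, and the example expiry date are assumptions, and how the expiry date is obtained (registrar records, a lookup service, or manual entry) is deliberately left out.

```python
from datetime import datetime, timezone


def days_until_expiry(expires_at, now=None):
    """Whole days remaining until `expires_at` (negative once expired)."""
    now = now or datetime.now(timezone.utc)
    return (expires_at - now).days


def should_alert(expires_at, now=None, warn_within_days=30):
    """True when the domain expires within `warn_within_days` (hypothetical default)."""
    return days_until_expiry(expires_at, now) <= warn_within_days


if __name__ == "__main__":
    # Hypothetical expiry date for illustration; not taken from any registrar.
    expiry = datetime(2030, 1, 1, tzinfo=timezone.utc)
    print(days_until_expiry(expiry), should_alert(expiry))
```

Sending the result to a shared, monitored destination rather than a single mailbox would address the specific gap named in the root cause.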

 
