2025/05/30 WebSocket Connections For GRR, LAS, PHX and IAD Servers Failing To Connect

Written by Steven Spaulding

Updated at June 3rd, 2025

Affected Services 

  • SNAPmobile Web

Event Timeline

May 30th, 2025

11:06 AM ET – Our monitoring system alerted support to failed WebSocket connections for the GRR, LAS, PHX, and IAD core servers.

11:12 AM ET – Our support verified the connection failure and began investigating.

11:31 AM ET – Vendor support was engaged after identifying an SSL failure on the WebSocket service.

11:41 AM ET – Our support contacted vendor support by phone to escalate the ticket.

12:34 PM ET – We made the decision to cut over all web phone connections to ATL and implemented the change.

12:37 PM ET – We confirmed that the cutover was successful, and connections had been restored to ATL.

12:38 PM ET – Vendor support confirmed the failure and began updating the default Apache configuration files to reflect the correct SSL file.

1:15 PM ET – We observed that the GRR and LAS servers had been updated and that the SSL information had loaded successfully into the WebSocket service.

1:47 PM ET – All remaining default Apache configuration files were updated to reference the correct SSL file, and the SSL information was reloaded into the WebSocket service.

May 31st, 2025

2:56 PM ET – After monitoring for stability for 24 hours, we removed the temporary rerouting of all WebSocket connections to the ATL server and returned all WebSocket connections to their original servers.

Root Cause

The default SSL file for the GRR, LAS, PHX, and IAD core servers had inadvertently been updated to reference an SSL file whose name did not match each core server's FQDN, which caused a common name mismatch error during certificate checks.
 
This was caused by a malformed database entry update that was applied during the maintenance window and did not immediately trigger any alerts. As a result, the default SSL entry in the database was scheduled for replication to all other servers.
 
The following day, when the system automation for syncing certificates between servers was triggered, the default Apache configuration, which contains the SSL file to be referenced for the ATL core server, was replicated to the other servers in the cluster.
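
The common name mismatch described above can be illustrated with a minimal sketch. It assumes a simplified matching rule (exact match, or a single leading wildcard label) and uses hypothetical hostnames under example.com; it is not the vendor's certificate check.

```python
def cert_name_matches_host(cert_name: str, hostname: str) -> bool:
    """Return True if a certificate name (CN or SAN entry) covers hostname.

    Simplified rule: names match label by label, and a leading "*" label
    in the certificate name matches exactly one hostname label.
    """
    cert_labels = cert_name.lower().rstrip(".").split(".")
    host_labels = hostname.lower().rstrip(".").split(".")
    if len(cert_labels) != len(host_labels):
        return False
    # Only accept a wildcard when at least two concrete labels follow it.
    if cert_labels[0] == "*" and len(cert_labels) > 2:
        return cert_labels[1:] == host_labels[1:]
    return cert_labels == host_labels


# Hypothetical names: a certificate issued for the ATL server is presented
# by the GRR server, so the names do not match.
atl_cert_on_grr = cert_name_matches_host("atl.example.com", "grr.example.com")
grr_cert_on_grr = cert_name_matches_host("grr.example.com", "grr.example.com")
```

In this sketch, a server presenting another server's certificate fails the name check even though the certificate itself is valid, which mirrors the failure mode in this incident.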

Impact Summary

  • Web phones were unable to fully connect to the GRR, LAS, PHX and IAD servers due to an SSL certificate common name mismatch error 
  • Automatic failover to alternate servers did not activate because web phones were partially registered. Manual failover to the ATL server was enacted to restore functionality to all clients while we worked with the vendor to fully resolve the issue.
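
The failover gap above can be sketched as a decision rule that treats a partial registration the same as no registration once a grace period has passed. The state names, the `should_fail_over` function, and the 30-second default are all hypothetical; this is an illustration of the idea, not the vendor's failover mechanism.

```python
from enum import Enum


class Registration(Enum):
    FULL = "full"        # web phone is fully connected
    PARTIAL = "partial"  # connection started but did not complete
    NONE = "none"        # no connection at all


def should_fail_over(
    state: Registration,
    seconds_in_state: float,
    grace_seconds: float = 30.0,  # hypothetical threshold
) -> bool:
    """Fail over whenever the client is not fully registered for too long."""
    if state is Registration.FULL:
        return False
    # PARTIAL and NONE are handled identically, so a client stuck in a
    # partial registration still triggers failover.
    return seconds_in_state >= grace_seconds
```

Under this rule, a web phone that stays partially registered past the grace period is moved to an alternate server automatically rather than waiting for a manual cutover.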

Future Preventative Action

Immediate preventative action taken:

Database entries that control default SSL replication were corrected to prevent incorrect SSL information from being set on the other servers. Any incorrect SSL entries were also removed.

Long-term action:

While working with the vendor's engineering team, we identified additional checks that could be performed when installing their new certificate management service, and we provided recommendations to prevent incorrectly synced default configuration files.
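
One possible form of such a check is sketched below: before a default Apache configuration is replicated to a server, confirm that every certificate file it references is named after that server's FQDN. `SSLCertificateFile` is the standard Apache mod_ssl directive; the file naming convention (`<fqdn>.crt`), the paths, the hostnames, and the function names are assumptions made for illustration.

```python
import re
from pathlib import PurePosixPath

# Matches an Apache mod_ssl directive such as:
#   SSLCertificateFile /etc/ssl/certs/grr.example.com.crt
# Commented-out lines (starting with "#") are not matched.
_CERT_DIRECTIVE = re.compile(
    r'^\s*SSLCertificateFile\s+"?([^"\s]+)"?\s*$',
    re.MULTILINE | re.IGNORECASE,
)


def referenced_cert_files(apache_config: str) -> list[str]:
    """Return every certificate path referenced in the configuration text."""
    return _CERT_DIRECTIVE.findall(apache_config)


def safe_to_replicate(apache_config: str, target_fqdn: str) -> bool:
    """Return True only if all referenced certificate files are named after
    the target server's FQDN (assumed convention: <fqdn>.crt)."""
    files = referenced_cert_files(apache_config)
    if not files:
        return False  # nothing to verify, so refuse rather than guess
    return all(
        PurePosixPath(path).stem.lower() == target_fqdn.lower()
        for path in files
    )


# Hypothetical configuration taken from the ATL server.
atl_config = """<VirtualHost *:443>
    SSLEngine on
    SSLCertificateFile /etc/ssl/certs/atl.example.com.crt
</VirtualHost>
"""
```

With this sketch, `safe_to_replicate(atl_config, "grr.example.com")` rejects the ATL configuration for the GRR server, so a mismatched default would be caught before the sync rather than after clients fail to connect.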
 
In addition, an improvement request was submitted to the vendor to account for partial registrations and to include these scenarios in the automatic failover mechanism.