Issues with CloudLinux Network - "Bad Gateway. Unable to register at CLN server"
Incident Report for CloudLinux
Postmortem

We continue to monitor the CloudLinux Network, and we greatly appreciate your patience and understanding. Thank you for staying with us. The issues with the CLN have now been resolved. If you still experience problems, please reach out to our Support team; we are glad to help.

We would like to shed more light on what happened with the CLN. Recently, we've been migrating our CLN backend from Spacewalk to new software and updating the rest of the backend. The migration itself went successfully, and we saw an improvement over Spacewalk's performance right away. However, the newer version had some issues, causing configuration problems on the CLN end. Reverting the upgrade would have brought back the older "legacy" issues that Spacewalk had, so the best course of action was to proceed and resolve new problems as we encountered them.

Usually, the service handles load spikes rather well: RHN Proxy Server accepts incoming requests and distributes them between CLN services, and caching eases the load even more. While we were resolving those problems, however, the overall load on the CLN unexpectedly increased because the cache had been cleared during the upgrade.
When the load spikes happened, we had to enable limits on simultaneous connections (up to 500 at first, and then up to 800). This stabilized the network and helped the majority of customers to successfully update or register their systems.
Rest assured that we will make adjustments to our infrastructure and software update process to avoid such situations in the future.

Posted Apr 30, 2021 - 18:49 UTC

Resolved
We're glad to report that this incident has been resolved.
Posted Apr 30, 2021 - 18:42 UTC
Monitoring
While we continue working on this case, several reports indicate that the CloudLinux Network services are operational.
We'll monitor the situation overnight, but for now, it looks like the issues with CLN have been resolved.
Posted Apr 26, 2021 - 18:33 UTC
Update
The situation around CLN Services is improving; we have increased the number of simultaneous connections from 500 to 800. Customers are reporting successful registration of their servers. We'll continue monitoring the situation, and we will keep you posted.
Posted Apr 26, 2021 - 12:55 UTC
Update
We regret to inform you that there are still some ongoing issues with the CloudLinux Network. We've mobilized all our Development & Infrastructure resources to resolve the issues for good, and we will let you know as soon as the CLN is up and running again.

Meanwhile, we have some good news. We've managed to stabilize the CloudLinux Network for a limited number of concurrent connections, up to 500 at the same time. What this means for end-users is that it is possible to complete a yum transaction, register a server, etc. when there are free connection slots available.
For example, when you register a CloudLinux OS server, you take up one slot of 500. Once the registration is over, the slot becomes available to everyone else.
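
The slot behaviour described above can be pictured with a small sketch. This is only an illustration of a fixed pool of connection slots, not the actual CLN implementation; the names `SlotLimiter` and `register_server` are hypothetical, and the limit of 500 is taken from this update.

```python
import threading


class SlotLimiter:
    """Hypothetical model of a fixed pool of connection slots."""

    def __init__(self, max_slots):
        # A bounded semaphore holds one permit per available slot.
        self._slots = threading.BoundedSemaphore(max_slots)

    def try_acquire(self):
        # Non-blocking: returns False right away when every slot is taken.
        return self._slots.acquire(blocking=False)

    def release(self):
        # Hand the slot back so another client can use it.
        self._slots.release()


def register_server(limiter, do_register):
    """Run do_register() only if a slot is free, then free the slot again."""
    if not limiter.try_acquire():
        return "busy"
    try:
        do_register()
        return "ok"
    finally:
        limiter.release()


# Assumed limit from this update: up to 500 connections at the same time.
cln_slots = SlotLimiter(500)
```

A request that finds no free slot gets "busy" in this model, which is why repeating the same operation a little later can succeed.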

Several reports from our customers show that running yum transactions or registration commands a couple of times helps to finish them. If you can hold off on the registration or update, we advise waiting until we confirm that the CLN services are fully restored.
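
For those who cannot wait, repeating a command by hand can be automated. The following is a minimal sketch, assuming that a failed attempt shows up as a non-zero exit status; the function name `run_with_retries`, the number of attempts, and the delays are illustrative choices, not recommendations from the CLN team.

```python
import subprocess
import time


def run_with_retries(command, attempts=5, base_delay=10.0,
                     runner=None, sleep=time.sleep):
    """Run command until it exits with status 0 or attempts run out.

    Returns the number of the successful attempt, or None if all failed.
    """
    if runner is None:
        # Default runner: execute the command and report its exit status.
        runner = lambda cmd: subprocess.run(cmd).returncode
    for attempt in range(1, attempts + 1):
        if runner(command) == 0:
            return attempt
        if attempt < attempts:
            # Wait longer after each failure so a busy service can recover.
            sleep(base_delay * 2 ** (attempt - 1))
    return None
```

For example, `run_with_retries(["yum", "update"])` would retry the update up to five times, waiting 10, 20, 40 and 80 seconds between attempts.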
Posted Apr 26, 2021 - 11:03 UTC
Update
Unfortunately, we are still experiencing issues with the CLN.
Updates with yum and license activation may be delayed or fail with a timeout error.
Posted Apr 24, 2021 - 20:23 UTC
Identified
The issue has been identified, and our CLN team is working hard on a fix.
Posted Apr 23, 2021 - 06:59 UTC
Update
Various licensing issues are expected.
Posted Apr 21, 2021 - 12:16 UTC
Investigating
We regret to inform you that there are issues with the CloudLinux Network services. You may experience difficulties while:
- updating the server via 'yum update';
- registering/unregistering a CloudLinux OS instance.

We are aware of the situation, and we are working on resolving it as soon as possible.
We will keep you posted here.
Posted Apr 20, 2021 - 10:14 UTC
This incident affected: CloudLinux Network.