As a follow-up to our regular updates on status.hexonet.net, we have compiled the following details to give our dedicated and supportive customers further insight into the incident. Thank you again for your understanding during the brief downtime.
Commencing on the morning of Wednesday, April 28th 2022, 03:15 +0000 (UTC) the HEXONET Control Panel was only partially available.
This was caused by a bug that, only under certain conditions, crashed and restarted the server when a user searched for a punycode domain in our Control Panel search bar: the suggestion engine modified the given term into an invalid input.
A solution has been put in place and full availability was restored after 2022-04-28 06:30 +0000 (UTC).
A simple search for a punycode domain in our search on hexonet.net triggers our search engine to modify the entered term in order to provide alternatives. The library we use to convert punycode is among the most popular solutions and was even part of Node.js until recently: https://www.npmjs.com/package/punycode
The library implements RFC 3492 (Punycode), which is used by RFC 3490 (IDNA). RFC 3490 states about the “ToUnicode” operation:
ToUnicode never fails. If any step fails, then the original input sequence is returned immediately in that step.
It turned out that it does fail for a certain searched-for input. Our suggestion engine modified that term and appended a TLD that seemed fitting for the given input. The resulting term was an invalid input and caused a server crash and restart. Strangely, this particular term, as returned by a punycode conversion, is the only one we found that triggers the bug. As such, the likelihood of triggering the bug was extremely small.
Our implementation has a retry mechanism that re-sends the query after a short while if it does not receive the search results. In this case, it never received them because the query was crashing the server, so it kept retrying and crashing the server repeatedly.
Our assumption is that a user left their browser open and never got the search result, so the query kept retrying and continuously crashing the server.
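To illustrate why an uncapped retry can keep re-sending a failing query, here is a minimal sketch of a retry loop with an attempt limit and exponential backoff. It is not our Control Panel code: the function names, the default of 3 attempts, and the 500 ms base delay are assumptions chosen for illustration only.

```typescript
// Hypothetical sketch: names and limits are illustrative assumptions.

// Returns the delay before the next attempt, or null once the retry
// budget is spent. `attempt` is the 0-based index of the failed attempt.
function nextRetryDelayMs(
  attempt: number,
  maxAttempts: number,
  baseDelayMs: number,
): number | null {
  if (attempt + 1 >= maxAttempts) return null;
  return baseDelayMs * 2 ** attempt; // base, 2x base, 4x base, ...
}

// Runs `query` until it succeeds or the retry budget is exhausted,
// then rethrows the last error instead of retrying forever.
async function searchWithRetry<T>(
  query: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown = new Error("maxAttempts must be at least 1");
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await query();
    } catch (err) {
      lastError = err;
      const delay = nextRetryDelayMs(attempt, maxAttempts, baseDelayMs);
      if (delay === null) break;
      await new Promise<void>((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}
```

With a cap like this, a query that always fails is sent at most `maxAttempts` times, so a browser tab left open would stop re-sending it.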
Our two Control Panel servers share a server for session management, and our load balancer distributes the traffic evenly between them, so the repeating query would randomly hit, and crash, both servers.
The query extends the user session, and the session is managed on the server that was not affected by the error, so that user session kept being extended.
In sum, only under rare conditions, and only because of this scenario in combination with our Control Panel setup, was the bug able to crash the Control Panel repeatedly.
After a preliminary fix, a full yet relatively simple patch was deployed to catch errors occurring after the conversion of punycode domains by our search engine.
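As a rough illustration of this kind of guard, the sketch below wraps a conversion function so that a thrown error falls back to the original input, and filters out suggested names that are not syntactically plausible. It is a simplified sketch under stated assumptions, not the deployed patch: `Converter`, `safeToUnicode`, `suggestDomains`, and `isPlausibleDomain` are hypothetical names, and the converter is passed in as a parameter rather than tied to a specific library.

```typescript
// Hypothetical sketch: all names here are illustrative assumptions.
type Converter = (domain: string) => string;

// Calls the converter, but never lets a thrown error reach the caller;
// on failure the original input is returned unchanged.
function safeToUnicode(convert: Converter, domain: string): string {
  try {
    return convert(domain);
  } catch {
    return domain;
  }
}

// Conservative syntax check: total length of 1 to 253 characters, and
// every label non-empty, at most 63 characters, without whitespace.
function isPlausibleDomain(name: string): boolean {
  if (name.length === 0 || name.length > 253) return false;
  return name
    .split(".")
    .every((label) => label.length > 0 && label.length <= 63 && !/\s/.test(label));
}

// Builds suggestions by appending candidate TLDs to the converted term
// and keeps only the candidates that pass the syntax check.
function suggestDomains(convert: Converter, term: string, tlds: string[]): string[] {
  const label = safeToUnicode(convert, term);
  return tlds.map((tld) => `${label}.${tld}`).filter(isPlausibleDomain);
}
```

The point of the design is that a bad conversion result or an invalid suggestion is dropped instead of being passed on to code that may crash on it.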
We noticed, and received the first reports, that the HEXONET Control Panel was only partially available.
Analysis started.
A fix has been put into place. Full availability of the HEXONET Control Panel has been restored.
Data integrity was fully preserved; at no point was any data compromised.
__
As always, we remain dedicated to your success and hope this background information supports your understanding of all that transpired. If you have any questions or would like to connect with our team, please contact us at help@hexonet.support.
Your HEXONET Team