Earlier today, one of the nodes in our proxy layer was replaced by the Auto Scaling Group in US-West-1 region. The replacement provisioned successfully, and it did not throw any alerts. It successfully received state from other nodes in the proxy layer. However, due to an outdated CloudFormation template, the node was missing a key setting that was needed for connecting to the proper Solr nodes. This created a situation where the node reported healthy even while responding with 500-level errors to all requests.
We were able to identify the issue and applied the missing setting to the new node, which resolved the issue.
Our Operations team is currently investigating the root cause with our Platform team and will be taking appropriate measures to avoid a reoccurrence. If you have any questions or concerns, please let us know at support@websolr.com.