Elevated 503 errors in US-West-1
Incident Report for Websolr
Postmortem

Earlier today, one of the nodes in our proxy layer was replaced by the Auto Scaling Group in US-West-1 region. The replacement provisioned successfully, and it did not throw any alerts. It successfully received state from other nodes in the proxy layer. However, due to an outdated CloudFormation template, the node was missing a key setting that was needed for connecting to the proper Solr nodes. This created a situation where the node reported healthy even while responding with 500-level errors to all requests.

We were able to identify the issue and applied the missing setting to the new node, which resolved the issue.

Our Operations team is currently investigating the root cause with our Platform team and will be taking appropriate measures to avoid a reoccurrence. If you have any questions or concerns, please let us know at support@websolr.com.

Posted May 22, 2020 - 17:51 UTC

Resolved
This incident has been resolved.
Posted May 22, 2020 - 17:41 UTC
Investigating
We are currently investigating the issue.
Posted May 22, 2020 - 17:13 UTC
This incident affected: Region Health: AWS US West (California).