Slow search response
Incident Report for UNIFI
Postmortem

ROOT CAUSE ANALYSIS: An automatic forced upgrade of an AWS OpenSearch instance caused the backend search service to fail, and the service remained unreachable after it was restarted.

  • From notification page: “Service software update R20211203-P2 available. Update will be automatically installed after 2021-12-15T21:07:06Z if no action is taken.”

REMEDIATION: The search cluster was re-indexed on the new release, and an EventBridge rule was configured to alert DevOps engineers ahead of any future forced upgrade so that they can act before it is applied.
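
As an illustration of the alerting rule described above, the following is a minimal sketch, not the configuration actually deployed. It assumes the update notifications arrive as EventBridge events with source "aws.es" and detail-type "Amazon OpenSearch Service Software Update Notification", and that the notification text is carried in a description field; those values should be checked against current AWS documentation. The names UPDATE_EVENT_PATTERN, matches, and forced_install_deadline are hypothetical helpers introduced for this example.

```python
import json
import re
from datetime import datetime, timezone

# ASSUMPTION: source and detail-type are recalled from AWS's published
# OpenSearch Service event examples; verify before relying on them.
UPDATE_EVENT_PATTERN = {
    "source": ["aws.es"],
    "detail-type": ["Amazon OpenSearch Service Software Update Notification"],
}


def matches(pattern, event):
    """Simplified stand-in for EventBridge matching (top-level keys only).

    An event matches when, for every key in the pattern, the event's value
    is one of the values the pattern allows.
    """
    return all(event.get(key) in allowed for key, allowed in pattern.items())


# Pulls the UTC timestamp out of text such as
# "... automatically installed after 2021-12-15T21:07:06Z if no action is taken."
_DEADLINE = re.compile(
    r"automatically installed after (\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})Z"
)


def forced_install_deadline(description):
    """Return the forced-install deadline as an aware UTC datetime, or None."""
    found = _DEADLINE.search(description)
    if found is None:
        return None
    parsed = datetime.strptime(found.group(1), "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(tzinfo=timezone.utc)


def rule_pattern_json():
    """Serialize the pattern in the JSON form an EventBridge rule expects."""
    return json.dumps(UPDATE_EVENT_PATTERN)
```

The JSON string from rule_pattern_json() could be supplied as the EventPattern of an EventBridge rule whose target forwards to the on-call tool; that wiring is not shown here. Extracting the deadline lets an alert state how much time remains before the update is forced.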

Posted Jan 20, 2022 - 11:41 CST

Resolved
The search service is operating normally again. We apologize for the disruption. The DevOps team is adding monitoring for forced upgrades of the AWS Elasticsearch service. (NOTE: in UNIFI 4.0, searches are conducted locally and Elasticsearch is no longer required.)
Posted Jan 18, 2022 - 21:23 CST
Update
The Elasticsearch cluster has been rebuilt and the search index is repopulating (this typically takes about 2 hours).
Posted Jan 18, 2022 - 20:19 CST
Update
We are continuing to investigate recovery from the forced restart.
Posted Jan 18, 2022 - 15:16 CST
Update
The Elasticsearch cluster upgrade is still in progress. AWS forced-upgrade notifications are being configured in the monitoring system (OpsGenie).
Posted Jan 18, 2022 - 10:44 CST
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Jan 18, 2022 - 10:12 CST
Update
The Elasticsearch cluster experienced a forced restart (service software update R20211203-P2 started).
Posted Jan 18, 2022 - 10:11 CST
Investigating
We are experiencing longer-than-normal search response times. The DevOps team is investigating.
Posted Jan 18, 2022 - 10:07 CST
This incident affected: Desktop Client and Web Portal.