OpsCruise can help you detect and resolve a wide range of problems in your Kubernetes environment, from setting thresholds to automating root cause analysis.
What’s happening with pods in my Kubernetes clusters, and do I need to do anything?
A browser receives a 500 error. A pod is continually restarting, but which one?
OpsCruise continually integrates insights on containers, pods, and nodes from Kubernetes events and configuration to detect restarts, pods stuck in Pending, and other problems across all Kubernetes objects.
Going beyond traditional monitoring tools, OpsCruise checks whether the resource limits provisioned for containers are appropriate, including flagging cases where pods need to be rebalanced to avoid potential evictions.
Tired of chasing metrics and guessing which thresholds to set?
Onboarding an app with 60 microservices. How do I set up alerts?
OpsCruise removes the guesswork in determining which metrics to monitor and where to set thresholds. Deriving 'influencer metrics' and auto-thresholding are features of OpsCruise’s contextual AI, which learns the behavior of each container. Dynamic metric selection and thresholds let you detect emerging problems that would otherwise go undetected.
Take the guesswork out of traditional monitoring.
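To make the idea of auto-thresholding more concrete, here is a minimal Python sketch of one common, generic approach: a per-metric threshold learned from a rolling window of recent values (mean plus k standard deviations). This is an illustrative assumption, not OpsCruise’s actual algorithm; the class name `RollingThreshold` and its parameters (`window`, `k`, `min_samples`) are hypothetical.

```python
from collections import deque
from statistics import mean, stdev


class RollingThreshold:
    """Hypothetical auto-threshold (not OpsCruise's algorithm): flags a value
    that exceeds the rolling mean by more than k standard deviations."""

    def __init__(self, window: int = 30, k: float = 3.0, min_samples: int = 5):
        self.values = deque(maxlen=window)  # keeps only the most recent observations
        self.k = k
        # stdev() needs at least two data points, so never learn from fewer.
        self.min_samples = max(2, min_samples)

    def threshold(self):
        """Return the current learned upper limit, or None while history is too short."""
        if len(self.values) < self.min_samples:
            return None
        return mean(self.values) + self.k * stdev(self.values)

    def observe(self, value: float) -> bool:
        """Check value against the current limit, then add it to the history
        so the baseline keeps adapting. Returns True on a breach."""
        limit = self.threshold()
        breached = limit is not None and value > limit
        self.values.append(value)
        return breached


if __name__ == "__main__":
    cpu = RollingThreshold(window=30, k=3.0)
    samples = [40, 42, 41, 39, 40, 43, 41, 90]  # e.g. CPU % for one container
    print([cpu.observe(s) for s in samples])
    # → [False, False, False, False, False, False, False, True]
```

Because every observation, including a breach, is added to the window, the baseline eventually moves to a sustained new level instead of alerting on it indefinitely; excluding breaches from the window would make the threshold stricter but slower to adapt.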
Flying blind on interactions between microservices across clusters?
One service in a cluster is having problems with a service in another cluster. Which one and why?
OpsCruise's Operational Tracing tracks traffic between all components in real time, without requiring any application changes, so you can see how transactions flow through the system. By linking application metrics, Kubernetes objects, and infrastructure along the container path into one dynamic graph, OpsCruise gives you an integrated view of the ‘fenced’ local context around a fault.
No other monitoring solution so accurately 'stitches' together today's microservices environments.
How do you know whether a container image update caused the problem?
A customer is unhappy with the response time of a business service. One or more pods were updated. Which ones, and how are they behaving?
One of the most challenging problems is confirming that an application code update has introduced ‘under the radar’ changes that are the source of a problem. OpsCruise automatically detects changes in application behavior (e.g., shifts in cache hits or disk I/O that result in different resource and service demands for the same incoming load) using adaptive machine learning that continuously updates the learned behavior model. Combining this behavior-model change detection with detection of configuration changes, such as image updates, pinpoints the root cause.
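The idea of comparing resource demand for the same incoming load before and after an image update can be sketched as follows. This minimal Python example assumes per-container samples of request rate and CPU usage plus a known image-update timestamp; the names `Sample`, `demand_per_request`, `behavior_changed_after`, and the 25% `tolerance` are hypothetical, and a simple before/after ratio comparison stands in for OpsCruise’s adaptive behavior model, which is not described here.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Sample:
    ts: float              # hypothetical: seconds since epoch
    requests: float        # incoming requests per second (the load)
    cpu_millicores: float  # CPU used by the container


def demand_per_request(samples):
    """Mean CPU per request across samples; idle samples are skipped."""
    ratios = [s.cpu_millicores / s.requests for s in samples if s.requests > 0]
    return mean(ratios) if ratios else None


def behavior_changed_after(samples, update_ts, tolerance=0.25):
    """Hypothetical check: did CPU demand per request shift by more than
    `tolerance` (relative) after an image update at `update_ts`?"""
    before = demand_per_request([s for s in samples if s.ts < update_ts])
    after = demand_per_request([s for s in samples if s.ts >= update_ts])
    if before is None or after is None:
        return False  # not enough data on one side of the update
    if before == 0:
        return after > 0  # any demand after a zero baseline is a change
    return abs(after - before) / before > tolerance


if __name__ == "__main__":
    # Same load (100 req/s) before and after the update at ts=10,
    # but each request now costs more CPU.
    samples = [Sample(t, 100, 200) for t in range(10)]
    samples += [Sample(t, 100, 320) for t in range(10, 20)]
    print(behavior_changed_after(samples, update_ts=10))  # → True
```

Normalizing by request rate is what separates a genuine behavior change from a simple rise in traffic: doubling the load at the old cost per request would not trigger this check.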