Causation not Correlation, Real-Time and Non-Intrusive

OpsCruise ends war-room chaos with automated, AI-driven fault isolation that pinpoints the source of a problem and enables fast resolution, without intrusive code instrumentation.

Intelligent Application Observability for Kubernetes

Predictive Actionable Insights that Provide Clarity amid the Chaos

Traditional monitoring tools are fundamentally flawed for modern apps. OpsCruise is purpose-built for containerized Kubernetes (K8s) applications, which means clearer insights, faster resolution, lower cost, and happier customers.

Automated fault isolation

OpsCruise does the heavy lifting to isolate faults before they cause fallout and points you to the right areas to fix.

Automated Root Cause Analysis - the Holy Grail Quest of IT Operations

Automated root cause analysis (RCA) has long been the holy grail for Operations teams. With cloud native applications, RCA becomes significantly harder because of the sheer number of objects and their dependencies, the obfuscation introduced by layered virtualization, and the ephemeral, dynamic nature of microservices.

OpsCruise Introduces a Fundamentally Different Approach

OpsCruise is unique in automating the causal analysis process, with a fundamentally different approach to application observability and without the need for intrusive code instrumentation.

OpsCruise builds on three capabilities: an auto-discovered understanding of the application's structure and behavior, predictive detection of problems, and operational flows that expose global dependencies. To automate the complete causal isolation process, OpsCruise logically combines all of these inferences with operational knowledge. In essence, OpsCruise pre-identifies, collects, and analyzes the diverse data on anomalies, the environment, and the current application state to answer the 5 Whys, automatically isolating the source of the problem so it can be resolved.



It Starts with Understanding Your Application Behavior

You cannot comprehend what is going on in your application, much less find a problem and fix it, if you do not know how the application is expected to behave under its current demand. Once OpsCruise has discovered the application and built the Application Graph that represents its full-stack structure, it profiles the application with an ML-based predictive model.

The Behavior Model is dynamic and updatable. It is built on tens of metrics for each container or service and captures its correct operating region. It is central to detecting problems early, as soon as they surface.
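
As a rough illustration of what "capturing the correct operating region" can mean, the sketch below learns a simple per-metric band from healthy samples. This is not OpsCruise's model: the band (mean plus or minus k standard deviations), the function names, and the metric names are all assumptions made for the example.

```python
from statistics import mean, stdev


def learn_operating_region(samples, k=3.0):
    """Learn a per-metric band of mean +/- k standard deviations.

    `samples` is a list of dicts (metric name -> value) collected while the
    container behaved normally. The band is a hypothetical stand-in for a
    learned behavior model.
    """
    region = {}
    for metric in samples[0]:
        values = [s[metric] for s in samples]
        mu, sigma = mean(values), stdev(values)
        region[metric] = (mu - k * sigma, mu + k * sigma)
    return region


def in_operating_region(region, observation):
    """Return True if every modeled metric falls inside its learned band."""
    return all(low <= observation[m] <= high for m, (low, high) in region.items())


# Illustrative baseline; the metric names and values are made up.
baseline = [
    {"cpu": 0.40, "latency_ms": 20.0},
    {"cpu": 0.50, "latency_ms": 22.0},
    {"cpu": 0.60, "latency_ms": 24.0},
]
region = learn_operating_region(baseline)
normal = in_operating_region(region, {"cpu": 0.55, "latency_ms": 23.0})
abnormal = in_operating_region(region, {"cpu": 0.95, "latency_ms": 23.0})
print(normal, abnormal)
```

In this sketch, keeping the model "updatable" simply means calling `learn_operating_region` again on a more recent window of healthy samples.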



Find Emerging Problems Using Model-Driven Anomaly Detection

One unique challenge posed by microservice applications is that multiple bottlenecks and anomalies can co-occur, given the high degree of communication between containers or services. OpsCruise uses a divide-and-conquer approach to address this challenge, which is crucial to automating the causal analysis step.

The learned behavior models are used predictively at runtime across all containers to check for deviations from expected telemetry values. These deviations are raised as alerts and displayed on the application map.
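
To make "deviation from expected telemetry" concrete, here is a minimal sketch that predicts a container's expected CPU from its request rate and raises an alert when the observed value strays too far. The linear fit, the tolerance, and the container names are assumptions for illustration, not a description of the OpsCruise models.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def find_deviations(models, telemetry, tolerance=0.1):
    """Compare observed CPU with the CPU predicted from the request rate.

    `models` maps container -> (slope, intercept); `telemetry` maps
    container -> (requests_per_second, observed_cpu). Returns a list of
    (container, expected_cpu, observed_cpu) alerts.
    """
    alerts = []
    for container, (rps, cpu) in telemetry.items():
        slope, intercept = models[container]
        expected = slope * rps + intercept
        if abs(cpu - expected) > tolerance:
            alerts.append((container, round(expected, 3), cpu))
    return alerts


# Hypothetical history: CPU grows linearly with request rate.
history_rps = [100, 200, 300]
history_cpu = [0.2, 0.4, 0.6]
models = {
    "cart": fit_line(history_rps, history_cpu),
    "catalog": fit_line(history_rps, history_cpu),
}
telemetry = {"cart": (250, 0.52), "catalog": (250, 0.90)}
alerts = find_deviations(models, telemetry)
print(alerts)
```

Because the expectation is conditioned on demand, a busy but healthy container is not flagged; only the container whose CPU is out of line with its request rate produces an alert.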



Why Did the Anomaly Occur - Explain

The reasons for a container anomaly can vary widely, from unexpected resource usage, to unanticipated higher request demand, to an unusual number of calls made to other services or containers.

To understand what caused the anomaly, an associated interpretation algorithm surfaces the top reasons for it once it is identified. In this way OpsCruise answers the Whys for each anomaly. These explanations are important for analyzing the interrelationships between anomalies in the next step, global analysis.
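
One simple way to picture "surfacing the top reasons" is to rank a container's metrics by how far each one sits from its baseline. The sketch below uses a z-score ranking; this technique, the function name, and the metric names are assumptions chosen for illustration and are not the interpretation algorithm OpsCruise uses.

```python
def top_reasons(baseline, observed, n=2):
    """Rank metrics by how many standard deviations they are from baseline.

    `baseline` maps metric -> (mean, standard_deviation); `observed` maps
    metric -> current value. Returns the names of the n most deviant metrics.
    """
    scores = {
        metric: abs(observed[metric] - mu) / sigma
        for metric, (mu, sigma) in baseline.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Hypothetical baseline and observation for one anomalous container.
baseline = {
    "cpu": (0.5, 0.1),
    "requests_per_second": (200.0, 20.0),
    "outbound_calls": (50.0, 5.0),
}
observed = {"cpu": 0.7, "requests_per_second": 210.0, "outbound_calls": 90.0}
reasons = top_reasons(baseline, observed)
print(reasons)
```

Here the unusual number of outbound calls ranks first, ahead of CPU, while demand is close to normal and drops out of the explanation.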



Fault Isolation - Think Globally to Enact Locally

Identifying the source of a problem is one of the biggest challenges when multiple anomalies co-occur in the application and create vexing “alert showers”. Simple correlations cannot reveal causation.

OpsCruise exploits the dependencies it captures from its auto-discovered application structure, together with the temporal dependencies between services that it derives from flow analytics. Combining these two kinds of dependency information makes it possible to eliminate dependent anomalies, that is, anomalies that are merely the result of other anomalies.
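
The elimination idea can be sketched in a few lines. The example assumes that a fault in a called service propagates to its callers, so an anomaly is treated as dependent when some service downstream of it became anomalous at the same time or earlier. The call graph, the onset times, and the function name are hypothetical, and the real analysis is certainly richer than this.

```python
def root_candidates(calls, onset):
    """Return anomalous services not explained by a downstream anomaly.

    `calls` maps a service to the services it calls (structural dependency).
    `onset` maps each anomalous service to the time its anomaly started
    (temporal dependency). An anomaly is considered dependent if any service
    reachable downstream was already anomalous at or before its onset.
    """
    roots = []
    for service, started in onset.items():
        stack, seen = list(calls.get(service, [])), {service}
        explained = False
        while stack and not explained:
            dep = stack.pop()
            if dep in seen:
                continue  # guard against cycles in the call graph
            seen.add(dep)
            if dep in onset and onset[dep] <= started:
                explained = True
            stack.extend(calls.get(dep, []))
        if not explained:
            roots.append(service)
    return sorted(roots)


# Hypothetical application: frontend -> cart -> redis, frontend -> catalog -> db.
calls = {"frontend": ["cart", "catalog"], "cart": ["redis"], "catalog": ["db"]}
onset = {"frontend": 12, "cart": 10, "redis": 8}
roots = root_candidates(calls, onset)
print(roots)
```

Three alerts collapse to one candidate: the `frontend` and `cart` anomalies are explained by the earlier `redis` anomaly beneath them.
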
The example below shows how real-time flow tracing captures the traffic received and transmitted between the services and containers of the application.



Knowledge-Based Reasoning: Bringing it all together

Cutting down war-room time and MTTR means reducing the manual effort needed to isolate a problem in enough detail to identify the required corrective action. Here, OpsCruise’s innovation is an embedded knowledge-based approach that reasons and draws inferences from its understanding of the application and from the insights it has gathered.

OpsCruise’s automated reasoning relies on a diagnostics engine that embeds knowledge of how to investigate application problems. It knows which questions to ask based on the nature and location of the problem and the environment in which it occurred. For example, OpsCruise understands that an increase in ingress latency could have multiple causes, including an unexpected increase in demand, an image change, or a pod failure.
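
The ingress latency example can be pictured as a small rule table in which each candidate cause is paired with a question to ask of the collected evidence. This is only a sketch of the knowledge-based idea: the rule structure, the evidence fields, and the 1.5x demand threshold are assumptions, not the contents of the OpsCruise diagnostics engine.

```python
# Each symptom maps to (candidate cause, check) pairs. The checks and the
# evidence field names are hypothetical.
RULES = {
    "ingress_latency_increase": [
        ("unexpected increase in demand",
         lambda e: e["rps"] > 1.5 * e["baseline_rps"]),
        ("image change",
         lambda e: e["image"] != e["previous_image"]),
        ("pod failure",
         lambda e: e["ready_pods"] < e["desired_pods"]),
    ],
}


def diagnose(symptom, evidence):
    """Return the candidate causes whose checks hold for this evidence."""
    return [cause for cause, check in RULES.get(symptom, []) if check(evidence)]


evidence = {
    "rps": 300, "baseline_rps": 280,
    "image": "shop/cart:v2", "previous_image": "shop/cart:v1",
    "ready_pods": 3, "desired_pods": 3,
}
causes = diagnose("ingress_latency_increase", evidence)
print(causes)
```

With demand near its baseline and all pods ready, only the image change survives as a candidate cause for the latency increase.
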
The result is presented to the Ops team as a visual summary with all accompanying details. In the example below, the Cause tab for an application latency SLO breach highlights the two anomalous containers that contributed to the latency increase in the path. The fishbone (Ishikawa) diagram for one of those containers overlays explanations of what caused the anomaly.


