OpsCruise loves Open Source. We embed and support a curated set of the most popular open source monitoring tools and standards, enabling a future-safe architecture where you own your data.
The infrastructure and application tech stack (OS, containers, orchestration, databases, messaging, analytics) is itself open source. Monitoring is a natural extension.
New instrumentation standards and community-driven tools through the CNCF (Cloud Native Computing Foundation)
Changing monitoring priorities for modern apps
Own your data - Instrumentation can be collected once and leveraged beyond monitoring (e.g. security, capacity planning, operational analytics)
Cost of proprietary monitoring tools has skyrocketed, especially in the cloud
While these tools provide a cost-effective, future-safe foundation for instrumentation and serve small organizations in pre-production well, collectively they are siloed and lack the ease of use, integration and automation that organizations need in order to scale.
OpsCruise offers a smart-layer on top of these to provide actionable insights for managing application performance and health, without compromising the monitoring layers’ functionality and deployment. This smart-layer includes a common object model, user interface, SSO/RBAC, analytics and automation to simplify operations, predict degradations and accelerate troubleshooting.
The monitoring architecture is based on OpenTelemetry (OTel), which provides a set of APIs, SDKs, tooling and integrations designed to create and manage telemetry, including metrics, logs and traces.
Kubernetes has taken the leading position in containerized application management thanks to its thoughtful architecture, extensibility and scalability, combined with well-designed APIs and object models. OpsCruise uses the APIs to discover and monitor Kubernetes objects throughout their entire lifecycle.
OpsCruise supports a variety of deployment options for Kubernetes. DIY installs and managed cloud offerings such as EKS, AKS and GKE are all supported.
Other new multi-cloud orchestration and management options that are based on Kubernetes, such as OpenShift and VMware Tanzu, are also supported.
In addition to Kubernetes, OpsCruise also supports Nomad, a high-quality orchestration solution with a loyal following.
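As a rough illustration of lifecycle tracking, the sketch below folds a stream of watch events into a current inventory of objects. It assumes events shaped like those returned by the Kubernetes watch API (a type of ADDED, MODIFIED or DELETED plus the object). The function name and sample objects are hypothetical, and this is not a description of how OpsCruise itself is implemented.

```python
from typing import Dict, Iterable, Tuple

# An object is identified here by (kind, namespace, name).
ObjectKey = Tuple[str, str, str]


def apply_watch_events(events: Iterable[dict]) -> Dict[ObjectKey, dict]:
    """Fold a stream of watch events into a current-state inventory."""
    inventory: Dict[ObjectKey, dict] = {}
    for event in events:
        obj = event["object"]
        meta = obj["metadata"]
        key = (obj["kind"], meta.get("namespace", ""), meta["name"])
        if event["type"] in ("ADDED", "MODIFIED"):
            inventory[key] = obj          # create or refresh the entry
        elif event["type"] == "DELETED":
            inventory.pop(key, None)      # the object has reached end of life
    return inventory


# Illustrative events only; a real client would read these from a watch stream.
sample_events = [
    {"type": "ADDED", "object": {"kind": "Pod", "metadata": {"namespace": "shop", "name": "cart-1"}}},
    {"type": "ADDED", "object": {"kind": "Pod", "metadata": {"namespace": "shop", "name": "cart-2"}}},
    {"type": "DELETED", "object": {"kind": "Pod", "metadata": {"namespace": "shop", "name": "cart-1"}}},
]
inventory = apply_watch_events(sample_events)
print(sorted(inventory))
```

Because the inventory is rebuilt purely from events, short-lived pods appear and disappear without any manual bookkeeping.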
Before the advent of orchestration systems, application configurations for both deployment and functionality were ad hoc and diverse. With the growth of tools like Ansible, Chef and Puppet, a measure of consistency emerged. However, these configurations still had to be maintained using CMDB tools, which required significant oversight for upkeep.
With Kubernetes and objects such as CRDs, ConfigMaps and Deployment manifests, a road to configuration standardization has emerged. OpsCruise mines these artifacts to discover and monitor components of the application. Besides extracting application component configurations automatically, OpsCruise also removes the need for users to manually update configurations for dynamic microservice applications.
Prometheus introduced scraping as a way to collect metrics, and has grown into the most popular metrics sink. A combination of simplicity, high performance, flexibility and an extensible design has increased its popularity.
Metrics are not only needed for operations but are very useful for application design, business analytics, change management and product adoption analytics.
Using Prometheus with complementary architectural extensions such as Thanos offers an enterprise a high-quality, self-managed metric store environment.
The Prometheus ecosystem offers an ever-growing number of exporters, which act as glue between Prometheus and existing metric mechanisms that are incompatible or non-standard. They provide a migration path for enterprises to adopt OpenTelemetry. In time, products will support the OTel metric-emitting mechanisms and subsume these exporters.
OpsCruise recommends and leverages Prometheus to collect and forward metrics to OpsCruise for easily building real-time observability.
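To make scraping concrete: a scrape target, including any exporter, answers an HTTP request with plain text in the Prometheus exposition format. The sketch below renders one metric family in that format using only the standard library. The metric name, labels and values are made up for illustration, and a production service would normally use an official client library rather than hand-rolled formatting.

```python
from typing import Dict, Iterable, Tuple


def escape_label_value(value: str) -> str:
    """Escape backslash, double quote and newline inside a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metric(name: str, help_text: str, metric_type: str,
                  samples: Iterable[Tuple[Dict[str, str], float]]) -> str:
    """Render one metric family as Prometheus exposition text."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        if labels:
            pairs = ",".join(
                f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items())
            )
            lines.append(f"{name}{{{pairs}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical metric and label values, for illustration only.
text = render_metric(
    "http_requests_total", "Total HTTP requests handled.", "counter",
    [({"service": "cart", "code": "200"}, 1027), ({"service": "cart", "code": "500"}, 3)],
)
print(text)
```

An exporter does the same job on behalf of a system that cannot emit this format natively: it reads the system's own statistics and re-publishes them as text like the above.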
Logs continue to be a critical part of monitoring. Better instrumentation using metrics and tracing provides structured data that enhances application monitoring. However, logs provide useful additional information, and logging tools that are native to the Kubernetes environment are needed.
Loki is an example of an open source tool that has been designed for Kubernetes and the containerized environment. It provides auto-labeling, auto-deployment, compression and log analytics features that cover the needs of the microservice application environment.
OpsCruise recommends and leverages Loki to gather, filter and forward logs. Loki also provides integrations and APIs that allow the enterprise to mine the logs beyond operations. OpsCruise can also work in conjunction with other popular logging platforms including Splunk, Elastic and Sumo Logic.
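As one example of mining logs through Loki's APIs, the sketch below builds a request URL for Loki's HTTP range-query endpoint with a LogQL filter. The host name, label names and time range are assumptions made for illustration. The code only constructs the URL and does not contact a server.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def loki_query_range_url(base_url: str, logql: str,
                         start_ns: int, end_ns: int, limit: int = 100) -> str:
    """Build a URL for Loki's /loki/api/v1/query_range endpoint."""
    params = {
        "query": logql,          # LogQL expression
        "start": str(start_ns),  # Unix epoch in nanoseconds
        "end": str(end_ns),
        "limit": str(limit),     # maximum number of entries to return
    }
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{urlencode(params)}"


# Hypothetical Loki address and labels: error lines from the "shop" namespace.
url = loki_query_range_url(
    "http://loki.example.internal:3100",
    '{namespace="shop"} |= "error"',
    start_ns=1_700_000_000_000_000_000,
    end_ns=1_700_000_600_000_000_000,
)
print(url)
print(parse_qs(urlsplit(url).query)["query"][0])
```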
Prior to the microservice design approach, applications comprised a smaller set of thicker modules. It was easy to figure out the connections between them, and these connections were stable because the modules themselves were stable.
With the advent of microservices, applications now comprise a larger number of thinner modules that are both dynamically scalable and ephemeral. This makes it hard to ‘know’ the application in terms of dependencies across microservices. While code-based tracing provides a way to capture interactions between services, it comes at a heavy price in terms of time to analyze and infrastructure costs to support.
OpsCruise leverages the extended Berkeley Packet Filter (eBPF) feature of Linux to capture flows without touching the application components or relying on network proxies or service meshes. Using eBPF and other mechanisms, OpsCruise weaves the various data together to form a comprehensive view of the application.
In addition, if a Service Mesh, such as Istio, is already deployed, OpsCruise configures and leverages the mesh to gather and forward metrics to build the application topology.
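Whatever the source of the flow data, turning it into a topology is essentially an aggregation step. The sketch below shows the idea under simple assumptions: each flow record carries source and destination IPs, a destination port and a byte count, and a lookup table maps IPs to service names. The record layout and names are hypothetical and do not reflect OpsCruise's internal format.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str, int]  # (source service, destination service, destination port)


def build_topology(flows: Iterable[dict], ip_to_service: Dict[str, str]) -> Dict[Edge, int]:
    """Aggregate flow records into service-to-service edges weighted by bytes."""
    edges: Dict[Edge, int] = defaultdict(int)
    for flow in flows:
        # Fall back to the raw IP when the endpoint is not a known service.
        src = ip_to_service.get(flow["src_ip"], flow["src_ip"])
        dst = ip_to_service.get(flow["dst_ip"], flow["dst_ip"])
        edges[(src, dst, flow["dst_port"])] += flow["bytes"]
    return dict(edges)


# Illustrative flow records and IP-to-service mapping.
flows = [
    {"src_ip": "10.0.0.5", "dst_ip": "10.0.0.9", "dst_port": 5432, "bytes": 1200},
    {"src_ip": "10.0.0.5", "dst_ip": "10.0.0.9", "dst_port": 5432, "bytes": 800},
    {"src_ip": "10.0.0.7", "dst_ip": "10.0.0.5", "dst_port": 8080, "bytes": 300},
]
ip_to_service = {"10.0.0.5": "cart", "10.0.0.9": "postgres", "10.0.0.7": "frontend"}
topology = build_topology(flows, ip_to_service)
print(topology)
```

In a dynamic cluster the IP-to-service mapping changes constantly, which is why it has to be kept in step with the orchestrator's view of pods and services.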
A key component of the OTel design is the trace. Applications can be instrumented for tracing by an auto-instrumentation mechanism, although the depth of coverage varies by technology. For example, auto-instrumentation of Java applications is much better than for other languages. To add application knowledge and context, a developer’s involvement is needed.
Jaeger is a CNCF end-to-end distributed tracing tool used to collect and store trace data emitted by applications. OpsCruise recommends and leverages Jaeger to collect and forward traces for more complete observability. We enable an automated directed tracing capability when we identify a slowdown in a class of transactions.
Customer experience has typically been addressed in two ways: real user monitoring of browser and mobile apps, and synthetic monitoring.
While OpsCruise can ingest metrics from commercial tools in this space, the standards-based approach is still in its infancy. OpenTelemetry is being extended to cover browser/mobile app instrumentation, which will allow seamless integration with the rest of the application's instrumentation.
Open source tools like the Blackbox exporter provide lightweight yet functional synthetic monitoring features that are easy to set up and integrate into the rest of the monitoring environment. OpsCruise leverages this exporter to provide metrics on the availability of the application environment.
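For a sense of how lightweight this is: the Blackbox exporter exposes a /probe endpoint that takes a target and a module, and reports the outcome as a probe_success metric. The sketch below builds such a probe URL and reads the result from a response body. The exporter address is hypothetical, and the module name http_2xx is an assumption, since module names come from the exporter's own configuration. The response text is a hand-written sample, not captured output.

```python
from urllib.parse import urlencode


def probe_url(exporter_base: str, target: str, module: str = "http_2xx") -> str:
    """Build a Blackbox exporter probe URL for the given target."""
    query = urlencode({"target": target, "module": module})
    return f"{exporter_base.rstrip('/')}/probe?{query}"


def probe_succeeded(exposition_text: str) -> bool:
    """Return True if the probe response reports probe_success as 1."""
    for line in exposition_text.splitlines():
        if line.startswith("probe_success "):
            return float(line.split()[1]) == 1.0
    return False  # metric missing: treat as a failed probe


# Hypothetical exporter address and a hand-written sample response.
url = probe_url("http://blackbox.example.internal:9115", "https://example.com")
sample_response = (
    "# HELP probe_success Displays whether or not the probe was a success\n"
    "# TYPE probe_success gauge\n"
    "probe_success 1\n"
)
print(url)
print(probe_succeeded(sample_response))
```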
As modules are deployed, the CD pipeline executes various stages using a variety of artifacts. A deploy step makes changes to the Kubernetes environment by replacing images, the configuration data used by images, or both.
While Kubernetes events record these changes, the upstream pipeline is external to Kubernetes. Spinnaker and Jenkins are popular open source CI/CD tools, both hosted by the Continuous Delivery Foundation (CDF), which OpsCruise recommends and leverages. Pipelines, execution events and artifacts are collected and tracked. They are then correlated to changes in Kubernetes.
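One simple way to picture this correlation is to match pipeline deploy events to Kubernetes changes that reference the same image shortly afterwards. The sketch below does that with a fixed time window. The field names, the five-minute window and the matching rule are all assumptions chosen for illustration, not a description of OpsCruise's actual logic.

```python
from typing import Iterable, List, Tuple


def correlate(deploys: Iterable[dict], k8s_changes: Iterable[dict],
              window_seconds: int = 300) -> List[Tuple[str, str]]:
    """Pair each Kubernetes change with deploys of the same image that
    finished no more than window_seconds before the change."""
    deploys = list(deploys)
    matches = []
    for change in k8s_changes:
        for deploy in deploys:
            delay = change["timestamp"] - deploy["finished_at"]
            if deploy["image"] == change["image"] and 0 <= delay <= window_seconds:
                matches.append((deploy["pipeline_id"], change["object"]))
    return matches


# Illustrative records; timestamps are seconds since an arbitrary epoch.
deploys = [{"pipeline_id": "build-42", "image": "shop/cart:1.4.0", "finished_at": 1000}]
k8s_changes = [
    {"object": "Deployment/cart", "image": "shop/cart:1.4.0", "timestamp": 1030},
    {"object": "Deployment/search", "image": "shop/search:2.0.1", "timestamp": 1040},
]
matches = correlate(deploys, k8s_changes)
print(matches)
```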
OpsCruise also provides metrics that can be used by the canary stages for go/no-go decisions on a build. CI/CD pipelines, events and artifacts have not yet been standardized, so supporting multiple tools requires custom plugins.
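A canary stage could consume such metrics with a rule as simple as the sketch below, which compares the canary's error rate to the baseline's and allows a bounded relative increase. The function name, the choice of error rate as the metric and the 10% tolerance are hypothetical. Real canary analysis usually weighs several metrics and statistical significance.

```python
def canary_decision(baseline_error_rate: float, canary_error_rate: float,
                    max_relative_increase: float = 0.10) -> str:
    """Return "go" if the canary's error rate stays within the allowed
    relative increase over the baseline, otherwise "no-go"."""
    allowed = baseline_error_rate * (1 + max_relative_increase)
    return "go" if canary_error_rate <= allowed else "no-go"


# Illustrative values: a 2% baseline error rate with two candidate canaries.
print(canary_decision(0.020, 0.021))
print(canary_decision(0.020, 0.030))
```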