
Microservices and the Myth of Fault Isolation
Microservices do not automatically deliver fault isolation by design. They replace one obvious forest fire with a sprawling network of subtle, cascading brush fires.
Severin shares insights into his career path, including his involvement with AppDynamics and Cisco, and his current role at Causely, where he focuses on OpenTelemetry and causal reasoning for root cause analysis.
This article has been reposted with permission from CIO Dive.
When a provider slows down, Causely shows exactly how the impact ripples across your services and identifies the external API as the root cause.
Causal reasoning with AI agents enables proactive incident prevention, automated remediation, and a path toward autonomous service reliability.
We’ll recap OTel logging best practices, explore how to use logs effectively in troubleshooting without drowning in data, walk through a tutorial workflow you can apply today, and show how Causely operationalizes this approach automatically at scale.
This post explores four architecture patterns where standalone Docker is not only justified but recommended.
Watch the video to see how Causely turns “Lag High” chaos into confident, informed action in seconds.
Most developers use automatic instrumentation without knowing how it actually works. This post breaks down the key techniques behind it—not to build your own, but to understand what’s really happening when things "just work."
In this short video, we show how Causely pinpoints the exact code change that triggered cascading performance issues — without requiring you to sift through logs or build custom dashboards.
More telemetry doesn’t guarantee more understanding. In many cases, it gives you the illusion of control while silently eroding your ability to reason about the system.
In 'Rethinking Reliability for Distributed Systems,' Endre Sara shared a common story: a large-scale customer, running mature microservices in Kubernetes with full observability coverage, still struggles to understand what’s broken during a high-stakes business event.
Causely product
In this short demo, we show how Ask Causely shifts incident response from a fire drill to a focused, high-context workflow.
observability
A few weeks back, I joined Charity Majors, Paige Cruz, Avi Freedman, Shahar Azulay, and Adam LaGreca for a roundtable on the state of modern observability. It was an honest conversation about where we are, what’s broken, and where things are heading. You can read the full summary on
Causely product
Grafana gives teams the power to visualize everything - but on Day 0, when your dashboards are live and alerts start firing, what your team really needs is clarity. That’s why we built the new Causely plugin for Grafana. In just minutes, Causely connects to your telemetry sources and
Blog
“Root Cause Analysis” (RCA) is one of the most overloaded terms in modern engineering. Some call a tagged log line RCA. Others label time-series correlation dashboards or AI-generated summaries as RCA. Some reduce noise by filtering or hiding secondary and cascading alarms. And recently large language models (LLMs) have entered
Causality
When it comes to observability and IT operations, our goal should be to get humans out of the loop as much as possible.
integration
With Causely, you can see the why behind what’s happening without having to leave your Grafana interface.
Webinar
“You actually cannot do meaningful reasoning especially when it comes to root cause analysis with LLMs or machine learning alone. You need more than that.” -Shmuel Kliger, Founder of Causely
Causely product
A version upgrade. A schema change. And suddenly, a critical service stalls. MySQL 8’s hidden metadata locking behavior has tripped up even the most prepared teams. We captured this knowledge — and now, Causely can pinpoint it. If you’ve learned about how Causely works, you already know that our
Causality
Assuring service reliability is the most critical goal of IT. It was never easy, and it is getting increasingly complex as businesses require greater speed, agility, and scalability to stay competitive and respond quickly to changing market demands. These needs are driving the adoption of microservices architectures, enabling organizations to
Blog
At Causely, we don’t just ship software – we run a reasoning platform designed to detect, diagnose, and resolve failure conditions with minimal human intervention. Our own cloud-native application runs in a highly distributed environment, with dozens of interdependent microservices communicating in real-time. It’s complex, dynamic, and constantly evolving—
Blog
Implementing OpenTelemetry at the core of our observability strategy for Causely’s SaaS product was a natural decision. This post shares context on our rationale and how the combination of OpenTelemetry and causal reasoning underpins our platform.
DevOps & SRE
In this DevOps Toolkit episode, Endre Sara joins Viktor Farcic for an Ask Me Anything session.