From Data to Diagnosis: Explaining AI Models in Sepsis Care

In the AI/ML healthcare landscape, explaining model behavior is essential for clinical adoption and everyday practice.

I had an impactful learning experience building a sepsis prediction model with my team at NUARI. We used the Medical Information Mart for Intensive Care (MIMIC-IV) database, which includes patient data collected at Beth Israel Deaconess Medical Center from 2008 to 2019. We examined patients who presented to the emergency room, were admitted to the hospital, and were diagnosed with sepsis at discharge, and compared them with a similar but non-septic patient cohort. Our goal was to build a machine learning model that predicts sepsis accurately.

When I started this project, I knew accuracy alone wouldn’t be enough. In medicine, transparency and understanding are just as critical as performance metrics. Doctors need to understand why an AI model makes certain predictions before they can integrate its insights into clinical decision-making. This is where AI explainability comes in.

Why AI Explainability Matters in Healthcare

Sepsis is a life-threatening condition where every minute counts — each one-hour delay in antimicrobials significantly increases patient mortality. A sepsis prediction model could alert doctors earlier, potentially saving lives. However, without knowing why the model flagged a patient as high-risk, or what factors contributed to this flag, its predictions might be dismissed or misunderstood. This is the “black box” problem: a model’s inner workings are often too complex to interpret at a glance.

To overcome this challenge, I used SHapley Additive exPlanations (SHAP) — a powerful tool for explaining AI models by breaking down individual predictions into understandable contributions.

How SHAP Helped Explain the Model

SHAP assigns each feature in a prediction a “contribution score,” showing how much that feature pushes the prediction toward a higher or lower risk of sepsis. For example, if a patient’s heart rate is elevated, SHAP might assign it a positive score, indicating it increased the model’s prediction of sepsis risk. After creating my models, I used SHAP to calculate the contribution scores of each feature.
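
As a rough sketch of what this looks like in code (assuming a trained tree-based classifier named model and a held-out feature matrix X_test, both placeholder names rather than the exact pipeline from this project), the contribution scores can be computed in a few lines:

```python
import shap  # pip install shap

# Hypothetical inputs: `model` is a trained tree-based classifier (e.g., a
# gradient-boosted or random forest model) and `X_test` is a pandas DataFrame
# of held-out patient features (heart rate, SBP, WBC count, temperature, ...).
explainer = shap.TreeExplainer(model)

# One row per patient, one column per feature. Positive values push the
# prediction toward sepsis; negative values push it toward non-septic.
# (For some model types SHAP returns one array per class.)
shap_values = explainer.shap_values(X_test)
```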

Here's what worked well in my project: 

- Patient-Specific Insights: SHAP created force plots highlighting how individual patient features influenced risk predictions. This made it easier to explain high-risk alerts to clinicians (a minimal sketch of generating one appears after this list).

- Global Feature Importance: Summary plots ranked features by overall impact, showing trends like systolic blood pressure (SBP) being a key indicator of sepsis. It's important to ask whether the SHAP results make sense clinically. Here, SBP as a key predictor of sepsis makes sense, because sepsis is characterized by poor perfusion and low blood pressure. SBP is also used to calculate the qSOFA score (a widely used sepsis risk calculator) that predicts sepsis mortality. Other top features across all model predictions included white blood cell count, components of the Glasgow Coma Scale (GCS), and temperature, all of which are components of either the SOFA or qSOFA scores.
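
Here is a minimal sketch of the patient-specific force plot mentioned above, continuing the hypothetical explainer, shap_values, and X_test from the earlier snippet; the patient index and rendering options are illustrative, not the project's exact code:

```python
# Explain a single patient's prediction with a SHAP force plot.
patient_idx = 0  # arbitrary example patient

# Note: for some model types SHAP returns per-class outputs; in that case
# index the positive (sepsis) class before plotting.
shap.force_plot(
    explainer.expected_value,    # the model's baseline (average) output
    shap_values[patient_idx],    # this patient's per-feature contributions
    X_test.iloc[patient_idx],    # this patient's actual feature values
    matplotlib=True,             # render as a static matplotlib figure
)
```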

SHAP Summary Plot for a Model

Here you can see an example of a SHAP summary plot generated for one of the models from this project. The x-axis shows the absolute value of the contribution score (the scores themselves can be positive or negative), and the y-axis lists the feature names. SBP is the most important feature contributing to this model's predictions.
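
A plot like this can be generated with a single call (again reusing the hypothetical shap_values and X_test from the earlier sketch); the bar variant ranks features by their mean absolute contribution score:

```python
# Global feature importance: features ranked by mean |SHAP value|.
shap.summary_plot(shap_values, X_test, plot_type="bar")
```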

Exploring Other Explainability Tools

While SHAP was central to my project, Python has several other explainability libraries worth mentioning:

  • LIME (Local Interpretable Model-Agnostic Explanations): Similar to SHAP, LIME generates simple explanations for individual predictions. It’s quick but less comprehensive for global trends (a short sketch appears after this list).
  • ELI5: A straightforward tool for explaining linear models and feature importances. It’s handy for debugging but lacks deep interpretability for complex models.
  • Captum: PyTorch users can leverage Captum for integrated gradients and deep learning explainability.
  • Skater: A versatile library supporting rule-based explanations and feature attribution methods.
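
As an example of the first of these, here is a minimal LIME sketch for tabular data; the variable names (X_train, X_test, model) and class labels are hypothetical stand-ins rather than code from this project:

```python
from lime.lime_tabular import LimeTabularExplainer  # pip install lime

# Hypothetical inputs: X_train/X_test are pandas DataFrames of patient
# features and `model` is any classifier that exposes predict_proba.
explainer = LimeTabularExplainer(
    training_data=X_train.values,
    feature_names=X_train.columns.tolist(),
    class_names=["non-septic", "septic"],
    mode="classification",
)

# Explain one patient's prediction using the top five contributing features.
explanation = explainer.explain_instance(
    X_test.values[0], model.predict_proba, num_features=5
)
print(explanation.as_list())  # [(feature condition, weight), ...]
```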

Lessons Learned on AI Explainability

Explainability is more than a technical feature — it builds trust between AI developers and clinicians. In healthcare, where decisions are high-stakes, models must not only be accurate but also transparent.

For my sepsis prediction project, using SHAP didn’t just make the AI model explainable — it made it useful. The insights helped improve the model’s design, uncover biases, and make its recommendations clearer for healthcare teams. This approach turned a complex AI system into a trusted clinical tool.

Have you used explainability tools in your projects? Let me know what’s worked for you and what challenges you’ve faced!