At Nomis, we’ve developed a practice that might seem counterintuitive: we routinely build the most sophisticated models possible, knowing from the start that we’ll never put them into production.

This isn’t wasted effort. It’s the foundation of how we deliver models that actually work in banking, where “work” means something very specific. A model that performs brilliantly but can’t survive a conversation with your model risk management team isn’t a model you can use. And when examiners ask why a customer was flagged for a retention offer, “the neural network said so” isn’t an answer.

What we’ve learned is that the path to the best deployable model runs directly through the models you’d never deploy. Here’s why.

The Complexity Tax

When we take on a new modeling engagement—say, predicting which deposit customers are at risk of attrition—we don’t start with the interpretable models that will ultimately go into production. We start at the other end: convolutional neural networks, gradient-boosted trees, transformer architectures. Models with hundreds of thousands of parameters that can detect patterns no human would ever specify.

The results are striking. In a recent engagement with a regional bank, our neural network models achieved validation AUCs above 0.82. The top decile of scored customers had a 45% event rate, meaning nearly half of the customers the model flagged as highest-risk actually did shrink their relationships. The discrimination ratio between the highest and lowest risk deciles exceeded 100x.

These are exceptional results. They’re also results we knew we couldn’t ship.

The logistic regression we ultimately delivered achieved a validation AUC of 0.72. Strong performance, but meaningfully below what the neural network could do. In the top decile, about 35% of flagged customers experienced the outcome—still powerful discrimination, but a 10-percentage-point gap from what was theoretically achievable.

We call this gap the complexity tax. It’s the performance you leave on the table when you choose interpretability over raw predictive power.

The chart on the right makes this concrete. Looking at how many actual attrition events each model captures in its top-scored customers, the neural network reaches 75-80% coverage by the third decile. The logistic regression gets to about 60%. That 20-percentage-point gap—the shaded area—is the complexity tax.
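As a rough illustration of how a capture-by-decile comparison like this is computed, here is a minimal sketch. The scores and labels below are synthetic stand-ins (not the engagement data): a stronger and a weaker scoring model are simulated, and the per-decile gap between their cumulative capture curves is the shaded "complexity tax" area.

```python
import numpy as np

def cumulative_capture(y, scores, n_deciles=10):
    """Fraction of all positive events captured within each top-k decile of scores."""
    order = np.argsort(-np.asarray(scores))   # highest-scored customers first
    y_sorted = np.asarray(y)[order]
    per_decile = np.array([chunk.sum() for chunk in np.array_split(y_sorted, n_deciles)])
    return np.cumsum(per_decile) / per_decile.sum()

# Hypothetical illustration: a sharper model concentrates events in earlier deciles.
rng = np.random.default_rng(0)
y = rng.binomial(1, 0.1, 10_000)               # ~10% base attrition rate
strong = 2.0 * y + rng.normal(0, 1.0, 10_000)  # stands in for the neural network
weak   = 1.0 * y + rng.normal(0, 1.0, 10_000)  # stands in for the logistic regression

gap = cumulative_capture(y, strong) - cumulative_capture(y, weak)
# gap[k] is the complexity tax at decile k+1: events the complex model
# captures by that depth that the simple model does not.
```

Both curves necessarily reach 100% at the tenth decile; the tax is paid in the early deciles, where outreach budgets are actually spent.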

Why Banks Pay It

If neural networks perform so much better, why would any rational institution choose the simpler model?

Because banking isn’t an academic exercise. Every model deployed for customer-facing decisions needs to satisfy three constituencies: the business users who act on its outputs, the risk teams who validate its soundness, and the regulators who examine its fairness and reliability.

A neural network with 100,000 parameters can tell you that Customer ABC has a 73% probability of attrition. It cannot tell you why. The model has learned some complex interaction of balance patterns, transaction behaviors, and account characteristics, but that learning is distributed across layers of mathematical transformations that resist human interpretation.

A logistic regression with 15 variables can tell you that Customer ABC’s risk score is elevated because their balance volatility over the past 12 months is in the top quintile, they don’t hold a term deposit, and their checking balance has declined in six of the last nine months. A relationship manager can look at that explanation and say, “Yes, that makes sense. I should call them.” A model validator can assess whether those drivers are conceptually sound. An examiner can evaluate whether the model is treating customers appropriately.

This is the tradeoff: the neural network is more right, but the logistic regression is more usable.
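The mechanics of that explanation are worth making concrete. With a logistic regression, each variable’s contribution to the log-odds is just coefficient times value, so reason codes fall out of the model directly. The coefficients and feature names below are illustrative inventions, not the delivered model:

```python
import numpy as np

# Hypothetical coefficients from a fitted attrition model (illustrative only).
features  = ["balance_volatility_12m", "holds_term_deposit", "months_declining_of_9"]
coefs     = np.array([0.9, -0.7, 0.25])
intercept = -2.0

def explain(x):
    """Score one customer and rank each variable's contribution to the log-odds."""
    contrib = coefs * x
    prob = 1.0 / (1.0 + np.exp(-(intercept + contrib.sum())))
    reasons = sorted(zip(features, contrib), key=lambda t: -t[1])  # biggest driver first
    return prob, reasons

# Customer ABC: high volatility, no term deposit, 6 of 9 months declining.
prob, reasons = explain(np.array([1.8, 0.0, 6.0]))
```

The ranked `reasons` list is exactly what a relationship manager or validator can inspect; a neural network offers no equivalent decomposition without bolting on post-hoc approximations.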

Why We Build Both

If we’re going to deliver a logistic regression anyway, why spend the time on neural networks?

Three reasons.

First, the neural network establishes the performance ceiling. Without it, we don’t know whether our logistic regression is capturing most of the available signal or leaving significant performance on the table. When we deliver a model with a 0.72 AUC, we can tell our client with confidence: “A substantially more complex approach would get you to 0.82, but you’d lose explainability. You’re getting about 85% of the theoretical maximum while maintaining full transparency.” That context changes how the model is perceived and used.

Second, the complex models guide our feature engineering. Neural networks automatically learn interactions and non-linearities that we’d never think to specify manually. By studying which feature domains drive performance in the black-box models—through techniques like ablation studies and feature importance analysis—we gain insight into what the simpler models should include. In the regional bank engagement, our neural network exploration revealed that multi-scale temporal patterns (how customers behaved at 3-month, 6-month, and 12-month horizons) carried significant signal. We translated that insight into engineered features for the logistic regression: balance trajectory over time, volatility measures at different lookback windows, momentum indicators. The interpretable model improved because the uninterpretable one showed us where to look.
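To give a feel for that translation step, here is a sketch of the kind of multi-scale temporal features described, assuming monthly balance history laid out as one row per customer and one column per month (oldest first). The window choices and feature names are illustrative, not the engagement’s actual specification:

```python
import numpy as np
import pandas as pd

def temporal_features(balances: pd.DataFrame) -> pd.DataFrame:
    """Multi-scale features from monthly balances (rows = customers, columns = months, oldest first)."""
    out = pd.DataFrame(index=balances.index)
    for window in (3, 6, 12):
        recent = balances.iloc[:, -window:]
        # Volatility: dispersion of balances within the lookback window.
        out[f"volatility_{window}m"] = recent.std(axis=1)
        # Momentum: average month-over-month change within the window.
        out[f"momentum_{window}m"] = recent.diff(axis=1).mean(axis=1)
    # Trajectory: least-squares slope over the full history.
    x = np.arange(balances.shape[1])
    out["trajectory_12m"] = balances.apply(lambda row: np.polyfit(x, row, 1)[0], axis=1)
    return out
```

Each resulting column is a single interpretable number per customer, which is what lets the logistic regression absorb signal the neural network found in the raw time series.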

Third, it future-proofs the relationship. Model risk appetite is evolving. Institutions that five years ago would only accept logistic regression are now having serious conversations about gradient-boosted trees. By building the full model stack, we’re prepared when a client says, “We’re comfortable with more complexity now—what would we gain?” We don’t have to restart from scratch; we have the answer ready.

Our Process

We’ve formalized this into a structured workflow:

  1. Feature exploration: Build initial models across multiple architectures using raw and lightly engineered features. Establish baseline performance and identify which data domains carry signal.
  2. Ceiling establishment: Train sophisticated models (neural networks, XGBoost) to find the upper bound of what’s achievable given the available data. This becomes the benchmark against which we measure everything else.
  3. Signal extraction: Analyze the complex models to understand what’s driving their performance. Which features matter most? What time horizons are predictive? Where are the non-linearities?
  4. Interpretable translation: Engineer features for logistic regression informed by what we learned from the complex models. Test binning strategies, interaction effects, and variable transformations.
  5. Production model selection: Deliver a model that balances performance, interpretability, and stability. Document the complexity tax explicitly so stakeholders understand what they’re getting, and what they’re choosing to forgo.
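Steps 2 and 5 of this workflow boil down to a measurable comparison. A minimal sketch of that comparison, using scikit-learn on a synthetic dataset (the model choices and dataset here are stand-ins, not the production stack):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for an attrition dataset.
X, y = make_classification(n_samples=5_000, n_features=20, n_informative=8,
                           weights=[0.9], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Step 2: a complex model establishes the performance ceiling.
ceiling = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
# Step 5: the interpretable candidate for production.
production = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

auc_ceiling = roc_auc_score(y_te, ceiling.predict_proba(X_te)[:, 1])
auc_prod    = roc_auc_score(y_te, production.predict_proba(X_te)[:, 1])
complexity_tax = auc_ceiling - auc_prod   # documented explicitly for stakeholders
```

Reporting `complexity_tax` alongside the delivered model is what turns the tradeoff from an assertion into a documented, quantified choice.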

In a typical engagement, we evaluate 300-400 model variants across this process. The final deliverable might have 12-20 coefficients, but those coefficients were selected from a search space that only becomes visible through the more complex approaches.

Looking Ahead

The performance-interpretability tradeoff isn’t static. Techniques for explaining complex models are improving. Regulatory guidance is evolving. We’re increasingly hearing from banking partners that they’re open to making pragmatic moves along this spectrum and accepting somewhat less interpretability for meaningfully better performance.

Behind the scenes, we’ve been shifting our workflows and technologies to better support this evolution. The model development infrastructure that lets us explore neural networks today positions us to deliver more sophisticated production models tomorrow, when and where institutions are ready for them.

But the core insight won’t change: you can’t make an informed tradeoff if you don’t know what you’re trading off against. Building models we’ll never deploy is how we ensure our clients always know.

To learn more about how Nomis approaches moving beyond AI pilots to deployments that deliver real customer value, register for our upcoming virtual fireside chat on February 4th at 2 PM ET, Beyond the Hype: Deploying AI that Delivers Customer Value: https://thefinancialbrand.com/banking-webinars/beyond-the-hype-deploying-ai-that-delivers-customer-value

To learn more about Nomis’ approach to behavioral modeling and price optimization, contact us at sales@nomissolutions.com or visit nomissolutions.com.


By Wes West, Chief Analytics Officer