Custom Likelihoods in PyMC: One-Inflated Beta Regression for Loan Repayment
When a borrower takes out a personal loan, they might repay every penny, default entirely, or land anywhere in between. The interesting variable is the fraction eventually recovered: a number between 0 and 1 for each loan in the portfolio. Plot the distribution across thousands of loans and it looks like a smooth Beta curve with a tall spike bolted on at the right edge: a mass of borrowers who repaid in full. That spike is good news for the lender, but a headache for the modeller. Standard Beta regression handles continuous outcomes on (0, 1), but it cannot produce a point mass at the boundary. Logistic regression predicts a binary paid-or-not label, throwing away the partial repayment information. Neither tool fits the data you actually have.

In our first PyMC post, we built hierarchical models using built-in distributions. In the second, we handled non-standard likelihoods with pm.Potential for right-censored survival data. This post takes the final step: writing a piecewise log-likelihood from scratch for a mixture of continuous and discrete components. By the end, you will construct a One-Inflated Beta (OIB) regression in PyMC, hand-code the Beta log-density, and infer how borrower characteristics drive both the probability of full repayment and the expected partial repayment fraction.

We will generate synthetic loan data for 2,000 borrowers, fit an OIB regression model, and recover the true data-generating parameters.

```python
import numpy as np
import pymc as pm
import pytensor.tensor as pt
import arviz as az
import matplotlib.pyplot as plt

np.random.seed(42)

# --- Generate synthetic loan data ---
N = 2000
credit_score = np.random.normal(0, 1, N)
loan_to_value = np.random.normal(0, 1, N)
interest_rate = np.random.normal(0, 1, N)
income_ratio = np.random.normal(0, 1, N)
X = np.column_stack([credit_score, loan_to_value, interest_rate, income_ratio])
feature_names = ['credit_score', 'loan_to_value', 'interest_rate', 'income_ratio']

# True parameters
true_psi = np.array([0.5, 0.8, -0.6, -0.4, 0.3])    # pi coefficients
true_delta = np.array([0.3, 0.5, -0.3, -0.2, 0.2])  # theta coefficients
true_phi = 5.0                                       # Beta precision

# Per-loan probability of full repayment (logistic link)
logit_pi = true_psi[0] + X @ true_psi[1:]
pi_true = 1 / (1 + np.exp(-logit_pi))

# Per-loan mean partial repayment (logistic link)
logit_theta = true_delta[0] + X @ true_delta[1:]
theta_true = 1 / (1 + np.exp(-logit_theta))

# Beta shape parameters from mean-precision
alpha_true = theta_true * true_phi
beta_true = (1 - theta_true) * true_phi

# Sample from the OIB mixture: full repayment with probability pi, otherwise a Beta draw
u = np.random.uniform(size=N)
repayment = np.where(u < pi_true, 1.0, np.random.beta(alpha_true, beta_true))
```

A tempting way to encode the piecewise likelihood is pm.CustomDist with a switch between the two branches. The trouble is that automatic differentiation evaluates both branches for every observation, including the Beta log-density at $y = 1$, where it is not finite (it diverges to $-\infty$ whenever $\beta > 1$). Even though the switch selects the other branch, the gradient through the infinite branch corrupts the NUTS sampler. The result: 100% divergence rate.

The pm.Potential approach avoids this entirely. Because the observations are pre-split into fully-repaid and partial groups, the Beta density is never evaluated at the boundary. This is the same pattern we used for censored data in survival analysis: separate the observation types, compute each group's log-likelihood independently, and add them as Potential terms. The trade-off is that pm.Potential does not enable pm.sample_posterior_predictive out of the box (you need to write manual prediction code, as we did). For many production workflows, that is a minor inconvenience compared to the reliability gain.
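To make the pm.Potential pattern concrete, here is a minimal sketch built on the synthetic arrays above. The group masks, prior choices, the beta_logp helper, and random_seed are illustrative rather than the original notebook's code; the pm.sample settings follow the configuration discussed in the next section.

```python
# Minimal OIB model sketch using pm.Potential (illustrative priors and names).
# Assumes the imports and synthetic arrays (X, repayment) defined above.

# Split observations: fully repaid vs. partially repaid
is_full = repayment == 1.0
X_full, X_part = X[is_full], X[~is_full]
y_part = repayment[~is_full]

def beta_logp(y, mu, phi):
    """Hand-coded Beta log-density in mean-precision form (alpha = mu*phi, beta = (1-mu)*phi)."""
    a = mu * phi
    b = (1.0 - mu) * phi
    return (pt.gammaln(a + b) - pt.gammaln(a) - pt.gammaln(b)
            + (a - 1.0) * pt.log(y) + (b - 1.0) * pt.log1p(-y))

with pm.Model() as oib_model:
    # Coefficients for the probability of full repayment (pi)
    psi0 = pm.Normal("psi0", 0.0, 1.0)
    psi = pm.Normal("psi", 0.0, 1.0, shape=X.shape[1])
    # Coefficients for the mean partial repayment (theta)
    delta0 = pm.Normal("delta0", 0.0, 1.0)
    delta = pm.Normal("delta", 0.0, 1.0, shape=X.shape[1])
    # Beta precision
    phi = pm.Gamma("phi", alpha=2.0, beta=0.5)

    # Logistic links, evaluated separately for each group
    pi_full = pm.math.sigmoid(psi0 + pt.dot(X_full, psi))    # P(full repayment), y = 1 group
    pi_part = pm.math.sigmoid(psi0 + pt.dot(X_part, psi))    # P(full repayment), partial group
    theta_part = pm.math.sigmoid(delta0 + pt.dot(X_part, delta))

    # Piecewise log-likelihood, one Potential term per group:
    #   log(pi_i)                        for fully repaid loans
    #   log(1 - pi_i) + Beta log-density for partial repayments
    pm.Potential("ll_full", pt.sum(pt.log(pi_full)))
    pm.Potential("ll_partial",
                 pt.sum(pt.log1p(-pi_part) + beta_logp(y_part, theta_part, phi)))

    idata = pm.sample(draws=1000, tune=3000, chains=4,
                      target_accept=0.95, init="jitter+adapt_diag",
                      random_seed=42)
```

With the synthetic data above, the posteriors for psi, delta, and phi should concentrate near the true values used in the generating code, which is the parameter-recovery check the rest of the post relies on.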
Our sampling configuration follows the original code that inspired this tutorial:

- 3,000 tuning steps with 1,000 posterior draws per chain. The long warm-up helps the NUTS sampler adapt its step size to the geometry of the piecewise likelihood.
- 4 chains for convergence diagnostics. With $\hat{R}$ and effective sample size, four chains provide reliable evidence that the sampler has explored the full posterior.
- target_accept=0.95 raises the acceptance threshold from the default 0.8, which reduces divergences in models with sharp likelihood boundaries.
- init='jitter+adapt_diag' initialises each chain near the prior mean with small random perturbations.

A practical note from the original code: if covariates have very different scales (e.g., one ranges from 0 to 1 while another ranges from 0 to 200), the default jitter of roughly $\pm 1$ can push initial coefficient values far from reasonable territory. Standardising covariates beforehand, as we did, avoids this.

The OIB model assumes that exactly-one observations arise from a fundamentally different process than partial observations. If instead you have data with a spike at zero (e.g., insurance claims where most customers file nothing), you want a zero-inflated model. If you have spikes at both boundaries, you need a zero-and-one-inflated Beta (ZOIB). For data with no boundary spikes at all, standard Beta regression (via MLE or Bayesian inference) is simpler and sufficient. The extra complexity of the OIB mixture is only justified when the data genuinely contains a discrete mass at the boundary.

The OIB model sits at the intersection of two lines of research: Beta regression for bounded continuous data, and inflated distributions for boundary spikes. The foundation is the Beta regression model introduced by Silvia Ferrari and Francisco Cribari-Neto in their 2004 paper "Beta Regression for Modelling Rates and Proportions" (Journal of Applied Statistics, 31(7), 799-815). They observed that rates, proportions, and fractions appear everywhere in applied statistics, yet researchers typically transform them (logit, arcsine) and apply linear regression. This is problematic because the transformation distorts the error structure and complicates interpretation. Their key insight was to model the response directly as Beta-distributed, using the mean-precision parameterisation we adopted:

$$f(y; \mu, \phi) = \frac{\Gamma(\phi)}{\Gamma(\mu\phi)\,\Gamma((1-\mu)\phi)}\, y^{\mu\phi - 1} (1 - y)^{(1-\mu)\phi - 1}, \qquad 0 < y < 1,$$

where $0 < \mu < 1$ is the mean and $\phi > 0$ is the precision. Ferrari and Cribari-Neto showed that this is a natural exponential family model when parameterised through $\mu$, and proposed a logit link for the mean, $\operatorname{logit}(\mu_i) = \mathbf{x}_i^\top \boldsymbol{\beta}$:

"The proposed model is useful for situations where the variable of interest is continuous and restricted to the interval (0, 1). [...] A convenient parameterisation of the beta density in terms of the mean and a precision parameter is used."

Their framework supports maximum likelihood estimation, but the Bayesian extension (which we use) adds uncertainty quantification and regularisation through priors. The connection to MLE is direct: the posterior mode of our model with flat priors equals the MLE of Ferrari and Cribari-Neto's model.

The standard Beta has support on the open interval (0, 1), so it cannot assign positive probability to the boundaries 0 or 1. Raydonal Ospina and Silvia Ferrari addressed this in "Inflated Beta Distributions" (Statistical Papers, 51(1), 111-126, 2010). They defined a class of mixed continuous-discrete distributions:

$$p(y; \pi, \mu, \phi) = \begin{cases} \pi & \text{if } y = 1, \\ (1 - \pi)\, f(y; \mu, \phi) & \text{if } y \in (0, 1), \end{cases}$$

where $f$ is the Beta density above. This is exactly the piecewise density we implemented with pm.Potential.
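As a side illustration (not from the original post), the Ospina-Ferrari density can be evaluated directly with NumPy and SciPy. The helper name oib_logpdf and the example parameter values are ours; note that the boundary observations are masked out before the Beta log-density is evaluated, mirroring the pm.Potential split.

```python
import numpy as np
from scipy import stats

def oib_logpdf(y, pi, mu, phi):
    """One-inflated Beta log-density: P(Y = 1) = pi, else (1 - pi) * Beta(y; mu*phi, (1-mu)*phi)."""
    y = np.asarray(y, dtype=float)
    out = np.empty_like(y)
    at_one = y == 1.0
    alpha, beta = mu * phi, (1.0 - mu) * phi
    out[at_one] = np.log(pi)
    out[~at_one] = np.log1p(-pi) + stats.beta.logpdf(y[~at_one], alpha, beta)
    return out

# A fully repaid loan and a 55% partial repayment under pi = 0.4, mu = 0.6, phi = 5
print(oib_logpdf([1.0, 0.55], pi=0.4, mu=0.6, phi=5.0))
```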
The parameter $\pi$ controls the inflation: the probability of observing the boundary value. Ospina and Ferrari also developed zero-inflated and zero-and-one-inflated variants for different boundary patterns.

"In many practical situations, the variable of interest is continuous in the open standard unit interval but may also assume the extreme values zero and/or one with positive probabilities. [...] We introduce a class of inflated beta distributions."

Their work established the theoretical properties (moments, maximum likelihood estimation, score functions) that underpin our Bayesian implementation. The original MLE approach estimates $\pi$, $\mu$, and $\phi$ by maximising the log-likelihood. The Bayesian version replaces optimisation with MCMC sampling, yielding full posterior distributions rather than point estimates. This is particularly valuable for the OIB model because the piecewise likelihood creates a posterior geometry that point estimates cannot capture: the uncertainty in $\pi$ and $\theta$ is correlated, and the posterior for $\phi$ is often skewed. Where Ferrari and Cribari-Neto derived score functions by hand, we supply the log-density components to PyMC and let the NUTS sampler handle the rest. The automatic differentiation in PyTensor computes gradients through the gammaln and log operations, enabling efficient Hamiltonian Monte Carlo.

The complete OIB regression procedure:

1. For each observation $i$, compute $\pi_i = \sigma(\psi_0 + \mathbf{x}_i^\top \boldsymbol{\psi})$ (full repayment probability).
2. Compute $\theta_i = \sigma(\delta_0 + \mathbf{x}_i^\top \boldsymbol{\delta})$ (partial repayment mean).
3. Compute the Beta shape parameters $\alpha_i = \theta_i \phi$ and $\beta_i = (1 - \theta_i) \phi$.
4. Evaluate the piecewise log-likelihood: $\log \pi_i$ if $y_i = 1$, else $\log(1 - \pi_i) + \log \text{Beta}(y_i \mid \alpha_i, \beta_i)$.
5. Sum across all observations and sample the posterior via NUTS.

References and further reading:

- The Beta regression paper: Ferrari, S. and Cribari-Neto, F. (2004). "Beta Regression for Modelling Rates and Proportions." Journal of Applied Statistics, 31(7), 799-815.
- Inflated distributions: Ospina, R. and Ferrari, S. (2010). "Inflated Beta Distributions." Statistical Papers, 51(1), 111-126.
- The PyMC CustomDist guide: PyMC documentation on custom distributions.
- Previous in this series: start with Hierarchical Bayesian Regression, then Bayesian Survival Analysis, for the progression from built-in distributions to custom likelihoods.
- Distribution Explorer — Visualise the Beta distribution and other families used in this model.
- Bayes' Theorem Calculator — Explore Bayesian reasoning interactively.
- Hierarchical Bayesian Regression with PyMC: When Groups Share Strength — Partial pooling and group-level priors in PyMC.
- Bayesian Survival Analysis with PyMC: Modelling Customer Churn — Another custom likelihood built in PyMC.
- From MLE to Bayesian Inference — Why we use priors and posteriors instead of point estimates.
- MCMC Metropolis-Hastings: An Island-Hopping Guide — The sampling engine behind PyMC.

Use OIB when your outcome is a fraction between 0 and 1 with a spike at the boundary value of 1. Logistic regression discards the partial repayment information by collapsing everything into a binary label; OIB preserves both the probability of full repayment and the distribution of partial repayments, giving you richer predictions. The pm.CustomDist approach evaluates both branches of the piecewise likelihood for every observation during automatic differentiation.
When the Beta log-density is evaluated at the boundary value of 1.0, it returns negative infinity, which corrupts the NUTS sampler gradients and causes 100% divergences. Splitting observations with pm.Potential avoids evaluating the Beta density at the boundary entirely.

Instead of the standard shape parameters alpha and beta, the mean-precision form uses mu (the mean, between 0 and 1) and phi (the precision, controlling spread). This is more interpretable: mu directly tells you the expected partial repayment fraction, while phi tells you how concentrated the distribution is around that mean. The standard parameters are recovered as alpha = mu * phi and beta = (1 - mu) * phi.

To check the fit, generate posterior predictive samples by drawing from the fitted model and comparing the resulting distribution to the observed data. The key check is whether the model reproduces both the spike at 1.0 (the proportion of fully repaid loans) and the shape of the continuous partial repayment distribution. If either component is mismatched, the model needs adjustment.

A spike at zero can be handled as well, but it requires a Zero-and-One-Inflated Beta (ZOIB) model. This adds a third mixture component for the spike at zero, with its own probability parameter. The piecewise likelihood gains a third branch, but the pm.Potential implementation pattern remains the same: split observations into three groups and add each group's log-likelihood separately.
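Here is a hypothetical sketch of that three-group split. For brevity the boundary probabilities get a shared Dirichlet prior rather than the covariate-dependent links used in the OIB model above, and y_zoib stands in for an assumed outcome array with point masses at both 0 and 1.

```python
import numpy as np
import pymc as pm
import pytensor.tensor as pt

# y_zoib: repayment fractions in [0, 1] with spikes at both 0 and 1 (assumed to exist)
y_zero = y_zoib[y_zoib == 0.0]
y_one = y_zoib[y_zoib == 1.0]
y_mid = y_zoib[(y_zoib > 0.0) & (y_zoib < 1.0)]

def beta_logp(y, mu, phi):
    # Hand-coded Beta log-density in mean-precision form
    a, b = mu * phi, (1.0 - mu) * phi
    return (pt.gammaln(a + b) - pt.gammaln(a) - pt.gammaln(b)
            + (a - 1.0) * pt.log(y) + (b - 1.0) * pt.log1p(-y))

with pm.Model() as zoib_model:
    # p = (P(y = 0), P(y = 1), P(0 < y < 1))
    p = pm.Dirichlet("p", a=np.ones(3))
    mu = pm.Beta("mu", 2.0, 2.0)      # mean of the continuous component
    phi = pm.Gamma("phi", 2.0, 0.5)   # precision of the continuous component

    # One Potential term per observation group; the Beta density is never
    # evaluated at either boundary.
    pm.Potential("ll_zero", y_zero.size * pt.log(p[0]))
    pm.Potential("ll_one", y_one.size * pt.log(p[1]))
    pm.Potential("ll_mid", y_mid.size * pt.log(p[2]) + pt.sum(beta_logp(y_mid, mu, phi)))
```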
