Publications | Ruqi Bai

Zeyu Zhou, Tianci Liu, Ruqi Bai, Jing Gao, Murat Kocaoglu, David I. Inouye

October 2024 NeurIPS 2024

Counterfactual Fairness by Combining Factual and Counterfactual Predictions

In high-stakes domains such as healthcare and hiring, the role of machine learning (ML) in decision-making raises significant fairness concerns. This work focuses on Counterfactual Fairness (CF), which posits that an ML model’s outcome on any individual should remain unchanged if they had belonged to a different demographic group. Previous works have proposed methods that guarantee CF. Notwithstanding, their effects on the model’s predictive performance remain largely unclear. To fill this gap, we provide a theoretical study on the inherent trade-off between CF and predictive performance in a model-agnostic manner. We first propose a simple but effective method to cast an optimal but potentially unfair predictor into a fair one with a minimal loss of performance. By analyzing the excess risk incurred by perfect CF, we quantify this inherent trade-off. Further analysis on our method’s performance with access to only incomplete causal knowledge is also conducted. Built upon this, we propose a practical algorithm that can be applied in such scenarios. Experiments on both synthetic and semi-synthetic datasets demonstrate the validity of our analysis and methods.

Ruqi Bai, Zeyu Zhou, Sean Kulinski*, Murat Kocaoglu, David I. Inouye

January 2024 ICLR 2024

Towards Characterizing Domain Counterfactuals for Invertible Latent Causal Models

Answering counterfactual queries has many important applications such as knowledge discovery and explainability, but is challenging when causal variables are unobserved and we only see a projection onto an observation space, for instance, image pixels. One approach is to recover the latent Structural Causal Model (SCM), but this typically needs unrealistic assumptions, such as linearity of the causal mechanisms. Another approach is to use na"ive ML approximations, such as generative models, to generate counterfactual samples; however, these lack guarantees of accuracy. In this work, we strive to strike a balance between practicality and theoretical guarantees by focusing on a specific type of causal query called \emph{domain counterfactuals}, which hypothesizes what a sample would have looked like if it had been generated in a different domain (or environment). Concretely, by only assuming invertibility, sparse domain interventions and access to observational data from different domains, we aim to improve domain counterfactual estimation both theoretically and practically with less restrictive assumptions. We define \emph{domain counterfactually equivalent} models and prove necessary and sufficient properties for equivalent models that provide a tight characterization of the domain counterfactual equivalence classes. Building upon this result, we prove that every equivalence class contains a model where all intervened variables are at the end when topologically sorted by the causal DAG. This surprising result suggests that a model design that only allows intervention in the last $k$ latent variables may improve model estimation for counterfactuals. We then test this model design on extensive simulated and image-based experiments which show the sparse canonical model indeed improves counterfactual estimation over baseline non-sparse models.

Ruqi Bai, Saurabh Bagchi, David I. Inouye

January 2024 ICLR 2024 Spotlight

Benchmarking Algorithms for Federated Domain Generalization

While prior federated learning (FL) methods mainly consider client heterogeneity, we focus on the Federated Domain Generalization (DG) task, which introduces train-test heterogeneity in the FL context. Existing evaluations in this field are limited in terms of the scale of the clients and dataset diversity. Thus, we propose a Federated DG benchmark that aim to test the limits of current methods with high client heterogeneity, large numbers of clients, and diverse datasets. Towards this objective, we introduce a novel data partition method that allows us to distribute any domain dataset among few or many clients while controlling client heterogeneity. We then introduce and apply our methodology to evaluate $13$ Federated DG methods, which include centralized DG methods adapted to the FL context, FL methods that handle client heterogeneity, and methods designed specifically for Federated DG on $7$ datasets. Our results suggest that, despite some progress, significant performance gaps remain in Federated DG, especially when evaluating with a large number of clients, high client heterogeneity, or more realistic datasets. Furthermore, our extendable benchmark code will be publicly released to aid in benchmarking future Federated DG approaches.

Ruqi Bai, David I. Inouye

February 2023

FedLOE Federated Domain Generalization via Locally Overfit Ensemble

In federated learning (FL), clients typically access data from just one distribution. Ideally, the learned models would generalize to out-of-distribution (OOD), i.e., domain generalization (DG). However, centralized DG methods cannot easily be adapted in the domain separation context and prior federated DG methods perform poorly when the number of clients is large. To address these challenges, we revisit the classic mixture-of-experts (MoE) idea by viewing each client as an expert on its own dataset. From this perspective, simple federated averaging can be seen as a type of iterative MoE, where the amount of local training determines the strength of each expert. Contrast to the FL communication-performance trade-off, we theoretically demonstrate that in linear cases and empirically validate in deep models that reducing communication frequency can effectively enhance DG performance, surpassing their centralized counterparts. Building on this, we further propose an additional MoE strategy to combine the client-specific classifier heads via standard DG objectives. Our proposed method empirically outperforms multiple centralized and federated baselines for domain generalization on several real datasets.