Dec 2024
I come across a lot of interesting papers that may or may not be directly related to my research. I will keep updating this page with summaries of such papers so that I can look back and use their findings when needed.
Summary: This paper presents an interesting observation about model robustness as a function of the gradient-based optimizer used during training. Its extensive empirical and theoretical analysis shows that models trained with SGD exhibit higher robustness against adversarial perturbations than models trained with adaptive optimizers such as RMSProp and Adam.
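To make this comparison concrete, here is a minimal PyTorch sketch (my own illustrative setup, not the paper's protocol): train the same small network on MNIST once with SGD and once with Adam, then compare accuracy under a single-step FGSM attack. The architecture, learning rates, epoch count, and epsilon are assumptions chosen for brevity.

```python
# Sketch: compare adversarial robustness of SGD- vs Adam-trained models.
# All hyperparameters here are illustrative, not taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

train_ds = datasets.MNIST("data", train=True, download=True, transform=transforms.ToTensor())
test_ds = datasets.MNIST("data", train=False, download=True, transform=transforms.ToTensor())
train_loader = DataLoader(train_ds, batch_size=128, shuffle=True)
test_loader = DataLoader(test_ds, batch_size=256)

def make_model():
    return nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10)).to(device)

def train(model, optimizer, epochs=2):
    model.train()
    for _ in range(epochs):
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            F.cross_entropy(model(x), y).backward()
            optimizer.step()

def fgsm_accuracy(model, eps=0.1):
    # Accuracy on FGSM-perturbed test images: x_adv = x + eps * sign(grad_x loss).
    model.eval()
    correct, total = 0, 0
    for x, y in test_loader:
        x, y = x.to(device), y.to(device)
        x.requires_grad_(True)
        grad = torch.autograd.grad(F.cross_entropy(model(x), y), x)[0]
        x_adv = (x + eps * grad.sign()).clamp(0, 1)
        with torch.no_grad():
            correct += (model(x_adv).argmax(1) == y).sum().item()
        total += y.numel()
    return correct / total

sgd_model, adam_model = make_model(), make_model()
train(sgd_model, torch.optim.SGD(sgd_model.parameters(), lr=0.1))
train(adam_model, torch.optim.Adam(adam_model.parameters(), lr=1e-3))
print("SGD  robust accuracy:", fgsm_accuracy(sgd_model))
print("Adam robust accuracy:", fgsm_accuracy(adam_model))
```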
Why does this happen? The paper performs a frequency analysis to explain this behavior. It demonstrates that natural datasets contain frequency components that do not significantly impact standard generalization performance. However, whether a model picks up these irrelevant frequencies, and thereby becomes vulnerable to adversarial perturbations, depends on the optimizer used.
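The frequency claim can be probed with a simple filtering experiment: remove part of the spectrum from the test images and check how much standard accuracy changes. Below is a rough low-pass-filter sketch in PyTorch; the circular mask and the cutoff radius are illustrative assumptions, not the paper's exact analysis.

```python
# Sketch of the frequency-analysis idea: zero out high-frequency components of
# images in Fourier space, then compare a model's accuracy on the filtered vs
# original inputs. The cutoff radius is an illustrative choice.
import torch

def low_pass(images, cutoff=8):
    """Keep only frequency components within `cutoff` of the spectrum center.

    images: float tensor of shape (N, C, H, W) with values in [0, 1].
    """
    n, c, h, w = images.shape
    spec = torch.fft.fftshift(torch.fft.fft2(images), dim=(-2, -1))
    yy, xx = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    dist = (((yy - h // 2) ** 2 + (xx - w // 2) ** 2).float()).sqrt()
    mask = (dist <= cutoff).to(images.dtype)          # circular low-pass mask
    filtered = torch.fft.ifft2(torch.fft.ifftshift(spec * mask, dim=(-2, -1))).real
    return filtered.clamp(0, 1)

# Usage idea: evaluate the models from the previous sketch on low_pass(x) and on x;
# a small accuracy gap suggests the removed frequencies are irrelevant for
# standard generalization, even though they can still be exploited adversarially.
```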
Summary: This paper discusses how adversarial training makes models robust against adversarial perturbations. In its framework, adversarial vulnerability arises from the accumulation of dense mixtures in the hidden weights during standard training, and the goal of adversarial training is to remove such mixtures and purify the hidden weights.
The paper also proves that training a model solely on natural data leaves it non-robust to adversarial perturbations. Additionally, it argues that adversarial training, even with weaker attacks such as FGSM, can significantly increase provable robustness against such perturbations.
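For reference, adversarial training with a weak attack such as FGSM amounts to perturbing each training batch with a single gradient-sign step and fitting the model on the perturbed examples. The sketch below reuses make_model, train_loader, and device from the first sketch; the epsilon and optimizer settings are again illustrative choices, not the paper's setting.

```python
# Minimal FGSM adversarial-training loop (illustrative hyperparameters).
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, eps=0.1):
    # Single-step attack: x_adv = x + eps * sign(grad_x loss), clipped to [0, 1].
    x = x.clone().detach().requires_grad_(True)
    grad = torch.autograd.grad(F.cross_entropy(model(x), y), x)[0]
    return (x + eps * grad.sign()).clamp(0, 1).detach()

def adversarial_train(model, optimizer, epochs=2, eps=0.1):
    model.train()
    for _ in range(epochs):
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            x_adv = fgsm_perturb(model, x, y, eps)        # attack the current model
            optimizer.zero_grad()
            F.cross_entropy(model(x_adv), y).backward()   # train on perturbed inputs
            optimizer.step()

robust_model = make_model()
adversarial_train(robust_model, torch.optim.SGD(robust_model.parameters(), lr=0.1))
```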
Summary: This position paper categorizes XAI research into two groups: human/value-oriented explanations (BLUE XAI) and model/validation-oriented explanations (RED XAI). It argues that RED XAI, which focuses on questioning models, spotting bugs, and debugging, is underexplored. The authors assert that explanations should empower model developers rather than serve end-users, as AI professionals need better techniques for debugging models to ensure safe AI.
The authors also deconstruct some common fallacies in XAI:
Although models like linear regression or decision trees are often labeled as transparent, the authors argue that this transparent-versus-black-box division is misleading. Even tree-based or linear models can be difficult to analyze when they involve a very large number of variables, as in real-world applications where models handle thousands of variables.
Different XAI methods explain different components of a model and rest on different assumptions. Therefore, no single explanation method can serve as a universal solution.
This fallacy suggests that explanation quality can be judged solely by comparing it to expert answers. However, what if a mismatch between an explanation and ground truth is due not to a bad explanation method but to an inherently flawed model?