The lecture is Part II of the four-part cycle Cybersecurity I-IV (data protection, explainability, robustness/attacks, certification) and covers the following topics:
- Introduction, Motivation, Definitions
Taxonomy and assumptions, a brief overview of ideas from the different categories, representation of explanation results (saliency maps, feature importance), first applications
- Black-box (model-agnostic) explanations
Additive feature attribution methods and their properties, LIME and its variants, SHAP and Shapley values, from local explanations to global understanding, implementation details of neighborhood construction (on-manifold explanations), risks of random perturbations
- White-box (model-specific) explanations
Layer-wise Relevance Propagation (LRP), Deep Taylor Decomposition (DTD), DeepLIFT, Grad-CAM, counterfactual explanations
- Information-theoretic explanation methods
Information decomposition, causality, theory of representation learning
- Application and Implementation
Debugging, model extraction, open challenges, trade-offs (e.g., explainability vs. privacy)