For an all-in-one reference, I recently found a textbook that I’m very happy with, though it does gloss over some of the finer aspects of causality that aren’t directly related to ML. It’s called “Elements of Causal Inference: Foundations and Learning Algorithms” by Jonas Peters, Dominik Janzing, and Bernhard Schölkopf.
For an introduction to causality and causal inference, its application to statistics, and a comprehensive reference for the mathematics, I recommend reading “The Book of Why” by Judea Pearl and Dana Mackenzie, then “Causal Inference in Statistics: A Primer” (Pearl, Glymour, and Jewell), and finally “Causality: Models, Reasoning, and Inference (2nd ed.)”, also by Pearl.
One of the main books that the ML textbook draws on is “Probabilistic Reasoning in Intelligent Systems” by Judea Pearl.
Here are the links to the books:
https://www.amazon.com/gp/aw/d/B075CR9QBJ/ref=tmm_kin_title_0?ie=UTF8&qid=1585529211&sr=8-3
I hear the text by Daphne Koller is also good, but I haven’t looked at it myself.
Based on what you said, you’ve got more than enough mathematics under your belt to get through all of these texts. The math behind causal inference (CI) isn’t actually that hard. It’s basically just structural equation models and Bayesian networks built on top of the very basics of graph theory.
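To give a feel for how little machinery is involved, here is a rough sketch of a structural equation model sitting on a small directed graph. The three-variable model (Z confounds X and Y, and X causes Y), its coefficients, and the helper names `causal_order`, `sample`, and `mean_y` are all made up for illustration and don't come from any of the books above.

```python
import random
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical toy model: Z confounds X and Y, and X also causes Y.
# Each entry maps a variable to (parents, structural equation).
SCM = {
    "Z": ((), lambda noise: noise),
    "X": (("Z",), lambda z, noise: 2.0 * z + noise),
    "Y": (("X", "Z"), lambda x, z, noise: 3.0 * x - z + noise),
}


def causal_order(scm):
    """Topologically sort the DAG so parents are evaluated before children."""
    graph = {var: set(parents) for var, (parents, _) in scm.items()}
    return list(TopologicalSorter(graph).static_order())


def sample(scm, rng, do=None):
    """Draw one sample; `do` replaces a variable's equation with a constant."""
    do = do or {}
    values = {}
    for var in causal_order(scm):
        if var in do:
            values[var] = do[var]  # intervention: ignore the variable's parents
            continue
        parents, equation = scm[var]
        noise = rng.gauss(0.0, 1.0)  # independent standard normal noise term
        values[var] = equation(*(values[p] for p in parents), noise)
    return values


def mean_y(scm, n, seed, do=None):
    """Monte Carlo estimate of E[Y], optionally under an intervention."""
    rng = random.Random(seed)
    return sum(sample(scm, rng, do)["Y"] for _ in range(n)) / n
```

Comparing `mean_y(SCM, 20000, seed=0, do={"X": 1.0})` against the same call with `do={"X": 0.0}` should give a difference close to 3.0, the coefficient on X in the equation for Y. That difference is the causal effect of X on Y in this toy model; the graph part is nothing more than a topological sort.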
I hope this helps!