chain rule (calculus)
Chain rule
The chain rule is a formula that expresses the derivative of the composition of two differentiable functions \[f\] and \[g\] in terms of the derivatives of \[f\] and \[g\]. More precisely, if \[h(x)=f(g(x))\] for every \[x\], then the chain rule states that \[h^{\prime}(x)=f^{\prime}(g(x))\cdot g^{\prime}(x)\]. In Leibniz notation, this can be written as \[\frac{df(g(x))}{dx}=\frac{df(g(x))}{dg(x)}\cdot \frac{dg(x)}{dx}\]. Note that these symbols are not literal fractions, so the identity is not justified by simply cancelling \[dg(x)\]; see differentiation notations for the reason.
Intuitively, the chain rule states that knowing the instantaneous rate of change of \[z\] relative to \[y\] and that of \[y\] relative to \[x\] allows one to calculate the instantaneous rate of change of \[z\] relative to \[x\] as the product of the two rates of change. As put by George F. Simmons: "If a car travels twice as fast as a bicycle and the bicycle is four times as fast as a walking man, then the car travels eight times as fast as the man." In symbols, with \[z\], \[y\] and \[x\] denoting the positions of the car, the bicycle and the man respectively, \[\frac{dz}{dx}=\frac{dz}{dy}\cdot \frac{dy}{dx}=2\cdot 4=8\].
Another example would be to consider \[y=\sin (x^{2}+1)\], which can be separated into \[y=f(g(x))\] with \[f(u)=\sin(u)\] and \[g(x)=x^{2}+1\]. Since \[f^{\prime}(u)=\cos(u)\] and \[g^{\prime}(x)=2x\], applying the chain rule gives \[y^{\prime}=f^{\prime}(g(x))\cdot g^{\prime}(x)=\cos(x^{2}+1)\cdot 2x\].
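The result for \[y=\sin (x^{2}+1)\] can be sanity-checked numerically. The following is a minimal sketch, not a proof: it compares the chain-rule formula against a central finite difference. The helper names (`y_prime_chain_rule`, `central_difference`), the step size and the sample points are illustrative assumptions, not part of the original text.

```python
import math

def g(x):
    # Inner function g(x) = x^2 + 1
    return x**2 + 1

def y(x):
    # Composite function y = sin(g(x))
    return math.sin(g(x))

def y_prime_chain_rule(x):
    # Chain rule: f'(g(x)) * g'(x) with f = sin, so f' = cos and g'(x) = 2x
    return math.cos(g(x)) * 2 * x

def central_difference(func, x, h=1e-6):
    # Numerical estimate of func'(x); the step size h is an arbitrary choice
    return (func(x + h) - func(x - h)) / (2 * h)

for x in (-1.5, 0.0, 0.7, 2.0):
    exact = y_prime_chain_rule(x)
    approx = central_difference(y, x)
    print(f"x={x:5.2f}  chain rule={exact: .8f}  finite difference={approx: .8f}")
```

The two columns should agree to several decimal places; small differences come from the finite step size and floating-point rounding.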
Proof
We want to prove that \[(f\circ g)^{\prime}(a)=f^{\prime}(g(a))\cdot g^{\prime}(a)\], where \[g\] is differentiable at \[a\] and \[f\] is differentiable at \[g(a)\].
Do note that \[\left( f\circ g \right)^{\prime}(a)\ne \left( f(g(a)) \right)^{\prime}\] in general, as the right-hand side denotes the derivative of \[f(g(a))\], which is a constant, so that derivative is zero. Instead, \[\left( f\circ g \right)^{\prime}(a)\] denotes the derivative of \[f(g(x))\], evaluated at \[x=a\], or \[\left[ \frac{d}{dx}f(g(x)) \right]_{x=a}\].
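By the definition of the derivative, \[(f\circ g)^{\prime}(a)=\lim_{h\to0}\frac{f(g(a+h))-f(g(a))}{h}\]. Let \[\Delta_{h}=g(a+h)-g(a)\], so that \[g(a+h)=g(a)+\Delta_{h}\]. Note that \[\Delta_{h}\to0\] as \[h\to0\]: since \[g\] is differentiable, and therefore continuous, at \[a\], we have \[\lim_{h\to0}\left( g(a+h)-g(a) \right)=0\]. Note also that \[\Delta_{h}\] generally approaches zero at a different rate than \[h\] does.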
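Now, we have two quantities approaching zero at different rates, which is a little bit of a headache. To relate them, we multiply and divide by \[\Delta_{h}\], which inserts the factor \[\frac{\Delta_{h}}{h}=\frac{g(a+h)-g(a)}{h}\]. That looks rather familiar: it is the difference quotient of \[g\] at \[a\]. This gives
\[(f\circ g)^{\prime}(a)=\lim_{h\to0}\frac{f(g(a)+\Delta_{h})-f(g(a))}{\Delta_{h}}\cdot \frac{\Delta_{h}}{h}=\lim_{\Delta_{h}\to0}\frac{f(g(a)+\Delta_{h})-f(g(a))}{\Delta_{h}}\cdot \lim_{h\to0}\frac{g(a+h)-g(a)}{h}=f^{\prime}(g(a))\cdot g^{\prime}(a)\].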
Here, it may look like we have proved the chain rule. However, there are two problems. First, we replaced \[\lim_{h\to0}\] with \[\lim_{\Delta_{h}\to0}\] without justification. Second, \[\Delta_{h}\] may equal zero for some \[h\ne0\] (for example, when \[g\] is constant near \[a\]), and we cannot divide by zero.
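The division-by-zero problem can be seen concretely with a constant inner function, for which \[\Delta_{h}=0\] for every \[h\]. The sketch below is illustrative only; `g_constant` and `naive_quotient` are hypothetical names chosen for this example, and the choice \[f=\sin\] is arbitrary.

```python
import math

def f(u):
    # Outer function; any differentiable function would do here
    return math.sin(u)

def g_constant(x):
    # Hypothetical inner function chosen so that Delta_h is always 0
    return 3.0

def naive_quotient(f, g, a, h):
    # Difference quotient after multiplying and dividing by Delta_h
    delta_h = g(a + h) - g(a)
    return (f(g(a) + delta_h) - f(g(a))) / delta_h * (delta_h / h)

try:
    naive_quotient(f, g_constant, 1.0, 1e-3)
    division_failed = False
except ZeroDivisionError:
    # Delta_h is exactly 0.0, so the first factor is 0.0 / 0.0
    division_failed = True

print("division by zero occurred:", division_failed)
```

The true derivative of \[f(g(x))\] is zero here, since the composite is constant, but the rewritten quotient is undefined for every \[h\], so this form of the argument cannot be used as it stands.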
To resolve both of these issues, we alter the proof slightly. We start again from \[(f\circ g)^{\prime}(a)=\lim_{h\to0}\frac{f(g(a)+\Delta_{h})-f(g(a))}{h}\] and introduce a function \[\Phi(h)\] that satisfies \[f(g(a)+\Delta_{h})-f(g(a))=\Phi(h)\cdot \Delta_{h}\] for every \[h\], including those for which \[\Delta_{h}=0\]. Note: \[\Phi\] is just the name of a new function, like \[f\]; we use a different letter to prevent confusion, as \[f\] is already defined.
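\[\Phi(h)\] is defined as:
\[\Phi(h)=\begin{cases}\frac{f(g(a)+\Delta_{h})-f(g(a))}{\Delta_{h}} & \text{if }\Delta_{h}\ne0\\ f^{\prime}(g(a)) & \text{if }\Delta_{h}=0\end{cases}\]
With this definition, \[f(g(a)+\Delta_{h})-f(g(a))=\Phi(h)\cdot \Delta_{h}\] holds in both cases: when \[\Delta_{h}=0\], both sides are zero.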
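Now plugging our second equation into our first equation,
\[(f\circ g)^{\prime}(a)=\lim_{h\to0}\frac{\Phi(h)\cdot \Delta_{h}}{h}=\lim_{h\to0}\Phi(h)\cdot \lim_{h\to0}\frac{g(a+h)-g(a)}{h}=\lim_{h\to0}\Phi(h)\cdot g^{\prime}(a)\],
provided that \[\lim_{h\to0}\Phi(h)\] exists. No division by \[\Delta_{h}\] is needed in this step.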
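The last thing here is to prove \[\lim_{h\to0}\Phi(h)=f^{\prime}(g(a))\]. For those \[h\] with \[\Delta_{h}\ne0\], we have \[\Phi(h)=\frac{f(g(a)+\Delta_{h})-f(g(a))}{\Delta_{h}}\], which is the difference quotient of \[f\] at \[g(a)\] with increment \[\Delta_{h}\]. Since \[f\] is differentiable at \[g(a)\], this quotient can be made as close to \[f^{\prime}(g(a))\] as we like by taking \[\Delta_{h}\] small enough, and as established above, \[\Delta_{h}\to0\] as \[h\to0\]. For those \[h\] with \[\Delta_{h}=0\], \[\Phi(h)\] is taken to be exactly \[f^{\prime}(g(a))\]. In either case \[\Phi(h)\] is close to \[f^{\prime}(g(a))\] when \[h\] is close to zero, thus \[\lim_{h\to0}\Phi(h)=f^{\prime}(g(a))\].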
Therefore, we have proven that \[(f\circ g)^{\prime}(a)=f^{\prime}(g(a))\cdot g^{\prime}(a)\].
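The role of \[\Phi(h)\] in the argument can be illustrated numerically. The sketch below assumes the convention that \[\Phi(h)=f^{\prime}(g(a))\] whenever \[\Delta_{h}=0\]; the function names (`phi`, `chain_quotient`), the test functions and the step size are illustrative assumptions rather than part of the proof.

```python
import math

def phi(f, f_prime, g, a, h):
    # Phi(h): difference quotient of f at g(a), with a fallback when Delta_h = 0
    delta_h = g(a + h) - g(a)
    if delta_h == 0:
        # Assumed convention: Phi(h) = f'(g(a)) when Delta_h vanishes
        return f_prime(g(a))
    return (f(g(a) + delta_h) - f(g(a))) / delta_h

def chain_quotient(f, f_prime, g, a, h):
    # Phi(h) * Delta_h / h, which equals the difference quotient of f(g(x)) at a
    delta_h = g(a + h) - g(a)
    return phi(f, f_prime, g, a, h) * delta_h / h

def g_quadratic(x):
    return x**2 + 1

def g_constant(x):
    # Delta_h is 0 for every h, the case that breaks the naive argument
    return 3.0

a, h = 0.7, 1e-6

# Generic case: compare with f'(g(a)) * g'(a) = cos(a^2 + 1) * 2a
approx = chain_quotient(math.sin, math.cos, g_quadratic, a, h)
exact = math.cos(g_quadratic(a)) * 2 * a
print("quadratic g:", approx, "vs", exact)

# Degenerate case: no division by zero, and the quotient is zero
degenerate = chain_quotient(math.sin, math.cos, g_constant, a, h)
print("constant g:", degenerate)
```

In the degenerate case the product is zero because \[\Delta_{h}=0\], which matches \[f^{\prime}(g(a))\cdot g^{\prime}(a)\] with \[g^{\prime}(a)=0\].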