Three Questions - Optimization Algorithm
So far we have been talking about optimization algorithms that are basically line-search based (GD, SGD). I have three questions:
Q1) Why can't we use quasi-Newton methods like BFGS, which seem computationally cheap? (I agree that we need to compute approximate Hessians, which may be costly; I am not sure of the trade-off.)
Q2) We also want to find global minima, so why not use derivative-free algorithms, e.g. genetic algorithms? (Again, not sure of the cost.)
Q3) There are also trust-region-based optimization algorithms that could be used, but I am not sure of their complexity. (Not sure of the cost.)
Let me give brief answers to the three questions separately, and then my main answer, which is valid for all of them, at the end.
Q1. We certainly could use BFGS, and in some cases it might be more appropriate. However, it has not established itself as the norm in the ML community for, I imagine, two main reasons:
- It relies on second-order (curvature) information, whereas gradient descent is purely first-order. BFGS never computes second derivatives directly; it builds an approximation of the Hessian from successive gradients. Still, storing and updating that approximation takes memory and time on the order of n^2 per step for n parameters, which drastically increases the cost of learning compared with the usual gradient descent.
- It is simply harder to understand and, consequently, to teach. There may be a case for lighter variants such as limited-memory BFGS (L-BFGS), but I have never personally explored that.
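To put a rough number on the cost side of Q1, here is a back-of-the-envelope sketch of how much optimizer state each method keeps. The function name, the parameter count of one million, the history length of 10, and the 4 bytes per float are all my own illustrative assumptions, and the counts ignore everything except the dominant term.

```python
def optimizer_state_floats(n_params, history=10):
    """Rough count of the floats each method stores besides the parameters.

    Illustrative only: `history` is the number of past updates L-BFGS keeps.
    """
    return {
        "gradient_descent": n_params,            # just the current gradient
        "bfgs": n_params * n_params,             # dense (inverse) Hessian approximation
        "l_bfgs": 2 * history * n_params,        # `history` pairs of step / gradient-change vectors
    }


counts = optimizer_state_floats(1_000_000, history=10)
for name, floats in counts.items():
    gigabytes = floats * 4 / 1e9  # assuming 4 bytes per float
    print(f"{name:>16}: {floats:.2e} floats, roughly {gigabytes:g} GB")
```

For a model with a million parameters the dense BFGS matrix alone would need about a trillion floats, while the limited-memory variant stays within a small multiple of the gradient itself.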
Q2. Again, it's not wrong to consider; it is simply seen (significantly) less often. Take a look at the answers here:
https://stats.stackexchange.com/questions/249471/when-are-genetic-algorithms-a-good-choice-for-optimization
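To get a feel for the cost asked about in Q2, here is a minimal sketch of a derivative-free search. It is a stripped-down evolutionary strategy (Gaussian mutation plus keep-the-best selection, no crossover), not a full genetic algorithm, and the toy loss, the starting point, and all hyperparameters are assumptions chosen only for illustration.

```python
import random


def loss(point):
    # Toy objective with its minimum at (1, -2); stands in for a training loss.
    x, y = point
    return (x - 1.0) ** 2 + (y + 2.0) ** 2


def evolve(loss_fn, start, generations=200, children=20, sigma=0.3, seed=0):
    """Mutate the best point found so far and keep any child that improves on it."""
    rng = random.Random(seed)
    best, best_loss = list(start), loss_fn(start)
    evaluations = 1
    for _ in range(generations):
        parent = best  # every child in this generation mutates the same parent
        for _ in range(children):
            child = [v + rng.gauss(0.0, sigma) for v in parent]
            child_loss = loss_fn(child)
            evaluations += 1
            if child_loss < best_loss:
                best, best_loss = child, child_loss
    return best, best_loss, evaluations


best, best_loss, evaluations = evolve(loss, [-3.0, 3.0])
print(f"best point {best}, loss {best_loss:.4g}, after {evaluations} loss evaluations")
```

No gradients are needed, but even this two-parameter problem spends 4001 loss evaluations, and every evaluation of a real model means a pass over data. That evaluation count is the cost to weigh against gradient-based training.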
Q3. Same story. I would recommend reading arXiv:1703.06925.
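For readers who have not met trust-region methods, the basic loop can be sketched as follows. This is a textbook-style version that takes the Cauchy-point step inside the region, applied to a hand-written toy quadratic; it is not taken from the paper above, and the function names, thresholds (0.25, 0.75, 0.1), and radii are assumptions for illustration.

```python
import math


def f(p):
    x, y = p
    return (x - 1.0) ** 2 + 10.0 * (y + 2.0) ** 2


def grad(p):
    x, y = p
    return [2.0 * (x - 1.0), 20.0 * (y + 2.0)]


def hess(p):
    # Constant Hessian of the toy quadratic, written out by hand.
    return [[2.0, 0.0], [0.0, 20.0]]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def matvec(m, v):
    return [dot(row, v) for row in m]


def trust_region(f, grad, hess, start, radius=1.0, max_radius=100.0,
                 tol=1e-6, max_iter=200):
    x = list(start)
    for _ in range(max_iter):
        g = grad(x)
        g_norm = math.sqrt(dot(g, g))
        if g_norm < tol:
            break
        B = hess(x)
        gBg = dot(g, matvec(B, g))
        # Cauchy point: minimise the quadratic model along -g, capped at the radius.
        tau = 1.0 if gBg <= 0 else min(1.0, g_norm ** 3 / (radius * gBg))
        step = [-tau * radius / g_norm * gi for gi in g]
        # Decrease predicted by the model m(s) = f + g.s + 0.5 * s.B.s
        predicted = -(dot(g, step) + 0.5 * dot(step, matvec(B, step)))
        candidate = [xi + si for xi, si in zip(x, step)]
        actual = f(x) - f(candidate)
        rho = actual / predicted  # how well the model matched reality
        if rho < 0.25:
            radius *= 0.25                            # poor model: shrink the region
        elif rho > 0.75 and tau == 1.0:
            radius = min(2.0 * radius, max_radius)    # good model at the boundary: grow
        if rho > 0.1:
            x = candidate                             # accept only sufficiently good steps
    return x, radius


solution, final_radius = trust_region(f, grad, hess, [0.0, 0.0])
print(solution, final_radius)
```

The extra work per step compared with plain gradient descent is the Hessian (or Hessian-vector) product and the additional loss evaluation used to compute `rho`. In this sketch the Hessian is free because it is a 2x2 constant, which would not be the case for a large model.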
Overall, my answer to all three questions:
I can see that you have a background in traditional optimization methods. ML is a relatively new field, so there are lots of ideas that have simply not been explored well enough yet. Notice how the last paper I sent was submitted to the arXiv less than a year ago, in 2017. It is possible that some of the ideas you list above actually hold a lot of value in certain contexts and we have not realized it yet. Perhaps you could be the one to fill this gap, as you can now combine your previous mathematical knowledge with your newly acquired ML knowledge.