Regularization by gradient descent and getting rid of pesky learning rates
5 years ago