...
adaptive loss function by using maximum likelihood estimation. We maximize the likelihood of the probability distribution or, equivalently, minimize the
negative log-likelihood (NLL).
\[
\mathrm{NLL}(x, \alpha) = \min_{\theta, \alpha} \; \rho(x, \alpha) + \log Z(\alpha)
\]
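
To make the two terms of the negative log-likelihood concrete, here is a minimal numerical sketch in plain Python. It rests on assumptions not stated in this excerpt: \(\rho\) is taken to be the general robust loss with scale fixed to 1, \(Z(\alpha)\) is taken to be the integral of \(\exp(-\rho(x, \alpha))\) over the real line, and \(\alpha\) is restricted to \(0 < \alpha < 2\) so that the integral is finite. The function names `rho`, `log_partition`, and `nll` are illustrative only.

```python
import math


def rho(x, alpha):
    """Assumed form of the general robust loss with scale c = 1.

    Valid for alpha not in {0, 2}, where this closed form is singular.
    """
    b = abs(alpha - 2.0)
    return (b / alpha) * ((x * x / b + 1.0) ** (alpha / 2.0) - 1.0)


def log_partition(alpha, limit=200.0, steps=200_000):
    """Approximate log Z(alpha), with Z = integral of exp(-rho(x, alpha)) dx.

    Uses the trapezoid rule on [-limit, limit]; the truncation is an
    assumption that is reasonable only for 0 < alpha < 2.
    """
    h = 2.0 * limit / steps
    total = 0.0
    for i in range(steps + 1):
        x = -limit + i * h
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoid end-point weights
        total += weight * math.exp(-rho(x, alpha))
    return math.log(total * h)


def nll(x, alpha):
    """Negative log-likelihood of one residual x: rho(x, alpha) + log Z(alpha)."""
    return rho(x, alpha) + log_partition(alpha)


if __name__ == "__main__":
    for a in (0.5, 1.0, 1.5):
        print(f"alpha={a}: log Z={log_partition(a):.4f}, NLL(x=3)={nll(3.0, a):.4f}")
```

The \(\log Z(\alpha)\) term is what keeps the minimization over \(\alpha\) from collapsing: a smaller \(\alpha\) lowers \(\rho\) for large residuals but raises \(\log Z(\alpha)\), so outliers cannot be discounted for free. A practical implementation would likely replace the brute-force integral with a precomputed or interpolated approximation so that the term is cheap and differentiable.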