The MCE (Minimum Classification Error) criterion is commonly used for discriminative learning, but it is intrinsically difficult to apply with gradient descent methods. Because classification performance is fully described by the error-reject tradeoff, we augment the MCE criterion to include rejects as well as errors, and show that this leads to a smooth loss function that is suitable for gradient descent methods.
The proposed criterion provides a quantitative justification for the loss function in terms of the classification performance. The loss function is adaptively optimized based on the empirical distribution of the classifier output at each iteration of the learning procedure.
Since the proposed method does not need any manual parameter tuning, it avoids time-consuming trial and error. Nevertheless, experiments show that the proposed method outperforms the MCE method even when the latter uses its best-tuned parameters.
A comparison with the MMI (Maximum Mutual Information) criterion shows that the proposed criterion is more resistant to outliers than the MMI criterion.
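
As background for the difficulty mentioned above, the following is a minimal sketch of the conventional MCE loss as it is commonly formulated in the literature, not the criterion proposed here. The raw error count is a step function of the misclassification measure, so its gradient is zero almost everywhere; the usual workaround is a sigmoid whose slope must be tuned by hand. The function names and the parameters `eta` and `xi` are illustrative assumptions, not notation taken from this work.

```python
import math

def misclassification_measure(scores, true_idx, eta=1.0):
    """Return d: positive suggests an error, negative a correct decision.

    d = soft maximum of the rival class scores minus the true class score.
    `eta` (assumed name) controls how closely the soft maximum tracks the
    hard maximum.
    """
    rivals = [s for j, s in enumerate(scores) if j != true_idx]
    m = max(rivals)  # subtract the max for numerical stability
    mean_exp = sum(math.exp(eta * (s - m)) for s in rivals) / len(rivals)
    soft_max = m + math.log(mean_exp) / eta
    return soft_max - scores[true_idx]

def zero_one_loss(d):
    """Raw error count: a step function, so it has no useful gradient."""
    return 1.0 if d > 0 else 0.0

def sigmoid_loss(d, xi=1.0):
    """Smoothed error count; `xi` (assumed name) is the hand-tuned slope."""
    z = xi * d
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # stable branch for large negative z
    return e / (1.0 + e)

if __name__ == "__main__":
    d = misclassification_measure([2.0, 1.0, 0.0], true_idx=0)
    print(d, zero_one_loss(d), sigmoid_loss(d, xi=1.0), sigmoid_loss(d, xi=5.0))
```

Changing `xi` changes how sharply the sigmoid approximates the step function, which is the kind of manual tuning that the proposed method is stated to avoid.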