Machine learning classification: specifying thresholds


Logistic regression returns a probability. You can use the returned probability “as is” (for example, the probability that a user will click on this advertisement is 0.00023), or you can convert it into a binary value (for example, this email is spam).

If a logistic regression model returns a probability of 0.9995 for an email, the model is predicting that the email is very likely to be spam. Conversely, another email with a predicted score of 0.0003 from the same model is very likely not spam. But what about an email with a predicted score of 0.6? To map a logistic regression value to a binary category, you must specify a classification threshold (also known as the decision threshold). A value above the threshold indicates “spam”; a value below it indicates “not spam”. It is tempting to assume that the classification threshold should always be 0.5, but the right threshold depends on the specific problem, so you must tune it.
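As a minimal sketch of this mapping, the Python snippet below applies a threshold to the three example scores from the paragraph above. The function name `classify` and the second threshold value 0.7 are illustrative assumptions, not part of any library, and a score exactly equal to the threshold is arbitrarily treated as “not spam” here.

```python
def classify(probability, threshold=0.5):
    """Map a predicted probability to a binary label.

    Assumption: a score exactly equal to the threshold counts as "not spam".
    """
    return "spam" if probability > threshold else "not spam"


# Example scores taken from the text: 0.9995, 0.0003 and the borderline 0.6.
scores = [0.9995, 0.0003, 0.6]

# The borderline email changes label depending on the chosen threshold.
for threshold in (0.5, 0.7):
    labels = [classify(p, threshold) for p in scores]
    print(threshold, labels)
```

With a threshold of 0.5 the email scored 0.6 is labeled spam; with a threshold of 0.7 the same email is labeled not spam, while the other two labels stay the same.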

Note: “tuning” a threshold for logistic regression is different from tuning hyperparameters such as the learning rate. When choosing a threshold, you need to assess how much you will suffer for making a mistake. For example, mistakenly marking a non-spam message as spam can be very bad. By contrast, mistakenly marking a spam message as non-spam is unpleasant, but it should hardly cost you your job.
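One possible way to make this trade-off concrete is to assign a cost to each kind of mistake and pick the candidate threshold with the lowest total cost on labeled examples. The sketch below is only an illustration: the helper names `total_cost` and `choose_threshold`, the scores, the labels, the candidate thresholds, and the cost values are all hypothetical.

```python
def total_cost(scores, labels, threshold, cost_fp, cost_fn):
    """Sum the cost of all mistakes made at a given threshold.

    cost_fp: cost of marking a non-spam email as spam (false positive).
    cost_fn: cost of marking a spam email as non-spam (false negative).
    """
    cost = 0.0
    for score, is_spam in zip(scores, labels):
        predicted_spam = score > threshold
        if predicted_spam and not is_spam:
            cost += cost_fp
        elif not predicted_spam and is_spam:
            cost += cost_fn
    return cost


def choose_threshold(scores, labels, candidates, cost_fp, cost_fn):
    """Return the candidate threshold with the lowest total cost."""
    return min(
        candidates,
        key=lambda t: total_cost(scores, labels, t, cost_fp, cost_fn),
    )


# Hypothetical predicted scores and true labels (True means spam).
scores = [0.95, 0.80, 0.65, 0.55, 0.30, 0.10]
labels = [True, True, False, True, False, False]
candidates = [0.3, 0.5, 0.7, 0.9]

# Assume losing a legitimate email is ten times worse than letting spam through.
best = choose_threshold(scores, labels, candidates, cost_fp=10.0, cost_fn=1.0)
print(best)
```

On this toy data, making false positives expensive pushes the chosen threshold up to 0.7, above the default of 0.5. Reversing the two costs would favor a lower threshold instead.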

This work is licensed under a CC agreement; reprints must credit the author and link to this article.

