Philosophical question on logistic regression: why isn't the optimal threshold value trained?
Usually in logistic regression, we fit a model and get some predictions on the training set. We then cross-validate on those training predictions (something like here) and decide the optimal threshold value based on something like the ROC curve.
Why don't we incorporate cross-validation of the threshold INTO the actual model, and train the whole thing end-to-end?
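For concreteness, the two-stage procedure the question describes might be sketched in scikit-learn as follows (a hedged illustration with toy data; the dataset and the Youden's-J selection rule are assumptions for the example, not part of the question):

```python
# Stage 1: fit the probability model; Stage 2: pick a threshold afterwards.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import roc_curve

X, y = make_classification(n_samples=500, random_state=0)

# Stage 1: out-of-fold predicted probabilities from cross-validation.
model = LogisticRegression(max_iter=1000)
proba = cross_val_predict(model, X, y, cv=5, method="predict_proba")[:, 1]

# Stage 2: choose a threshold from the ROC curve *after* training,
# e.g. the point maximizing Youden's J = TPR - FPR (one common heuristic).
fpr, tpr, thresholds = roc_curve(y, proba)
best = thresholds[np.argmax(tpr - fpr)]
print(best)
```

The question is why this second stage is a separate, post-hoc step rather than part of the optimization itself.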
Tags: logistic, cross-validation, optimization, roc, threshold
– EdM: Possible duplicate of "Classification probability threshold". (1 hour ago)
– gung♦: That thread is certainly related, but I wouldn't call it a duplicate. (1 hour ago)
asked 1 hour ago, edited 19 mins ago by StatsSorceress
3 Answers
The threshold isn't trained because logistic regression isn't a classifier (cf. "Why isn't Logistic Regression called Logistic Classification?"). It is a model for estimating the parameter, $p$, that governs the behavior of the Bernoulli distribution. That is, you are assuming that the response distribution, conditional on the covariates, is Bernoulli, and so you want to estimate how the parameter that controls that variable changes as a function of the covariates. It is a direct probability model only. Of course, it can subsequently be used as a classifier, and sometimes is in certain contexts, but it is still a probability model.
– gung♦, answered 1 hour ago
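The distinction can be made concrete with a minimal sketch (assuming scikit-learn; the toy data and the conventional 0.5 cutoff are illustrative, not from the answer): the fitted object estimates the Bernoulli parameter $p$ for each observation, and any class label comes from a separate thresholding step bolted on afterwards.

```python
# The model outputs probabilities; classification is a later, separate step.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

p_hat = model.predict_proba(X)[:, 1]   # the probability estimates themselves
labels = (p_hat >= 0.5).astype(int)    # thresholding is not part of the fit
```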
– StatsSorceress: Okay, I understand that part of the theory (thank you for that eloquent explanation!), but why can't we incorporate the classification aspect into the model? That is, why can't we find $p$, then find the threshold, and train the whole thing end-to-end to minimize some loss? (1 hour ago)
– gung♦: You certainly could (@Sycorax's answer speaks to that possibility). But because that isn't what LR itself is, but rather an ad hoc augmentation, you would need to code up the full optimization scheme yourself. Note, BTW, that Frank Harrell has pointed out that this process will lead to what might be considered an inferior model by many standards. (1 hour ago)
– StatsSorceress: Hmm. I read the accepted answer in the related question here, and I agree with it in theory, but sometimes in machine learning classification applications we don't care about the relative error types; we just care about "correct classification". In that case, could you train end-to-end as I describe? (55 mins ago)
It's because the optimal threshold is not only a function of the true positive rate (TPR), the false positive rate (FPR), accuracy, or whatever else. The other crucial ingredient is the cost and the payoff of correct and wrong decisions.
If your target is the common cold, your response to a positive test is to prescribe two aspirin, and the cost of an untreated true positive is an unnecessary two days' worth of headaches, then your optimal decision (not classification!) threshold is quite different than if your target is some life-threatening disease and your decision is (a) some comparatively simple procedure like an appendectomy, or (b) a major intervention like months of chemotherapy! And note that although your target variable may be binary (sick/healthy), your decisions may take more values (send home with two aspirin / run more tests / admit to hospital and watch / operate immediately).
Bottom line: if you know your cost structure and all the different decisions, you can certainly train a decision support system (DSS) directly, which includes a probabilistic classification or prediction. I would, however, strongly argue that discretizing predictions or classifications via thresholds is not the right way to go about this.
See also my answer to the earlier "Classification probability threshold" thread. Or this answer of mine. Or that one.
– Stephan Kolassa, answered 1 hour ago
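Under the standard decision-theoretic setup this answer alludes to, the threshold falls out of expected-cost minimization rather than out of the ROC curve: predict "positive" when p * cost_fn >= (1 - p) * cost_fp, i.e. at the threshold t = cost_fp / (cost_fp + cost_fn) in the usual case of zero cost for correct decisions. A sketch (the cost numbers are purely illustrative assumptions):

```python
# Threshold implied by expected-cost minimization for a binary decision.
def optimal_threshold(cost_fp, cost_fn, cost_tp=0.0, cost_tn=0.0):
    """Return t such that acting when p >= t minimizes expected cost.

    Derived from p*cost_tp + (1-p)*cost_fp <= p*cost_fn + (1-p)*cost_tn,
    giving t = (cost_fp - cost_tn) / ((cost_fp - cost_tn) + (cost_fn - cost_tp)).
    """
    return (cost_fp - cost_tn) / ((cost_fp - cost_tn) + (cost_fn - cost_tp))

# Common cold: cheap treatment, mild consequence of a missed case.
print(optimal_threshold(cost_fp=1.0, cost_fn=2.0))      # 1/3: treat readily

# Life-threatening disease: missing a case is far costlier than treating
# a healthy patient, so we act even at a very low probability.
print(optimal_threshold(cost_fp=10.0, cost_fn=1000.0))  # ~0.0099
```

This makes the answer's point concrete: the "right" threshold depends on the costs, not on the model fit alone.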
Regardless of the underlying model, we can work out the sampling distributions of TPR and FPR at a threshold. This implies that we can characterize the variability in TPR and FPR at some threshold, and we can back into a desired error-rate trade-off.
A ROC curve is a little deceptive because the only thing you control is the threshold; however, the plot displays TPR and FPR, which are functions of the threshold. Moreover, the TPR and FPR are both statistics, so they are subject to the vagaries of random sampling. This implies that if you were to repeat the procedure (say, by cross-validation), you could come up with a different FPR and TPR at some specific threshold value.
However, if we can estimate the variability in the TPR and FPR, then repeating the ROC procedure is not necessary. We just pick a threshold such that the endpoints of a confidence interval (of some width) are acceptable. That is, pick the model so that the FPR is plausibly below some researcher-specified maximum, and/or the TPR is plausibly above some researcher-specified minimum. If your model can't attain your targets, you'll have to build a better model.
Of course, what TPR and FPR values are tolerable in your usage will be context-dependent.
For more information, see ROC Curves for Continuous Data by Wojtek J. Krzanowski and David J. Hand.
– Sycorax, answered 1 hour ago
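The suggestion above, to characterize the sampling variability of TPR and FPR at a fixed threshold, can be sketched with a bootstrap (the toy scores, the 0.5 threshold, and the 95% interval width are all illustrative assumptions):

```python
# Bootstrap confidence intervals for TPR/FPR at one fixed threshold.
import numpy as np

rng = np.random.default_rng(0)

def tpr_fpr(y_true, scores, threshold):
    pred = scores >= threshold
    return pred[y_true == 1].mean(), pred[y_true == 0].mean()

# Toy data: positives tend to score higher than negatives.
y = rng.integers(0, 2, size=1000)
scores = np.clip(rng.normal(0.35 + 0.3 * y, 0.15), 0.0, 1.0)

# Resample with replacement and recompute TPR/FPR each time.
boot = np.array([
    tpr_fpr(y[idx], scores[idx], threshold=0.5)
    for idx in (rng.integers(0, len(y), len(y)) for _ in range(500))
])
tpr_lo, tpr_hi = np.percentile(boot[:, 0], [2.5, 97.5])
fpr_lo, fpr_hi = np.percentile(boot[:, 1], [2.5, 97.5])
print(f"TPR 95% CI: [{tpr_lo:.3f}, {tpr_hi:.3f}]")
print(f"FPR 95% CI: [{fpr_lo:.3f}, {fpr_hi:.3f}]")
```

One would then accept the threshold only if the interval endpoints clear the researcher-specified TPR minimum and FPR maximum.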
– StatsSorceress: This doesn't really answer my question, but it's a very nice description of ROC curves. (1 hour ago)
– Sycorax: In what way does this not answer your question? What is your question, if not asking how to choose a threshold for classification? (1 hour ago)
– StatsSorceress: I was asking why we don't train the threshold instead of choosing it after training the model. (1 hour ago)
– Sycorax: How would you train a threshold? (1 hour ago)
– StatsSorceress: Couldn't you find the optimal threshold for each minibatch and take an average or something? I have a related question here if you're curious: stackoverflow.com/questions/55788153/… (1 hour ago)
Your Answer
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "65"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstats.stackexchange.com%2fquestions%2f405041%2fphilosophical-question-on-logistic-regression-why-isnt-the-optimal-threshold-v%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
$begingroup$
It isn't because logistic regression isn't a classifier (cf., Why isn't Logistic Regression called Logistic Classification?). It is a model to estimate the parameter, $p$, that governs the behavior of the Bernoulli distribution. That is, you are assuming that the response distribution, conditional on the covariates, is Bernoulli, and so you want to estimate how the parameter that controls that variable changes as a function of the covariates. It is a direct probability model only. Of course, it can be used as a classifier subsequently, and sometimes is in certain contexts, but it is still a probability model.
$endgroup$
$begingroup$
Okay, I understand that part of the theory (thank you for that eloquent explanation!) but why can't we incorporate the classification aspect into the model? That is, why can't we find p, then find the threshold, and train the whole thing end-to-end to minimize some loss?
$endgroup$
– StatsSorceress
1 hour ago
1
$begingroup$
You certainly could (@Sycorax's answer speaks to that possibility). But because that isn't what LR itself is, but rather some ad hoc augmentation, you would need to code up the full optimization scheme yourself. Note BTW, that Frank Harrell has pointed out that process will lead to what might be considered an inferior model by many standards.
$endgroup$
– gung♦
1 hour ago
$begingroup$
Hmm. I read the accepted answer in the related question here, and I agree with it in theory, but sometimes in machine learning classification applications we don't care about the relative error types, we just care about "correct classification". In that case, could you train end-to-end as I describe?
$endgroup$
– StatsSorceress
55 mins ago
add a comment |
$begingroup$
It isn't because logistic regression isn't a classifier (cf., Why isn't Logistic Regression called Logistic Classification?). It is a model to estimate the parameter, $p$, that governs the behavior of the Bernoulli distribution. That is, you are assuming that the response distribution, conditional on the covariates, is Bernoulli, and so you want to estimate how the parameter that controls that variable changes as a function of the covariates. It is a direct probability model only. Of course, it can be used as a classifier subsequently, and sometimes is in certain contexts, but it is still a probability model.
$endgroup$
$begingroup$
Okay, I understand that part of the theory (thank you for that eloquent explanation!) but why can't we incorporate the classification aspect into the model? That is, why can't we find p, then find the threshold, and train the whole thing end-to-end to minimize some loss?
$endgroup$
– StatsSorceress
1 hour ago
1
$begingroup$
You certainly could (@Sycorax's answer speaks to that possibility). But because that isn't what LR itself is, but rather some ad hoc augmentation, you would need to code up the full optimization scheme yourself. Note BTW, that Frank Harrell has pointed out that process will lead to what might be considered an inferior model by many standards.
$endgroup$
– gung♦
1 hour ago
$begingroup$
Hmm. I read the accepted answer in the related question here, and I agree with it in theory, but sometimes in machine learning classification applications we don't care about the relative error types, we just care about "correct classification". In that case, could you train end-to-end as I describe?
$endgroup$
– StatsSorceress
55 mins ago
add a comment |
$begingroup$
It isn't because logistic regression isn't a classifier (cf., Why isn't Logistic Regression called Logistic Classification?). It is a model to estimate the parameter, $p$, that governs the behavior of the Bernoulli distribution. That is, you are assuming that the response distribution, conditional on the covariates, is Bernoulli, and so you want to estimate how the parameter that controls that variable changes as a function of the covariates. It is a direct probability model only. Of course, it can be used as a classifier subsequently, and sometimes is in certain contexts, but it is still a probability model.
$endgroup$
It isn't because logistic regression isn't a classifier (cf., Why isn't Logistic Regression called Logistic Classification?). It is a model to estimate the parameter, $p$, that governs the behavior of the Bernoulli distribution. That is, you are assuming that the response distribution, conditional on the covariates, is Bernoulli, and so you want to estimate how the parameter that controls that variable changes as a function of the covariates. It is a direct probability model only. Of course, it can be used as a classifier subsequently, and sometimes is in certain contexts, but it is still a probability model.
answered 1 hour ago
gung♦gung
110k34268539
110k34268539
$begingroup$
Okay, I understand that part of the theory (thank you for that eloquent explanation!) but why can't we incorporate the classification aspect into the model? That is, why can't we find p, then find the threshold, and train the whole thing end-to-end to minimize some loss?
$endgroup$
– StatsSorceress
1 hour ago
1
$begingroup$
You certainly could (@Sycorax's answer speaks to that possibility). But because that isn't what LR itself is, but rather some ad hoc augmentation, you would need to code up the full optimization scheme yourself. Note BTW, that Frank Harrell has pointed out that process will lead to what might be considered an inferior model by many standards.
$endgroup$
– gung♦
1 hour ago
$begingroup$
Hmm. I read the accepted answer in the related question here, and I agree with it in theory, but sometimes in machine learning classification applications we don't care about the relative error types, we just care about "correct classification". In that case, could you train end-to-end as I describe?
$endgroup$
– StatsSorceress
55 mins ago
add a comment |
$begingroup$
Okay, I understand that part of the theory (thank you for that eloquent explanation!) but why can't we incorporate the classification aspect into the model? That is, why can't we find p, then find the threshold, and train the whole thing end-to-end to minimize some loss?
$endgroup$
– StatsSorceress
1 hour ago
1
$begingroup$
You certainly could (@Sycorax's answer speaks to that possibility). But because that isn't what LR itself is, but rather some ad hoc augmentation, you would need to code up the full optimization scheme yourself. Note BTW, that Frank Harrell has pointed out that process will lead to what might be considered an inferior model by many standards.
$endgroup$
– gung♦
1 hour ago
$begingroup$
Hmm. I read the accepted answer in the related question here, and I agree with it in theory, but sometimes in machine learning classification applications we don't care about the relative error types, we just care about "correct classification". In that case, could you train end-to-end as I describe?
$endgroup$
– StatsSorceress
55 mins ago
$begingroup$
Okay, I understand that part of the theory (thank you for that eloquent explanation!) but why can't we incorporate the classification aspect into the model? That is, why can't we find p, then find the threshold, and train the whole thing end-to-end to minimize some loss?
$endgroup$
– StatsSorceress
1 hour ago
$begingroup$
Okay, I understand that part of the theory (thank you for that eloquent explanation!) but why can't we incorporate the classification aspect into the model? That is, why can't we find p, then find the threshold, and train the whole thing end-to-end to minimize some loss?
$endgroup$
– StatsSorceress
1 hour ago
1
1
$begingroup$
You certainly could (@Sycorax's answer speaks to that possibility). But because that isn't what LR itself is, but rather some ad hoc augmentation, you would need to code up the full optimization scheme yourself. Note BTW, that Frank Harrell has pointed out that process will lead to what might be considered an inferior model by many standards.
$endgroup$
– gung♦
1 hour ago
$begingroup$
You certainly could (@Sycorax's answer speaks to that possibility). But because that isn't what LR itself is, but rather some ad hoc augmentation, you would need to code up the full optimization scheme yourself. Note BTW, that Frank Harrell has pointed out that process will lead to what might be considered an inferior model by many standards.
$endgroup$
– gung♦
1 hour ago
$begingroup$
Hmm. I read the accepted answer in the related question here, and I agree with it in theory, but sometimes in machine learning classification applications we don't care about the relative error types, we just care about "correct classification". In that case, could you train end-to-end as I describe?
$endgroup$
– StatsSorceress
55 mins ago
$begingroup$
Hmm. I read the accepted answer in the related question here, and I agree with it in theory, but sometimes in machine learning classification applications we don't care about the relative error types, we just care about "correct classification". In that case, could you train end-to-end as I describe?
$endgroup$
– StatsSorceress
55 mins ago
add a comment |
$begingroup$
It's because the optimal threshold is not only a function of the true positive rate (TPR), the false positive rate (FPR), accuracy or whatever else. The other crucial ingredient is the cost and the payoff of correct and wrong decisions.
If your target is a common cold, your response to a positive test is to prescribe two aspirin, and the cost of a true untreated positive is an unnecessary two days' worth of headaches, then your optimal decision (not classification!) threshold is quite different than if your target is some life-threatening disease, and your decision is (a) some comparatively simple procedure like an appendectomy, or (b) a major intervention like months of chemotherapy! And note that although your target variable may be binary (sick/healthy), your decisions may have more values (send home with two aspirin/run more tests/admit to hospital and watch/operate immediately).
Bottom line: if you know your cost structure and all the different decisions, you can certainly train a decision support system (DSS) directly, which includes a probabilistic classification or prediction. I would, however, strongly argue that discretizing predictions or classifications via thresholds is not the right way to go about this.
See also my answer to the earlier "Classification probability threshold" thread. Or this answer of mine. Or that one.
$endgroup$
add a comment |
$begingroup$
It's because the optimal threshold is not only a function of the true positive rate (TPR), the false positive rate (FPR), accuracy or whatever else. The other crucial ingredient is the cost and the payoff of correct and wrong decisions.
If your target is a common cold, your response to a positive test is to prescribe two aspirin, and the cost of a true untreated positive is an unnecessary two days' worth of headaches, then your optimal decision (not classification!) threshold is quite different than if your target is some life-threatening disease, and your decision is (a) some comparatively simple procedure like an appendectomy, or (b) a major intervention like months of chemotherapy! And note that although your target variable may be binary (sick/healthy), your decisions may have more values (send home with two aspirin/run more tests/admit to hospital and watch/operate immediately).
Bottom line: if you know your cost structure and all the different decisions, you can certainly train a decision support system (DSS) directly, which includes a probabilistic classification or prediction. I would, however, strongly argue that discretizing predictions or classifications via thresholds is not the right way to go about this.
See also my answer to the earlier "Classification probability threshold" thread. Or this answer of mine. Or that one.
$endgroup$
add a comment |
$begingroup$
It's because the optimal threshold is not only a function of the true positive rate (TPR), the false positive rate (FPR), accuracy or whatever else. The other crucial ingredient is the cost and the payoff of correct and wrong decisions.
If your target is a common cold, your response to a positive test is to prescribe two aspirin, and the cost of a true untreated positive is an unnecessary two days' worth of headaches, then your optimal decision (not classification!) threshold is quite different than if your target is some life-threatening disease, and your decision is (a) some comparatively simple procedure like an appendectomy, or (b) a major intervention like months of chemotherapy! And note that although your target variable may be binary (sick/healthy), your decisions may have more values (send home with two aspirin/run more tests/admit to hospital and watch/operate immediately).
Bottom line: if you know your cost structure and all the different decisions, you can certainly train a decision support system (DSS) directly, which includes a probabilistic classification or prediction. I would, however, strongly argue that discretizing predictions or classifications via thresholds is not the right way to go about this.
See also my answer to the earlier "Classification probability threshold" thread. Or this answer of mine. Or that one.
$endgroup$
It's because the optimal threshold is not only a function of the true positive rate (TPR), the false positive rate (FPR), accuracy or whatever else. The other crucial ingredient is the cost and the payoff of correct and wrong decisions.
If your target is a common cold, your response to a positive test is to prescribe two aspirin, and the cost of a true untreated positive is an unnecessary two days' worth of headaches, then your optimal decision (not classification!) threshold is quite different than if your target is some life-threatening disease, and your decision is (a) some comparatively simple procedure like an appendectomy, or (b) a major intervention like months of chemotherapy! And note that although your target variable may be binary (sick/healthy), your decisions may have more values (send home with two aspirin/run more tests/admit to hospital and watch/operate immediately).
Bottom line: if you know your cost structure and all the different decisions, you can certainly train a decision support system (DSS) directly, which includes a probabilistic classification or prediction. I would, however, strongly argue that discretizing predictions or classifications via thresholds is not the right way to go about this.
See also my answer to the earlier "Classification probability threshold" thread. Or this answer of mine. Or that one.
answered 1 hour ago
Stephan KolassaStephan Kolassa
48.4k8102182
48.4k8102182
add a comment |
add a comment |
$begingroup$
Regardless of the underlying model, we can work out the sampling distributions of TPR and FPR at a threshold. This implies that we can characterize the variability in TPR and FPR at some threshold, and we can back into a desired error rate trade-off.
A ROC curve is a little bit deceptive because the only thing that you control is the threshold, however the plot displays TPR and FPR, which are functions of the threshold. Moreover, the TPR and FPR are both statistics, so they are subject to the vagaries of random sampling. This implies that if you were to repeat the procedure (say by cross-validation), you could come up with a different FPR and TPR at some specific threshold value.
However, if we can estimate the variability in the TPR and FPR, then repeating the ROC procedure is not necessary. We just pick a threshold such that the endpoints of a confidence interval (with some width) are acceptable. That is, pick the model so that the FPR is plausibly below some researcher-specified maximum, and/or the TPR is plausibly above some researcher-specified minimum. If your model can't attain your targets, you'll have to build a better model.
Of course, what TPR and FPR values are tolerable in your usage will be context-dependent.
For more information, see ROC Curves for Continuous Data
by Wojtek J. Krzanowski and David J. Hand.
$endgroup$
$begingroup$
This doesn't really answer my question, but it's a very nice description of ROC curves.
$endgroup$
– StatsSorceress
1 hour ago
$begingroup$
In what way does this not answer your question? What is your question, if not asking about how to choose a threshold for classification?
$endgroup$
– Sycorax
1 hour ago
$begingroup$
I was asking why we don't train the threshold instead of choosing it after training the model.
$endgroup$
– StatsSorceress
1 hour ago
$begingroup$
How would you train a threshold?
$endgroup$
– Sycorax
1 hour ago
$begingroup$
Couldn't you find the optimal threshold for each minibatch, and take an average or something? I have a related question here if you're curious: stackoverflow.com/questions/55788153/…
$endgroup$
– StatsSorceress
1 hour ago
|
show 5 more comments
$begingroup$
Regardless of the underlying model, we can work out the sampling distributions of TPR and FPR at a threshold. This implies that we can characterize the variability in TPR and FPR at some threshold, and we can back into a desired error rate trade-off.
A ROC curve is a little bit deceptive because the only thing that you control is the threshold, however the plot displays TPR and FPR, which are functions of the threshold. Moreover, the TPR and FPR are both statistics, so they are subject to the vagaries of random sampling. This implies that if you were to repeat the procedure (say by cross-validation), you could come up with a different FPR and TPR at some specific threshold value.
However, if we can estimate the variability in the TPR and FPR, then repeating the ROC procedure is not necessary. We just pick a threshold such that the endpoints of a confidence interval (with some width) are acceptable. That is, pick the model so that the FPR is plausibly below some researcher-specified maximum, and/or the TPR is plausibly above some researcher-specified minimum. If your model can't attain your targets, you'll have to build a better model.
Of course, what TPR and FPR values are tolerable in your usage will be context-dependent.
For more information, see ROC Curves for Continuous Data
by Wojtek J. Krzanowski and David J. Hand.
Comments:
– StatsSorceress (1 hour ago): This doesn't really answer my question, but it's a very nice description of ROC curves.
– Sycorax (1 hour ago): In what way does this not answer your question? What is your question, if not asking about how to choose a threshold for classification?
– StatsSorceress (1 hour ago): I was asking why we don't train the threshold instead of choosing it after training the model.
– Sycorax (1 hour ago): How would you train a threshold?
– StatsSorceress (1 hour ago): Couldn't you find the optimal threshold for each minibatch, and take an average or something? I have a related question here if you're curious: stackoverflow.com/questions/55788153/…
[5 more comments not shown]
Answered 1 hour ago (edited 1 hour ago) by Sycorax
Comments on the question:
– EdM (1 hour ago): Possible duplicate of Classification probability threshold
– gung♦ (1 hour ago): That thread is certainly related, but I wouldn't call it a duplicate.