Overfitting and Underfitting
Question (asked by Goktug):
I have done some research on overfitting and underfitting, and I understand what exactly they are, but I cannot find the reasons behind them.
What are the main reasons for overfitting and underfitting?
Why do we face these two problems when training a model?
Tags: machine-learning, dataset, overfitting
Comment (Silverfish): You might find the thread “What's a real-world example of ‘overfitting’?” useful.
5 Answers
Answer (Luis Da Silva):
I'll try to answer in the simplest way. Each of these problems has its own main origin:

Overfitting: the data is noisy, meaning that it contains deviations from reality (due to measurement errors, random influential factors, unobserved variables and spurious correlations) which make it harder to see the true relationship with our explanatory factors. The data is also usually incomplete (we don't have examples of everything).

As an example, let's say I am trying to classify boys and girls based on their height, just because that's the only information I have about them. We all know that even though boys are taller on average than girls, there is a huge overlap region, so it's impossible to separate them perfectly with that bit of information alone. Depending on the density of the data, a sufficiently complex model might achieve a better success rate on the training dataset than is theoretically possible, because it can draw boundaries that let individual points stand alone. So, if the only person in the data who is 2.04 meters tall is a woman, the model could draw a little circle around that region, implying that a random person who is 2.04 meters tall is most likely a woman. The underlying reason for all of this is trusting the training data too much (in the example, the model concludes that since there is no man of height 2.04 m in the data, that height is only possible for women).

Underfitting is the opposite problem, in which the model fails to recognize the real complexity in the data (i.e. the non-random structure in it). The model assumes that the noise is greater than it really is and therefore uses too simplistic a shape. So, if the dataset contains many more girls than boys for whatever reason, the model could just classify everyone as a girl. In this case, the model didn't trust the data enough and assumed that all deviations are noise (in the example, the model assumes that boys simply do not exist).

The bottom line is that we face these problems because:
- We don't have complete information.
- We don't know how noisy the data is (we don't know how much we should trust it).
- We don't know in advance the underlying function that generated our data, and thus the optimal model complexity.

I hope this helps.
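As a rough illustration of the "little circle around the 2.04 m woman" behaviour, here is a minimal sketch (not part of the original answer; the height distributions and parameters are invented for illustration): an unconstrained decision tree can memorise the overlapping training sample, while a depth-limited tree generalises about as well as the problem allows.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 2000
# Hypothetical height distributions (metres) with a large overlap between classes.
boys = rng.normal(1.78, 0.07, n)
girls = rng.normal(1.65, 0.07, n)
X = np.concatenate([boys, girls]).reshape(-1, 1)
y = np.concatenate([np.ones(n), np.zeros(n)])  # 1 = boy, 0 = girl

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for depth in (2, None):  # None lets the tree grow until leaves are pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print(f"max_depth={depth}: train acc {tree.score(X_tr, y_tr):.3f}, "
          f"test acc {tree.score(X_te, y_te):.3f}")
# The unrestricted tree fits the training sample almost perfectly but does no
# better (often worse) on the test set: it has modelled the noise.
```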
Answer (Sympa):
Overfitting is when a model estimates the variable you are modeling very well on the original data, but does not estimate it well on a new data set (hold-out, cross-validation, forecasting, etc.). You have too many variables or estimators in your model (dummy variables, etc.), and these make your model too sensitive to the noise in your original data. As a result of fitting that noise, the model predicts poorly.

Underfitting is when a model does not estimate the variable well on either the original data or new data. Your model is missing some variables that are necessary to better estimate and predict the behavior of your dependent variable.

The balancing act between over- and underfitting is challenging, and sometimes there is no clear finish line. In modeling econometric time series, this issue is handled quite well by regularization methods (LASSO, ridge regression, elastic net) that are geared specifically to reducing overfitting by, respectively, reducing the number of variables in your model, reducing the sensitivity of the coefficients to your data, or a combination of both.
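To make the regularisation point concrete, here is a hedged sketch (the data-generating process and the alpha value are assumptions, not from the answer): with many irrelevant candidate predictors, ordinary least squares overfits, while LASSO shrinks most of the spurious coefficients to zero and holds up better on held-out data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n, p = 120, 80                        # few observations, many candidate predictors
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]           # only three predictors actually matter
y = X @ beta + rng.normal(scale=1.0, size=n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

ols = LinearRegression().fit(X_tr, y_tr)
lasso = Lasso(alpha=0.1).fit(X_tr, y_tr)   # alpha chosen by hand here; use CV in practice

for name, model in [("OLS  ", ols), ("LASSO", lasso)]:
    print(f"{name} R^2: train {r2_score(y_tr, model.predict(X_tr)):.2f}, "
          f"test {r2_score(y_te, model.predict(X_te)):.2f}")
print("non-zero LASSO coefficients:", int(np.sum(lasso.coef_ != 0)))
```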
Answer (lnathan):
Perhaps during your research you came across the following equation:
$$\text{Error} = \text{Irreducible Error} + \text{Bias}^2 + \text{Variance}$$

Why do we face these two problems in training a model? The learning problem itself is basically a trade-off between bias and variance.

What are the main reasons for overfitting and underfitting? In short: noise. At greater length: the irreducible error consists of measurement errors and fluctuations in the data, as well as the part of the target function that cannot be represented by the model. Re-measuring the target variable or changing the hypothesis space (i.e. selecting a different model) changes this component.

Edit (to link to the other answers): [Figure omitted in this copy: model performance as complexity is varied, where $\text{error}_D$ is the error over the entire distribution $D$ (in practice estimated with test sets).]
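For reference, the standard decomposition behind that equation, for squared error at a point $x$, assuming $y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$ and $\operatorname{Var}(\varepsilon) = \sigma^2$, and with expectations taken over training sets and noise:

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
= \underbrace{\sigma^2}_{\text{irreducible error}}
+ \underbrace{\big(f(x) - \mathbb{E}[\hat{f}(x)]\big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}\Big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\Big]}_{\text{variance}}
$$

Overfitting corresponds to low bias but high variance, underfitting to high bias but low variance; the irreducible $\sigma^2$ is a floor that neither can remove.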
Comment (Gregor): I think you should define your terminology. The OP doesn't use the terms "bias" or "variance" in the question, and you don't use the terms "overfitting" or "underfitting" in your answer (except when quoting the question). This would be a much clearer answer if you explained the relationship between these terms.
Answer (hxd1011):
What are the main reasons for overfitting and underfitting?
In overfitting, the model is too complex, so it fits the training data too well, noise included. In underfitting, the model is too simple to capture the structure in the data.

Why do we face these two problems in training a model?
It is hard to pick the "just right" model and parameters for the data.
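As a sketch of how that "just right" complexity is usually searched for in practice (the data and the depth grid here are made up for illustration), one can cross-validate each candidate and keep the complexity with the best held-out score:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(2)
X = rng.uniform(0, 6, size=(300, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=300)

for depth in (1, 3, 5, 10, None):     # None lets the tree grow fully
    scores = cross_val_score(
        DecisionTreeRegressor(max_depth=depth, random_state=0), X, y, cv=5)
    print(f"max_depth={depth}: mean CV R^2 = {scores.mean():.3f}")
# Very shallow trees underfit, unrestricted trees overfit; the best
# cross-validated score usually sits at an intermediate depth.
```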
Answer (mpiktas):
Almost all statistical problems can be stated in the following form:

Given the data $(y, x)$, find $\hat{f}$ which produces $\hat{y} = \hat{f}(x)$. Make this $\hat{f}$ as close as possible to the "true" $f$, where $f$ is defined by
$$y = f(x) + \varepsilon.$$

The temptation is always to make $\hat{f}$ produce $\hat{y}$ which are very close to the data $y$. But when a new data point arrives, or we use data which was not used to construct $\hat{f}$, the prediction may be way off. This happens because we are trying to explain $\varepsilon$ instead of $f$. When we do this we stray from the "true" $f$, and hence when a new observation comes in we get a bad prediction. This is when overfitting happens.

On the other hand, once we have found some $\hat{f}$, the question is always whether we could find a better $\tilde{f}$ which produces a better fit and is closer to the "true" $f$. If we can, then we underfitted in the first case.

If you look at the statistical problem this way, fitting a model is always a balance between underfitting and overfitting, and any solution is always a compromise. We face this problem because our data are random and noisy.
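A minimal sketch of this picture (the true $f$, the noise level and the polynomial degrees are invented for illustration): a degree-1 fit cannot follow $f$ (underfitting), a very high-degree fit chases $\varepsilon$ (overfitting), and a moderate degree is the compromise.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures


def f(x):
    """The 'true' f in y = f(x) + eps (chosen arbitrarily for this sketch)."""
    return np.sin(2 * np.pi * x)


rng = np.random.default_rng(3)
x_train = rng.uniform(size=30)
x_test = rng.uniform(size=200)
y_train = f(x_train) + rng.normal(scale=0.2, size=30)
y_test = f(x_test) + rng.normal(scale=0.2, size=200)

for degree in (1, 4, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_train.reshape(-1, 1), y_train)
    mse_tr = mean_squared_error(y_train, model.predict(x_train.reshape(-1, 1)))
    mse_te = mean_squared_error(y_test, model.predict(x_test.reshape(-1, 1)))
    print(f"degree {degree:>2}: train MSE {mse_tr:.3f}, test MSE {mse_te:.3f}")
# degree 1 underfits (high error everywhere), degree 15 overfits
# (tiny training error, large test error), degree 4 is the compromise.
```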