
Overfitting and Underfitting


I have done some research on overfitting and underfitting, and I understand exactly what they are, but I cannot find the reasons behind them.

What are the main reasons for overfitting and underfitting?

Why do we face these two problems in training a model?

machine-learning dataset overfitting

asked yesterday – Goktug

• You might find "What's a real-world example of “overfitting”?" useful – Silverfish, 12 hours ago

5 Answers

I'll try to answer in the simplest way. Each of those problems has its own main origin:



Overfitting: Data is noisy, meaning that there are some deviations from reality (because of measurement errors, random influencing factors, unobserved variables and spurious correlations) that make it harder for us to see the data's true relationship with our explanatory factors. Also, the data is usually incomplete (we don't have examples of everything).



As an example, let's say I am trying to classify boys and girls based solely on their height, because that's the only information I have about them. We all know that even though boys are taller on average than girls, there is a huge overlap region, so it's impossible to separate them perfectly with just that bit of information. Depending on the density of the data, a sufficiently complex model can achieve a higher success rate on the training dataset than is theoretically possible on new data, because it can draw boundaries that let individual points stand alone. So, if the only person in our data who is 2.04 meters tall is a woman, the model could draw a little circle around that height, implying that a random person who is 2.04 meters tall is most likely a woman.



The underlying reason for it all is trusting the training data too much (in the example, since there is no man of height 2.04 in the data, the model concludes that only women can be that tall).



Underfitting is the opposite problem, in which the model fails to recognize the real complexities in our data (i.e. the non-random structure in our data). The model assumes that the noise is greater than it really is and thus uses too simplistic a shape. So, if the dataset contains many more girls than boys for whatever reason, the model might just classify them all as girls.



In this case, the model didn't trust the data enough and simply assumed that all deviations are noise (in the example, the model assumes that boys do not exist).
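
To make the height example concrete, here is a minimal sketch (my own construction, not part of this answer; the synthetic data and the scikit-learn models are assumptions) contrasting a 1-nearest-neighbour classifier, which memorises individual training points such as the lone 2.04 m woman, with a majority-class predictor that ignores height entirely:

    # Overfitting vs. underfitting on simulated heights (illustrative only).
    import numpy as np
    from sklearn.dummy import DummyClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(0)
    n = 500
    # Heights in metres: two overlapping normal distributions.
    girls = rng.normal(1.62, 0.07, n)
    boys = rng.normal(1.76, 0.07, n)
    X = np.concatenate([girls, boys]).reshape(-1, 1)
    y = np.concatenate([np.zeros(n), np.ones(n)])
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    models = [("1-NN (trusts the data too much)", KNeighborsClassifier(n_neighbors=1)),
              ("majority class (trusts it too little)", DummyClassifier(strategy="most_frequent"))]
    for name, model in models:
        model.fit(X_tr, y_tr)
        print(f"{name}: train={model.score(X_tr, y_tr):.2f}, test={model.score(X_te, y_te):.2f}")

The 1-NN model scores near-perfectly on the training set but noticeably worse on held-out data, while the majority-class model scores about 0.5 on both: the first pattern is overfitting, the second underfitting.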



Bottom line is that we face these problems because:




  • We don't have complete information.

  • We don't know how noisy the data is (we don't know how much we should trust it).

  • We don't know in advance the underlying function that generated our data, and thus the optimal model complexity.


I hope this helps.






– Luis Da Silva, answered 22 hours ago

  • welcome to CV. nice answer, which makes me want to del my answer ... – hxd1011, 22 hours ago

  • Was the "button line" instead of "bottom line" intentional? – Džuris, 19 hours ago




















Overfitting is when a model estimates the variable you are modeling really well on the original data, but poorly on a new data set (hold-out, cross-validation, forecasting, etc.). You have too many variables or estimators in your model (dummy variables, etc.), and these cause your model to become too sensitive to the noise in your original data. As a result of fitting that noise, the model predicts new data poorly.



Underfitting is when a model does not estimate the variable well on either the original data or new data. Your model is missing some variables that are necessary to better estimate and predict the behavior of your dependent variable.



The balancing act between overfitting and underfitting is challenging, and sometimes there is no clear finish line. In econometric time-series modeling, this issue is handled fairly well by regularization models (LASSO, ridge regression, elastic net), which are designed specifically to reduce overfitting by, respectively, reducing the number of variables in your model, reducing the sensitivity of the coefficients to your data, or a combination of both.
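
As a hedged sketch of that last point (my own example, not the answer's; the synthetic data and scikit-learn usage are assumptions), LASSO's L1 penalty can zero out irrelevant coefficients outright, while ridge only shrinks them and elastic net blends the two:

    # Regularisation on a regression with many irrelevant variables.
    import numpy as np
    from sklearn.linear_model import ElasticNet, Lasso, Ridge

    rng = np.random.default_rng(1)
    X = rng.normal(size=(100, 20))                  # 20 candidate regressors
    beta = np.zeros(20)
    beta[:3] = [3.0, -2.0, 1.0]                     # only 3 truly matter
    y = X @ beta + rng.normal(scale=0.5, size=100)

    for name, model in [("lasso", Lasso(alpha=0.1)),
                        ("ridge", Ridge(alpha=1.0)),
                        ("elastic net", ElasticNet(alpha=0.1, l1_ratio=0.5))]:
        model.fit(X, y)
        kept = int(np.sum(np.abs(model.coef_) > 1e-6))
        print(f"{name}: {kept} of 20 coefficients are non-zero")

Typically the LASSO and elastic-net fits keep only a handful of coefficients, which is exactly the variable-reduction behaviour described above; ridge keeps all 20, but with smaller magnitudes.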






– Sympa, answered 15 hours ago

    Perhaps during your research you came across the following equation:



    $$\text{Error} = \text{Irreducible Error} + \text{Bias}^2 + \text{Variance}$$




    Why do we face these two problems in training a model?




    The learning problem itself is basically a trade-off between bias and variance.




    What are the main reasons for overfitting and underfitting?




    Short: Noise.



    Long: The irreducible error: Measurement errors/fluctuations in the data as well as the part of the target function that cannot be represented by the model. Remeasuring the target variable or changing the hypothesis space (i.e. selecting a different model) changes this component.



    Edit (to link to the other answers): model performance as complexity is varied:

    [Figure: training error and $error_D$ as model complexity is varied]

    where $error_D$ is the error over the entire distribution $D$ (in practice estimated with test sets).
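
A rough numerical illustration of that decomposition (my own construction under assumed settings, not part of the answer): refit a polynomial of fixed degree to many training sets drawn from the same $y = f(x) + \varepsilon$ process, then estimate bias² and variance pointwise:

    # Empirical bias/variance for polynomial fits of increasing complexity.
    import numpy as np

    rng = np.random.default_rng(2)
    f = lambda x: np.sin(2 * np.pi * x)   # assumed "true" target function
    x = np.linspace(0.0, 1.0, 30)
    sigma = 0.3                           # irreducible noise level

    for degree in (1, 3, 9):
        preds = []
        for _ in range(200):              # many training sets, same process
            y = f(x) + rng.normal(scale=sigma, size=x.size)
            preds.append(np.polyval(np.polyfit(x, y, degree), x))
        preds = np.asarray(preds)
        bias2 = np.mean((preds.mean(axis=0) - f(x)) ** 2)
        variance = np.mean(preds.var(axis=0))
        print(f"degree {degree}: bias^2={bias2:.4f}, variance={variance:.4f}, "
              f"irreducible={sigma**2:.4f}")

Low-degree fits show large bias² and small variance (underfitting), high-degree fits the reverse (overfitting), and the $\sigma^2$ term is untouched by either, mirroring the error-versus-complexity curve sketched above.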






    – lnathan, answered 14 hours ago, edited 2 hours ago

    • I think you should define your terminology. OP doesn't use the terms "bias" or "variance" in the question, and you don't use the terms "overfitting" or "underfitting" in your answer (except in a quote of the question). I think this would be a much clearer answer if you explained the relationship between these terms. – Gregor, 13 hours ago


    What are the main reasons for overfitting and underfitting?




    For overfitting, the model is too complex, so it fits the training data too well (noise included). For underfitting, the model is too simple.




    Why do we face these two problems in training a model?




    It is hard to pick the "just right" model and parameters for the data.
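
One common way to search for that "just right" point, shown here as a small sketch with assumed synthetic data (not taken from this answer), is to compare candidate complexities on held-out data via cross-validation:

    # Choosing polynomial degree by 5-fold cross-validation.
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures

    rng = np.random.default_rng(3)
    X = rng.uniform(0.0, 1.0, size=(60, 1))
    y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(scale=0.3, size=60)

    for degree in range(1, 10):
        model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
        mse = -cross_val_score(model, X, y, cv=5,
                               scoring="neg_mean_squared_error").mean()
        print(f"degree {degree}: CV MSE = {mse:.3f}")

The cross-validated error is typically U-shaped in the degree: too-simple models underfit on every fold, too-complex ones overfit their training folds, and the minimum sits somewhere in between.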






    – hxd1011, answered 22 hours ago


      Almost all statistical problems can be stated in the following form:




      1. Given the data $(y, x)$, find $\hat{f}$ which produces $\hat{y}=\hat{f}(x)$.

      2. Make this $\hat{f}$ as close as possible to the "true" $f$, where $f$ is defined as

      $$y = f(x) + \varepsilon$$



      The temptation is always to make $\hat{f}$ produce $\hat{y}$ which are very close to the data $y$. But when a new data point arrives, or we use data which was not used to construct $\hat{f}$, the prediction may be way off. This happens because we are trying to explain $\varepsilon$ instead of $f$. When we do this we stray from the "true" $f$, and hence when a new observation comes in we get a bad prediction. This is when overfitting happens.



      On the other hand, when we have found some $\hat{f}$, the question is always: can we get a better $\tilde{f}$ which produces a better fit and is closer to the "true" $f$? If we can, then we underfitted in the first case.



      If you look at the statistical problem this way, fitting the model is always a balance between underfitting and overfitting and any solution is always a compromise. We face this problem because our data is random and noisy.
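
To see the "explaining $\varepsilon$ instead of $f$" point numerically, here is a minimal sketch (my construction, with an assumed true $f(x) = x^2$): a polynomial with as many coefficients as data points interpolates the noise exactly, so its training residuals vanish while its error on fresh draws from the same process grows:

    # Interpolating the noise: zero training error, poor generalisation.
    import numpy as np

    rng = np.random.default_rng(4)
    f = lambda x: x ** 2                   # assumed "true" f
    x = np.linspace(-1.0, 1.0, 10)
    y = f(x) + rng.normal(scale=0.2, size=x.size)

    # polyfit may warn about conditioning at degree 9; it still interpolates.
    fits = [("degree 2 (matches f)", np.polyfit(x, y, 2)),
            ("degree 9 (interpolates y)", np.polyfit(x, y, 9))]

    x_new = rng.uniform(-1.0, 1.0, 1000)   # new observations, same process
    y_new = f(x_new) + rng.normal(scale=0.2, size=1000)
    for name, c in fits:
        train_mse = np.mean((np.polyval(c, x) - y) ** 2)
        test_mse = np.mean((np.polyval(c, x_new) - y_new) ** 2)
        print(f"{name}: train MSE={train_mse:.4f}, test MSE={test_mse:.4f}")

The degree-9 fit has essentially zero training error because it has chased $\varepsilon$, and its test error typically exceeds both the degree-2 fit's and the noise floor $\mathrm{Var}(\varepsilon) = 0.04$; the degree-2 fit is the compromise the answer describes.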






      – mpiktas, answered 57 mins ago












