Why do neural networks need so many training examples to perform?















59












$begingroup$


A human child at age 2 needs around 5 instances of a car to be able to identify it with reasonable accuracy regardless of color, make, etc. When my son was 2, he was able to identify trams and trains, even though he had seen just a few. Since he was usually confusing one with the other, apparently his neural network was not trained enough, but still.



What is it that artificial neural networks are missing that prevents them from being able to learn way quicker? Is transfer learning an answer?





















$endgroup$








  • 23




    $begingroup$
    Elephants might be a better example than cars. As others have noted, a child may have seen many cars before hearing the label, so if their mind already defines "natural kinds" it now has a label for one. However, a Western child indisputably develops a good elephant-classifying system on the basis of just a few data.
    $endgroup$
    – J.G.
    Feb 24 at 19:14






  • 65




    $begingroup$
    What makes you think that a human child’s brain works like a neural network?
    $endgroup$
    – Paul Wasilewski
    Feb 24 at 21:33






  • 16




    $begingroup$
    A NN can be shown an image of a car. Your child gets a full 3D movie from different perspectives, for several different types of car. Your child also likely has similar examples to distinguish a car from. For instance their baby stroller, toys, etc. Without those, I think your child would have needed more examples.
    $endgroup$
    – Stian Yttervik
    Feb 25 at 11:46






  • 19




    $begingroup$
    @MSalters In the sense of an Artificial Neural Network? Probably not.
    $endgroup$
    – Firebug
    Feb 25 at 12:04






  • 26




    $begingroup$
    "A human child at age 2 needs around 5 instances of a car to be able to identify it with reasonable accuracy" Such a child has had two full years of experience with things that aren't cars. I'm certain that plays a significant role.
    $endgroup$
    – DarthFennec
    Feb 25 at 17:43
















neural-networks neuroscience














edited Feb 25 at 22:40









smci











asked Feb 24 at 14:07









Marcin


















12 Answers


















91












$begingroup$

I caution against expecting strong resemblance between biological and artificial neural networks. I think the name "neural networks" is a bit dangerous, because it tricks people into expecting that neurological processes and machine learning should be the same. The differences between biological and artificial neural networks outweigh the similarities.



As an example of how this can go awry, you can also turn the reasoning in the original post on its head. You can train a neural network to learn to recognize cars in an afternoon, provided you have a reasonably fast computer and some amount of training data. You can make this a binary task (car/not car) or a multi-class task (car/tram/bike/airplane/boat) and still be confident in a high level of success.



By contrast, I wouldn't expect a child to be able to pick out a car the day - or even the week - after it's born, even after it has seen "so many training examples." Something is obviously different between a two-year-old and an infant that accounts for its learning ability at two years that is not present at birth, while a neural network is perfectly capable of picking up object classification "immediately after birth." I think that there are two important differences: (1) the relative volumes of training data available and (2) a self-teaching mechanism that develops over time because of abundant training data.





The original post exposes two questions. The title and body of the question ask why neural networks need "so many examples." Relative to a child's experience, neural networks trained using common image benchmarks have comparatively little data.



I will rephrase the question in the title to something along the lines of



"How does training a neural network for a common image benchmark compare & contrast to the learning experience of a child?"



For the sake of comparison I'll consider the CIFAR-10 data because it is a common image benchmark. The labeled portion is composed of 10 classes of images with 6000 images per class. Each image is 32x32 pixels. If you somehow stacked the labeled images from CIFAR-10 and made a standard 48 fps video, you'd have about 20 minutes of footage.



A child of 2 years who observes the world for 12 hours daily has roughly 526,000 minutes (more than 8,700 hours) of direct observations of the world, including feedback from adults (labels). (These are just ballpark figures -- I don't know how many minutes a typical two-year-old has spent observing the world.) Moreover, the child will have exposure to many, many objects beyond the 10 classes that comprise CIFAR-10.
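The ballpark arithmetic can be checked in a few lines. This is only a sketch under the assumptions stated above (60,000 labeled images played back at 48 fps; 12 hours of observation a day for two years); the constant and function names are made up for illustration.

```python
# Ballpark comparison of "viewing time": CIFAR-10 versus a two-year-old.
# Every input is a rough assumption taken from the text, not a measurement.

CIFAR_CLASSES = 10
IMAGES_PER_CLASS = 6000
FRAMES_PER_SECOND = 48

HOURS_OBSERVING_PER_DAY = 12
DAYS_PER_YEAR = 365
YEARS = 2


def cifar_minutes() -> float:
    """Minutes of video if every labeled CIFAR-10 image were one frame."""
    total_images = CIFAR_CLASSES * IMAGES_PER_CLASS
    return total_images / FRAMES_PER_SECOND / 60


def child_minutes() -> int:
    """Minutes spent observing the world under the assumptions above."""
    return YEARS * DAYS_PER_YEAR * HOURS_OBSERVING_PER_DAY * 60


if __name__ == "__main__":
    print(f"CIFAR-10 as video: {cifar_minutes():.1f} minutes")
    print(f"Child, {YEARS} years:    {child_minutes():,} minutes")
    print(f"Ratio:             {child_minutes() / cifar_minutes():,.0f}x")
```

Whatever exact values one plugs in for the hours per day, the gap is several orders of magnitude.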



So there are a few things at play. One is that the child has exposure to more data overall and a more diverse source of data than the CIFAR-10 model has. Data diversity and data volume are well-recognized as prerequisites for robust models in general. In this light, it doesn't seem surprising that a neural network is worse at this task than the child, because a neural network trained on CIFAR-10 is positively starved for training data compared to the two-year-old. The image resolution available to a child is also better than the 32x32 CIFAR-10 images, so the child is able to learn information about the fine details of objects.



The CIFAR-10 to two-year-old comparison is not perfect because the CIFAR-10 model will likely be trained with multiple passes over the same static images, while the child will see, using binocular vision, how objects are arranged in a three-dimensional world while moving about and with different lighting conditions and perspectives on the same objects.



The anecdote about OP's child implies a second question about how neural networks can be self-teaching. I will rephrase this as



"How can neural networks become self-teaching?"



A child is endowed with some talent for self-teaching, so that new categories of objects can be added over time without having to start over from scratch.




  • OP's remark about transfer-learning names one kind of model adaptation in the machine learning context.


  • In comments, other users have pointed out that one- and few-shot learning* is another machine learning research area.


  • Additionally, reinforcement-learning addresses self-teaching models from a different perspective, essentially allowing robots to undertake trial-and-error experimentation to find optimal strategies for solving specific problems (e.g. playing chess).



It's probably true that all three of these machine learning paradigms are germane to improving how machines adapt to new computer vision tasks. Quickly adapting machine learning models to new tasks is an active area of research. However, the practical goals of these projects (identify new instances of malware, recognize imposters in passport photos, index the internet) and their criteria for success differ from the goals of a child learning about the world, and one is done in a computer using math while the other is done in organic material using chemistry, so direct comparisons between the two will remain fraught.





As an aside, it would be interesting to study how to flip the CIFAR-10 problem around and train a neural network to recognize 6000 objects from 10 examples of each. But even this wouldn't be a fair comparison to a 2-year-old, because there would still be a large discrepancy in the total volume, diversity and resolution of the training data.



*We don't presently have a tag for one-shot learning or few-shot learning.

















$endgroup$









  • 33




    $begingroup$
    To make it a bit more specific, a human child has already had years of training with tens of thousands of examples, allowing them to determine how objects look when viewed from different angles, how to identify their boundaries, the relationship between apparent size and actual size, and so on.
    $endgroup$
    – David Schwartz
    Feb 25 at 2:25






  • 24




    $begingroup$
    A child's brain is active inside the womb. The baby can identify their parents by sound, after the sound is filtered through water. A new-born baby has had months of data to work with before they're born, but they still need years more before they can form a word, then a couple more years before they can form a sentence, then a couple more for a grammatically correct sentence, etc. Learning is very complicated.
    $endgroup$
    – Nelson
    Feb 25 at 4:52








  • 5




    $begingroup$
    @EelcoHoogendoorn It explains the contrast 'child' versus 'neural network' that has been used in the question. The answer is that this is only an apparent contrast. Neural networks do not need that many examples at all, as kids also get many examples (just in a different way) before they are able to recognize cars.
    $endgroup$
    – Martijn Weterings
    Feb 26 at 13:16








  • 3




    $begingroup$
    @Nelson, I am not sure what the reason is for your comment, but you can change 'years' into 'year'. At 1 year kids speak words, at 2 years the first sentences are spoken, and at 3 years grammar, such as past tense and pronouns, is used correctly.
    $endgroup$
    – Martijn Weterings
    Feb 26 at 13:23








  • 1




    $begingroup$
    @EelcoHoogendoorn I think the premise of the question is a case of reasoning from a faulty analogy, so directly addressing the analogy is responsive. Contrasting biological and artificial neural networks is also responsive, because the answer would outline how biological and artificial neural networks are most similar in their name (both contain the phrase "neural networks") but not similar in their essential characteristics, or at least the characteristics assumed by the question.
    $endgroup$
    – Sycorax
    Feb 26 at 18:27





















49












$begingroup$

First of all, at age two, a child knows a lot about the world and actively applies this knowledge. A child does a lot of "transfer learning" by applying this knowledge to new concepts.



Second, before seeing those five "labeled" examples of cars, a child sees a lot of cars on the street, on TV, toy cars, etc., so also a lot of "unsupervised learning" happens beforehand.



Finally, neural networks have almost nothing in common with the human brain, so there's not much point in comparing them. Also notice that there are algorithms for one-shot learning, and quite a lot of research on it is currently under way.

















$endgroup$









  • 8




    $begingroup$
    4th point, a child also has more than 100 million years of evolutionary selection towards learning efficiently/accurately.
    $endgroup$
    – csiz
    Feb 27 at 3:39



















38












$begingroup$

One major aspect that I don't see in current answers is evolution.



A child's brain does not learn from scratch. It's similar to asking how deer and giraffe babies can walk a few minutes after birth: they are born with their brains already wired for this task. There is some fine-tuning needed of course, but the baby deer doesn't learn to walk from "random initialization".



Similarly, the fact that big moving objects exist and are important to keep track of is something we are born with.



So I think the presupposition of this question is simply false. Human neural networks had the opportunity to see tons of moving, rotating 3D objects with difficult textures and shapes (maybe not cars), but this happened over many generations and the learning took place through evolutionary algorithms: those whose brains were better structured for this task had a higher chance of living to reproduce, leaving the next generation with better and better brain wiring from the start.















$endgroup$









  • 8




    $begingroup$
    Fun aside: there's evidence that when it comes to discriminating between different models of cars, we actually leverage the specialized facial recognition center of our brain. It's plausible that, while a child may not distinguish between different models, the implicit presence of a 'face' on a mobile object may cause cars to be categorized as a type of creature and therefore be favored to be identified by evolution, since recognizing mobile objects with faces is helpful to survival.
    $endgroup$
    – Dan Bryant
    Feb 25 at 16:46






  • 7




    $begingroup$
    This answer addresses exactly what I was thinking. Children are not born as blank slates. They come with features that make some patterns easier to recognize, some things easier to learn, etc.
    $endgroup$
    – Eff
    Feb 26 at 7:49






  • 1




    $begingroup$
    While animals that walk right out of the womb are indeed fascinating, such evolutionary hardwiring is thought to be at the very opposite extreme of human learning, which is thought to be the extreme of experience-driven learning in the natural world. Certainly cars will have left minimal evolutionary impact on the evolution of our brains.
    $endgroup$
    – Eelco Hoogendoorn
    Feb 26 at 12:28






  • 5




    $begingroup$
    @EelcoHoogendoorn The ability to learn and understand the environment has been evolutionarily selected for. The brain has been set up by evolution to be extremely efficient at learning: the ability to connect the dots, see patterns, understand shapes and movement, make inferences, etc.
    $endgroup$
    – Eff
    Feb 26 at 13:58






  • 3




    $begingroup$
    This is a good point, but it's also true that as researchers come to understand this, they build NN's that have hard-coded structures that facilitate certain types of learning. Consider that a convolutional NN has hard coded receptive fields that greatly speed up learning / enhance performance on visual tasks. Those fields could be learned from scratch in a fully connected network, but it's much harder. @EelcoHoogendoorn, human brains are full of structure that facilitates learning.
    $endgroup$
    – gung
    Feb 26 at 16:26



















21












$begingroup$

I don't know much about neural networks but I know a fair bit about babies.



Many 2-year-olds have a lot of issues with how general words should be. For instance, it is quite common at that age for kids to use "dog" for any four-legged animal. That's a more difficult distinction than "car": just think how different a poodle looks from a Great Dane, for instance, and yet they are both "dog" while a cat is not.



And a child at 2 has seen many many more than 5 examples of "car". A kid sees dozens or even hundreds of examples of cars any time the family goes for a drive. And a lot of parents will comment "look at the car" a lot more than 5 times. But kids can also think in ways that they weren't told about. For instance, on the street the kid sees lots of things lined up. His dad says (of one) "look at the shiny car!" and the kid thinks "maybe all those other things lined up are also cars?"















$endgroup$









  • 2




    $begingroup$
    Other examples: Taxis, driving lesson cars, and police cars are the same. Whenever a car is red then it is a firetruck. Campervans are ambulances. A lorry with a loader crane becomes classified as an excavator. The bus that just passed by goes to the train station, so the next bus, which looks the same, must also be going to the train station. And seeing the moon during broad daylight is a very special event.
    $endgroup$
    – Martijn Weterings
    Feb 26 at 13:40



















10












$begingroup$

This is a fascinating question that I've also pondered a lot, and I can come up with a few explanations why.




  • Neural networks work nothing like the brain. Backpropagation is unique to neural networks, and does not happen in the brain. In that sense, we just don't know the general learning algorithm in our brains. It could be electrical, it could be chemical, it could even be a combination of the two. Neural networks could be considered an inferior form of learning compared to our brains because of how simplified they are.

  • If neural networks are indeed like our brain, then human babies undergo extensive "training" of the early layers, like feature extraction, in their early days. So their neural networks aren't really trained from scratch, but rather the last layer is retrained to add more and more classes and labels.
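The second point (early layers already trained, only the last layer retrained) can be sketched in a few lines. This is a toy illustration and not a model of the brain: `frozen_features` is a hypothetical, hand-written stand-in for pretrained early layers, and only a small logistic "head" is fit on six labeled examples.

```python
import math


def frozen_features(x):
    """Hypothetical stand-in for pretrained early layers; never updated."""
    a, b = x
    return [a, b, a * a + b * b]


def predict(weights, bias, x):
    """Probability of class 1 from the trainable last layer."""
    z = bias + sum(w * f for w, f in zip(weights, frozen_features(x)))
    return 1.0 / (1.0 + math.exp(-z))


def train_head(examples, epochs=2000, lr=0.1):
    """Fit only the final logistic layer by per-example gradient descent."""
    weights, bias = [0.0, 0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, label in examples:
            error = predict(weights, bias, x) - label  # gradient of log-loss wrt z
            feats = frozen_features(x)
            weights = [w - lr * error * f for w, f in zip(weights, feats)]
            bias -= lr * error
    return weights, bias


# Six labeled examples; label 1 means "inside the unit circle".
few_shots = [((0.0, 0.0), 1), ((0.3, -0.2), 1), ((-0.4, 0.1), 1),
             ((1.5, 0.0), 0), ((0.0, -1.6), 0), ((-1.2, 1.2), 0)]
weights, bias = train_head(few_shots)
```

A handful of labels is enough here only because the fixed features already make the two classes easy to separate, which is the point of the analogy.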














sd2017 is a new contributor to this site.






$endgroup$





















    9












    $begingroup$


    A human child at age 2 needs around 5 instances of a car to be able to identify it with reasonable accuracy regardless of color, make, etc.




    The concept of "instances" gets easily muddied. While a child may have seen 5 unique instances of a car, they have actually seen thousands upon thousands of frames, in many differing environments. They have likely seen cars in other contexts. They also have an intuition for the physical world developed over their lifetime - some transfer learning probably happens here. Yet we wrap all of that up into "5 instances."



    Meanwhile, every single frame/image you pass to a CNN is considered an "example." If you apply a consistent definition, both systems are really utilizing a much more similar amount of training data.
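To make the "consistent definition" concrete, here is a rough count of frames per sighting. The numbers are pure assumptions for illustration (human vision does not literally work in frames), and `frames_observed` is a made-up helper.

```python
def frames_observed(instances, seconds_each, frames_per_second):
    """Count individual frames contained in a number of continuous sightings."""
    return instances * seconds_each * frames_per_second


# Assumed for illustration: 5 cars, each watched for 30 s at 24 frames per second.
child_frames = frames_observed(instances=5, seconds_each=30, frames_per_second=24)
print(child_frames)
```

Under these assumptions, "5 instances" already corresponds to a few thousand CNN-style examples.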



    Also, I would like to note that convolutional neural networks (CNNs) are more useful in computer vision than generic fully connected ANNs, and in fact approach human performance in tasks like image classification. Deep learning is (probably) not a panacea, but it does perform admirably in this domain.















    $endgroup$





















      5












      $begingroup$

      As pointed out by others, the data-efficiency of artificial neural networks varies quite substantially, depending on the details. As a matter of fact, there are many so-called one-shot learning methods that can solve the task of labelling trams with quite good accuracy, using only a single labelled sample.



      One way to do this is by so-called transfer learning; a network trained on other labels is usually very effectively adaptable to new labels, since the hard work is breaking down the low level components of the image in a sensible way.



      But we do not in fact need such labeled data to perform such a task, much like babies don't need nearly as much labeled data as the neural networks you are thinking of do.



      For instance, one such unsupervised method, which I have also successfully applied in other contexts, is to take an unlabeled set of images, randomly rotate them, and train a network to predict which side of the image is 'up'. Without knowing what the visible objects are, or what they are called, this forces the network to learn a tremendous amount of structure about the images; and this can form an excellent basis for much more data-efficient subsequent labeled learning.
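A minimal sketch of how such a rotation task manufactures labels from unlabeled images. Images here are plain 2D lists, the helper names are invented for illustration, and the network that would be trained on the resulting pairs is left out.

```python
import random


def rotate90(image):
    """Rotate a 2D grid (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def rotate(image, quarter_turns):
    """Apply the given number of clockwise quarter turns; returns a new grid."""
    for _ in range(quarter_turns % 4):
        image = rotate90(image)
    return [list(row) for row in image]


def make_rotation_dataset(images, seed=0):
    """Turn unlabeled images into (rotated_image, quarter_turns) training pairs."""
    rng = random.Random(seed)
    dataset = []
    for image in images:
        k = rng.randrange(4)  # the "free" label: how far the image was turned
        dataset.append((rotate(image, k), k))
    return dataset


unlabeled = [[[1, 2], [3, 4]], [[0, 1], [0, 1]]]
pairs = make_rotation_dataset(unlabeled)
```

No human labeling is involved: the label is generated by the same code that perturbs the image.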



      While it is true that artificial networks are quite different from real ones in probably meaningful ways, such as the absence of an obvious analogue of backpropagation, it is very probably true that real neural networks make use of the same tricks, of trying to learn the structure in the data implied by some simple priors.



      One other example, which almost certainly plays a role in animals and has also shown great promise in understanding video, is the assumption that the future should be predictable from the past. Just by starting from that assumption, you can teach a neural network a whole lot. Or on a philosophical level, I am inclined to believe that this assumption underlies almost everything that we consider to be 'knowledge'.
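The same idea in miniature: an unlabeled sequence supplies its own targets, because each value is the "label" for the one before it. The one-parameter predictor below is a deliberately simple stand-in for a network, and the decaying signal is an invented example.

```python
def next_step_pairs(sequence):
    """Pair every value with its successor: (past, future)."""
    return list(zip(sequence[:-1], sequence[1:]))


def fit_linear_predictor(pairs):
    """Least-squares fit of future = coefficient * past (closed form)."""
    numerator = sum(past * future for past, future in pairs)
    denominator = sum(past * past for past, _ in pairs)
    return numerator / denominator


# Invented unlabeled signal: exponential decay with factor 0.9 per step.
signal = [0.9 ** t for t in range(20)]
pairs = next_step_pairs(signal)
coefficient = fit_linear_predictor(pairs)
```

The fitted coefficient recovers the decay factor from the raw sequence alone; video prediction applies the same principle with frames in place of numbers.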



      I am not saying anything new here; but it is relatively new in the sense that these possibilities are too young to have found many applications yet, and have not yet percolated down to the textbook understanding of 'what an ANN can do'. So to answer the OP's question: ANNs have already closed much of the gap that you describe.















      $endgroup$





















        4












        $begingroup$

        One way to train a deep neural network is to treat it as a stack of auto-encoders (Restricted Boltzmann Machines).



        In theory, an auto-encoder learns in an unsupervised manner: It takes arbitrary, unlabelled input data and processes it to generate output data. Then it takes that output data, and tries to regenerate its input data. It tweaks its nodes' parameters until it can come close to round-tripping its data. If you think about it, the auto-encoder is writing its own automated unit tests. In effect, it is turning its "unlabelled input data" into labelled data: The original data serves as a label for the round-tripped data.
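The round trip described above can be sketched with a tiny linear auto-encoder (two inputs, one hidden unit, two outputs). This is a plain auto-encoder rather than a Restricted Boltzmann Machine, and the data, starting weights and learning rate are arbitrary choices made for illustration.

```python
def encode(w, x):
    """Compress a 2-value input into a single hidden value."""
    return w[0] * x[0] + w[1] * x[1]


def decode(v, h):
    """Try to regenerate the 2-value input from the hidden value."""
    return [v[0] * h, v[1] * h]


def reconstruction_loss(w, v, data):
    """Mean squared round-trip error over the data set."""
    total = 0.0
    for x in data:
        x_hat = decode(v, encode(w, x))
        total += (x_hat[0] - x[0]) ** 2 + (x_hat[1] - x[1]) ** 2
    return total / len(data)


def train_autoencoder(data, epochs=500, lr=0.02):
    """Per-example gradient descent on the squared reconstruction error."""
    w, v = [0.5, 0.5], [0.5, 0.5]
    for _ in range(epochs):
        for x in data:
            h = encode(w, x)
            x_hat = decode(v, h)
            err = [x_hat[0] - x[0], x_hat[1] - x[1]]  # the input is its own label
            grad_h = 2 * (err[0] * v[0] + err[1] * v[1])
            v = [v[0] - lr * 2 * err[0] * h, v[1] - lr * 2 * err[1] * h]
            w = [w[0] - lr * grad_h * x[0], w[1] - lr * grad_h * x[1]]
    return w, v


# Unlabeled data lying on a line, so one hidden unit can represent it.
data = [[t, 2 * t] for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]
initial_loss = reconstruction_loss([0.5, 0.5], [0.5, 0.5], data)
w, v = train_autoencoder(data)
final_loss = reconstruction_loss(w, v, data)
```

No labels appear anywhere in the training loop; the error signal comes entirely from comparing the reconstruction with the original input.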



        After the layers of auto-encoders are trained, the neural network is fine-tuned using labelled data to perform its intended function. In effect, these are functional tests.



        The original poster asks why a lot of data is needed to train an artificial neural network, and compares that to the allegedly low amount of training data needed by a two-year-old human. The original poster is comparing apples-to-oranges: The overall training process for the artificial neural net, versus the fine-tuning with labels for the two-year-old.



        But in reality, the two-year-old has been training its auto-encoders on random, self-labelled data for more than two years. Babies dream when they are in utero. (So do kittens.) Researchers have described these dreams as involving random neuron firings in the visual processing centers.












        $endgroup$









        • 1




          $begingroup$
          Agreed, except that auto-encoders in practice are not very powerful tools for unsupervised learning; everything we know points to there being more going on, so the phrasing 'the two-year old has been training its auto-encoders' should not be taken too literally, I suppose.
          $endgroup$
          – Eelco Hoogendoorn
          Feb 26 at 12:34



















        4












        $begingroup$

        We don't learn to "see cars" until we learn to see



        It takes quite a long time and lots of examples for a child to learn how to see objects as such. After that, a child can learn to identify a particular type of object from just a few examples. If you compare a two-year-old child with a learning system that literally starts from a blank slate, it's an apples and oranges comparison; at that age a child has seen thousands of hours of "video footage".



        In a similar manner, it takes artificial neural networks a lot of examples to learn "how to see", but after that it's possible to transfer that knowledge to new examples. Transfer learning is a whole domain of machine learning, and things like "one shot learning" are possible: you can build ANNs that learn to identify new types of objects that they haven't seen before from a single example, or to identify a particular person from a single photo of their face. But doing this initial "learning to see" part well requires quite a lot of data.
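One common way to picture one-shot learning is nearest-neighbour matching in the feature space of an already trained network. The sketch below assumes such a network exists and simply hard-codes hypothetical 3-dimensional embeddings; `nearest_label`, the vectors, and the class names are illustrative only.

```python
import math

def nearest_label(query, support):
    """Return the label whose single stored embedding is closest to `query`."""
    return min(support, key=lambda label: math.dist(query, support[label]))

# One hypothetical embedding per class, as if produced by a pretrained network.
support = {"car": [0.9, 0.1, 0.0], "train": [0.1, 0.9, 0.2]}

# Embedding of a new, unlabeled image (also made up for the example).
query = [0.8, 0.2, 0.1]
predicted = nearest_label(query, support)
```

All of the expensive "learning to see" is hidden in whatever produced the embeddings; adding a new class afterwards costs one example and one dictionary entry.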



        Furthermore, there's some evidence that not all training data is equal: data that you "choose" while learning is more effective than data that's simply provided to you. See, for example, the Held & Hein twin kitten experiment: https://www.lri.fr/~mbl/ENS/FONDIHM/2013/papers/about-HeldHein63.pdf

















        $endgroup$





















          4












          $begingroup$

          One thing that I haven't seen in the answers so far is the fact that one 'instance' of a real-world object seen by a human child does not correspond to an instance in the context of NN training.



          Suppose you're standing at a railway intersection with a 5-year-old child and watch 5 trains pass within 10 minutes. Now, you could say "My child only saw 5 trains and can reliably identify other trains, while a NN needs thousands of images!". While this is likely true, you are completely ignoring the fact that every train your child sees contains A LOT more information than a single image of a train. In fact, the brain of your child is processing several dozen images of the train per second while it is passing by, each from a slightly different angle, with different shadows, etc., while a single image provides the NN with very limited information.
          In this context, your child even has information that is not available to the NN, for example the speed of the train or the sound that the train makes.



          Further, your child can talk and ASK QUESTIONS! "Trains are very long, right?" "Yes.", "And they are very big too, right?" "Yes.". With two simple questions your child learns two very essential features in less than a minute!



          Another important point is object detection. Your child is able to identify immediately which object, i.e. which part of the image, it needs to focus on, while a NN must learn to detect the relevant object before it can attempt to classify it.

















          $endgroup$









          • 3




            $begingroup$
            I would add also that the child has context: it sees a train on the rails, be it at a station, level crossing etc. If it sees a huge (zeppelin size) balloon shaped and painted to look like a train in the sky, it won't say it's a train. It will say it looks like a train, but it won't attach a label "train" to it. I'm skeptical a NN will return a label "train-looking balloon" in this case. Similarly, a child won't mistake a billboard with a train on it with an actual train. A picture of a picture of a train is a picture of a train to a NN – it will return the label "train".
            $endgroup$
            – corey979
            2 days ago



















          3












          $begingroup$

          I would argue the performance is not as different as you might expect, but you ask a great question (see the last paragraph).



          As you mention transfer learning: to compare apples with apples, we have to look at how many pictures in total, and how many pictures of the class of interest, a human or a neural net "sees".



          1. How many pictures does a human look at?



          A human eye movement takes around 200 ms, and each one could be seen as a kind of "biological photo". See the talk by computer vision expert Fei-Fei Li: https://www.ted.com/talks/fei_fei_li_how_we_re_teaching_computers_to_understand_pictures#t-362785.



          She adds:




          So by age 3 a child would have seen hundreds of millions of pictures.




          In ImageNet, the leading database for object detection, there are ~14 million labeled pictures. So a neural network trained on ImageNet would have seen as many pictures as a baby that is 14000000/5/60/60/24*2 ≈ 65 days old, so about two months old (assuming the baby is awake half of her life).
          To be fair, it's hard to tell how many of these pictures are labeled. Moreover, the pictures a baby sees are not as diverse as those in ImageNet. (Probably the baby sees her mother half of the time... ;)
          However, I think it's fair to say that your son will have seen hundreds of millions of pictures (and then applies transfer learning).
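The back-of-the-envelope numbers above can be reproduced with a few lines of Python. The constants are the assumptions already stated in this answer (one "biological photo" per 200 ms, awake half of the time, roughly 14 million ImageNet images); the variable names are mine.

```python
IMAGENET_IMAGES = 14_000_000  # approximate size quoted above
GLANCES_PER_SECOND = 5        # one "biological photo" every 200 ms
AWAKE_FRACTION = 0.5          # assumed: awake half of the time
SECONDS_PER_DAY = 60 * 60 * 24

# Age at which a baby has "seen" as many pictures as ImageNet contains.
awake_seconds_needed = IMAGENET_IMAGES / GLANCES_PER_SECOND
equivalent_age_days = awake_seconds_needed / SECONDS_PER_DAY / AWAKE_FRACTION

# Number of "pictures" seen by a child's third birthday.
glances_by_age_3 = 3 * 365 * SECONDS_PER_DAY * AWAKE_FRACTION * GLANCES_PER_SECOND

print(f"ImageNet-equivalent age: {equivalent_age_days:.1f} days")
print(f"Glances by age 3: {glances_by_age_3:,.0f}")
```

Changing `AWAKE_FRACTION` or `GLANCES_PER_SECOND` shifts the results proportionally, so these are order-of-magnitude estimates at best.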



          So how many pictures do we need to learn a new category given a solid base of related pictures that can be (transfer) learned from?



          The first blog post I found was this: https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html. They use 1000 examples per class. I could imagine that 2.5 years later even fewer are required.
          However, a human can see 1000 pictures in 1000/5/60 ≈ 3.3 minutes.



          You wrote:




          A human child at age 2 needs around 5 instances of a car to be able to
          identify it with reasonable accuracy regardless of color, make, etc.




          That would be equivalent to forty seconds per instance (with various angles of that object to make it comparable).
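For completeness, the same style of arithmetic for the fine-tuning case, again assuming 5 "pictures" per second and spreading the 1000 examples evenly over the 5 instances from the question:

```python
EXAMPLES_PER_CLASS = 1000  # from the Keras blog post
GLANCES_PER_SECOND = 5     # assumed: one glance every 200 ms
INSTANCES = 5              # cars seen by the child in the question

viewing_seconds = EXAMPLES_PER_CLASS / GLANCES_PER_SECOND
viewing_minutes = viewing_seconds / 60
seconds_per_instance = viewing_seconds / INSTANCES

print(f"{viewing_minutes:.1f} minutes in total, {seconds_per_instance:.0f} s per instance")
```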



          To sum up:
          As I mentioned, I had to make a few assumptions. But I think one can see that the performance is not as different as one might expect.



          However, I believe you ask a great question, and here is why:



          2. Would neural networks perform better/differently if they worked more like brains? (Geoffrey Hinton says yes.)



          In an interview in late 2018 (https://www.wired.com/story/googles-ai-guru-computers-think-more-like-brains/), he compares current implementations of neural networks with the brain. He mentions that, in terms of weights, artificial neural networks are smaller than the brain by a factor of 10,000. Therefore, the brain needs far fewer training iterations to learn. To enable artificial neural networks to work more like our brains, he follows another trend in hardware: Graphcore, a UK-based startup. It reduces the calculation time through a smart way of storing the weights of a neural network. Therefore, more weights can be used, and the training time of artificial neural networks might be reduced.












          $endgroup$





















            1












            $begingroup$

            I am an expert in this. I am human, I was a baby, I have a car, and I do AI.



            The reason why babies pick up cars with far more limited examples is intuition. The human brain already has structures to deal with 3D rotations. Also, there are two eyes, which provide parallax for depth mapping, and that really helps. You can intuit the difference between a car and a picture of a car, because there is no actual depth to the picture. Hinton (an AI researcher) has proposed the idea of Capsule Networks, which would be able to handle things more intuitively. Unfortunately for computers, the training data is (usually) 2D images: arrays of flat pixels. In order not to over-fit, much data is required so that the orientation of the cars in the images is generalized. The baby brain can do this already and can recognize a car at any orientation.












            $endgroup$













              12 Answers









              91












              $begingroup$

              I caution against expecting strong resemblance between biological and artificial neural networks. I think the name "neural networks" is a bit dangerous, because it tricks people into expecting that neurological processes and machine learning should be the same. The differences between biological and artificial neural networks outweigh the similarities.



              As an example of how this can go awry, you can also turn the reasoning in the original post on its head. You can train a neural network to learn to recognize cars in an afternoon, provided you have a reasonably fast computer and some amount of training data. You can make this a binary task (car/not car) or a multi-class task (car/tram/bike/airplane/boat) and still be confident in a high level of success.



              By contrast, I wouldn't expect a child to be able to pick out a car the day - or even the week - after it's born, even after it has seen "so many training examples." Something is obviously different between a two-year-old and an infant that accounts for its learning ability at two years that is not present at birth, while a neural network is perfectly capable of picking up object classification "immediately after birth." I think that there are two important differences: (1) the relative volumes of training data available and (2) a self-teaching mechanism that develops over time because of abundant training data.





              The original post exposes two questions. The title and body of the question ask why neural networks need "so many examples." Relative to a child's experience, neural networks trained using common image benchmarks have comparatively little data.



              I will rephrase the question in the title as something along the lines of



              "How does training a neural network for a common image benchmark compare & contrast to the learning experience of a child?"



              For the sake of comparison I'll consider the CIFAR-10 data because it is a common image benchmark. The labeled portion is composed of 10 classes of images with 6000 images per class. Each image is 32x32 pixels. If you somehow stacked the labeled images from CIFAR-10 and made a standard 48 fps video, you'd have about 20 minutes of footage.



              A child of 2 years who observes the world for 12 hours daily has roughly 526,000 minutes (more than 8,700 hours) of direct observations of the world, including feedback from adults (labels). (These are just ballpark figures; I don't know how many minutes a typical two-year-old has spent observing the world.) Moreover, the child will have exposure to many, many objects beyond the 10 classes that comprise CIFAR-10.
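These ballpark figures are easy to recompute. The sketch below uses only the assumptions stated above (60,000 labeled CIFAR-10 images played back at 48 fps, a two-year-old child) and treats the daily observation time as a free parameter, since the total scales linearly with it; the function and variable names are illustrative.

```python
CIFAR10_LABELED_IMAGES = 10 * 6000  # 10 classes, 6000 images each
FRAMES_PER_SECOND = 48              # playback rate assumed above

cifar_minutes = CIFAR10_LABELED_IMAGES / FRAMES_PER_SECOND / 60

def child_minutes(years, hours_per_day):
    """Minutes of observation, ignoring leap days."""
    return years * 365 * hours_per_day * 60

print(f"CIFAR-10 as video: {cifar_minutes:.1f} minutes")
for hours in (6, 12):
    total = child_minutes(2, hours)
    print(f"{hours} h/day for 2 years: {total:,} minutes "
          f"({total / cifar_minutes:,.0f}x CIFAR-10)")
```

Whichever daily figure one prefers, the child's exposure exceeds the benchmark by four orders of magnitude, which is the point of the comparison.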



              So there are a few things at play. One is that the child has exposure to more data overall and a more diverse source of data than the CIFAR-10 model has. Data diversity and data volume are well-recognized as pre-requisites for robust models in general. In this light, it doesn't seem surprising that a neural network is worse at this task than the child, because a neural network trained on CIFAR-10 is positively starved for training data compared to the two-year-old. The image resolution available to a child is better than the 32x32 CIFAR-10 images, so the child is able to learn information about the fine details of objects.



              The CIFAR-10 to two-year-old comparison is not perfect because the CIFAR-10 model will likely be trained with multiple passes over the same static images, while the child will see, using binocular vision, how objects are arranged in a three-dimensional world while moving about and with different lighting conditions and perspectives on the same objects.



              The anecdote about OP's child implies a second question about how neural networks can be self-teaching. I will rephrase this as



              "How can neural networks become self-teaching?"



              A child is endowed with some talent for self-teaching, so that new categories of objects can be added over time without having to start over from scratch.




              • OP's remark about transfer-learning names one kind of model adaptation in the machine learning context.


              • In comments, other users have pointed out that one- and few-shot learning* is another machine learning research area.


              • Additionally, reinforcement-learning addresses self-teaching models from a different perspective, essentially allowing robots to undertake trial-and-error experimentation to find optimal strategies for solving specific problems (e.g. playing chess).



              It's probably true that all three of these machine learning paradigms are germane to improving how machines adapt to new computer vision tasks. Quickly adapting machine learning models to new tasks is an active area of research. However, direct comparisons between the two will remain fraught, for two reasons. First, the practical goals of these projects (identify new instances of malware, recognize imposters in passport photos, index the internet) and their criteria for success differ from the goals of a child learning about the world. Second, one is done in a computer using math, and the other is done in organic material using chemistry.





              As an aside, it would be interesting to study how to flip the CIFAR-10 problem around and train a neural network to recognize 6000 objects from 10 examples of each. But even this wouldn't be a fair comparison to a 2-year-old, because there would still be a large discrepancy in the total volume, diversity and resolution of the training data.



              *We don't presently have tags for one-shot learning or few-shot learning.

















              $endgroup$









              • 33




                $begingroup$
                To make it a bit more specific, a human child has already had years of training with tens of thousands of examples, allowing them to determine how objects look when viewed from different angles, how to identify their boundaries, the relationship between apparent size and actual size, and so on.
                $endgroup$
                – David Schwartz
                Feb 25 at 2:25






              • 24




                $begingroup$
                A child's brain is active inside the womb. The baby can identify their parents by sound, after the sound is filtered through water. A new-born baby has had months of data to work with before they're born, but they still need years more before they can form a word, then a couple more years before they can form a sentence, then a couple more for a grammatically correct sentence, etc... learning is very complicated.
                $endgroup$
                – Nelson
                Feb 25 at 4:52








              • 5




                $begingroup$
                @EelcoHoogendoorn it explains the contrast 'child' versus 'neural network' that has been used in the question. The answer is that this is only an apparent contrast. Neural networks do not need that many examples at all, as kids also get many examples (just in a different way) before they are able to recognize cars.
                $endgroup$
                – Martijn Weterings
                Feb 26 at 13:16








              • 3




                $begingroup$
                @Nelson, I am not sure what the reason is for your comment, but you can change 'years' into 'year'. At 1 year kids speak words, at 2 years the first sentences are spoken, and at 3 years grammar, such as past tense and pronouns, is used correctly.
                $endgroup$
                – Martijn Weterings
                Feb 26 at 13:23








              • 1




                $begingroup$
                @EelcoHoogendoorn I think the premise of the question is a case of reasoning from a faulty analogy, so directly addressing the analogy is responsive. Contrasting biological and artificial neural networks is also responsive, because the answer would outline how biological and artificial neural networks are most similar in their name (both contain the phrase "neural networks") but not similar in their essential characteristics, or at least the characteristics assumed by the question.
                $endgroup$
                – Sycorax
                Feb 26 at 18:27


















              $begingroup$

              I caution against expecting strong resemblance between biological and artificial neural networks. I think the name "neural networks" is a bit dangerous, because it tricks people into expecting that neurological processes and machine learning should be the same. The differences between biological and artificial neural networks outweigh the similarities.



              As an example of how this can go awry, you can also turn the reasoning in the original post on its head. You can train a neural network to learn to recognize cars in an afternoon, provided you have a reasonably fast computer and some amount of training data. You can make this a binary task (car/not car) or a multi-class task (car/tram/bike/airplane/boat) and still be confident in a high level of success.



              By contrast, I wouldn't expect a child to be able to pick out a car the day - or even the week - after it's born, even after it has seen "so many training examples." Something is obviously different between a two-year-old and an infant that accounts for its learning ability at two years that is not present at birth, while a neural network is perfectly capable of picking up object classification "immediately after birth." I think that there are two important differences: (1) the relative volumes of training data available and (2) a self-teaching mechanism that develops over time because of abundant training data.





              The original post exposes two questions. The title and body of the question ask why neural networks need "so many examples." Relative to a child's experience, neural networks trained using common image benchmarks have comparatively little data.



              I will rephrase the question in the title as something along the lines of



              "How does training a neural network for a common image benchmark compare & contrast to the learning experience of a child?"



              For the sake of comparison I'll consider the CIFAR-10 data because it is a common image benchmark. The labeled portion is composed of 10 classes of images with 6000 images per class. Each image is 32x32 pixels. If you somehow stacked the labeled images from CIFAR-10 and made a standard 48 fps video, you'd have about 20 minutes of footage.



              A child of 2 years who observes the world for 12 hours daily has roughly 526000 minutes (more than 8700 hours) of direct observations of the world, including feedback from adults (labels). (These are just ballpark figures -- I don't know how many minutes a typical two-year-old has spent observing the world.) Moreover, the child will have exposure to many, many objects beyond the 10 classes that comprise CIFAR-10.
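
              The ballpark comparison can be reproduced with a few lines of arithmetic. This is only a sketch: the frame rate, the 12 hours of observation per day, and the 365.25-day year are assumptions taken from (or added to) the text, and the function names are made up for illustration. Totals are printed for both one and two years so the sensitivity to that assumption is visible.

```python
# Back-of-the-envelope comparison of data volume (all inputs are assumptions).
CIFAR10_IMAGES = 10 * 6000      # 10 classes x 6000 labeled images per class
FRAMES_PER_SECOND = 48          # frame rate used in the video analogy
HOURS_AWAKE_PER_DAY = 12        # assumed daily observation time
DAYS_PER_YEAR = 365.25          # assumed average year length


def cifar10_minutes(images=CIFAR10_IMAGES, fps=FRAMES_PER_SECOND):
    """Minutes of video if each image were shown for exactly one frame."""
    return images / fps / 60


def child_minutes(years, hours_per_day=HOURS_AWAKE_PER_DAY):
    """Minutes of waking observation accumulated over `years`."""
    return years * DAYS_PER_YEAR * hours_per_day * 60


if __name__ == "__main__":
    video = cifar10_minutes()
    print(f"CIFAR-10 as video: {video:.1f} minutes")
    for years in (1, 2):
        observed = child_minutes(years)
        print(f"{years} year(s) at {HOURS_AWAKE_PER_DAY} h/day: "
              f"{observed:,.0f} minutes ({observed / video:,.0f}x CIFAR-10)")
```

              Whichever assumption is used, the child's observation time exceeds the CIFAR-10 "footage" by roughly four orders of magnitude.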



              So there are a few things at play. One is that the child has exposure to more data overall and a more diverse source of data than the CIFAR-10 model has. Data diversity and data volume are well recognized as prerequisites for robust models in general. In this light, it doesn't seem surprising that a neural network is worse at this task than the child, because a neural network trained on CIFAR-10 is positively starved for training data compared to the two-year-old. The image resolution available to a child is also better than the 32x32 CIFAR-10 images, so the child is able to learn information about the fine details of objects.



              The CIFAR-10 to two-year-old comparison is not perfect because the CIFAR-10 model will likely be trained with multiple passes over the same static images, while the child will see, using binocular vision, how objects are arranged in a three-dimensional world while moving about and with different lighting conditions and perspectives on the same objects.



              The anecdote about OP's child implies a second question about how neural networks can be self-teaching. I will rephrase this as



              "How can neural networks become self-teaching?"



              A child is endowed with some talent for self-teaching, so that new categories of objects can be added over time without having to start over from scratch.




              • OP's remark about transfer learning names one kind of model adaptation in the machine learning context.


              • In comments, other users have pointed out that one- and few-shot learning* is another machine learning research area.


              • Additionally, reinforcement learning addresses self-teaching models from a different perspective, essentially allowing robots to undertake trial-and-error experimentation to find optimal strategies for solving specific problems (e.g. playing chess).



              It's probably true that all three of these machine learning paradigms are germane to improving how machines adapt to new computer vision tasks. Quickly adapting machine learning models to new tasks is an active area of research. However, the practical goals of these projects (identifying new instances of malware, recognizing impostors in passport photos, indexing the internet) and their criteria for success differ from the goals of a child learning about the world, and one is done in a computer using math while the other is done in organic material using chemistry, so direct comparisons between the two will remain fraught.
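
              As a toy illustration of adding a new category from a handful of examples without starting over, here is a minimal nearest-prototype classifier. It is a sketch under strong assumptions: a fixed feature extractor is assumed to exist already (the features below are hand-made 2-D points standing in for, say, the output of a pretrained network), and `PrototypeClassifier` with its methods is a hypothetical name, not part of any library or of the research mentioned above.

```python
import math


class PrototypeClassifier:
    """Toy few-shot classifier: one mean feature vector (prototype) per class."""

    def __init__(self):
        self.prototypes = {}  # label -> mean feature vector

    def add_class(self, label, examples):
        """Register a class from a few feature vectors; other classes are untouched."""
        dim = len(examples[0])
        self.prototypes[label] = [
            sum(x[i] for x in examples) / len(examples) for i in range(dim)
        ]

    def predict(self, features):
        """Return the label whose prototype is closest in Euclidean distance."""
        return min(
            self.prototypes,
            key=lambda label: math.dist(features, self.prototypes[label]),
        )


if __name__ == "__main__":
    clf = PrototypeClassifier()
    clf.add_class("car", [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1]])
    clf.add_class("boat", [[5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
    print(clf.predict([1.1, 0.9]))  # query near the "car" examples

    # A new category arrives later with only three examples.
    clf.add_class("tram", [[9.0, 1.0], [9.1, 1.2], [8.9, 0.8]])
    print(clf.predict([9.0, 1.1]))  # query near the "tram" examples
    print(clf.predict([1.1, 0.9]))  # earlier classes are still recognized
```

              All of the hard work is hidden in the assumed feature extractor, which is where the large volume of earlier data would have been spent.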





              As an aside, it would be interesting to study how to flip the CIFAR-10 problem around and train a neural network to recognize 6000 objects from 10 examples of each. But even this wouldn't be a fair comparison to a 2-year-old, because there would still be a large discrepancy in the total volume, diversity and resolution of the training data.



              *We don't presently have tags for one-shot learning or few-shot learning.

















              $endgroup$











              edited yesterday

























              answered Feb 24 at 15:44









              Sycorax

              41.2k reputation (badges: 12 gold, 104 silver, 205 bronze)








              • 33




                $begingroup$
                To make it a bit more specific, a human child has already had years of training with tens of thousands of examples, allowing them to determine how objects look when viewed from different angles, how to identify their boundaries, the relationship between apparent size and actual size, and so on.
                $endgroup$
                – David Schwartz
                Feb 25 at 2:25






              • 24




                $begingroup$
                A child's brain is active inside the womb. The baby can identify their parents by sound, after the sound is filtered through water. A newborn baby has had months of data to work with before birth, but they still need years more before they can form a word, then a couple more years before they can form a sentence, then a couple more for a grammatically correct sentence, etc. Learning is very complicated.
                $endgroup$
                – Nelson
                Feb 25 at 4:52








              • 5




                $begingroup$
                @EelcoHoogendoorn it explains the contrast 'child' versus 'neural network' that has been used in the question. The answer is that this is only an apparent contrast. Neural networks do not need that many examples at all, as kids also get many examples (just in a different way) before they are able to recognize cars.
                $endgroup$
                – Martijn Weterings
                Feb 26 at 13:16








              • 3




                $begingroup$
                @Nelson, I am not sure what the reason is for your comment, but you can change 'years' into 'year'. At 1 year kids speak words, at 2 years the first sentences are spoken, and at 3 years grammar, such as past tense and pronouns, is used correctly.
                $endgroup$
                – Martijn Weterings
                Feb 26 at 13:23








              • 1




                $begingroup$
                @EelcoHoogendoorn I think the premise of the question is a case of reasoning from a faulty analogy, so directly addressing the analogy is responsive. Contrasting biological and artificial neural networks is also responsive, because the answer would outline how biological and artificial neural networks are most similar in their name (both contain the phrase "neural networks") but not similar in their essential characteristics, or at least the characteristics assumed by the question.
                $endgroup$
                – Sycorax
                Feb 26 at 18:27































              49












              $begingroup$

              First of all, at age two, a child knows a lot about the world and actively applies this knowledge. A child does a lot of "transfer learning" by applying this knowledge to new concepts.



              Second, before seeing those five "labeled" examples of cars, a child sees a lot of cars on the street, on TV, as toy cars, etc., so a lot of "unsupervised learning" also happens beforehand.



              Finally, neural networks have almost nothing in common with the human brain, so there's not much point in comparing them. Also notice that there are algorithms for one-shot learning, and a fair amount of research on it is currently under way.

















              $endgroup$

















              edited Feb 27 at 14:31









              Peter Mortensen

              20128














              answered Feb 24 at 15:19









              Tim

              58.6k reputation (badges: 9 gold, 128 silver, 221 bronze)








              • 8




                $begingroup$
                4th point, a child also has more than 100 million years of evolutionary selection towards learning efficiently/accurately.
                $endgroup$
                – csiz
                Feb 27 at 3:39

























              38












              $begingroup$

              One major aspect that I don't see in current answers is evolution.



              A child's brain does not learn from scratch. It's similar to asking how deer and giraffe babies can walk a few minutes after birth: they can because they are born with their brains already wired for this task. There is some fine-tuning needed of course, but the baby deer doesn't learn to walk from "random initialization".



              Similarly, the fact that big moving objects exist and are important to keep track of is something we are born with.



              So I think the presupposition of this question is simply false. Human neural networks had the opportunity to see tons of moving, rotating 3D objects (maybe not cars) with difficult textures and shapes, but this happened over many generations, and the learning took place by evolutionary algorithms: individuals whose brains were better structured for this task had a higher chance of living to reproduce, leaving each next generation with better brain wiring from the start.















              $endgroup$









              • 8




                $begingroup$
                Fun aside: there's evidence that when it comes to discriminating between different models of cars, we actually leverage the specialized facial recognition center of our brain. It's plausible that, while a child may not distinguish between different models, the implicit presence of a 'face' on a mobile object may cause cars to be categorized as a type of creature and therefore be favored to be identified by evolution, since recognizing mobile objects with faces is helpful to survival.
                $endgroup$
                – Dan Bryant
                Feb 25 at 16:46






              • 7




                $begingroup$
                This answer addresses exactly what I was thinking. Children are not born as blank slates. They come with features that make some patterns easier to recognize, some things easier to learn, etc.
                $endgroup$
                – Eff
                Feb 26 at 7:49






              • 1




                $begingroup$
                While animals that walk right out of the womb are indeed fascinating, such evolutionary hardwiring is thought to be at the very opposite extreme from human learning, which is thought to be the extreme of experience-driven learning in the natural world. Certainly cars will have had minimal impact on the evolution of our brains.
                $endgroup$
                – Eelco Hoogendoorn
                Feb 26 at 12:28






              • 5




                $begingroup$
                @EelcoHoogendoorn The ability to learn and understand the environment has been evolutionarily selected for. The brain has been set up by evolution to be extremely efficient at learning: the ability to connect the dots, see patterns, understand shapes and movement, make inferences, etc.
                $endgroup$
                – Eff
                Feb 26 at 13:58






              • 3




                $begingroup$
                This is a good point, but it's also true that as researchers come to understand this, they build NNs that have hard-coded structures that facilitate certain types of learning. Consider that a convolutional NN has hard-coded receptive fields that greatly speed up learning / enhance performance on visual tasks. Those fields could be learned from scratch in a fully connected network, but it's much harder. @EelcoHoogendoorn, human brains are full of structure that facilitates learning.
                $endgroup$
                – gung
                Feb 26 at 16:26
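
              One way to make the point about hard-coded receptive fields concrete is to count parameters. The sketch below compares a single convolutional layer with a fully connected layer that produces the same number of outputs on a CIFAR-10-sized image. The layer sizes (16 filters, 3x3 kernel, stride 1 with "same" padding) are assumptions chosen for illustration, and the helper names are hypothetical.

```python
def conv_params(kernel, in_channels, out_channels):
    """Weights plus biases of a 2-D convolution with a square kernel."""
    return kernel * kernel * in_channels * out_channels + out_channels


def dense_params(n_inputs, n_outputs):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_outputs + n_outputs


if __name__ == "__main__":
    height = width = 32   # CIFAR-10-sized image
    channels = 3          # RGB input
    filters = 16          # assumed number of output feature maps
    kernel = 3            # assumed 3x3 receptive field

    conv = conv_params(kernel, channels, filters)
    # Dense layer with one output per pixel per filter, i.e. the same output
    # size as the convolution under stride 1 and "same" padding.
    dense = dense_params(height * width * channels, height * width * filters)
    print(f"conv layer:  {conv:,} parameters")
    print(f"dense layer: {dense:,} parameters")
```

              The convolution needs a few hundred parameters where the dense layer needs tens of millions, which is one reason the built-in structure makes learning from limited data easier.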
















              38












              $begingroup$

              One major aspect that I don't see in current answers is evolution.



              A child's brain does not learn from scratch. It's similar to asking how deer and giraffe babies can walk a few minutes after birth. Because they are born with their brains already wired for this task. There is some fine-tuning needed of course, but the baby deer doesn't learn to walk from "random initialization".



              Similarly, the fact that big moving objects exist and are important to keep track of is something we are born with.



              So I think the presupposition of this question is simply false. Human neural networks had the opportunity to see tons of - maybe not cars but - moving, rotating 3D objects with difficult textures and shapes etc., but this happened through lots of generations and the learning took place by evolutionary algorithms, i.e. the ones whose brain was better structured for this task, could live to reproduce with higher chance, leaving the next generation with better and better brain wiring from the start.






              share|cite|improve this answer









              $endgroup$









              • 8




                $begingroup$
                Fun aside: there's evidence that when it comes to discriminating between different models of cars, we actually leverage the specialized facial recognition center of our brain. It's plausible that, while a child may not distinguish between different models, the implicit presence of a 'face' on a mobile object may cause cars to be categorized as a type of creature and therefore be favored to be identified by evolution, since recognizing mobile objects with faces is helpful to survival.
                $endgroup$
                – Dan Bryant
                Feb 25 at 16:46






              • 7




                $begingroup$
                This answer addresses exactly what I was thinking. Children are not born as blank slates. They come with features that make some patterns easier to recognize, some things easier to learn, etc.
                $endgroup$
                – Eff
                Feb 26 at 7:49






              • 1




                $begingroup$
                While animals that walk right out of the womb are indeed fascinating, such evolutionary hardwiring is thought to be at the very opposite extreme of human learning, which is thought to be the extreme of experience-driven learning in the natural world. Certainly cars will have left minimal evolutionary impact on the evolution of our brains.
                $endgroup$
                – Eelco Hoogendoorn
                Feb 26 at 12:28






              • 5




                $begingroup$
                @EelcoHoogendoorn The ability to learn and understand the environment has been evolutionarily selected for. The brain has been set up by evolution to be extremely efficient at learning: the ability to connect the dots, see patterns, understand shapes and movement, make inferences, etc.
                $endgroup$
                – Eff
                Feb 26 at 13:58






              • 3




                $begingroup$
                This is a good point, but it's also true that as researchers come to understand this, they build NNs that have hard-coded structures that facilitate certain types of learning. Consider that a convolutional NN has hard-coded receptive fields that greatly speed up learning / enhance performance on visual tasks. Those fields could be learned from scratch in a fully connected network, but it's much harder. @EelcoHoogendoorn, human brains are full of structure that facilitates learning.
                $endgroup$
                – gung
                Feb 26 at 16:26
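To put a rough number on the comment above about hard-coded receptive fields, one can count trainable parameters. This is only a back-of-the-envelope sketch: the image size, kernel size, and number of feature maps are hypothetical choices, and the function names are made up for illustration.

```python
def dense_params(in_h, in_w, in_c, out_units):
    """Weights plus biases for a fully connected layer on a flattened image."""
    return in_h * in_w * in_c * out_units + out_units


def conv_params(kernel, in_c, out_c):
    """Weights plus biases for a square-kernel convolution layer."""
    return kernel * kernel * in_c * out_c + out_c


# Hypothetical setup: a 32x32 RGB image mapped to 64 feature maps
# of the same spatial size.
h, w, c = 32, 32, 3
conv = conv_params(kernel=3, in_c=c, out_c=64)
dense = dense_params(h, w, c, out_units=h * w * 64)

print(conv)           # 1792
print(dense)          # 201392128
print(dense // conv)  # 112384
```

Under these assumptions the fully connected layer has about five orders of magnitude more parameters to fit, which is one way to see why the built-in structure reduces the amount of data needed.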
























              answered Feb 25 at 12:51









              isarandi



















              21












              $begingroup$

              I don't know much about neural networks but I know a fair bit about babies.



              Many 2-year-olds have a lot of trouble with how general words should be. For instance, it is quite common at that age for kids to use "dog" for any four-legged animal. That's a more difficult distinction than "car": just think how different a poodle looks from a Great Dane, for instance, and yet they are both "dog" while a cat is not.



              And a child at 2 has seen many, many more than 5 examples of "car". A kid sees dozens or even hundreds of cars any time the family goes for a drive, and a lot of parents will say "look at the car" far more than 5 times. But kids can also think in ways that they weren't told about. For instance, on the street the kid sees lots of things lined up. His dad says (of one) "look at the shiny car!" and the kid thinks "maybe all those other things lined up are also cars?"















              $endgroup$









              • 2




                $begingroup$
                Other examples: Taxis, driving lesson cars, and police cars are the same. Whenever a car is red, it is a firetruck. Campervans are ambulances. A lorry with a loader crane becomes classified as an excavator. The bus that just passed by goes to the train station, so the next bus, which looks the same, must also be going to the train station. And seeing the moon during broad daylight is a very special event.
                $endgroup$
                – Martijn Weterings
                Feb 26 at 13:40


























              answered Feb 25 at 12:00









              Peter Flom



















              10












              $begingroup$

              This is a fascinating question that I've also pondered a lot, and I can come up with a few possible explanations.




              • Neural networks work nothing like the brain. Backpropagation is specific to artificial neural networks and is not known to happen in the brain. In that sense, we just don't know the general learning algorithm in our brains. It could be electrical, it could be chemical, it could even be a combination of the two. Neural networks could be considered an inferior form of learning compared to our brains because of how simplified they are.

              • If neural networks are indeed like our brain, then human babies undergo extensive "training" of the early layers, like feature extraction, in their early days. So their neural networks aren't really trained from scratch, but rather the last layer is retrained to add more and more classes and labels.
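The second bullet describes what is usually called transfer learning: the early layers stay fixed and only the final readout changes when a new class is added. Below is a minimal sketch of that idea. Everything in it is a hypothetical stand-in: `frozen_features` plays the role of the pretrained early layers, and the "last layer" is a simple nearest-class-mean readout rather than a gradient-trained layer.

```python
import math


def frozen_features(image):
    """Stand-in for pretrained early layers: a fixed mapping that is never updated.

    `image` is a hypothetical list of pixel intensities in [0, 1].
    """
    mean = sum(image) / len(image)
    contrast = max(image) - min(image)
    return (mean, contrast)


class LastLayer:
    """Readout that stores one prototype (mean feature vector) per class."""

    def __init__(self):
        self.prototypes = {}

    def add_class(self, label, examples):
        # Adding a class only touches this readout, never the feature extractor.
        feats = [frozen_features(x) for x in examples]
        n = len(feats)
        self.prototypes[label] = tuple(sum(f[i] for f in feats) / n for i in range(2))

    def predict(self, image):
        f = frozen_features(image)
        return min(self.prototypes, key=lambda lab: math.dist(f, self.prototypes[lab]))


readout = LastLayer()
# Two labelled examples per class are enough here because the features already separate them.
readout.add_class("dark", [[0.1, 0.2, 0.1, 0.0], [0.0, 0.1, 0.2, 0.2]])
readout.add_class("bright", [[0.9, 0.8, 1.0, 0.9], [0.8, 0.9, 0.9, 1.0]])
readout.add_class("striped", [[0.0, 1.0, 0.0, 1.0], [1.0, 0.1, 0.9, 0.0]])

print(readout.predict([0.2, 0.1, 0.1, 0.2]))  # dark
print(readout.predict([0.1, 0.9, 0.0, 1.0]))  # striped
```

The toy only works with so few examples because the fixed features are already useful for the task, which is the point the bullet makes about babies.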












              $endgroup$



























                  answered Feb 25 at 4:22









                  sd2017



























                      9












                      $begingroup$


                      A human child at age 2 needs around 5 instances of a car to be able to identify it with reasonable accuracy regardless of color, make, etc.




                      The concept of "instances" gets easily muddied. While a child may have seen 5 unique instances of a car, they have actually seen thousands upon thousands of frames, in many differing environments. They have likely seen cars in other contexts. They also have an intuition for the physical world developed over their lifetime, so some transfer learning probably happens here. Yet we wrap all of that up into "5 instances."



                      Meanwhile, every single frame/image you pass to a CNN is considered an "example." If you apply a consistent definition, the two systems are using much more comparable amounts of training data.
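As a rough illustration of how quickly "5 instances" turns into many frames, here is a small calculation. The viewing time per car and the frame rate are made-up assumptions (human vision is not literally frame-based), and `frames_seen` is a hypothetical helper.

```python
def frames_seen(instances, seconds_per_instance, frames_per_second):
    """Count the individual frames contained in a number of viewing episodes."""
    return instances * seconds_per_instance * frames_per_second


# All three numbers are illustrative assumptions, not measurements.
child_frames = frames_seen(instances=5, seconds_per_instance=10, frames_per_second=30)
print(child_frames)  # 1500
```

Even with only ten seconds per car, five "instances" already correspond to more than a thousand image-like examples by the CNN's definition.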



                      Also, I would like to note that convolutional neural networks (CNNs) are more useful in computer vision than plain fully connected ANNs, and in fact approach human performance in tasks like image classification. Deep learning is (probably) not a panacea, but it does perform admirably in this domain.















                      $endgroup$




























                          answered Feb 25 at 22:40









                          spinodal























                              5












                              $begingroup$

                              As pointed out by others, the data-efficiency of artificial neural networks varies quite substantially, depending on the details. As a matter of fact, there are many so-called one-shot learning methods that can solve the task of labelling trams with quite good accuracy, using only a single labelled sample.



                              One way to do this is by so-called transfer learning: a network trained on other labels is usually very effectively adaptable to new labels, since the hard work is breaking down the low-level components of the image in a sensible way.
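To make the idea concrete, here is a minimal sketch of one-shot labelling on top of a frozen feature extractor. It is only an illustration under stated assumptions: `embed`, `one_shot_classify` and the `projection` matrix are hypothetical names, and the random projection merely stands in for the lower layers of a network that would, in practice, have been trained on other labels.

```python
import numpy as np

def embed(images, projection):
    """Stand-in for a frozen, pretrained feature extractor (hypothetical).

    A real system would reuse the lower layers of a trained network here;
    this sketch only applies a fixed linear map followed by a ReLU.
    """
    flat = images.reshape(len(images), -1)
    features = np.maximum(flat @ projection, 0.0)
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    return features / np.maximum(norms, 1e-12)  # unit length, avoids divide-by-zero

def one_shot_classify(query_images, support_images, support_labels, projection):
    """Give each query the label of the most similar support example.

    `support_images` holds a single labelled example per class, so no
    weights are updated: the frozen features do all the work.
    """
    queries = embed(query_images, projection)
    support = embed(support_images, projection)
    similarity = queries @ support.T  # cosine similarity between unit vectors
    return support_labels[np.argmax(similarity, axis=1)]
```

With one labelled image per class in `support_images`, new images are labelled by comparing them in feature space; how well this works depends entirely on how good the frozen features are.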



                              But we do not in fact need such labelled data to perform such a task, much like babies don't need nearly as much labelled data as the neural networks you are thinking of do.



                              For instance, one such unsupervised method, which I have also successfully applied in other contexts, is to take an unlabelled set of images, randomly rotate them, and train a network to predict which side of the image is 'up'. Without knowing what the visible objects are, or what they are called, this forces the network to learn a tremendous amount of structure about the images; and this can form an excellent basis for much more data-efficient subsequent labelled learning.
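The data-preparation half of this rotation trick can be sketched in a few lines. This is a sketch, not the exact method used above: the helper name `make_rotation_task` is hypothetical, the images are assumed to be square 2-D arrays, and rotations are restricted to quarter turns so that the "label" is simply the number of turns applied.

```python
import numpy as np

def make_rotation_task(images, rng):
    """Turn unlabelled square images into a 4-way classification task.

    Returns the rotated images and, as free labels, the number of
    counter-clockwise quarter turns (0 to 3) applied to each one.
    """
    turns = rng.integers(0, 4, size=len(images))
    rotated = np.stack(
        [np.rot90(image, k=int(k)) for image, k in zip(images, turns)]
    )
    return rotated, turns
```

Any classifier can then be trained on `(rotated, turns)` without a single human-provided label, and its learned features reused for the labelled task afterwards.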



                              While it is true that artificial networks are quite different from real ones in probably meaningful ways, such as the absence in real brains of an obvious analogue of backpropagation, it is very probably true that real neural networks make use of the same tricks: trying to learn the structure in the data implied by some simple priors.



                              Another example, which almost certainly plays a role in animals and has also shown great promise in understanding video, is the assumption that the future should be predictable from the past. Just by starting from that assumption, you can teach a neural network a whole lot. On a philosophical level, I am inclined to believe that this assumption underlies almost everything that we consider to be 'knowledge'.
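The same idea in miniature: an unlabelled sequence supplies its own training targets, because each value can be predicted from the values before it. The sketch below is an assumption-laden toy, not a video model. The helper names are hypothetical, and a least-squares linear predictor stands in for the neural network.

```python
import numpy as np

def make_prediction_pairs(sequence, context):
    """Slice an unlabelled 1-D sequence into (past window, next value) pairs."""
    past = np.stack(
        [sequence[i:i + context] for i in range(len(sequence) - context)]
    )
    future = sequence[context:]
    return past, future

def fit_linear_predictor(past, future):
    """Least-squares weights mapping each past window to the value that follows."""
    weights, *_ = np.linalg.lstsq(past, future, rcond=None)
    return weights
```

For example, a sampled sine wave obeys a two-term linear recurrence, so a predictor with `context=2` can recover its dynamics from the raw signal alone.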



                              I am not saying anything new here; but it is relatively new in the sense that these possibilities are too young to have found many applications yet, and have not yet percolated down to the textbook understanding of 'what an ANN can do'. So to answer the OP's question: ANNs have already closed much of the gap that you describe.















                              $endgroup$




























                                  answered Feb 25 at 10:42









                                  Eelco Hoogendoorn























                                      4












                                      $begingroup$

                                      One way to train a deep neural network is to treat it as a stack of auto-encoders (or of the closely related Restricted Boltzmann Machines).



                                      In theory, an auto-encoder learns in an unsupervised manner: It takes arbitrary, unlabelled input data and processes it to generate output data. Then it takes that output data, and tries to regenerate its input data. It tweaks its nodes' parameters until it can come close to round-tripping its data. If you think about it, the auto-encoder is writing its own automated unit tests. In effect, it is turning its "unlabelled input data" into labelled data: The original data serves as a label for the round-tripped data.
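A minimal sketch of that round trip, assuming a single linear auto-encoder trained by plain gradient descent. The function name `train_autoencoder` and its hyperparameters are illustrative choices, not part of any particular library, and a real stack would use several nonlinear layers.

```python
import numpy as np

def train_autoencoder(data, hidden_size, steps=500, lr=0.01, seed=0):
    """Train a one-layer linear auto-encoder on unlabelled data.

    The input doubles as the target, so no external labels are needed.
    Returns the encoder, the decoder and the reconstruction loss per step.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = data.shape
    encoder = rng.normal(scale=0.1, size=(n_features, hidden_size))
    decoder = rng.normal(scale=0.1, size=(hidden_size, n_features))
    losses = []
    for _ in range(steps):
        code = data @ encoder            # compress
        reconstruction = code @ decoder  # try to regenerate the input
        error = reconstruction - data    # the original data acts as the label
        losses.append(float(np.mean(error ** 2)))
        # Gradients of the squared reconstruction error, up to a constant factor.
        grad_decoder = code.T @ error / n_samples
        grad_encoder = data.T @ (error @ decoder.T) / n_samples
        encoder -= lr * grad_encoder
        decoder -= lr * grad_decoder
    return encoder, decoder, losses
```

Calling `train_autoencoder(data, hidden_size=3)` on any unlabelled matrix of samples yields an encoder whose output could then be fed to a small supervised layer for the fine-tuning step.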



                                      After the layers of auto-encoders are trained, the neural network is fine-tuned using labelled data to perform its intended function. In effect, these are functional tests.



                                      The original poster asks why a lot of data is needed to train an artificial neural network, and compares that to the allegedly low amount of training data needed by a two-year-old human. The original poster is comparing apples-to-oranges: The overall training process for the artificial neural net, versus the fine-tuning with labels for the two-year-old.



                                      But in reality, the two-year-old has been training its auto-encoders on random, self-labelled data for more than two years. Babies dream when they are in utero. (So do kittens.) Researchers have described these dreams as involving random neuron firings in the visual processing centers.












                                      $endgroup$









                                      • 1




                                        $begingroup$
                                        Agreed, except that auto-encoders in practice are not very powerful tools for doing much unsupervised learning at all; everything we know points to there being more going on, so the phrasing 'the two-year-old has been training its auto-encoders' should not be taken too literally, I suppose.
                                        $endgroup$
                                        – Eelco Hoogendoorn
                                        Feb 26 at 12:34
























                                      edited Feb 24 at 21:02






























                                      answered Feb 24 at 20:54









                                      Jasper















                                      4












                                      $begingroup$

                                      We don't learn to "see cars" until we learn to see



                                      It takes quite a long time and lots of examples for a child to learn how to see objects as such. After that, a child can learn to identify a particular type of object from just a few examples. If you compare a two-year-old child with a learning system that literally starts from a blank slate, it's an apples and oranges comparison; at that age a child has seen thousands of hours of "video footage".



                                      In a similar manner, it takes artificial neural networks a lot of examples to learn "how to see", but after that it's possible to transfer that knowledge to new examples. Transfer learning is a whole domain of machine learning, and things like "one-shot learning" are possible: you can build ANNs that will learn to identify new types of objects that they haven't seen before from a single example, or to identify a particular person from a single photo of their face. But doing this initial "learning to see" part well requires quite a lot of data.



                                      Furthermore, there's some evidence that not all training data is equal, namely, that data which you "choose" while learning is more effective than data that's simply provided to you. See, e.g., the Held & Hein twin kitten experiment: https://www.lri.fr/~mbl/ENS/FONDIHM/2013/papers/about-HeldHein63.pdf

















                                      $endgroup$


























                                          edited Feb 26 at 2:29

























                                          answered Feb 26 at 2:24









Peteris























                                              4












                                              $begingroup$

One thing that I haven't seen in the answers so far is the fact that one 'instance' of a real-world object that is seen by a human child does not correspond to an instance in the context of NN training.



Suppose you're standing at a railway intersection with a 5-year-old child and watch 5 trains pass within 10 minutes. Now, you could say "My child only saw 5 trains and can reliably identify other trains while a NN needs thousands of images!". While this is likely true, you are completely ignoring the fact that every train your child sees contains A LOT more information than a single image of a train. In fact, the brain of your child is processing several dozen images of the train per second while it is passing by, each from a slightly different angle, with different shadows, etc., while a single image will provide the NN with very limited information.
In this context, your child even has information that is not available to the NN, for example the speed of the train or the sound that the train makes.
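A rough count shows how lopsided the comparison is. This is only an illustration: the 24 frames per second stands in for "several dozen images per second", and the 30 seconds that each train stays in view is an assumed value, not a measurement.

```python
def frames_seen(n_objects, seconds_in_view, frames_per_second):
    """Number of distinct views a continuous observer gets of n_objects."""
    return n_objects * seconds_in_view * frames_per_second

# Assumed values, for illustration only.
views = frames_seen(n_objects=5, seconds_in_view=30, frames_per_second=24)
print(views)  # 3600 views from "only 5 trains"
```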



Further, your child can talk and ASK QUESTIONS! "Trains are very long, right?" "Yes.", "And they are very big too, right?" "Yes.". With two simple questions your child learns two very essential features in less than a minute!



Another important point is object detection. Your child is able to identify immediately which object, i.e. which part of the image, it needs to focus on, while a NN must learn to detect the relevant object before it can attempt to classify it.

















                                              $endgroup$

















                                              edited 2 days ago

























                                              answered 2 days ago









bi_scholar








                                              • 3




                                                $begingroup$
I would also add that the child has context: it sees a train on the rails, be it at a station, a level crossing, etc. If it sees a huge (zeppelin-sized) balloon in the sky, shaped and painted to look like a train, it won't say it's a train. It will say it looks like a train, but it won't attach the label "train" to it. I'm skeptical that a NN will return the label "train-looking balloon" in this case. Similarly, a child won't mistake a billboard with a train on it for an actual train. A picture of a picture of a train is a picture of a train to a NN – it will return the label "train".
                                                $endgroup$
                                                – corey979
                                                2 days ago

























                                              3












                                              $begingroup$

I would argue the performance is not as different as you might expect, but you ask a great question (see the last paragraph).



As you mention transfer learning: to compare apples with apples, we have to look at how many pictures in total, and how many pictures of the class of interest, a human / neural net "sees".



                                              1. How many pictures does a human look at?



A human eye movement takes around 200 ms, and each one could be seen as a kind of "biological photo". See the talk by computer vision expert Fei-Fei Li: https://www.ted.com/talks/fei_fei_li_how_we_re_teaching_computers_to_understand_pictures#t-362785.



                                              She adds:




                                              So by age 3 a child would have seen hundreds of millions of pictures.




In ImageNet, the leading database for object detection, there are ~14 million labeled pictures. At 5 pictures per second (one per 200 ms), a neural network trained on ImageNet would have seen as many pictures as a 14,000,000 / 5 / 60 / 60 / 24 × 2 ≈ 65-day-old baby, so about two months old (assuming the baby is awake half of her life).
To be fair, it's hard to tell how many of these pictures are labeled. Moreover, the pictures a baby sees are not as diverse as those in ImageNet. (The baby probably sees her mother half of the time ... ;)
However, I think it's fair to say that your son will have seen hundreds of millions of pictures (and then applies transfer learning).
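The back-of-the-envelope estimate above can be reproduced in a few lines. This is only a sketch of the same arithmetic: the function name `days_to_see` and its parameters are made up for illustration, and the 5 pictures per second and the half-awake baby are the assumptions already stated above.

```python
SECONDS_PER_DAY = 60 * 60 * 24

def days_to_see(n_pictures, pictures_per_second=5.0, awake_fraction=0.5):
    """Calendar days needed to see n_pictures at the given viewing rate."""
    viewing_seconds = n_pictures / pictures_per_second
    viewing_days = viewing_seconds / SECONDS_PER_DAY
    # Only a fraction of each calendar day is spent awake and looking.
    return viewing_days / awake_fraction

imagenet_days = days_to_see(14_000_000)
print(f"{imagenet_days:.1f} days")  # 64.8 days, i.e. about two months
```

Changing `awake_fraction` or `pictures_per_second` shows how sensitive the estimate is to these assumptions; the order of magnitude (months, not years) stays the same for any plausible values.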



                                              So how many pictures do we need to learn a new category given a solid base of related pictures that can be (transfer) learned from?



The first blog post I found was this: https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html. They use 1000 examples per class. I could imagine that 2.5 years later far fewer are required.
However, a human can see 1000 pictures in 1000 / 5 / 60 ≈ 3.3 minutes.



                                              You wrote:




                                              A human child at age 2 needs around 5 instances of a car to be able to
                                              identify it with reasonable accuracy regardless of color, make, etc.




That would be equivalent to forty seconds per instance (200 seconds spread over 5 instances, with various angles of that object to make it comparable).
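The same conversion applies to the small-data case. As a rough sketch (the helper name `viewing_seconds` is hypothetical, and the rate of 5 pictures per second is again the 200 ms assumption from above):

```python
def viewing_seconds(n_pictures, pictures_per_second=5.0):
    """Seconds a human would need to see n_pictures at the assumed rate."""
    return n_pictures / pictures_per_second

total = viewing_seconds(1000)  # 200.0 seconds for 1000 pictures
print(total / 60)              # about 3.3 minutes
print(total / 5)               # 40.0 seconds per car instance
```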



To sum up:
As I mentioned, I had to make a few assumptions. But I think one can see that the performance is not as different as one might expect.



However, I believe you ask a great question, and here is why:



2. Would neural networks perform better/differently if they worked more like brains? (Geoffrey Hinton says yes.)



In an interview, https://www.wired.com/story/googles-ai-guru-computers-think-more-like-brains/, in late 2018, he compares current implementations of neural networks with the brain. He mentions that, in terms of weights, artificial neural networks are smaller than the brain by a factor of 10,000. Therefore, the brain needs far fewer training iterations to learn. To enable artificial neural networks to work more like our brains, he follows another trend in hardware: a UK-based startup called Graphcore. Its hardware reduces calculation time through a smart way of storing the weights of a neural network. Therefore, more weights can be used, and the training time of artificial neural networks might be reduced.
















                                              New contributor




                                              BigDataScientist is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                                              Check out our Code of Conduct.






                                              $endgroup$


















                                                3












                                                $begingroup$

                                                I would argue the performance is not that different as you might expect, but you ask a great question (see the last paragraph).



                                                As you mention transfer learning: To compare apples with apples we have to look how many pictures in total and how many pictures of the class of interest a human / neural net "sees".



                                                1. How many pictures does a human look at?



                                                Human´s eye movement takes around 200ms which could be seen as kind of an "biological photo". See the talk by computer vision expert Fei-Fei Li: https://www.ted.com/talks/fei_fei_li_how_we_re_teaching_computers_to_understand_pictures#t-362785.



                                                She adds:




                                                So by age 3 a child would have seen hundreds of millions of pictures.




                                                In ImageNet, the leading database for object detection, there are ~14million labeled pictures. So a neural network being trained on ImageNet would have seen as many pictures as a 14000000/5/60/60/24*2 ~ 64 days old baby, so two months old (assuming the baby is awake half of her life).
                                                To be fair its hard to tell how many of this pictures are labeled. Moreover, the pictures, a baby sees, are not that diverse like in ImageNet. (Probably the baby sees her mother have of the time,... ;).
                                                However, i think its fair to say that your son will have seen hundreds of millions of pictures (and then applies transfer learning).



                                                So how many pictures do we need to learn a new category given a solid base of related pictures that can be (transfer) learned from?



                                                First blog post i found was this: https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html. They use 1000 examples per class. I could imagine 2.5 years later even way less is required.
                                                However, 1000 pictures can be seen by a human in 1000/5/60 in 3.3 minutes.



                                                You wrote:




                                                A human child at age 2 needs around 5 instances of a car to be able to
                                                identify it with reasonable accuracy regardless of color, make, etc.




                                                That would be equivilant to forty seconds per instance (with various angles of that object to make it comparable).



                                                To sum up:
                                                As i mentioned, I had to make a few assumptions. But i think, one can see that the performance is not that different as one might expect.



                                                However, i believe you ask a great question and here is why:



                                                2. Would neural network perform better/different if they would work more like brains? (Geoffrey Hinton says yes).



                                                In an interview https://www.wired.com/story/googles-ai-guru-computers-think-more-like-brains/, in late 2018, he compares the current implementations of neural networks with the brain. He mentions, in terms of weights, the artificial neural networks are smaller than the brain by a factor of 10.000. Therefore, the brain needs way less iterations of trainings to learn. In order to enable artificial neural networks, to work more like our brains, he follows another trend in hardware, a UK based startup called Graphcore. It reduces the calculation time by a smart way of storing the weights of a neural network. Therefore, more weights can be used and the training time of the artificial neural networks might get reduced.






                                                share|cite|improve this answer










                                                New contributor




                                                BigDataScientist is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                                                Check out our Code of Conduct.






                                                $endgroup$
















                                                  3












                                                  3








                                                  3





                                                  $begingroup$

                                                  I would argue the performance is not that different as you might expect, but you ask a great question (see the last paragraph).



                                                  As you mention transfer learning: To compare apples with apples we have to look how many pictures in total and how many pictures of the class of interest a human / neural net "sees".



                                                  1. How many pictures does a human look at?



                                                  Human´s eye movement takes around 200ms which could be seen as kind of an "biological photo". See the talk by computer vision expert Fei-Fei Li: https://www.ted.com/talks/fei_fei_li_how_we_re_teaching_computers_to_understand_pictures#t-362785.



                                                  She adds:




                                                  So by age 3 a child would have seen hundreds of millions of pictures.




                                                  In ImageNet, the leading database for object detection, there are ~14million labeled pictures. So a neural network being trained on ImageNet would have seen as many pictures as a 14000000/5/60/60/24*2 ~ 64 days old baby, so two months old (assuming the baby is awake half of her life).
                                                  To be fair its hard to tell how many of this pictures are labeled. Moreover, the pictures, a baby sees, are not that diverse like in ImageNet. (Probably the baby sees her mother have of the time,... ;).
                                                  However, i think its fair to say that your son will have seen hundreds of millions of pictures (and then applies transfer learning).



                                                  So how many pictures do we need to learn a new category given a solid base of related pictures that can be (transfer) learned from?



                                                  First blog post i found was this: https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html. They use 1000 examples per class. I could imagine 2.5 years later even way less is required.
                                                  However, 1000 pictures can be seen by a human in 1000/5/60 in 3.3 minutes.



                                                  You wrote:




                                                  A human child at age 2 needs around 5 instances of a car to be able to
                                                  identify it with reasonable accuracy regardless of color, make, etc.




                                                  That would be equivilant to forty seconds per instance (with various angles of that object to make it comparable).



                                                  To sum up:
                                                  As i mentioned, I had to make a few assumptions. But i think, one can see that the performance is not that different as one might expect.



                                                  However, i believe you ask a great question and here is why:



                                                  2. Would neural network perform better/different if they would work more like brains? (Geoffrey Hinton says yes).



                                                  In an interview https://www.wired.com/story/googles-ai-guru-computers-think-more-like-brains/, in late 2018, he compares the current implementations of neural networks with the brain. He mentions, in terms of weights, the artificial neural networks are smaller than the brain by a factor of 10.000. Therefore, the brain needs way less iterations of trainings to learn. In order to enable artificial neural networks, to work more like our brains, he follows another trend in hardware, a UK based startup called Graphcore. It reduces the calculation time by a smart way of storing the weights of a neural network. Therefore, more weights can be used and the training time of the artificial neural networks might get reduced.






                                                  share|cite|improve this answer










                                                  New contributor




                                                  BigDataScientist is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                                                  Check out our Code of Conduct.






                                                  $endgroup$



                                                  I would argue the performance is not that different as you might expect, but you ask a great question (see the last paragraph).



                                                  As you mention transfer learning: To compare apples with apples we have to look how many pictures in total and how many pictures of the class of interest a human / neural net "sees".



                                                  1. How many pictures does a human look at?



                                                  Human´s eye movement takes around 200ms which could be seen as kind of an "biological photo". See the talk by computer vision expert Fei-Fei Li: https://www.ted.com/talks/fei_fei_li_how_we_re_teaching_computers_to_understand_pictures#t-362785.



                                                  She adds:




                                                  So by age 3 a child would have seen hundreds of millions of pictures.




                                                  In ImageNet, the leading database for object detection, there are ~14million labeled pictures. So a neural network being trained on ImageNet would have seen as many pictures as a 14000000/5/60/60/24*2 ~ 64 days old baby, so two months old (assuming the baby is awake half of her life).
                                                  To be fair its hard to tell how many of this pictures are labeled. Moreover, the pictures, a baby sees, are not that diverse like in ImageNet. (Probably the baby sees her mother have of the time,... ;).
                                                  However, i think its fair to say that your son will have seen hundreds of millions of pictures (and then applies transfer learning).



                                                  So how many pictures do we need to learn a new category given a solid base of related pictures that can be (transfer) learned from?



The first blog post I found was this: https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html. They use 1000 examples per class. I could imagine that 2.5 years later even far fewer are required.
However, a human can see 1000 pictures in 1000/5/60 ≈ 3.3 minutes.



                                                  You wrote:




                                                  A human child at age 2 needs around 5 instances of a car to be able to
                                                  identify it with reasonable accuracy regardless of color, make, etc.




With 1000 pictures per class spread over 5 instances, that would be equivalent to forty seconds per instance (200 pictures at 5 pictures per second, with various angles of the object to make it comparable).
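As a rough sketch of the arithmetic behind the 3.3 minutes and the forty seconds: the 1000 examples per class come from the Keras blog post, the 5 instances from the question, and the 5 glances per second from the 200 ms figure. The variable names are only illustrative.

```python
# Rough conversion of "training examples" into human viewing time.
GLANCES_PER_SECOND = 5      # one "biological photo" every ~200 ms
PICTURES_PER_CLASS = 1000   # examples per class in the Keras blog post
CAR_INSTANCES = 5           # cars a 2-year-old needs, per the question

total_seconds = PICTURES_PER_CLASS / GLANCES_PER_SECOND
total_minutes = total_seconds / 60
seconds_per_instance = total_seconds / CAR_INSTANCES

print(f"Viewing time for one class: {total_minutes:.1f} minutes")
print(f"Viewing time per car instance: {seconds_per_instance:.0f} seconds")
```

In other words, looking at each of the five cars for well under a minute already yields as many "pictures" as the network got for that class.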



To sum up:
As I mentioned, I had to make a few assumptions. But I think one can see that the performance is not as different as one might expect.



However, I believe you ask a great question, and here is why:



2. Would neural networks perform better/differently if they worked more like brains? (Geoffrey Hinton says yes.)



In an interview in late 2018 (https://www.wired.com/story/googles-ai-guru-computers-think-more-like-brains/), he compares current implementations of neural networks with the brain. He mentions that, in terms of weights, artificial neural networks are smaller than the brain by a factor of 10,000. The brain therefore needs far fewer training iterations to learn. To enable artificial neural networks to work more like our brains, he follows another trend in hardware: Graphcore, a UK-based startup. It reduces calculation time through a smart way of storing the weights of a neural network. Therefore, more weights can be used, and the training time of artificial neural networks might be reduced.















                                                  edited Feb 27 at 22:57






























                                                  answered Feb 27 at 22:51









BigDataScientist
























                                                      1












                                                      $begingroup$

                                                      I am an expert in this. I am human, I was a baby, I have a car, and I do AI.



The reason babies pick up cars from far fewer examples is intuition. The human brain already has structures to deal with 3D rotations. Also, there are two eyes, which provide parallax for depth mapping, and that really helps. You can intuitively distinguish a car from a picture of a car, because there is no actual depth to the picture. Hinton (AI researcher) has proposed the idea of Capsule Networks, which would be able to handle things more intuitively. Unfortunately for computers, the training data is (usually) 2D images, arrays of flat pixels. To avoid overfitting, a lot of data is required so that the network generalizes over the orientation of the cars in the images. The baby brain can do this already and can recognize a car at any orientation.












                                                      $endgroup$














































                                                          answered Feb 27 at 22:55









Jason Hihn





































