Large Language Models are trained on text corpora of approximately 50 BILLION words or more and, at best, have the common sense of a teenager. Yet a teenager learns from an environment of only approximately 50 MILLION words. Recent models may have common sense comparable to that of adults, but they are trained on hundreds of BILLIONS of words, while adults are typically exposed to at most hundreds of MILLIONS of words in their environment.
The difference is approximately 1,000 times! It seems important to understand how the human mind is able to learn and use language with far less training data than transformer-based models require.
A research paper from 2017 showed that children are typically exposed to around 12,000 words per day. At that rate, a 10-year-old child would have learned from approximately 44 million words (12,000 × 365 × 10), and a 20-year-old adult would have been exposed to roughly 100 million words by the time they acquire real human-level common sense.
However, this is an upper bound, because it is plausible that not every word heard actually contributes to learning; not every word captures our attention. The real difference could therefore be 10,000 times or even 100,000 times.
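The back-of-envelope arithmetic above can be sketched in a few lines. The 12,000 words/day figure comes from the cited 2017 paper; the 50-billion-word corpus size is the text's low-end estimate for an LLM, and the function name is just illustrative:

```python
# Assumptions: 12,000 words/day (Gilkerson et al., 2017) and the
# round training-corpus size quoted in the text.

WORDS_PER_DAY = 12_000
DAYS_PER_YEAR = 365

def lifetime_exposure(age_years: int) -> int:
    """Upper-bound estimate of words heard by a given age."""
    return WORDS_PER_DAY * DAYS_PER_YEAR * age_years

child_10 = lifetime_exposure(10)   # 43,800,000 — roughly 44 million words
adult_20 = lifetime_exposure(20)   # 87,600,000 — on the order of 100 million

llm_corpus = 50_000_000_000        # ~50 BILLION words (text's low estimate)
ratio = llm_corpus / child_10      # roughly the "1,000 times" gap in the text

print(f"10-year-old: {child_10:,} words")
print(f"20-year-old: {adult_20:,} words")
print(f"LLM corpus / child exposure: ~{ratio:,.0f}x")
```

Even with these generous human-side numbers, the ratio stays above 1,000, which is why discounting unattended words only widens the gap.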
What mechanism gives humans the ability to form human-level common sense from so much less data? Is it some kind of reasoning built on top of a neural network architecture? Or is it some kind of conceptual ontology, learned from the first portion of the input, that makes learning from everything that follows more efficient? Finding the answer to this question is an interesting pursuit.
Gilkerson, J., Richards, J. A., Warren, S. F., Montgomery, J. K., Greenwood, C. R., Kimbrough Oller, D., ... & Paul, T. D. (2017). Mapping the early language environment using all-day recordings and automated analysis. American Journal of Speech-Language Pathology, 26(2), 248-265.