Our understanding of human consciousness – what it is and how it relates to the brain’s activities – is at an early stage. Cognitive scientists have developed models to try to describe how the processes of the brain that underlie learning and remembering operate. These models are simplifications of processes that in reality are highly complex and only partially understood.

To teachers, the usefulness of such models lies in their simplicity. They describe essential aspects of how memory works that practitioners can draw upon alongside their experience and expertise to develop better informed strategies for helping students to learn.

Different Types of Memory

An especially useful model for how memory works makes a distinction between working memory and long-term memory (Atkinson and Shiffrin, 1968). We use our working memory to attend to the here and now, to filter the continuous stream of information coming in from our environment. More important information is passed to our long-term memory, the huge repository where everything we know is encoded in ‘schemas’ of related ideas, facts and procedural knowledge.

The working memory can be thought of as the site of our consciousness. The knowledge in our long-term memory lies outside of our consciousness until we recall it. For example, when asked what a panda looks like you can easily access that information from your long-term memory even though a moment ago you weren’t thinking about pandas.

Our long-term memory is apparently limitless but our working memory is extremely limited, both in terms of how much it can store and for how long (Miller, 1956). For this reason it can easily become overloaded: we have all had the experience of trying to hold a phone number in our working memory for a few seconds and losing it because of a distraction.

‘Learning’ is a process whereby information passes from the working memory to the long-term memory, where it is ‘encoded’ by linking it with what we already know. That is, we build up our knowledge gradually in ever-more-complex networks. To enable students to learn well it is helpful to understand in a little more detail how this process works.

Cognitive Load Theory: What It Is and Why It Matters

One theory in particular, Sweller’s ‘Cognitive Load Theory’, was described by Dylan Wiliam in a 2017 tweet as ‘the single most important thing for teachers to know’. Sweller’s theory looks at our cognitive architecture and explains how we process information by connecting it to our existing knowledge through increasingly complex schemas. However complex it is, a schema is handled by the working memory as a single unit of information, so the more we know (i.e. hold in our long-term memory), the more our limited working memory is freed up to process new information. To function well, the working memory therefore depends upon the long-term memory to reduce the ‘load’ it experiences.

Learner drivers know what it is like to suffer from working memory overload when they try to attend to their feet, their hands, the road and their instructor all at once. An experienced driver by contrast can do this effortlessly even while doing something else cognitively complex, such as holding a conversation with a passenger. This is because the activities involved in driving have become fully automatic in the long-term memory, so even when paying attention to the road most of the experienced driver’s working memory is free to attend to the conversation. The more relevant knowledge we have assimilated into our long-term memory, the more effective our working memory is at processing incoming and new information. The same principle applies to students learning an academic subject.

It is easy for someone who has deep subject knowledge and can think more or less effortlessly in their discipline to underestimate how quickly a student who lacks that expertise can struggle with cognitive overload when encountering new information. To teach successfully, one needs to remain aware of what it is like not to have that knowledge or that fluency, and to know how to help one’s students to acquire it in stages.

Memory in the Age of Google

The claim that in the age of Google we no longer need to teach students facts, since they have easy access to all the information they could possibly need via their smartphone, betrays a misunderstanding of how thinking works. Until information has been integrated into the long-term memory a person has no choice but to try to engage with it using a very limited working memory (Christodoulou, 2014). This is a highly ineffective way of thinking because the working memory has limited capacity either to hold or to ‘encode’ information, and until someone has built up a large and reliable network of related knowledge in their long-term memory they will have no means of fully understanding the incoming information or of engaging with it critically. For example, an expert can easily spot a spurious argument because they have complex schemas of existing knowledge against which to assess it. A student who is looking a new topic up on Google can’t do that.

Although students can be taught skills such as how to assess an argument in terms of consistency, validity and soundness, the claim that students can be taught transferable thinking skills that they can apply across subjects needs a caveat: certain thinking skills do not exist separately from our long-term knowledge of the thing we are thinking about, and certain thinking skills cannot be applied across different fields of knowledge.

Effective Strategies for Teaching and Learning

A simple way to reduce the load on working memory is to remove distractions while learning. Many students like listening to music while they work because it helps their mood, but this has been shown to impair learning by taking up processing space in the brain (Perham and Currie, 2014). For the same reason they should not keep their smartphone next to them, even if it is switched off (Mendoza et al., 2018). If they are using a device in class it is important that students are not distracted by opening multiple windows and switching focus between their work and other material such as the Internet. The limited working memory simply cannot multitask effectively in that way.

The fact that the mind processes visual and auditory information separately has implications for how we present material. Presenting two pieces of visual information simultaneously (for example, a diagram with multiple accompanying annotations) splits the attention and loads the working memory; but presenting a piece of visual information with a simultaneous oral explanation aids learning by using two channels to create connected verbal and visual images of the material. This effect (explained by ‘dual coding theory’; Paivio, 1990) only applies if the oral explanation complements the visual one, however; putting up a slide of writing and talking over it in words that don’t match the text only splits the attention and overloads the working memory.

More complex skills, tasks and knowledge should be taught in stages so that the students gradually build up a schema of related knowledge in their long-term memory. This effectively increases the capacity of the working memory and allows them to take on more complex information. It’s important to establish what the students already know at the outset and to build upon that knowledge. It’s helpful to give them practice in worked examples or partially completed problems so that they can embed the process in their memory without overloading the working memory. As they become more expert at the process, the scaffolding can be gradually removed.

Learning is enhanced by distributing the process across multiple, spaced-out, short sessions (Cepeda et al., 2008). This is because, paradoxically, forgetting aids remembering if one revisits the material just as one has begun to forget it. Students benefit from understanding this when they undertake revision. The common practice among students of last-minute ‘cramming’ before an examination is far from the best way to remember material: it works much better to use ‘spaced repetition’. Moreover, topics are best studied and revised when interleaved with different topics, not in a single block, so that the mind is continually making shifts and discriminating between the topics (Rohrer and Taylor, 2007).

There is a large evidence base to suggest that one of the most effective ways to improve long-term memory is ‘retrieval practice’: recalling information from memory by answering questions, taking tests or writing practice essays (Roediger and Karpicke, 2006). This not only assesses what the student knows, it also improves their ability to retain it for later recall (ibid.). It is worth doing this immediately after learning something so as to test for understanding; but to make sure that students have really learned something we should test what they know some time after they learned it, when they have begun to forget it. The cognitive effort involved in recalling the material helps to embed it in long-term memory. Doing this repeatedly increases the ease with which we connect new material with our existing knowledge. Indeed, the benefits of ‘retrieval practice’ are especially strong in stressful situations such as high-stakes examinations because the practice establishes multiple pathways in the brain which circumvent the impairment to memory that stress causes (Smith et al., 2016).

Evidence for ‘what works’ in teaching and learning requires careful handling. It can be tempting to reach for headline findings that seem to offer clear guidance without looking at the research behind the headline. The evidence base may be thin or contradictory. It may have been generated not in classrooms with schoolchildren but in laboratories with psychology undergraduates. ‘Learning’ can describe a wide range of activities, and a technique that applies to simple retrieval of facts may not apply to learning that leads to a complex understanding. Then there is what Steve Higgins (2018) refers to as the Bananarama Principle: ‘It ain’t what you do, it’s the way that you do it’. Knowing that a technique has been shown by research to work in a laboratory does not tell a teacher how to apply it effectively with a particular class at a particular moment.

Nevertheless, research evidence can be a useful corrective to uninformed assumptions. We all tend to overvalue techniques for learning that feel easy (the so-called ‘fluency effect’). Reviewing a topic by reading through material, highlighting key points, and re-organising notes will give an impression of familiarity with the material which is quite different from having a deep understanding of it and being able to recall and use it in a different context.

Retrieval practice, distributed practice and interleaving constitute what Bjork (1994) has described as ‘desirable difficulties’: manipulations during learning that actually improve long-term performance and memory. Less experienced students are unlikely to persevere with them because they give an impression of slow progress. Students need to be taught to understand them and stick with them because the evidence suggests that they make learning more permanent.


References

Atkinson, R.C., & Shiffrin, R.M. (1968). Human memory: A proposed system and its control processes. In K.W. Spence & J.T. Spence (eds). The psychology of learning and motivation. New York: Academic Press, pp. 89-195.

Bjork, R.A. (1994). Memory and metamemory considerations in the training of human beings. In J. Metcalfe & A. Shimamura (eds). Metacognition: knowing about knowing. Cambridge, Mass: MIT Press, pp. 185-205.

Cepeda, N.J., Vul, E., Rohrer, D., Wixted, J.T., & Pashler, H. (2008). Spacing effects in learning: a temporal ridgeline of optimal retention. Psychological Science, 19(11), 1095-1102.

Christodoulou, D. (2014). Seven Myths About Education. Routledge. Myth Four:

Higgins, S.E. (2018). Improving Learning: Meta-Analysis of Intervention Research in Education. Cambridge: CUP.

Mendoza, J.S. et al. (2018). The effect of cellphones on attention and learning: The influences of time, distraction, and nomophobia. Computers in Human Behavior, 86, 52-60.

Miller, G.A. (1956). The magical number seven, plus or minus two: some limits on our capacity for processing information. Psychological Review, 63(2), 81-97.

Paivio, A. (1990). Mental Representations: A Dual Coding Approach. Oxford Psychology Series.

Perham, N., & Currie, H. (2014). Does listening to preferred music improve reading comprehension performance? Applied Cognitive Psychology, 28(2), 279-284.

Roediger III, H.L., & Karpicke, J.D. (2006). Test-enhanced learning: Taking memory tests improves long-term retention. Psychological Science, 17(3), 249-255.

Rohrer, D., & Taylor, K. (2007). The shuffling of mathematics problems improves learning. Instructional Science, 35, 481-498.

Smith, A.M., Floerke, V.A., & Thomas, A.K. (2016). Retrieval practice protects memory against acute stress. Science, 354(6315), 1046-1048.