The Use of Think-aloud Methods in Qualitative Research: An Introduction to Think-aloud Methods

Think-aloud is a research method in which participants speak aloud any words in their mind as they complete a task. A review of the literature has shown that think-aloud research methods have a sound theoretical basis and provide a valid source of data about participant thinking, especially during language-based activities. However, a researcher needs to design a process which takes into account a number of concerns, by selecting a suitable task, a role for the researcher, a source of triangulation…

Educators today stress our students' need to develop their ability to think and solve problems. Many hope to promote this thinking by using constructivist or problem-based lessons in the classroom. But researchers are only beginning to map out the actual mechanics of human thought processes. How, then, may teacher-researchers find out if our lessons truly do develop student thinking? Off and on over the past century, psychologists and educational researchers have attempted to answer this question by using a method called think-aloud to try to see into the minds of individuals. Participants are asked to voice the words in their minds as they solve a wide variety of problems, from mathematical equations to visual puzzles to reading comprehension. Individual researchers and theorists have debated how effectively think-aloud techniques illuminate thought processes in their particular area of research or pedagogy. As yet, however, there has been little discussion of how think-aloud techniques apply to qualitative research.
I am a teacher-researcher who has used think-aloud techniques to research the thought processes involved when students combine sentences, an activity believed to promote writing skills (Charters, 2003). In my study, I used a combination of interview and think-aloud reports to explore the processes used by five participants, mostly adult ESL students, to solve several sentence combining problems. The results were interpreted through individual narrative and thick description. I found this approach effective. It not only provided a detailed picture of my participants' thought processes, but also helped to highlight individual differences in response.
In this paper, I will outline the roots of the think-aloud technique in cognitive and psycholinguistic theory and discuss its strengths and weaknesses as a research tool, based on both review of the literature and my own experience. From these findings, I can recommend an effective set of methods for think-aloud research which may avoid some of the limitations encountered by researchers in the past.

The Theory of Think-Aloud Research
Although think-aloud techniques in their current form have their roots in cognitive psychology, to understand the relationship between thought and words, it is helpful to go back to Vygotsky's (1962) Thought and Language and its concept of "inner speech." His theory was that the "inner speech" of verbalized adult thought processes evolves from the "egocentric speech" of toddler monologues, which is also a form of "thinking aloud" with the goal of solving problems. Vygotsky described inner speech as "almost inaccessible to experiments" except through its earlier manifestation as egocentric speech (pp. 131-132), but it is likely that the words adult participants utter in think-aloud studies are closely related to inner speech, especially given the characteristics the two share. Like egocentric speech, "think-aloud protocols" are "elliptical" in that they are usually not expressed in complete, reasoned sentences. Vygotsky noted that "sentences" in egocentric and inner speech are dominated by predicates, since the subject of the "talk" is usually visible and evident to the "speaker," and that inner speech may be even more fragmented than egocentric speech. Similarly, anyone trying to understand another person's thought processes from think-aloud transcripts will find them more difficult to understand than normal speech or writing. This is a natural reflection of the purpose of inner speech, which is not meant to communicate with anyone but the thinker.
Another important concept from Vygotsky's (1962) theory involves the relationship between abstract thought and inner speech. Although thought must be translated into language before it can assume a form which others can understand, much of our thought is not "stored" verbally. As we develop and build our mental networks, our thoughts become increasingly abstract, and words are only part of their elaborate patterns of meaning. Vygotsky did not have the vocabulary of working and long-term memory, storage, and retrieval to clarify these concepts, but his ideas are important to an understanding of what think-aloud methods can and cannot reveal. Researchers need to be aware that even thinking aloud, which makes inner speech external, cannot reveal deeper thought processes in their true complexity, because those processes have to be simplified into words before anyone, even the thinkers themselves, can really know them. This "bottleneck" between the breadth of abstract thought and the narrower, sequential emergence of verbal thought necessarily slows down thought processes. Researchers do not know for certain how much it also changes them before they are verbalized as "inner speech" which can be thought aloud (see also Pressley & Afflerbach, 1995).
A speaker often takes several minutes to disclose one thought. In his mind the whole thought is present at once, but in speech it has to be developed successively…. Precisely because thought does not have its automatic counterpart in words, the transition from thought to words leads through meaning… and then through words. (Vygotsky, 1962, p. 150)

Vygotsky's (1962) understanding of the complex and dynamic relationship between thought and verbalized inner speech is useful to remember when studying the theoretical underpinnings of think-aloud methods in the simplified models of information processing theory, best explained in Ericsson and Simon's (1980) seminal study, Verbal Reports as Data. Ericsson and Simon stressed the importance of the theoretical basis of think-aloud methods and related "introspective" research techniques. Their theory is based on a distinction between working memory, in which concurrent reasoning takes place in verbal form, and long-term memory, where some of the ideas from working memory are eventually stored, not necessarily in words. The goal of think-aloud research is to give the researcher insight into the processes of working memory, but there are several difficulties of which researchers need to be aware. First of all, only "heeded" or noticed information goes into working memory. Also, since working memory has a limited capacity, this information is held there only briefly and can disappear as soon as new thought patterns supersede it. For this reason, only verbal reports which follow very rapidly after a thought process can be assumed to reflect conscious thought accurately, and researchers must focus on the participants' "immediate awareness," not delayed explanations for their actions (Cooper, 1999; Olson, Duffy, & Mack, 1984).
In addition, there are many thought processes which are not verbalized in working memory, either because they are automatic (such as recognition of familiar words and images) or because their "intermediate" processing passes through so quickly that there is no time to verbalize it (Davis & Bistodeau, 1993; Ericsson & Simon, 1980; Sugirin, 1999). For this reason, researchers need to choose the research task very carefully. Ideally, researchers should ask participants to think aloud about processes which are naturally verbal (Ericsson & Simon, 1980), perhaps corresponding to Vygotsky's "inner speech." Processes which are not verbal, such as those involving physical actions or visual images, may be distorted when they are translated into words to meet the demands of a think-aloud task. Ericsson and Simon refer to a visual-spatial puzzle-solving study in which the group asked for "overt verbalization" of their strategies was more successful in solving the problem than a control group. Unfortunately for the accuracy of think-aloud research, this implies that requests for overt verbalization might change the way people think. Researchers who want to use think-aloud techniques to reflect natural thought processes therefore have to design their methodologies with great care to avoid over-influencing their participants. For these reasons, there are a number of potential problems to consider when using think-aloud methods for research. Nevertheless, Olson et al. (1984) stated that the think-aloud technique is one of the most effective ways to assess higher-level thinking processes (those which involve working memory) and that it could also be used to study individual differences in performing the same task. Ericsson and Simon (1980) concluded that even if their view of thought processes is necessarily incomplete, verbal reports such as those from think-aloud data are a "thoroughly reliable" source of information about thought processes (p. 247).
Nonetheless, before designing a research plan which involves think-aloud methods, researchers need to decide on the type and level of difficulty of the research task, the degree of prompting which is appropriate, the use of other data to support inferences from think-aloud protocols, and the method of analysis.

Suitable Tasks for Think-Aloud Study
A task for think-aloud study needs to be chosen with care, keeping the cognitive abilities of the participants in mind. Ericsson and Simon (1980) found that demanding tasks creating a "high cognitive load" interfere with verbalization, because other processes crowd verbal information out of working memory. On the other hand, a simple task may also be unsuitable, as "the closer readers' activities come to automaticity, the more problematic it may be for readers to describe these automatic or near-automatic happenings" (Pressley & Afflerbach, 1995, p. 132). This is why Akyel and Kamisli (1996) recommend that think-aloud tasks require "cognitively demanding language use" beyond the level of mere word recognition, so that participants cannot rely on automaticity to perform the task (pp. 15-16). For all these reasons, a language-based activity at an intermediate level of difficulty for the target group is probably an appropriate task for think-aloud research, because it requires more than an automatic response but should not be cognitively overwhelming. A task which can be broken into shorter units and worked on one unit at a time is also recommended, because it helps prevent overload of working memory. "Environmental supports" in the form of written texts which can be easily referred to also free up space in working memory so that higher-level thinking may occur. To see the effect of varying tasks, it may also be helpful to do as I and other researchers (e.g., Johnson, 1992) have done and include activities at several successively more challenging levels of difficulty. In general, tasks which naturally employ verbal thought should provide the most accurate think-aloud responses.

The Issue of Prompting
Ideally, participants in a think-aloud study should not need any coaching but should enunciate their inner speech spontaneously. Unfortunately, without some demonstration and practice, students may not report their thought processes frequently or thoroughly enough to meet the researcher's needs. Nonetheless, researchers who tried to improve their data by suggesting explicit strategies for students to talk about (Cooper, 1999; Davis & Bistodeau, 1993) or by asking specific probing questions (Olson et al., 1984) ran the risk of "leading the witness" and so distorting thought processes more than necessary. Furthermore, Ericsson and Simon (1980) found that repeated practice of a task might promote automaticity before thought processes could be reported.
Some suggested strategies to encourage effective think-aloud responses seem less intrusive than detailed practice or explicit modeling. Sugirin (1999) used a KEEP TALKING sign to remind participants to verbalize all thoughts without addressing them in speech which might interfere with those thoughts. Gibson (1997) suggested a pretask orientation session which briefly explains the rationale and form of think-aloud research to reduce the "cold start effect" (p. 58), but thought that researcher modeling might introduce bias into "think-aloud reporting." Nor should researchers prompt participants to use a targeted process. As Pressley and Afflerbach (1995) concluded, "researcher silence about how the [task] might be processed is more defensible than directions that prompt particular processes, especially when the goal is to learn about the processes people naturally use" (pp. 132-133). After all, participant pauses and omissions may reveal interesting insights into individual differences in thought processes, while any shortfall in the think-aloud results may be supplemented by other sources of data. To ensure that think-aloud reports are as complete as possible, it is necessary to have a "reliability check" to provide "triangulation" (Sugirin, 1999, p. 2).

The Need for "Triangulation"
Few researchers in the think-aloud literature relied on think-aloud transcripts as their only source of data gathering. As Ericsson and Simon (1980) stressed, think-aloud data from working memory will always be incomplete and exclude a number of thought processes which are not held in working memory long enough to be expressed verbally. Think-aloud utterances may also vary in quantity and quality. In response to these problems, the most widely used follow-up strategy is retrospective questioning. Although this involves difficult retrieval from long-term memory, and may be biased by researcher questioning, Nunan (1992) concluded that these problems are offset when combined with the concurrent data from working memory. When retrospective questioning is used only to illuminate and expand on think-aloud results, it may add depth of information about the participant's thought processes. Rankin (1988) also recommended a retrospective analysis, particularly for those participants who had difficulty with the think-aloud method, while Pressley and Afflerbach (1995) point out that participants' ability to describe their thought processes may provide helpful information on their metacognitive skills. Qi (1998) suggested that a follow-up interview may also allow the participants to "validate" the researchers' interpretation of their think-aloud utterances; this would be particularly important when some of those utterances may be in the participants' first language. Davis and Bistodeau (1993) used both a "recall protocol" focused on the content of the task (reading comprehension) and an "exit interview" to help interpret the think-aloud data. However, Gibson (1997) warned that it is better to let the participants recall the task as far as possible without using the audiotape as a prompt. He also noted that retrospective data are most reliable when the time lag between think-aloud recording and exit interview is very short.
The literature suggests a number of other strategies for data collection. Akyel and Kamisli (1996) supplemented their think-aloud data with a questionnaire. Some of the studies reviewed by Ericsson and Simon (1980) involved control groups which performed the same task without think-aloud reporting. However, there are so many potential causes of variation in think-aloud response that a test group/control group dichotomy is of limited help. Although some researchers have used videotaping to record participants' physical actions during think-aloud activities, Sugirin (1999) preferred casual researcher observation because it was less intimidating to the participants. Cullum (1998) listed a number of nonverbal cues which a researcher may note quietly; these include smiles, misreadings, and pauses or periods of silence which may indicate overload of working memory. Fontana and Frey (2000) also pointed out the need for researchers to notice not only the participants' choice of verbal language and terminology, but also their nonverbal communication, including "pace of speech,…body movement…and variations in vocal tone and volume" (pp. 660-661). Noting what is not said might add to understanding of the interview transcripts and help teachers, for example, "to visualize what…students are not hearing in their classrooms" (Peshkin, 2001, p. 247).
With my own participants, I wished to keep their think-aloud behaviour as natural as possible, even if it meant that the degree of information varied among them. Based on these suggestions, I collected my data by audiotaping participants' think-aloud utterances while sitting next to, not across from, them to minimize intimidation (Nunan, 1992) and taking informal written notes on their behaviour and tone of voice, trying to relate these observations to their progress throughout the task. During the "exit interview," after the tape recorder was turned off, I was able to add to my notes translations of first language utterances and participants' explanations of ambiguous passages, pauses, and other nonverbal responses. With this raw material in hand, I could then address the most challenging stage of the think-aloud research process: interpretation.

Interpretation of Think-Aloud Data
Since it would interrupt the natural flow of "inner speech" to ask participants for explanations of their actions during the think-aloud activity, and explanations given retrospectively are unreliable because they might not reflect the actual content of working memory (Ericsson & Simon, 1980), researchers must be prepared to make their own inferences as they interpret think-aloud data. Many participants' utterances are ambiguous; participants may repeat a phrase using various intonations as they search for its meaning without articulating this speculation. Other theorists have also pointed out the need for researchers to make inferences (Davis & Bistodeau, 1993). Although Rankin (1988) warned that any reconstruction of participant remarks should be "literal" and related closely to context, he admitted that some responses may represent more than one thought process and need to be interpreted as such. Olson et al. (1984) described their "impressions" of common participant responses (p. 265), which also involved inferencing. They referred to this as the "qualitative" part of their data.
In general, though many researchers used inference in describing and classifying their data, only one of those whom I examined used a completely qualitative approach (Cullum, 1998). Pressley and Afflerbach (1995), in their comprehensive study of think-aloud research into reading, also found that "various types of quantitative analyses were employed in the majority of studies" (p. 17). Is this widespread preference for quantitative research an indication that think-aloud data are most suitably analyzed this way? Ericsson and Simon (1980) gave think-aloud research respectability among social scientists by arguing that researcher inferences about the meaning of think-aloud utterances are as objective as behaviourist inferences about the purpose of visible actions and can be quantified with equal validity. But, in the end, how objective can purely quantitative analyses be? Johnson's (1992) process-oriented study of ESL students combining sentences, a study similar to my own, may serve as a good illustration of some limitations of quantitative analysis of think-aloud data.
Like most of the other researchers discussed here, Johnson (1992) used a small number of participants (9), presumably because think-aloud procedures are so expensive and time-consuming (see also Pressley & Afflerbach, 1995). There was no exit interview, and the researcher disregarded pauses, "false starts," and other observable behaviours (p. 65). When the tapes were transcribed, participant utterances were divided into "communication units" equivalent to grammatical sentences: "a main clause and all the subordinate clauses attached to it" (p. 65), which correspond to the psycholinguistic concept of the "T unit" or "minimal terminable unit" of meaning (Cooper, 1999, p. 242). These units were then tabulated under one of 10 possible "cognitive strategies" and statistically analyzed to find which strategies were most frequently used. The results are plausible and useful but, as many other quantitative think-aloud researchers have also admitted, "limited because of the small sample size" (Johnson, 1992, p. 71). There are other problems with this method. Given that inner speech is "elliptical" or telegraphic, forcing students to produce complete sentences or artificially joining fragmented utterances into grammatically complete "communication units" may misrepresent thought processes. Also, assigning each "communication unit" to a single cognitive strategy oversimplifies the process; in reality, one utterance may reflect a more complex combination of strategies (Rankin, 1988). Finally, a research hypothesis which looks only for evidence of thought processes common to all participants disregards the existence of different thinking styles (see, for example, Merriam & Caffarella, 1999), in addition to different learning styles and intelligences. If only one participant uses a specific strategy to solve a problem, are that participant's data less relevant than those of the others?
Think-aloud studies that employed quantitative methods have sought to generalize about particular populations of students; researchers such as Johnson (1992) have done so by codifying their subjects' utterances and identifying numerical patterns among them. It may be equally valuable, however, to avoid such generalization, especially since differences among participant responses have been underplayed in previous, mostly quantitative, research (Pressley & Afflerbach, 1995). For all these reasons, the depth, variety, and complexity of thought processes may be interpreted equally well using a qualitative approach. Qualitative researchers, who prefer to describe rather than codify, may allow a clearer picture of the variability of individual experience to emerge. However, there are a number of issues to bear in mind before qualitative researchers can present a fair and accurate interpretation of think-aloud data, including the interpretive angle, the nature of the participants, and the treatment of their data.

Think-Aloud Research through Case Study
Think-aloud researchers, as well as asking the narrative question "what happened," are grappling with the expository question "how does it (the target process) work?" They need a method that is flexible enough to accommodate such a blending of interpretive angles. Rankin (1988), although his own analysis was quantitative, suggested treating each think-aloud participant as a "small" "tightly focused" case study (p. 122). Such an approach may be appropriate for qualitative think-aloud researchers, for a number of reasons.
A pragmatic justification emphasizes the applied nature of case study research. As a method it can be advocated on grounds that it is more useful, more appropriate, more workable than other research designs for a given situation. Knowledge produced by case study would then be judged on the extent to which it is understandable and applicable…a pragmatic conception of truth undergirds this approach. (Merriam, 1988, p. 20)

Many theorists have demonstrated that case study may be used effectively to interpret results from a naturalistic or holistic point of view (Stake, 1994). According to these theorists, case studies can be non-experimental "when description and explanation (rather than prediction based on cause and effect) are sought" and when "it is impossible to identify all the important variables ahead of time" (Merriam, 1988, p. 7). Case studies can also deal with a "full variety of evidence - documents, artifacts, interviews and observations" (Yin, 1984, p. 20), and employ "any and all methods of gathering data, from testing to interviewing" (Merriam, 1988, p. 9). Most important, descriptive case study also allows the researcher to "illustrate the complexities of a situation,…show the influence of personalities, [and]… include vivid material - quotations, interviews…and so on" (Merriam, 1988, p. 14). Such an approach may give researchers greater flexibility to describe results naturally, without having to tailor them to a preconceived hypothesis, and to draw inferences from whichever source (think-aloud transcripts, exit interview, or informal observation) is most revealing. This is what Cullum (1998) did in her descriptive, in-depth study of her daughter's experiences using think-aloud techniques to encourage her success and enjoyment in reading.
A case study approach enabled Cullum to describe her subject, Annie, in great detail, making her family background, personality, and interests clear to the reader and providing a nuanced narrative of the changes Annie experienced through the course of the think-aloud experiment. This experiment did not pretend to be generalizable, for if a different child were exposed to the same experimental process, it is highly unlikely that the results would be the same. This does not negate the validity of Cullum's study and its findings that Annie did in fact benefit from think-aloud in a number of significant ways.

Choice of Participants
Although the typical case study, like Cullum's (1998), involves in-depth and possibly lengthy observation of one participant or situation, what Stake (1994) designates "collective case study" better fits a typical think-aloud study which explores the experiences of a small number of participants. Using more than one participant enables researchers to observe a wider range of responses, but, as Stake asserts, the choice of the cases need not necessarily be deliberate. "They may be similar or dissimilar, redundancy and variety each having voice. They are chosen because it is believed that understanding them will lead to better understanding…about a still larger collection of cases" (p. 237). Qualitative research is most effective when the researcher "develops categories from informants rather than specifying them in advance of the research" (Creswell, 1998, p. 77). This is because the naturalist understands that every research subject is unique, and thus "the concept of 'population' is itself suspect" (Lincoln & Guba, 1985, p. 298). Qualitative researchers believe that anyone they work with will have something worthwhile to reveal and that individual responses, however they could be categorized, are ultimately unique.
In my own study, I invited participation from any of my former students who expressed an interest. Those who volunteered could have been categorized according to sex, age, and background, but I did not focus on such superficial differences. Even if all my participants had turned out to be, say, female Chinese university graduates in their early 30s, there still might have been great variability in their experience. How could I have determined whether any differences were due to their nationality, language background, gender, or some completely different factor, such as learning style? It was more important to have participants who were interested in the study and willing to help me explore their writing processes as quasi-researchers than participants who represented a pre-selected range of backgrounds.

Qualitative Interviewing and Transcription
The qualitative model for interview-based research maintains that "questions…to which interviewing and participant observation may lead can only be roughly determined at the outset" (Howe & Moses, 1999, p. 40) and that "unstructured interviewing can provide a greater breadth of data than the other types" (Fontana & Frey, 2000, p. 652). Although the think-aloud data themselves are centred on a structured task, the gathering of peripheral information about the participants may be more open-ended. To get some sense of the complexities of my own participants' situations, I began by encouraging them to tell me about their backgrounds and interests in a preliminary interview, using prompting questions only when they seemed unsure of what to say. In the post-interview, I invited participants to discuss their feelings about the think-aloud experience and to reflect further on their thought processes. At all times I treated each participant as an individual, a fellow adult, and, to some extent, a fellow researcher, aiming for "the establishment of a human-to-human relation with the respondent and the desire to understand rather than to explain" (Fontana & Frey, 2000, p. 654). My study benefited from this approach because open-ended discussion allowed some fascinating details to emerge that were not addressed by the planned questions. I also gained insights into what had been said and left unsaid as I struggled to prepare an accurate and detailed transcription.
Transcription is a stage where qualitative researchers attempt to show respect for both their participants and their future readers, bearing in mind that "transcription itself is an interpretive process" (Kvale, 1996, p. 160). Qualitative transcribers should aim to describe the think-aloud activity in detail, including non-verbal observations such as tone of voice, while acknowledging that their own interpretation of the experience affects the way it will be written down. Kvale further pointed out that "analysis of the transcribed interviews is a continuation of the conversation which started in the interview situation" (p. 280), and it makes sense for the second parties in the conversation, the interviewers, to have their voices included at this stage too. In my own case, I attempted to interpret the transcripts holistically, joining together the participants' words and additional comments with my own observations based on the field notes and on my theoretical understanding. Unlike Johnson (1992), I made few explicit comparisons, interpreting each participant's data largely independently of the others' and describing participants' think-aloud utterances and nonverbal responses as they appeared, without trying to classify them into hypothetical categories. In this way I, as researcher, was consciously constructing meaning from a variety of sources, but I kept in mind that mine was just one possible interpretation, since "qualitative research leads to as many interpretations as there are researchers" (Kvale, 1996, p. 279).
In order that researchers' voices should not drown out the participants', it is standard practice to meet with participants to let them approve interview transcripts. According to Lincoln and Guba (1985), this sort of "member checking" plays a critical part in maintaining the trustworthiness of qualitative or naturalistic research, for several reasons. It ensures that the interpretation accords with what the participant intended by his words, it allows factual errors to be corrected, and it provides an opportunity for the participant to add further thoughts and comments (p. 314). In my own study, as soon as possible after the transcriptions were completed, my participants were invited to review and comment on the summaries I had made of their think-aloud experience, so that I could take into account their own perceptions of their thinking (Firestone, 1987). This also provided an opportunity for further reflection on the whole process.

The Issue of Reciprocity
Thus participants in a qualitative think-aloud study provide information in a number of different ways. They may narrate their relevant background experiences, complete activities while thinking aloud, and reflect on their thoughts immediately afterward. Later they may meet with the researchers again to review the interpretation of their earlier words and actions. In the end, then, qualitative researchers are asking a great deal more of their participants than merely their informed consent. They require "active assistance…, including a level of research cooperation which frequently amounts to colleagueship" (Wax, 1982, p. 44). This raises the ethical question of parity: Are the participants' potential gains from research studies in line with the demands on their time and intelligence? If participation is truly voluntary, they should be. From feedback during my own study, I can attest that at least some of my participants appreciated an opportunity to express their feelings about their difficulties and successes both in specific English courses and in language learning in general. Some found the audiotaping and transcribing process intellectually intriguing. Some also found that telling their story and seeing it written out in a coherent way helped them to look at their difficulties in context and gave them greater self-understanding. These potential benefits may have provided some "reciprocity" for my participants' vital contribution to my study, and participants in other qualitative think-aloud studies may likewise feel rewarded (Dalton, Lantaigne-Richard, & Quattrocchi, 2000, p. 159).

Conclusion
In general, the literature of think-aloud research shows its strong theoretical foundation and confirms its value as a way of exploring individuals' thought processes. Many studies also provide helpful ideas on the design and implementation of research studies, including selecting a task which offers an effective level of cognitive challenge, allowing an authentic outlet for inner speech, and providing "triangulation" through informal observation and an in-depth exit interview. However, the literature's emphasis on quantitative analysis may be too restrictive. As previous researchers have found, looking for consistent mathematical patterns among samples too small to support statistical generalization is of limited usefulness and overlooks the possibility of individual differences. I recommend that future researchers consider designing and interpreting think-aloud research through a qualitative rather than quantitative lens.