assessment of deep word knowledge in elementary and advanced iranian efl learners: a comparison of selective and productive wat tasks

testing plays a vital role in any language teaching program. it allows teachers and other stakeholders, including program administrators, parents, admissions officers, and prospective employers, to be assured that learners are progressing according to an accepted standard (douglas, 2010). the problems currently facing language testers have both practical and theoretical implications, but the first important issue is specifying language abilities, and the other factors that affect performance on language tests, precisely enough to provide a basis for test development (bachman, 1996). a fundamental consideration in the development and use of tests is to identify and eliminate the effect of factors, other than the language ability being tested, that influence students' performance and test results. one of these factors is the type of test task used to assess a construct. finding appropriate test tasks to assess different features of language has long been a concern of language teachers as a way to check the adequacy of their teaching methods. vocabulary knowledge, as the most fundamental knowledge needed to learn any language, should be given primary attention in language testing, because adequate lexical knowledge and lexical skills are among the key factors that support successful language learning and language use. much of an individual's second language knowledge comes from reading materials, the comprehension of which requires sufficient vocabulary knowledge. the assessment of word knowledge, and the use of its results in teaching vocabulary, are therefore of great importance. vocabulary knowledge can be viewed along two dimensions: breadth and depth of word knowledge (read, 1993). the former concerns the number of words an individual knows, in other words his or her vocabulary size. the latter concerns how much an individual knows about each word.
thus, learning new words is more than the acquisition of isolated lexical units (read, 2004). new words are embedded in a lexical network and are connected to similar words already existing in the learner's schemata. this netlike connectivity characterizes the depth of an individual's vocabulary knowledge. some researchers add a third dimension to vocabulary knowledge: meara (1996) and laufer and nation (2001) suggested that an additional component of word knowledge is lexical accessibility, or fluency. this dimension is discussed later, as one of the goals of this study is to examine the relationship between the second and third dimensions of word knowledge, i.e. depth of word knowledge and its accessibility. the primary purpose of this study is to find an appropriate way to assess the second dimension, i.e. the depth of vocabulary knowledge. to date, several studies have addressed the assessment of deep word knowledge, all of which use word association as the criterion for testing depth of word knowledge. each of these studies has used one of two fundamentally different tasks as a word association test (wat): selected response tasks or productive response tasks (greidanus & nienhuis, 2001; meara & fitzpatrick, 1999; read, 1993; schoonen & verhallen, 2008; verhallen, 1994; wesche & paribakht, 1996). read (1993) designed a simple selected response task as a word association test (wat93) to assess learners' deep word knowledge. in his test, each item presented a stimulus word followed by eight other words. the test takers, who were university students, had to choose the four of these eight words that they thought were related to the stimulus word. he found that native speakers of english possess a stable word association pattern, whereas learners of english as a second language develop unstable and diverse word association patterns that are mostly based on phonological aspects of words.
another example of a selected response task is the test developed by schoonen and verhallen (2008), who designed a wat task adapted from read (1993). they adjusted the stimulus words to the level of elementary students and provided six options instead of eight for each stimulus word, with semantically related distracters, unlike the distracters used by read, which were semantically unrelated to the stimulus word. test takers had to draw lines connecting the stimulus word to the three related words in the item. the results showed that this format was appropriate for assessing depth of word knowledge at pre-intermediate levels. greidanus and nienhuis (2001) produced a word association test similar to those developed by read (1993) and schoonen and verhallen (2008), with a greater focus on the nature of the distracters. their concern was whether semantically related distracters, as used by schoonen and verhallen (2008), work better than semantically unrelated distracters, as used by read (1993). they found semantically related distracters more appropriate for assessing deep word knowledge in advanced learners. in addition to selected response tasks, researchers have also examined this construct (lexical knowledge) with productive response test tasks (e.g. meara & fitzpatrick, 1999; verhallen, 1994; wesche & paribakht, 1996). meara and fitzpatrick (1999) developed the lex30 test, which consists of 30 items, each presenting a stimulus word and asking test takers to provide a set of words that come to mind when they think about the stimulus word.
they carried out the study with 46 adult participants with different mother tongues and correlated the test results with the results of the wat93 tests used to assess the vocabulary knowledge of native speakers. they found the task appropriate for non-native speakers as well, because it correlated highly with established vocabulary tests such as the native speaker word association tests developed by postman and keppel (1970) and kiss et al. (1973). as another productive task, wesche and paribakht (1996) developed the vocabulary knowledge scale (vks) to assess productive vocabulary knowledge in a linear way, by asking learners to produce whatever they know about a stimulus word (a translation of the word, a sentence containing it, or a synonym or antonym). each item also included a self-assessment section in which students rated their familiarity with the stimulus word. read (2000) suggested that the scoring of such a test lacks reliability. another productive task for measuring depth of word knowledge is the structured interview used by verhallen (1994), who interviewed dutch children to probe their deep knowledge of stimulus words. the interviews used simple words and showed that native speakers of dutch gave more paradigmatic responses than learners of dutch as a second language, who mostly gave syntagmatic word association responses. as outlined above, many studies have sought an appropriate test task for measuring depth of word knowledge, using both selected response and productive response tasks to develop word association tasks (wat) for their purposes. however, there has been no agreement on which of these tasks is more appropriate. none of these studies has investigated the efficiency of one type of task against the other within a single study. all previous studies in this field compare and analyze two or three models of a single task type.
this study will compare test takers' performance on two types of test task for assessing depth of word knowledge, using one of the several models of each task type as a testing tool and comparing their efficiency in terms of power of data elicitation. the best-known selective wat task, developed by read (1993) (wat93), will be compared with lex30, a productive wat task, to study their effect on students' performance. in addition to the test task, test takers' proficiency level can also affect their performance on these two types of wat tests. the effect of proficiency level on students' performance on wat tests has been investigated by several scholars (e.g., cremer, dingshoff, de beer, & schoonen, 2010; schoonen & verhallen, 2008; zareva, 2005). cremer et al. (2010) conducted a study to determine whether older students with higher language proficiency perform better on wat tests. they administered a wat test adapted from schoonen and verhallen (2008), with an increased number of items (108 items), to 422 children and 54 adults and investigated the effect of age and language background on the learners' performance. it was shown that adult learners with higher language proficiency perform better on such tests. schoonen and verhallen (2008) administered their wat test to native and nonnative speakers of dutch in grades 3 and 5 and found that fifth-grade students performed better on these tests than third-grade students. zareva (2005) administered such a vocabulary test to 30 native and 34 nonnative speakers of english and divided the nonnative group into two proficiency groups, intermediate and advanced, to study the effect of proficiency level on their performance on the vocabulary test. she found that advanced students provided more accurate responses than the intermediate group. investigating the effect of language proficiency on wat test performance is the second goal of this study.
however, the three studies mentioned above share an important limitation: they used the same wat test to assess the performance of both groups. given that the proficiency levels of the groups being compared are not the same, using the same test without adapting it to their proficiency level does not seem appropriate. the difference shown in their performance on the wat test might be attributed to the difference in their language knowledge rather than the type of test task. there is therefore a need for further research on the effect of proficiency level on wat test performance, using different wat tests suited to each proficiency group. this study aims to examine the effect of this variable by using four different wat tests: two productive tasks at two different levels, and two selective tasks, one for the elementary group and the other for the advanced group. in addition to the type of test task and proficiency level, test takers' attitudes may also affect their performance on wat tasks. according to brown (1994), attitudes, like all other aspects of the development of cognition in human beings, are part of the interacting affective factors in human experience. motivation arising from positive attitudes is a star player in success in any complex language task (brown, 1994). to put the learner in the spotlight, test takers' attitudes toward these two types of test task are included as a third goal of this study, because they are an important criterion for judging the superiority of one task over the other. although the two types of wat test task (productive vs. selective) and their effect on students' performance in deep word knowledge assessment are the main focus of this study, an attempt will also be made to investigate the nature of the third dimension of word knowledge, i.e. fluency or lexical accessibility.
as mentioned at the beginning of this proposal, word knowledge is three-dimensional, consisting of breadth, depth, and accessibility. when only the first dimension is tested, it is easy to develop a large sample of items and estimate the total vocabulary an individual knows according to his or her proficiency level. however, such tests are criticized for their superficial treatment of each stimulus word, since they test only one component of word knowledge. the alternative is depth tests, which probe a word more deeply by tapping the learner's related cluster of knowledge about that word, including collocations, syntagmatically and paradigmatically related words, synonyms, antonyms, etc. these types of test are a good measure of vocabulary knowledge if a measure of accessibility can be added to them. testing accessibility, or automaticity in recalling words, and its relationship to the depth of word knowledge is the fourth goal of this study. regarding breadth, marzban and hadipour (2012) and akbarian (2010) have reported a positive correlation between breadth and depth of word knowledge. however, there has been no report on the relationship between the second and third dimensions of word knowledge, i.e. depth and accessibility. there have been some efforts to test the third dimension of word knowledge, all of which used computer programs to measure lexical access time (aizawa & iso, 2010; kadota, 2010); test takers were asked to press a button when they recognized the target language equivalent of a native language stimulus word on the computer screen. none of them, however, investigated its relationship with depth of word knowledge. furthermore, the practicality of these tests for assessing lexical accessibility is questionable in the iranian context because of equipment limitations. a further concern is that the authenticity of these tests is highly problematic.
according to bachman (1996), a test developer should ensure that the test task used to elicit a language ability is close to the characteristics of the use of that ability in non-test situations. this is the concept of test authenticity, which refers to the extent to which the tasks required on a given test are similar to normal or real-life situations. some students may simply press the button without recognizing the target equivalent word, so the test may measure the time of recall but not the recall itself. if a student claims to recall the target word, for example the english meaning of a persian word, then he or she should produce this word, or at least choose its equivalent or synonym on the computer screen, as evidence for the claim. furthermore, a stimulus word in an individual's first language may have two or more equivalents in the target language. in such testing software, students are restricted to the equivalent word supplied by the test developer, which may be unfamiliar to them. a student may know another equivalent for the stimulus word and wish to use it in real-life language use, so the test administrator should provide this opportunity. an oral task or a written recall test that asks test takers to produce an english equivalent for a persian stimulus word is therefore an authentic alternative to the task described above. when test fairness is taken into account as an important consideration in test development, a productive task is again preferred, because it creates a situation in which test takers can freely produce whatever they recall about the target meaning of a stimulus word. this study will use a written productive test to this end. students' performance on this test will be correlated with their performance on the wat tests to see whether there is a relationship between depth of word knowledge and accessibility of word knowledge.
accordingly, this study aims to answer the following questions:
1- does the type of test task (productive vs. selective) have any effect on test takers' performance in the testing of deep word knowledge?
2- does test takers' level of language proficiency affect their performance on wat tests?
3- what are test takers' attitudes toward the lex30 and wat93 tests?
4- is there any relationship between depth of word knowledge and accessibility of word knowledge?
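the relationship asked about in the fourth question could be examined by correlating two sets of scores per test taker. the sketch below is only an illustration under stated assumptions: pearson's r is assumed as the statistic (the study does not name one), the function name `pearson_r` and the lists `depth_scores` and `accessibility_scores` are hypothetical, and the numbers are made-up placeholders, not data from this study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient between two score lists.

    Assumption: each position in xs and ys belongs to the same test taker.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 scores")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical placeholder scores (NOT data from the study):
    depth_scores = [12, 18, 9, 22, 15]          # e.g. wat score per learner
    accessibility_scores = [7, 11, 6, 14, 10]   # e.g. written recall score
    print(round(pearson_r(depth_scores, accessibility_scores), 3))
```

a coefficient near 1 or -1 would suggest a strong linear relationship between the two dimensions, and a value near 0 a weak one; a rank-based statistic might be preferable if the scores are not interval-scaled.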

