Rater-Mediated Assessment of Iranian Undergraduate Students’ College Essays: Many-Facet Rasch Modelling

Esfandiari, Rajab

doi:10.22049/jalda.2021.27032.1234

Journal of Applied Linguistics and Applied Literature: Dynamics and Advances
Article 6, Volume 9, Issue 1 - Serial Number 17, July 2021, Pages 93-119
Article type: Research Article
DOI: 10.22049/jalda.2021.27032.1234
Author
Rajab Esfandiari^*
Associate Professor, Department of English Language, Faculty of Humanities, Imam Khomeini International University, Qazvin, Iran
Abstract
In rater-mediated assessments, the ratings awarded to language learners’ written or spoken performances do not necessarily reflect their language abilities, because a number of construct-irrelevant factors may affect the knowledge they demonstrate. Rater subjectivity and rating scales are among the variables that can influence the final results. The present study examined the extent to which the ratings university students received on their essays reflected the effects of these two factors. To that end, 150 Iranian EFL teachers rated ten five-paragraph essays that BA students had written as a course requirement at Imam Khomeini International University. The raters used two rating scales to rate the essays on a number of assessment criteria. The study rested on a partial rating design, and the Rasch-based computer program FACETS was used to analyze the data. The FACETS analyses showed that raters differed considerably in the severity they exercised when rating the essays. The results also showed rater bias interactions with holistic rating scales. The implications of the findings for procedures to reduce the effects of such extraneous variables are discussed.
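As a rough illustration of how rater severity enters a many-facet Rasch model, the sketch below uses the common rating-scale formulation, in which the log-odds of receiving category k rather than k-1 equal ability minus rater severity minus criterion difficulty minus the category threshold. The abstract does not state which parameterization the study used, so this form, the function names, and all numeric values are assumptions made for illustration only; it is not the FACETS program or the study's data.

```python
import math


def category_probabilities(ability, severity, difficulty, thresholds):
    """Return P(score = k) for k = 0..K under a rating-scale many-facet Rasch model.

    All arguments are in logits. `thresholds` holds tau_1..tau_K (hypothetical values).
    """
    # Log-numerator for category 0 is 0; each further category adds
    # (ability - severity - difficulty - tau_k).
    log_numerators = [0.0]
    for tau in thresholds:
        step = ability - severity - difficulty - tau
        log_numerators.append(log_numerators[-1] + step)
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_numerators)
    weights = [math.exp(value - peak) for value in log_numerators]
    total = sum(weights)
    return [weight / total for weight in weights]


def expected_score(probabilities):
    """Expected rating given the category probabilities."""
    return sum(k * p for k, p in enumerate(probabilities))


# Hypothetical values: one essay writer, one criterion, two raters.
thresholds = [-1.0, 0.0, 1.0]
lenient = category_probabilities(0.5, -1.0, 0.0, thresholds)
severe = category_probabilities(0.5, 1.0, 0.0, thresholds)
```

With the same essay and criterion, the rater with the higher severity value yields a lower expected score, which is the kind of difference a severity analysis is meant to separate from the writer's ability.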
Keywords
Analytic Scales; Bias; Holistic Scales; Rater Subjectivity; Severity
