Abstract
In this study, the researchers developed a webquest evaluation rubric and investigated its reliability. The rubric was created by drawing on the strengths of currently available webquest rubrics and making improvements based on critiques in the literature and feedback received from educators. After the rubric was created, 23 participants were given a week to evaluate three preselected webquests using the latest version of the rubric. A month later, the evaluators were asked to reevaluate the same webquests. The statistical analyses conducted on the rubric demonstrated high levels of reliability.
A webquest can be defined as “an inquiry-oriented activity in which some or all of the information that learners interact with comes from resources on the Internet” (Dodge, 1997, para. 2). Webquests provide a way to make use of the Internet, incorporating sound learning strategies. Rather than simply pointing students to websites that may encourage them to cut and paste, well-structured webquests direct students to Internet resources that require the use of critical thinking skills and a deeper understanding of the subject being explored (March, 2003).
The way a webquest activity is designed discourages students from simply surfing the Internet in an unstructured manner. A webquest is constructed and presented in six parts called building blocks: introduction, tasks, process, resources, evaluation, and conclusion. Similar to a lesson plan, a webquest organizes students’ learning experiences using these building blocks and allows the teacher to evaluate learning outcomes.
Student centered and inquiry based, the webquest is generally constructed around a scenario of interest to students who work in small groups by following the steps in the webquest model to examine the problems, propose hypotheses, search for information with the Web links provided by the instructor, analyze and synthesize the information using guided questions, and present solutions to the problems. (Zheng, Perez, Williamson, & Flygare, 2008, p. 296).
The critical attributes of a webquest activity include an introduction that sets the stage and provides some background information, a task that is doable and motivating, a set of web-linked information sources needed to complete the task, a description of the process the learners should go through to accomplish the task, some guidance on how to organize the information, and a conclusion that brings closure to the quest and reminds participants of what they have learned (Dodge, 1997). The webquest method has been widely adopted in K-16 education (Zheng et al., 2008). Since its inception, the webquest model has been embraced by many educators, and consequently, numerous webquests have been created by teachers for all grade levels (MacGregor & Lou, 2004/2005).
One reason webquests have gained popularity is that they can be adapted by teachers. To design a successful webquest, teachers need to "compose explanations, pose questions, integrate graphics, and link to websites to reveal a real-world problem" (Peterson & Koeck, 2001, p. 10). Teachers report that the experience of designing and implementing webquests helps them "discover new resources, hone technology skills, and gain new teaching ideas by collaborating with colleagues" (p. 10).
Since webquests challenge students’ intellectual and academic ability rather than their simple web searching skills, they are said to be capable of increasing student motivation and performance (March, 2004), developing students' collaborative and critical thinking skills (Perkins & McKnight, 2005), and enhancing students’ abilities to apply what they have learned to new learning (Pohan & Mathison, 1998). Thus, webquests have been widely adopted and integrated into K-12 and higher education curricula (Zheng, Stucky, McAlack, Menchana, & Stoddart, 2005) and several staff development efforts (Dodge, 1995).
Many studies have been conducted to determine the effects of webquests on teaching and learning in different disciplines and grade levels. Researchers have claimed that webquest activities create positive attitudes and perceptions among students (Gorrow, Bing, & Royer, 2004; Tsai, 2006), increase the learners' motivation (Abbit & Ophus, 2008; Tsai, 2006), foster collaboration (Barroso & Clark, 2010; Bartoshesky & Kortecamp, 2003), enhance problem-solving skills, higher order thinking, and connection to authentic contexts (Abu-Elwan, 2007; Allan & Street, 2007; Lim & Hernandez, 2007), and assist in bridging the theory to practice gap (Laborda, 2009; Lim & Hernandez, 2007).
Although the Internet houses thousands of webquests, the quality of these webquests varies (Dodge, 2001; March, 2003). In fact, some of them may not be considered real webquests (March, 2003). March asserted that a good webquest must be able to "prompt the intangible aha experiences that lie at the heart of authentic learning" (March, 2003, p. 42). Both Dodge (2001) and March (2003) indicated that a careful evaluation is needed before adapting a webquest for use in the classroom with students.
Webquests open the possibility of involving students in online investigations without requiring them to spend time searching for relevant materials. The advantage of webquests is the ability to plan and implement learning experiences around teacher-identified, relevant, and credible websites with which students can work confidently. Considering their increasing use, it is important for educators to be able to find and use high-quality webquests.
Although webquests show great promise for enhancing student learning and motivation, the results of using webquests as teaching and learning tools may depend on how well webquests are designed in the first place. One of the biggest problems is that anybody using the right tools can create and publish a webquest online. Unlike books and journal articles that are reviewed and edited before they are published, no formal evaluating process exists to limit what goes on the Internet as a webquest. The result is a large number of webquests, which makes separating high-quality from low-quality webquests difficult. Therefore, careful and comprehensive evaluation of webquest design is an essential step in the decisions to use webquests. Due to their structure, rubrics provide a powerful means by which to judge the quality of webquests.
Evaluation of Webquests With Rubrics
Rubrics are scoring tools that allow educators to assess different components of complex performances or products based on different levels of achievement criteria. The rubric tells both instructor and student what is considered important when assessing (Arter & McTighe, 2001; Busching, 1998; Perlman, 2003).
One widely cited benefit of rubrics is the increased consistency of judgment when assessing performance and authentic tasks. Rubrics are assumed to enhance the consistency of scoring across students and assignments, as well as between raters. Another frequently mentioned benefit is the possibility of providing valid assessment of performance that cannot be achieved through the use of traditional written tests. Rubrics allow valid assessment of complex competencies without sacrificing reliability (Morrison & Ross, 1998; Wiggins, 1998). A further benefit of using rubrics is the promotion of learning; in other words, rubrics serve as both assessment and teaching tools. This effect has been documented in research on various types of assessment, such as formative, self-, peer, and summative assessment. The explicit criteria and standards that are the essential building blocks of rubrics provide students with informative feedback, which, in turn, promotes student learning (Arter & McTighe, 2001; Wiggins, 1998).
Although many rubrics have been developed for evaluating webquests, only three are widely used. Dodge (1997) listed six critical attributes for a webquest, a list that was later revised and converted into a webquest evaluation rubric (Bellofatto, Bohl, Casey, Krill, & Dodge, 2001). This rubric is designed to evaluate the overall aesthetics as well as the basic elements of a webquest. Each category is evaluated at one of three levels: Beginning, Developing, and Accomplished, and each cell is worth a set number of points. A teacher can score every category of the webquest at one of these levels and arrive at a score out of a total of 50 points characterizing the usefulness of the webquest.
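To make the scoring mechanics concrete, the following is a minimal sketch of how a level-based rubric of this kind yields a total score. The category names follow the aesthetics criterion and building blocks described above, but the per-level point values are invented for illustration and do not reproduce how the published instrument distributes its 50 points.

```python
# Minimal sketch of level-based rubric scoring (illustration only).
# Category names follow the rubric described above; the point values per
# level are hypothetical and do not reproduce the published instrument.
LEVEL_POINTS = {
    "overall_aesthetics": {"Beginning": 0, "Developing": 3, "Accomplished": 6},
    "introduction":       {"Beginning": 0, "Developing": 3, "Accomplished": 6},
    "task":               {"Beginning": 0, "Developing": 5, "Accomplished": 10},
    "process":            {"Beginning": 0, "Developing": 6, "Accomplished": 12},
    "resources":          {"Beginning": 0, "Developing": 5, "Accomplished": 10},
    "evaluation":         {"Beginning": 0, "Developing": 3, "Accomplished": 6},
}  # hypothetical maximum: 6 + 6 + 10 + 12 + 10 + 6 = 50 points

def score_webquest(ratings):
    """Sum the points earned for the level assigned in each category."""
    return sum(LEVEL_POINTS[category][level] for category, level in ratings.items())

# A webquest rated "Developing" in every category
example_ratings = {category: "Developing" for category in LEVEL_POINTS}
print(score_webquest(example_ratings))  # 25 of a possible 50 points
```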
March (2004) created a rubric for evaluating webquest design, called the Webquest Assessment Matrix, that comprises eight criteria (Engaging Opening/Writing, the Question/Task, Background for Everyone, Roles/Expertise, Use of the Web, Transformative Thinking, Real World Feedback, and Conclusion). One unique aspect of this rubric is that it does not include specific criteria for Web elements such as graphics and Web publishing; March suggested that one person's cute animated graphic can be another's flashing annoyance. Each of the eight criteria is evaluated according to three levels: Low (1 point), Medium (2 points), and High (3 points), for a maximum score of 24 points.
The enhancing Missouri's Instructional Networked Teaching Strategies (eMINTS) National Center (2006) also created a rubric based on Dodge's work. Webquest creators are asked to use this rubric to evaluate their webquest design before submitting it for eMINTS evaluation. The eMINTS National Center then evaluates submissions and provides a link on its website to the webquests that score 65 or more of the 70 possible points. Teachers must first use the webquest in their classrooms before submitting it for evaluation; if teachers cannot use the webquest in their own classroom, implementation in another grade-appropriate classroom is acceptable.
Approved webquests join the permanent collection of eMINTS’s national database of resources for educators. These resources are available to all educators. eMINTS provides professional development programs for teachers in which these resources are shared with all teachers. Therefore, having a webquest accepted by eMINTS provides recognition for webquest creators.
While many educators use these webquest design evaluation rubrics, there have also been discussions and suggestions in the literature regarding certain of their elements. For example, Maddux and Cummings (2007) discussed the lack of focus on the learner and recommended the addition of “learner characteristics” to the Rubric for Evaluating Webquests (Bellofatto et al., 2001):
The rubric ‘Rubric for Evaluating Webquests’ did not contain any category that would direct a webquest developer to consider any characteristics of learners, such as age or cognitive abilities. Instead, the rubric focused entirely on the characteristics of the webquest, which does nothing to ensure a match between webquest’s cognitive demands and learner characteristics, cognitive or otherwise. (p. 120)
Finally, Maddux and Cummings suggested that teachers who develop and use webquests should be mindful of students’ individual differences, including but not limited to age, grade, and cognitive developmental level. To remind teachers of the importance of these considerations, they proposed that Dodge’s (1997) second critical attribute be modified from “a task that is doable and interesting” to “a task that is doable, interesting, and appropriate to the developmental level and other individual differences of students with whom the webquest will be used” (p. 124).
Webquest design evaluation rubrics are mainly created to help educators identify high-quality webquests from a pool of thousands. This evaluation must be credible and trustworthy and grounded in evidence (Wiggins, 1998). In other words, an assessment rubric should be independent of who does the scoring and yield similar results no matter when and where the evaluation is carried out. The more consistent the scores are over different raters and occasions, the more reliable is the assessment (Moskal & Leydens, 2000).
The current literature provides examples of rubrics that are used to evaluate the quality of webquest design. However, reliability of webquest design rubrics has not yet been presented in the literature. This study aims to fill that gap by assessing the reliability of a webquest evaluation rubric, which was created by using the strengths of the currently available rubrics and making improvements based on the comments provided in the literature and feedback obtained from the educators.
Reliability
Assessments have implications and lead to consequences for those being assessed (Black, 1998), since they frequently drive the pedagogy and the curriculum (Hildebrand, 1996). For example, high-stakes testing in schools has driven teachers and administrators to narrow the curriculum (Black, 1998). Assessments also shape learners’ motivations, their sense of priorities, and their learning tactics (Black, 1998).
Ideally, an assessment should be independent of who does the scoring, and the results should be similar no matter when and where the assessment is carried out, but this goal is hardly attainable. There is “nearly universal” agreement that reliability is an important property in educational measurement (Colton et al., 1997, p. 3). Many assessment methods require raters to judge some aspect of student work or behavior (Stemler, 2004). The designers of assessments should strive to achieve high levels of reliability (Johnson, Penny, & Gordon, 2000). Two forms of reliability are considered significant. The first form is interrater reliability, which refers to the consistency of scores assigned by multiple raters. The second is intrarater reliability, which refers to the consistency of scores assigned by one rater at different points of time (Moskal, 2000).
Interrater Reliability
Interrater reliability refers to “the level of agreement between a particular set of judges on a particular instrument at a particular time” and “provide[s] a statistical estimate of the extent to which two or more judges are applying their ratings in a manner that is predictable and reliable” (Stemler, 2004). Raters, or judges, are used when student products or performances cannot be scored objectively as right or wrong but require a rating of degree (Stemler, 2004).
Perhaps the most popular statistic for calculating the degree of consistency between judges is the Pearson correlation coefficient (Stemler, 2004).
One beneficial feature of the Pearson correlation coefficient is that the scores on the rating scale can be continuous. Like the percent-agreement statistic, however, the Pearson correlation coefficient can be calculated only for one pair of judges at a time and for one item at a time. Values greater than .70 are typically considered acceptable for consistency estimates of interrater reliability (Barrett, 2001; Glass & Hopkins, 1996; Stemler, 2004). In situations where multiple judges are used, Cronbach’s alpha can be used to calculate interrater reliability estimates (Crocker & Algina, 1986). Cronbach’s alpha coefficient is used as a measure of consistency when evaluating multiple raters on ordered category scales (Bresciani, Zeln, & Anderson, 2004). If the Cronbach’s alpha estimate is low, then a large portion of the variance in the scores is attributable to error (Crocker & Algina, 1986).
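As an illustration of how these estimates can be computed, the following sketch works from an invented rater-by-webquest score matrix; it is not the analysis reported in this study, and all of the numbers are hypothetical.

```python
# Minimal sketch (not the analysis from this study): estimating interrater
# consistency from a hypothetical rater-by-webquest score matrix.
import numpy as np
from scipy.stats import pearsonr

# Hypothetical total scores: rows are webquests, columns are three raters
scores = np.array([
    [42, 44, 40],
    [35, 33, 36],
    [48, 47, 49],
    [30, 28, 31],
    [45, 46, 44],
    [38, 40, 37],
], dtype=float)

# Pairwise Pearson correlations, one pair of judges at a time
n_raters = scores.shape[1]
for i in range(n_raters):
    for j in range(i + 1, n_raters):
        r, p = pearsonr(scores[:, i], scores[:, j])
        print(f"Raters {i + 1} and {j + 1}: r = {r:.2f} (p = {p:.3f})")

# Cronbach's alpha with raters treated as "items":
# alpha = k / (k - 1) * (1 - sum of per-rater variances / variance of totals)
k = n_raters
rater_variances = scores.var(axis=0, ddof=1)
total_variance = scores.sum(axis=1).var(ddof=1)
alpha = (k / (k - 1)) * (1 - rater_variances.sum() / total_variance)
print(f"Cronbach's alpha across raters: {alpha:.2f}")
```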
Intrarater (Test-Retest) Reliability
Intrarater (test-retest) reliability refers to the consistency of scores assigned by one rater at different points in time (Carol, Deana, & Donald, 2007; Moskal & Leydens, 2000). Unlike measures of internal consistency, which indicate the extent to which all of the questions that make up a scale measure the same construct, the test-retest reliability coefficient indicates whether the instrument is consistent over time and across multiple administrations.
In the case of a rubric, this would mean the same group of evaluators evaluating subjects with the same rubric on two different occasions. If the correlation between the scores obtained from the two separate administrations of the evaluation with the rubric is high, then the rubric is considered to have high test-retest reliability. The test-retest reliability coefficient is simply a Pearson correlation coefficient for the relationship between the total scores from the two administrations. Additionally, the intraclass correlation coefficient (ICC) is used when the consistency of ratings from the same raters is evaluated.
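A minimal sketch of these two calculations appears below, again with invented scores rather than the data from this study. The Pearson coefficient is computed with SciPy, and the ICC with the pingouin package, treating the two scoring occasions as "raters."

```python
# Minimal sketch (invented scores, not this study's data): test-retest
# reliability for one rater who scored the same webquests one month apart.
import pandas as pd
import pingouin as pg
from scipy.stats import pearsonr

time1 = [42, 35, 48, 30, 45, 38]  # hypothetical totals, first administration
time2 = [41, 36, 47, 31, 44, 40]  # hypothetical totals, one month later

# Test-retest coefficient: Pearson correlation between the two administrations
r, p = pearsonr(time1, time2)
print(f"Test-retest r = {r:.2f} (p = {p:.3f})")

# Intraclass correlation, treating the two occasions as "raters"
long_format = pd.DataFrame({
    "webquest": list(range(len(time1))) * 2,
    "occasion": ["t1"] * len(time1) + ["t2"] * len(time2),
    "score": time1 + time2,
})
icc = pg.intraclass_corr(data=long_format, targets="webquest",
                         raters="occasion", ratings="score")
print(icc[["Type", "ICC", "CI95%"]])
```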
We developed a webquest rubric and investigated its reliability using multiple measures. The following section describes the process through which the webquest evaluation rubric was created and used for evaluation of three webquests by multiple evaluators. Results of reliability analyses on the rubric and discussions of findings are presented.
Procedures
Construction of the ZUNAL Webquest Evaluation Rubric
The ZUNAL rubric was developed in three stages (Figure 1). First, a large set of rubric items was generated based on the operational definitions and existing literature on currently available webquest rubrics (version 1). This step included item selection from the three most widely used rubrics, created by Bellofatto et al. (2001), March (2004), and eMINTS (2006). In addition, studies that critiqued current webquest rubrics and offered suggestions were also considered for item selection and modification (Maddux & Cummings, 2007). As a result of this process, the first version of the ZUNAL rubric was created around nine categories (version 1, Appendix A).
Second, graduate students (n = 15) enrolled in a course titled Technology and Data were asked to rate the clarity of each item on a scale ranging from 1 (not at all) to 6 (very well/very clear). They were also asked to supply written feedback on any items that were unclear or unrelated to the constructs. Items were revised based on this feedback (version 2, Appendix A). Finally, K-12 classroom teachers (n = 23) who were involved with webquest creation and implementation in their classrooms were invited to complete a survey that asked them to rate the rubric elements for their value and clarity. Items were revised based on this feedback (final version, Appendix B).