
Article

Automated Assessment in Programming Courses: A Case Study during the COVID-19 Era

Enrique Barra 1,*, Sonsoles López-Pernas 1, Álvaro Alonso 1, Juan Fernando Sánchez-Rada 1, Aldo Gordillo 2 and Juan Quemada 1

1 Departamento de Ingeniería de Sistemas Telemáticos, Escuela Técnica Superior de Ingenieros de Telecomunicación, Universidad Politécnica de Madrid, 28040 Madrid, Spain; [email protected] (S.L.-P.); [email protected] (Á.A.); [email protected] (J.F.S.-R.); [email protected] (J.Q.)
2 Departamento de Sistemas Informáticos, Escuela Técnica Superior de Ingenieros de Sistemas Informáticos, Universidad Politécnica de Madrid, 28031 Madrid, Spain; [email protected]
* Correspondence: [email protected]

Received: 5 August 2020; Accepted: 7 September 2020; Published: 10 September 2020
Sustainability 2020, 12, 7451; doi:10.3390/su12187451

Abstract: The COVID-19 pandemic imposed in many countries, in the short term, the interruption of face-to-face teaching activities and, in the medium term, the existence of a ‘new normal’, in which teaching methods should be able to switch from face-to-face to remote overnight. However, this flexibility can pose a great difficulty, especially in the assessment of practical courses with a high student–teacher ratio, in which the assessment tools or methods used in face-to-face learning are not ready to be adopted within a fully online environment. This article presents a case study describing the transformation of the assessment method of a programming course in higher education to a fully online format during the COVID-19 pandemic, by means of an automated student-centered assessment tool. To evaluate the new assessment method, we studied students’ interactions with the tool, as well as students’ perceptions, which were measured with two different surveys: one for the programming assignments and one for the final exam. The results show that the students’ perceptions of the assessment tool were highly positive: if using the tool had been optional, the majority of them would have chosen to use it without a doubt, and they would like other courses to involve a tool like the one presented in this article. A discussion about the use of this tool in subsequent years in the same and related courses is also presented, analyzing the sustainability of this new assessment method.

Keywords: assessment; assessment process; assessment tools; e-learning; assessment techniques; automated assessment; online education; computer science education

1. Introduction

The global pandemic of COVID-19 led to the suspension of face-to-face teaching activities in many countries. In the higher education context, the abrupt transformation of classroom teaching into an online format was carried out practically overnight with the aid of tools such as videoconferencing software for synchronous activities or lecture recording programs for the creation of videos that can be shared with students through learning management systems (LMSs). However, the evaluation process of some courses could not be easily transformed to an online format, since assessing the attainment of the course learning objectives is often a complex procedure that supports the whole process of teaching and learning [1].
This evaluation process is the result of the earlier adaptation of the courses to the European Higher Education Area (EHEA) guidelines and principles [2], in line with the national agencies that produce the procedures, recommendations, guidelines, and support documents needed to implement them, such as ANECA (National Agency for Quality Assessment and Accreditation, in its Spanish acronym) in Spain [3]. The purely final nature of the evaluation process was abandoned in favor of a new learning-oriented approach, in which feedback—which contributes to the continuous improvement of learning—gains prominence [4–6].

Carless et al. [7] established a conceptual framework for learning-oriented assessment. Carless [8] developed the concept of learning-oriented assessment itself through three characteristic elements: (1) assessment tasks designed to stimulate sound learning practices among students; (2) the involvement of students in the assessment process, as exemplified by the development of evaluative skills; and (3) timely feedback that feeds forward by prompting student engagement and action. This learning-oriented assessment, with its three main characteristic elements, is usually implemented in programming courses in the form of several programming assignments, along with a final face-to-face written exam in which students have to answer several theoretical and practical questions, although there is often a mid-term face-to-face exam as well [9,10]. An important distinction to highlight is that the programming assignments are used for formative assessment, and the written exams are used for summative assessment. Taras [11] analyzed both types of assessment and their characteristics, and concluded that formative assessment is in fact summative assessment combined with feedback that can be used by the learner. Although this is the main difference between summative and formative assessment, other authors—such as Harlen and James [12]—characterize both types in depth and enumerate many other differences, concluding that, although they have separate functions, they can be used together, complementing each other. Carless states that learning-oriented assessment can be achieved through either formative or summative assessment, as long as the central focus is on engineering appropriate student learning [8].

With the cancellation of classes due to COVID-19, in Spain, the Network of University Quality Agencies (REACU, in its Spanish acronym), together with ANECA, made public an agreement in which universities were requested to adopt evaluation methodologies that made the best possible use of the resources at their disposal, aligning themselves with the quality standards in force in the European Higher Education Area (EHEA), so that the following general criteria were met [13,14]:

(a) the use of different assessment methods, based on continuous assessment techniques and individual tests;
(b) these methods must enable the evaluation of the acquisition of the competences and learning outcomes of the subjects;
(c) the criteria and methods of evaluation, as well as the criteria for grading, should be made public well in advance and included in the subject teaching guides as addenda.

In the context of programming courses, meeting the aforementioned criteria required programming assignments and exams to be transformed to a fully online format.
Programming assignments present several benefits for teachers and students. For instance, they can help transfer theoretical knowledge into practical programming skills and enhance student programming skills [10]. To fully realize these benefits, assessment systems for programming assignments should be used in order to grade the assignments and provide feedback to the students. These systems can be classified in the first place by assessment type. The first type is manual systems, such as the system described in [15], which assists the instructor in assessing students’ assignments, but the assessment itself is performed manually by the instructor. Then, there are automated assessment systems (for example [16]), which assess students’ solutions automatically. Finally, semi-automated systems, such as [17], assess students’ assignments automatically, but also require that the instructor perform additional manual inspections. Automated assessment systems can also be classified according to the strategy by which the assessment process is triggered. They can be student-centered, instructor-centered, or hybrid, the latter being a strategy aimed at exploiting the strengths of both instructor-centered and student-centered approaches [18]. Overall, automated assessment systems have the potential to facilitate the transformation of programming assignments into an online format, especially in scenarios with a high student–teacher ratio, with the support of other tools within a course LMS, such as the forum, notifications, or direct messages among teachers and students.

The face-to-face written exams also had to be transformed into an online format with the help of the available tools that each higher education institution provided, although, in many cases, these were limited to what the LMS allowed, due to the urgency of the transformation and the lack of time to acquire new tools and train the teachers on how to use them.

Being a central part of the learning process, assessment is an important influence on students’ learning and how they approach it. Entwistle [19] found that students’ perception of the learning environment determines how they learn, and not necessarily the educational context in itself. According to Struyven, Dochy and Janssens [20], students approach learning differently depending on their perceptions of the evaluation and assessment, varying among a deep approach (an active conceptual analysis that generally results in a deep level of understanding), a surface approach (an intention to complete the learning task with little personal engagement, often associated with routine and unreflective memorization), and a strategic or achieving approach (an intention to achieve the highest possible grades by using well-organized and conscientious study methods and effective time management).

The recent COVID-19 pandemic was very sudden, and led to a paradigm shift in teaching and learning. Agencies have reacted quickly, creating generic recommendations and guidelines such as the ones mentioned earlier, but further research is needed to successfully and efficiently adapt specific courses to the new requirements.
Addressing this research gap is especially urgent in contexts and scenarios in which the adaptation is not straightforward, such as in courses with high student–teacher ratios or in practical courses where the assessment tools or methods used in face-to-face learning are not ready to be adopted within a fully online environment.

This article presents a case study describing the transformation of the assessment method of a programming course in higher education to a fully online format during the COVID-19 pandemic by means of an automated student-centered assessment tool. To evaluate the new assessment method, we studied students’ interactions with the tool, as well as students’ perceptions (meaning “the way that someone thinks and feels about a company, product, service, etc.” [21]), measured with two different surveys: one for the programming assignments and one for the final exam. The results obtained have allowed us to analyze the sustainability of the newly developed assessment method and how it should be improved for future editions of the same course and related ones, thus filling the research gap identified earlier.

The rest of the article is organized as follows. Existing literature on automated assessment systems is reviewed in the next section. Section 3 explains the student assessment method followed in the case study presented, and its evaluation. Then, Section 4 shows and discusses the results obtained from said evaluation. Lastly, Section 5 presents the conclusions of the article along with an outlook on future work, and Section 6 presents the limitations of the case study.

2. Related Work

Quite a number of literature reviews on automated assessment systems for programming assignments have been published over the past years [18,22–29]. These literature reviews have classified these systems according to different aspects, such as epoch [22], assessment type (automated, semi-automated, or manual) [18], analysis approach (static, dynamic, or hybrid) [27,28], assessment process triggering (student-centered, instructor-centered, or hybrid) [18], and purpose (competitions, quizzes, software testing, or non-specialized) [18]. Moreover, these literature reviews have analyzed the features offered by automated assessment systems [18,23–27]. In this regard, it is worth pointing out that, generally, these systems provide electronic submission, automated and often immediate feedback, automated grading, and statistics reporting. Pieterse [29] investigated the factors that contribute to the successful application of automated assessment systems for programming assignments, concluding that these factors include the quality and clarity of the assignments, well-chosen test data, useful feedback, the testing maturity of students, the possibility of performing unlimited submissions, and additional support. In total, more than 100 automated assessment systems for programming assignments have been reported in the literature. Among the most popular of these systems are Mooshak [30], DOMjudge [31], CourseMarker [9], BOSS [32], WebWork [33], and Automata [34]. Automated assessment systems for programming assignments help teachers to evaluate programs written by students, and provide them with timely feedback [35]. One of the main reasons for introducing these systems in programming courses is their capacity to dramatically reduce teachers’ workloads. In this regard, Bai [36] shows that these systems allow teachers to save time and, at the same time, provide quicker feedback and deliver more assignments.
It must also be mentioned that these systems require a more careful pedagogical design of the student programming assignments on the part of the instructors [22,23,29,37], and that they can change how students approach these assignments [24].

Automated assessment systems are usually classified as formative assessment tools, as the feedback these tools provide usually consists of “information communicated to the learner with the intention to modify his or her thinking or behavior for the purpose of improving learning” [38], but sometimes, these tools can be considered summative assessment tools if they only provide grades or percentages. Most of the time, the feedback provided is a configuration option that the instructor has to provide when creating the assignments. Keuning et al. [26] conducted a systematic review of automated assessment tools with a special focus on the feedback generated.

Regarding the reliability and validity of these systems, these have been studied by comparing manually graded assignments with the grades generated by the systems. Gaudencio, Dantas and Guerrero [39] reported that instructors who manually graded assignments tended to agree more often with the grades calculated by an automated assessment system (75–97%) than with the ones provided by other instructors (62–95%). Moreover, a number of authors [9,40,41] have also reported on the grading consistency rates between automated systems and instructors, highlighting the reliability and lack of subjectivity that these systems present.

In addition to the benefits that automated assessment systems can provide for teachers, these systems can yield important benefits for students as well. Several works have evaluated the use of automated assessment systems for student programming assignments in the context of programming courses, concluding that this kind of system is capable of producing positive effects on both students’ perceptions [37,42–48] and performance [47–49]. However, experiences have also been reported in which the use of these systems did not produce significant positive results. For example, Rubio-Sánchez et al. [50] evaluated Mooshak, and concluded that the generated feedback needs to be richer in order to improve student acceptance, and that there was no evidence to claim that its use helped to decrease the dropout rate. In this regard, it should be taken into account that the effectiveness of an automated assessment system in a programming course ultimately relies on how it is used and integrated into the course [29]. Therefore, it becomes clear that the teaching methodology adopted in a programming course plays a crucial role in the successful application of an automated assessment system. Evidence of this fact is that the same automated assessment system can succeed in reducing the dropout rates in a programming course [49], but fail to do so in another programming course with a different teaching methodology [50]. Although examining the effect of combining automated assessment systems for programming assignments with different teaching methodologies would be a valuable contribution, no research work has addressed this research issue yet. Moreover, no study has yet examined the use of these systems for providing student assessment methods in programming courses following the teaching methodologies adopted in response to the COVID-19 pandemic.

3. Description of the Case Study

The case study presented in this article follows the research design described by Yin [51].
Yin defines a case study as “an empirical inquiry that investigates a contemporary phenomenon (the ‘case’) in depth and within its real-world context”. The theoretical framework of this study was introduced in the first two sections of this article. It is based on educational assessment, specifically learning-oriented assessment (either summative or formative), and on automated assessment tools. The research gap identified was also stated as how to successfully address the adaptation of specific practical courses with high student–teacher ratios to the new requirements that the COVID-19 pandemic imposes. The real-world context in this case study is a programming course in a higher education institution in Spain that had to be urgently adapted to a fully online format due to the pandemic. The contribution of this case study is two-fold: first, it illustrates how the evaluation of a programming course in a higher education institution can be successfully transformed to the new requirements by means of a student-centered automated assessment tool, and second, it analyzes students’ perceptions of the use of such a tool in the evaluation of the course and their interactions with it.

The rest of this section describes the course context as it was before the pandemic, how it was transformed due to the new requirements, and the automated student-centered assessment tool that was used for this transformation. Lastly, the instruments used to collect students’ interactions with the automated assessment tool and the students’ opinions on its use are detailed.

3.1. Course Context (Pre-COVID-19)

The programming course analyzed in this study is part of the Bachelor’s Degree in Telecommunications Engineering at UPM (Universidad Politécnica de Madrid). It is a third-year course that accounts for 4.5 ECTS (European Credit Transfer System) credits, which is equivalent to 115–135 h of student work. In this course, the students learn the basics of web development, including HTML (Hypertext Markup Language), CSS (Cascading Style Sheets), JavaScript, and more advanced technologies, such as Node.js, Express, and SQL (Structured Query Language). The course follows the AMMIL (Active Meaningful Micro Inductive Learning) methodology [52]; hence, the complete program is recorded in video as micro-lessons.

There are nine programming assignments delivered to students through the Moodle platform used in the course. Students are required to submit all of the assignments, which account for 30% of the final grade. The remaining 70% corresponds to two written exams: one midterm and one final exam, each accounting for 35% of the final grade. In order to pass the course, students need to achieve a grade greater than or equal to 4 out of 10 in the exams and obtain a grade of at least 5 out of 10 in the course final grade.

There were two main reasons for introducing an automated student-centered assessment tool in this course. In the first place, the course has a high student–teacher ratio, because it is a core course and all the students pursuing the Telecommunications Engineering degree have to pass it (there are 312 students and seven teachers in total).
In the second place, it is very common for students not to have strong programming skills at this stage of their studies, and many find this course more difficult than others. This latter fact has three implications. In the first place, the number of programming assignments should be high (multiple and frequent small assignments instead of one final project). In the second place, students need as much detailed feedback and help as possible, which has been shown to be an additional motivating factor [53] and can improve the students’ learning experience [37]. Finally, this feedback should be provided frequently and in a timely manner, even immediately (after each execution of the tool) if possible, allowing students to continue working on the assignments knowing how well they are performing and to learn from their mistakes; this has been widely studied as a way to improve student performance and promote learning [54–56]. Summing up, a high student–teacher ratio, combined with a high number of assignments and the suitability of immediate feedback, makes an automated student-centered assessment tool a perfect fit for this kind of course. Given that the feedback provided is a central piece of an automated assessment tool, three teachers were in charge of designing it after reviewing the literature, studying successful case studies and inspecting similar tools.

Although this programming course has a Moodle platform for distributing the didactic materials and relies on an automated student-centered assessment tool for the programming assignments, it also has a high face-to-face load. The teachers dedicate half of a 50 min session to explaining each assignment in detail, and one or two extra sessions to solving each of them step by step. In addition, face-to-face tutorials are frequent in this subject, in order to solve doubts and guide students in their work, as they do not have strong programming skills.

3.2. Assessment Transformation

The midterm exam was scheduled for the 16th of March, and the disruption of face-to-face activities due to the COVID-19 pandemic took place on the 11th of March. Hence, following the guidelines provided by the head of studies, the midterm exam was canceled and, consequently, the final exam was worth 70% of the grade. In order to pass the course, the students needed to pass the exam (by achieving a grade greater than or equal to 5 out of 10) and obtain a grade of at least 5 out of 10 in the course’s final grade, which was calculated as the weighted sum of the exam and assignments’ scores.

The COVID-19 pandemic not only led to the cancelation of the midterm exam, but also to the need to transform the face-to-face lessons, the programming assignments and the final exam into a fully online format. Fortunately, as the complete program was recorded in video as micro-lessons, only some lessons and tutorials had to be conducted via videoconferencing tools. The programming assignments were planned to make use of the automated student-centered assessment tool from the beginning of the course, as explained in the previous section, but as a complementary tool with the support of the face-to-face sessions (in which the assignments were solved step by step) and tutorials.
Although the automated assessment tool was already online-based, in order to meet the new requirements and play a central role in the course assignments, the last four programming assignments that remained when the COVID-19 situation emerged had to be adapted: the problem statements were explained in more detail, and more elaborate feedback was given to students. The tool also had to be complemented with the LMS forums and videoconference sessions to substitute for face-to-face assistance. To sum things up, the automated student-centered assessment tool went from being a complementary tool in the programming assignments to being the primary one.

On the other hand, the final exam was a major challenge, since this programming course is a very practical one, in which the skills and learning outcomes established are impossible to measure with just an online test. Additionally, students’ perceptions of the assessment characteristics play a positive role in their learning, resulting in deeper learning and improved learning outcomes. Two characteristics have a special influence: authenticity [57,58] and feedback [59,60]. Hence, the teaching staff of the course decided to divide the final exam into two parts: the first part, intended to measure the more theoretical concepts, was a multiple-choice test with 30 questions to be completed in 30 min, whereas the second part of the exam, intended to measure practical programming skills, was a 40-min programming assignment that made use of the same automated assessment tool used in the assignments of the course. The exam had multiple slightly different variants of a similar level of difficulty. Upon launching the assessment tool for the first time, each user was assigned a version of the exam based on their credentials. Since this procedure slightly differed from previous exercises, a mock exam was published days before the actual exam in order to help the students understand how the exam was going to be performed, and to help them practice generating the problem statement and turning it in. Each part of the final exam accounted for 50% of the final exam grade, and each part had to be passed with at least 4 out of 10 points.

To guarantee the validity of the assessment method, all of the changes planned were included in the subject teaching guides as addenda, following the strategy made public by ANECA [14], and were notified to the students well in advance of the final exam.
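To make the adapted weighting scheme easier to follow, the calculation it implies can be written down explicitly. The snippet below is only an illustration of the rules stated in this section (the final exam is worth 70% and is split evenly between the multiple-choice test and the programming part, the assignments are worth 30%, each exam part requires at least 4 out of 10, and both the exam and the final grade require at least 5 out of 10); the function and parameter names are ours and do not come from the course materials or from autoCOREctor.

```javascript
// Illustrative sketch of the adapted grading scheme; all marks are on a 0-10 scale.
function finalGrade({ testPart, programmingPart, assignments }) {
  // The final exam is split evenly between the multiple-choice test and
  // the programming part solved with the automated assessment tool.
  const exam = 0.5 * testPart + 0.5 * programmingPart;

  // Final grade: 70% final exam, 30% programming assignments.
  const grade = 0.7 * exam + 0.3 * assignments;

  // Passing requires at least 4/10 in each exam part, 5/10 in the exam,
  // and 5/10 in the final grade.
  const pass = testPart >= 4 && programmingPart >= 4 && exam >= 5 && grade >= 5;
  return { exam, grade, pass };
}

console.log(finalGrade({ testPart: 6, programmingPart: 7, assignments: 8 }));
// -> exam: 6.5, grade ≈ 6.95, pass: true (exact value subject to floating-point rounding)
```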
3.3. Description of the Assessment Tool

This section describes the student-centered automated assessment tool used both in the programming assignments of the course and in the second part of the final exam. There is a wide set of assessment tools for programming assignments available [18,22,24]. However, these tools are rarely used beyond the institutions in which they were created, because they are difficult to adapt and extend to fit new courses [18,61]. The case study presented in this paper is another example of the latter, as the course staff had recently developed the automated student-centered assessment tool for the programming assignments, so it was easily extended and adapted to support the new requirements imposed by the COVID-19 pandemic and the interruption of the face-to-face activities.

The automated student-centered assessment tool used in this case study is called autoCOREctor, and it consists of a client and a server. The tool was designed to be easily integrated with Learning Management Systems (LMSs) and Version Control Platforms (VCPs). The LMS was used to store students’ submissions and grades, which is essential for legal purposes, because learning evidence must be stored and kept in the course LMS. The VCP was employed to facilitate the management of the assignments’ problem statements and to check the test suites’ integrity to avoid cheating. The main features of autoCOREctor are the following:

• Student-centered: students start the assessment process. They can view the assignment specification, and develop and submit a solution for it. For each submission, the tool assesses the student’s solution, considering the assessment parameters provided by the instructor. After the assessment process is completed, both the instructor and the students have access to the assessment results.
• Tests are run locally on the student’s computer: students can work offline, and the score they obtain locally is later uploaded to the LMS once they decide to submit it, turning it into their assignment grade.
• Unlimited number of local test runs: students can run autoCOREctor as many times as they wish, getting immediate feedback about their work, as well as their current grade.
• Unlimited number of submissions: students can submit their score and the solution of the assignment to the LMS as many times as they want in the timeframe determined by the instructor.
• Penalty for late submissions: instructors can configure a grace period in which students can submit their assignments with a penalty in their grade (an illustrative sketch of this kind of policy is given after this list).
• The autoCOREctor client is distributed as an NPM (Node Package Manager) package uploaded to the official NPM repository. This feature facilitates the tool’s installation and its update in case a new version is released by the teaching staff. The autoCOREctor client is therefore a CLI (Command Line Interface) tool that can work on any operating system—Linux, Windows or Mac—indifferently. Also, if any student has any problem with the installation, the client can be used from free online services that include a terminal, such as Glitch [62].
• Thorough documentation for instructors and students: documentation is made available to students in the LMS, and the available options can be listed through a shell command of the autoCOREctor client.
• Integrity check of the test suite: autoCOREctor checks that the test suite that is run to obtain the feedback and grade is the one that the instructor developed.
• Learning analytics: the autoCOREctor client logs all of the interactions that students have with it as well as the evolution of the resulting grades, uploading all this information to the autoCOREctor server for further analysis whenever students make a submission.
• Learning analytics dashboards: the autoCOREctor server generates interactive graphs of the evolution of grades per assignment and student.
• Generation of a scaffold for assignments’ problem statements and a basic test suite with examples, to facilitate its use by instructors in different contexts.
• Possibility to define test cases and to specify the generated feedback, as well as the way in which the grades of the assignments are calculated, in multiple programming languages.
• Secure communications: all of the information that autoCOREctor sends uses secure sockets. Additionally, all of the assignments, logs, and grades are encrypted before being sent.
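As an illustration of the grace-period feature listed above, the sketch below shows one way a late-submission penalty could be applied. The article does not specify the exact policy implemented in autoCOREctor, so the formula, the parameter names and the per-hour deduction used here are assumptions made purely for illustration.

```javascript
// Hypothetical late-submission penalty; autoCOREctor's actual policy and
// configuration options may differ.
function applyLatePenalty(score, submittedAt, { deadline, graceHours = 24, deductionPerHour = 0.5 }) {
  const hoursLate = (submittedAt - deadline) / (1000 * 60 * 60);
  if (hoursLate <= 0) return score;      // submitted on time: no penalty
  if (hoursLate > graceHours) return 0;  // outside the grace period: no credit
  // Inside the grace period: deduct a fixed amount per (started) hour of delay.
  return Math.max(0, score - deductionPerHour * Math.ceil(hoursLate));
}

const deadline = new Date('2020-05-04T23:59:00Z');
const submittedAt = new Date('2020-05-05T02:30:00Z'); // about 2.5 h late
console.log(applyLatePenalty(8.5, submittedAt, { deadline })); // -> 7
```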
The following paragraphs explain the generic design of the autoCOREctor tool and then the specific deployment we have performed for this case study.

AutoCOREctor consists of a web server with a connector to send requests to the LMS, a connector to communicate with the VCP, and a client program that is executed on the students’ computers. Figure 1 shows the architecture of the system from the point of view of a teacher. The details of the functionalities of each component and the interaction flow between them are as follows:

1. The teacher creates an empty repository for the assignment in the Version Control Platform.
2. The teacher creates and configures the assignment for the corresponding course in the LMS.
3. The third step is to obtain an assignment template to implement the test suite and the assignment problem statement. For this, the teacher accesses the autoCOREctor web server interface, where information about courses and assignments created by the teacher is provided. This information is retrieved by the LMS Connector which, depending on the LMS and the authentication and authorization mechanisms available, gathers it directly through the LMS API (Application Programming Interface) or using a delegated authorization mechanism such as OAuth or OpenID Connect (Step 3a). Once this is achieved, the teacher can link the repository created in Step 1 to the specific assignment of the LMS (Step 2). This link is stored in the server database, so no further interactions are needed with the LMS. As a result of this link, the server creates a template of the assignment containing the needed metadata of both the repository and the assignment, to be used later when the students submit their results. Finally, the teacher downloads this template.
4. Using the downloaded template, the teacher creates the assignment’s problem statement and the test suite that will be run during its development by the students. Depending on the characteristics of each course, the tests will be made using a specific testing framework (a sketch of such a test suite is given after this list). The downloaded template contains the software libraries needed to develop such tests.
5. When the tests and the assignment problem statement are ready, the teacher uploads them to the VCP, making them available to students. If the teacher eventually fixes mistakes in the assignment or wants to create new versions, he/she only has to upload the changes to the VCP and, if the assignment is open and the students have already started working on it, notify the students to update the assignment with one Version Control System command.

Figure 1. Automated assessment tool architecture from the point of view of the teacher.
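To make Step 4 more concrete, the following sketch shows the kind of test suite a teacher could write for a simple assignment, using the Mocha and Chai frameworks that, as detailed later in this section, are the ones employed in this course. The tested module, the function name and the point values attached to each test are illustrative assumptions, not actual course material.

```javascript
// Hypothetical teacher-written test suite (Mocha + Chai).
// The student's solution is assumed to export a function called average().
const { expect } = require('chai');
const { average } = require('../solution/stats');

describe('Assignment 5: statistics module', function () {
  it('computes the average of a list of numbers (1 point)', function () {
    expect(average([2, 4, 6])).to.equal(4);
  });

  it('returns 0 for an empty list (0.5 points)', function () {
    expect(average([])).to.equal(0);
  });

  it('ignores non-numeric values (0.5 points)', function () {
    expect(average([1, 'a', 3])).to.equal(2);
  });
});
```

In autoCOREctor, how each passing or failing test is translated into feedback and into the assignment grade is configured by the teacher, as noted in the feature list above.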
On the other hand, Figure 2 shows the architecture from the point of view of the students. The details of the functionalities of each component and the interaction flow between them are as follows:

1. Students download an assignment from the Version Control Platform after obtaining its specific repository link from the LMS assignment. It contains the problem statement of the assignment, the template with the source code to develop it, and the test suite that the teacher has defined.
2. Before starting to develop the assignment’s solution, the students have to download an authentication token from the LMS. This token will be used by the LMS Connector in Step 5 to submit the students’ results to the LMS.
3. While developing the assignment, students can use the autoCOREctor client to run the tests defined by the teacher as many times as they want, receiving immediate feedback and a score each time. Every time the assessment tool client is executed, it checks whether the assignment has been updated in the Version Control Platform (Step 3a). To do this, the client has to be adapted to use the specific API of the Version Control Platform. Moreover, the client saves a history file with the evolution of the scores achieved by the student.
4. Students use this same client to submit their results to the autoCOREctor server. During this process, the server checks that the version of the tests the student is uploading is updated with the latest version at the VCP (Step 4a). Again, the VCP Connector has to be adapted to the API of the Version Control Platform. Moreover, in order to ensure the integrity and authenticity of the results, the autoCOREctor client encrypts and signs the information containing such results with a key shared with the server (sketched below). Thus, when receiving the information, the server is in charge of decrypting the data and validating the signature. Furthermore, the server checks that the tests defined by the teacher have not been modified by comparing a hash code, sent by the client together with the score, with the same hash created from the version available in the VCP. This submission can be performed as many times as the student wants while the assignment is open, keeping the last result uploaded.
5. Finally, the server stores the file of the assignment solution, the results, and the history in its database and sends the score to the LMS using the LMS Connector. If a penalty for late submissions has been configured by the teacher in the assignment, the score is adapted accordingly if necessary.
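The integrity and authenticity checks mentioned in Step 4 can be illustrated with Node's standard crypto module. The sketch below is a simplification based on the description above, assuming a shared secret between client and server and a plain SHA-256 hash of the test suite; autoCOREctor's actual key handling, algorithms and payload layout are not detailed in this article.

```javascript
// Simplified illustration of the client-side integrity and authenticity step.
// 'sharedSecret' is assumed to be provisioned out of band; real autoCOREctor
// internals may differ.
const crypto = require('crypto');
const fs = require('fs');

// Hash of the test suite that was actually executed, so the server can
// compare it with the version published in the Version Control Platform.
function testSuiteHash(path) {
  return crypto.createHash('sha256').update(fs.readFileSync(path)).digest('hex');
}

// Signature over the results payload, so the server can verify that the
// submission comes from the client and has not been tampered with.
function signResults(results, sharedSecret) {
  const body = JSON.stringify(results);
  const signature = crypto.createHmac('sha256', sharedSecret).update(body).digest('hex');
  return { body, signature };
}

const results = { score: 8.5, testsHash: testSuiteHash('./tests/assignment5.test.js') };
console.log(signResults(results, process.env.SHARED_SECRET || 'demo-secret'));
```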
Figure 2. Automated assessment tool architecture from the point of view of the students.

Some important aspects should be commented on, relating to the reliability and validity of the assessment process. Regarding reliability, autoCOREctor performs an integrity check of the test suites that it uses, so all of the students are assessed with the same test suite. It also signs and encrypts everything that is sent, to avoid cheating. With respect to validity, before starting its use in the course, it was validated among the members of the course staff; some of them completed the assignments and some corrected the solutions, verifying that the grades were very similar. The same procedure was followed with random submissions of the final exam, verifying again that the scores reported by autoCOREctor were similar to those given by the course teachers. This verification was very informal, and since further research is needed on the validity of the grades generated by automated assessment systems, it constitutes an interesting line of work that will be addressed in the near future.

In our case study, we used GitHub [63] as the VCP and Moodle as the LMS. Thus, teachers create new GitHub repositories for each assignment and receive a URL that is registered in the autoCOREctor server and linked to the assignment created in Moodle. In order to retrieve the courses and assignments created by a teacher, we use the Moodle API [64], authenticating teachers by means of their username and password. As the course in which we evaluated the platform is about Node.js technology, the tool creates assignment templates with a package.json file that contains metadata about the LMS course and assignment.

In the case of this course, the tests are written by the teacher using the Mocha [65] and Zombie [66] frameworks, and the Chai library [67]. The students also need to download an authentication token from Moodle that is later used by the server to upload their scores. This token is included, together with the student’s email, the score, the test version, the hash for checking the integrity of the tests, and the signature, in a JSON (JavaScript Object Notation) file created by the client.
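As an illustration of that JSON file, the sketch below shows what such a submission payload could look like. Only the list of contents (token, email, score, test version, integrity hash and signature) comes from the description above; the field names, formats and example values are assumptions.

```javascript
// Hypothetical example of the submission payload described above; field
// names and values are illustrative.
const submission = {
  token: 'f3a9c1...',                  // authentication token downloaded from Moodle
  email: 'student@example.com',        // student identifier in the LMS
  score: 8.5,                          // score computed by the local test run
  testVersion: 'assignment-5@1.2.0',   // version of the test suite that was executed
  testsHash: '9b74c9897bac...',        // hash used to verify the test suite's integrity
  signature: '2c26b46b68ff...'         // signature computed over the payload
};

console.log(JSON.stringify(submission, null, 2));
```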
3.4. Data Collection Instruments

Three instruments were used in this study in order to evaluate the students’ perceptions and their interactions with autoCOREctor: (1) a survey to collect students’ opinions on the use of autoCOREctor in the assignments, (2) another survey to collect students’ opinions on the use of this tool in the final exam, and (3) the tool itself, which automatically records data on students’ interactions with it, in order to obtain information on students’ usage of the tool.

In order to collect students’ perceptions on using autoCOREctor for the programming assignments in the course, a survey was conducted after the termination of the last assignment of the course. This survey was designed by the authors of this article, and was validated by three faculty members. It included some initial demographic questions, a set of closed-ended questions addressing students’ general opinion and acceptance of the tool, and a list of statements with which they needed to agree or disagree using a 5-point Likert scale. These questions were aimed at assessing students’ attitudes towards the use of autoCOREctor, students’ thoughts on its usability and usefulness as an assessment method, their perceptions of the feedback and grades received, their opinions on the main features of the tool, and whether they prefer it over other assessment methods. At the end of the survey, there was a space in which the students could leave suggestions, complaints, and other comments.

Moreover, with the aim of collecting students’ perceptions toward using autoCOREctor for the final exam of the course, a survey was conducted after the exam. This survey was also designed by the authors of this article, and was validated by three faculty members. It included some initial demographic questions, a set of closed-ended questions addressing students’ general opinions and acceptance of the activity, and a list of statements with which they needed to agree or disagree using a 5-point Likert scale. These questions were similar to the ones posed in the first survey, but they were aimed at assessing students’ attitudes towards the use of the automated assessment tool in the specific context of the exam. Once again, at the end of the survey, there was a space in which the students could leave suggestions, complaints, and other comments.

Lastly, autoCOREctor automatically recorded data on student interactions with it, both for the assignments and the final exam. Specifically, the following data were collected for each of the nine assignments and for the exam: the number of times each student ran the tests locally, the number of times each student submitted their solution to the LMS, and the score they obtained in each execution of the test suite.

3.5. Data Analysis

The data gathered by the three instruments mentioned earlier were processed using Excel and Python, along with the NumPy, SciPy and Pandas software packages. A descriptive quantitative analysis was performed on the resulting datasets.

The survey data were analyzed using the mean (M), standard deviation (SD), and median absolute deviation (MAD), as well as the median (MED) value, which is more representative of the central tendency in scaled non-normal distributions such as those of Likert-type variables. In addition, Cronbach’s alpha (α) was calculated in order to assess the internal consistency of both surveys, confirming their reliability at α > 0.9 for both of them. Furthermore, in view of the reduced number of responses to the open-ended questions, only an informal qualitative analysis was performed on these.
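For reference, the two less commonly reported statistics used here are defined as follows; these are the standard definitions, not formulas specific to this study.

```latex
% Median absolute deviation of responses x_1, ..., x_n
\mathrm{MAD} = \operatorname{median}_i \bigl( \lvert x_i - \operatorname{median}_j(x_j) \rvert \bigr)

% Cronbach's alpha for a survey with k items, where \sigma^2_{Y_i} is the
% variance of item i and \sigma^2_X is the variance of the total score
\alpha = \frac{k}{k-1} \left( 1 - \frac{\sum_{i=1}^{k} \sigma^2_{Y_i}}{\sigma^2_X} \right)
```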
The usage data logs collected by the automated assessment tool were pre-processed in order to remove spurious logs, such as the ones generated by the faculty staff when testing the assignments. Then, the data were aggregated by assignment, and by whether the log corresponded to a local use of the tool or a submission. The resulting dataset includes the average number of executions (local runs and submissions) per student and assignment, as well as the total executions per assignment. This information allowed us to analyze the evolution of the usage pattern of autoCOREctor throughout the course, as well as the workload to which autoCOREctor is subjected.

4. Results and Discussion

4.1. Results of the Student Survey on the Use of the Automated Assessment Tool on Programming Assignments

Table 1 shows the results of the student survey conducted after carrying out the programming assignments of the course, including, for each question, the mean (M), median (MED), standard deviation (SD), and median absolute deviation (MAD), along with the number of answers (N). The survey was completed by 85 students (65 men and 20 women), with a median age of 21 (MAD = 1.0). Figure 3 shows the distribution of the students’ responses.

The results of the survey show that the students had a positive overall opinion of the use of the autoCOREctor automated assessment tool in the programming assignments of the course (MED = 4.0, MAD = 1.0). In terms of usability, the students strongly believe autoCOREctor was very easy to install (MED = 5.0, MAD = 0.0) and to use (MED = 5.0, MAD = 0.0). These aspects are of special relevance, since many students find programming difficult to learn, as evidenced by the high failure rates that programming courses usually have [68]; thus, using the assessment tool should not pose yet another difficulty for them.

Students’ opinions diverged when asked whether they agreed with the survey statements that the feedback provided by autoCOREctor was useful (MED = 3.0, MAD = 1.0), that it was easy to understand (MED = 4.0, MAD = 1.0), and that it helped them to improve their assignments (MED = 3.0, MAD = 1.0). One possible reason why not all of the students found the feedback provided by autoCOREctor useful is that it merely pointed out the errors found in their solutions, but did not tell them how to fix them, which students would have found to be of more utility, despite its being detrimental to their learning. However, the most likely reason why they did not find the feedback provided by the tool extremely useful is that students often make syntax mistakes in their code, which can prevent autoCOREctor from functioning as expected and cause it to throw default error messages that can be somewhat cryptic, instead of displaying the ones provided by the teachers for each specific aspect of the assignment that is not working properly. In this regard, previous studies have also found that students have difficulties understanding the feedback provided by automated assessment systems [45], and some of them concluded that the quality of the feedback needs to be improved in order to increase student acceptance [44,50].
Nonetheless, in this study, most students were very confident that they would rather receive the feedback provided by autoCOREctor than no feedback at all (MED = 5.0, MAD = 0.0), and that, if using the tool had been optional, the majority of them would have chosen to use it without a doubt (MED = 5.0, MAD = 0.0), providing more evidence that, overall, autoCOREctor was useful for students. In fact, most of them agreed that autoCOREctor helped them discover errors in their assignments that they did not know they had (MED = 4.0, MAD = 1.0), which, without autoCOREctor, would only have been possible through manual teacher assessment and, consequently, infeasible in such a crowded course.

Table 1. Results of the student survey on the use of autoCOREctor on programming assignments.

Question | N | M | MED | SD | MAD
Q1. What is your general opinion on autoCOREctor on a scale of 1 (Very bad) to 5 (Very good)? | 85 | 3.9 | 4.0 | 0.9 | 1.0
State Your Level of Agreement with the Following Statements on autoCOREctor on a Scale of 1 (Strongly Disagree) to 5 (Strongly Agree)
Q2. autoCOREctor was easy to install | 85 | 4.5 | 5.0 | 0.9 | 0.0
Q3. autoCOREctor was easy to use | 85 | 4.3 | 5.0 | 1.0 | 0.0
Q4. The feedback provided by autoCOREctor was useful | 85 | 3.3 | 3.0 | 1.1 | 1.0
Q5. The feedback provided by autoCOREctor was easy to understand | 85 | 3.3 | 4.0 | 1.1 | 1.0
Q6. The feedback provided by autoCOREctor helped me improve my assignments | 85 | 3.5 | 3.0 | 1.1 | 1.0
Q7. I’d rather receive the feedback provided by autoCOREctor than no feedback whatsoever | 85 | 4.6 | 5.0 | 0.9 | 0.0
Q8. If using autoCOREctor had been optional, I would have chosen to use it without a doubt | 85 | 4.6 | 5.0 | 0.9 | 0.0
Q9. autoCOREctor has helped me discover errors in my assignments that I did not know I had | 85 | 3.8 | 4.0 | 1.1 | 1.0
Q10. Having to pass the autoCOREctor tests for each assignment has made me spend more time in the assignments than I would have if I had not used it | 79 | 2.9 | 3.0 | 1.4 | 1.0
Q11. autoCOREctor has increased my motivation to work on the assignments | 81 | 3.5 | 4.0 | 1.0 | 1.0
Q12. Being able to run the autoCOREctor tests repeated times with no penalty and obtaining instant feedback has made me invest more time in the course assignments | 83 | 4.3 | 5.0 | 1.0 | 0.0
Q13. The grades provided by autoCOREctor were fair | 85 | 4.3 | 5.0 | 1.0 | 0.0
Q14. Thanks to autoCOREctor I think I got a better grade in the assignments than I would have without it | 85 | 4.6 | 5.0 | 1.0 | 0.0
Q15. In general, using autoCOREctor has improved my programming knowledge | 84 | 3.9 | 4.0 | 1.1 | 1.0
Q16. I think by using autoCOREctor I have improved my programming knowledge more than I would have with a manual assessment | 84 | 4.0 | 4.0 | 1.1 | 1.0
Q17. In general, I believe the use of automated assessment tools such as autoCOREctor improves the evaluation process of the course assignments when compared to the classical manual procedure | 85 | 4.6 | 5.0 | 0.9 | 0.0
Q18. I would like to have tools like autoCOREctor in other courses | 84 | 4.7 | 5.0 | 0.8 | 0.0
Indicate How Useful You Find Each of the Following Features of autoCOREctor on a Scale of 1 (Useless) to 5 (Very Useful)
Q19. It allows to run the tests an unlimited number of times | 84 | 4.9 | 5.0 | 0.5 | 0.0
Q20. It has a command to directly upload the assignment to Moodle | 84 | 4.7 | 5.0 | 0.8 | 0.0
Q21. It allows to run the tests locally on the student’s computer | 83 | 4.6 | 5.0 | 0.8 | 0.0
Q22. It provides instant feedback each time the tests are run | 84 | 4.4 | 5.0 | 1.0 | 0.0
Q23. It provides documentation to learn how to use it and the options it provides | 77 | 3.9 | 4.0 | 1.0 | 1.0
On another note, the students’ opinions diverged regarding whether having to pass the test suites provided by autoCOREctor for each assignment made them spend more time working on them than they would have without the tool (MED = 3.0, MAD = 1.0). However, they strongly agreed with the fact that being able to run the tests repeatedly and obtaining instant feedback made them invest more time (MED = 5.0, MAD = 0.0). On the one hand, since students know their grade at every step of the way while working on an assignment, they can see how more work translates into a higher grade in real time, encouraging them to keep working and dedicating time to the assignment, and to strive for the best score. In this regard, the students somewhat agreed that autoCOREctor increased their motivation to work on the assignments (MED = 4.0, MAD = 1.0). On the other hand, once they reach the maximum score, they stop working on the assignments and turn them in, whereas, without the automated assessment tool, they keep testing each assignment manually until they are sure it works properly, which takes considerably more time. Hence, since the feedback provided by the tool helps them to identify mistakes, they do not spend as much time testing. Despite the fact that this is generally regarded as a negative aspect of automated assessment tools, the programming course analyzed in this study is focused on implementation and does not cover software testing, so the students do not have the knowledge required to properly develop testing suites for their code anyway. Thus, besides manually testing their solutions, autoCOREctor is the only resource they have to check if their code is correct. In addition, the feedback provided by autoCOREctor does not reveal how to fix mistakes, but instead merely points them out, so they still need to figure out how to fix them by themselves, which also takes considerable time. It should be noted that, even though the students do not know how to perform software tests, they are taught how to debug their code using standard development tools so that they can track down the errors identified by autoCOREctor. In sum, although it is not clear whether using autoCOREctor requires students to invest more or less time in the assignments, the results suggest that they do not spend as much time manually looking for errors as they would without the tool but, thanks to the instantaneous feedback, they spend more time working on the solution of the assignment itself and debugging the errors pointed out by the tool.

Moreover, most of the students strongly believed that the grades provided by autoCOREctor were fair (MED = 5.0, MAD = 0.0). They also very strongly agreed with the statement that—thanks to the automated assessment tool—they got a better grade in the assignments than they would have without it (MED = 5.0, MAD = 0.0), which was an expected outcome, since the instant feedback and unlimited number of attempts allowed them to progressively improve their grades until they were content. On the one hand, since—by using autoCOREctor—they spent more time working on the assignments and obtained a better grade than they would have without the tool, it is not surprising that they found the grades provided by the tool fair.
On the other hand, it could be argued that autoCOREctor is a little too sensitive to typing errors and mishaps since, with just a slight syntax error in the code, the tool would provide a grade of 0, even if all of the features were correctly implemented. However, this limitation did not impact the students’ opinions in this case study. Hence, it is likely that the students’ positive perceptions of the grades provided by autoCOREctor were a result of the unlimited number of attempts; thus, their opinions apply not to the score calculated by autoCOREctor each time they ran the tests, but rather to the grade they ultimately obtained for each assignment in the course, thanks to the unlimited number of attempts and the instant feedback that allowed them to fix the errors and achieve the best possible score. Consequently, the authors believe that, if the number of attempts had been limited, students’ perceptions of grading fairness would have been much more negative.

With regard to self-efficacy (i.e., students’ perceived skills [69]), students reported that they improved their programming knowledge by using autoCOREctor (MED = 4.0, MAD = 1.0), even more than they would have with manual assessment (MED = 4.0, MAD = 1.0). These were the expected outcomes, since autoCOREctor allowed them to receive feedback on their actions right away, helping them learn from their mistakes along the way. In this regard, it is worth mentioning that the feedback that the students receive from autoCOREctor is not only a number, which is usually the case in manual assessment in large-enrollment courses, but rather a message that tells them what is wrong or missing for each part of the assignment, inviting them to fix it. Moreover, the students strongly believe that the use of automated assessment tools, such as autoCOREctor, improves the evaluation process of the course assignments when compared to a manual procedure consisting in manually submitting the code to the LMS and waiting for the teacher to correct it after the deadline (MED = 5.0, MAD = 0.0), which is the usual procedure in most courses. Thus, the fact that the feedback provided by autoCOREctor is instantaneous and actually feed-forward (i.e., feedback that lets students act upon it, giving them a chance to improve their solutions) is regarded highly by the students, since it allowed them to learn from their mistakes, as opposed to traditional feedback, which is commonly not provided in a timely manner, as students usually receive it when they have already forgotten how they reached their solution, and they are often not allowed to improve their work on the basis of the feedback received. In essence, it is safe to conclude that the students are satisfied with the automated assessment tool in this course, and that they would very much like to have similar tools in other courses (MED = 5.0, MAD = 0.0).
Figure 3. Distribution of responses to the student survey on the use of autoCOREctor on programming assignments.

When asked to rate the different features of autoCOREctor, the students showed a very strong preference for the ability to run the tests an unlimited number of times (MED = 5.0, MAD = 0.0), which came as no surprise since, each time the tests were executed, the tool provided them with their score and instantaneous custom feedback, as mentioned earlier. The second most highly rated aspect was the command to directly upload the assignment to the LMS (MED = 5.0, MAD = 0.0). Since this command allowed the students to submit their assignments automatically, without directly interacting with the LMS, it was very convenient for them, and prevented them from making mistakes when packaging and uploading the assignment. The students highly valued being able to run the tests locally on their computers (MED = 5.0, MAD = 0.0), since not having to upload the assignment to a remote server each time they want to receive feedback saves a lot of time and avoids errors derived from running the code in different execution environments. Predictably, the students highly appreciated the instant feedback received each time the tests were run, as discussed earlier (MED = 5.0, MAD = 0.0). Lastly, the students also found the documentation provided to help them install and use autoCOREctor useful (MED = 4.0, MAD = 1.0). Since the tool is easy to install and use, the documentation was not needed in most cases, so it comes as no surprise that some students might not have used it, or found it superfluous. As can be seen in Figure 3, overall, the students' opinions on all of the features of autoCOREctor are very positive, and there are barely any negative responses for any of them.

In the last field of the survey, reserved for complaints, suggestions, and other comments, the students confirmed that autoCOREctor motivated them to keep trying to get a better score and made working on the assignments more fun. Moreover, they also stated that autoCOREctor was useful for finding the mistakes they made along the way. Some students complained about the quality of the feedback received, saying it was too generic. It should be noted that this feedback depends on the specific test suite that is being executed, and is not a limitation of the automated assessment tool itself. Moreover, giving feedback that is too specific, telling students how to fix their mistakes, would prevent them from reaching the solution by themselves. As mentioned, although the feedback provided by autoCOREctor informs students of what is wrong or missing in their assignments, they still need to find which part of their code is erroneous or incomplete, and figure out how to fix it. This is indeed a crucial part of learning programming, since it promotes self-assessment, making students work at the highest level of Bloom's taxonomy, which deals with evaluation [70].
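The article does not detail how autoCOREctor turns local test results into the score shown after each run. Purely as an assumption, the sketch below runs a Mocha [65] test file programmatically and converts the pass/fail counts into a grade out of 10, printing each failed assertion's teacher-written message as the feedback the student would see; the file name and the equal weighting of tests are illustrative, not the tool's actual implementation.

```javascript
// Hypothetical sketch: compute a 0-10 score from a Mocha test file by counting
// passed and failed tests, and surface the teacher-written assertion messages
// as feedback. How autoCOREctor actually weights tests is not described here.
const Mocha = require('mocha');

const mocha = new Mocha({ reporter: 'spec' });
mocha.addFile('./tests/assignment.test.js'); // hypothetical teacher-written suite

let passed = 0;
let failed = 0;

const runner = mocha.run(() => {
  // Called once the whole suite has finished.
  const total = passed + failed;
  const score = total === 0 ? 0 : (10 * passed) / total;
  console.log(`Score: ${score.toFixed(1)} / 10 (${passed}/${total} tests passed)`);
});

runner.on('pass', () => { passed += 1; });
runner.on('fail', (test, err) => {
  failed += 1;
  // The assertion message written by the teacher is what the student reads.
  console.log(`Feedback: ${err.message}`);
});
```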
4.2. Results of the Student Survey on the Use of the Automated Assessment Tool in the Final Exam

The results of the student survey conducted after using autoCOREctor in the final exam of the course are shown in Table 2, including, for each question, the mean (M), median (MED), standard deviation (SD), and median absolute deviation (MAD), along with the number of answers (N). The survey was completed by 45 students (35 men and 10 women), with a median age of 21 (MAD = 1.0). Figure 4 shows the distribution of the students' responses.

According to the students, autoCOREctor was easy to use in the exam (MED = 4.0, MAD = 1.0). In fact, one of the main reasons that encouraged the faculty staff to use autoCOREctor in the exam was precisely that students were already accustomed to this tool, and thus incorporating it into the exam would not bring any additional difficulty to those students who had spent time using autoCOREctor throughout the course. The main difference between the use of autoCOREctor in the assignments and in the exam was that, in the exam, the tool provided a different problem statement for each student from among several options, making it more difficult for them to copy from one another. In this regard, the students stated that generating their problem statement was very easy as well (MED = 5.0, MAD = 0.0).

Regarding feedback, the students' opinions were similar to the ones in the previous survey. It should be mentioned that autoCOREctor was not used in the final exam merely as an instrument to measure students' competence; rather, it allowed the transformation of the exam (traditionally a summative assessment activity) into a learning-oriented assessment activity, since the students were provided with the same type of feedback as in the programming assignments, as well as an unlimited number of attempts to solve the exam. The only additional constraints of using autoCOREctor in the exam compared to its use in the assignments, apart from the different problem statements mentioned earlier, were that the students were given 40 min to complete their task (instead of several weeks), and that they could not speak to each other during the exam. When asked about the feedback provided by autoCOREctor during the exam, most of the students perceived it as useful (MED = 4.0, MAD = 1.0) and easy to understand (MED = 4.0, MAD = 1.0), and stated that it helped them to improve their solution (MED = 4.0, MAD = 1.0), although there were many students who thought otherwise. One possible explanation is that, since the exam was similar to the programming assignments, the students who had reviewed for the exam were confident in their solution, and used autoCOREctor only to verify that they were not forgetting to implement any feature and that they did not have any typos that could cause the autoCOREctor tests to fail. In fact, once again, using autoCOREctor allowed some of them to discover errors in their exam that they did not know they had (MED = 4.0, MAD = 1.0). By finding out the errors and missing features in their solutions before submitting, the students were able to use the knowledge they already had to amend those issues.
This chance to incrementally improve their solutions is usually not given to students in final exams, at least not in paper-based ones. Thus, using autoCOREctor allowed the students not to lose points because of a missing feature or a small mistake, and thus they obtained a grade that was more representative of their actual knowledge, and that reflects their acquired skills better than a classic paper-based exam would. In fact, as far as grades are concerned, the students thought the grades provided by autoCOREctor were fair (MED = 5.0, MAD = 0.0), and they believed that they obtained a somewhat better grade in the practical final exam than if they had not used the tool (MED = 4.0, MAD = 1.0), since they would not have had any sort of feedback during the exam, and would have had to rely solely on their manual tests. That is probably one of the reasons why they stated that, if using autoCOREctor had been optional in this test, most of them would have chosen to use it anyway (MED = 5.0, MAD = 0.0).

Table 2. Results of the student survey on the use of the automated assessment tool in the final exam.
Question | N | M | MED | SD | MAD
State Your Level of Agreement with the Following Statements on autoCOREctor on a Scale of 1 (Strongly Disagree) to 5 (Strongly Agree)
Q1. autoCOREctor was easy to use | 45 | 4.0 | 4.0 | 0.9 | 1.0
Q2. Generating my problem statement was easy | 45 | 4.6 | 5.0 | 0.7 | 0.0
Q3. The feedback provided by autoCOREctor was useful | 43 | 3.8 | 4.0 | 1.1 | 1.0
Q4. The feedback provided by autoCOREctor was easy to understand | 43 | 3.6 | 4.0 | 1.1 | 1.0
Q5. The feedback provided by autoCOREctor helped me improve my solution to the exam | 42 | 3.5 | 4.0 | 1.4 | 1.0
Q6. autoCOREctor has helped me discover errors in my exam that I did not know I had | 43 | 3.5 | 4.0 | 1.4 | 1.0
Q7. The grades provided by autoCOREctor were fair | 45 | 4.2 | 5.0 | 1.3 | 0.0
Q8. Thanks to autoCOREctor I got a better grade in the practical final exam than if I had not used autoCOREctor | 45 | 3.8 | 4.0 | 1.4 | 1.0
Q9. If using autoCOREctor had been optional in this exam, I would have chosen to use it | 43 | 4.4 | 5.0 | 1.0 | 0.0
Q10. I think autoCOREctor is an adequate tool for a practical programming exam | 45 | 4.1 | 4.0 | 1.0 | 1.0
Q11. The final exam using autoCOREctor adequately assesses the competences attained during the course | 45 | 3.8 | 4.0 | 1.1 | 1.0
Q12. autoCOREctor assesses practical competences better than quiz-based tests or open-ended questions | 44 | 4.2 | 4.0 | 0.9 | 1.0
Q13. I prefer to take a practical test using autoCOREctor than an oral exam via videoconference | 45 | 4.6 | 5.0 | 0.8 | 0.0
Q14. If there were a face-to-face practical final exam in the computer laboratory, I would like it to be based on autoCOREctor | 42 | 4.5 | 5.0 | 0.8 | 0.0
Q15. I would like the practical exams of other courses to involve a tool like autoCOREctor | 45 | 4.2 | 4.0 | 1.0 | 1.0

Figure 4. Distribution of responses to the student survey on the use of autoCOREctor in the final exam.
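How autoCOREctor draws each student's problem statement from among the available options is not specified in the article. The following purely hypothetical sketch shows one deterministic way this could be done, so that a given student always obtains the same variant while the variants are spread across the class; the variant texts and the choice of the student's email as identifier are assumptions for illustration.

```javascript
// Hypothetical sketch only: deterministic selection of one exam variant per
// student. autoCOREctor's real generation command and variant format are not
// described in the article.
const crypto = require('crypto');

const VARIANTS = [
  'Variant A: build a REST endpoint that lists products',
  'Variant B: build a REST endpoint that filters products by price',
  'Variant C: build a REST endpoint that paginates products',
];

// Map a student identifier (e.g., the university email) to a stable variant index.
function pickVariant(studentEmail) {
  const digest = crypto.createHash('sha256').update(studentEmail.toLowerCase()).digest();
  const index = digest.readUInt32BE(0) % VARIANTS.length;
  return VARIANTS[index];
}

console.log(pickVariant('student@example.edu')); // always the same variant for this student
```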
Overall, the results of the survey show that the students think autoCOREctor is an adequate tool for a practical programming test (MED = 4.0, MAD = 1.0). The students somewhat agree with the statement that the final exam powered by this tool adequately assessed the competences attained during the course (MED = 4.0, MAD = 1.0), at least more so than quiz-based tests or open-ended questions (MED = 4.0, MAD = 1.0), which in itself is a great outcome, since it was the main aim of using autoCOREctor. Most of the students also strongly agreed that they prefer taking a practical test using autoCOREctor, rather than an oral exam via videoconference (MED = 5.0, MAD = 0.0), probably because they do not undergo as much pressure as in the latter. Furthermore, the majority of students strongly agreed that, in the event of a face-to-face practical final exam in the computer laboratory, they would like it to be based on autoCOREctor as well (MED = 5.0, MAD = 0.0). These results highlight that the assessment method used in the course is not only adequate for distance scenarios, but for face-to-face ones as well. This is of special relevance, since there is very little certainty as to what the future has in store regarding going back to school or keeping the remote model, so adopting flexible solutions is the key to being able to switch from one scenario to another overnight. Moreover, this approach is easily transferable to other disciplines.
In fact, most of the students agreed that the practical exams of other courses should involve a tool like autoCOREctor (MED = 4.0, MAD = 1.0).

In the space reserved for comments, suggestions, and complaints, the students reiterated that they were satisfied with autoCOREctor. However, some of them brought up the issues regarding feedback that were discussed in the previous subsection. In fact, one student said that the limited time available in the exam prevented him from finding the errors pointed out by autoCOREctor. Thus, if feedback was important for students in the programming assignments, in the context of a final exam, in which time is limited, the importance of feedback is even greater, since it can make a huge difference in students' grades and in their overall experience of using the tool during the exam. The amount of feedback provided should be carefully selected in order to let students know that something is wrong or missing, whilst also allowing them to realize their own mistakes and fix them.

4.3. Usage Data Collected by the Automated Assessment Tool

In order to study students' usage patterns of autoCOREctor, the data collected by the tool itself are provided in Table 3. These data include, for each of the nine programming assignments, for the exam, and overall: the average number of local executions of autoCOREctor per student, the total number of local executions of autoCOREctor, the average number of submissions per student, and the total number of submissions.

Table 3. Students' usage data of the automated assessment tool.
Assignment | Average Local Executions per Student | Total Local Tests | Average Submissions per Student | Total Submissions
Assignment 1 | 22.6 | 6605 | 1.1 | 331
Assignment 2 | 10.2 | 3056 | 1.1 | 316
Assignment 3 | 16.5 | 4623 | 1.2 | 329
Assignment 4 | 10.9 | 3059 | 1.1 | 303
Assignment 5 | 21.1 | 5982 | 1.3 | 361
Assignment 6 | 11.0 | 2997 | 1.1 | 295
Assignment 7 | 9.1 | 2711 | 1.2 | 334
Assignment 8 | 7.3 | 2954 | 1.1 | 306
Assignment 9 | 8.2 | 2092 | 1.1 | 285
Overall Assignments | 13.1 | 3787 | 1.1 | 318
Final Exam | 14.2 | 2947 | 2.2 | 441

As can be extracted from the data, the number of local executions of the test suites varied greatly among the different assignments, since each one of them had a different level of difficulty and number of features that students needed to develop. The first assignment was the one with the most test executions, which is an expected result, since the students were still getting acquainted with autoCOREctor. When the COVID-19 pandemic struck, the students were working on Assignment 5. The data show that the number of executions on this particular assignment increased compared to the preceding and subsequent ones, but from then on the trend was downwards. In view of the great differences among the numbers of executions of the different assignments, this decline cannot be attributed to the pandemic, since it might be due to other causes, such as the growing programming expertise of the students. Overall, the number of local executions of the test suites adds up to 3787 per assignment on average (13.1 per student on average), and 2947 (14.2 per student on average) in the final exam.
These figures represent the number of times that the students' assignments were evaluated by autoCOREctor, which is far beyond what would be feasible with manual assessment. In addition, the fact that the test suites run on the students' computers makes the tool more scalable, since all of the computational load needed to run the tests is removed from the server, which only needs to handle submissions.

On another note, the average number of submissions per student and assignment is close to one. This is the number of times that the students turned in their assignments once they were satisfied with their grades. As can be seen, in the programming assignments, the students mostly waited until they got their desired grade before submitting, whereas in the case of the final exam, this number doubled. The reason for this is that, during the exam, the students made a submission as soon as they got a passing grade in order to secure it and, after that, they kept working towards a better score and submitted their work again before the time was up. Overall, the data reveal that the students made extensive use of autoCOREctor, and that the design of the tool made it possible to withstand this high demand.

5. Conclusions and Future Work

This article presents a case study describing the transformation of the assessment method of a programming course in higher education into a fully online format during the COVID-19 pandemic, by means of a student-centered automated assessment tool called autoCOREctor. This tool had recently been developed by the teaching staff of the programming course used in this case study, so it was easily extended and adapted to be used for both the programming assignments and the final exam under the new requirements that the COVID-19 pandemic imposed. The use of a student-centered automated assessment tool, not only for the assignments but also for the final exam, constitutes a novel contribution of this article, and could help the teachers of related courses to take the same approach under the same or similar circumstances. To evaluate the new assessment method, we studied students' interactions with the tool, as well as students' perceptions, as measured with two different surveys: one for the programming assignments and one for the final exam. The results show that students' perceptions of the assessment tool were positive. Previous research [57–60] indicates that students' perceptions of assessment play an important role in their learning, with positive perceptions resulting in deeper learning and improved learning outcomes. Two assessment characteristics have a special influence on students' effective learning: authenticity [57,58] and feedback [59,60]. Regarding the former, the students stated that autoCOREctor assessed practical competences better than quiz-based tests or open-ended questions and, in general, that using autoCOREctor improved their programming knowledge. When asked about the latter, they stated that the feedback provided was useful and easy to understand, and that they would rather receive the feedback provided by autoCOREctor than no feedback whatsoever. One important concern should be stated about authenticity: using an automated assessment tool is not as authentic as a practical exam in which students do not have any feedback or help from an assessment tool and have to debug their programs to find their errors. This is something that should be considered by the teaching staff before using this kind of tool.
In this case study, it was an easy decision, as in this context the students do not have advanced programming skills, and software testing is beyond the scope of the course.

Based on the evidence presented in this study, it can be suggested that student-centered automated assessment systems can be a great help for students when appropriately integrated into the teaching method of the course. The students stated that, if using the tool had been optional, they would have chosen to use it without a doubt, and that they would like other courses to involve a tool like autoCOREctor. Furthermore, they asserted that they dedicated more time to the assignments and that they obtained better grades thanks to the tool. Finally, the generated grades were considered fair by the students, both in the exam and in the programming assignments. On the one hand, this is an important result, as fairness has proven to be positively correlated with student motivation and effective learning [71]; on the other hand, it should be further researched, as our case study did not delve deeper into it.

Since autoCOREctor is a versatile tool that can be adapted to different scenarios, in this case study we took advantage of this versatility to change the course assessment method and adapt it to a fully online format in a timely manner. Programming assignments and exams are both part of the course assessment, but they have different characteristics: the former constitute formative assessment and the latter summative assessment, the main difference between them being whether the student receives feedback and how elaborate this feedback is [11]. With autoCOREctor, the teacher is the one in charge of writing the feedback received by students, which can be very detailed, revealing to the student how to solve the problem found or where to look for a possible solution (more adequate for the programming assignments), or sparser, only showing the error found and letting students apply their knowledge to solve the problem (more suited to exams).

Regarding the use of an automated student-centered assessment tool in the exam, the experience reported in this article constitutes another original contribution to the existing body of knowledge, as no work has been found in the literature reporting this specific use of an automated assessment system. For this particular use, several advantages and disadvantages were identified in this case study. In the first place, the main advantage is that it enables the assessment of practical competences in crowded courses and remote scenarios, competences that could not be measured with an online test alone, while applying the same assessment criteria to all of the students taking the exam. The tool also allows the teachers of the course to control and monitor the whole assessment process, providing them with learning analytics that can be used to improve future editions, which is a great advantage. This advantage can also pose a disadvantage, however, since the whole exam relies on the tool and, if it fails, the exam cannot be completed. To mitigate this drawback, a mock exam should be carried out days before the real exam, and an alternative submission method should be enabled in the Moodle platform in case the tool server hangs or freezes. Additionally, a videoconference room can be made available to students during the exam, in order to solve any technical problems they might encounter.
Another great advantage is that it has proven to be flexible enough to be used in both scenarios, face-to-face and fully online, making it a very suitable tool for the next few years, in which the COVID-19 pandemic still threatens to interrupt face-to-face activities again. In Spain, the Ministry of Universities has distributed recommendations and guidelines to adapt next year's courses to what it calls the 'new normal' in the presence of COVID-19 [72]. These recommendations include an enhanced digitization strategy and adaptable teaching methods, i.e., being able to switch from face-to-face to remote overnight. One final characteristic that can be an advantage is that, since the autoCOREctor client is executed on the students' computers, the grade is generated in that same environment, avoiding software differences between the student's environment and the assessment environment, which could otherwise lead to different grades in each of them (as might happen with instructor-centered assessment tools). However, at the same time, this can pose a security issue if students hack the tool and obtain a grade without solving the problem statements that were set out. The teaching staff has not identified any security pitfalls to date, but analyzing this possibility could be interesting future work.

With all these concerns in mind, the teaching staff of the course is committed to using the tool over the coming years, regardless of whether classes are face-to-face or remote. Moreover, if the pandemic does not subside, and its requirements are sustained over time, the use of automated assessment systems for both assignments and exams can play an important role in practical courses.

Since autoCOREctor has proven to be an effective tool for use in a crowded programming course at a higher education institution, another interesting future research topic would be to analyze its use in the assessment of the most crowded courses nowadays: MOOCs (Massive Open Online Courses). Special attention should be paid to students' perceptions of the tool and their use of it. Besides this, in MOOCs about programming, the use of student-centered automated assessment tools as an alternative to the traditional assessment methods of general-purpose MOOCs (i.e., online tests and peer-to-peer evaluation) could be further researched, determining whether the use of these tools can bring any advantage and provide an effective assessment method.

6. Limitations

Several limitations of this case study should be noted. First of all, the surveys were only validated for internal consistency using Cronbach's alpha and were reviewed by the members of the course staff (several of them e-learning experts), but no further validations, such as principal component analysis or a pilot test, were performed. Second, as happens with all take-home assignments, although students enter their private credentials (i.e., email and token) in the tool, it cannot be ensured that the student is the one doing the assignment. Methods to ensure this might also constitute interesting future work.
Finally, additional and more robust conclusions could be drawn if the tool were used in another scenario or context, such as another practical course that allowed us to obtain comparable data.

Author Contributions: Conceptualization: E.B., S.L.-P., Á.A., A.G.; software: E.B., S.L.-P., Á.A.; validation: E.B., S.L.-P., Á.A., J.F.S.-R., J.Q.; data curation: S.L.-P.; writing—original draft preparation: E.B., S.L.-P., Á.A., A.G.; writing—review and editing: E.B., S.L.-P., Á.A., J.F.S.-R., A.G.; supervision: J.Q. All authors have read and agreed to the published version of the manuscript.

Funding: This research received no external funding.

Conflicts of Interest: The authors declare no conflict of interest.

References
1. García-Peñalvo, F.J.; Corell, A.; Abella-García, V.; Grande, M. Online Assessment in Higher Education in the Time of COVID-19. Educ. Knowl. Soc. 2020. [CrossRef]
2. Zhang, L.-Y.; Liu, S.; Yuan, X.; Li, L. Standards and Guidelines for Quality Assurance in the European Higher Education Area: Development and Inspiration. In Proceedings of the International Conference on Education Science and Development (ICESD 2019), Shenzhen, China, 19–20 June 2019.
3. ANECA. Guía de Apoyo para la Redacción, Puesta en Práctica y Evaluación de los Resultados del Aprendizaje; ANECA: Madrid, Spain, 2020. Available online: http://www.aneca.es/content/download/12765/158329/file/learningoutcomes_v02.pdf (accessed on 24 August 2020).
4. McAlpine, M. Principles of Assessment; CAA Centre, University of Luton: Luton, UK, 2002; ISBN 1-904020-01-1.
5. Nicol, D.; MacFarlane-Dick, D. Formative assessment and self-regulated learning: A model and seven principles of good feedback practice. Stud. High. Educ. 2006, 31, 199–218. [CrossRef]
6. Luo, T.; Murray, A.; Crompton, H. Designing Authentic Learning Activities to Train Pre-Service Teachers About Teaching Online. Int. Rev. Res. Open Distrib. Learn. 2017, 18, 141–157. [CrossRef]
7. Carless, D.; Joughin, G.; Liu, N. How Assessment Supports Learning: Learning-Oriented Assessment in Action; Hong Kong University Press: Hong Kong, China, 2006.
8. Carless, D. Learning-oriented assessment: Conceptual bases and practical implications. Innov. Educ. Teach. Int. 2007, 44, 57–66. [CrossRef]
9. Higgins, C.A.; Gray, G.; Symeonidis, P.; Tsintsifas, A. Automated Assessment and Experiences of Teaching Programming. ACM J. Educ. Resour. Comput. 2005, 5, 5. [CrossRef]
10. Robins, A.; Rountree, J.; Rountree, N. Learning and teaching programming: A review and discussion. Comput. Sci. Educ. 2003, 13, 137–172. [CrossRef]
11. Taras, M. Assessment—Summative and formative—Some theoretical reflections. Br. J. Educ. Stud. 2005, 53, 466–478. [CrossRef]
12. Harlen, W.; James, M. Assessment and learning: Differences and relationships between formative and summative assessment. Assess. Educ. Princ. Policy Pract. 1997, 4, 365–379. [CrossRef]
13. REACU. Acuerdo de REACU de 3 de abril de 2020, ante la Situación de Excepción Provocada por el COVID-19; REACU: Madrid, Spain, 2020.
14. ANECA. Estrategia de ANECA para el Aseguramiento de la Calidad en la Enseñanza Virtual; ANECA: Madrid, Spain, 2020.
15. Dawson-Howe, K.M. Automatic Submission and Administration of Programming Assignments. ACM SIGCSE Bull. 1995, 27, 51–53. [CrossRef]
16. Jackson, D. A semi-automated approach to online assessment. In Proceedings of the 5th Annual SIGCSE/SIGCUE ITiCSE Conference on Innovation and Technology in Computer Science Education—ITiCSE '00, Helsinki, Finland, 10–14 July 2000; ACM Press: New York, NY, USA, 2000; pp. 164–167.
17. Blumenstein, M.; Green, S.; Nguyen, A.; Muthukkumarasamy, V. GAME: A generic automated marking environment for programming assessment. In Proceedings of the International Conference on Information Technology: Coding and Computing (ITCC), Las Vegas, NV, USA, 5–7 April 2004; pp. 212–216.
18. Souza, D.M.; Felizardo, K.R.; Barbosa, E.F. A systematic literature review of assessment tools for programming assignments. In Proceedings of the 2016 IEEE 29th Conference on Software Engineering Education and Training (CSEE&T 2016), Dallas, TX, USA, 6–8 April 2016; pp. 147–156.
19. Entwistle, N.J. Approaches to learning and perceptions of the learning environment—Introduction to the Special Issue. High. Educ. 1991, 22, 201–204. [CrossRef]
20. Struyven, K.; Dochy, F.; Janssens, S. Students' perceptions about evaluation and assessment in higher education: A review. Assess. Eval. High. Educ. 2005, 30, 325–341. [CrossRef]
21. Perception Definition in the Cambridge Dictionary. Available online: https://dictionary.cambridge.org/dictionary/english/perception (accessed on 24 August 2020).
22. Douce, C.; Livingstone, D.; Orwell, J. Automatic Test-Based Assessment of Programming: A Review. ACM J. Educ. Resour. Comput. 2005, 5, 4. [CrossRef]
23. Ala-Mutka, K.M. A survey of automated assessment approaches for programming assignments. Comput. Sci. Educ. 2005, 15, 83–102. [CrossRef]
24. Ihantola, P.; Ahoniemi, T.; Karavirta, V.; Seppälä, O. Review of recent systems for automatic assessment of programming assignments. In Proceedings of the 10th Koli Calling International Conference on Computing Education Research (Koli Calling '10), Koli, Finland, 28–31 October 2010; ACM Press: New York, NY, USA, 2010; pp. 86–93.
25. Caiza, J.C.; del Álamo, J.M. Programming assignments automatic grading: Review of tools and implementations. In Proceedings of the 7th International Technology, Education and Development Conference (INTED 2013), Valencia, Spain, 4–5 March 2013; pp. 5691–5700.
26. Keuning, H.; Jeuring, J.; Heeren, B. Towards a systematic review of automated feedback generation for programming exercises. In Proceedings of the Annual Conference on Innovation and Technology in Computer Science Education (ITiCSE), Arequipa, Peru, 11–13 July 2016; Association for Computing Machinery: New York, NY, USA, 2016; pp. 41–46.
27. Ullah, Z.; Lajis, A.; Jamjoom, M.; Altalhi, A.; Al-Ghamdi, A.; Saleem, F. The effect of automatic assessment on novice programming: Strengths and limitations of existing systems. Comput. Appl. Eng. Educ. 2018, 26, 2328–2341. [CrossRef]
28. Lajis, A.; Baharudin, S.A.; Ab Kadir, D.; Ralim, N.M.; Nasir, H.M.; Aziz, N.A. A review of techniques in automatic programming assessment for practical skill test. J. Telecommun. Electron. Comput. Eng. (JTEC) 2018, 10, 109–113.
29. Pieterse, V. Automated Assessment of Programming Assignments. In Proceedings of the 3rd Computer Science Education Research Conference (CSERC '13), Heerlen, The Netherlands, 4–5 April 2013; pp. 45–56.
30. Leal, J.P.; Silva, F. Mooshak: A Web-based multi-site programming contest system. Softw. Pract. Exp. 2003, 33, 567–581. [CrossRef]
31. Eldering, J.; Gerritsen, N.; Johnson, K.; Kinkhorst, T.; Werth, T. DOMjudge. Available online: https://www.domjudge.org (accessed on 24 August 2020).
32. Joy, M.; Griffiths, N.; Boyatt, R. The BOSS Online Submission and Assessment System. ACM J. Educ. Resour. Comput. 2005, 5, 2. [CrossRef]
33. Gotel, O.; Scharff, C. Adapting an open-source web-based assessment system for the automated assessment of programming problems. In Proceedings of the Sixth IASTED International Conference on Web-Based Education, Anaheim, CA, USA, 14–16 March 2007; Volume 2, pp. 437–442.
34. Srikant, S.; Aggarwal, V. A system to grade computer programming skills using machine learning. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 23–27 August 2014; Association for Computing Machinery: New York, NY, USA, 2014; pp. 1887–1896.
35. Chakraverty, S.; Chakraborty, P. Tools and Techniques for Teaching Computer Programming: A Review. J. Educ. Technol. Syst. 2020. [CrossRef]
36. Bai, X. Enhancing the learning process in programming courses through an automated feedback and assignment management system. Issues Inf. Syst. 2016, 17, 165–175.
37. Amelung, M.; Krieger, K.; Rösner, D. E-assessment as a service. IEEE Trans. Learn. Technol. 2011, 4, 162–174. [CrossRef]
38. Shute, V.J. Focus on Formative Feedback. Rev. Educ. Res. 2008, 78, 153–189. [CrossRef]
39. Gaudencio, M.; Dantas, A.; Guerrero, D.D.S. Can computers compare student code solutions as well as teachers? In Proceedings of the 45th ACM Technical Symposium on Computer Science Education (SIGCSE '14), Atlanta, GA, USA, 5–8 March 2014; Association for Computing Machinery: New York, NY, USA, 2014; pp. 21–26.
40. Jurado, F.; Redondo, M.A.; Ortega, M. Using fuzzy logic applied to software metrics and test cases to assess programming assignments and give advice. J. Netw. Comput. Appl. 2012, 35, 695–712. [CrossRef]
41. Singh, R.; Gulwani, S.; Solar-Lezama, A. Automated feedback generation for introductory programming assignments. In Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '13), Seattle, WA, USA, 16–22 June 2013; Association for Computing Machinery (ACM): New York, NY, USA, 2013; p. 15.
42. Spacco, J.; Hovemeyer, D.; Pugh, W.; Emad, F.; Hollingsworth, J.K.; Padua-Perez, N. Experiences with Marmoset: Designing and using an advanced submission and testing system for programming courses. In Proceedings of the 11th Annual SIGCSE Conference on Innovation and Technology in Computer Science Education (ITiCSE 2006), Bologna, Italy, 26–28 June 2006; Association for Computing Machinery (ACM): Bologna, Italy, 2006; p. 13.
43. Edwards, S.H. Improving student performance by evaluating how well students test their own programs. J. Educ. Resour. Comput. 2003, 3, 1. [CrossRef]
44. Gutiérrez, E.; Trenas, M.A.; Ramos, J.; Corbera, F.; Romero, S. A new Moodle module supporting automatic verification of VHDL-based assignments. Comput. Educ. 2010, 54, 562–577. [CrossRef]
45. Ramos, J.; Trenas, M.A.; Gutiérrez, E.; Romero, S. E-assessment of Matlab assignments in Moodle: Application to an introductory programming course for engineers. Comput. Appl. Eng. Educ. 2013, 21, 728–736. [CrossRef]
46. Restrepo-Calle, F.; Ramírez Echeverry, J.J.; González, F.A. Continuous assessment in a computer programming course supported by a software tool. Comput. Appl. Eng. Educ. 2019, 27, 80–89. [CrossRef]
47. Gordillo, A. Effect of an Instructor-Centered Tool for Automatic Assessment of Programming Assignments on Students' Perceptions and Performance. Sustainability 2019, 11, 5568. [CrossRef]
48. Wang, T.; Su, X.; Ma, P.; Wang, Y.; Wang, K. Ability-training-oriented automated assessment in introductory programming course. Comput. Educ. 2011, 56, 220–226. [CrossRef]
49. García-Mateos, G.; Fernández-Alemán, J.L. A course on algorithms and data structures using on-line judging. In Proceedings of the Conference on Integrating Technology into Computer Science Education (ITiCSE), Paris, France, 6–9 July 2009; ACM Press: New York, NY, USA, 2009; pp. 45–49.
50. Rubio-Sánchez, M.; Kinnunen, P.; Pareja-Flores, C.; Velázquez-Iturbide, Á. Student perception and usage of an automated programming assessment tool. Comput. Hum. Behav. 2014, 31, 453–460. [CrossRef]
51. Yin, R.K. Case Study Research: Design and Methods, 5th ed.; SAGE Publications, Inc.: London, UK; Thousand Oaks, CA, USA, 2014; ISBN 1452242569.
52. Quemada, J.; Barra, E.; Gordillo, A.; Pavon, S.; Salvachua, J.; Vazquez, I.; López-Pernas, S. Ammil: A methodology for developing video-based learning courses. In ICERI2019 Proceedings, Seville, Spain, 11–13 November 2019; pp. 4893–4901.
53. Light, R.J. Making the Most of College: Students Speak Their Minds; Harvard University Press: Cambridge, MA, USA, 2001; ISBN 9780674004788.
54. Edwards, S.H. Using Test-Driven Development in the Classroom: Providing Students with Automatic, Concrete Feedback on Performance. In Proceedings of the International Conference on Education and Information Systems: Technologies and Applications (EISTA), Orlando, FL, USA, 31 July–2 August 2003.
55. Epstein, M.L.; Lazarus, A.D.; Calvano, T.B.; Matthews, K.A.; Hendel, R.A.; Epstein, B.B.; Brosvic, G.M. Immediate feedback assessment technique promotes learning and corrects inaccurate first responses. Psychol. Rec. 2002, 52, 187–201. [CrossRef]
56. Fu, X.; Peltsverger, B.; Qian, K.; Liu, J.; Tao, L. APOGEE-Automated Project Grading and Instant Feedback System for Web Based Computing. In Proceedings of the 39th ACM Technical Symposium on Computer Science Education (SIGCSE '08), New York, NY, USA, 12–15 March 2008; pp. 77–81.
57. Gulikers, J.T.M.; Bastiaens, T.J.; Kirschner, P.A.; Kester, L. Authenticity is in the eye of the beholder: Student and teacher perceptions of assessment authenticity. J. Vocat. Educ. Train. 2008, 60, 401–412. [CrossRef]
58. Gulikers, J.; Bastiaens, T.; Kirschner, P. Authentic assessment, student and teacher perceptions: The practical value of the five-dimensional framework. J. Vocat. Educ. Train. 2006, 58, 337–357. [CrossRef]
59. Gibbs, G.; Simpson, C. Conditions Under Which Assessment Supports Students' Learning. Learn. Teach. High. Educ. 2005, 1, 3–31.
60. Higgins, R.; Hartley, P.; Skelton, A. The conscientious consumer: Reconsidering the role of assessment feedback in student learning. Stud. High. Educ. 2002, 27, 53–64. [CrossRef]
61. Rößling, G.; Joy, M.; Moreno, A.; Radenski, A.; Malmi, L.; Kerren, A.; Naps, T.; Ross, R.J.; Clancy, M.; Korhonen, A.; et al. Enhancing learning management systems to better support computer science education. ACM SIGCSE Bull. 2008, 40, 142–166. [CrossRef]
62. Glitch. Available online: https://glitch.com/ (accessed on 24 August 2020).
63. GitHub. Available online: https://github.com (accessed on 4 August 2020).
64. Moodle API. Available online: https://docs.moodle.org/dev/Core_APIs (accessed on 4 August 2020).
65. Mocha. Available online: https://mochajs.org/ (accessed on 4 August 2020).
66. ZombieJS. Available online: http://zombie.js.org/ (accessed on 4 August 2020).
67. Chai. Available online: https://www.chaijs.com/ (accessed on 4 August 2020).
68. Watson, C.; Li, F.W. Failure rates in introductory programming revisited. In Proceedings of the 2014 Conference on Innovation & Technology in Computer Science Education (ITiCSE '14), Uppsala, Sweden, 21–25 June 2014; pp. 39–44.
69. Bandura, A. Self-efficacy mechanism in human agency. Am. Psychol. 1982, 37, 122–147. [CrossRef]
70. Bloom, B.S. Taxonomy of Educational Objectives: The Classification of Educational Goals by a Committee of College and University Examiners; David McKay: Philadelphia, PA, USA, 1956.
71. Chory-Assad, R.M. Classroom justice: Perceptions of fairness as a predictor of student motivation, learning, and aggression. Commun. Q. 2002, 50, 58–77. [CrossRef]
72. Ministerio de Universidades. Recomendaciones del Ministerio de Universidades a la Comunidad Universitaria para Adaptar el Curso Universitario 2020–2021 a una Presencialidad Adaptada; Ministerio de Universidades: Madrid, Spain, 2020. Available online: https://www.ciencia.gob.es/stfls/MICINN/Universidades/Ficheros/Recomendaciones_del_Ministerio_de_Universidades_para_adaptar_curso.pdf (accessed on 4 August 2020).

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
