DEVELOPMENT AND VALIDATION OF MULTIPLE CHOICE TEST OF THE GEOMETRY PART OF MATHEMATICS FOR SECONDARY CLASS

DOI: 10.31703/gssr.2019(IV-II).27      Published: Jun 2019
Authored by : Nasreen Akhter , Ahmad Akhtar Usmani , Sabiha Iqbal

Article 27 | Pages: 203-210

    Abstract

    This study describes the development and validation of a pool of multiple-choice test items for the geometry part of Mathematics at the secondary level. A table of specifications was prepared and a pool of 48 multiple-choice test items was developed from the test universe. The content validity and face validity of the items were determined with the help of a team of experts. The sample of the study comprised 488 students of class 10. After determining validity and reliability through item analysis and quantitative and qualitative analysis of the test, it was concluded that 30 of the 48 items were valid, reliable and suitable for measuring learning achievement in the course. These items are therefore useful to the boards of examination for class 9 in Punjab (Pakistan).

    Key Words

    Achievement Test, Difficulty Power, Discrimination Power, Item Analysis, Reliability, Validity

    Introduction

    The history of formal student achievement assessment begins in 1838, when the American state of New York introduced it as a test of truth. The period from 1840 to 1875 saw numerous types of tests emerge in the history of American educational testing: formal oral testing was replaced by written examinations, and written testing was in turn replaced by standardized examinations (Clements, 2007). Achievement tests have since become a specific component of oral and written examinations used to confirm learning of the prescribed material (Donald, 2013). Such tests follow set rules of construction. Ideally, tests should be standardized and have their own norms for the interpretation of scores during analysis (Mullis et al., 2007).

    The development of standardized tests is a difficult, expensive and time-consuming task. The standard procedure includes: selection of the test universe, setting of testing objectives, decision about the item format, preparation of a table of specifications, construction of an item bank, determination of item validity, pilot testing of items, analysis of the pilot data and improvement of defective or poor items in light of that analysis, a final try-out of the improved items, and analysis of the final try-out data (Roid, 2012). Items can be analyzed qualitatively in terms of their content and appearance, which includes establishing face and content validity; quantitative item analysis includes evaluation of the difficulty level, discrimination level and distractor effect of each item of the test (Rudner, 2011). Item difficulty indicates the proportion of correct and incorrect responses to an item (Hulin, Drasgow & Parsons, 2013). An item with a difficulty index between 0.20 and 0.80 is considered suitable. The discrimination level of an item is the quality that expresses its power to differentiate between high achievers and low achievers (Kelly, 2014). Ideally, an item with discrimination power in the upper positive limits is always appreciated, while an item whose discrimination power falls below 0.30 is suggested for revision or rejection (Roid, 2012).
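    To make the two indices concrete, the sketch below computes item difficulty (proportion correct) and an upper-lower group discrimination index from a 0/1 scored response matrix. It is a minimal illustration under stated assumptions, not the authors' actual scoring procedure; the 27% grouping fraction and the toy data are assumptions added for the example.

```python
import numpy as np

def item_difficulty(scores):
    """Proportion of examinees answering each item correctly (the p-value)."""
    return scores.mean(axis=0)

def item_discrimination(scores, group_frac=0.27):
    """Upper-lower discrimination index: p(upper group) - p(lower group).

    Examinees are ranked by total score; the top and bottom 27% (a common
    convention, assumed here) are compared item by item.
    """
    totals = scores.sum(axis=1)
    order = np.argsort(totals)
    k = max(1, int(round(group_frac * scores.shape[0])))
    lower, upper = scores[order[:k]], scores[order[-k:]]
    return upper.mean(axis=0) - lower.mean(axis=0)

# Toy data: 6 examinees x 4 items, 1 = correct, 0 = incorrect.
scores = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
])
print("difficulty:", item_difficulty(scores))
print("discrimination:", item_discrimination(scores))
```

    The later sketches in this paper reuse this `scores` matrix convention.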

    In general, developing and validating tests for teachers' use through a standardized procedure is considered difficult. Institutions lack the resources, in the form of money, time and trained staff, to develop valid and reliable tests that can be used over a given period (Akhter & Bahoo, 2015; Akhtar, 2015; Munir, 2016). Frequent curriculum change also discourages evaluators from relying on teacher-made tests. Researchers in the field, however, can do this laborious job; alternatively, testing organizations can hire technical staff so that standardized tests can be provided to institutions to improve the quality of assessment. The present study was therefore conducted to develop and validate a pool of items for evaluating students' learning achievement in the geometry part of mathematics for the secondary class. This is helpful for examination authorities in selecting items for quality assessment of students.

    Research Methodology

    By purpose, this study was research and development; by method, it followed a survey procedure in which the standard test-validation process was adopted. The English-medium mathematics textbook for class 9 taught in Punjab province was used for the construction of test items. The items covered the first three levels of the cognitive domain and were administered to class 10 students at the commencement of the new session.

     

    Population and Sampling  

    All students of grade 10 in English-medium schools of Punjab province constituted the population of the study. A sample of 488 students from 20 schools of Punjab province, Pakistan, was drawn using a simple random sampling technique. Ninety-six students participated in the pilot testing and 392 students in the final try-out of the test.

     

    Instrument of the Study

    This study used a multiple-choice test of Geometry as its instrument for data collection. The steps in the development of the test were as follows:

    1.     The textbook of Mathematics for class 9 published by the Punjab Textbook Board, Lahore, was consulted. All nine chapters of the geometry part (chapters 9 to 17, pp. 168 to 267) were defined as the universe of the test.

    2.     A table of specifications was prepared. It was delimited to the first three levels of Bloom's taxonomy of the cognitive domain.

    Table 1. Table of Specifications of Items

    Chapter No.   Knowledge (item no.)   Comprehension (item no.)   Application (item no.)   Total
    9             33                     2, 34                      3, 4                     05
    10            5                      13, 32                     35, 47                   05
    11            9, 29                  12, 30                     8, 21, 25                07
    12            36, 46                 41                         10, 14                   05
    13            6, 37, 39              11, 23, 26                 7, 38                    08
    14            22, 48                 Nil                        40                       03
    15            1, 16                  15                         27, 42                   05
    16            17, 20, 28             18                         19, 31                   06
    17            24                     43                         44, 45                   04
    Total         17                     13                         18                       48


    3.     Following the rules for developing multiple-choice items, 48 items with four options each were prepared.

    4.     To examine the content validity of the test items, three subject experts were approached. Each had taught the subject for at least 10 years and held an M.Sc. in Mathematics with an M.Ed. qualification. To analyze the face validity of the test, a team of two experts holding Ph.D. degrees in educational assessment was chosen. The researchers sorted out poor items on the basis of the experts' evaluation and improved them according to their suggestions.

    5.     The 48 multiple-choice items with four options were arranged in a test booklet and the required number of copies was prepared. The researchers administered the booklets under examination conditions. The students were given unlimited time to solve the test; the average time taken was 90 minutes in both trials.

    6.     The researchers scored the answer books. Each correct response was awarded one mark; wrong answers received zero marks.

    7.     The item analysis criteria were decided (see Table 2) and each item of the test was analyzed against the data; a sketch of this classification logic follows Table 2. None of the items rejected in the pilot test was included in the final test. Items marked for revision were improved in view of the defects pointed out in the item analysis and included again in the final test.

    Table 2. Item Analysis Criteria of the Test

    Item's Interpretation   Difficulty Power   Discrimination Power   Distractor Effect
    Selected item           0.20-0.80          0.30 and above         At least 2% response on each option
    Needs revision          0.20-0.80          0.20-0.29              Any distractor with less than 2% response
    Rejected item           Below 0.20         Below 0.20             Less than 2% response on an option
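    The selection logic of Table 2 can be read as a simple decision rule. The sketch below is one plausible encoding of it, assuming difficulty and discrimination have been computed as in the earlier example and that the distractor check is reduced to a single flag; it illustrates the criteria rather than reproducing the authors' actual procedure.

```python
def classify_item(difficulty, discrimination, every_option_at_least_2pct):
    """One reading of the Table 2 criteria for a single item (illustrative)."""
    # Rejected: too hard overall, or too weak at separating high/low achievers.
    if difficulty < 0.20 or discrimination < 0.20:
        return "rejected"
    # Selected: acceptable difficulty, strong discrimination, all options drawing responses.
    if difficulty <= 0.80 and discrimination >= 0.30 and every_option_at_least_2pct:
        return "selected"
    # Otherwise the item is kept but flagged for revision.
    return "needs revision"

print(classify_item(0.55, 0.42, True))    # selected
print(classify_item(0.55, 0.25, True))    # needs revision
print(classify_item(0.15, 0.10, False))   # rejected
```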

    Results and Discussion

    Item Analysis of the Test 

    Item analysis was done in two steps: item analysis of the pilot testing data and item analysis of the final try-out data. For ease of understanding and comparison, the results of the pilot and final try-out are detailed in Table 3.

    Table 3. Item Analysis Results of Pilot and Final Try-out Data

    Note: * = right option/key, ? = discrimination level, D = difficulty level, Rm = remarks, S = selected item, R = rejected item

    Regarding the Pilot Testing

    Table 3 demonstrates that 33 items were selected, 10 items needed revision and 5 items were rejected. Item numbers 18, 22, 25, 34 and 39 did not meet the selection criteria because their discrimination level was less than 0.20, so they were rejected. Item numbers 4, 12, 16, 19, 24, 27, 31, 37, 42 and 45 had discrimination levels between 0.20 and 0.29, so they were marked for revision. The remaining items met the criteria for selection.

    Regarding the Final Try out Results

    Item numbers 18, 22, 25, 34 and 39 were rejected in the pilot test, so these serial numbers were left blank in the final try-out portion of the table. Similarly, item numbers 4, 12, 16, 19, 27, 31, 33, 37, 42 and 45 were revised before the final try-out.

    The final try-out portion of Table 3 demonstrates that 30 items were selected, 7 items required revision and 6 items were rejected. Item numbers 8, 13, 16, 27, 42 and 45 did not meet the selection criteria, so they were rejected. Item numbers 4, 12, 19, 24, 31, 33 and 37 had discrimination levels of at least 0.20 but below 0.30, so they would need revision before a further try-out. The remaining items met the selection criteria and were selected for future use.

     

    Difficulty Index of the Final try out Items

    The difficulty level of a test plays a role in evaluating its quality. Analysis of the difficulty levels of the test (see Table 4) showed that the majority of items fell in the middle difficulty range. A small number of items (7%) fell outside the decided difficulty criterion; these items were difficult for the respondents.

    Comparing the test against Kelly's (2014) criterion, 93% of the items were in the acceptable range and none was an easy item (0.71 to 1.00): 93% of the items fell in the middle difficulty range of 0.31 to 0.70 (see Table 4), and none was very easy for the respondents.

    Table 4. Distribution of Item Difficulty Index of Final Try-out Items

    Difficulty Range   f    %
    0.10 to 0.19       3    07
    0.20 to 0.29       0    00
    0.30 to 0.39       1    02
    0.40 to 0.49       9    21
    0.50 to 0.59       27   63
    0.60 to 0.61       3    07
    Total              43   100

    Discrimination Index of the Final try Out Items

    The discrimination indices of the items in the final try-out showed that 70% of the items met the criterion for selected items. Moreover, all items discriminated positively between high achievers and low achievers (see Table 5).

    Comparing these results with the criterion for item quality given by Roid (2012), the test had few (14%) poor items and some (16%) marginal items in discriminating high achievers from low achievers, but a majority of good or reasonably good items. Roid elaborates that good items have a discrimination index of 0.40 or higher, reasonably good items 0.30 to 0.39, marginal items 0.20 to 0.29, and poor items below 0.20.
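    Roid's (2012) bands map directly onto a small classifier. The sketch below is a hypothetical helper for producing a tally like Table 5; the sample indices are illustrative values, not data from this study.

```python
from collections import Counter

def roid_quality(d):
    """Classify a discrimination index using Roid's (2012) bands."""
    if d >= 0.40:
        return "good"
    if d >= 0.30:
        return "reasonably good"
    if d >= 0.20:
        return "marginal"
    return "poor"

# Tally a list of discrimination indices into the bands reported in Table 5.
indices = [0.15, 0.25, 0.33, 0.45, 0.41]   # illustrative values only
print(Counter(roid_quality(d) for d in indices))
```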

    Table 5. Distribution of Discrimination Power of Final Try-out Items

    Discrimination Range   f    %
    0.11 to 0.19           6    14
    0.20 to 0.29           7    16
    0.30 to 0.39           15   35
    0.40 to 0.47           15   35
    Total (0.11 to 0.47)   43   100

     

    Reliability of the Test

    Table 6. Summary of Reliability and Standard Error of Measurement of Pilot and Final Try-out Results

    Test Trial Stage   Sample   n    Variance   Σpq       SD      RT     SE
    Pilot try out      96       48   145.62     11.611    12.06   0.93   3.19
    Final try out      392      43   26.107     10.1951   5.11    0.62   3.15

    Note: n = number of items in the test, Σpq = sum of item variances (p × q), RT = reliability of the test using the KR-20 formula, SE = standard error of measurement

    Table 6 shows that the reliability (Kuder-Richardson 20) of the pilot try-out data is good, since acceptable reliability estimates range from 0.856 to 0.958 (Wilson, 2005), and reliability in the range 0.832 to 0.931 is acceptable for student, class and school environment scales under the criterion defined by Popham (2005).

    Analysis of the final try-out data, however, shows a reliability that is acceptable but low (r = 0.62) in comparison with the pilot try-out. The literature on test reliability indicates that a correlation of approximately 0.50 is accepted for teacher-made tests, while a value near 0.60 is accepted and appreciated. According to Mohamad et al. (2015, p. 165), although a reliability value close to the upper limit (1.00) is good, a correlation of 0.60 is acceptable in the social sciences. The criterion described by Bahoo (2015) and Raza (2012), however, holds that a test with reliability between 0.60 and 0.70 is slightly low in reliability and needs to be supplemented by more trials. In the present study, the standard errors of measurement of the pilot and final try-out data, 3.19 and 3.15, signify only a small amount of error in terms of raw scores, which is evidence of the acceptable reliability of the results.
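    The figures in Table 6 follow from two standard formulas: KR-20, r = (k/(k-1)) * (1 - Σpq/σ²), and SEM = SD * sqrt(1 - r). A minimal sketch, assuming the same 0/1 score matrix convention as the earlier examples:

```python
import numpy as np

def kr20(scores):
    """Kuder-Richardson 20 reliability: (k/(k-1)) * (1 - sum(p*q) / total variance)."""
    k = scores.shape[1]
    p = scores.mean(axis=0)                      # proportion correct per item
    sum_pq = (p * (1 - p)).sum()                 # the "sum pq" column of Table 6
    total_var = scores.sum(axis=1).var(ddof=0)   # variance of total scores
    return (k / (k - 1)) * (1 - sum_pq / total_var)

def sem(scores):
    """Standard error of measurement: SD of total scores times sqrt(1 - reliability)."""
    sd = scores.sum(axis=1).std(ddof=0)
    return sd * np.sqrt(1 - kr20(scores))

# Plugging in the pilot figures of Table 6 by hand: with k = 48, sum pq = 11.611
# and variance = 145.62, KR-20 = (48/47) * (1 - 11.611/145.62) ≈ 0.94, close to
# the reported 0.93, and SEM = 12.06 * sqrt(1 - 0.93) ≈ 3.19, matching the table.
```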

     

    Descriptive Analysis of Final Try-Out Test Score of the Examinees

    Table 7. Statistical Analysis of Test Scores of Pilot and Final Try-out Data

    Try out         Total items   n     Range   Min score   Max score   Mean     Median   Mode   Skewness   Kurtosis
    Pilot try out   48            96    40      06          46          27.29    30       15     -0.143     -1.34
    Final try out   43            392   28      13          41          27.497   28       27     -0.193     -0.126

    Table 7 shows that the range of scores in the final try-out was smaller than in the pilot try-out. The mean, median and mode suggest that the test was difficult for the respondents, and Figures 1 and 2 show that the spread of students' scores was not exactly normal but close to a normal distribution in the final try-out.
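    The entries in Table 7 are ordinary descriptive statistics of the examinees' total scores. A minimal sketch of how they could be reproduced, assuming the scored matrix from the earlier examples and scipy >= 1.9 (for the keepdims argument); note that scipy's kurtosis is excess kurtosis, which is 0 for a normal curve, matching the near-zero final-test value.

```python
import numpy as np
from scipy import stats

totals = scores.sum(axis=1)                 # total score per examinee
print("range:", totals.max() - totals.min())
print("mean:", totals.mean())
print("median:", np.median(totals))
print("mode:", stats.mode(totals, keepdims=False).mode)
print("skewness:", stats.skew(totals))      # negative => tail toward low scores
print("kurtosis:", stats.kurtosis(totals))  # excess kurtosis; 0 for a normal curve
```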

    Figure 1. Spread of Scores of Pilot Try-out Data

    Figure 2. Spread of Scores of Final Try-out Data

    Gender and Locality Related Comparison of Scores of Students

    Table 8. Gender- and Locality-wise Analysis of Final Try-out Data (t-tests for equality of means)

            Gender-wise analysis                Locality-wise analysis
    Item    t       Sig    MD      SED          t       Sig    MD      SED
    1       -.201   .830   -.010   .049         -.835   .404   -.041   .049
    2       .499    .618   .025    .050         -.195   .845   .010    .050
    3       -.222   .825   -.011   .051         -.432   .666   -.022   .052
    4       .490    .619   -.024   .049         1.466   .143   .072    .049
    5       .094    .925   .005    .050         .648    .517   .033    .050
    6       .425    .671   .022    .051         .769    .442   .039    .051
    7       -.161   .871   -.008   .051         .567    .571   .029    .051
    8       -.272   .786   -.014   .051         .349    .727   .018    .052
    9       .985    .325   .050    .050         .445    .657   .023    .051
    10      -1.33   .171   .071    .051         .228    .820   .012    .052
    11      -.676   .499   -.034   .051         -.558   .577   .029    .051
    12      -.534   .549   -.027   .051         .436    .663   .022    .051
    13      .168    .876   .009    .051         -.101   .919   .005    .051
    14      .081    .936   .004    .051         -1.05   .200   .050    .051
    15      -.761   .447   -.037   .051         .180    .850   .010    .051
    16      -1.03   .303   -.053   .048         -.060   .952   -.003   .051
    17      -.318   .750   -.016   .050         -1.22   .203   -.065   .051
    18      1.184   .237   -.060   .051         -1.33   .165   -.071   .051
    19      -.367   .712   .019    .051         -1.98   .051   -.099   .051
    20      -.024   .981   -.001   .050         -1.19   .234   -.060   .051
    21      .597    .551   -.031   .051         -1.34   .178   -.069   .051
    22      .324    .746   .016    .050         -1.02   .285   -.054   .050
    23      .115    .909   .006    .050         -1.28   .201   -.065   .050
    24      -.040   .968   .002    .051         -1.18   .236   -.060   .051
    25      -.283   .771   -.014   .051         -.811   .418   -.041   .051
    26      .340    .733   .017    .051         .860    .390   .044    .051
    27      .028    .978   .001    .050         .740    .460   .037    .051
    28      1.205   .229   .061    .051         .363    .784   .014    .051
    29      -1.18   .239   -.059   .050         .360    .717   .018    .050
    30      -.197   .844   -.010   .051         .945    .345   .041    .051
    31      .187    .856   -.009   .050         .740    .460   .037    .051
    32      .202    .840   .010    .051         .315    .753   .016    .051
    33      .843    .400   .043    .050         -1.56   .118   -.079   .051
    34      .306    .760   .009    .050         .611    .542   .031    .051
    35      .080    .937   .004    .050         .152    .879   .008    .050
    36      .997    .319   .050    .051         -.142   .887   -.007   .051
    37      .322    .747   .016    .051         .024    .981   .001    .051
    38      -.531   .599   -.029   .050         -.436   .661   -.022   .051
    39      .099    .921   .005    .051         -.184   .854   -.009   .051
    40      -1.16   .245   -.059   .051         .816    .412   .042    .051
    41      1.184   .237   .060    .051         -.142   .887   -.007   .051
    42      -.821   .412   -.041   .051         -.310   .757   -.016   .051
    43      -.549   .584   -.028   .051         1.208   .228   .061    .050
    44      .597    .537   .030    .050         .521    .603   .027    .051
    45      -.785   .433   -.039   .051         -.953   .341   -.048   .050
    46      -.747   .455   -.031   .050         .154    .878   .008    .050
    47      .943    .346   -.048   .051         -.808   .419   -.041   .051
    48      -.442   .658   -.023   .051         .390    .698   .020    .052
    Total   -.098   .922   -.052   .524         .744    .457   .392    .526

    Note: significance criterion = 0.05, n = 392, MD = mean difference, SED = standard error of difference

    Table 8 reveals that the mean differences between boys and girls and between rural and urban students are nearly the same across all items. Moreover, the t-test results indicate no significant mean difference between the test scores of the groups.
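    The per-item comparisons in Table 8 are independent-samples t-tests. A minimal sketch, assuming scipy is available and that a boolean label array (e.g. True for one gender, as a numpy array) accompanies the scored matrix; the pooled standard error mirrors the SED column of the table.

```python
import numpy as np
from scipy import stats

def item_t_tests(scores, labels):
    """Independent-samples t-test on each item's 0/1 scores between two groups.

    scores: examinee x item matrix of 0/1 marks; labels: boolean numpy array,
    one entry per examinee. Returns one (t, p, mean_diff, sed) tuple per item.
    """
    g1, g2 = scores[labels], scores[~labels]
    n1, n2 = len(g1), len(g2)
    results = []
    for j in range(scores.shape[1]):
        t, p = stats.ttest_ind(g1[:, j], g2[:, j], equal_var=True)
        md = g1[:, j].mean() - g2[:, j].mean()
        # Pooled standard error of the difference (the SED column of Table 8).
        sp2 = ((n1 - 1) * g1[:, j].var(ddof=1) +
               (n2 - 1) * g2[:, j].var(ddof=1)) / (n1 + n2 - 2)
        sed = np.sqrt(sp2 * (1 / n1 + 1 / n2))
        results.append((t, p, md, sed))
    return results
```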

    Table 9. Comparison of Mean Scores of Students in Final Test Based on Gender and Locality

    Locality   n     Mean       Gender   n     Mean
    Urban      235   27.340     Male     229   27.476
    Rural      157   27.732     Female   163   27.527
    Total      392   27.497     Total    392   27.497

    Table 9 compares the mean scores on the whole test for rural versus urban students and for boys versus girls. It shows no large difference in performance between the groups, indicating that the test was equally fair to students across gender and locality.

    Conclusion

    It was found that all items in the final test were in the middle difficulty range. The discrimination levels of 5 items in the pilot study and 6 items in the final try-out fell below the selection criteria, so these eleven items were rejected. Thirty items of the test met the selection criteria. The mean score on the final test was 27.497, the median 28 and the mode 27, while the standard deviation of the test was 5.11. The standard error of measurement was low, indicating a very low chance of error in the results. The reliability of the test was 0.93 for the pilot try-out but 0.62 for the final try-out data, showing that the test is reliable but needs further trials on other samples. All test items had positive discrimination levels, and students' scores on the final test were close to, though not exactly, normally distributed.

    Recommendations

    The study recommends that the 30 selected items become the basis for standardizing an achievement test of the geometry part of secondary-level mathematics. These items may be used in further research on other samples, and boards of secondary examinations may select them for board examinations.

References

  • Akhtar, H. (2015). Development of a semi-standardized test of Economics for Class XI. Unpublished M.Phil thesis. Bahawalpur: The Islamia University of Bahawalpur.
  • Akhter, N., & Bahoo, R. (2015). Development of a semi-standardized test of Education for intermediate level. Journal of Educational Research, 18(2), 5-19.
  • Clements, D. H., & Sarama, J. (2007). Effects of a preschool mathematics curriculum: Summative research on the Building Blocks project. Journal for Research in Mathematics Education, 38, 136-163.
  • Donald, S. (2013). The Unthinkable: Standardized Testing and the Future of American Education (5th ed.). Columbus: Columbus Metropolitan Club.
  • Gowdhaman, K., & Nachimuthu, K. (2013). Item Analysis of History Achievement Test on Difference Index (DI) in the Criterion Referenced Measurement. India: Department of Education, Periyar University.
  • Hulin, C. L., Drasgow, F., & Parsons, C. K. (2013). Item Response Theory (5th ed.). Homewood, IL: Dow Jones-Irwin.
  • Kelly, M. (2014). The Purposes of Tests. Retrieved 3 December, 2014.
