GSSR - Global Social Sciences Review

IDENTIFYING FEATURES OF PAKISTANI LEARNERS WRITING THROUGH MDA AND COHMETRIX

http://dx.doi.org/10.31703/gssr.2021(VI-III).13
Authored by: Rabia Tabassum, Mahwish Farooq, Muhammad Asim Mahmood

Pages: 119-127

Abstract

Learner language has long interested researchers because it displays common features of language in use. Biber's Multi-Dimensional Analysis (MDA) is one approach to investigating it: it studies practiced language empirically and can establish grounds for varieties that are still striving for their place on the linguistic cline (Crossley et al., 2014). The present research explores common patterns of learner language with Coh-Metrix (an online data-tagging tool used to assess cohesion, coherence, readability level, and related text properties), studying these features and their functions while partially following the MDA methodology. Following Biber's methodology, a factor analysis was conducted and four dimensions were identified, which provide clues to the functions associated with these dimensions. The results show that Pakistani learners' argumentative writing has narrative features and is dominated by overlap at the level of vocabulary, syntactic construction, passage development, and even argumentation. These findings help establish that Pakistani English has its own identity. The results are useful for linguists and teachers alike, since common linguistic and syntactic structures can be assessed easily with the students' grade level in mind.

Key Words

Coh-Metrix, Factor Analysis, Multidimensional Analysis, ICLE, Corpus Linguistics

Introduction

Research on Pakistani English has addressed it at two levels. At the first level, studies focused on individual linguistic features and described them. These studies helped establish Pakistani English as an independent variety: Baumgardner (1996) studied the distinctive style of Pakistani English by reporting its lexical and grammatical attributes, Talaat (1988) compared Pakistani English with British English and explored its lexical variety, and Mehmood (2009) studied its syntactic and phonological features. More recently, research has moved to a second, more empirical and objective level known as the corpus-based approach.

Corpus-based studies, which are empirical by nature, have been adopted as a method for examining the linguistic features of Pakistani English. They help linguists identify and compare the features of various varieties and bring out points of unity and diversity between native and non-native varieties. The International Corpus of Learner English (ICLE), as a central resource for the study of learner writing, has gained particular importance in this regard: ICLE corpora can be used to study the linguistic features of learner writing and to produce quantitative statistics on word frequency, word categories, syntactic structures, and discourse features (Granger, 1998).

Literature Review

A number of studies have explored the linguistic characteristics of texts or registers. These studies focused mainly on linguistic features and then added a factor solution (a statistical approach that groups co-occurring features into meaningful bundles). The factor analysis, in turn, leads to an investigation of the communicative function of each text. The study of linguistic features across large amounts of data, registers, and discourse types with the help of data-tagging tools therefore remains central to this line of research; in the present study, the tool is Coh-Metrix.

A number of studies have used Coh-Metrix (CM), and they confirm its usefulness in several respects. Notable studies range from work on cohesion and latent semantic analysis (LSA) indices (McNamara et al., 2010) to lexical diversity indices and an L2 index (McCarthy & Jarvis, 2010; Crossley, Salsbury, & McNamara, 2009). CM has also helped researchers establish evidence about text levels and their patterns, comprehension grade levels, and which texts suit L2 learners. For instance, McCarthy and Lewis (2006) argued that CM is an effective tool for establishing authorship even when authors disguise their writing through shifts in style. McCarthy (2007) used CM's LSA-based indices to show the structural cohesion of themes in psychology articles. Similarly, Duran (2006) used CM to study temporal cohesion in history, narrative, and science texts in order to characterize the textual domains of these genres. McNamara, Ozuru, Graesser, and Louwerse (2006) measured coherence levels in learners' writing, distinguishing highly coherent from less coherent texts, and Dufty (2006) used CM to examine human ratings across grade levels and how they differ for social and psychological reasons. Duran and McNamara (2006) and McCarthy (2006) assessed the structural organization of various published high school textbooks and recommended grade levels according to comprehension level. Lightman (2007) studied variation across formal/informal and written/spoken texts and identified differences and similarities among these genres. Dempsey, McCarthy, and McNamara (2007) and Louwerse (2004) studied gender differences across various texts.
Crossley and Louwerse (2007) and Crossley, McCarthy, and McNamara (2007) compared authentic and modified texts intended for second language learners. Together, these studies show the utility of CM for investigating the characteristics of texts and for helping decide in advance which reading material suits L2 learners.

Coh-Metrix studies address not only language in use but also the features and functions of practiced language. One group of researchers collected more than 1,500 student essays and analyzed them with CM using Biber's (1988) MD methodology. Their study mainly aimed to examine the functional parameters revealed by co-occurring linguistic features in a learner corpus. The essays were grouped according to shared features with shared functions. On this basis, four dimensions were identified and related to the essays' prompt, quality, and grade level. The results showed the functional parameters that affect learners' writing, and the study also adds validity to the MDA methodology.

The present study combines the MD and Coh-Metrix approaches and discusses the dimensions extracted through them. Earlier studies of learner corpora (Hussain, 2015; Abdulaziz, 2017) were limited to exploring dimensions through MDA, whereas the present study employs both MD and Coh-Metrix dimensions to examine not only the co-occurring linguistic features of MD but also Coh-Metrix indices of cohesion, readability, text easability, and text quality. Identifying these features matters because, as Crossley (2014) observes, Coh-Metrix indices, known as functional parameters, have implications for writing theory, writing assessment, and writing pedagogy.

Research Methodology

The present research used Coh-Metrix indices computed one essay at a time for 308 essays with the online Coh-Metrix tool, and the results were saved in an Excel file. The data were then filtered following Biber's (1988) procedure: the indices were first normalized and then standardized, and a factor analysis was conducted, discarding indices with low or near-zero weights. Principal component analysis (PCA) with Promax rotation was applied. PCA is suited to cases where the underlying structure is not defined in advance, since it reduces a large number of indices to a small number of meaningful sets of variables, i.e., factors or dimensions. These dimensions were then interpreted in terms of writing parameters through a qualitative analysis of each dimension. For a feature to be included, a cutoff of ±.35 was applied to its factor loading. The sets of features grouped in this way were interpreted qualitatively to give each factor a functional interpretation. The loadings also determine factor membership: an index is included only in the factor on which it loads most strongly and is excluded from the others. For example, if an index loads more highly on Factor 1 than on Factor 2, it is included only in Factor 1. Again following Biber's methodology, factor scores were calculated by subtracting the mean of the standardized scores of the negative features from the scores of the positive features. Up to this point, Coh-Metrix supplies the results; the next step is to analyze the data statistically following Biber's methodology. A scree plot was used to select the prominent factors, as shown below:

Figure: Scree plot

The scree plot shows that four factors are prominent. The features retained in this way represent possible groups of linguistic features that co-occur and largely share a communicative function. Although the scree plot suggests four important factors, Biber also proposes further cutoffs: only features with a loading of at least ±.35 are included, and only factors with more than four features are considered significant. On these criteria, only three factors are worth interpreting. They are interpreted below.
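The retention rules and the factor-score step described in this section can be sketched in Python. This is a minimal illustration, not the authors' script: `assign_features` assumes the loadings have already been produced by a PCA with Promax rotation in a statistics package, `dimension_scores` assumes the indices need no further normalization and uses the population standard deviation for z-scores, and all index values and loadings in the usage lines are hypothetical. By default the score follows Biber's convention of summing z-scores on each pole; `use_means=True` compares pole means instead, which is one reading of the procedure described above.

```python
from statistics import mean, pstdev


def standardize(values):
    """Convert one index's raw values to z-scores (population SD assumed)."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0] * len(values)  # a constant index carries no information
    return [(v - mu) / sd for v in values]


def assign_features(loadings, cutoff=0.35, min_features=5):
    """Group indices into factors using the retention rules described above.

    loadings maps an index name to its loadings, one per factor. Each index
    goes only to the factor on which it loads most strongly, and only if
    |loading| >= cutoff. Factors with fewer than min_features indices
    (i.e. not more than four) are dropped.
    """
    factors = {}
    for name, row in loadings.items():
        best = max(range(len(row)), key=lambda k: abs(row[k]))
        if abs(row[best]) >= cutoff:
            factors.setdefault(best + 1, []).append((name, row[best]))
    return {
        f: sorted(items, key=lambda item: -abs(item[1]))
        for f, items in sorted(factors.items())
        if len(items) >= min_features
    }


def dimension_scores(table, positive, negative, use_means=False):
    """Score every text on one dimension from raw index values.

    table maps an index name to its raw values, one per text, in the same
    text order for every index. The combined z-scores of the negative
    indices are subtracted from those of the positive indices.
    """
    z = {name: standardize(vals) for name, vals in table.items()}
    n_texts = len(next(iter(table.values())))
    combine = mean if use_means else sum
    scores = []
    for i in range(n_texts):
        pos = [z[name][i] for name in positive]
        neg = [z[name][i] for name in negative]
        scores.append((combine(pos) if pos else 0.0) - (combine(neg) if neg else 0.0))
    return scores


# Hypothetical loadings on two factors (not the study's values).
loadings = {
    "A": [0.90, 0.10], "B": [0.80, 0.20], "C": [-0.70, 0.05],
    "D": [0.60, 0.30], "E": [0.50, -0.10],
    "F": [0.10, 0.60], "G": [0.05, 0.55],
    "H": [0.20, 0.30],  # below the cutoff on both factors
}
print(assign_features(loadings))  # only factor 1 has more than four indices

# Hypothetical raw index values for three essays.
table = {
    "PCREFz": [1.2, -0.3, 0.5],
    "CRFCWO1": [0.20, 0.10, 0.15],
    "LDTTRc": [0.80, 0.90, 0.85],
}
print(dimension_scores(table, positive=["PCREFz", "CRFCWO1"], negative=["LDTTRc"]))
```

Because every index is standardized first, the dimension scores of a corpus average to zero, so a positive score means a text sits toward the positive pole relative to the other texts.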

Results

Factor 1. Based on Coh-Metrix Indices

Factor 1.
Positive Component	Values	Negative Component	Values
PCREFz	.953	SYNMEDlem	-.734
PCREFp	.943	SYNMEDwrd	-.730
CRFCWO1	.850	LDMTLD	-.698
CRFCWOa	.820	LDTTRc	-.687
CRFNO1	.743	LDVOCD	-.650
CRFAO1	.740	LDTTRa	-.637
LSAGN	.698	SYNMEDpos	-.517
CRFSO1	.692
CRFAOa	.674
QUINOA	.669
LSASS1	.651
CRFSOa	.641
CRFCWOad	.637
LSASSp	.550

Factor 1 is the most powerful dimension, containing 15 positive and eight negative features. Among the positive features, PCREFz and PCREFp have the highest loadings (.922 and .912, respectively). Both indices come from the bank of text easability indices and indicate a tendency toward higher referential cohesion, in which words, ideas, and sentences overlap throughout the text. As McNamara et al. (2014) put it, 'A text with higher referential cohesion contains words and ideas that overlap across sentences and the entire text forming explicit threads that connect the text for the reader' (p. 85). Conversely, texts with low cohesion are commonly harder to comprehend because they contain few connectives or textual links that tie ideas together for the reader. Other features with high positive loadings on Factor 1 are content word overlap in adjacent sentences and in all sentences, stem overlap, and argument overlap (CRFCWO1, CRFCWOa, CRFSO1, and CRFAO1). LSA indices also load positively on Factor 1, notably LSAGN, LSASS1, and LSASSp. These indices likewise show overlap across sentences, passages, and content words, and the information they capture covers both given and new information. The learners reuse the same syntactic and linguistic features, and this overlap is visible at the level of LSA as well.
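Coh-Metrix computes its overlap indices from part-of-speech-tagged text, which is beyond a short example. As a rough illustration of what an adjacent-sentence overlap measure captures, the sketch below treats every word outside a small, hypothetical stop-word list as a content word and reports the proportion of adjacent sentence pairs that share at least one such word. It is a simplified binary stand-in, not Coh-Metrix's CRFCWO1 algorithm; the sample is an excerpt from essay ICLE-PA-GF-0080.1 quoted in this section.

```python
import re

# Hypothetical, deliberately small stop-word list. Coh-Metrix identifies
# content words with a part-of-speech tagger instead.
STOP_WORDS = {
    "a", "an", "the", "and", "or", "but", "of", "to", "in", "on", "for",
    "is", "are", "was", "were", "be", "it", "they", "them", "their",
    "he", "she", "his", "her", "who", "that", "this", "with", "as", "by",
}


def split_sentences(text):
    """Naive sentence splitter on runs of '.', '!' and '?'."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def content_words(sentence):
    """Lower-cased word forms that are not in STOP_WORDS."""
    return {w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS}


def adjacent_overlap(text):
    """Proportion of adjacent sentence pairs sharing at least one content word."""
    sentences = [content_words(s) for s in split_sentences(text)]
    pairs = list(zip(sentences, sentences[1:]))
    if not pairs:
        return 0.0
    return sum(1 for first, second in pairs if first & second) / len(pairs)


sample = (
    "Terrorists and Mujahideen are two different categories of people. "
    "Terrorists are those who create terror and dread through their "
    "destructive activities. While Mujahideen are called Islamic soldiers."
)
print(adjacent_overlap(sample))  # 0.5: only the first pair shares a word
```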

On the negative pole, lexical diversity features are prominent. Lexical diversity refers to the number of unique words (types) in a text relative to its total number of words (tokens). The type-token ratio (TTR) is computed both for content words and for all words. When the number of word types equals the total number of words, every word in the text is different and lexical diversity is at its peak. Such texts are likely to be either short or low in cohesion. Conversely, low lexical diversity combined with higher cohesion means that words are repeated across the text: cohesion is maintained by using words several times. TTR is affected by text length, because as a text grows longer its words have more opportunities to be reused, which lowers TTR. The MTLD and vocd-D measures instead use estimation algorithms designed to reduce this dependence on length. The SYNMED indices measure the minimal edit distance between adjacent sentences and capture different aspects of sentence similarity. SYNMEDpos compares only part-of-speech sequences, so two adjacent sentences with the same structure count as similar even if their words differ. SYNMEDwrd and SYNMEDlem, by contrast, also compare the words or lemmas in each position, so sentences with the same syntax but different words, and therefore different meanings, count as less similar. Taken together, these features show that Pakistani learners' writing is cohesive, with overlap at the level of content words, sentences, and arguments as a dominant feature, yet new and unique words are scarce. The texts are highly informational, and similar sentences and arguments recur throughout the passages. This dimension is therefore labelled 'overlapping informative features vs. simple structure.' An example from the data is quoted below:

<ICLE-PA-GF-004.1>

Often marriages are settled between two people who have different financial backgrounds. After their marriages, one who is poor or less rich is subjected to the agony of taunting. Especially if a woman is poor, then she had to suffer for the whole of her married life and had to obey her husband and serve him like a servant, or otherwise, she had to be ready for the consequences such as divorce. It was also come to line light from the research that men and their family hope that his wife will bring a large amount of dowry, and if she does not bring it, then she is divorced. Usually, young girls who belong to a rich are royal family are unable to live a married life in serenity due to their lack of flexibility; they do not compromise on any matter, thus resulting in a breakdown of a loving bond of marriages.

The example shows overlap of content words, a common feature of learner writing.

Another example is:

<ICLE-PA-GF-0080.1>

Terrorists and Mujahideen are two different categories of people. Terrorists are those who create terror and dread through their destructive activities. While Mujahideen are called Islamic soldiers. They fight for Allah against injustice or for Islam when anti-Muslim powers dominate Muslims and forbid them to lead their lives according to Islam. They don't fight for personal or worldly benefits, while terrorists fight for worldly benefits or purposes. Islam is the religion of peace. It stresses brotherhood, sacrifice, and welfare. It forbids to frighten someone. The killing of someone is inhuman.
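The lexical diversity and sentence-similarity measures discussed for Factor 1 can be illustrated with short Python functions. These are textbook versions, not Coh-Metrix's implementation: `ttr` is the plain type-token ratio; `mtld` follows the general MTLD procedure of McCarthy and Jarvis (2010) with the commonly used threshold of 0.72, averaging a forward and a backward pass; and `edit_distance` is the standard Levenshtein distance, used here as an assumed stand-in for the minimal edit distance behind the SYNMED indices, which can be applied to part-of-speech tags, words, or lemmas. Tokenization is assumed to have been done already, and how Coh-Metrix normalizes edit distance is not modelled.

```python
def ttr(tokens):
    """Type-token ratio: number of distinct tokens / number of tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def _mtld_pass(tokens, threshold=0.72):
    """One directional MTLD pass: tokens per factor."""
    factors, types, count, current = 0.0, set(), 0, 1.0
    for tok in tokens:
        count += 1
        types.add(tok)
        current = len(types) / count
        if current <= threshold:
            # The running TTR has fallen to the threshold: close a full factor.
            factors += 1
            types, count, current = set(), 0, 1.0
    # The unfinished final segment counts as a partial factor.
    factors += (1.0 - current) / (1.0 - threshold)
    # With no repetition at all, no factor is ever completed.
    return len(tokens) / factors if factors else float("inf")


def mtld(tokens, threshold=0.72):
    """MTLD: average of a forward and a backward pass."""
    return (_mtld_pass(tokens, threshold) + _mtld_pass(tokens[::-1], threshold)) / 2


def edit_distance(a, b):
    """Levenshtein distance between two sequences (tags, words or lemmas)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]


tokens = "the poor wife has to obey and the poor wife has to serve".split()
print(round(ttr(tokens), 3), round(mtld(tokens), 2))
# Same part-of-speech frame, one differing tag: distance 1.
print(edit_distance(["DT", "NN", "VBZ", "JJ"], ["DT", "NN", "VBD", "JJ"]))
```

A text that repeats one word throughout, such as ten copies of the same token, closes a factor every two tokens and therefore gets an MTLD of 2.0, the opposite extreme from a text with no repetition.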

Factor 2. Based on Coh-Metrix Indices

Factor 2
Positive Component	Values	Negative Component	Values
PCNARz	.901	DESWLsy	-.828
PCNARp	.900	DESWLlt	-.810
RDF	.802	DESWLltd	-.759
WRDPRO	.718	DESWLsyd	-.751
WRDFRQc	.712	WRDAOAc	-.634
RDL2	.635	SIENNA	-.571
WRDFAMc	.635	WRDNOUN	-.565
WRDFRQmc	.556	WRDADJ	-.493
GREG	.538
WRDPRP3s	.419

In Factor 2, the prominent features with positive loadings relate to the psycholinguistic word-information indices. These indices give information about age of acquisition, familiarity, imageability, concreteness, and similar properties. CM draws this information from two databases. The first is the MRC Psycholinguistic Database, which provides ratings for many words on several psychological dimensions. For example, age-of-acquisition ratings estimate the period in which a word first enters a child's vocabulary, and other ratings of content words are collected from adults on a 1-7 scale. Higher scores represent easier processing. Ratings on the 1-7 scale were multiplied by 100 and rounded to the nearest integer so that all ratings could be presented as integers on a scale from 100 to 700. Other measures, such as familiarity, concreteness, and imageability, were obtained from merged norms, including those of Paivio, Yuille, and Madigan (1968). The second source is WordNet (Fellbaum, 1998; Miller, 1990), from which CM estimates polysemy and hypernymy.
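The rescaling of MRC ratings described above is simple arithmetic: a mean rating of 4.37 on the 1-7 scale becomes 437. The sketch below shows it together with a hypothetical helper, `mean_rating`, that averages scaled ratings over the words of a text found in a norms table. The norms, the words, and the choice to skip unrated words are illustrative assumptions, not the MRC database or Coh-Metrix's procedure.

```python
def to_mrc_scale(rating):
    """Map a 1-7 rating to an integer on the 100-700 scale (x100, rounded)."""
    if not 1.0 <= rating <= 7.0:
        raise ValueError("rating must lie on the 1-7 scale")
    # Note: Python's round() sends exact .5 ties to the even integer.
    return round(rating * 100)


def mean_rating(words, norms):
    """Mean scaled rating of the words found in a norms table, or None."""
    found = [norms[w] for w in words if w in norms]
    return sum(found) / len(found) if found else None


# Hypothetical familiarity norms for illustration only.
norms = {"home": to_mrc_scale(6.12), "marriage": to_mrc_scale(5.50)}
print(mean_rating(["home", "marriage", "serenity"], norms))
```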

The dominant feature, PCNARz, relates to narrativity. Texts high in narrativity read like stories, recounting events and giving information about places, characters, and things; they are close to everyday oral conversation, and their vocabulary is highly familiar and reflects world knowledge.
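In the Factor 1 and Factor 2 tables, each easability component appears twice, as a z-score (suffix z, e.g. PCNARz) and as a percentile (suffix p, e.g. PCNARp). Assuming the percentile is read off a standard normal distribution, which is an assumption about how Coh-Metrix relates the two rather than something stated in this study, the conversion can be sketched with Python's standard library:

```python
from statistics import NormalDist


def z_to_percentile(z):
    """Percentile rank of a z-score, assuming a standard normal distribution."""
    return 100 * NormalDist().cdf(z)


# PCNARz-style z-scores mapped to percentiles (illustrative values).
for z in (-1.0, 0.0, 0.9):
    print(z, round(z_to_percentile(z), 1))
```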

On the negative pole, the descriptive indices are prominent. These indices describe the text, its nature, and its level of complexity. For instance, with respect to length, longer paragraphs and sentences may indicate more words and more complex syntax, which makes such sentences harder to process. Similarly, a large standard deviation of sentence length indicates that sentence length varies widely, with some sentences very long and others very short, which an author may do deliberately, for instance to alternate characters' utterances with descriptions of scenes (McNamara et al., 2014).
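The length-based descriptive indices can be illustrated by computing mean sentence length, the standard deviation of sentence length, and mean word length in letters. This is a simplified sketch: sentences are split naively on terminal punctuation, words are runs of letters and apostrophes, and the sample standard deviation is used; none of these choices is claimed to match Coh-Metrix exactly. The sample text is adapted from essay ICLE-PA-AO-0011.1.

```python
import re
from statistics import mean, stdev


def sentence_lengths(text):
    """Words per sentence, splitting naively on '.', '!' and '?'."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def length_profile(text):
    """Mean and SD of sentence length, and mean word length in letters."""
    lengths = sentence_lengths(text)
    # Count only letters, so "world's" contributes 6 rather than 7.
    letters = [sum(ch.isalpha() for ch in w) for w in re.findall(r"[A-Za-z']+", text)]
    letters = [n for n in letters if n]
    return {
        "mean_sentence_length": mean(lengths),
        # Sample standard deviation; defined only for two or more sentences.
        "sd_sentence_length": stdev(lengths) if len(lengths) > 1 else 0.0,
        "mean_word_length_letters": mean(letters),
    }


print(length_profile(
    "Europe is one of the world's continents. It is bordered by the "
    "Arctic Ocean to the north and the Atlantic Ocean to the west."
))
```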

This dimension is therefore labelled 'narrative vs. descriptive concerns.'

<ICLE-PA-VL-0001.1>

The turning point of American policy was 9/11. When 2000 Americans were killed in this attack, Americans claimed that Afghan was responsible for that and started a war against Afghanistan. Millions were killed in this war. The prisoners of war were killed in this and treated inhumanly and sent to Abu Garib Jail. Americans put them in cagescage-like animals and torture them by letting dogs lie on them.

They were not given food and other basic needs. This true violation of the U.N Charter. Which was recognized at the Geneva conference. The media explored the cruelties of America in this regard. The American diplomacy to fight against terrorist was exposed that how America falsely got the support of the world. But in reality, the USA deceived the whole world for killing the innocent people.

The next example shows the description of a place.

<ICLE-PA-AO-0011.1>

Europe is one of the world's seventh continents, Europe is generally divided from Asia to its east by the water, divided by the Ural Mountains, the Ural River, the Caspian Sea, the Caucasus region, and the Black Sea to the southeast. Europe is bordered by the Arctic Ocean and other bodies of water to the north, the Atlantic Ocean to the west, the Mediterranean sea to the south, and the black sea to the southeast.

Factor 3. Based on Coh-Metrix Indices

Factor 3.
Positive Component	Values	Negative Component	Values
PCSYNz	.914	DESS	-.934
PCSYNp	.905	RDFKGL	-.821
SYNSTRUTt	.833	dressed	-.767
SYNSTRUTa	.804	SYNLE	-.620
SMCAUSv	.665
DEC	.647
SMINTEp	.568
CRFCWO1	.513
SMCAUSvp	.490

Factor 3 is interpreted mainly through its positive features. The features with the highest positive loadings belong to the text easability indices. Syntactic simplicity indices such as PCSYNz and PCSYNp point to simple syntactic structure, with fewer words per sentence and simple, familiar syntactic patterns; such structures are easy to process. Other features, PCCNCp and PCCNCz, relate to word concreteness and indicate content words that are common, concrete, and non-abstract. Such vocabulary evokes mental images and is less ambiguous and more meaningful, so the information is easy to process and comprehend. This dimension is therefore labelled 'concrete factual information.'

<ICLE-PA-AO-0024.1>

The modern age is not hunting animals for personal hunt and pleasure. In ancient times Kings and their companions used to hunt animals not for their larder but for their personal pleasure. It was very cruel and inhuman treatment toward the beauty of nature. They left many animals to rote often hunt.

<ICLE-PA-AO-0015.1>

If we take the example of domestic donkey, it is treated by human too much worse, a ton of weight is put on the back of this innocent creature, and the master take much more work from him., which is much more from his capacity and tendency.

In the circus, many animals are badly treated by humans, they are change their habitat, location and snatch their native and natural surroundings, and here they badly treated by a human.

<ICLE-PA-VL-0004.1>

At 3.79 million sq. miles and with over 309 million people, the America is the third-largest country by total area and population. America is the world largest economy with a GDP of $ 14.3 trillion- with the literacy rate of 99% America has one of the finest systems of education. In the field of sports, America has achieved many landmarks, and they have the highest number of medals any country won in the Olympics.

Although the global slow down in economic growth, America is still the highest funding nation in the world. America has one of the largest Army in the world, still operating in different parts of the world, and have taken part in world war 1 and 2. Still, due to many set backs, America is one of the strongest countries in the world.
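RDFKGL, which appears among the negative features in the Factor 3 table, is Coh-Metrix's Flesch-Kincaid Grade Level index. A sketch of the standard Flesch-Kincaid formula follows; syllable counting, which Coh-Metrix performs internally, is left to the caller here, and the counts in the example are hypothetical.

```python
def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid Grade Level from raw counts."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical counts: 100 words, 5 sentences, 150 syllables.
print(round(flesch_kincaid_grade(100, 5, 150), 2))  # 9.91
```

Because the first term grows with the number of words per sentence, longer sentences raise the grade level, which is consistent with RDFKGL sitting on the opposite pole from the syntactic simplicity indices.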

Conclusion and Findings

Pakistani learners’ argumentative writing is largely focused on sharing information. Even when presenting arguments, the writers do not take a clear stance and instead use an indirect style. Students need special training for argumentative topics, which are more cognitively demanding, complex, and interactive; they perform better in narrative and descriptive essays than on argumentative topics. Their texts read like stories, recounting events and giving information about places, characters, and things, much like everyday oral conversation, with highly familiar vocabulary that reflects world knowledge. Learners' writing is cohesive, with overlap at the level of content words, sentences, and arguments as a dominant feature, yet new and unique words are scarce. Learners reuse the same syntactic and linguistic features, producing extensive overlap in the text: they write highly cohesive text but lack variety of expression. Instead of presenting well-organized, carefully designed arguments, they prefer to share information with interactive features.

Note: This paper is part of the researcher’s Ph.D. dissertation.

References

Baumgardner, R. J. (1987). Utilizing Pakistani newspaper English to teach grammar. World Englishes, 6(3), 241-252.
Baumgardner, R. J. (Ed.). (1996). South Asian English: Structure, use, and users. Urbana: University of Illinois Press.
Biber, D. (1988). Variation across speech and writing. Cambridge: Cambridge University Press.
Biber, D. (1995a). On the role of computational, statistical, and interpretive techniques in multi-dimensional analysis of register variation. Text, 15(3), 314-370.
Biber, D. (1995b). Dimensions of register variation: A cross-linguistic comparison. Cambridge: Cambridge University Press.
Biber, D. (2004a). Conversation text types: A multi-dimensional analysis. In G. Purnelle, C. Fairon, & A. Dister (Eds.), Le poids des mots: Proceedings of the 7th International Conference on the Statistical Analysis of Textual Data (pp. 15-34). Louvain: Presses universitaires de Louvain.
Biber, D. (2004b). Modal use across registers and time. In A. Curzan & K. Emmons (Eds.), Studies in the history of the English language II: Unfolding conversations (pp. 189-216). Berlin: Mouton de Gruyter.
Biber, D., & Finegan, E. (1994). Multi-dimensional analysis of authors' style: Some case studies from the eighteenth century. In D. Ross & D. Brink (Eds.), Research in humanities computing III (pp. 3-17).
Biber, D., Connor, U., & Upton, T. A. (2007). Discourse on the move: Using corpus analysis to describe discourse structure. Amsterdam: John Benjamins.
Biber, D., Conrad, S., & Reppen, R. (2002). Speaking and writing in the university: A multi-dimensional comparison. TESOL Quarterly, 36(1), 9-48.
Crossley, S., Salsbury, T., Titak, A., & McNamara, D. (2014). Frequency effects and second language lexical acquisition: Word types, word tokens, and word production. International Journal of Corpus Linguistics, 19(3), 301-332.
Crowhurst (1990). How many millions? The statistics of English today. English Today, 1(1), 7-9.
Friginal, E. (2012). The discourse of outsourced call centres: A corpus-based, multi-dimensional analysis.
Geisler, C. (2002). Investigating register variation in nineteenth-century English: A multi-dimensional comparison. In R. Reppen, S. M. Fitzmaurice, & D. Biber (Eds.), Using corpora to explore linguistic variation (pp. 249-271). Amsterdam: John Benjamins.
Graesser, A. C., & D'Mello, S. K. (2012). Moment-to-moment emotions during reading. The Reading Teacher, 66, 238-242.

Global Social Sciences Review, VI(III)
