Glottal stops and -um- in Tagalog

Anna E. Maclachlan   Macquarie University
Mark Donohue   University of Sydney

The misinterpretation of linguistic data often leads to theorising that is not justified by the language in question. We examine Tagalog phonology and the Optimality Theory (OT) analysis of the affix -um- 'actor voice', which is literally a textbook example of the principles of Generalized Alignment (e.g. Prince and Smolensky 1993, Russell 1997). OT accounts draw a distinction in the behaviour of the affix, which is said to appear as an infix on consonant-initial roots and as a prefix on vowel-initial roots, as in (i) and (ii):

(i) sulat + -um- = s-um-ulat 'write'
(ii) abot + -um- = um-abot 'attain'

However, we show that this OT distinction is spurious. It is well documented that ORTHOGRAPHICALLY vowel-initial roots like 'abot' always begin with a phonetic glottal stop. We argue that this glottal stop cannot be considered epenthetic, but rather it must be present underlyingly:

  1. an analysis lacking glottal stops (underlying or epenthetic) would incorrectly predict forms such as *[mag-alis] for [mag-?alis] 'remove' and *[um-abot] for [?um-abot] 'attain';
  2. word-initial epenthesis of glottal stops would incorrectly predict forms such as *[ma-upo?] for [ma-?upo?] 'sit';
  3. root-initial epenthesis of glottal stops would incorrectly predict forms such as *[um-?abot] for [?um-abot] 'attain';
  4. cyclical epenthesis (both word-initial and root-initial) would incorrectly predict forms such as *[?um-?abot] for [?um-abot] 'attain'; and finally,
  5. an analysis with underlying glottal stops in initial position CORRECTLY predicts all and only the attested forms.

This conclusion implies that ORTHOGRAPHIC 'abot' is /?abot/, and thus that the correct analysis of affixation of this root with -um- is not distinct from that of s-um-ulat:

(iii) ?abot + -um- = ?-um-abot 'attain' (compare with (i) & (ii))

We further argue that the ranking NOCODA >> ALIGN-um is not as relevant as it is claimed to be in the OT literature. Infixation in roots beginning with consonant clusters found only in loanwords (/Cr/ and /Cl/) has been taken as supporting this ranking:

(iv) gradwet + -um- = gr-um-adwet (~ g-um-radwet) 'graduate'

However, -um- infixation in roots beginning with native consonant clusters shows that candidates with codas are always chosen over viable candidates without codas, thus violating NOCODA, (v). We further show that ONSET, rather than NOCODA, is the active constraint in this Tagalog process.

(v) pyak + -um- = p-um-yak 'squawk' (*[pyumak])
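
The descriptive generalization behind (i), (iii) and (v) can be sketched in a few lines of Python. This is an illustration only, under stated assumptions: the function name `infix_um` is hypothetical, the glottal stop is written `?` as in the transcriptions above, and the sketch simply places -um- after the first segment of the underlying root. It does not model constraint ranking, and for (iv) it yields only the variant g-um-radwet.

```python
VOWELS = set("aeiou")

def infix_um(root: str) -> str:
    """Place -um- immediately after the root-initial consonant.

    Assumes the root is given in its underlying form, so that
    orthographically vowel-initial roots begin with "?" (glottal stop).
    """
    if not root or root[0] in VOWELS:
        raise ValueError("expected a consonant-initial underlying form, e.g. '?abot'")
    return root[0] + "um" + root[1:]

# Roots taken from examples (i), (iii) and (v).
for root in ["sulat", "?abot", "pyak"]:
    print(root, "->", infix_um(root))
```

On this view a single rule covers all three roots, which is the point of treating the glottal stop as underlying: no separate prefixation case is needed.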

In sum, a careful examination of Tagalog phonology shows that, contra the assumptions in the OT literature,

  1. no alternation between infixation and prefixation of -um- exists;
  2. the glottal stop is not epenthetic, but must be treated as underlying; and
  3. ONSET is the active constraint in determining the position of -um-.

Given that this -um- affixation example is central in discussions of Generalized Alignment and is so often cited, not only in textbooks but also in recent research (e.g. Crowhurst 1998), it is important that the details be corrected.

Preliminary Analysis of a Corpus of Descriptive Texts

Josef Meyer   Macquarie University

The analysis presented in this paper forms part of a study that aims to determine the feasibility of using shallow natural language processing techniques to create a `hybrid' knowledge base using overlapping sets of entries from multiple encyclopedias. By `hybrid' we mean that the knowledge representation combines information at different levels of linguistic realisation. Such knowledge bases have shown themselves to be useful in the area of natural language generation (Milosavljevic et al. 1996), and could also prove useful in applications such as intelligent search engines and text summarisation.

Our corpus consists of 2221 encyclopedia entries extracted from the sections of Encarta and Grolier's encyclopedia that deal with animals. The paper provides an overview of our approach, before proceeding to describe aspects of language use within our corpus that affect the difficulty of automatic knowledge base construction.

One problem that is particularly interesting involves the resolution of anaphoric and associative anaphoric (or `bridging') references. The case of direct, and in particular pronominal, anaphora has been extensively covered in the literature on natural language processing, and a number of systems have been implemented that provide reasonable solutions to this problem (see, for example, Lappin and Leass, 1994; Boguraev and Kennedy, 1996). The general case with associative relationships, as in examples such as "the bus ... the driver", is more complex; Vieira (1998) shows that in a corpus of Wall Street Journal entries even human annotators frequently have difficulty picking out associative relationships.

The results of some preliminary analysis on the use of both direct and associative reference in these texts indicate that at least for certain frequently occurring classes of associative relationship automatic resolution may be feasible. These include relationships involving body parts (The aardvark ... the tail), and type-subtype relationships (Aesculapian snakes ... the males).
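
As a rough illustration of how the two classes of associative relationship mentioned above might be resolved, the following Python sketch links the head noun of a definite noun phrase to an earlier-mentioned animal by simple lexicon lookup. Everything in it is an assumption made for illustration: the function name `resolve_bridging`, the two small word lists, and the relation labels are hypothetical and are not drawn from the corpus or from the system described in the paper.

```python
# Hypothetical mini-lexicons; the entries are illustrative only.
BODY_PARTS = {"tail", "snout", "claws", "tongue"}
SUBTYPES = {"males", "females", "young", "adults"}

def resolve_bridging(anchor: str, definite_np_heads: list[str]) -> list[tuple[str, str, str]]:
    """Link definite NP heads to the anchor entity via lexicon lookup.

    Returns (head, relation, anchor) triples; heads found in neither
    lexicon are left unresolved and omitted from the result.
    """
    links = []
    for head in definite_np_heads:
        if head in BODY_PARTS:
            links.append((head, "part-of", anchor))
        elif head in SUBTYPES:
            links.append((head, "subtype-of", anchor))
    return links

print(resolve_bridging("aardvark", ["tail", "burrow"]))
print(resolve_bridging("Aesculapian snakes", ["males"]))
```

A lookup of this kind only works because encyclopedia entries about animals have a single dominant topic entity; the general case illustrated by "the bus ... the driver" would need much more than a word list.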

A Fresh Look at Australian Languages and Individual-Identifying Features

Luisa Miceli   University of Western Australia

Johanna Nichols (1996) defines individual-identifying evidence as evidence diagnostic of a language family because "its probability of multiple occurrence among the world's languages is so low that for practical purposes it can be regarded as unique and individual" (p.48). Australian languages share a small number of features that can be said to be individual-identifying. Dixon (1980) believes that these features identify an Australian family, while the majority of Australianists believe that they identify a Pama-Nyungan family, comprising all Australian languages except those of the top end of Australia and the Kimberley region. The features consist of nominal and verbal inflection and pronouns including demonstratives and interrogatives. Reconstructions of these features, based on the various shared reflexes, can be found in Dixon (1980), Blake (1988), Alpher (1990) and Evans (1988).

This paper firstly assesses shared features and determines if all can indeed be considered individual-identifying. It is then argued that all individual-identifying features should be given equal importance; any one of the features identifies the family. Scholars who define a Pama-Nyungan family have usually given the pronoun paradigm most importance. The main part of the paper concentrates on languages of the Gulf of Carpentaria region, in particular languages of the Karrwan family/subgroup, and languages of the south eastern part of Australia, in particular Narinjari. The former have been tentatively excluded from the Pama-Nyungan family due to lack of pronoun forms believed to be identifying of Pama-Nyungan, while the latter has traditionally been included although it does not appear to share the same pronoun forms that were the basis for the tentative exclusion of Karrwan. Reasons given by scholars (e.g. Blake and Evans) for the inclusion of these languages in, or their exclusion from, the Pama-Nyungan family are briefly summarised. The position of these "borderline" languages is then reassessed on the basis of all individual-identifying features, all of which are given equal status. The issue of the extent of the large scale family that the features identify is also briefly discussed.

What does the 'element' in government phonology really represent?

Haruko Miyakoda   Tokyo University of Agriculture and Technology

The major goal of phonological theory is to specify where phonological processes take place, and to be able to account for what happens to the sound in question. The latter aspect clearly is closely related to how a speech sound is to be represented.

In government phonology (Kaye, Lowenstamm and Vergnaud 1985, 1990), a speech sound is assumed to be composed of components called 'elements'. Traditionally, sounds have been assumed to be composed of 'features', but in the government framework, elements are considered to be the templates by reference to which listeners decode auditory input and speakers orchestrate and monitor their articulation.

In this paper, we will first compare the privative 'element' approach with the traditional equipollent 'feature' approach. Although it has long been recognized that some historical processes follow preferred lenition trajectories, various attempts based on traditional features have not been successful in capturing these phenomena. According to Harris (1990, 1994), one major advantage that the element-based approach enjoys over the one based on features is that a theory based on elements allows both reduction and sonority effects to be treated in a unified way--in terms of segmental complexity. In an element-based theory, lenition is assumed to involve the loss of elemental make-up. On the other hand, the opposite process, fortition, involves the addition of an element. Both processes are assumed to be triggered as a 'repair' strategy; that is, they occur in order to adjust the elemental complexity of each melodic unit. The complexity is calculated by the number of elements within the internal structure of segments.

Although the calculation of elements allows processes such as lenition and fortition to be handled in an elegant way, there are examples that suggest the need to refine the way elements are gauged.

One aspect concerns the [L] element ('slackness of vocal cords'). Examples of the lenition process from Latin to Spanish show a stage where voiceless stops become voiced (e.g. paca[t]um > paga[d]o > pava[th]o (th as in 'this')). Since, in elemental terms, [L] indicates 'voiced', this would mean that the element [H] ('fully voiceless') changes to [L], which does not decrease the number of elements. We will claim that this process can best be accounted for by assuming that the element [H] is 'pressured' to undergo deletion (based on the governing relations between segments).
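
The counting problem can be made concrete with a small Python sketch. It assumes only what is stated above, namely that complexity is the number of elements in a segment. The function name `complexity` is hypothetical, and the segment make-up is a placeholder: "H" and "L" follow the text, while "X1" and "X2" merely stand in for whatever other elements the stop contains and are not a claim about its actual representation.

```python
def complexity(segment: frozenset) -> int:
    """Complexity as the number of elements in the segment."""
    return len(segment)

# Placeholder make-up of a voiceless stop: "X1" and "X2" are stand-ins
# for its remaining elements, not an actual analysis.
voiceless_stop = frozenset({"X1", "X2", "H"})

# Voicing modelled as substitution of [H] by [L]: the count is unchanged.
substituted = (voiceless_stop - {"H"}) | {"L"}

# Voicing modelled as deletion of [H]: the count drops by one.
deleted = voiceless_stop - {"H"}

print(complexity(voiceless_stop), complexity(substituted), complexity(deleted))
```

Only the deletion analysis yields the decrease in complexity that lenition is supposed to involve, which is why the substitution of [H] by [L] is problematic for a purely count-based measure.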

Another aspect concerns the 'resonance' elements such as [I] and [U]. These elements are not regarded as 'place' elements, but are regarded as resonance elements in order to emphasize the status of the elements as 'cognitive categories'. However, in some cases of fortition, the strengthening of the segment in governing position is induced by the addition of an element, but at the same time, elements such as [I] and [U] are transmitted from the governor to the governee, which is hardly the desired result when the elemental profile is considered (e.g. in /n/ + /f/ > /m/ + /p/, the element [?] ('stop') is transmitted from /n/ to /f/, but the element [U] is also transmitted from the governor (/f/) to the governee (/n/)). We will claim that the resonance elements should be regarded as 'place' elements and should not count in the calculation of complexity, since acquiring a place element does nothing to 'strengthen' a segment.

Grammatical and social aspects of evidentiality: the case of Japanese.

Ilana Mushin   University of Melbourne

Grammaticalized evidential systems have been described in a variety of languages (Chafe & Nichols 1986, Willett 1988), but the factors motivating the actual distribution of evidential forms in discourse are still poorly understood.

This paper presents a discourse based analysis of evidential forms in Japanese as a means of uncovering some of the motivations for evidential use in particular discourse contexts.

Japanese evidentials comprise a heterogeneous set of forms that includes nominals (soo da - ‘HEARSAY’), particles (tte - ‘HEARSAY’), and adjectives (rashii - ‘seems’) (Aoki 1986). None of these forms is fully grammaticalized and Japanese sentences may lack evidential marking altogether (cf. Quechua (Weber 1989, Floyd 1993) where each sentence requires an evidential clitic, even if the evidential status of the information is already known). Since they are not fully grammaticalized, it is hypothesized that Japanese evidential forms will only occur when they make some actual contribution to the message.

This study uses a corpus of Japanese oral narratives that are retellings of other people's personal experiences to investigate the discourse status of reportive evidential forms. Contrary to initial expectations, the density of reportive forms (tte, tte yuu ‘(s/he) said’, rashii and soo da) in the corpus was high, and comparable with the density of reportive forms in narrative retelling for languages with fully grammaticalized systems. Reportive forms consistently occurred even when such evidential coding was pragmatically redundant.

I argue that cultural factors motivate the high density of evidential marking. Parallels with “information territory” phenomena in Japanese (including sentence final ne and -n da) will illustrate how information as a “property” domain is a strong cultural feature of Japanese interactional discourse (Kamio 1979, 1998, Cook 1990, Iwasaki 1993). This factor motivated the storytellers in retelling to repeatedly mark their discourse as originating in another speaker’s storytelling, even though it was not required to do so by the grammar.

These results not only provide a detailed account of evidential distribution in Japanese discourse, revealing more about the discourse functions of evidentiality; they also highlight the balancing act that plays out between grammar and pragmatics in the choice of linguistic forms.

Thematic role hierarchies and role engagement

Tom Mylne   Griffith University   

It has become common, even necessary in some theories, to postulate an ordered set of named semantic roles: the "thematic role hierarchy", usually presented (albeit implicitly) as reflecting some kind of universal extralinguistic knowledge about participant roles and their relationships. It is commonly assumed that the thematic role hierarchy determines the hierarchical order of the (syntactic) arguments which denote the roles (e.g. Larson, 1990:597), and some authors postulate a relationship between a semantic role's "prominence" on its hierarchy and the corresponding argument's grammatical status (Grimshaw, 1992:33ff.).

Thematic role theory has (at least) two major limitations: (i) lack of agreement as to which thematic roles exist, and (ii) the lack of any effective way to independently justify the assignment of noun phrases to thematic roles in particular sentences (Dowty, 1989:70). A third limitation is the lack of agreement as to the order of the roles in the hierarchy.

I argue that the ordered set of role names can be replaced by a simple measure of the "engagement" of each role-player in a situation under review. In principle, role engagement is a one-dimensional gradable quantity, varying from initiation (highest level) to mere presence in the situation (lowest level of engagement). Four zones on the scale can be defined in terms of two features which are commonly found, as terms, if not as features, in the literature: [Control] and [Affected]. The combination [+Control, -Affected] corresponds to initiation, [+Control, +Affected] to active response, [-Control, +Affected] to passive response, and [-Control, -Affected] to presence. A third feature, tentatively termed [Active], discriminates between [-Affected] roles associated with certain verbs. The only names which need be given to the roles are those particular to the situation ("massager" and "massagee" in a massage situation, for example).
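
The four zones defined by [Control] and [Affected] amount to a small lookup, sketched below in Python. The function names and the list `ZONES` are hypothetical; the ranking of the two intermediate zones follows the order in which they are listed above and should be read as an assumption, and the third feature [Active] is deliberately left out.

```python
# Zones ordered from highest to lowest engagement, following the order
# in which the text lists them (an assumption for the middle two).
ZONES = ["initiation", "active response", "passive response", "presence"]

def engagement_zone(control: bool, affected: bool) -> str:
    """Map the features [Control] and [Affected] to an engagement zone."""
    if control and not affected:
        return "initiation"
    if control and affected:
        return "active response"
    if affected:
        return "passive response"
    return "presence"

def more_engaged(a: tuple[bool, bool], b: tuple[bool, bool]) -> bool:
    """True if role a sits higher on the engagement scale than role b."""
    return ZONES.index(engagement_zone(*a)) < ZONES.index(engagement_zone(*b))

# A [+Control, -Affected] role outranks a [-Control, +Affected] one.
print(more_engaged((True, False), (False, True)))
```

No role names such as "agent" or "patient" appear anywhere in the sketch; the feature values alone place a participant on the scale.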

A number of English argument structure alternations, and constraints on these alternations, are explained by the feature system proposed. For example "We sent Tom /*London the parcel" reflects the fact that Tom, but not London, can bear the feature [+Control]. "Tom pierced /*cut the knife through the cloth" reflects the fact that "cut", but not "pierce", requires a [+Affected] object. The two [-Control, -Affected] arguments in "They loaded the hay onto the wagon" are discriminated by the feature [Active], whereas the "same" two arguments in "They loaded the wagon with hay" are discriminated by [Affected], as the wagon here is [+Affected]. The feature [Active] also distinguishes the two [+Control] agents in a causative situation.

The role engagement scale does away with the major problem of the thematic role hierarchy: the fact that it depends on assigning names to roles, and it may well be that this simple scale is adequate for the purposes to which the thematic role hierarchy is most commonly put.

Grammaticalization of Variation

Anthony J. Naro   Universidade Federal do Rio de Janeiro   

Bits of linguistic structure can be considered 'functional' to the extent that they facilitate the interpretation of meaning on the grammatical and discourse levels. Both functional and non-functional variation exist in large scale, in the sense that in some cases the variant favored in a certain environment favors the correct interpretation, while in others it does not.

I first discuss a typically non-functional phenomenon in spoken Brazilian Portuguese, subject/verb concord, showing that the use of explicit morphological verbal and nominal agreement markers in no way contributes to communicate meaning. An example is the marked verb in:

Eles falam que VAI expulsar os outros do morro
'they say that they are (or he is) going to throw other people out of the hill-side slum'

where the absence of plural marking makes it unclear whether the correct subject is 'he' or 'they'.

I then discuss a typically functional phenomenon, the use of realized versus zero subject pronouns, showing that such pronouns tend to be used more frequently in environments in which their absence would lead to ambiguity in interpretation on the discourse level. An example is:

Mamãe me disse que VOCÊ está com alergia
'Mom told me that you have an allergy'

where omission of the marked subject would lead to ambiguity between the correct interpretation 'you' and other structural possibilities such as 'he' or 'she'.

Non-functional variation is a characteristic of phenomena that are in an advanced stage of diachronic evolution where regularization and phonological attrition have set in, near the end of the functional cycle (Sankoff 1980, Givón 1995). In this case we are dealing with purely mechanical variation, with no functional effect. Functional variation can be found near the beginning of the cycle.

Given this general picture it is natural to ask if all variation is fated to become non-functional in the long run. I argue that a second path is possible, namely grammaticalization of variation. In this case, the use of the variants is regularized: each variant is used nearly categorically in a given environment. I discuss two examples of this sort of evolution in Brazilian Portuguese: 1) the use of 1st pl. -mos and 3rd sing. zero verb desinences with semantically 1st pl. verbs, which is now controlled to a great extent by tense, and 2) the use of subject/verb and verb/subject order, now largely associated with the foreground/background distinction.

A proposal for the Australian Linguistic eXchange (ALX)

David Nathan   AIATSIS   
Peter Austin   University of Melbourne

Australianist linguistics has a strong history of using computerised tools. Nevertheless, recent developments in computerised representation, especially those associated with the World Wide Web, have provided new contexts for collaboration and raised expectations of the tools. Surprisingly, the underlying technologies are not only relatively simple but also highly suited to linguistic applications. In this paper, we will show how encoding schemas such as XML (a text markup system derived from Standard Generalised Markup Language, and related to HTML) can be the key to making data and media portable and explicit, and can allow researchers to match their data encoding with descriptions of linguistic software.

We propose a process for developing public agreement about formats and processes for dictionaries, texts, conversation, sound, and other linguistic data. The project is called ALX - the Australian Linguistic eXchange. As a first contribution, we will illustrate the approach using video data annotated to allow dictionary or other interfaces to be aware of the presence of video content.
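
To give a sense of what making media content explicit could look like, here is a minimal Python sketch that reads an XML-encoded dictionary entry and reports the video clips attached to it. The markup is entirely hypothetical: the element and attribute names (`entry`, `headword`, `media`, `href`, `start`, `end`), the file path, and the function name `video_clips` are invented for illustration and are not the ALX format.

```python
import xml.etree.ElementTree as ET

# Hypothetical entry markup; not an actual ALX schema.
SAMPLE = """<entry id="e1">
  <headword>HEADWORD</headword>
  <gloss>GLOSS</gloss>
  <media type="video" href="clips/example.mpg" start="12.5" end="15.0"/>
  <media type="audio" href="audio/example.wav" start="0.0" end="1.2"/>
</entry>"""

def video_clips(xml_text: str) -> list[tuple[str, float, float]]:
    """Return (href, start, end) for each video element in an entry."""
    entry = ET.fromstring(xml_text)
    clips = []
    for media in entry.findall("media"):
        if media.get("type") == "video":
            clips.append((media.get("href"),
                          float(media.get("start")),
                          float(media.get("end"))))
    return clips

print(video_clips(SAMPLE))
```

Because the presence and extent of the video is stated in the markup itself, any interface that can parse the entry can discover it without knowing anything else about the program that produced the data.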

Afrocentrist linguistics

Mark Newbrook   Monash University   

'Afrocentrism' is an intellectual tradition which involves the reassessment of African history and culture in terms of the experiences and traditions of African people. It covers various disciplines and embraces a range of approaches and viewpoints, some of which are very different indeed from those espoused by mainstream scholarship. Afrocentrism has for some time been a force in public and academic life in the USA, where its practitioners have sought to align it with ultra-postmodernist cultural relativism - and with the ongoing struggle for the rights and the cultural dignity of African-Americans. Because of the latter, opposition to its more extreme manifestations has been somewhat muted at times, even in academic circles. This is unfortunate, because many of the claims of Afrocentrists are inadequately supported by evidence or argumentation, while some are apparently nonsensical; the cause of African-Americans can hardly be furthered by such proposals. One such area involves re-interpretations of the history and diffusion of African languages. Claims include: Ancient Egyptian (whose speakers are identified as black Africans) was the ancestor of all African languages; Egyptian and other African languages and their speakers had enormous unacknowledged influence on other Old-World cultures, including those of China, Greece and India; West Africans speaking Mandingo crossed the Atlantic long ago, settled Central America and helped in the formation of the Olmec culture; the Ethiopic writing system is not merely a well-devised means of representing certain languages but has vast philosophical and indeed mystical significance (etc). Similarly exaggerated or inaccurate claims are being advanced on behalf of other hitherto oppressed peoples (not only in the Americas) and the intellectual issue here is thus one of wider significance.

I will provide a critical survey of Afrocentrist linguistics, focusing on the claims of Asante, Bekerie, Bernal, Diop, Van Sertima and Winters.

Semantics of verbal classification: Key elements for the study of verbal classifiers in Bardi (Western Australia).

Edith Nicolas   University of Melbourne

The aim of this paper is to set out the main semantic parameters involved in the verbal classification of Bardi, an Aboriginal language from North Western Australia.

Verbal classification is a phenomenon that has mainly been recorded in Australian Aboriginal languages, and within this group, mostly in the so called Non-Pama-Nyungan group located in the North and North Western part of the country (Silverstein 1986; McGregor 1990, in preparation; Nicolas 1998). Verbal classification involves the classification of a process, according to parameters such as the type or directionality of the movement, the existence of a goal (or its absence), the way the goal is reached, etc. It does not classify an argument of the clause as is the case when nominal classifiers are placed on verbs.

The following examples illustrate the difference between verbal classifiers (as found in Australian languages) and verb classifiers (a type of nominal classifier):

Verbal classifier (Bardi, Western Australia; Aklif forthcoming)
(1a) boorm in-nya-gal aarli
     'He gutted the fish'
(1b) joony an-nyamarroo jinaoola
     'Suck the juice from the flower!'
Verb classifier (North Dakota Caddo in Mithun 1984: 865)
(2a) 'She is stringing beads'
(2b) 'Plums are growing'

In (2) the classifier placed on the verb is used to classify the noun (argument) of the clause. In these examples, the verb classifier is used according to shape: both beads and plums share with "eye" the characteristic "(small) round shape". The verbal classifier in (1) is also placed on the verb but it classifies the process itself, irrespective of the nouns in the clause. Here, it is the notion of extraction that is common to the two verbs that accounts for the use of the classifier.

In this paper I will suggest a framework to analyze the meaning and the use of the verbal classifiers in Bardi. I will first introduce the verbal system of the language and present the components relevant in the study of classification, namely the preverb and the root/classifier. I will then argue that the semantics of classification is primarily based on an action schema, that is the quintessential "activity" common to all the verbs using a given classifier. Defining this schema does not yield a lexical meaning, but a classifying feature, which is the semantic content of the classifier.

Direction, movement and contact will be distinguished as the essential notions relevant in the classification of processes, the three being derived from the action schema of the verb. Depending on the productivity of the classifier, that is the number of preverbs with which it can appear, these parameters will appear more or less clearly in the resulting complex verb.

Classification works as a system, and I will try to define the classifiers both in terms of themselves and as part of the network they constitute. The possibility given by multi-classification, i.e. the association of a preverb with various classifiers, will enable me to refine and confirm my analysis of the classifiers as well as show the creative tendencies of the system.

Set marking tags and stuff

Catrin Norrby   University of Melbourne   
Joanne Winter   Monash University

Our paper examines set marking tags (Dines 1980) in the discourse practices of adolescents. The commonalities of adolescent experiences, discourse performances and life stage are captured in labels like teenage talk, Jugendsprache and ungdomsspråk (lit. 'youth language'). Conversely, difference has been articulated to account for constructions and representations of the 'other' with adolescents perceived as highly emotional, expressive and dramatic. Sociolinguistic studies have viewed adolescents in various roles including 'innovators', 'levellers', as well as 'conformers' to peer group membership norms. The discourse practices under investigation are drawn from Swedish speaking urban adolescents in the GSM Project (Gymnasisters språk- och musikvärldar, 'Language and music worlds of high school students') and from Australian English speaking urban adolescents in the Monash Department of Linguistics Dimensions of Spoken Australian English project.

Set marking tags (SMTs) have been ch