A case study with logistic regression
Chiara Paolini
…and how people have studied it before
Try to look for:
The pivotal study on the phenomenon you want to study
A couple of qualitative and quantitative investigations, to get a sense of how researchers addressed it before you and what you can add to their research: a new approach, a new perspective, new data with a previous approach, etc.
…especially if you did not build the dataset yourself!
Linguistic information:
where are the observations extracted from? and when?
do they belong to a specific language register?
did the authors apply any restrictions/limitations to the dataset?
Statistical information:
how many observations does the dataset contain, originally and after restrictions?
how many variables/predictors are annotated?
…especially if you did not build the dataset yourself!
Language internal/external predictors: identify and describe them.
First things first: response variable, types of the variables, levels of the variables
Filters added in previous studies (they can drastically change the results of your replication study!!)
…based on the research questions and data you have
Why did the authors choose to employ this analysis (or these analyses)?
Which questions do these analyses answer? Do they address specific aspects of the linguistic phenomenon, or are they more general?
Did these analyses introduce a methodological innovation in the field? Or had they already been used (see 1)?
This paper investigates two well-known alternations in English, the dative and the genitive alternation, in four varieties of spoken English. An ensemble of statistical analyses is employed to understand the extent to which the probabilistic grammar of genitive and dative variant choice differs across varieties.
Our mini-replication study will focus only on spoken American English, and only on the dative alternation.
The goal of my analysis is slightly different from the original one: to get familiar with the so-called traditional, top-down, manually annotated predictors for the dative alternation, and to see how well they predict the choice between the two variants in spoken American English.
(1) a. Ditransitive dative variant
[The waiter]subject [gave]verb [my cousin]recipient [some pizza]theme
b. Prepositional dative variant
[The waiter]subject [gave]verb [some pizza]theme [to my cousin]recipient
What is an alternation? See Pijpops (2020) and Gries (2017).
A core part of every variationist study is the dataset and its annotation: Szmrecsanyi et al. (2017) present two comprehensive, homogeneously and manually annotated datasets, one for each alternation.
Linguistic information
The dative tokens for American English were extracted from the Switchboard corpus of American English (Godfrey, Holliman & McDaniel 1992), as described in Bresnan et al. (2007). The Switchboard corpus covers telephone conversations collected at the beginning of the 1990s.
This dataset contains only observations with give as the verb of the dative construction.
The collection follows the guidelines of Bresnan et al. (2007) for defining interchangeable ditransitive and prepositional dative variants: only instances of the verb give with two argument noun phrases were considered, and non-interchangeable constructions were excluded.
Statistical information
Language-external predictors
Language-internal predictors: the authors annotated for well-known determinants of dative variation.
Language-internal predictors: further manipulation
Reducing the predictors to binary contrasts: Recipient/Theme.type were reduced to pronominal ([2], [3], [4]) versus non-pronominal ([1]); Recipient/Theme.definiteness were reduced to definite ([1], [3]) versus indefinite ([2]); Recipient/Theme.animacy were reduced to animate ([1]) versus inanimate ([2], [3], [4], [5]).
Creating the new predictor Length.difference: the Recipient/Theme.length measures were combined into a relative measure of length, calculated as log(Recipient.length) - log(Theme.length).
Recipient/Theme.lemma: the annotated lemma of the heads.
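The recoding steps above can be sketched in a few lines of Python. This is only an illustration, not the authors' script: the level codes and the Length.difference formula come from the list above, while the ".binary" key names, the recode helper and the toy token are hypothetical and invented for this example.

```python
import math

# Level codes follow the binary contrasts listed above.
PRONOMINAL_TYPES = {2, 3, 4}   # Recipient/Theme.type: [1] is non-pronominal
DEFINITE_LEVELS = {1, 3}       # Recipient/Theme.definiteness: [2] is indefinite
ANIMATE_LEVELS = {1}           # Recipient/Theme.animacy: [2]-[5] are inanimate


def recode(token):
    """Return a copy of one annotated token with binary predictors
    and Length.difference added (hypothetical helper)."""
    out = dict(token)
    for role in ("Recipient", "Theme"):
        # The ".binary" suffix is an assumption, used to keep the original codes.
        out[role + ".type.binary"] = (
            "pronominal" if token[role + ".type"] in PRONOMINAL_TYPES else "non-pronominal"
        )
        out[role + ".definiteness.binary"] = (
            "definite" if token[role + ".definiteness"] in DEFINITE_LEVELS else "indefinite"
        )
        out[role + ".animacy.binary"] = (
            "animate" if token[role + ".animacy"] in ANIMATE_LEVELS else "inanimate"
        )
    # Relative length: log(Recipient.length) - log(Theme.length)
    out["Length.difference"] = (
        math.log(token["Recipient.length"]) - math.log(token["Theme.length"])
    )
    return out


# Invented token, for illustration only (not taken from the dataset).
toy_token = {
    "Recipient.type": 2, "Recipient.definiteness": 1,
    "Recipient.animacy": 1, "Recipient.length": 1,
    "Theme.type": 1, "Theme.definiteness": 2,
    "Theme.animacy": 3, "Theme.length": 2,
}
recoded = recode(toy_token)
print(recoded["Recipient.type.binary"], recoded["Theme.animacy.binary"],
      recoded["Length.difference"])
```

Length.difference is negative when the recipient is shorter than the theme, zero when both have the same length, and positive otherwise. Both lengths must be greater than zero, since the logarithm is undefined at zero.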
Methods in Corpus Linguistics - Leuven, 2022-12-8