Language features to look at in Dementia patients

About

This page lists language features to consider when processing patient transcript data. Where possible, we indicate which software tools are available to facilitate the analysis of each feature.

Interactional Features

A subset of the Computational Features list below that should be independent of the content discussed, and therefore the first things to look at in this data:

Feature | Method | Related linguistic phenomena | Reference | Implemented?
Backchannels | Count transcribed backchannels (#mhm, #mm etc); rate per person (normalised on number of utterances) (see sketch below) | | ref |
Backchannels | As above, but distance between backchannels per person | | |
Backchannels | Count acknowledgement keywords (yes, yeah, ok, okay, good etc); rate per person | | |
Inter-turn pauses | Rate of inter-turn (single-line) "Pause"s (over total utterances); maybe also turn-initial within-line "Pause"s (see sketch below) | Slowness to respond | ref |
Intra-turn pauses | Rate of intra-turn "Pause"s (over number of words in utterance) | Slow rate of speech (and self-repair) | ref |
Filled pauses | Rate of transcribed filled pauses (#err, #umm etc); mean/min/max/std rate per person per turn | Circumlocutions (and self-repair) | ref |
Relative contribution | Number of patient words and turns vs doctor; also vs carer | Reduced number of utterances | Orimaye et al 2014 | yes, normalisation in progress
Relative contribution | | Slow rate of speech | ref |
Relative contribution | | Empty phrases/speech | ref |
Turn length | Words per utterance per person, maybe normalised over whole document and over doctor and over carer | Short conversational turns | ref | yes
Turn length | | Fewer words per turn | Orimaye et al 2014 | yes
Turn length | | Less complex sentences | Croisile et al 1996 | yes
Incomplete words | Count transcribed incomplete words ("d-"); mean/min/max/std rate per person | Incomplete talk / conversational discontinuity | Watson et al 1999 | yes
Incomplete words | | Use of incomplete words | ref |
Incomplete turns/contributions | Turn-final transcribed incomplete words? Or maybe parse-tree features? | Incomplete talk / conversational discontinuity | ref |
Self-repair | Julian Hough's self-repair classifier, STIR: mean/max/min/std repairs per turn; can also use a simpler classifier via STIR's lexicon of repair indicator words | Repair (SISR) | Watson et al 1999 |
Self-repair | | Word-finding difficulties / lexical retrieval | Croisile et al 1996 |
Self-repair | | Object naming difficulties | ref |
Self-repair | | Disfluency | ref |
Self-repair | | Elaboration difficulty | Watson et al 1999 |
Self-repair | | Revisions | Orimaye et al 2014; Onofre de Lira et al 2011 |
Other-repair | Chris Howes' other-repair classifier from PPAT? Mean/max/min/std rate of other-repair per person | Comprehension of the speech of others | ref |
Inter-turn repetition | Mean words per turn repeated from previous turn(s); maybe weighted (inversely) by distance? | Repetition (and other-repair) | Croisile et al 1996; Onofre de Lira et al 2011 |
Inter-turn repetition | | Repeating questions | ref |
Intra-turn repetition | Mean repeated words per turn; maybe weighted (inversely) by distance? (see sketch below) | Repetition | Croisile et al 1996; Onofre de Lira et al 2011 | partial, no weighting yet
Greetings | Count of initial greetings (use manually defined list), with/without pauses, other words? | Impairment in greeting | ref |
Hedges | Define lexicon manually & count (you know, thing etc); mean/min/max/std rate per person per turn | Circumlocutions | ref | basic count of hedges
Pronoun use | Mean/max/min/std pronouns per turn per person; maybe normalised over number of nouns | More pronoun use | Jarrold et al 2014 | yes
Pronoun use | | Empty phrases/speech | ref |
Predicates | POS-tag, count verb/noun/adj/adv classes; mean/max/min/std per person, also normalised per utterance | Number (not average) of predicates (CHECK) | Orimaye et al 2014 | yes
Syntactic complexity | Depth of tree from e.g. the Stanford parser? Then min/max/mean/std per person per turn | Lower syntactic index (CHECK) | Onofre de Lira et al 2011 | in progress
Syntactic complexity | | Less complex sentences | Onofre de Lira et al 2011 |
Topic variability | LDA or similar to assign topics, then, using a sliding window of N (=10?) utterances: (1) measure the change (e.g. KL divergence) in the topic distribution vector between neighbouring windows; or (2) just calculate the entropy of the topic distribution in each window; then mean, max etc | Introducing new topics / changing topics | ref |
Topic variability | | Topic shifts / lack of topic maintenance | Watson et al 1999 |
Lexical surprisal | Train a language model on e.g. Switchboard/BNC (use STIR models?), measure surprisal at each word; then mean/min/max/std surprisal per person | Paraphasia (words in wrong and senseless combinations) | ref |
Lexical surprisal | | Lacking coherence / intelligibility | Watson et al 1999 |
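
For the lexicon-based interactional counts (backchannels, acknowledgements, filled pauses and hedges), a minimal sketch follows. It assumes the transcript is available as a list of (speaker, utterance) string pairs; the lexicons shown are illustrative placeholders, not the agreed project lists.

from collections import defaultdict

# Illustrative placeholder lexicons -- not the project's agreed lists.
BACKCHANNELS = {"#mhm", "#mm", "mhm", "mm"}
ACKNOWLEDGEMENTS = {"yes", "yeah", "ok", "okay", "good"}
FILLED_PAUSES = {"#err", "#umm", "err", "erm", "um", "umm"}
HEDGE_WORDS = {"thing", "thingy", "whatsit"}
HEDGE_PHRASES = {"you know", "sort of", "kind of"}

def lexicon_rates(transcript):
    """Per-speaker rates of each lexicon, normalised by number of utterances.

    transcript: list of (speaker, utterance) string pairs.
    """
    counts = defaultdict(lambda: defaultdict(int))
    utterances = defaultdict(int)
    for speaker, utt in transcript:
        utterances[speaker] += 1
        tokens = utt.lower().split()
        joined = " ".join(tokens)
        counts[speaker]["backchannels"] += sum(t in BACKCHANNELS for t in tokens)
        counts[speaker]["acknowledgements"] += sum(t in ACKNOWLEDGEMENTS for t in tokens)
        counts[speaker]["filled_pauses"] += sum(t in FILLED_PAUSES for t in tokens)
        counts[speaker]["hedges"] += (sum(t in HEDGE_WORDS for t in tokens)
                                      + sum(joined.count(p) for p in HEDGE_PHRASES))
    return {spk: {feature: n / utterances[spk] for feature, n in feats.items()}
            for spk, feats in counts.items()}

# Example:
# transcript = [("Doctor", "how are you today"), ("Patient", "#umm you know okay")]
# lexicon_rates(transcript)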
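
A similar sketch for the pause-rate features. It assumes the transcription convention that an utterance consisting only of the token "Pause" is an inter-turn pause and a "Pause" token inside an utterance is an intra-turn pause; that convention is an assumption and should be checked against the transcripts.

from collections import defaultdict

def pause_rates(transcript):
    """Per-speaker inter-turn and intra-turn pause rates.

    transcript: list of (speaker, utterance) string pairs.
    """
    inter = defaultdict(int)    # single-line "Pause" utterances per speaker
    intra = defaultdict(int)    # within-utterance "Pause" tokens per speaker
    n_utts = defaultdict(int)
    n_words = defaultdict(int)
    for speaker, utt in transcript:
        tokens = utt.split()
        n_utts[speaker] += 1
        if tokens == ["Pause"]:
            inter[speaker] += 1
            continue
        intra[speaker] += sum(t == "Pause" for t in tokens)
        n_words[speaker] += sum(t != "Pause" for t in tokens)
    return {spk: {"inter_turn_pause_rate": inter[spk] / n_utts[spk],
                  "intra_turn_pause_rate": intra[spk] / n_words[spk] if n_words[spk] else 0.0}
            for spk in n_utts}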
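
For intra-turn repetition with the proposed inverse-distance weighting, one possible implementation weights each repeated word by 1/d, where d is the number of tokens back to its previous occurrence in the same turn. The weighting scheme below is only one option.

def intra_turn_repetition(utterance, weighted=True):
    """Repetition score for one turn, normalised by turn length.

    Each repeated word contributes 1/d (d = distance back to its previous
    occurrence) when weighted, or 1 otherwise.
    """
    tokens = utterance.lower().split()
    last_seen = {}
    score = 0.0
    for i, tok in enumerate(tokens):
        if tok in last_seen:
            d = i - last_seen[tok]
            score += 1.0 / d if weighted else 1.0
        last_seen[tok] = i
    return score / len(tokens) if tokens else 0.0

# Example: "i went to the to the shop" repeats "to" and "the" at distance 2.
# intra_turn_repetition("i went to the to the shop")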

Computational Features

Feature | Method | Related linguistic phenomena | Reference | Implemented?
Topic variability | LDA or similar to assign topics, then, using a sliding window of N (=10?) utterances: (1) measure the change (e.g. KL divergence) in the topic distribution vector between neighbouring windows; or (2) just calculate the entropy of the topic distribution in each window; then mean, max etc (see sketch below) | Introducing new topics / changing topics | ref |
Topic variability | | Topic shifts / lack of topic maintenance | Watson et al 1999 |
Topic contribution | Segment topics via the WindowDiff method (lexical or LDA topics); then mean/min/max/std number of contributions per person per segment | Lack of contribution to topics | ref |
Lexical surprisal | Train a language model on e.g. Switchboard/BNC (use STIR models?), measure surprisal at each word; then mean/min/max/std surprisal per person (see sketch below) | Paraphasia (words in wrong and senseless combinations) | ref |
Lexical surprisal | | Lacking coherence / intelligibility | Watson et al 1999 |
Lexicon size | Lexical type counts (or maybe better: type/token ratio), breadth of the lexical probability distribution (i.e. entropy of a unigram language model) (see sketch below) | Lexicon richness | see e.g. Hirst's papers on detecting authors' dementia from vocabulary changes in their books |
Incomplete words | Count transcribed incomplete words ("d-"); mean/min/max/std rate per person | Incomplete talk / conversational discontinuity | Watson et al 1999 | yes
Incomplete words | | Use of incomplete words | ref |
Incomplete turns/contributions | Turn-final transcribed incomplete words? Or maybe parse-tree features? | Incomplete talk / conversational discontinuity | ref |
Predicates | POS-tag, count verb/noun/adj/adv classes; mean/max/min/std per person, also normalised per utterance | Number (not average) of predicates (CHECK) | Orimaye et al 2014 | yes
Self-repair | Julian Hough's self-repair classifier, STIR: mean/max/min/std repairs per turn; can also use a simpler classifier via STIR's lexicon of repair indicator words | Repair (SISR) | Watson et al 1999 |
Self-repair | | Word-finding difficulties / lexical retrieval | Croisile et al 1996 |
Self-repair | | Object naming difficulties | ref |
Self-repair | | Disfluency | ref |
Self-repair | | Elaboration difficulty | Watson et al 1999 |
Self-repair | | Revisions | Orimaye et al 2014; Onofre de Lira et al 2011 |
Other-repair | Chris Howes' other-repair classifier from PPAT? Mean/max/min/std rate of other-repair per person | Comprehension of the speech of others | ref |
Other-repair | | Reference errors | ref |
Turn length | Mean/min/max/std words per utterance per person, maybe normalised over whole document and over the other person? | Short conversational turns | ref | yes
Turn length | | Fewer words per turn | Orimaye et al 2014 | yes
Turn length | | Less complex sentences | Croisile et al 1996 | yes
Syntactic complexity | Depth of tree from e.g. the Stanford parser? Then min/max/mean/std per person per turn | Lower syntactic index (CHECK) | Onofre de Lira et al 2011 | in progress
Syntactic complexity | | Less complex sentences | Onofre de Lira et al 2011 |
Inter-turn repetition | Mean words per turn repeated from previous turn(s); maybe weighted (inversely) by distance? | Repetition | Croisile et al 1996; Onofre de Lira et al 2011 |
Inter-turn repetition | | Repeating questions | ref |
Inter-turn repetition | | (See also other-repair features) | |
Intra-turn repetition | Mean repeated words per turn; maybe weighted (inversely) by distance? | Repetition | Croisile et al 1996; Onofre de Lira et al 2011 | partial, no weighting yet
Pronoun use | Mean/max/min/std pronouns per turn per person; maybe normalised over number of nouns (see sketch below) | More pronoun use | Jarrold et al 2014 | yes
Pronoun use | | Empty phrases/speech | ref |
Pronoun use | | Reduced lexicon richness | see e.g. Hirst |
Relative contribution | Number of words and turns per person; also normalised over the other person | Reduced number of utterances | Orimaye et al 2014 | yes, normalisation in progress
Relative contribution | | Slow rate of speech | ref |
Relative contribution | | Empty phrases/speech | ref |
Pauses | Mean/min/max number of inter-turn and intra-turn "Pause"s | Slowness to respond | ref | yes
Pauses | | Slow rate of speech | ref |
Pauses | | (see also self-repair phenomena) | |
Filled pauses | Count transcribed filled pauses; mean/min/max/std rate per person per turn | Circumlocutions | ref |
Filled pauses | | (see also self-repair phenomena) | |
Hedges | Define lexicon manually & count (you know, thing etc); mean/min/max/std rate per person per turn | Circumlocutions | ref | basic count of hedges
Hedges | | (see also self-repair phenomena) | |
Greetings | Count of initial greetings (use manually defined list), with/without pauses, other words? | Impairment in greeting | ref |
Backchannels | Define lexicon manually & count; mean/min/max/std rate per person | | ref |
Dialogue acts | Use a standard (e.g. Switchboard-trained) DA tagger to give mean/min/max query vs statement; or build a simple POS-sequence rule-based version (see sketch below) | More requestives than assertives | ref |
Gist | Ignore this! Maybe look at text summarisation of HCP dialogue and see how it matches up to patient utterances after topic modelling is performed on both | Gist-level processing (summary, main idea, lesson task) | ref |
Detail-level processing | ? | Detail-level processing | ref |
All unigrams/n-grams | Standard scikit-learn functions | (standard NLP baseline) | | yes
Topic features | Gensim for LDA inference with e.g. 20 topics; then use the weights of each topic as per-dialogue/person features | (as used in previous PPAT/AOTD work) | | yes, needs refining
Sentiment/emotion features | Existing QMUL SVM classifiers | (as used in previous PPAT/AOTD work) | | in progress
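
For the topic-variability feature, a sketch using Gensim for LDA inference follows: fit a topic model, slide a window of N utterances over the dialogue, and compare neighbouring windows' topic distributions by KL divergence (or take the entropy of each window). The window size, topic count and smoothing floor are illustrative choices, not settled parameters.

import numpy as np
from gensim import corpora
from gensim.models import LdaModel
from scipy.stats import entropy

def topic_variability(utterances, num_topics=20, window=10):
    """utterances: list of token lists, one per utterance, in dialogue order."""
    dictionary = corpora.Dictionary(utterances)
    corpus = [dictionary.doc2bow(u) for u in utterances]
    lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=num_topics)

    def window_distribution(start):
        # Topic distribution over one window of utterances.
        bow = dictionary.doc2bow([t for u in utterances[start:start + window] for t in u])
        dist = np.full(num_topics, 1e-8)   # small floor avoids zeros in the KL divergence
        for topic, prob in lda.get_document_topics(bow, minimum_probability=0.0):
            dist[topic] += prob
        return dist / dist.sum()

    starts = range(0, max(1, len(utterances) - window + 1))
    dists = [window_distribution(s) for s in starts]
    kl_between_windows = [entropy(p, q) for p, q in zip(dists, dists[1:])]
    entropy_per_window = [entropy(p) for p in dists]
    return kl_between_windows, entropy_per_window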
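
For lexical surprisal, the sketch below uses a toy bigram language model with add-k smoothing; the Switchboard/BNC-trained STIR models mentioned in the table would be a drop-in replacement. The training corpus is assumed to be supplied as a list of token lists.

import math
from collections import Counter, defaultdict

class BigramSurprisal:
    """Toy bigram LM with add-k smoothing for per-word surprisal."""

    def __init__(self, training_sentences, k=0.1):
        self.k = k
        self.unigrams = Counter()
        self.bigrams = defaultdict(Counter)
        for sent in training_sentences:
            tokens = ["<s>"] + [t.lower() for t in sent]
            for prev, cur in zip(tokens, tokens[1:]):
                self.unigrams[prev] += 1
                self.bigrams[prev][cur] += 1
        # +1 leaves probability mass for unseen words.
        self.vocab_size = len(set(w for c in self.bigrams.values() for w in c)) + 1

    def surprisal(self, sentence):
        """Per-word surprisal, -log2 P(word | previous word), for one utterance."""
        tokens = ["<s>"] + [t.lower() for t in sentence]
        out = []
        for prev, cur in zip(tokens, tokens[1:]):
            num = self.bigrams[prev][cur] + self.k
            den = self.unigrams[prev] + self.k * self.vocab_size
            out.append(-math.log2(num / den))
        return out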
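
For lexicon size, the type/token ratio and the entropy of the unigram distribution over a speaker's words can be computed directly; whitespace tokenisation is a simplifying assumption here.

import math
from collections import Counter

def lexicon_richness(utterances):
    """utterances: list of utterance strings from one speaker."""
    tokens = [t.lower() for utt in utterances for t in utt.split()]
    if not tokens:
        return {"type_token_ratio": 0.0, "unigram_entropy": 0.0}
    counts = Counter(tokens)
    total = len(tokens)
    ttr = len(counts) / total
    ent = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return {"type_token_ratio": ttr, "unigram_entropy": ent}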
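
For pronoun use, the sketch below uses NLTK's default POS tagger (the 'punkt' and 'averaged_perceptron_tagger' resources must be downloaded first); any POS tagger would do. It returns mean pronouns per turn and the pronoun/noun ratio suggested as a normalisation.

import nltk

PRONOUN_TAGS = {"PRP", "PRP$", "WP", "WP$"}
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}

def pronoun_features(utterances):
    """utterances: list of utterance strings from one speaker."""
    pronouns_per_turn = []
    pronoun_total = noun_total = 0
    for utt in utterances:
        tags = [tag for _, tag in nltk.pos_tag(nltk.word_tokenize(utt))]
        p = sum(t in PRONOUN_TAGS for t in tags)
        n = sum(t in NOUN_TAGS for t in tags)
        pronouns_per_turn.append(p)
        pronoun_total += p
        noun_total += n
    mean_per_turn = (sum(pronouns_per_turn) / len(pronouns_per_turn)
                     if pronouns_per_turn else 0.0)
    ratio = pronoun_total / noun_total if noun_total else float("inf")
    return {"mean_pronouns_per_turn": mean_per_turn, "pronoun_noun_ratio": ratio}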
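
For the dialogue-act feature, a rule-based fallback is sketched below. It classifies each utterance as a query or a statement from surface lexical cues (final question mark, initial wh-word or auxiliary) rather than POS sequences, and reports the per-speaker query/statement ratio; a Switchboard-trained DA tagger would replace this heuristic.

from collections import defaultdict

WH_WORDS = {"who", "what", "when", "where", "why", "how", "which"}
AUXILIARIES = {"do", "does", "did", "is", "are", "was", "were", "can", "could",
               "will", "would", "have", "has", "had", "should", "shall", "may"}

def query_statement_ratio(transcript):
    """transcript: list of (speaker, utterance) string pairs."""
    queries = defaultdict(int)
    statements = defaultdict(int)
    for speaker, utt in transcript:
        tokens = utt.lower().split()
        is_query = utt.strip().endswith("?") or (
            tokens and (tokens[0] in WH_WORDS or tokens[0] in AUXILIARIES))
        if is_query:
            queries[speaker] += 1
        else:
            statements[speaker] += 1
    return {spk: queries[spk] / max(statements[spk], 1)
            for spk in set(queries) | set(statements)}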

Linguistic Features

The same features, ordered by linguistic description; these should all now be subsumed into the tables above.

Feature | Possible computational implementation
Lack of speech initiative / introducing new topics / changing topics | LDA or similar to assign topics, then measure change (e.g. KL divergence) in distributions over windows
Topic shifts / lack of topic maintenance |
Lack of contribution to topics | Needs explicit topic segmentation, then mean/min/max/std number of contributions per segment
Paraphasia (words in wrong and senseless combinations) |
Incomplete talk / conversational discontinuity | Transcribed incomplete words ("d-")
Lacking coherence / intelligibility | Maybe language-model high surprisal?
Repair | Julian Hough's self-repair classifier, STIR: mean/max repairs per turn; use lexicon of repair indicator words; incomplete words
Word-finding difficulties / lexical retrieval |
Object naming difficulties |
Disfluency |
Elaboration difficulty |
Use of incomplete words | Transcribed incomplete words ("d-")
Comprehension of the speech of others | Chris Howes' other-repair classifier from PPAT?
Short conversational turns | Mean/min/max/std words per utterance, maybe normalised over whole document
Fewer words per turn |
Repetition | Mean repeated words per turn; maybe weighted (inversely) by distance?
Reference errors |
More pronoun use | Simple Python code to count usage levels of pronouns for patients / HCPs?
Impairment in greeting | Presence/absence of initial greetings (use manually defined list), with/without pauses, other words?
Speech outflow |
Circumlocutions | Maybe look for hedges and fillers, e.g. "You know, that thing that does x / that thing with the y that looks like z"
Slowness to respond | Mean/min/max number of inter-turn "Pause"s
Gist-level processing (summary, main idea, lesson task) | Maybe look at text summarisation of HCP dialogue and see how it matches up to patient utterances after topic modelling is performed on both
Detail-level processing | ?
Reduced lexicon richness | See e.g. Hirst's papers on detecting authors' dementia from vocabulary changes in their books
Empty phrases/speech | Ratio of pronouns to nouns
Slow rate of speech | Mean/min/max number of inter-turn "Pause"s and intra-turn "#pause"s
More requestives than assertives | Use a standard (e.g. Switchboard-trained) DA tagger to give mean/min/max query vs statement; or build a simple POS-sequence rule-based version
Repeating questions |