Language features to look at in dementia patients

From IMC wiki

About

This page lists language features to consider when processing patient transcript data. Where possible, we indicate which software tools are available to facilitate the analysis of each feature.

Interactional Features

A subset of the Computational Features list below. These should be independent of the content discussed, and are features of the interaction itself, so they are the first things to look at in this data:

| Feature | Method | Related linguistic phenomena | Reference | Implemented? | Implementation difficulty | Any issues? |
|---|---|---|---|---|---|---|
| Presence of carer | Count number of individuals in transcript | Presence of carer | Elsey et al 2015 | yes | easy | |
| Response patterns | Probability of answer by carer vs patient after doctor question | Inability to answer | Elsey et al 2015 | yes | fair | |
| | | Referral to carer for answers | Elsey et al 2015 | yes | fair | |
| "Don't know" answers | Frequency of patient utterances containing "don't know" keywords (after doctor turns?) | Inability to answer | Elsey et al 2015 | in progress | fair | |
| Length of responses to questions | Number of words in patient utterance following doctor question; identify questions via "?"? | Lack of elaboration in responses | Elsey et al 2015 | yes | fair | Transcripts don't contain "?"s, so another way to detect questions was needed; solved with a question classifier, which achieved 90% accuracy tested on hand-annotated dementia transcripts |
| | As above, but also normalise over number of words in doctor question | Partial answers to compound doctor questions | Elsey et al 2015 | yes | fair | |
| Backchannels | Count transcribed backchannels (#mhm, #mm etc); rate per person (normalised over number of utterances) | | ref | yes | easy | Working on a backchannel classifier trained on Switchboard data; best-case F1 scores ~0.81. Need to hand-annotate SLaDE transcripts for backchannels to test performance |
| | As above, but distance between backchannels per person | | | yes | fair | |
| | Count acknowledgement keywords (yes, yeah, ok, okay, good etc); rate per person | | | yes | easy | |
| Inter-turn pauses | Rate of inter-turn (single-line) "Pause"s (over total utterances); maybe also turn-initial within-line "Pause"s | Slowness to respond | ref | yes | fair | |
| Intra-turn pauses | Rate of intra-turn "Pause"s (over number of words in utterance) | Slow rate of speech (and self-repair) | ref | yes | easy | |
| Filled pauses | Rate of transcribed filled pauses (#err, #umm etc); mean/min/max/std rate per person per turn | Circumlocutions (and self-repair) | ref | yes | easy | |
| Relative contribution | Number of patient words and turns vs doctor; also vs carer | Reduced number of utterances | Orimaye et al 2014 | yes | fair | |
| | | Slow rate of speech | ref | | fair | |
| | | Empty phrases/speech | ref | | difficult | |
| Turn length | Words per utterance per person, maybe normalised over whole document, over doctor and over carer | Short conversational turns | ref | yes | fair | |
| | | Fewer words per turn | Orimaye et al 2014 | yes | easy | |
| | | Less complex sentences | Croisile et al 1996 | yes | fair | |
| Incomplete words | Count transcribed incomplete words ("d-"); mean/min/max/std rate per person | Incomplete talk / conversational discontinuity | Watson et al 1999 | yes | easy | |
| | | Use of incomplete words | ref | yes | easy | |
| Incomplete turns/contributions | Turn-final transcribed incomplete words? Or maybe parse-tree features? | Incomplete talk / conversational discontinuity | ref | no | difficult | |
| Self-repair | Julian Hough's self-repair classifier, STIR: mean/max/min/std repairs per turn; can also use a simpler classifier via STIR's lexicon of repair indicator words | Repair (SISR) | Watson et al 1999 | no | difficult using STIR, fair using the lexicon of repair indicator words | |
| | | Word-finding difficulties / lexical retrieval | Croisile et al 1996 | no | difficult | |
| | | Object naming difficulties | ref | no | difficult | |
| | | Disfluency | ref | no | difficult | |
| | | Elaboration difficulty | Watson et al 1999 | no | difficult | |
| | | Revisions | Orimaye et al 2014, Onofre de Lire et al 2011 | no | difficult | |
| Other-repair | Chris Howes' other-repair classifier from PPAT? Mean/max/min/std rate of other-repair per person | Comprehension of the speech of others | ref | no | difficult | |
| Inter-turn repetition | Mean words per turn repeated from previous turn(s); maybe weighted (inversely) by distance? | Repetition (and other-repair) | Croisile et al 1996, Onofre de Lire et al 2011 | yes | fair | |
| | | Repeating questions | ref | no | easy | |
| Intra-turn repetition | Mean repeated words per turn; maybe weighted (inversely) by distance? | Repetition | Croisile et al 1996, Onofre de Lire et al 2011 | yes | fair | |
| Greetings | Count of initial greetings (use manually defined list), with/without pauses, other words? | Impairment in greeting | ref | yes | easy | Need to find a good list of greeting words; using http://www.fluentu.com/english/blog/english-greetings-expressions/ as a simple basis so far |
| Hedges | Define lexicon manually and count ("you know", "thing" etc); mean/min/max/std rate per person per turn | Circumlocutions | ref | basic count of hedges | fair | |
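Several of the lexicon-based counts above (backchannels, acknowledgement keywords, hedges) reduce to the same operation: count lexicon hits per speaker and normalise over their number of utterances. A minimal sketch, assuming a transcript represented as (speaker, utterance) pairs; the keyword lists here are illustrative placeholders, not the project's curated lexicons:

```python
from collections import Counter

# Hypothetical lexicons for illustration; the real lists need curating.
BACKCHANNELS = {"#mhm", "#mm", "#uhuh"}
ACKNOWLEDGEMENTS = {"yes", "yeah", "ok", "okay", "good"}

def keyword_rate(transcript, lexicon):
    """Rate of lexicon hits per utterance for each speaker.

    transcript: list of (speaker, utterance) pairs (assumed format);
    returns {speaker: total hits / number of utterances}.
    """
    hits, turns = Counter(), Counter()
    for speaker, utterance in transcript:
        turns[speaker] += 1
        hits[speaker] += sum(1 for tok in utterance.lower().split()
                             if tok in lexicon)
    return {s: hits[s] / turns[s] for s in turns}

transcript = [
    ("doctor", "how have you been feeling"),
    ("patient", "yes ok not too bad"),
    ("carer", "#mhm"),
    ("patient", "yeah good"),
]
rates = keyword_rate(transcript, ACKNOWLEDGEMENTS)
```

The same function covers the per-person mean rates; the mean/min/max/std per-turn variants would keep the per-turn counts rather than summing them.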

Other Topic-independent Features

Another subset of the Computational Features list below which should be independent of the content discussed (but individual rather than interactional), so perhaps the second things to look at in this data:

| Feature | Method | Related linguistic phenomena | Reference | Implemented? | Difficulty? |
|---|---|---|---|---|---|
| Pronoun use | Mean/max/min/std pronouns per turn per person; maybe normalised over number of nouns | More pronoun use | Jarrold et al 2014 | yes | easy |
| | | Empty phrases/speech | ref | no | fair |
| Predicates | POS-tag, count verb/noun/adj/adv classes; mean/max/min/std per person, also normalised per utterance | Number (not average) of predicates (CHECK) | Orimaye et al 2014 | yes | fair |
| Syntactic complexity | Depth of tree from e.g. Stanford parser? Then min/max/mean/std per person per turn | Lower syntactic index (CHECK) | Onofre de Lire et al 2011 | in progress | fair-difficult |
| | | Less complex sentences | Onofre de Lire et al 2011 | | |
| Topic variability | LDA or similar to assign topics, then, using a sliding window of N (=10?) utterances: (1) measure change (e.g. KL divergence) in topic distribution vector between neighbouring windows; or (2) just calculate entropy of topic distribution in each window (then mean, max etc) | Introducing new topics / changing topics | ref | no | difficult |
| | | Topic shifts / lack of topic maintenance | Watson et al 1999 | | |
| Lexical surprisal | Train language model on e.g. Switchboard/BNC (use STIR models?), measure surprisal at each word; then mean/min/max/std surprisal per person | Paraphasia: words in wrong and senseless combinations | ref | no | difficult |
| | | Lacking coherence / intelligibility | Watson et al 1999 | | |
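Option (2) of the topic variability method above can be sketched without committing to a particular topic model: assume each utterance has already been assigned a topic label (e.g. the argmax topic from an LDA inference step), then compute the entropy of the topic distribution within each sliding window. The window size and label encoding are assumptions for illustration:

```python
import math
from collections import Counter

def window_topic_entropy(topic_ids, window=10):
    """Entropy of the topic distribution in each sliding window of
    `window` utterances; higher entropy suggests more topic
    variability. `topic_ids` is one topic label per utterance,
    assumed to come from e.g. an LDA argmax assignment.
    """
    entropies = []
    for i in range(len(topic_ids) - window + 1):
        counts = Counter(topic_ids[i:i + window])
        total = sum(counts.values())
        entropies.append(-sum((c / total) * math.log2(c / total)
                              for c in counts.values()))
    return entropies

# A stable stretch of topic 0 followed by rapid topic changes:
ents = window_topic_entropy([0] * 10 + [1, 2, 3, 0, 1], window=5)
```

Windows covering the stable stretch score 0.0; the entropy rises as the window moves into the topic-shifting region, and the mean/max over windows become the per-dialogue features.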

Computational Features

| Feature | Method | Related linguistic phenomena | Reference | Implemented? |
|---|---|---|---|---|
| Topic variability | LDA or similar to assign topics, then, using a sliding window of N (=10?) utterances: (1) measure change (e.g. KL divergence) in topic distribution vector between neighbouring windows; or (2) just calculate entropy of topic distribution in each window (then mean, max etc) | Introducing new topics / changing topics | ref | |
| | | Topic shifts / lack of topic maintenance | Watson et al 1999 | |
| Topic contribution | Segment topics via WindowDiff method (lexical or LDA topics); then mean/min/max/std number of contributions per person per segment | Lack of contribution to topics | ref | |
| Lexical surprisal | Train language model on e.g. Switchboard/BNC (use STIR models?), measure surprisal at each word; then mean/min/max/std surprisal per person | Paraphasia: words in wrong and senseless combinations | ref | |
| | | Lacking coherence / intelligibility | Watson et al 1999 | |
| Lexicon size | Lexical type counts (or maybe better: type/token ratio), breadth of lexical probability distribution (i.e. entropy of unigram language model) | Lexicon richness | see e.g. Hirst papers on detecting author dementia from vocabulary changes in books | |
| Incomplete words | Count transcribed incomplete words ("d-"); mean/min/max/std rate per person | Incomplete talk / conversational discontinuity | Watson et al 1999 | yes |
| | | Use of incomplete words | ref | |
| Incomplete turns/contributions | Turn-final transcribed incomplete words? Or maybe parse-tree features? | Incomplete talk / conversational discontinuity | ref | |
| Predicates | POS-tag, count verb/noun/adj/adv classes; mean/max/min/std per person, also normalised per utterance | Number (not average) of predicates (CHECK) | Orimaye et al 2014 | yes |
| Self-repair | Julian Hough's self-repair classifier, STIR: mean/max/min/std repairs per turn; can also use a simpler classifier via STIR's lexicon of repair indicator words | Repair (SISR) | Watson et al 1999 | |
| | | Word-finding difficulties / lexical retrieval | Croisile et al 1996 | |
| | | Object naming difficulties | ref | |
| | | Disfluency | ref | |
| | | Elaboration difficulty | Watson et al 1999 | |
| | | Revisions | Orimaye et al 2014, Onofre de Lire et al 2011 | |
| Other-repair | Chris Howes' other-repair classifier from PPAT? Mean/max/min/std rate of other-repair per person | Comprehension of the speech of others | ref | |
| | | Reference errors | ref | |
| Turn length | Mean/min/max/std words per utterance per person, maybe normalised over whole document and over other person? | Short conversational turns | ref | yes |
| | | Fewer words per turn | Orimaye et al 2014 | yes |
| | | Less complex sentences | Croisile et al 1996 | yes |
| Syntactic complexity | Depth of tree from e.g. Stanford parser? Then min/max/mean/std per person per turn | Lower syntactic index (CHECK) | Onofre de Lire et al 2011 | in progress |
| | | Less complex sentences | Onofre de Lire et al 2011 | |
| Inter-turn repetition | Mean words per turn repeated from previous turn(s); maybe weighted (inversely) by distance? | Repetition | Croisile et al 1996, Onofre de Lire et al 2011 | |
| | | Repeating questions | ref | |
| | (See also other-repair features) | | | |
| Intra-turn repetition | Mean repeated words per turn; maybe weighted (inversely) by distance? | Repetition | Croisile et al 1996, Onofre de Lire et al 2011 | partial, no weighting yet |
| Pronoun use | Mean/max/min/std pronouns per turn per person; maybe normalised over number of nouns | More pronoun use | Jarrold et al 2014 | yes |
| | | Empty phrases/speech | ref | |
| | | Reduced lexicon richness | e.g. Hirst | |
| Relative contribution | Number of words and turns per person; also normalised over other person | Reduced number of utterances | Orimaye et al 2014 | yes, normalisation in progress |
| | | Slow rate of speech | ref | |
| | | Empty phrases/speech | ref | |
| Pauses | Mean/min/max number of inter-turn and intra-turn "Pause"s | Slowness to respond | ref | yes |
| | | Slow rate of speech | ref | |
| | (See also self-repair phenomena) | | | |
| Filled pauses | Count transcribed filled pauses; mean/min/max/std rate per person per turn | Circumlocutions | ref | |
| | (See also self-repair phenomena) | | | |
| Hedges | Define lexicon manually and count ("you know", "thing" etc); mean/min/max/std rate per person per turn | Circumlocutions | ref | basic count of hedges |
| | (See also self-repair phenomena) | | | |
| Greetings | Count of initial greetings (use manually defined list), with/without pauses, other words? | Impairment in greeting | ref | |
| Backchannels | Define lexicon manually and count; mean/min/max/std rate per person | | ref | |
| Dialogue acts | Use a standard (e.g. Switchboard-trained) DA tagger to give mean/min/max query vs statement; or build a simple POS-sequence rule-based version | More requestives than assertives | ref | |
| Gist | Ignore this! (Maybe look at text summarisation of HCP dialogue and see how it matches up to patient utterances after topic modelling is performed on both) | Gist-level processing (summary, main idea, lesson task) | ref | |
| Detail-level processing | ? | Detail-level processing | ref | |
| All unigrams/n-grams | (Standard scikit-learn functions) | (Standard NLP baseline) | | yes |
| Topic features | Gensim for LDA inference with e.g. 20 topics; then use weights of each topic as per-dialogue/person features | (As used in previous PPAT/AOTD work) | | yes, needs refining |
| Sentiment/emotion features | Existing QMUL SVM classifiers | (As used in previous PPAT/AOTD work) | | in progress |
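The unweighted variant of the inter-turn repetition feature above (mean words per turn repeated from the previous turn) is simple enough to sketch directly; the turn representation here, plain whitespace-tokenised strings, is an assumption for illustration:

```python
def interturn_repetition(turns):
    """For each turn after the first, count how many of its word
    tokens also occurred in the immediately preceding turn, and
    return the mean such count per turn. This is the unweighted
    variant; a distance-weighted version would look further back
    and discount matches by how many turns ago they occurred.
    """
    counts = []
    for prev, cur in zip(turns, turns[1:]):
        prev_words = set(prev.lower().split())
        counts.append(sum(1 for w in cur.lower().split()
                          if w in prev_words))
    return sum(counts) / len(counts) if counts else 0.0

turns = [
    "did you go to the shops today",
    "the shops yes I went to the shops",
    "what did you buy",
]
score = interturn_repetition(turns)
```

The second turn repeats five tokens from the first ("the" twice, "shops" twice, "to") and the third repeats none, giving a mean of 2.5; for the per-person feature, this would be computed separately over each speaker's turns.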

Linguistic Features

The same features, ordered by linguistic description. These should all now be subsumed into the tables above.

| Feature | Possible computational implementation |
|---|---|
| Lack of speech initiative / introducing new topics / changing topics | LDA or similar to assign topics, then measure change (e.g. KL divergence) in distributions over windows |
| Topic shifts / lack of topic maintenance | |
| Lack of contribution to topics | Needs explicit topic segmentation; then mean/min/max/std number of contributions per segment |
| Paraphasia: words in wrong and senseless combinations | |
| Incomplete talk / conversational discontinuity | Transcribed incomplete words ("d-") |
| Lacking coherence / intelligibility | Maybe language-model high surprisal? |
| Repair | Julian Hough's self-repair classifier, STIR: mean/max repairs per turn; use lexicon of repair indicator words; incomplete words |
| Word-finding difficulties / lexical retrieval | |
| Object naming difficulties | |
| Disfluency | |
| Elaboration difficulty | |
| Use of incomplete words | Transcribed incomplete words ("d-") |
| Comprehension of the speech of others | Chris Howes' other-repair classifier from PPAT? |
| Short conversational turns | Mean/min/max/std words per utterance, maybe normalised over whole document |
| Fewer words per turn | |
| Repetition | Mean repeated words per turn; maybe weighted (inversely) by distance? |
| Reference errors | |
| More pronoun use | Simple Python code to count usage levels of pronouns for patients / HCPs? |
| Impairment in greeting | Presence/absence of initial greetings (use manually defined list), with/without pauses, other words? |
| Speech outflow | |
| Circumlocutions | Maybe look for hedges and fillers, e.g. "You know, that thing that does x / that thing with the y that looks like z" |
| Slowness to respond | Mean/min/max number of inter-turn "Pause"s |
| Gist-level processing (summary, main idea, lesson task) | Maybe look at text summarisation of HCP dialogue and see how it matches up to patient utterances after topic modelling is performed on both |
| Detail-level processing | ? |
| Reduced lexicon richness | See e.g. Hirst papers on detecting author dementia from vocabulary changes in books |
| Empty phrases/speech | Ratio of pronouns to nouns |
| Slow rate of speech | Mean/min/max number of inter-turn "Pause"s and intra-turn "#pause"s |
| More requestives than assertives | Use a standard (e.g. Switchboard-trained) DA tagger to give mean/min/max query vs statement; or build a simple POS-sequence rule-based version |
| Repeating questions | |
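The "simple Python code to count usage levels of pronouns" suggested above might look like the following sketch; the pronoun list is a hand-picked placeholder, and a real implementation would use a POS tagger (which would also give the noun counts needed for the pronoun/noun ratio):

```python
# Hand-picked pronoun list for illustration only; a POS tagger
# would be more robust and would also supply noun counts.
PRONOUNS = {"i", "you", "he", "she", "it", "we", "they",
            "me", "him", "her", "us", "them", "this", "that"}

def pronoun_rate(transcript):
    """Pronouns per word for each speaker, given a transcript of
    (speaker, utterance) pairs (an assumed input format).
    """
    pron, words = {}, {}
    for speaker, utterance in transcript:
        toks = utterance.lower().split()
        words[speaker] = words.get(speaker, 0) + len(toks)
        pron[speaker] = pron.get(speaker, 0) + sum(
            1 for t in toks if t in PRONOUNS)
    return {s: pron[s] / words[s] for s in words if words[s]}

dialogue = [
    ("patient", "it was that thing you know"),
    ("doctor", "which thing do you mean"),
]
pronoun_rates = pronoun_rate(dialogue)
```

Here the patient's rate (3 of 6 tokens) exceeds the doctor's (1 of 5), illustrating the kind of contrast the "more pronoun use" feature is after.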