ChrisNotes

From IMC wiki

09/02/2017

Matt notes:

  • analysis of diagnosis class (depression, anxiety, other) on outcome
    1. maybe then error/accuracy analysis per class?
  • analysis of agent ID or agent quartile only on outcome

03/09/2015

Matt replication and cross-val testing:

  • for out-of-caseness, compared to Chris 27/04/2015:
    1. using AttrSelClassifier, get same results: logistic f=0.664 (0.552)
    2. pre-select features (cross-val 10 folds, take>5), slightly better results: logistic f=0.702 (0.586)
    3. adding first/diff for same topics 5/14/16 (ENDOOCmatt_manual.arff): logistic f=0.704 (0.594)
    4. adding SentimentMeanFirst, AngerMeanLast (ENDOOCmatt_manual_emo.arff): logistic f=0.729 (0.642)
    5. (above justified by Attribute Selection using cross-validation, take any selected in any fold)
    6. as above, cross-val blocking AGENTID: logistic f=0.707 (0.604)
  • for agent av PHQ delta, compared to Chris "additional" below:
    1. using only the first/last agent/client num words/turns/prop: logistic f=0.674 (0.632)
    2. using ALLWORDSfirst vectorised unigrams (Agent3PatsOrMore_matt_allwords.arff), LIBLINEAR f=0.781 (0.741)
    3. as above but cheating attribute selected, LIBLINEAR f=0.860 (0.839), logistic similar
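
Sketch of the two cross-validation ideas above, for reference. This is illustrative Python (stdlib only), not the Weka setup actually used: the function names, the toy agent IDs and the per-fold selections are all made up. It assumes "blocking AGENTID" means all rows for one agent stay in the same fold, and that "take>5" means keeping features selected in more than 5 of the 10 folds. Selecting attributes on the full data before cross-validating (the "cheating" run) lets test-fold information leak into the model, which would explain the inflated f.

```python
from collections import Counter, defaultdict


def grouped_folds(agent_ids, n_folds):
    """Assign row indices to folds so that every row of one agent shares a fold."""
    rows_by_agent = defaultdict(list)
    for row, agent in enumerate(agent_ids):
        rows_by_agent[agent].append(row)

    folds = [[] for _ in range(n_folds)]
    # Place the largest agents first, each into the currently smallest fold,
    # so fold sizes stay roughly balanced.
    ordered = sorted(rows_by_agent, key=lambda a: (-len(rows_by_agent[a]), str(a)))
    for agent in ordered:
        smallest = min(range(n_folds), key=lambda i: len(folds[i]))
        folds[smallest].extend(rows_by_agent[agent])
    return folds


def features_selected_in_more_than(per_fold_selections, min_folds):
    """Keep features that attribute selection picked in more than min_folds folds."""
    counts = Counter(f for selected in per_fold_selections for f in set(selected))
    return sorted(f for f, c in counts.items() if c > min_folds)


# Hypothetical toy data: one agent ID per patient row.
agent_ids = ["a1", "a1", "a2", "a3", "a3", "a3", "a4", "a2"]
folds = grouped_folds(agent_ids, n_folds=3)

# Hypothetical selections from three training folds (feature names borrowed
# from the tables in these notes, but the selections themselves are invented).
per_fold_selections = [
    ["PHQatAssessment", "AllBoW14last"],
    ["PHQatAssessment"],
    ["PHQatAssessment", "AllBoW16last"],
]
stable = features_selected_in_more_than(per_fold_selections, min_folds=1)

print(folds)
print(stable)
```

Attribute selection would then be rerun inside each training fold only, with the held-out agents never seen during selection.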

28/04/2015

Also should do the same for those with 'measurable recovery' (6+ delta)

  • Best/worst agents - file for Matt cross validation PPAT/POLData/ARFF/data/20150427/FirstLastStartedInCaseCompletedOnlyAgent3PatsOrMore.arff

27/04/2015

Sanity check of out of caseness (IAPT Moving to recovery) is good (same results as below!) but slightly disingenuous as data contained those who also STARTED out of case...

Just taking those who started with PHQ >= 10

  1. Baseline 53/140 (37.9%) did not move to recovery (i.e. more balanced dataset)
  2. Get similar results: attribute selected (w/o words) + logistic f=0.664 (0.552):
Variable                        Odds Ratios (low)   Coefficients
PHQatAssessment                 1.039               0.0382
PHQatFirstTreatment             0.8705              -0.1387
PHQ_FIRST_TREATMENT (BINARY)    0.1142              -2.1694
AllBoW05last                    0.0036              -5.6218
AllBoW14last                    0.0001              -9.785
AllBoW16last                    667.8459            6.5041
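
Quick check on how to read the table: for a logistic model the odds ratio is exp(coefficient), so the two columns carry the same information. The sketch below (plain Python, values copied from the table above) recomputes the odds ratios from the rounded coefficients; small mismatches in the last digits are expected because the printed coefficients are rounded.

```python
import math

# Coefficients and reported odds ratios copied from the table above.
coefficients = {
    "PHQatAssessment": 0.0382,
    "PHQatFirstTreatment": -0.1387,
    "PHQ_FIRST_TREATMENT (BINARY)": -2.1694,
    "AllBoW05last": -5.6218,
    "AllBoW14last": -9.785,
    "AllBoW16last": 6.5041,
}
reported_odds_ratios = {
    "PHQatAssessment": 1.039,
    "PHQatFirstTreatment": 0.8705,
    "PHQ_FIRST_TREATMENT (BINARY)": 0.1142,
    "AllBoW05last": 0.0036,
    "AllBoW14last": 0.0001,
    "AllBoW16last": 667.8459,
}

# Odds ratio implied by each (rounded) coefficient.
recomputed = {name: math.exp(coef) for name, coef in coefficients.items()}

for name, coef in coefficients.items():
    print(f"{name:30s} coef={coef:9.4f} "
          f"exp(coef)={recomputed[name]:10.4f} "
          f"reported={reported_odds_ratios[name]:10.4f}")
```

An odds ratio below 1 (negative coefficient) means higher values of the variable lower the odds of the class the table is printed for; above 1 raises them.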

13/03/2015

Redo best and worst therapists by average delta of patients who have finished treatment and started in caseness

Patient language only run as is; therapist language cross validation problem (send to Matt).


Additional

Best/worst therapists: can predict with >0.7 whether a completed patient was treated by one of the best quartile of therapists (55 patients) or the worst quartile (44 patients) (7 or 8 patients in each case), just from the therapist language in the assessment session.
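
Rough sketch of how the best/worst quartiles could be derived, assuming therapists are ranked by the mean PHQ delta of their patients and only therapists with 3 or more patients are kept (as the Agent3PatsOrMore file name suggests). The function name, the sign convention (first minus last, so positive means improvement) and the toy data are assumptions for illustration, not the actual pipeline.

```python
from collections import defaultdict
from statistics import mean


def best_and_worst_quartiles(patients, min_patients=3):
    """Rank agents by mean PHQ improvement and return (best, worst) quartiles.

    patients: iterable of (agent_id, phq_first, phq_last) tuples.
    """
    deltas_by_agent = defaultdict(list)
    for agent_id, phq_first, phq_last in patients:
        # Assumed convention: positive delta = PHQ went down = improvement.
        deltas_by_agent[agent_id].append(phq_first - phq_last)

    mean_delta = {
        agent: mean(deltas)
        for agent, deltas in deltas_by_agent.items()
        if len(deltas) >= min_patients
    }
    ranked = sorted(mean_delta, key=lambda a: (mean_delta[a], a), reverse=True)
    quartile_size = max(1, len(ranked) // 4)
    return ranked[:quartile_size], ranked[-quartile_size:]


# Hypothetical patients: (agent_id, PHQ at first session, PHQ at last session).
patients = [
    ("a1", 20, 5), ("a1", 18, 6), ("a1", 15, 4),
    ("a2", 16, 10), ("a2", 14, 9), ("a2", 12, 8),
    ("a3", 15, 12), ("a3", 13, 11), ("a3", 17, 13),
    ("a4", 14, 14), ("a4", 12, 13), ("a4", 16, 15),
    ("a5", 20, 2), ("a5", 19, 1),  # only 2 patients, so excluded
]
best, worst = best_and_worst_quartiles(patients)
print(best, worst)
```

Patients of the agents in `best` and `worst` would then form the two classes for the prediction task.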

19th August

Feature sets etc tried for predicting whether out of caseness at final treatment session (for those completed):

  1. Baseline: 26.8% (62/231) PHQ >= 10 at final
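
For context on the f values, a small sketch that recomputes the baselines quoted in these notes and the accuracy of always guessing the majority class. The counts are taken from the notes; the function name and labels are just for illustration.

```python
def baseline(minority_count, total):
    """Return (minority class rate, accuracy of always predicting the majority)."""
    minority_rate = minority_count / total
    majority_accuracy = max(minority_rate, 1 - minority_rate)
    return minority_rate, majority_accuracy


# (minority count, total) as quoted in the notes.
baselines = {
    "PHQ >= 10 at final session (completers)": (62, 231),
    "did not move to recovery (started PHQ >= 10)": (53, 140),
    "did not enter/stay in treatment": (148, 500),
}

for label, (count, total) in baselines.items():
    rate, majority = baseline(count, total)
    print(f"{label}: {100 * rate:.1f}% minority, "
          f"{100 * majority:.1f}% majority-class accuracy")
```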

Using all

  1. First and last sessions (topic/sent/high level) inc delta
    • logistic f=0.71 (0.477)
  2. First and last sessions (topic/sent/high level/words) inc delta
    • logistic f=0.658 (0.288)
  3. First and last sessions (topic/sent) inc delta
    • logistic f=0.699 (0.43)
  4. Deltas only (topic/sent/high level)
    • logistic f=0.685 (0.355)
  5. ALL attribute selected + logistic f=0.761 (0.509):
Variable               Odds Ratios (low)   Coefficients
PHQatAssessment        1.0177              0.0176
PHQ9_SCORE             0.8541              -0.1578
PHQ_FIRST_TREATMENT    0.3995              -0.9175
AngerMean              0.002               -6.1979
AllBoW14last           0                   -11.9372
AllBoW16last           69.9508             4.2478

18th August

Feature sets etc tried for predicting whether will stay in treatment:

  1. Baseline: 29.6% (148/500) did not enter/stay in treatment

Using assessment session

  1. Assessment session high level (inc PHQ)
    • logistic f=0.619 (0.131)
  2. Assessment session sentiment
    • logistic f=0.595 (0.051)
  3. Assessment session high level/sentiment
    • logistic f=0.629 (0.191)
  4. Assessment session words
    • logistic f=0.6 (0.351)
  5. Assessment session high level/words
    • logistic f=0.591 (0.342)
  6. Assessment session sentiment/words
    • logistic f=0.627 (0.392)
  7. Assessment session high level/sentiment/words
    • logistic f=0.622 (0.412)
  8. Assessment session high level inc delta
    • logistic f=0.623 (0.194)
  9. Assessment session sentiment inc delta
    • logistic f=0.644 (0.217)
  10. Assessment session high level/sentiment inc delta
    • logistic f=0.648 (0.263)
  11. Assessment session high level inc delta/words
    • logistic f=0.591 (0.353)
  12. Assessment session sentiment inc delta/words
    • logistic f=0.617 (0.378)
  13. Assessment session high level inc delta/sentiment inc delta/words
    • logistic f=0.59 (0.318)

Using treatment session

  1. Treatment session high level (inc PHQ)
    • logistic f=0.611 (0.134)
  2. Treatment session sentiment
    • logistic f=0.608 (0.088)
  3. Treatment session topic
    • logistic f=0.589 (0.08)
  4. Treatment session high level/sentiment
    • logistic f=0.633 (0.214)
  5. Treatment session high level/topic
    • logistic f=0.608 (0.196)
  6. Treatment session sentiment/topic
    • logistic f=0.606 (0.139)
  7. Treatment session high level/sentiment/topic
    • logistic f=0.629 (0.236)
  8. Treatment session words
    • logistic f=0.676 (0.479)
  9. Treatment session high level/words
    • logistic f=0.689 (0.485)
  10. Treatment session sentiment/words
    • logistic f=0.684 (0.472)
  11. Treatment session topic/words
    • logistic f=0.684 (0.466)
  12. Treatment session high level/sentiment/words
    • logistic f=0.695 (0.479)
  13. Treatment session high level/topic/words
    • logistic f=0.693 (0.495)
  14. Treatment session topic/sentiment/words
    • logistic f=0.678 (0.476)
  15. Treatment session high level/sentiment/topic/words
    • logistic f=0.693 (0.495)
  16. Treatment session high level inc delta/topic/sentiment inc delta/words
    • logistic f=0.59 (0.318)

Using Assessment and treatment

  1. Everything:
    • logistic f=0.664 (0.401)

4th June

Number of completers, dropouts etc.

Mean no. sessions

optimum number of sessions - line graph with phq change

characteristics of those who 'recover'

therapists with better outcomes?

pull out a few transcripts where they discuss 'thoughts/feelings of ending your life', 'life not worth living', 'suicid'

28th May

Do descriptives of new stuff (how much people get better, how many sessions they need etc)

Potentially new/old changes in topic through sessions (pull out people egs; good v bad)

12th May

Change in each topic (now-next) for each one. Relationship with progress.
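
A minimal sketch of the now/next pairing, assuming each patient has an ordered list of sessions with a PHQ score and a topic proportion per topic. Changes are computed here as next minus now (the sign convention is an assumption), and the field names and numbers are invented for illustration.

```python
def consecutive_changes(sessions):
    """Pair each session with the next one and return topic and PHQ changes.

    sessions: list of dicts for one patient, ordered by session number,
    each with keys "session", "phq" and "topics" (topic name -> proportion).
    """
    changes = []
    for now, nxt in zip(sessions, sessions[1:]):
        topic_change = {
            topic: nxt["topics"][topic] - now["topics"][topic]
            for topic in now["topics"]
        }
        changes.append({
            "from": now["session"],
            "to": nxt["session"],
            "phq_change": nxt["phq"] - now["phq"],  # negative = improvement
            "topic_change": topic_change,
        })
    return changes


# Hypothetical patient with three sessions and two topics.
sessions = [
    {"session": 1, "phq": 18, "topics": {"topic05": 0.5, "topic14": 0.25}},
    {"session": 2, "phq": 14, "topics": {"topic05": 0.25, "topic14": 0.5}},
    {"session": 3, "phq": 15, "topics": {"topic05": 0.25, "topic14": 0.25}},
]
changes = consecutive_changes(sessions)
for change in changes:
    print(change)
```

Each row of `changes` gives one (topic change, PHQ change) pair, which is the unit needed to relate topic shifts to progress.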

Gradients within sessions? (Future!)

Write pipeline stuff;

Methods and description of sample/data

5th March

  • 20 all words without agent/client specifics
  • Prediction with 40 all
  • Sentiment
    1. Means and SDs of sentiment (and max/min) by person
    2. Changes of sentiment / anger
    3. Session by session or pair by pair
  • SPSS Data - add sentiment/agentID etc
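
Sketch for the per-person sentiment descriptives, assuming one sentiment score per turn tagged with a person ID. The scores and IDs are invented, and the SD is the sample standard deviation (set to 0.0 when a person has a single score, which is a choice made here for convenience).

```python
from collections import defaultdict
from statistics import mean, stdev


def sentiment_summary(turn_scores):
    """Summarise sentiment per person from (person_id, score) pairs."""
    scores_by_person = defaultdict(list)
    for person_id, score in turn_scores:
        scores_by_person[person_id].append(score)

    summary = {}
    for person_id, scores in scores_by_person.items():
        summary[person_id] = {
            "mean": mean(scores),
            # stdev needs at least two values.
            "sd": stdev(scores) if len(scores) > 1 else 0.0,
            "min": min(scores),
            "max": max(scores),
        }
    return summary


# Hypothetical per-turn sentiment scores.
turn_scores = [
    ("p1", 0.25), ("p1", 0.5), ("p1", 0.75),
    ("p2", -0.5), ("p2", 0.5),
]
summary = sentiment_summary(turn_scores)
for person_id, stats in summary.items():
    print(person_id, stats)
```

The same grouping could be keyed on (person, session) to get the session-by-session version.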

26th February

  • Map each one to the next (difference between words in session and improvement/otherwise by next session)
  • Topics correlations by transcript etc etc - get examples and backwards engineering as before
  • Topics in context
  • PHQ change of only completed treatment
  • Dynamics of change?

19th February

  • How to choose optimum number of topics? check add-ons for Mallet
    • coherence
    • usefulness for prediction
  • Tag words for who says them. (Agent_...)
  • Redo the topics with different/no stopwords
  • Redo topics for treatment sessions only
  • Those who have completed or done 4+ treatments - starting/end


  • Predicting who drops out?
  • Counts of words over total words. Normalise
  • Increase cost weighting; try different parameters; bi-grams/tri-grams
  • Lower frequency threshold for words.
  • Most informative words - attribute selection etc.
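
Illustration of the normalisation and n-gram ideas in the list above: counts divided by the total number of words in the transcript, with optional bigrams/trigrams and a frequency threshold. This is a plain-Python sketch with made-up names and a toy sentence, not the vectoriser actually used.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def relative_frequencies(tokens, max_n=1, min_count=1):
    """Count n-grams up to max_n and divide each count by the total word count.

    Terms occurring fewer than min_count times are dropped.
    """
    total_words = len(tokens)
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return {
        term: count / total_words
        for term, count in counts.items()
        if count >= min_count
    }


# Hypothetical transcript fragment.
tokens = "i feel low and i feel tired".split()
features = relative_frequencies(tokens, max_n=2, min_count=2)
print(features)
```

Lowering `min_count` corresponds to the lower frequency threshold mentioned above, and raising `max_n` to 3 adds trigrams.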