TIPS2025 - Chen | Bloch-Elkouby Lab for Suicide Prevention and Psychotherapy Research

Jimmy Chen’s Poster

Reliability of Abbreviated Suicide Crisis Syndrome Checklist (A-SCS-C) Across 70 Clinicians Interviewing the Same Virtual Patient

by Jimmy P. Chen^1,2, Hanjiang Xu^1,2, Jingyi Yang^1,2, & Sarah Bloch-Elkouby^2,3

1 Teachers College, Columbia University in the City of New York
2 Icahn School of Medicine at Mount Sinai in New York City
3 Ferkauf Graduate School of Psychology, Yeshiva University

Background:

The Suicide Crisis Syndrome (SCS) is a recently described suicidal mental state shown to predict suicidal behaviors (Schuck et al., 2019; Bloch-Elkouby et al., 2024). It supports suicide risk assessment that does not rely on suicidal ideation. The syndrome consists of five criteria: A. Entrapment, B1. Affective Disturbance, B2. Loss of Cognitive Control, B3. Disturbance in Arousal, and B4. Social Withdrawal.

The Suicide Crisis Syndrome Checklist (SCS-C) is a yes/no clinician-rated scale based on the SCS. In the SCS-C, criterion A includes a single symptom, criteria B1-B3 each include four symptoms, and criterion B4 includes two symptoms, for 15 symptoms in total. Because both the concept and the scale are new, their reliability, factorial structure, predictive validity, convergent validity, and discriminant validity have yet to be established.

Objective:

The development of virtual patient (VP) technology offers a unique opportunity for testing the SCS-C. Powered by Google Dialogflow (Sabharwal & Agrawal, 2020), the intent-recognition-based VP system delivers scripted responses, which makes the VP highly consistent and standardized across interviews.

The current study is part of a larger study that recruited 200 future clinicians (doctoral students) to conduct a clinical interview with the virtual patient, Noah. We selected 70 clinicians who interviewed the same version of Noah and compared their SCS ratings of Noah.

Hypotheses:

Interrater reliability of the SCS would be high (AC1 > .80) and superior to that of the Columbia-Suicide Severity Rating Scale (C-SSRS).
The accuracy of SCS ratings would be high and superior to that of the C-SSRS.

Method:

Clinicians rated both the SCS-C and the C-SSRS screener on a five-point Likert scale (0 = Not at all to 4 = Extremely) after interacting with the VP, according to their own assessment. The scores were then dichotomized: scores below 2 (0 and 1) were recoded as 0 (negative/absent) and all others as 1 (positive/present). The SCS-C included 15 individual symptoms, 5 criteria, and 1 overall diagnosis. Questions 1-5 of the C-SSRS screener, which assess the severity of suicidal ideation, were included. For both measures, the validity of clinicians' assessments was operationalized as the consistency between clinicians' and experts' ratings (correct/total), while interrater reliability was assessed using Gwet's AC1 (Gwet, 2008).
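The scoring steps described above can be illustrated with a minimal sketch. It assumes the standard multi-rater form of Gwet's AC1 from Gwet (2008), where each rated unit (for example, one item rated about one patient) receives a list of ratings. The function names `dichotomize`, `accuracy`, and `gwet_ac1` and the toy scores are hypothetical and chosen for illustration only; this is not the analysis code used in the study, and it is not claimed to reproduce the values reported on this poster.

```python
from typing import Sequence


def dichotomize(score: int, cutoff: int = 2) -> int:
    """Recode a 0-4 Likert score: below cutoff -> 0 (absent), otherwise 1 (present)."""
    return 1 if score >= cutoff else 0


def accuracy(ratings: Sequence[int], expert_rating: int) -> float:
    """Proportion of clinician ratings that match the expert rating (correct/total)."""
    return sum(1 for r in ratings if r == expert_rating) / len(ratings)


def gwet_ac1(units: Sequence[Sequence[int]], categories: Sequence[int] = (0, 1)) -> float:
    """Gwet's AC1 for multiple raters (assumed standard form).

    units[i] holds the category labels that the raters assigned to unit i.
    """
    q = len(categories)
    agreement_terms = []                  # per-unit observed agreement
    pi = {k: 0.0 for k in categories}     # mean share of ratings per category
    n_units = 0
    for unit in units:
        r = len(unit)
        if r == 0:
            continue
        n_units += 1
        counts = {k: sum(1 for x in unit if x == k) for k in categories}
        for k in categories:
            pi[k] += counts[k] / r
        if r >= 2:  # pairwise agreement needs at least two raters
            agreement_terms.append(
                sum(c * (c - 1) for c in counts.values()) / (r * (r - 1))
            )
    p_a = sum(agreement_terms) / len(agreement_terms)
    for k in categories:
        pi[k] /= n_units
    # Chance agreement under Gwet's model
    p_e = sum(p * (1 - p) for p in pi.values()) / (q - 1)
    return (p_a - p_e) / (1 - p_e)


# Toy example (made-up scores): ten clinicians rate one symptom on the 0-4 scale.
likert_scores = [4, 3, 3, 2, 2, 2, 3, 4, 1, 0]
binary = [dichotomize(s) for s in likert_scores]
print(binary)
print(accuracy(binary, expert_rating=1))
print(round(gwet_ac1([binary]), 3))
```

With two categories, the chance-agreement term reduces to 2π(1 − π), where π is the proportion of positive ratings. It shrinks as ratings concentrate in one category, which is why AC1 remains interpretable when nearly all raters agree.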

Results:

Tables 1 and 2 present Noah's correct SCS and C-SSRS screener ratings, respectively. The numbers in parentheses show the number of clinicians who gave the correct rating, divided by the total sample size. Table 3 documents the frequency distribution of clinician responses and their AC1 scores.

Clinician ratings demonstrated strong agreement with the VP design and high interrater reliability across SCS criteria (entrapment: 62/70, AC1 = .74; affective disturbance: 70/70, AC1 = 1.00; cognitive disturbance: 67/70, AC1 = .91; arousal disturbance: 64/70, AC1 = .84; social withdrawal: 66/70, AC1 = .91). The SCS diagnosis was moderately consistent and reliable (56/70; AC1 = .55), outperforming the C-SSRS risk assessment (35/70; AC1 = .15). Overall, the SCS-C items consistently ranked above the C-SSRS items in both AC1 scores and consistency ratings.

However, some individual SCS symptoms demonstrated lower consistency and reliability (lowest: 10/70, AC1 = .10), warranting further investigation.

Conclusion:

Some SCS symptoms were rated less accurately, perhaps because the clinicians in the study were never trained on the SCS or told to assess it. Even so, the strong criterion-level validity and interrater reliability of the SCS-C, together with its superiority to the C-SSRS, provide evidence for its clinical utility.

The VP also proved to be an efficient tool for testing clinician-rated measures, offering greater scalability and cost-effectiveness than standardized patients.

References:

Bloch-Elkouby, S., Rogers, M. L., Goncearenco, I., Yanez, N., Nemeroff, C., Chennapragada, L., … & Galynker, I. (2024). The narrative crisis model of suicide: A review of empirical evidence for an innovative dynamic model of suicide and a comparison with other theoretical frameworks. Personalized Medicine in Psychiatry, 45, 100131. https://doi.org/10.1016/j.pmip.2024.100131

Gwet, K. L. (2008). Computing inter-rater reliability and its variance in the presence of high agreement. British Journal of Mathematical and Statistical Psychology, 61, 29-48. https://doi.org/10.1348/000711006X126600

Schuck, A., Calati, R., Barzilay, S., Bloch‐Elkouby, S., & Galynker, I. (2019). Suicide Crisis Syndrome: A review of supporting evidence for a new suicide‐specific diagnosis. Behavioral Sciences & the Law, 37(3), 223-239. https://doi.org/10.1002/bsl.2397