Portfolio

Problem Definition

Correlation and Descriptive studies.

Problem Definition: Investigating resuscitation code assignment in the intensive care unit (ICU) is pertinent because revenue is lost when resources are assigned to unnecessary resuscitation attempts, especially in cases where do-not-resuscitate (DNR) orders have already been procured. Most patients arrive in the ICU unplanned, needing care after a sudden change in health. Resuscitation code status is generally assigned as full code (FC) until a full prognosis is made and the patient's wishes are known, either from the patient or from the family. Establishing those wishes is not easy.

Even in cases with advance directives (AD), it is usually difficult to predict whether stabilization will occur; consequently, a less aggressive code assignment does not occur until after entry into the ICU. For patients who transitioned from FC status to DNR or to comfort measures only (CMO), the last recorded code status might be used. Research question (tested against the null hypothesis of no relationship): Is there any relationship between medical, familial, and social factors and the assignment of resuscitation code?
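The rule of using the last recorded code status can be sketched as follows. This is a minimal illustration, not the project's actual code: the event layout (admission ID, chart time, status), the IDs, and the timestamps are all assumptions made for the example.

```python
from datetime import datetime

# Hypothetical charted code-status events: (admission_id, chart_time, status).
# The IDs, timestamps, and field layout are illustrative, not taken from MIMIC.
events = [
    (101, "2150-01-01 08:00", "FC"),
    (101, "2150-01-03 14:30", "DNR"),
    (101, "2150-01-05 09:15", "CMO"),
    (102, "2150-02-10 11:00", "FC"),
    (103, "2150-03-02 07:45", "DNR"),
    (103, "2150-03-01 19:20", "FC"),
]


def last_code_status(rows):
    """Return {admission_id: status} using the most recent event per admission."""
    latest = {}
    for admission_id, chart_time, status in rows:
        when = datetime.strptime(chart_time, "%Y-%m-%d %H:%M")
        # Keep the event only if it is newer than the one already stored.
        if admission_id not in latest or when > latest[admission_id][0]:
            latest[admission_id] = (when, status)
    return {adm: status for adm, (_, status) in latest.items()}


final_status = last_code_status(events)
print(final_status)
```

Comparing parsed timestamps rather than relying on row order means events that arrive out of sequence (as for admission 103 above) are still resolved correctly.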

Inputs

Inputs: Synthea synthetic patient data; the MIMIC-IV ICU dataset (https://mimic.mit.edu/iv/); inclusion/exclusion criteria; deidentified longitudinal and cross-sectional EHR data: 53,423 critical care admissions, 26 tables, 324 variables, charted events, laboratory measurements (LOINC), and over 2 million rows of unstructured data (provider notes) coded with SNOMED CT, ICD-9, ICD-10, and LOINC codes. Both unstructured medical notes and structured deidentified EHR data were used.

Algorithm

Algorithm: R, SQL, FHIR, random forest, logistic regression, NLP, Apache cTAKES, tokenization, lemmatization, Word2vec, and NLP models based on data from cTAKES and treebanks in CoNLL-U format.

Outputs

Outputs: The aggregate composite dataset included an unstructured part of 19,642 documents with 5,682 unique terms, whose variables include BM25/TF-IDF per document, document score, and sentiment polarity, and a structured part of 19,642 rows, one per unique patient admission ID, with variables including length of stay, Subject_id, Gender, Age.yrs, SAPS score (14 aggregate MIMIC variables), mortality, and code status. Further outputs: document-level statistics, random forest predictions, a correlation matrix, and regression model predictions.
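A per-document TF-IDF weighting of the kind listed above can be sketched in plain Python. The example assumes a simple regex tokenizer, the textbook formulation tf = count / document length and idf = ln(N / document frequency), and a document score defined as the sum of term weights; the project's actual tokenizer, weighting variant, and score definition are not stated in the source, and the notes below are invented.

```python
import math
import re
from collections import Counter

# Toy notes standing in for provider notes; the text is invented for illustration.
docs = [
    "daughter visited patient today",
    "patient remains full code",
    "family meeting held daughter agrees with dnr",
]


def tokenize(text):
    """Lowercase and keep runs of letters (a simple stand-in for a clinical tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(corpus):
    """Return one {term: weight} dict per document, with
    tf = count / document length and idf = ln(N / document frequency)."""
    tokenized = [tokenize(d) for d in corpus]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


scores = tf_idf(docs)
# Assumed document score: the sum of the term weights in each document.
doc_scores = [sum(w.values()) for w in scores]
```

Terms that occur in only one note (such as "today") receive a higher weight than terms shared across notes (such as "patient"), which is the property that makes the weighting useful for picking out distinctive vocabulary.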

Project Overview

Schematic overview and project execution pipeline

Distribution of sentiment across corpus and documents.

85% of the corpus carried neutral sentiment, regardless of the semantic variability of the domain-specific concepts and socially related terms.

Distribution of words in corpus.

Patients are identified by name in the notes, and "name" appears to be the most frequent word in the corpus.

Relationship between patient code status and social Visits

Fathers visited less often as patients moved towards CMO. Girlfriends remained consistent in their visits, and the highest number of visits came from daughters.
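One way such a breakdown could be produced is by counting mentions of relationship terms in the notes, grouped by code status. The sketch below assumes that a mention in a note is an acceptable proxy for a visit; the note texts and the term list are hypothetical and are not drawn from the project's corpus.

```python
import re
from collections import Counter, defaultdict

# Invented (code_status, note_text) pairs; real notes would come from the corpus.
notes = [
    ("FC", "Father and daughter visited this morning."),
    ("FC", "Girlfriend at bedside. Father called for update."),
    ("DNR", "Daughter visited; girlfriend stayed overnight."),
    ("CMO", "Daughter and girlfriend present at bedside."),
]

# Hypothetical list of relationship terms to track.
VISITOR_TERMS = {"father", "daughter", "girlfriend"}


def visitor_mentions(rows):
    """Count mentions of each visitor term, grouped by code status."""
    counts = defaultdict(Counter)
    for status, text in rows:
        for token in re.findall(r"[a-z]+", text.lower()):
            if token in VISITOR_TERMS:
                counts[status][token] += 1
    return counts


mentions = visitor_mentions(notes)
for status in ("FC", "DNR", "CMO"):
    print(status, dict(mentions[status]))
```

Raw counts like these would normally be normalized by the number of notes or admissions in each code-status group before the groups are compared.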

Distribution of Age by code status

The differences between genders in the social interactions recorded in the notes are of particular interest.

Confusion matrix

94% of the DNR cases were identified (recall), and overall classification accuracy was 87%. A DNR prediction made from the DNR features was correct 94% of the time (precision), meaning that 6% of DNR predictions were false positives.
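The quantities quoted above follow from the four cells of a binary confusion matrix. The sketch below shows the standard definitions; the counts are placeholders chosen for illustration and are not the project's results.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Compute recall, precision, accuracy, and false positive rate
    from the four cells of a binary confusion matrix."""
    return {
        "recall": tp / (tp + fn),                      # share of true DNR cases found
        "precision": tp / (tp + fp),                   # share of DNR predictions that were right
        "accuracy": (tp + tn) / (tp + fp + fn + tn),   # share of all cases classified correctly
        "false_positive_rate": fp / (fp + tn),         # share of non-DNR cases flagged as DNR
    }


# Placeholder counts for illustration only; not the project's results.
metrics = confusion_metrics(tp=8, fp=1, fn=2, tn=9)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Note that the share of positive predictions that are wrong (1 minus precision) and the false positive rate (computed over actual negatives) are different quantities, so it helps to state which one is being reported.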

GITHUB

Impact of Resuscitation Code Assignment for Intensive Care Management System

Input:

Algorithm:

Output: