Decision Tree Stability for Suicide Experience Prediction

ISE 625 Project Presentation

Adhithya Bhaskar
Michelle Gelman

2025-10-07

Problem outline

  • 4.2 million youth experience homelessness each year in the U.S.
  • ~27% to ~35% of runaway and homeless youth (RHY) report past-year suicidal ideation [1], [2], compared with 15.8% in the general population [3]
  • >50% of RHY have experienced suicidal ideation during their lifetime [4], [5], [6]

Key question

  • Most studies focus on individual characteristics when assessing suicide risk for youth experiencing homelessness (YEH) [7]
  • There is limited insight into which individual and social factors, and which combinations of the two, influence suicide risk [8]
  • Risk profile: a signal of high vulnerability or resiliency with respect to suicidal ideation and attempts

Can a stable tree be used to understand the clinical risk profiles of YEH?

Data Collection & Considerations

  • Survey:
    • Individual and social network attributes
    • Collected between October 2011 and February 2013
    • At 2 drop-in centers serving YEH in Hollywood and Santa Monica, CA
  • Current limitations:
    • Overrepresentation of males and heterosexual youth
    • Limited generalizability to other geographic regions

What does trust mean to a clinician?

  • Accurate intervention targets \(\rightarrow\) Help stakeholders build intervention strategies based on clinical risk profiles for YEH

  • Less algorithmic aversion \(\rightarrow\) Clinicians’ decision making is not undermined by models that are inconsistent in identifying suicide risk

Why do we need stable trees?

  • Population characteristics change over time
  • The model’s predictions should not change drastically across retrainings as new patient data arrive (illustrated below)
  • The initial dataset is small, but more data becomes available over time as patients’ information is recorded \(\rightarrow\) retraining is needed [9]
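A minimal sketch of the problem on synthetic data (not the YEH dataset): two trees fit on overlapping subsamples of the same population can already disagree on their root split.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; the YEH survey data is not used here
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
rng = np.random.default_rng(0)
for run in range(2):
    idx = rng.choice(len(X), size=300, replace=False)  # overlapping data snapshots
    clf = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X[idx], y[idx])
    print(f"run {run}: root split on feature {clf.tree_.feature[0]} "
          f"at threshold {clf.tree_.threshold[0]:.3f}")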

Data Import and Preparation

import numpy as np
import pandas as pd
from typing import List
from imblearn.over_sampling import SVMSMOTE
from sklearn.model_selection import train_test_split

def prepare_data(df: pd.DataFrame, features: List[str], label: str, rng, imbalance=None):
    df = df.replace('NaN', pd.NA)  # replace the string 'NaN' with actual NaN values
    df_full_cleaned = df[features + [label]].dropna().copy()
    X = df_full_cleaned[features]
    y = df_full_cleaned[label]

    # Optionally rebalance the classes before splitting (see the Data Imbalance slide)
    if imbalance == "SMOTE":
        sm = SVMSMOTE(random_state=42)
        X, y = sm.fit_resample(X, y)

    # Stratified 75/25 split, seeded from the experiment's rng for reproducibility
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=rng.integers(0, 2**32 - 1), stratify=y
    )
    X_full = df_full_cleaned[features]
    y_full = df_full_cleaned[label]
    return X_full, y_full, X_train, X_test, y_train, y_test
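A hypothetical invocation (the names df and features, and the seed, are illustrative rather than taken from the original notebook) that produces counts like those below:

# Assumes df and features are defined earlier in the notebook
rng = np.random.default_rng(42)  # illustrative seed
X_full, y_full, X_train, X_test, y_train, y_test = prepare_data(
    df, features, label="suicidea", rng=rng
)
print("Number of samples in the full dataset:", len(X_full))
print("Number of samples in the training set:", len(X_train))
print("Number of samples in the test set:", len(X_test))
print("Shape of training set:", X_train.shape)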
Number of samples in the full dataset: 586
Number of samples in the training set: 439
Number of samples in the test set: 147
Shape of training set: (439, 56)

Data Imbalance

Class distribution of the suicidea label before and after applying SMOTE
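A minimal sketch of the comparison shown in this figure, assuming X and y hold the cleaned features and suicidea labels from inside prepare_data:

from collections import Counter
from imblearn.over_sampling import SVMSMOTE

print("Before:", Counter(y))  # imbalanced suicidea label counts
X_res, y_res = SVMSMOTE(random_state=42).fit_resample(X, y)
print("After: ", Counter(y_res))  # minority class oversampled to parity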

A peek at the data

age gender sexori raceall trauma_sum cesd_score harddrug_life school degree job ... prop_family_othersupport prop_friends_othersupport prop_friends_home_othersupport prop_friends_street_othersupport prop_alter_all_othersupport sum_alter_staff prop_object_badbehave prop_enc_goodbehave prop_alter_school_job sum_alter_borrow
105 21.0 0.0 1.0 0.0 0.0 9.0 4.0 2.0 2.0 2.0 ... 0.00 0.00 0.00 0.00 0.00 0.0 0.46 0.67 0.33 0.0
830 22.0 0.0 0.0 0.0 3.0 13.0 15.0 2.0 2.0 2.0 ... 0.00 0.00 0.00 0.00 0.40 0.0 0.70 1.00 0.50 6.0
361 23.0 0.0 1.0 1.0 4.0 10.0 10.0 1.0 2.0 2.0 ... 0.00 0.00 0.00 0.11 0.16 3.0 0.68 1.00 0.37 1.0
278 20.0 0.0 0.0 1.0 4.0 18.0 3.0 2.0 2.0 2.0 ... 0.00 0.00 0.00 0.00 0.00 0.0 0.62 1.00 0.83 6.0
149 22.0 0.0 0.0 0.0 3.0 16.0 9.0 2.0 2.0 2.0 ... 0.00 0.11 0.00 0.11 0.11 0.0 0.25 1.00 0.67 3.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
739 23.0 0.0 0.0 1.0 4.0 9.0 3.0 1.0 1.0 2.0 ... 0.00 0.00 0.00 0.15 0.15 2.0 0.14 0.38 0.54 2.0
575 23.0 0.0 0.0 1.0 6.0 9.0 9.0 2.0 1.0 2.0 ... 0.00 0.06 0.00 0.06 0.06 1.0 0.64 1.00 0.59 3.0
770 23.0 0.0 0.0 1.0 0.0 6.0 3.0 2.0 2.0 2.0 ... 0.29 0.00 0.29 0.43 0.71 0.0 0.71 0.00 0.29 3.0
690 20.0 0.0 0.0 0.0 3.0 1.0 3.0 2.0 1.0 2.0 ... 0.00 0.00 0.07 0.00 0.07 0.0 0.35 0.00 0.33 1.0
446 20.0 0.0 0.0 0.0 1.0 17.0 3.0 2.0 2.0 1.0 ... 0.00 0.00 0.00 0.00 0.00 0.0 0.53 0.00 0.78 1.0

439 rows × 56 columns

Tree Path

A path is the sequence of splits from the root to a leaf, together with the predicted class label \(k^p\)

\[ \mathcal{P}(\mathbb{T}) = \{p_1, \dots, p_T\} \]

  • \(\mathbf{u}^p_j\), \(\mathbf{l}^p_j\): define the numeric interval for feature \(j\)
  • \(\mathbf{c}^p_j\): binary vector indicating which categories of feature \(j\) satisfy the splits
  • \(k^p\): class label predicted for this region
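To make the path representation concrete, a sketch (ours, not from [9]) that recovers each leaf's region and label from a fitted scikit-learn tree, assuming all-numeric features:

import numpy as np

def extract_paths(clf, n_features):
    t = clf.tree_
    paths = []

    def walk(node, lo, hi):
        if t.children_left[node] == -1:  # leaf: record the region and label k^p
            paths.append((lo.copy(), hi.copy(), int(np.argmax(t.value[node]))))
            return
        j, thr = t.feature[node], t.threshold[node]
        l_hi = hi.copy(); l_hi[j] = min(l_hi[j], thr)  # left branch: x_j <= thr
        walk(t.children_left[node], lo, l_hi)
        r_lo = lo.copy(); r_lo[j] = max(r_lo[j], thr)  # right branch: x_j > thr
        walk(t.children_right[node], r_lo, hi)

    walk(0, np.full(n_features, -np.inf), np.full(n_features, np.inf))
    return paths  # list of (l^p, u^p, k^p) triples, one per path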

Tree Distance

\[ \begin{align*} d(\mathbb{T}_1, \mathbb{T}_2) &= \min_{x} \Bigg[ \sum_{p \in \mathcal{P}(\mathbb{T}_1)} \sum_{q \in \mathcal{P}(\mathbb{T}_2)} d(p,q) \, x_{pq} + \sum_{p \in \mathcal{P}(\mathbb{T}_1)} w(p) \, x_{p} \Bigg] \\ \text{s.t.} \quad & \sum_{q \in \mathcal{P}(\mathbb{T}_2)} x_{pq} + x_{p} = 1, \quad \forall p \in \mathcal{P}(\mathbb{T}_1) \\ & \sum_{p \in \mathcal{P}(\mathbb{T}_1)} x_{pq} = 1, \quad \forall q \in \mathcal{P}(\mathbb{T}_2) \\ & x_{pq} \in \{0,1\}, \quad x_{p} \in \{0,1\}, \quad \forall p \in \mathcal{P}(\mathbb{T}_1), \ \forall q \in \mathcal{P}(\mathbb{T}_2) \end{align*} \]
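This matching can be solved as an assignment problem. A sketch using scipy, assuming the pairwise path distances D[i, j] = d(p_i, q_j) and weights w(p_i) are precomputed; note it relaxes the second constraint (each path of \(\mathbb{T}_2\) matched exactly once) to at most once:

import numpy as np
from scipy.optimize import linear_sum_assignment

def tree_distance(D, w):
    m, n = D.shape
    # Dummy columns let each path p_i stay unmatched at cost w(p_i);
    # the large off-diagonal penalty forbids any other dummy pairing
    dummy = np.full((m, m), 1e12)
    np.fill_diagonal(dummy, w)
    cost = np.hstack([D, dummy])
    rows, cols = linear_sum_assignment(cost)  # optimal matching
    return cost[rows, cols].sum()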

Stable decision trees

Implementing stable trees as proposed by Bertsimas and Digalakis Jr. (2023) [9]

  1. Initial Training (T0): Train an initial set of decision trees on a random subset of the training data
  2. Full Data Training (T): Train a second set of trees on the full training data
  3. Distance Computation: Calculate the average distance between each tree in T and the trees in T0 \[ d\bigl(\mathbb{T}_{1}, \mathbb{T}_{2}\bigr) \;=\; \min_{x}\ \sum_{p\in\mathcal{P}(\mathbb{T}_{1})}\sum_{q\in\mathcal{P}(\mathbb{T}_{2})} d(p,q)\, x_{pq} \;+\; \sum_{p\in\mathcal{P}(\mathbb{T}_{1})} w(p)\, x_{p} \]
  4. Performance Metrics: Compute AUC ROC on the test set
  5. Pareto Optimization: Select Pareto-optimal trees that balance predictive performance and stability \[ \mathbb{T}^{\star} = \operatorname{argmax} f\left(d_{b}, \alpha_{b}\right) \]



Random Train Split

import numpy as np

def random_train_split(X, y):
    # Accept either DataFrames or plain numpy arrays
    X_values = X.values if hasattr(X, 'values') else X
    y_values = y.values if hasattr(y, 'values') else y

    # Keep a random half of the training data for the initial tree set T0
    N = X_values.shape[0]
    indices = np.random.permutation(N)
    X0, y0 = X_values[indices[:N // 2]], y_values[indices[:N // 2]]
    return X0, y0

X0, y0 = random_train_split(X_train, y_train)
X0.shape, y0.shape
((219, 56), (219,))

Bootstrapping Decision Trees

import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import resample

def train_decision_tree(X, y, depth, min_samples_leaf):
    clf = DecisionTreeClassifier(max_depth=depth, min_samples_leaf=min_samples_leaf)
    clf.fit(X, y)
    return clf

def bootstrap_trees(X, y, depths, min_samples, B):
    # Create B bootstrap trees by sampling with replacement from (X, y),
    # drawing each tree's depth and minimum leaf size at random from the grids
    trees = []
    for _ in range(B):
        X_sample, y_sample = resample(X, y, replace=True)
        depth = np.random.choice(depths)
        min_leaf = np.random.choice(min_samples)
        tree = train_decision_tree(X_sample, y_sample, depth, min_leaf)
        trees.append(tree)
    return trees
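Hypothetical calls building both tree sets; the hyperparameter grids and B are illustrative, chosen to match the counts printed below:

depths = [3, 4, 5, 7, 10]  # illustrative grid
min_samples = [5, 10, 20]  # illustrative grid
B = 25

T0 = bootstrap_trees(X0, y0, depths, min_samples, B)           # trained on the subset
T = bootstrap_trees(X_train, y_train, depths, min_samples, B)  # trained on full training data
print("Number of trees in T0:", len(T0))
print("Number of trees in T:", len(T))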
Number of trees in T0: 25
Number of trees in T: 25

Computing Tree Distances

from alive_progress import alive_bar
# DistanceCalculator is our implementation of the path-matching tree distance
# defined on the Tree Distance slide

def compute_average_distances(T0, T, X_train, y_train):
    X_train_values = X_train.values if hasattr(X_train, "values") else X_train
    distances: list[float] = []

    with alive_bar(len(T), title="Computing average tree distances", dual_line=True, spinner="waves", bar="smooth") as bar:
        for tree_b in T:
            # Average distance d_b of tree_b to every tree in the initial set T0
            d_b = 0.0
            for tree_beta in T0:
                calc = DistanceCalculator(tree_beta, tree_b, X=X_train_values, y=y_train)
                d_b += calc.compute_tree_distance()

            mean_dist = d_b / len(T0)
            distances.append(mean_dist)
            bar()

    return distances
distances = compute_average_distances(T0, T, X_train, y_train)
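The auc_scores used below are the out-of-sample AUC ROC values of step 4, one per tree in T; a minimal sketch, assuming a binary label:

from sklearn.metrics import roc_auc_score

auc_scores = [
    roc_auc_score(y_test, tree.predict_proba(X_test)[:, 1])  # P(positive class)
    for tree in T
]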

Finding Pareto Optimal Trees

A tree \(b\) is dominated if there exists a tree \(b'\) with \((d_{b'} \leq d_b \text{ and } \alpha_{b'} > \alpha_b)\) or \((d_{b'} < d_b \text{ and } \alpha_{b'} \geq \alpha_b)\)

def pareto_optimal_trees(distances, auc_scores):
    # Keep tree i unless some tree j dominates it per the condition above
    pareto_trees = []
    for i, (d_i, a_i) in enumerate(zip(distances, auc_scores)):
        dominated = False
        for j, (d_j, a_j) in enumerate(zip(distances, auc_scores)):
            if i != j and ((d_j <= d_i and a_j > a_i) or (d_j < d_i and a_j >= a_i)):
                dominated = True
                break
        if not dominated:
            pareto_trees.append(i)
    return pareto_trees

pareto_trees = pareto_optimal_trees(distances, auc_scores)
print("Number of Pareto optimal trees:", len(pareto_trees))
Number of Pareto optimal trees: 4

Tree Selection Strategy

  • \(\mathbb{T}^\star = \underset{\mathbb{T}_b \in \mathcal{T}^\star}{\text{argmax}} \ f(d_b, \alpha_b)\)
  • Tradeoff: Stability and predictive power
def select_final_tree(distances, auc_scores, pareto_indices, epsilon=0.01):
    # Restrict to Pareto trees within epsilon of the best AUC, then pick the
    # one that best trades off AUC against distance
    best_auc = max(auc_scores)
    candidates = [i for i in pareto_indices if auc_scores[i] >= (1 - epsilon) * best_auc]
    if not candidates:
        candidates = pareto_indices
    best_idx = max(candidates, key=lambda i: auc_scores[i] - distances[i])
    return best_idx

selected_tree_index = select_final_tree(distances, auc_scores, pareto_trees)
print("Selected tree index:", selected_tree_index)
Selected tree index: 23

Experiment runner

==================================================
Running for dataset DataSet_Combined_SI_SNI_Baseline_FE with seed 42
==================================================
ds_name: DataSet_Combined_SI_SNI_Baseline_FE.csv
Experiment: experiment_20250501_134848_seed_42_DataSet_Combined_SI_SNI_Baseline_FE_suicidea - Seed: 42 - Dataset: DataSet_Combined_SI_SNI_Baseline_FE
Number of samples in the full dataset: 586
Number of samples in the training set: 726
Number of samples in the test set: 242
Shape of training set: (726, 56)
Shape of random split: (363, 56), (363,)
Number of trees in T0: 20
Number of trees in T: 20
Computing average tree distances |████████████████████████████████████████| 20/20 [100%] in 20.7s (0.96/s) 
Number of distances computed: 20
Average AUC score: 0.821854723038044
Number of Pareto optimal trees: 7
Frequencies of top 2 common features: [[('trauma_sum', 70.0), ('fight', 20.0)], [('harddrug_life', 45.0), ('exchange', 15.0)], [('LEAF_NODE', 25.0), ('harddrug_life', 20.0)]]
Selected stability-accuracy trade-off final tree index: 1
Stability-accuracy tree depth: 4, nodes: 23
Selected AUC maximizing tree index: 1
AUC-maximizing tree depth: 4, nodes: 23
Selected distance minimizing tree index: 15
Distance-minimizing tree depth: 11, nodes: 79
Completed experiment: 
==================================================
Running for dataset DataSet_Combined_SI_SNI_Baseline_FE with seed 42
==================================================
ds_name: DataSet_Combined_SI_SNI_Baseline_FE.csv
Experiment: experiment_20250501_134911_seed_42_DataSet_Combined_SI_SNI_Baseline_FE_suicattempt - Seed: 42 - Dataset: DataSet_Combined_SI_SNI_Baseline_FE
Number of samples in the full dataset: 587
Number of samples in the training set: 627
Number of samples in the test set: 209
Shape of training set: (627, 56)
Shape of random split: (313, 56), (313,)
Number of trees in T0: 20
Number of trees in T: 20
Computing average tree distances |████████████████████████████████████████| 20/20 [100%] in 9.4s (2.13/s) 
Number of distances computed: 20
Average AUC score: 0.8648953261927945
Number of Pareto optimal trees: 5
Frequencies of top 2 common features: [[('fighthurt', 45.0), ('fight', 35.0)], [('trauma_sum', 40.0), ('degree', 15.0)], [('gettherapy', 15.0), ('exchange', 15.0)]]
Selected stability-accuracy trade-off final tree index: 18
Stability-accuracy tree depth: 7, nodes: 59
Selected AUC maximizing tree index: 8
AUC-maximizing tree depth: 4, nodes: 19
Selected distance minimizing tree index: 11
Distance-minimizing tree depth: 10, nodes: 61
Completed experiment: experiment_20250501_134911_seed_42_DataSet_Combined_SI_SNI_Baseline_FE_suicattempt
==================================================
Running for dataset breast_cancer with seed 42
==================================================
ds_name: breast_cancer.csv
Experiment: experiment_20250501_134921_seed_42_breast_cancer_target - Seed: 42 - Dataset: breast_cancer
Number of samples in the full dataset: 569
Number of samples in the training set: 535
Number of samples in the test set: 179
Shape of training set: (535, 30)
Shape of random split: (267, 30), (267,)
Number of trees in T0: 20
Number of trees in T: 20
Computing average tree distances |████████████████████████████████████████| 20/20 [100%] in 1.9s (10.51/s) 
Number of distances computed: 20
Average AUC score: 0.9502996254681648
Number of Pareto optimal trees: 5
Frequencies of top 2 common features: [[('worst perimeter', 65.0), ('mean concave points', 20.0)], [('worst smoothness', 15.0), ('worst radius', 15.0)], [('area error', 30.0), ('worst texture', 10.0)]]
Selected stability-accuracy trade-off final tree index: 14
Stability-accuracy tree depth: 5, nodes: 19
Selected AUC maximizing tree index: 12
AUC-maximizing tree depth: 3, nodes: 9
Selected distance minimizing tree index: 13
Distance-minimizing tree depth: 6, nodes: 23
Completed experiment: experiment_20250501_134921_seed_42_breast_cancer_target

Visualizing Selected Tree

Aggregated metrics



Unbalanced vs. SMOTE-processed aggregate metrics

Results - Aggregated STD

Results - Tree depth

Results - Number of nodes

Results - Distinct top features

Results - Current Considerations

  • Does the magnitude of the tree distance correlate with the other aggregate metrics?
  • What is the “right” depth? (bias-variance trade-off)
    • Optimizing for distance favors deeper trees
  • Is T0 large enough to statistically represent T when bootstrapping?
    • Checks for distribution shift in sampling (see the sketch below)
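One possible shift check for the last point, a per-feature two-sample Kolmogorov-Smirnov test between the subset X0 and the full training data (a sketch; assumes numpy arrays, pass X_train.values for a DataFrame):

from scipy.stats import ks_2samp

def shift_check(X0, X_train, feature_names, alpha=0.05):
    # Flag features whose subset distribution drifts from the full training data
    flagged = []
    for j, name in enumerate(feature_names):
        stat, p = ks_2samp(X0[:, j], X_train[:, j])
        if p < alpha:
            flagged.append((name, stat, p))
    return flagged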

Thinking About Stability


Pareto Frontier Visualization

Average distance \(d_b\) vs. out-of-sample AUC ROC \(\alpha_b\), \(\forall b \in \mathcal{T}\)

Future work - Short term

  • Refine what stability means with our stakeholders
    • What do we measure, and how do we quantify success?
  • Hyperparameter fine-tuning
    • Path-distance class weight
    • Pareto tree selection strategy
    • Data pre-processing
  • How to weigh the stability-performance trade-off when selecting the Pareto-optimal tree?

Future work - Long term

  • Does the implementation meet the criteria for reliably identifying at-risk YEH?
  • Feature selection and qualitative pre-processing
  • Robustness: direct versus indirect perturbation sensitivity analysis
    • Direct: Changes in tree structure (threshold perturbations; see the sketch below)
    • Indirect: Modifications in the training data
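A sketch of the direct variant: jitter every internal split threshold by a small relative amount and measure how many test predictions flip. This assumes scikit-learn's tree_.threshold exposes a writable view of the fitted tree's node array (true in current versions):

import copy
import numpy as np

def threshold_sensitivity(clf, X_test, scale=0.05, n_trials=20, seed=0):
    rng = np.random.default_rng(seed)
    base = clf.predict(X_test)
    flip_rates = []
    for _ in range(n_trials):
        pert = copy.deepcopy(clf)
        t = pert.tree_
        internal = t.children_left != -1  # leaves carry no threshold
        noise = rng.normal(1.0, scale, internal.sum())
        t.threshold[internal] = t.threshold[internal] * noise
        flip_rates.append(np.mean(pert.predict(X_test) != base))
    return float(np.mean(flip_rates))  # average fraction of flipped predictions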

Thoughts and points of discussion


Does the sequence of paths matter? [10]

The study of [39] proposes a new distance metric to quantify the structural differences and prediction similarities between decision trees. … However, the metric does not consider the sequence of splits, thereby potentially overlooking the overall structural similarity of the trees. Furthermore, the approach suffers from high computational complexity due to the need for pairwise comparison of all paths between two trees, which can become computationally expensive as the number of leaf nodes increases.

References

[1]
M. Kirst, T. Frederick, and P. G. Erickson, “Concurrent mental health and substance use problems among street-involved youth,” International Journal of Mental Health and Addiction, vol. 9, no. 5, pp. 543–553, 2011, doi: 10.1007/s11469-011-9328-3.
[2]
L. Rew, M. Taylor-Seehafer, and M. L. Fitzgerald, “Sexual abuse, alcohol and other drug use, and suicidal behaviors in homeless adolescents,” Issues Compr Pediatr Nurs, vol. 24, no. 4, pp. 225–240, 2001, doi: 10.1080/014608601753260326.
[3]
D. K. Eaton et al., “Youth risk behavior surveillance - United States, 2011,” MMWR Surveill Summ, vol. 61, no. 4, pp. 1–162, 2012. Available: https://pubmed.ncbi.nlm.nih.gov/22673000/ (accessed May 01, 2025).
[4]
C. Merscham, J. M. Van Leeuwen, and M. McGuire, “Mental health and substance abuse indicators among homeless youth in Denver, Colorado,” Child Welfare, vol. 88, no. 2, pp. 93–110, 2009, Available: https://www.ncbi.nlm.nih.gov/pubmed/19777794
[5]
B. E. Molnar, S. B. Shade, A. H. Kral, R. E. Booth, and J. K. Watters, “Suicidal behavior and sexual/physical abuse among street youth,” Child Abuse Negl, vol. 22, no. 3, pp. 213–222, Mar. 1998, doi: 10.1016/s0145-2134(97)00137-3.
[6]
E. Votta and I. Manion, “Suicide, high-risk behaviors, and coping style in homeless adolescent males’ adjustment,” Journal of Adolescent Health, vol. 34, no. 3, pp. 237–243, 2004, doi: 10.1016/j.jadohealth.2003.06.002.
[7]
“Preventing suicide through connectedness.” https://stacks.cdc.gov/view/cdc/11795 (accessed May 01, 2025).
[8]
A. Fulginiti, E. Rice, H.-T. Hsu, H. Rhoades, and H. Winetrobe, “Risky integration: A social network analysis of network position, exposure, and suicidal ideation among homeless youth,” Crisis: The Journal of Crisis Intervention and Suicide Prevention, vol. 37, no. 3, pp. 184–193, 2016, doi: 10.1027/0227-5910/a000374.
[9]
D. Bertsimas and V. Digalakis Jr., “Improving Stability in Decision Tree Models.” arXiv, May 2023. doi: 10.48550/arXiv.2305.17299.
[10]
J. Lee, M. K. Sim, and J.-S. Hong, “Assessing decision tree stability: A comprehensive method for generating a stable decision tree,” IEEE Access, vol. 12, pp. 90061–90072, 2024, doi: 10.1109/ACCESS.2024.3419228.