Health Case Variable Glossary
=============================

Files produced:
  • targets.csv  → columns: id, start_time, end_time, target
  • features.csv → columns: time, id, tumor_marker, weight, bmi, overweight_flag,
    inflammation, lesion_size, vital_instability, age, smoking_packyears,
    family_history, exercise_level, cholesterol, spo2

Time semantics:
  • 'time' is a sequential visit index (treat as monthly visits).
  • Each patient is observed only from start_time to end_time (both given in targets.csv).
    Rows outside this window were generated as zeros; they are placeholders, not
    measurements. Drop or mask them before transforms such as log, since log(0) is -inf.
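
A minimal sketch of the masking step, assuming pandas and NumPy are available, that
targets.csv holds one row per id, and that the window is inclusive at both ends. The
helper name mask_outside_window is hypothetical, not part of the dataset tooling.

```python
import numpy as np
import pandas as pd


def mask_outside_window(features: pd.DataFrame, targets: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `features` with values set to NaN outside each patient's window.

    Assumes one row per id in `targets` and an inclusive [start_time, end_time] window.
    """
    windows = targets[["id", "start_time", "end_time"]]
    # A left merge keeps the row order of `features`; validate rejects duplicate ids.
    merged = features.merge(windows, on="id", how="left", validate="many_to_one")
    inside = merged["time"].between(merged["start_time"], merged["end_time"]).to_numpy()

    value_cols = [c for c in features.columns if c not in ("time", "id")]
    out = features.copy()
    out[value_cols] = out[value_cols].astype(float)  # float dtype so NaN can be stored
    out.loc[~inside, value_cols] = np.nan
    return out


# Tiny illustrative frames (made-up values, not taken from the real files).
features = pd.DataFrame(
    {"time": [0, 1, 2, 3], "id": [1, 1, 1, 1], "tumor_marker": [0.0, 2.0, 3.0, 0.0]}
)
targets = pd.DataFrame({"id": [1], "start_time": [1], "end_time": [2], "target": [1]})

masked = mask_outside_window(features, targets)
log_marker = np.log(masked["tumor_marker"])  # NaN outside the window, finite inside
```

Masked rows become NaN rather than disappearing, so they still need to be dropped or
imputed before fitting a model.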

Label semantics:
  • target = 1  → cancer-positive cohort. The final 6 visits show pre-diagnostic trends:
    tumor_marker↑, weight↓, BMI↓, inflammation↑ (more volatile), lesion_size↑,
    vital_instability↑, SpO₂↓.
  • target = 0  → no strong monotonic trends; mild fluctuations only.
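
One way to sanity-check these trends is to fit a straight line to each patient's last
visits and compare slopes between cohorts. The sketch below assumes that "final 6
visits" means the six visits ending at end_time; late_slope is a hypothetical helper,
and a least-squares slope is only one of several reasonable trend measures.

```python
import numpy as np
import pandas as pd


def late_slope(features: pd.DataFrame, targets: pd.DataFrame,
               column: str, n_last: int = 6) -> pd.Series:
    """Least-squares slope of `column` over each patient's last `n_last` observed visits."""
    windows = targets[["id", "start_time", "end_time"]]
    merged = features.merge(windows, on="id", validate="many_to_one")
    # First visit of the late window, never earlier than start_time.
    first = np.maximum(merged["start_time"], merged["end_time"] - n_last + 1)
    late = merged[merged["time"].between(first, merged["end_time"])]

    slopes = {}
    for patient_id, group in late.groupby("id"):
        if len(group) < 2:
            slopes[patient_id] = np.nan  # a slope needs at least two points
            continue
        x = group["time"].to_numpy(dtype=float)
        y = group[column].to_numpy(dtype=float)
        slopes[patient_id] = np.polyfit(x, y, 1)[0]  # coefficient of the linear term
    return pd.Series(slopes, name=f"{column}_late_slope")
```

Under the label semantics above, tumor_marker slopes would be expected to be clearly
positive for target = 1 and close to zero for target = 0.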

Preprocessing notes:
  • Mask rows outside [start_time, end_time] before taking logs or ratios, or use log1p
    with clamping (log1p(0) = 0, so zero-filled rows stay finite).
  • If standardizing, guard against divide-by-zero on flat/constant columns, whose
    standard deviation is 0.
  • Always assert that every input value is finite (no NaN, no ±inf) before calling
    model.fit, so that pipeline regressions are caught early.
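
The notes above can be sketched as three small NumPy helpers. All three names are
hypothetical, the clamp-at-zero lower bound and the eps threshold are assumptions, and
replacing a zero standard deviation with 1 is just one common way to handle constant
columns.

```python
import numpy as np


def safe_log1p(x):
    """log1p after clamping negatives to 0, so the result is finite for finite input."""
    return np.log1p(np.clip(np.asarray(x, dtype=float), 0.0, None))


def safe_standardize(X, eps=1e-12):
    """Column-wise z-score; constant columns are centered but not rescaled."""
    X = np.asarray(X, dtype=float)
    mean = np.nanmean(X, axis=0)
    std = np.nanstd(X, axis=0)
    std = np.where(std < eps, 1.0, std)  # avoid dividing by (near) zero
    return (X - mean) / std


def assert_finite(X, name="X"):
    """Raise ValueError if `X` contains NaN or +/-inf."""
    X = np.asarray(X, dtype=float)
    n_bad = int((~np.isfinite(X)).sum())
    if n_bad:
        raise ValueError(f"{name} contains {n_bad} non-finite value(s)")


# Second column is constant, so its standard deviation is 0.
X = np.array([[1.0, 5.0], [3.0, 5.0]])
Z = safe_standardize(X)
assert_finite(Z, name="Z")  # call this right before model.fit(Z, y)
```

Because assert_finite also rejects NaN, rows that were masked to NaN must be dropped or
imputed before the check.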