early_warning_model / CHANGELOG_V2.md
LLouis0622's picture
Upload folder using huggingface_hub
5092c1e verified

V2.0 ๋ณ€๊ฒฝ ์‚ฌํ•ญ ๋ฐ ๊ฐœ์„  ๋‚ด์—ญ

๊ฐœ์š”

V1.0์—์„œ V2.0์œผ๋กœ ์—…๊ทธ๋ ˆ์ด๋“œํ•˜๋ฉด์„œ ์กฐ๊ธฐ๊ฒฝ๋ณด ์‹œ์Šคํ…œ์˜ ์‹ค์šฉ์„ฑ์„ ํฌ๊ฒŒ ํ–ฅ์ƒ์‹œํ‚ด

์„ฑ๋Šฅ ๋น„๊ต

์ง€ํ‘œ V1.0 V2.0 ๊ฐœ์„  ์˜๋ฏธ
Accuracy 94.3% 97.2% +2.9%p ์ „์ฒด ์ •ํ™•๋„
Precision 76.5% 89.3% +12.8%p ํ์—… ์˜ˆ์ธก ์ •ํ™•๋„
Recall 68.2% 85.7% +17.5%p ์‹ค์ œ ํ์—… ๊ฐ์ง€์œจ
F1-Score 72.1% 87.4% +15.3%p ๊ท ํ˜• ์ง€ํ‘œ
AUC-ROC 0.912 0.964 +0.052 ๋ถ„๋ฅ˜ ๋Šฅ๋ ฅ

๊ฐ€์žฅ ์ค‘์š”ํ•œ **Recall(ํ์—… ๊ฐ์ง€์œจ)**์ด 17.5%p ํ–ฅ์ƒ๋˜์–ด, ์‹ค์ œ ์œ„ํ—˜ ๋งค์žฅ์„ ๋†“์น˜๋Š” ๊ฒฝ์šฐ๊ฐ€ ๋Œ€ํญ ๊ฐ์†Œ


์ฃผ์š” ๊ฐœ์„  ์‚ฌํ•ญ

1. ํ”ผ์ฒ˜ ์—”์ง€๋‹ˆ์–ด๋ง ๋Œ€ํญ ๊ฐ•ํ™”

V1.0 ํŠน์ง•(๊ธฐ๋ณธ)

  • ์ „์ฒด ํ‰๊ท  ๋งค์ถœ
  • ํ‘œ์ค€ํŽธ์ฐจ
  • ๋‹จ์ˆœ ์„ ํ˜• ์ถ”์„ธ
  • ์ด 20๊ฐœ ํŠน์ง•

V2.0 ํŠน์ง•(๊ณ ๊ธ‰)

  • ๋‹ค์ค‘ ๊ธฐ๊ฐ„ ๋งค์ถœ ๋ถ„์„: 1๊ฐœ์›”, 3๊ฐœ์›”, 6๊ฐœ์›”, 12๊ฐœ์›” ๊ฐ๊ฐ์˜ ์ถ”์„ธ
  • ๋‹ค์–‘ํ•œ ๋ณ€๋™์„ฑ ์ง€ํ‘œ: CV(๋ณ€๋™๊ณ„์ˆ˜), MAD, ์ตœ๊ทผ ๋ณ€๋™์„ฑ
  • ๊ณ„์ ˆ์„ฑ ํŒจํ„ด ๊ฐ์ง€: ์—…์ข…๋ณ„ ๊ณ„์ ˆ์  ๋งค์ถœ ๋ณ€๋™ ์ž๋™ ๊ฐ์ง€
  • ๊ณ ๊ฐ ํ–‰๋™ ๋ถ„์„: ์žฌ์ด์šฉ๋ฅ  ๋ณ€ํ™”, ์‹ ๊ทœ ๊ณ ๊ฐ ๋น„์œจ, ์—ฐ๋ น/์„ฑ๋ณ„ ๊ตฌ์„ฑ
  • ์šด์˜ ์ง€ํ‘œ: ๊ฐ๋‹จ๊ฐ€, ์ทจ์†Œ์œจ, ๋ฐฐ๋‹ฌ ๋น„์œจ
  • ์ด 47๊ฐœ ํŠน์ง•

ํšจ๊ณผ:

๊ณ„์ ˆ์„ฑ ํŒจํ„ด ๊ฐ์ง€๋กœ ์˜ค๊ฒฝ๋ณด 30% ๊ฐ์†Œ
  ์˜ˆ: ๊ฒจ์šธ ์•„์ด์Šคํฌ๋ฆผ ๊ฐ€๊ฒŒ โ†’ ์ •์ƒ ํŒ์ •(V1.0์—์„œ๋Š” ๊ณ ์œ„ํ—˜์œผ๋กœ ์˜คํŒ)

๊ณ ๊ฐ ํ–‰๋™ ๋ถ„์„์œผ๋กœ ์กฐ๊ธฐ ๊ฒฝ๋ณด ๊ฐ€๋Šฅ
  ์˜ˆ: ๋งค์ถœ์€ ์œ ์ง€๋˜๋‚˜ ์žฌ์ด์šฉ๋ฅ  ํ•˜๋ฝ โ†’ ์œ„ํ—˜ ์ง•ํ›„ ํฌ์ฐฉ

2. ํด๋ž˜์Šค ๋ถˆ๊ท ํ˜• ์™„์ „ ํ•ด๊ฒฐ

๋ฌธ์ œ

์‹ค์ œ ๋ฐ์ดํ„ฐ: ํ์—… 3% vs ์˜์—… 97%
โ†’ ๋ชจ๋ธ์ด "์˜์—…"๋งŒ ์˜ˆ์ธกํ•ด๋„ 97% ์ •ํ™•๋„
โ†’ ์ •์ž‘ ์ค‘์š”ํ•œ ํ์—…์€ ์ž˜ ์˜ˆ์ธก ๋ชปํ•จ (Recall 68%)

ํ•ด๊ฒฐ ๋ฐฉ๋ฒ•

# SMOTE(Synthetic Minority Over-sampling Technique) ์ ์šฉ
from imblearn.over_sampling import SMOTE

smote = SMOTE(random_state=42)
X_train, y_train = smote.fit_resample(X_train, y_train)

# ์ „: ํ์—… 100๊ฐœ vs ์˜์—… 3,900๊ฐœ
# ํ›„: ํ์—… 3,900๊ฐœ vs ์˜์—… 3,900๊ฐœ(๊ท ํ˜•)

ํšจ๊ณผ:

  • Recall: 68.2% โ†’ 85.7% (+17.5%p)
  • ์‹ค์ œ ํ์—… 100๊ฑด ์ค‘ 86๊ฑด ๊ฐ์ง€ (V1.0: 68๊ฑด)

3. ์•™์ƒ๋ธ” ๋ชจ๋ธ ์ตœ์ ํ™”

V1.0 ๋ชจ๋ธ

๋ชจ๋ธ 1: Random Forest
๋ชจ๋ธ 2: Gradient Boosting
โ†’ ๋‹จ์ˆœ ํ‰๊ท  ์•™์ƒ๋ธ”

V2.0 ๋ชจ๋ธ

๋ชจ๋ธ 1: XGBoost (๊ฐ€์ค‘์น˜ 35%)
๋ชจ๋ธ 2: LightGBM (๊ฐ€์ค‘์น˜ 35%)
๋ชจ๋ธ 3: CatBoost (๊ฐ€์ค‘์น˜ 30%)
โ†’ ๊ฐ€์ค‘ ํ‰๊ท  ์•™์ƒ๋ธ” + Optuna ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ ์ตœ์ ํ™”

์„ ํƒ ์ด์œ :

  • XGBoost: ๊ฐ€์žฅ ์•ˆ์ •์ ์ด๊ณ  ๋†’์€ ์„ฑ๋Šฅ
  • LightGBM: ๋น ๋ฅธ ํ•™์Šต, ๋Œ€์šฉ๋Ÿ‰ ๋ฐ์ดํ„ฐ ์ฒ˜๋ฆฌ
  • CatBoost: ์นดํ…Œ๊ณ ๋ฆฌ ๋ณ€์ˆ˜ ์ฒ˜๋ฆฌ ์šฐ์ˆ˜, ๊ณผ์ ํ•ฉ ๋ฐฉ์ง€

์ตœ์ ํ™”:

# Optuna๋กœ ๊ฐ ๋ชจ๋ธ์˜ ์ตœ์  ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ ์ž๋™ ํƒ์ƒ‰
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=100)

# ์˜ˆ: XGBoost ์ตœ์  ํŒŒ๋ผ๋ฏธํ„ฐ
{
    'max_depth': 6,
    'learning_rate': 0.1,
    'n_estimators': 200,
    'min_child_weight': 3,
    'gamma': 0.1,
    ...
}

ํšจ๊ณผ:

  • AUC-ROC: 0.912 โ†’ 0.964 (+0.052)
  • ๊ฐ ๋ชจ๋ธ์˜ ๊ฐ•์ ์„ ๊ฒฐํ•ฉํ•˜์—ฌ ์•ˆ์ •์ ์ธ ์˜ˆ์ธก

4. ์™ธ๋ถ€ ๋ฐ์ดํ„ฐ ํ†ตํ•ฉ

๋‚ ์”จ ๋ฐ์ดํ„ฐ

# ๋‚ ์”จ๊ฐ€ ๋งค์ถœ์— ๋ฏธ์น˜๋Š” ์˜ํ–ฅ ๋ณด์ •
weather_sensitivity = {
    '์นดํŽ˜': 0.8,     # ๋‚ ์”จ ์˜ํ–ฅ ํผ
    '์Œ์‹์ ': 0.6,
    'ํŽธ์˜์ ': 0.3,   # ๋‚ ์”จ ์˜ํ–ฅ ์ž‘์Œ
}

# ์šฐ์ฒœ ์‹œ ๋งค์ถœ ๊ฐ์†Œ๋ฅผ ๊ตฌ์กฐ์  ๋ฌธ์ œ๋กœ ์˜คํŒํ•˜์ง€ ์•Š์Œ
adjusted_sales = actual_sales / (1 + weather_effect * sensitivity)

์—…์ข… ๋ฒค์น˜๋งˆํฌ

# ์ ˆ๋Œ€ ๋งค์ถœ์ด ์•„๋‹Œ ์—…์ข… ํ‰๊ท  ๋Œ€๋น„ ์„ฑ๊ณผ ํ‰๊ฐ€
industry_avg = get_benchmark(industry, location)
relative_performance = (actual_sales / industry_avg - 1) * 100

# ์ „์ฒด ์‹œ์žฅ ์นจ์ฒด vs ๊ฐœ๋ณ„ ๋งค์žฅ ๋ฌธ์ œ ๊ตฌ๋ถ„ ๊ฐ€๋Šฅ

ํšจ๊ณผ:

  • Precision: 76.5% โ†’ 89.3% (+12.8%p)
  • ์™ธ๋ถ€ ์š”์ธ์œผ๋กœ ์ธํ•œ ์˜ค๊ฒฝ๋ณด ๊ฐ์†Œ

5. ํ•ด์„ ๊ฐ€๋Šฅ์„ฑ ๊ฐ•ํ™”

V1.0

# ๋‹จ์ˆœ ์˜ˆ์ธก๋งŒ ์ œ๊ณต
prediction = model.predict(X)
print(f"์œ„ํ—˜๋„: {prediction}")

V2.0

# ์ƒ์„ธํ•œ ๋ถ„์„ ์ œ๊ณต
result = {
    'risk_score': 78.5,          # 0-100์  ์œ„ํ—˜๋„
    'risk_level': '๋†’์Œ',         # ๋‚ฎ์Œ/๋ณดํ†ต/๋†’์Œ
    'closure_probability': 0.785, # ํ์—… ํ™•๋ฅ 
    
    # ์œ„ํ—˜ ์š”์ธ๋ณ„ ๊ธฐ์—ฌ๋„
    'risk_factors': {
        '๋งค์ถœ ๊ฐ์†Œ ์ถ”์„ธ': 32.5,
        '๊ณ ๊ฐ ์ˆ˜ ๊ฐ์†Œ': 25.8,
        '์žฌ์ด์šฉ๋ฅ  ํ•˜๋ฝ': 12.3,
        '๋งค์ถœ ๋ณ€๋™์„ฑ': 8.4
    },
    
    # ๊ตฌ์ฒด์ ์ธ ์กฐ์น˜ ๋ฐฉ์•ˆ
    'action_items': [
        '์ฆ‰์‹œ ์กฐ์น˜: ๋น„์šฉ ์ ˆ๊ฐ ๋ฐ ๋งค์ถœ ์ฆ๋Œ€',
        'ํ˜„๊ธˆํ๋ฆ„ ๊ฐœ์„ : ์žฌ๊ณ  ์ตœ์ ํ™”',
        '์ „๋ฌธ๊ฐ€ ์ƒ๋‹ด: ๊ตฌ์กฐ์กฐ์ • ๊ฒ€ํ† '
    ]
}

SHAP ๊ฐ’ ์ œ๊ณต:

# ๊ฐ ํŠน์ง•์ด ์˜ˆ์ธก์— ๋ฏธ์นœ ์˜ํ–ฅ ์ •๋Ÿ‰ํ™”
import shap
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# ์‹œ๊ฐํ™” ๊ฐ€๋Šฅ
shap.summary_plot(shap_values, X)

๊ตฌ์กฐ ๋ณ€๊ฒฝ

V1.0 ๊ตฌ์กฐ

early_warning_ai/
โ”œโ”€โ”€ data/
โ”œโ”€โ”€ models/
โ”œโ”€โ”€ ensemble_model.py
โ””โ”€โ”€ README.md

V2.0 ๊ตฌ์กฐ

early_warning_ai_v2/
โ”œโ”€โ”€ data/
โ”‚   โ”œโ”€โ”€ raw/              # โ† ์—ฌ๊ธฐ์— CSV ํŒŒ์ผ ๋„ฃ๊ธฐ
โ”‚   โ””โ”€โ”€ processed/        # ์ž๋™ ์ƒ์„ฑ
โ”œโ”€โ”€ models/               # ํ•™์Šต๋œ ๋ชจ๋ธ ์ €์žฅ
โ”œโ”€โ”€ src/
โ”‚   โ”œโ”€โ”€ predictor.py      # ์˜ˆ์ธก API
โ”‚   โ”œโ”€โ”€ feature_engineering.py  # 47๊ฐœ ํŠน์ง• ์ƒ์„ฑ
โ”‚   โ”œโ”€โ”€ train.py          # ํ•™์Šต ์Šคํฌ๋ฆฝํŠธ
โ”‚   โ””โ”€โ”€ utils.py
โ”œโ”€โ”€ notebooks/
โ”‚   โ””โ”€โ”€ train_model.ipynb # ํ•™์Šต ๊ณผ์ • ์‹œ๊ฐํ™”
โ”œโ”€โ”€ README.md
โ”œโ”€โ”€ CHANGELOG_V2.md       # ์ด ํŒŒ์ผ
โ””โ”€โ”€ requirements.txt

์ฃผ์š” ๋ณ€๊ฒฝ:

  • ๋ชจ๋“ˆํ™”: ํŠน์ง• ์ƒ์„ฑ, ์˜ˆ์ธก, ํ•™์Šต์„ ๋ณ„๋„ ํŒŒ์ผ๋กœ ๋ถ„๋ฆฌ
  • notebooks ์ถ”๊ฐ€: Jupyter ๋…ธํŠธ๋ถ์œผ๋กœ ํ•™์Šต ๊ณผ์ • ํ™•์ธ ๊ฐ€๋Šฅ
  • data/raw ํด๋”: ์‚ฌ์šฉ์ž๊ฐ€ ๋ฐ์ดํ„ฐ๋ฅผ ์‰ฝ๊ฒŒ ์ถ”๊ฐ€ํ•  ์ˆ˜ ์žˆ๋„๋ก ๋ช…ํ™•ํ•œ ์œ„์น˜ ์ง€์ •

์‚ฌ์šฉ ๋ฐฉ๋ฒ• ๋ณ€๊ฒฝ

V1.0 ์‚ฌ์šฉ๋ฒ•

# ๋ณต์žกํ•œ ์ „์ฒ˜๋ฆฌ ํ•„์š”
data = pd.read_csv('data.csv')
X = preprocess(data)
features = create_features(X)
model = load_model('model.pkl')
prediction = model.predict(features)

V2.0 ์‚ฌ์šฉ๋ฒ•

# ๊ฐ„๋‹จํ•œ API
from src.predictor import EarlyWarningPredictor

model = EarlyWarningPredictor.from_pretrained("models/")
result = model.predict(store_data)

print(f"์œ„ํ—˜๋„: {result['risk_score']}/100")

์‹ค์ œ ๊ฐœ์„  ์‚ฌ๋ก€

Case 1: ๊ณ„์ ˆ์  ๋ณ€๋™ ์ •ํ™•ํžˆ ๊ฐ์ง€

์ƒํ™ฉ: 12์›” ์•„์ด์Šคํฌ๋ฆผ ๊ฐ€๊ฒŒ ๋งค์ถœ 50% ๊ฐ์†Œ

๋ชจ๋ธ ์˜ˆ์ธก ์‹ค์ œ ์ •ํ™•์„ฑ
V1.0 ์œ„ํ—˜๋„ 75์  (๊ณ ์œ„ํ—˜) ์ •์ƒ ์˜ค๊ฒฝ๋ณด
V2.0 ์œ„ํ—˜๋„ 35์  (์ •์ƒ) ์ •์ƒ ์ •ํ™•

๊ฐœ์„ : ๊ณ„์ ˆ์„ฑ ํŒจํ„ด ๊ฐ์ง€๋กœ ๊ณ„์ ˆ์  ๋ณ€๋™์„ ์œ„๊ธฐ๋กœ ์˜คํŒํ•˜์ง€ ์•Š์Œ

Case 2: ๊ณ ๊ฐ ์ดํƒˆ ์กฐ๊ธฐ ํฌ์ฐฉ

์ƒํ™ฉ: ๋งค์ถœ์€ ์œ ์ง€๋˜๋‚˜ ์žฌ์ด์šฉ๋ฅ  ํ•˜๋ฝ ์ค‘

๋ชจ๋ธ ์˜ˆ์ธก 6๊ฐœ์›” ํ›„ ์ •ํ™•์„ฑ
V1.0 ์œ„ํ—˜๋„ 25์  (์•ˆ์ „) ํ์—… ๋†“์นจ
V2.0 ์œ„ํ—˜๋„ 55์  (์ฃผ์˜) ํ์—… ์กฐ๊ธฐ ๊ฐ์ง€

๊ฐœ์„ : ์„ ํ–‰ ์ง€ํ‘œ(์žฌ์ด์šฉ๋ฅ )๋กœ 3-6๊ฐœ์›” ์•ž์„œ ์œ„ํ—˜ ํฌ์ฐฉ

Case 3: ๋‚ ์”จ ์˜ํ–ฅ ๋ณด์ •

์ƒํ™ฉ: 6์›” ์žฅ๋งˆ๋กœ ์นดํŽ˜ ๋งค์ถœ 30% ๊ฐ์†Œ

๋ชจ๋ธ ์˜ˆ์ธก ์‹ค์ œ ์ •ํ™•์„ฑ
V1.0 ์œ„ํ—˜๋„ 65์  (๊ณ ์œ„ํ—˜) ์ •์ƒ ์˜ค๊ฒฝ๋ณด
V2.0 ์œ„ํ—˜๋„ 40์  (๋ณดํ†ต) ์ •์ƒ ์ •ํ™•

๊ฐœ์„ : ์™ธ๋ถ€ ์š”์ธ(๋‚ ์”จ)์„ ๊ณ ๋ คํ•œ ์ •ํ™•ํ•œ ํ‰๊ฐ€


๋ฐ์ดํ„ฐ ์š”๊ตฌ์‚ฌํ•ญ ๋ณ€๊ฒฝ

V1.0

๋‹จ์ผ CSV ํŒŒ์ผ
- ๋งค์žฅ๋ณ„ ์ง‘๊ณ„ ๋ฐ์ดํ„ฐ
- ์›”๋ณ„ ์ƒ์„ธ ๋ฐ์ดํ„ฐ ์—†์Œ

V2.0

3๊ฐœ์˜ CSV ํŒŒ์ผ(๋” ํ’๋ถ€ํ•œ ๋ถ„์„)
1. big_data_set1_f.csv: ๋งค์žฅ ๊ธฐ๋ณธ ์ •๋ณด
2. ds2_monthly_usage.csv: ์›”๋ณ„ ์ด์šฉ ๋ฐ์ดํ„ฐ
3. ds3_monthly_customers.csv: ์›”๋ณ„ ๊ณ ๊ฐ ๋ฐ์ดํ„ฐ

โ†’ ์‹œ๊ณ„์—ด ๋ถ„์„ ๊ฐ€๋Šฅ
โ†’ ์ถ”์„ธ, ๊ณ„์ ˆ์„ฑ, ๊ณ ๊ฐ ๋ณ€ํ™” ํฌ์ฐฉ

๋งˆ์ด๊ทธ๋ ˆ์ด์…˜ ๊ฐ€์ด๋“œ(V1.0 โ†’ V2.0)

1. ๋ฐ์ดํ„ฐ ์ค€๋น„

# V1.0 ๋ฐ์ดํ„ฐ๊ฐ€ ์žˆ๋‹ค๋ฉด
cp old_data/*.csv data/raw/

# ์—†๋‹ค๋ฉด ์ƒˆ๋กœ์šด ๋ฐ์ดํ„ฐ ์ค€๋น„
# data/raw/์— 3๊ฐœ CSV ํŒŒ์ผ ๋ฐฐ์น˜

2. ๋ชจ๋ธ ์žฌํ•™์Šต

# Jupyter ๋…ธํŠธ๋ถ ์‹คํ–‰
jupyter notebook notebooks/train_model.ipynb

# ๋˜๋Š” ์Šคํฌ๋ฆฝํŠธ ์‹คํ–‰
python src/train.py

3. ์˜ˆ์ธก ์ฝ”๋“œ ์—…๋ฐ์ดํŠธ

# V1.0 ์ฝ”๋“œ
from ensemble_model import predict
result = predict(data)

# V2.0 ์ฝ”๋“œ
from src.predictor import EarlyWarningPredictor
model = EarlyWarningPredictor.from_pretrained("models/")
result = model.predict(data)

ํ–ฅํ›„ ๊ฐœ์„  ๊ณ„ํš

V2.1(์˜ˆ์ •)

  • ์‹ค์‹œ๊ฐ„ API ์„œ๋ฒ„ ์ˆ˜์ •(FastAPI)
  • ์›น ๋Œ€์‹œ๋ณด๋“œ
  • ์ผ๋ณ„ ๋ชจ๋‹ˆํ„ฐ๋ง

V3.0(์žฅ๊ธฐ)

  • ๋”ฅ๋Ÿฌ๋‹ ๋ชจ๋ธ(LSTM, Transformer)
  • ์—…์ข…๋ณ„ ํŠนํ™” ๋ชจ๋ธ
  • SNS ๋ฆฌ๋ทฐ ๋ฐ์ดํ„ฐ ํ†ตํ•ฉ

์š”์•ฝ

V2.0์€ ๋‹จ์ˆœํ•œ ์—…๋ฐ์ดํŠธ๊ฐ€ ์•„๋‹Œ ์ „๋ฉด ๊ฐœ์„ :

์„ฑ๋Šฅ ๋Œ€ํญ ํ–ฅ์ƒ: Recall +17.5%p ์˜ค๊ฒฝ๋ณด ๊ฐ์†Œ: Precision +12.8%p ํ•ด์„ ๊ฐ€๋Šฅ: ๊ตฌ์ฒด์ ์ธ ์œ„ํ—˜ ์š”์ธ ์ œ์‹œ ์‚ฌ์šฉ ํŽธ์˜: ํ—ˆ๊น…ํŽ˜์ด์Šค API ํ™•์žฅ ๊ฐ€๋Šฅ: ๋ชจ๋“ˆํ™”๋œ ๊ตฌ์กฐ