Early Warning System for Flood Risk Detection in Bandar Lampung

Flood risk classification based on meteorological data using the Random Forest algorithm

Dataset: 2010-2020 | 50 records | Lampung University

Accuracy: 93.3% (correct classification rate)
Recall: 100% (every flood event in the test set was detected)
Precision: 85.7% (share of flood predictions that were correct)
AUC-ROC: 94.4% (ability to separate flood from non-flood cases)

Feature Importance Analysis

Random Forest

Key Finding: Total Rainfall is the dominant factor, with an importance score above 0.5, followed by Humidity and Air Temperature, both of which contribute significantly to flood risk classification.

Confusion Matrix

                   Predicted No Flood    Predicted Flood
Actual No Flood    8 (True Negative)     1 (False Positive)
Actual Flood       0 (False Negative)    6 (True Positive)

Perfect Recall: The model detected all flood events in the test set (0 false negatives).
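The headline metrics follow directly from the four confusion-matrix counts. The short sketch below recomputes them from the reported values; the F1 score is not reported in the source and is derived here only for illustration.

```python
# Confusion-matrix counts as reported for the 15-sample test set.
tn, fp, fn, tp = 8, 1, 0, 6

total = tn + fp + fn + tp
accuracy = (tp + tn) / total    # share of all test samples classified correctly
recall = tp / (tp + fn)         # share of actual floods that were detected
precision = tp / (tp + fp)      # share of flood predictions that were real floods

# F1 is derived here for illustration; it does not appear in the source.
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f} recall={recall:.3f} "
      f"precision={precision:.3f} f1={f1:.3f}")
# accuracy=0.933 recall=1.000 precision=0.857 f1=0.923
```

The single false positive is what pulls precision down to 6/7, while the absence of false negatives keeps recall at 6/6.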

Correlation Matrix - Meteorological Factors

Weekly Rainfall → Flood: 1.00
Daily Rainfall → Flood: 0.76
Humidity → Flood: 0.74
Air Temperature → Flood: 0.63

Correlation Insight: Weekly rainfall has a perfect correlation (1.00) with flood events, making it the strongest predictor in the classification model.
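As a rough illustration of how a feature-to-flood correlation can be computed, the sketch below implements the Pearson coefficient in plain Python. It assumes the matrix reports Pearson correlation, which the source does not state, and the rainfall values and labels are made up for the example rather than taken from the project's dataset.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up illustrative values, NOT the project's dataset.
weekly_rainfall_mm = [12.0, 30.5, 80.2, 150.0, 95.4, 20.1, 210.3, 45.0]
flood = [0, 0, 1, 1, 1, 0, 1, 0]  # 1 = flood event, 0 = no flood

r = pearson(weekly_rainfall_mm, flood)
print(f"r = {r:.2f}")
```

With a binary flood label, this quantity is the point-biserial correlation between the feature and the event.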

Cross-Validation Results

Mean CV Score: 90%
OOB Score: 97%
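For readers unfamiliar with stratified k-fold validation, the following sketch shows one simple way to build five folds that each keep roughly the same flood/no-flood ratio. The function name `stratified_k_fold` and the label counts are hypothetical; the source does not give the class distribution or say which samples the cross-validation was run on.

```python
import random
from collections import defaultdict

def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs with class ratios roughly preserved."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    rng = random.Random(seed)
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets a similar share.
        for position, index in enumerate(indices):
            folds[position % k].append(index)

    for i in range(k):
        val_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, val_idx

# Hypothetical labels (14 flood, 21 no-flood); real class counts are not given.
labels = [1] * 14 + [0] * 21

for train_idx, val_idx in stratified_k_fold(labels):
    n_flood = sum(labels[i] for i in val_idx)
    print(len(train_idx), len(val_idx), n_flood)
```

A mean CV score is then the average of the per-fold scores obtained by training on each `train_idx` and evaluating on the matching `val_idx`.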

Model Architecture

Algorithm: Random Forest Classifier
Training Data: 70% split (35 samples)
Parameters: 50 estimators, balanced class weights
Validation: Stratified 5-Fold CV

Primary Factor

Weekly rainfall is the dominant predictor with a perfect correlation (1.00) to flood events, highlighting the importance of precipitation accumulation monitoring for early detection.

Model Reliability

With 100% recall on the test set, no flood events were missed, which suits early warning systems that prioritize public safety and disaster preparedness.

Implementation Ready

With 93.3% accuracy on the 15-sample test set and a 97% OOB score, the model is ready for implementation in real-time flood risk detection systems in Bandar Lampung.

Technical Specifications & Methodology

Data Processing Pipeline

  • Preprocessing: Normalization to the 0-1 range, missing-value handling, feature engineering
  • Feature Selection: 9 meteorological variables + 3 temporal features (year, month, day)
  • Data Split: 70% training (35 samples), 30% testing (15 samples)
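The pipeline steps above can be sketched in plain Python as follows. This is a minimal illustration, not the project's code: the helper names are hypothetical, the records are synthetic placeholders with three features, and fitting the scaling bounds on the training rows only is an assumption (the source does not say whether normalization happened before or after the split).

```python
import random

def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle row indices and split the rows into train and test lists."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_train = round(len(rows) * train_fraction)
    train = [rows[i] for i in indices[:n_train]]
    test = [rows[i] for i in indices[n_train:]]
    return train, test

def fit_min_max(rows):
    """Return per-column (min, max) pairs computed from the given rows."""
    return [(min(col), max(col)) for col in zip(*rows)]

def apply_min_max(rows, bounds):
    """Scale each column with previously fitted bounds (train maps to [0, 1])."""
    return [
        [(value - lo) / (hi - lo) if hi > lo else 0.0
         for value, (lo, hi) in zip(row, bounds)]
        for row in rows
    ]

# 50 synthetic records with 3 placeholder features (NOT the real dataset):
# rainfall in mm, relative humidity in %, air temperature in degrees Celsius.
rng = random.Random(0)
records = [
    [rng.uniform(0, 300), rng.uniform(60, 100), rng.uniform(22, 34)]
    for _ in range(50)
]

train_rows, test_rows = train_test_split(records)
bounds = fit_min_max(train_rows)            # assumption: fit on training rows only
train_scaled = apply_min_max(train_rows, bounds)
test_scaled = apply_min_max(test_rows, bounds)

print(len(train_scaled), len(test_scaled))  # 35 15
```

Because the bounds come from the training rows, scaled test values can fall slightly outside the 0-1 range; that is expected with this design and avoids leaking test-set statistics into preprocessing.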

Model Configuration

  • Algorithm: Random Forest Classifier (Ensemble Learning)
  • Hyperparameters: 50 estimators, balanced class weights, OOB scoring enabled
  • Validation: Stratified 5-Fold Cross Validation to estimate generalization performance
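The configuration listed above might look like the following in scikit-learn. This is a sketch under stated assumptions: the source does not name the library, the data here is a synthetic stand-in generated with `make_classification` (50 samples, 12 features), and the `random_state` values are arbitrary choices for reproducibility.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import (
    StratifiedKFold,
    cross_val_score,
    train_test_split,
)

# Synthetic stand-in for the 50-record, 12-feature dataset (NOT the real data).
X, y = make_classification(
    n_samples=50, n_features=12, n_informative=4, random_state=0
)

# 70% training (35 samples), 30% testing (15 samples).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = RandomForestClassifier(
    n_estimators=50,          # 50 trees
    class_weight="balanced",  # reweight classes inversely to their frequency
    oob_score=True,           # score each sample on trees that did not see it
    random_state=0,
)

# Stratified 5-fold cross-validation on the training portion (an assumption).
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_scores = cross_val_score(model, X_train, y_train, cv=cv)

model.fit(X_train, y_train)
print("mean CV accuracy:", cv_scores.mean())
print("OOB score:", model.oob_score_)
print("test accuracy:", model.score(X_test, y_test))
print("feature importances:", model.feature_importances_.round(3))
```

The scores printed for this synthetic data will not match the reported 90% CV and 97% OOB figures; the block only shows where each reported number would come from.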

Flood Risk Detection System - Data Mining Project

Lampung University | Informatics Engineering | 2024/2025

Diki Darmawan