Skin Lesion Detection Using Convolutional Neural Networks – ISI…



Welcome to my project on skin lesion detection, developed for the ISIC 2024 Challenge. This is my first entry in the competition, whose objective is to build a model that accurately identifies skin lesions from medical images. The dataset provided by the challenge is used for both training and evaluation.
I applied several techniques to make the model robust. Below is an overview of the approach:
Data Preprocessing: Cleaning the metadata (forward-filling missing values) and resampling the heavily imbalanced classes.
Data Augmentation: Applying random cutout and horizontal flips to improve the model’s ability to generalize.
Model Building: Combining a pretrained EfficientNetV2-B2 image backbone (KerasCV) with a small dense branch for tabular features, using Keras and TensorFlow.
Training: Fine-tuning the model with binary cross-entropy loss (label smoothing 0.02), class weights, and a learning-rate schedule.
Evaluation: Measuring AUC on a held-out validation fold, then predicting on the test set.
In [1]:
import os
os.environ["KERAS_BACKEND"] = "tensorflow"  # must be set before keras is imported

import keras_cv
import keras
from keras import ops
import tensorflow as tf

import cv2
import pandas as pd
import numpy as np
from glob import glob
from tqdm.notebook import tqdm
import joblib

import matplotlib.pyplot as plt

print("TensorFlow:", tf.__version__)
print("Keras:", keras.__version__)
print("KerasCV:", keras_cv.__version__)

TensorFlow: 2.16.1
Keras: 3.3.3
KerasCV: 0.9.0
In [2]:
class CFG:
    verbose = 1  # Verbosity
    seed = 42  # Random seed
    neg_sample = 0.01  # Downsample negative class
    pos_sample = 5.0  # Upsample positive class
    preset = "efficientnetv2_b2_imagenet"  # Name of pretrained classifier
    image_size = [128, 128]  # Input image size
    epochs = 8  # Training epochs
    batch_size = 256  # Batch size
    lr_mode = "cos"  # LR scheduler mode from one of "cos", "step", "exp"
    class_names = ['target']
    num_classes = 1
In [4]:
BASE_PATH = "/kaggle/input/isic-2024-challenge"
In [5]:
# Train + Valid
df = pd.read_csv(f'{BASE_PATH}/train-metadata.csv', low_memory=False)
df = df.ffill()
display(df.head(2))

# Testing
testing_df = pd.read_csv(f'{BASE_PATH}/test-metadata.csv', low_memory=False)
testing_df = testing_df.ffill()
display(testing_df.head(2))

Training metadata, first two rows (columns listed vertically; "…" marks the columns elided in the display):

column                          row 0                 row 1
isic_id                         ISIC_0015670          ISIC_0015845
target                          0                     0
patient_id                      IP_1235828            IP_8170065
age_approx                      60.0                  60.0
sex                             male                  male
anatom_site_general             lower extremity       head/neck
clin_size_long_diam_mm          3.04                  1.10
image_type                      TBP tile: close-up    TBP tile: close-up
tbp_tile_type                   3D: white             3D: white
tbp_lv_A                        20.244422             31.712570
…                               …                     …
lesion_id                       NaN                   IL_6727506
iddx_full                       Benign                Benign
iddx_1                          Benign                Benign
iddx_2                          NaN                   NaN
iddx_3                          NaN                   NaN
iddx_4                          NaN                   NaN
iddx_5                          NaN                   NaN
mel_mitotic_index               NaN                   NaN
mel_thick_mm                    NaN                   NaN
tbp_lv_dnn_lesion_confidence    97.517282             3.141455
2 rows × 55 columns
Test metadata, first two rows (columns listed vertically; "…" marks the columns elided in the display):

column                          row 0                                     row 1
isic_id                         ISIC_0015657                              ISIC_0015729
patient_id                      IP_6074337                                IP_1664139
age_approx                      45.0                                      35.0
sex                             male                                      female
anatom_site_general             posterior torso                           lower extremity
clin_size_long_diam_mm          2.70                                      2.52
image_type                      TBP tile: close-up                        TBP tile: close-up
tbp_tile_type                   3D: XP                                    3D: XP
tbp_lv_A                        22.80433                                  16.64867
tbp_lv_Aext                     20.007270                                 9.657964
…                               …                                         …
tbp_lv_radial_color_std_max     0.304827                                  0.000000
tbp_lv_stdL                     1.281532                                  1.271940
tbp_lv_stdLExt                  2.299935                                  2.011223
tbp_lv_symm_2axis               0.479339                                  0.426230
tbp_lv_symm_2axis_angle         20                                        25
tbp_lv_x                        -155.06510                                -112.36924
tbp_lv_y                        1511.222000                               629.535889
tbp_lv_z                        113.980100                                -15.019287
attribution                     Memorial Sloan Kettering Cancer Center    Frazer Institute, The University of Queensland…
copyright_license               CC-BY                                     CC-BY
2 rows × 44 columns
In [6]:
print("Class Distribution Before Sampling (%):")
display(df.target.value_counts(normalize=True) * 100)

# Sampling: downsample the negative class (target == 0) and upsample the positive class (target == 1)
negative_df = df.query("target==0").sample(frac=CFG.neg_sample, random_state=CFG.seed)
positive_df = df.query("target==1").sample(frac=CFG.pos_sample, replace=True, random_state=CFG.seed)
df = pd.concat([negative_df, positive_df], axis=0).sample(frac=1.0)

print("\nClass Distribution After Sampling (%):")
display(df.target.value_counts(normalize=True) * 100)

Class Distribution Before Sampling (%):
target
0    99.902009
1     0.097991
Name: proportion, dtype: float64

Class Distribution After Sampling (%):
target
0    67.09645
1    32.90355
Name: proportion, dtype: float64
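As a rough check on the sampling step, the expected class shares after resampling can be estimated from the original proportions and the two sampling fractions. This is a minimal sketch: `expected_shares` is a hypothetical helper, not part of the notebook, and its result differs slightly from the reported 67.10% / 32.90% because `DataFrame.sample` rounds to whole rows.

```python
def expected_shares(neg_share, pos_share, neg_frac, pos_frac):
    """Return the expected (negative %, positive %) after resampling.

    neg_share / pos_share: class shares before sampling (any common unit).
    neg_frac / pos_frac:   sampling fractions applied to each class.
    """
    neg = neg_share * neg_frac
    pos = pos_share * pos_frac
    total = neg + pos
    return 100 * neg / total, 100 * pos / total


# Shares before sampling and fractions from CFG (neg_sample=0.01, pos_sample=5.0)
neg_pct, pos_pct = expected_shares(99.902009, 0.097991, neg_frac=0.01, pos_frac=5.0)
print(f"negative: {neg_pct:.2f}%  positive: {pos_pct:.2f}%")
```

The two fractions together turn a roughly 1000:1 imbalance into roughly 2:1, which is why class weights are still applied afterwards.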
In [7]:
from sklearn.utils.class_weight import compute_class_weight

# Compute balanced class weights from the 'target' column of the resampled DataFrame
class_weights = compute_class_weight('balanced', classes=np.unique(df['target']), y=df['target'])
class_weights = dict(enumerate(class_weights))

print("Class Weights:", class_weights)

Class Weights: {0: 0.7451959071624656, 1: 1.519592875318066}
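The 'balanced' mode of `compute_class_weight` uses the formula n_samples / (n_classes * class_count). The sketch below reproduces the weights above in plain Python; `balanced_class_weights` is a hypothetical helper, and the class counts are taken from the per-fold counts reported later in this notebook (training fold plus validation fold).

```python
def balanced_class_weights(counts):
    """counts: dict mapping class label -> number of samples in that class."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * n) for label, n in counts.items()}


# Class counts after resampling: training fold + validation fold
counts = {0: 3088 + 919, 1: 1618 + 347}
weights = balanced_class_weights(counts)
print(weights)
```

The rarer class receives the larger weight, so each class contributes about equally to the loss.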
In [8]:
import h5py

training_validation_hdf5 = h5py.File(f"{BASE_PATH}/train-image.hdf5", 'r')
testing_hdf5 = h5py.File(f"{BASE_PATH}/test-image.hdf5", 'r')
In [9]:
isic_id = df.isic_id.iloc[0]

# Image as Byte String
byte_string = training_validation_hdf5[isic_id][()]
print(f"Byte String: {byte_string[:20]}....")

# Convert byte string to numpy array
nparr = np.frombuffer(byte_string, np.uint8)

print("Image:")
image = cv2.imdecode(nparr, cv2.IMREAD_COLOR)[...,::-1]  # reverse last axis for bgr -> rgb
plt.imshow(image);

Byte String: b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x00\x00\x01\x00\x01\x00\x00'....
Image:
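The printed byte string starts with `\xff\xd8\xff`, which is how JPEG streams begin, so the HDF5 file stores encoded JPEG bytes rather than raw pixel arrays. A cheap sanity check along those lines might look like this; `looks_like_jpeg` is a hypothetical helper, not something the notebook defines.

```python
JPEG_PREFIX = b"\xff\xd8\xff"  # start-of-image marker (FF D8) followed by the next marker's FF


def looks_like_jpeg(data: bytes) -> bool:
    """Return True if the byte string begins like a JPEG stream."""
    return data[:3] == JPEG_PREFIX


# The first 20 bytes printed by the cell above
header = b"\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x00\x00\x01\x00\x01\x00\x00"
png_header = b"\x89PNG\r\n\x1a\n"

print(looks_like_jpeg(header))
print(looks_like_jpeg(png_header))
```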
In [10]:
from sklearn.model_selection import StratifiedGroupKFold

df = df.reset_index(drop=True)  # ensure continuous index
df["fold"] = -1
sgkf = StratifiedGroupKFold(n_splits=5, shuffle=True, random_state=CFG.seed)
for i, (training_idx, validation_idx) in enumerate(sgkf.split(df, y=df.target, groups=df.patient_id)):
    df.loc[validation_idx, "fold"] = int(i)

# Use fold 0 for validation and the remaining folds for training
training_df = df.query("fold!=0")
validation_df = df.query("fold==0")
print(f"# Num Train: {len(training_df)} | Num Valid: {len(validation_df)}")

# Num Train: 4706 | Num Valid: 1266
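Grouping the split by `patient_id` is meant to keep every patient's images on one side of the split, so the validation score is not inflated by images from patients seen during training. A minimal sketch of how that property could be checked follows; `groups_in_both` and the patient ids are hypothetical and for illustration only.

```python
def groups_in_both(train_groups, valid_groups):
    """Return the set of group ids that appear in both splits (empty means no leakage)."""
    return set(train_groups) & set(valid_groups)


# Hypothetical patient ids
train_patients = ["IP_A", "IP_A", "IP_B", "IP_C"]
clean_valid_patients = ["IP_D", "IP_D", "IP_E"]
leaky_valid_patients = ["IP_C", "IP_E"]

print(groups_in_both(train_patients, clean_valid_patients))
print(groups_in_both(train_patients, leaky_valid_patients))
```

In the notebook, calling `groups_in_both(training_df.patient_id, validation_df.patient_id)` would be expected to return an empty set.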
In [11]:
training_df.target.value_counts()

target
0    3088
1    1618
Name: count, dtype: int64
In [12]:
validation_df.target.value_counts()

target
0    919
1    347
Name: count, dtype: int64
In [13]:
# Categorical features which will be one hot encoded
CATEGORICAL_COLUMNS = ["sex", "anatom_site_general",
                       "tbp_tile_type", "tbp_lv_location"]

# Numerical features which will be normalized
NUMERIC_COLUMNS = ["age_approx", "tbp_lv_nevi_confidence", "clin_size_long_diam_mm",
                   "tbp_lv_areaMM2", "tbp_lv_area_perim_ratio", "tbp_lv_color_std_mean",
                   "tbp_lv_deltaLBnorm", "tbp_lv_minorAxisMM"]

# Tabular feature columns
FEAT_COLS = CATEGORICAL_COLUMNS + NUMERIC_COLUMNS
In [14]:
from tqdm.notebook import tqdm  # notebook progress bar (a plain `from tqdm import tqdm` before it would be overridden)
In [15]:
def build_augmenter():
    # Define augmentations
    aug_layers = [
        keras_cv.layers.RandomCutout(height_factor=(0.02, 0.06), width_factor=(0.02, 0.06)),
        keras_cv.layers.RandomFlip(mode="horizontal"),
    ]

    # Apply augmentations to random samples
    aug_layers = [keras_cv.layers.RandomApply(x, rate=0.5) for x in aug_layers]

    # Build augmentation layer
    augmenter = keras_cv.layers.Augmenter(aug_layers)

    # Apply augmentations
    def augment(inp, label):
        images = inp["images"]
        aug_data = {"images": images}
        aug_data = augmenter(aug_data)
        inp["images"] = aug_data["images"]
        return inp, label

    return augment


def build_decoder(with_labels=True, target_size=CFG.image_size):
    def decode_image(inp):
        # Read jpeg image
        file_bytes = inp["images"]
        image = tf.io.decode_jpeg(file_bytes)

        # Resize
        image = tf.image.resize(image, size=target_size, method="area")

        # Rescale image to [0, 1]
        image = tf.cast(image, tf.float32)
        image /= 255.0

        # Reshape
        image = tf.reshape(image, [*target_size, 3])

        inp["images"] = image
        return inp

    def decode_label(label, num_classes):
        label = tf.cast(label, tf.float32)
        label = tf.reshape(label, [num_classes])
        return label

    def decode_with_labels(inp, label=None):
        inp = decode_image(inp)
        label = decode_label(label, CFG.num_classes)
        return (inp, label)

    return decode_with_labels if with_labels else decode_image
In [16]:
def build_dataset(
    isic_ids,
    hdf5,
    features,
    labels=None,
    batch_size=32,
    decode_fn=None,
    augment_fn=None,
    augment=False,
    shuffle=1024,
    cache=True,
    drop_remainder=False,
):
    if decode_fn is None:
        decode_fn = build_decoder(labels is not None)

    if augment_fn is None:
        augment_fn = build_augmenter()

    AUTO = tf.data.AUTOTUNE

    # Load the encoded JPEG bytes of every image into memory
    images = [None]*len(isic_ids)
    for i, isic_id in enumerate(tqdm(isic_ids, desc="Loading Images ")):
        images[i] = hdf5[isic_id][()]

    inp = {"images": images, "features": features}
    slices = (inp, labels) if labels is not None else inp

    ds = tf.data.Dataset.from_tensor_slices(slices)
    ds = ds.cache() if cache else ds
    ds = ds.map(decode_fn, num_parallel_calls=AUTO)
    if shuffle:
        ds = ds.shuffle(shuffle, seed=CFG.seed)
        opt = tf.data.Options()
        opt.deterministic = False
        ds = ds.with_options(opt)
    ds = ds.batch(batch_size, drop_remainder=drop_remainder)
    ds = ds.map(augment_fn, num_parallel_calls=AUTO) if augment else ds
    ds = ds.prefetch(AUTO)
    return ds
In [17]:
## Train
print("# Training:")
training_features = dict(training_df[FEAT_COLS])
training_ids = training_df.isic_id.values
training_labels = training_df.target.values
training_ds = build_dataset(training_ids, training_validation_hdf5, training_features,
                            training_labels, batch_size=CFG.batch_size,
                            shuffle=True, augment=True)

# Valid
print("# Validation:")
validation_features = dict(validation_df[FEAT_COLS])
validation_ids = validation_df.isic_id.values
validation_labels = validation_df.target.values
validation_ds = build_dataset(validation_ids, training_validation_hdf5, validation_features,
                              validation_labels, batch_size=CFG.batch_size,
                              shuffle=False, augment=False)

# Training:
Loading Images : 100% 4706/4706 [00:22<00:00, 255.31it/s]
# Validation:
Loading Images : 100% 1266/1266 [00:05<00:00, 256.33it/s]
In [18]:
feature_space = keras.utils.FeatureSpace(
    features={
        # Categorical features encoded as strings
        "sex": "string_categorical",
        "anatom_site_general": "string_categorical",
        "tbp_tile_type": "string_categorical",
        "tbp_lv_location": "string_categorical",
        # Numerical features to discretize
        "age_approx": "float_discretized",
        # Numerical features to normalize
        "tbp_lv_nevi_confidence": "float_normalized",
        "clin_size_long_diam_mm": "float_normalized",
        "tbp_lv_areaMM2": "float_normalized",
        "tbp_lv_area_perim_ratio": "float_normalized",
        "tbp_lv_color_std_mean": "float_normalized",
        "tbp_lv_deltaLBnorm": "float_normalized",
        "tbp_lv_minorAxisMM": "float_normalized",
    },
    output_mode="concat",
)
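To make the encodings concrete, here is a plain-Python sketch of roughly what `string_categorical` (one-hot encoding) and `float_normalized` (zero mean, unit variance) do to a feature. It is a simplified illustration under stated assumptions, not the Keras implementation: the real `FeatureSpace` also reserves an out-of-vocabulary slot and learns its statistics in `adapt()`. The helper names `one_hot` and `z_score` are hypothetical; the sample values come from the metadata rows shown earlier.

```python
import math


def one_hot(value, vocabulary):
    """One-hot encode a string against a fixed vocabulary (no out-of-vocabulary slot)."""
    return [1.0 if value == v else 0.0 for v in vocabulary]


def z_score(values):
    """Scale a list of floats to zero mean and unit variance."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]


records = [
    {"sex": "male", "clin_size_long_diam_mm": 3.04},
    {"sex": "male", "clin_size_long_diam_mm": 1.10},
    {"sex": "female", "clin_size_long_diam_mm": 2.52},
]
sex_vocab = ["female", "male"]

sizes = z_score([r["clin_size_long_diam_mm"] for r in records])
# "concat" output mode: every encoded feature is joined into one flat vector per row
encoded = [one_hot(r["sex"], sex_vocab) + [s] for r, s in zip(records, sizes)]
for row in encoded:
    print(row)
```

Concatenating the encoded features this way is why the tabular input ends up as a single flat vector per sample.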
In [19]:
training_ds_with_no_labels = training_ds.map(lambda x, _: x["features"])
feature_space.adapt(training_ds_with_no_labels)
In [20]:
for x, _ in training_ds.take(1):
    preprocessed_x = feature_space(x["features"])
    print("preprocessed_x.shape:", preprocessed_x.shape)
    print("preprocessed_x.dtype:", preprocessed_x.dtype)

preprocessed_x.shape: (256, 71)
preprocessed_x.dtype: <dtype: 'float32'>
In [21]:
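training_ds = training_ds.map(
    lambda x, y: ({"images": x["images"], "features": feature_space(x["features"])}, y),
    num_parallel_calls=tf.data.AUTOTUNE,
)

validation_ds = validation_ds.map(
    lambda x, y: ({"images": x["images"], "features": feature_space(x["features"])}, y),
    num_parallel_calls=tf.data.AUTOTUNE,
)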
In [22]:
batch = next(iter(validation_ds))

print("Images:", batch[0]["images"].shape)
print("Features:", batch[0]["features"].shape)
print("Targets:", batch[1].shape)

Images: (256, 128, 128, 3)
Features: (256, 71)
Targets: (256, 1)
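With a batch size of 256 and `drop_remainder=False`, the number of batches per epoch follows directly from the split sizes (4706 training and 1266 validation samples). A small sketch of that arithmetic, using a hypothetical helper `steps_per_epoch`:

```python
import math


def steps_per_epoch(num_samples, batch_size, drop_remainder=False):
    """Number of batches one pass over the data produces."""
    if drop_remainder:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


print(steps_per_epoch(4706, 256))  # training split -> 19
print(steps_per_epoch(1266, 256))  # validation split -> 5
```

The 19 training steps match the `19/19` progress counter in the training log; the last training batch holds the remaining 98 samples.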
In [23]:
# AUC
auc = keras.metrics.AUC()

# Loss
loss = keras.losses.BinaryCrossentropy(label_smoothing=0.02)
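Label smoothing in binary cross-entropy pulls the hard 0/1 targets slightly towards 0.5, which penalizes over-confident predictions. The sketch below shows the idea in plain Python for a single sample; it is an illustration of the formula rather than the Keras implementation, and the function names are hypothetical.

```python
import math


def smooth_label(y, label_smoothing):
    """Squeeze a 0/1 label towards 0.5 by the smoothing amount."""
    return y * (1.0 - label_smoothing) + 0.5 * label_smoothing


def binary_crossentropy(y_true, y_pred, label_smoothing=0.0, eps=1e-7):
    """Binary cross-entropy for one sample, with optional label smoothing."""
    y = smooth_label(y_true, label_smoothing)
    p = min(max(y_pred, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


print(smooth_label(1.0, 0.02), smooth_label(0.0, 0.02))
print(binary_crossentropy(1.0, 0.999))                        # hard label
print(binary_crossentropy(1.0, 0.999, label_smoothing=0.02))  # smoothed label
```

With smoothing of 0.02, a positive target becomes 0.99 and a negative target 0.01, so a prediction of 0.999 for a positive sample costs more than it would against the hard label.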
In [24]:
# Define input layers
image_input = keras.Input(shape=(*CFG.image_size, 3), name="images")
feat_input = keras.Input(shape=(feature_space.get_encoded_features().shape[1],), name="features")
inp = {"images": image_input, "features": feat_input}

# Branch for image input
backbone = keras_cv.models.EfficientNetV2Backbone.from_preset(CFG.preset)
x1 = backbone(image_input)
x1 = keras.layers.GlobalAveragePooling2D()(x1)
x1 = keras.layers.Dropout(0.2)(x1)

# Branch for tabular/feature input
x2 = keras.layers.Dense(96, activation="selu")(feat_input)
x2 = keras.layers.Dense(128, activation="selu")(x2)
x2 = keras.layers.Dropout(0.1)(x2)

# Concatenate both branches
concat = keras.layers.Concatenate()([x1, x2])

# Output layer
out = keras.layers.Dense(1, activation="sigmoid", dtype="float32")(concat)

# Build model
model = keras.models.Model(inp, out)

# Compile the model
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-4),
    loss=loss,
    metrics=[auc],
)

# Model Summary
model.summary()

Model: "functional_1"

Layer (type)                                Output Shape           Param #      Connected to
images (InputLayer)                         (None, 128, 128, 3)    0            -
features (InputLayer)                       (None, 71)             0            -
efficient_net_v2b2… (EfficientNetV2Bac…)    (None, 4, 4, 1408)     8,769,374    images[0][0]
dense (Dense)                               (None, 96)             6,912        features[0][0]
global_average_poo… (GlobalAveragePool…)    (None, 1408)           0            efficient_net_v2…
dense_1 (Dense)                             (None, 128)            12,416       dense[0][0]
dropout (Dropout)                           (None, 1408)           0            global_average_p…
dropout_1 (Dropout)                         (None, 128)            0            dense_1[0][0]
concatenate_1 (Concatenate)                 (None, 1536)           0            dropout[0][0], dropout_1[0][0]
dense_2 (Dense)                             (None, 1)              1,537        concatenate_1[0]…

Total params: 8,790,239 (33.53 MB)
Trainable params: 8,707,951 (33.22 MB)
Non-trainable params: 82,288 (321.44 KB)
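The parameter counts in the summary can be verified by hand: a dense layer has one weight per input-output pair plus one bias per unit. The sketch below redoes that arithmetic with a hypothetical helper `dense_params`; the backbone figure is copied from the summary rather than derived.

```python
def dense_params(in_features, units):
    """Weights (in_features * units) plus one bias per unit."""
    return in_features * units + units


backbone = 8_769_374                      # EfficientNetV2-B2 backbone, taken from the summary
dense = dense_params(71, 96)              # tabular branch, first layer
dense_1 = dense_params(96, 128)           # tabular branch, second layer
dense_2 = dense_params(1408 + 128, 1)     # output layer on the concatenated features

total = backbone + dense + dense_1 + dense_2
print(dense, dense_1, dense_2, total)
```

Almost all of the roughly 8.8 million parameters sit in the image backbone; the tabular branch and output head together add about 21 thousand.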
In [25]:
keras.utils.plot_model(model, show_shapes=True, show_layer_names=True, dpi=60)
In [26]:
import math

def get_lr_callback(batch_size=8, mode='cos', epochs=10, plot=False):
    lr_start, lr_max, lr_min = 2.5e-5, 5e-6 * batch_size, 0.8e-5
    lr_ramp_ep, lr_sus_ep, lr_decay = 3, 0, 0.75

    def lrfn(epoch):  # Learning rate update function
        if epoch < lr_ramp_ep:
            lr = (lr_max - lr_start) / lr_ramp_ep * epoch + lr_start
        elif epoch < lr_ramp_ep + lr_sus_ep:
            lr = lr_max
        elif mode == 'exp':
            lr = (lr_max - lr_min) * lr_decay**(epoch - lr_ramp_ep - lr_sus_ep) + lr_min
        elif mode == 'step':
            lr = lr_max * lr_decay**((epoch - lr_ramp_ep - lr_sus_ep) // 2)
        elif mode == 'cos':
            decay_total_epochs, decay_epoch_index = epochs - lr_ramp_ep - lr_sus_ep + 3, epoch - lr_ramp_ep - lr_sus_ep
            phase = math.pi * decay_epoch_index / decay_total_epochs
            lr = (lr_max - lr_min) * 0.5 * (1 + math.cos(phase)) + lr_min
        return lr

    if plot:  # Plot lr curve if plot is True
        plt.figure(figsize=(10, 5))
        plt.plot(np.arange(epochs), [lrfn(epoch) for epoch in np.arange(epochs)], marker='o')
        plt.xlabel('epoch'); plt.ylabel('lr')
        plt.title('LR Scheduler')

    return keras.callbacks.LearningRateScheduler(lrfn, verbose=False)  # Create lr callback
In [27]:
inputs, targets = next(iter(training_ds))
images = inputs["images"]
num_images, num_cols = 8, 4  # separate name so the NUMERIC_COLUMNS feature list is not overwritten

plt.figure(figsize=(4 * num_cols, num_images // num_cols * 4))
for i, (image, target) in enumerate(zip(images[:num_images], targets[:num_images])):
    plt.subplot(num_images // num_cols, num_cols, i + 1)
    image = image.numpy().astype("float32")
    target = target.numpy().astype("int32")[0]

    # Min-max scale each image for display
    image = (image - image.min()) / (image.max() - image.min() + 1e-4)
    plt.imshow(image)
    plt.title(f"Target: {target}")
    plt.axis("off")

plt.tight_layout()
plt.show()
In [28]:
lr_cb = get_lr_callback(CFG.batch_size, mode="exp", plot=True)
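Note that this call uses `mode="exp"` with a batch size of 256, so the peak learning rate is 5e-6 * 256 = 1.28e-3. The standalone sketch below re-implements only the linear ramp and the exponential branch of the schedule (the sustain phase has length 0 and is omitted); `exp_schedule` is a hypothetical name, and the values it prints agree with the `learning_rate` figures in the training log.

```python
def exp_schedule(epoch, batch_size=256, lr_start=2.5e-5, lr_min=0.8e-5,
                 ramp_epochs=3, decay=0.75):
    """Linear warm-up for `ramp_epochs`, then exponential decay towards lr_min."""
    lr_max = 5e-6 * batch_size
    if epoch < ramp_epochs:
        return (lr_max - lr_start) / ramp_epochs * epoch + lr_start
    return (lr_max - lr_min) * decay ** (epoch - ramp_epochs) + lr_min


for epoch in range(8):
    print(f"epoch {epoch + 1}: lr = {exp_schedule(epoch):.4e}")
```

The learning rate climbs for three epochs, peaks at epoch 4, and then decays by a factor of 0.75 per epoch towards the floor of 8e-6.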
In [29]:
ckpt_cb = keras.callbacks.ModelCheckpoint(
    "best_model.keras",       # Filepath where the model will be saved.
    monitor="val_auc",        # Metric to monitor (validation AUC in this case).
    save_best_only=True,      # Save only the model with the best performance.
    save_weights_only=False,  # Save the entire model (not just the weights).
    mode="max",               # The model with the maximum 'val_auc' will be saved.
)
In [30]:
history = model.fit(
    training_ds,
    epochs=CFG.epochs,
    callbacks=[lr_cb, ckpt_cb],
    validation_data=validation_ds,
    verbose=CFG.verbose,
    class_weight=class_weights,
)

Epoch 1/8
19/19 ━━━━━━━━━━━━━━━━━━━━ 308s 7s/step - auc: 0.4260 - loss: 0.7432 - val_auc: 0.3284 - val_loss: 0.7782 - learning_rate: 2.5000e-05
Epoch 2/8
19/19 ━━━━━━━━━━━━━━━━━━━━ 14s 673ms/step - auc: 0.7664 - loss: 0.5888 - val_auc: 0.7247 - val_loss: 0.5442 - learning_rate: 4.4333e-04
Epoch 3/8
19/19 ━━━━━━━━━━━━━━━━━━━━ 14s 659ms/step - auc: 0.9539 - loss: 0.3044 - val_auc: 0.7679 - val_loss: 0.5726 - learning_rate: 8.6167e-04
Epoch 4/8
19/19 ━━━━━━━━━━━━━━━━━━━━ 14s 666ms/step - auc: 0.9880 - loss: 0.1778 - val_auc: 0.8706 - val_loss: 0.4317 - learning_rate: 0.0013
Epoch 5/8
19/19 ━━━━━━━━━━━━━━━━━━━━ 14s 664ms/step - auc: 0.9965 - loss: 0.1228 - val_auc: 0.9185 - val_loss: 0.4356 - learning_rate: 9.6200e-04
Epoch 6/8
19/19 ━━━━━━━━━━━━━━━━━━━━ 19s 598ms/step - auc: 0.9988 - loss: 0.0951 - val_auc: 0.9064 - val_loss: 0.4878 - learning_rate: 7.2350e-04
Epoch 7/8
19/19 ━━━━━━━━━━━━━━━━━━━━ 13s 606ms/step - auc: 0.9993 - loss: 0.0796 - val_auc: 0.9059 - val_loss: 0.5100 - learning_rate: 5.4462e-04
Epoch 8/8
19/19 ━━━━━━━━━━━━━━━━━━━━ 13s 612ms/step - auc: 0.9997 - loss: 0.0761 - val_auc: 0.9029 - val_loss: 0.5102 - learning_rate: 4.1047e-04
In [31]:
# Extract AUC and validation AUC from history
train_auc = history.history['auc']
val_auc = history.history['val_auc']
epochs = range(1, len(train_auc) + 1)

# Find the epoch with the maximum val_auc
max_val_auc_epoch = np.argmax(val_auc)
max_val_auc = val_auc[max_val_auc_epoch]

# Plotting
plt.figure(figsize=(10, 6))
plt.plot(epochs, train_auc, 'o-', label='Training AUC', markersize=5, color='tab:blue')
plt.plot(epochs, val_auc, 's-', label='Validation AUC', markersize=5, color='tab:orange')

# Highlight the max val_auc
plt.scatter(max_val_auc_epoch + 1, max_val_auc, color='red', s=100, label=f'Max Val AUC: {max_val_auc:.4f}')
plt.annotate(f'Max Val AUC: {max_val_auc:.4f}',
             xy=(max_val_auc_epoch + 1, max_val_auc),
             xytext=(max_val_auc_epoch + 1 + 0.5, max_val_auc - 0.05),
             arrowprops=dict(facecolor='black', arrowstyle="->"),
             fontsize=12,
             color='tab:red')

# Enhancing the plot
plt.title('AUC over Epochs', fontsize=14)
plt.xlabel('Epoch', fontsize=12)
plt.ylabel('AUC', fontsize=12)
plt.legend(loc='lower right', fontsize=12)
plt.grid(True)
plt.xticks(epochs)

# Show the plot
plt.show()
In [32]:
# Best Result
best_score = max(history.history['val_auc'])
best_epoch = np.argmax(history.history['val_auc']) + 1
print("#" * 10 + " Result " + "#" * 10)
print(f"Best AUC: {best_score:.5f}")
print(f"Best Epoch: {best_epoch}")
print("#" * 28)

########## Result ##########
Best AUC: 0.91847
Best Epoch: 5
############################
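For intuition about the metric reported here: ROC AUC is the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below computes it exactly by comparing all positive-negative pairs. It is an illustration only: `roc_auc` is a hypothetical helper, and `keras.metrics.AUC` approximates the curve with a fixed set of thresholds, so its values can differ slightly from an exact computation.

```python
def roc_auc(labels, scores):
    """Exact ROC AUC via pairwise comparison; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and scores, for illustration only
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 3 of the 4 positive-negative pairs are ranked correctly -> 0.75
```

Because AUC depends only on the ranking of scores, it is unaffected by the class resampling ratio in the way accuracy would be.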
In [33]:
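# Manually save the model to /kaggle/working/
# Note: this writes the final-epoch model. If the working directory is /kaggle/working,
# it replaces the best-epoch file written by the checkpoint callback.
model.save('/kaggle/working/best_model.keras')

# Verify the file is saved
file_path = '/kaggle/working/best_model.keras'
print(os.path.isfile(file_path))  # Should return True if the file exists

True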
In [35]:
# Testing
print("# Testing:")
testing_features = dict(testing_df[FEAT_COLS])
testing_ids = testing_df.isic_id.values
testing_ds = build_dataset(testing_ids, testing_hdf5, testing_features,
                           batch_size=CFG.batch_size, shuffle=False,
                           augment=False, cache=False)

# Apply feature space processing
testing_ds = testing_ds.map(
    lambda x: {"images": x["images"], "features": feature_space(x["features"])},
    num_parallel_calls=tf.data.AUTOTUNE,
)

# Testing:
Loading Images : 100% 3/3 [00:00<00:00, 116.94it/s]
In [36]:
preds = model.predict(testing_ds).squeeze()

1/1 ━━━━━━━━━━━━━━━━━━━━ 8s 8s/step
In [37]:
inputs = next(iter(testing_ds))
images = inputs["images"]

# Plotting
plt.figure(figsize=(10, 4))

for i in range(3):
    plt.subplot(1, 3, i+1)  # 1 row, 3 columns, i+1th subplot
    plt.imshow(images[i])  # Show image
    plt.title(f'Prediction: {preds[i]:.2f}')  # Set title with prediction
    plt.axis('off')  # Hide axis

plt.suptitle('Model Predictions on Testing Images', fontsize=16)
plt.tight_layout()
plt.show()
In [38]:
pred_df = testing_df[["isic_id"]].copy()
pred_df["target"] = preds.tolist()

sub_df = pd.read_csv(f'{BASE_PATH}/sample_submission.csv')
sub_df = sub_df[["isic_id"]].copy()
sub_df = sub_df.merge(pred_df, on="isic_id", how="left")
sub_df.to_csv("submission.csv", index=False)
sub_df.head()
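Because the submission is built with a left merge, any `isic_id` without a matching prediction would end up with an empty `target` cell in the CSV. A quick check before submitting can catch that. The sketch below uses only the standard library; `validate_submission` is a hypothetical helper, and the prediction values in the sample text are made up for illustration.

```python
import csv
import io


def validate_submission(csv_text, expected_ids):
    """Return a list of problems found in a submission CSV (empty list means it looks fine)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    problems = []
    if [r["isic_id"] for r in rows] != list(expected_ids):
        problems.append("isic_id column does not match the expected ids")
    for r in rows:
        value = r["target"]
        if value == "":
            problems.append(f"{r['isic_id']}: missing prediction")
        elif not 0.0 <= float(value) <= 1.0:
            problems.append(f"{r['isic_id']}: prediction {value} outside [0, 1]")
    return problems


ids = ["ISIC_0015657", "ISIC_0015729"]
good = "isic_id,target\nISIC_0015657,0.12\nISIC_0015729,0.87\n"  # hypothetical predictions
bad = "isic_id,target\nISIC_0015657,0.12\nISIC_0015729,\n"       # second prediction missing

print(validate_submission(good, ids))
print(validate_submission(bad, ids))
```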
