
We begin by importing math and numpy for numerical operations, then add visualization power with matplotlib to help us understand our data. The torch imports form the backbone of our implementation: we use PyTorch's neural network module (nn) to build our CNN, its vision datasets for easy access to MNIST, and transforms for preprocessing our images. Special mentions go to torchvision.transforms.ToTensor for converting images to PyTorch tensors and one_hot for label encoding. We also set a random seed with np.random.seed(1) to ensure reproducible results, and include %matplotlib inline for smooth Jupyter Notebook visualization. This import strategy keeps every component we need for data loading, model building, training, and evaluation readily available as we progress through the implementation.

Two calls to the torchvision.datasets.MNIST class do the heavy lifting for us: the first loads the 60,000 training images (train=True), while the second prepares the 10,000 test images (train=False). The root parameter specifies where to store the data in our Kaggle environment, and download=True automatically fetches the dataset if it is not already present.

A key advantage of torchvision.datasets.MNIST is its seamless integration with PyTorch: it handles all the downloading, decompressing, and organizing for us. When we print the dataset sizes, we see the classic MNIST split: 60,000 training samples and 10,000 test samples. This 6:1 ratio gives us ample data to train our CNN while reserving a substantial set for evaluation. Notice that we are not yet transforming the data; we will handle preprocessing in the next steps, keeping our pipeline clean and modular.

Inspecting type(train_data[0]), we discover that it returns a tuple, which reveals an important characteristic of how the dataset is organized. In PyTorch's MNIST implementation, each sample is stored as an (image, label) pair: the image is a PIL Image of the handwritten digit, and the label is its integer class from 0 to 9.
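As a quick illustration of the one_hot helper mentioned above, here is a minimal sketch using torch.nn.functional.one_hot (the labels shown are arbitrary examples, not taken from the dataset):

```python
import torch
import torch.nn.functional as F

# Encode a small batch of digit labels as one-hot vectors
# for the 10 MNIST classes (digits 0-9).
labels = torch.tensor([0, 3, 7])
encoded = F.one_hot(labels, num_classes=10)

print(encoded.shape)  # torch.Size([3, 10])
print(encoded[1])     # tensor([0, 0, 0, 1, 0, 0, 0, 0, 0, 0])
```

Each row contains a single 1 at the index of the corresponding label, which is the representation some loss formulations and metrics expect.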
To get a feel for the data, we draw a handful of random samples (via np.random.randint) and plot them on a figure created with plt.subplots(), giving each image ample space (10 inches wide × 3 inches tall). For each sample, imshow() renders the grayscale pixel data, set_title() shows the ground-truth label, and axis('off') removes distracting axes for a cleaner visualization. The cmap='gray' parameter ensures we see authentic black-and-white representations, just as our CNN will process them. These samples give us immediate intuition about the task: we can see the variation in handwriting styles, stroke thickness, and digit positioning that our model will need to handle. Notice how the labels match the visible digits (though sometimes the handwriting is surprisingly ambiguous even to human eyes!), establishing a baseline expectation for model performance.

Next, we define a transform pipeline using transforms.Compose, which currently contains just one operation, ToTensor(). This simple but powerful conversion turns each PIL Image into a float tensor of shape 1×28×28, with pixel values scaled from [0, 255] down to [0.0, 1.0]. We reload the datasets with this transform, this time passing download=False since we have already cached the data. Inspecting train_data[0][0], we now see a tensor instead of a PIL Image; it contains our normalized pixel values, ready for neural network processing. We then carve a validation set out of the training data with random_split, giving us separate training and validation subsets, and encode the labels with the one_hot function.

With the data ready, we turn to PyTorch's Sequential container to build our model, but what exactly is it? Think of Sequential as a simple, organized way to stack layers in a neural network one after another, like building blocks. Instead of manually defining how data flows between layers, Sequential automatically connects them in the order you specify. This makes it perfect for straightforward architectures where the output of one layer directly feeds into the next. In our CNN, we stack layers like convolution, batch normalization, and activation functions in sequence, just like a pipeline. It is beginner-friendly, reduces boilerplate code, and keeps the model definition clean and readable.
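A minimal sketch of what such a Sequential stack looks like (the layer sizes here are illustrative assumptions, not necessarily the notebook's exact architecture):

```python
import torch
from torch import nn

# Illustrative Sequential stack: convolution -> batch norm -> activation
# -> pooling, then flatten and classify into the 10 digit classes.
model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),  # 1x28x28 -> 16x28x28
    nn.BatchNorm2d(16),
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 16x28x28 -> 16x14x14
    nn.Flatten(),                                # -> 3136 features
    nn.Linear(16 * 14 * 14, 10),                 # -> 10 class logits
)

# A dummy batch of four 1x28x28 "images" flows through the layers in order.
x = torch.randn(4, 1, 28, 28)
logits = model(x)
print(logits.shape)  # torch.Size([4, 10])
```

Data simply flows top to bottom; no forward() method needs to be written by hand.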
Now, let's see how we use it to construct our digit classifier!

The line device = "cuda" if torch.cuda.is_available() else "cpu" automatically selects GPU acceleration when available and falls back to CPU otherwise, making our code portable across different hardware setups. Note the unsqueeze(dim=1) operation: it adds the channel dimension that convolutional layers require, turning each 28×28 image into a 1×28×28 tensor.

With the model defined, we assemble the training components:

- CrossEntropyLoss: the natural choice for our multi-class classification task, combining softmax activation and negative log-likelihood loss in a numerically stable way. It measures how far our predictions are from the true digit labels.
- Adam optimizer: our model's "guide", combining the benefits of AdaGrad and RMSProp.
- LinearLR scheduler: gradually decreases the learning rate during training for more stable convergence.

To monitor progress, we track three metrics:

- Accuracy: overall correctness of predictions.
- Precision: a measure of prediction reliability.
- Recall: the ability to find all relevant cases.
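To make the three metrics concrete, here is a hand-rolled sketch of accuracy and macro-averaged precision/recall for a 10-class problem (the notebook may rely on a metrics library instead; this only shows the underlying arithmetic, and the sample predictions are made up):

```python
import torch

def accuracy(preds, labels):
    # Fraction of predictions that exactly match the true labels.
    return (preds == labels).float().mean().item()

def precision_recall(preds, labels, num_classes=10):
    # Per-class precision and recall, macro-averaged over all classes.
    precisions, recalls = [], []
    for c in range(num_classes):
        tp = ((preds == c) & (labels == c)).sum().item()
        fp = ((preds == c) & (labels != c)).sum().item()
        fn = ((preds != c) & (labels == c)).sum().item()
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    return sum(precisions) / num_classes, sum(recalls) / num_classes

preds = torch.tensor([3, 3, 7, 1])
labels = torch.tensor([3, 5, 7, 1])
print(accuracy(preds, labels))  # 0.75
p, r = precision_recall(preds, labels)
```

Precision asks "of everything I called class c, how much really was?", while recall asks "of every true class c, how much did I find?".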
Each metric is configured for our 10-class problem, tracking training and validation performance separately.

For the training loop, we wrap our tensors in a TensorDataset for clean data management and feed them to a DataLoader for batching. The loop itself follows the standard PyTorch pattern:

- switching between train() and eval() modes so that layers like batch normalization behave correctly in each phase;
- calling zero_grad() before each backward() pass so gradients from previous batches do not accumulate;
- running evaluation under torch.inference_mode() for better performance, since no gradients are needed there.
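Putting these pieces together, a minimal training-loop sketch might look like this (synthetic data stands in for MNIST, a single linear layer stands in for the CNN, and the hyperparameters are illustrative assumptions):

```python
import torch
from torch import nn
from torch.utils.data import TensorDataset, DataLoader

# Synthetic stand-in for the MNIST tensors: 64 "images" with random labels.
X = torch.randn(64, 1, 28, 28)
y = torch.randint(0, 10, (64,))
loader = DataLoader(TensorDataset(X, y), batch_size=16, shuffle=True)

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=1.0, end_factor=0.5, total_iters=3
)

model.train()                      # enable training-mode behavior
for images, labels in loader:
    optimizer.zero_grad()          # clear stale gradients
    loss = loss_fn(model(images), labels)
    loss.backward()                # backpropagate
    optimizer.step()               # update the weights
scheduler.step()                   # decay the learning rate once per epoch

model.eval()                       # switch to evaluation-mode behavior
with torch.inference_mode():       # skip autograd bookkeeping entirely
    preds = model(X).argmax(dim=1)
print(preds.shape)  # torch.Size([64])
```

The same structure scales directly to the real MNIST loaders, with the metric updates and validation pass slotted in after the inner loop.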
Posted Apr 18, 2025
Built a custom CNN architecture in PyTorch, achieving 99.26% accuracy on the MNIST dataset.