14 Weeks
Structured research tutorial
University of Wisconsin–Eau Claire · Research with Dr. Xiang Ma · Summer 2025
This project investigates how well modern image classification models can recognize different cheese varieties from real-world images. The work follows a structured 14-week tutorial designed by Dr. Xiang Ma, starting from tool onboarding (Python, VS Code, Git, Colab) and classic datasets like MNIST, then progressing to cheese-specific datasets from Kaggle, data cleaning, and baseline model design in PyTorch.
My contribution focused on data quality and reproducible experiments: understanding what a “good” image dataset looks like, cleaning and organizing the cheese images, and helping implement and run baseline models on the university’s high-performance computing (HPC) resources.
Primary deep learning framework
Started by setting up the environment with VS Code, Python, Git, and Markdown, then reviewed core concepts in machine learning (regression vs. classification) and major frameworks like TensorFlow, Keras, and PyTorch. MNIST served as the “hello world” dataset for image classification: training fully connected networks and CNNs on it was a way to understand training dynamics.
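One detail that comes up when moving from fully connected networks to CNNs on MNIST is how the 28x28 input shrinks through the convolution and pooling layers. The sketch below works through that arithmetic with the standard output-size formula. The layer sizes and the 64-filter count are illustrative assumptions, not the architecture used in the tutorial.

```python
def conv_output_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial size after a convolution or pooling layer on a square input."""
    return (size + 2 * padding - kernel) // stride + 1


# Hypothetical two-block CNN for 28x28 MNIST digits (sizes are illustrative).
size = 28
size = conv_output_size(size, kernel=3, padding=1)  # 3x3 conv, padding 1 -> 28
size = conv_output_size(size, kernel=2, stride=2)   # 2x2 max pool        -> 14
size = conv_output_size(size, kernel=3, padding=1)  # 3x3 conv, padding 1 -> 14
size = conv_output_size(size, kernel=2, stride=2)   # 2x2 max pool        -> 7

channels = 64  # assumed number of filters in the last conv layer
flat_features = channels * size * size
print(flat_features)  # 3136 inputs to the first fully connected layer
```

The same helper applies to any square input, so it can be reused to size the first fully connected layer once images are larger than MNIST digits.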
Shifted to cheese-specific datasets from Kaggle, learning what defines a high-quality dataset in practice: no mislabeled samples, consistent image sizes, minimal blur, and no duplicates. Applied a mix of manual review and automated checks to bring the datasets closer to that standard and prepared unified train/validation/test splits.
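Two of the automated checks described above, duplicate detection and split preparation, can be sketched with the standard library alone. This is a minimal illustration, not the project's actual scripts: the function names, the split fractions, and the seed are all assumptions.

```python
import hashlib
import random
from collections import defaultdict
from pathlib import Path


def find_exact_duplicates(image_dir):
    """Group files under image_dir whose bytes are identical."""
    by_hash = defaultdict(list)
    for path in sorted(Path(image_dir).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_hash[digest].append(path)
    return [paths for paths in by_hash.values() if len(paths) > 1]


def stratified_split(samples, val_frac=0.15, test_frac=0.15, seed=0):
    """Split (path, label) pairs into train/val/test, one class at a time."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append((path, label))

    splits = {"train": [], "val": [], "test": []}
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        n_val = int(len(items) * val_frac)
        n_test = int(len(items) * test_frac)
        splits["val"] += items[:n_val]
        splits["test"] += items[n_val:n_val + n_test]
        splits["train"] += items[n_val + n_test:]
    return splits
```

Hashing file bytes only catches exact copies. Resized or re-encoded near-duplicates need a perceptual comparison, which is where manual review or a dedicated tool comes in. Splitting class by class keeps each cheese variety represented in all three splits.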
Experimented with tools such as CleanVision for detecting low-quality and potentially mislabeled images, and used Google Gemini examples to flag images that did not actually contain cheese. After cleaning, all images were resized to a consistent resolution suitable for model training.
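For the resizing step, one common choice is to scale each image to fit a square canvas while keeping its aspect ratio, then pad the remainder. The sketch below only computes the dimensions involved. The 224-pixel target and the decision to preserve aspect ratio are assumptions, since the resolution and resize policy actually used are not recorded here.

```python
def fit_within(width: int, height: int, target: int = 224):
    """Scale (width, height) to fit a target x target square, keeping aspect ratio.

    Returns the resized dimensions and the (left, top) padding needed to
    center the resized image on the square canvas.
    """
    scale = target / max(width, height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    pad_left = (target - new_w) // 2
    pad_top = (target - new_h) // 2
    return (new_w, new_h), (pad_left, pad_top)


# A hypothetical 1600x1200 photo becomes 224x168 with 28 px of padding on top.
print(fit_within(1600, 1200))  # ((224, 168), (0, 28))
```

The returned size can be passed to an image library's resize call, and the padding offsets give the position at which to paste the result onto a blank square canvas.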
Studied state-of-the-art image classification approaches, beginning with CNNs and conceptually exploring transformer-based models. Helped design a PyTorch model sized to fit the university HPC GPU server, and participated in training and refinement cycles, including hyperparameter adjustments and monitoring performance across epochs.
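Monitoring performance across epochs usually comes down to tracking validation loss, remembering the best epoch, and stopping once the loss stops improving. The helper below is a framework-independent sketch of that idea. The class name, the patience value, and the loss values are made up for illustration and are not results from this project.

```python
class EpochMonitor:
    """Track validation loss per epoch and signal when training has stalled."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0):
        self.patience = patience    # epochs to wait without improvement
        self.min_delta = min_delta  # smallest decrease that counts as improvement
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.history = []

    def update(self, epoch: int, val_loss: float) -> bool:
        """Record one epoch; return True if training should stop."""
        self.history.append((epoch, val_loss))
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch
        return epoch - self.best_epoch >= self.patience


# Made-up validation losses, for illustration only.
monitor = EpochMonitor(patience=2)
for epoch, loss in enumerate([0.90, 0.70, 0.65, 0.66, 0.68, 0.67]):
    if monitor.update(epoch, loss):
        break

print(monitor.best_epoch, monitor.best_loss)  # 2 0.65
```

In a real training loop, the point where `best_epoch` changes is also a natural place to save a model checkpoint.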
Supported efforts to evaluate the trained model on a small, separately collected cheese dataset and contributed to documentation of the full pipeline, from tool onboarding and dataset cleaning through model design, training, and evaluation. This documentation is intended to help future students quickly understand and extend the project.
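On a small evaluation set, overall accuracy can hide classes the model consistently confuses, so a per-class breakdown is a useful complement. The sketch below shows one way to compute it with the standard library. The labels and predictions are hypothetical examples, not outputs of the trained model.

```python
from collections import Counter


def per_class_accuracy(y_true, y_pred):
    """Fraction of correctly predicted samples for each true label."""
    correct = Counter()
    total = Counter()
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return {label: correct[label] / total[label] for label in sorted(total)}


# Hypothetical labels and predictions, for illustration only.
y_true = ["brie", "brie", "cheddar", "cheddar", "gouda", "gouda"]
y_pred = ["brie", "gouda", "cheddar", "cheddar", "gouda", "brie"]

accuracy = per_class_accuracy(y_true, y_pred)
confusions = Counter(zip(y_true, y_pred))  # (true, predicted) pair counts

print(accuracy)                       # {'brie': 0.5, 'cheddar': 1.0, 'gouda': 0.5}
print(confusions[("brie", "gouda")])  # 1
```

Counting `(true, predicted)` pairs gives the entries of a confusion matrix, which shows which varieties are mistaken for each other.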
Python
PyTorch (primary), Google Colab
NumPy, Pandas, Matplotlib, PIL, CleanVision
Google Gemini (cheese recognition / triage examples)
UW–Eau Claire HPC GPU cluster, local VS Code
Git & GitHub for code and notebook tracking
Public Kaggle cheese datasets + small collected set
Markdown notes, weekly summaries, and final write-up