We develop and evaluate genomic prediction models for plant breeding. By integrating high-density marker data with phenotypic records, we build statistical and machine learning models that predict breeding values of untested individuals. Our focus includes optimizing training population design, cross-validation strategies, and multi-trait prediction for accelerated breeding cycles.
Genomic selection (GS) enables breeders to select superior genotypes without extensive field testing of every candidate, dramatically reducing the time and cost of breeding programs. Our models incorporate thousands of genome-wide markers obtained through genotyping-by-sequencing (GBS), alongside multi-environment phenotypic data collected across years and locations.
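Marker data of this kind is often condensed into a genomic relationship matrix before modelling. The sketch below shows one common construction (VanRaden's first method) on simulated 0/1/2 allele counts. It is an illustration under stated assumptions, not a description of our pipeline: the function name `vanraden_grm`, the simulated marker matrix, and the allele frequency of 0.3 are all hypothetical.

```python
import numpy as np

def vanraden_grm(M):
    """Genomic relationship matrix from allele counts.

    M: array of shape (n_individuals, n_markers) coded 0/1/2.
    Assumes no missing calls and at least one polymorphic marker.
    """
    M = np.asarray(M, dtype=float)
    p = M.mean(axis=0) / 2.0              # estimated allele frequency per marker
    Z = M - 2.0 * p                       # centre each marker column at its mean
    denom = 2.0 * np.sum(p * (1.0 - p))   # scales G to be comparable to pedigree A
    return Z @ Z.T / denom

# Simulated genotypes (hypothetical data): 50 individuals, 1000 markers.
rng = np.random.default_rng(0)
M = rng.binomial(2, 0.3, size=(50, 1000))
G = vanraden_grm(M)
print(G.shape, round(float(np.mean(np.diag(G))), 3))
```

Real GBS data usually contains missing calls and low-frequency markers, so filtering and imputation would normally come before this step.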
We evaluate a range of prediction methods, including GBLUP, Bayesian regression models, random forests, and deep learning architectures. Special attention is given to multi-trait genomic prediction, where correlated traits such as yield and quality are modelled jointly to improve prediction accuracy. Our research also explores how to optimally allocate resources between genotyping and phenotyping to maximize genetic gain per unit investment.
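To make the evaluation idea concrete, here is a minimal sketch of k-fold cross-validation for ridge regression on markers (RR-BLUP), which gives the same predictions as GBLUP when the relationship matrix is proportional to the centred marker cross-product. Everything in it is assumed for illustration: the simulated genotypes and phenotypes, the heritability of 0.7, the helper names `ridge_marker_effects` and `kfold_accuracy`, and the shrinkage value `lam`, which in practice would be derived from estimated variance components.

```python
import numpy as np

def ridge_marker_effects(Z, y, lam):
    """Ridge estimates of marker effects from centred markers Z.

    Solved in the dual (n x n) form, which is cheaper when there are
    more markers than individuals.
    """
    n = Z.shape[0]
    alpha = np.linalg.solve(Z @ Z.T + lam * np.eye(n), y - y.mean())
    return Z.T @ alpha, y.mean()

def kfold_accuracy(M, y, lam, k=5, seed=0):
    """Mean correlation between predicted and observed phenotypes over k folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    accs = []
    for test in np.array_split(idx, k):
        train = np.setdiff1d(idx, test)
        col_means = M[train].mean(axis=0)   # centre using training data only
        beta, mu = ridge_marker_effects(M[train] - col_means, y[train], lam)
        pred = mu + (M[test] - col_means) @ beta
        accs.append(np.corrcoef(pred, y[test])[0, 1])
    return float(np.mean(accs))

# Simulated example (hypothetical data): 300 lines, 200 markers, 20 causal.
rng = np.random.default_rng(1)
M = rng.binomial(2, 0.3, size=(300, 200)).astype(float)
beta_true = np.zeros(200)
beta_true[rng.choice(200, size=20, replace=False)] = rng.normal(size=20)
g = M @ beta_true
h2 = 0.7                                    # assumed heritability
y = g + rng.normal(scale=np.sqrt(g.var() * (1 - h2) / h2), size=300)

p = M.mean(axis=0) / 2.0
lam = (1 - h2) / h2 * np.sum(2 * p * (1 - p))   # assumed error-to-marker variance ratio
acc = kfold_accuracy(M, y, lam)
print(round(acc, 3))
```

Random folds such as these tend to give optimistic accuracies when relatives are split across training and test sets. Leaving out whole families or environments is a stricter design.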