Predict Human Reading Time Using GPT-2 & BERT
Course project for Seminar in Computational Cognition.
- Preprocessed the Natural Stories dataset (more than 1M observations) and engineered features for reading-time prediction.
- Built a sliding-window batching pipeline that splits text into segments for parallel, CUDA-efficient inference with GPT-2 and BERT.
- Analyzed surprisal scores and fitted linear mixed-effects models to predict human reading time.
- Provided quantitative evidence that GPT-2 outperforms BERT as a predictor of reading time, with an AIC improvement of 571 points and a p-value below 1e-100.
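
The sliding-window and surprisal steps above can be illustrated with a minimal sketch. It is not the project's actual code: the function names `sliding_windows` and `surprisal_bits`, the `max_len` and `stride` parameters, and the choice of base-2 logarithms are all assumptions made for illustration. The idea is that each window carries some left context, and every token is scored exactly once.

```python
import math
from typing import List, Tuple


def sliding_windows(n_tokens: int, max_len: int, stride: int) -> List[Tuple[int, int, int]]:
    """Return (start, end, score_from) index triples (hypothetical helper).

    Each window covers tokens[start:end]. Only tokens[score_from:end] are
    scored; tokens[start:score_from] serve purely as left context.
    """
    if not 0 < stride <= max_len:
        raise ValueError("stride must satisfy 0 < stride <= max_len")
    windows = []
    scored_up_to = 0  # index of the first token that has not been scored yet
    start = 0
    while scored_up_to < n_tokens:
        end = min(start + max_len, n_tokens)
        windows.append((start, end, scored_up_to))
        scored_up_to = end
        start += stride
    return windows


def surprisal_bits(prob: float) -> float:
    """Surprisal of a token given its model probability, in bits (assumed unit)."""
    return -math.log2(prob)
```

For example, `sliding_windows(10, 4, 2)` yields four windows of at most four tokens each; the windows can then be padded and stacked into one batch so a model scores them in a single forward pass, and the per-token probabilities converted with `surprisal_bits`.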