Predict Human Reading Time Using GPT-2 & BERT


Course project for Seminar in Computational Cognition.

  • Preprocessed the Natural Stories Dataset with more than 1M observations and engineered features for reading time prediction.
  • Built a sliding-window batching pipeline that splits text into segments for parallel, CUDA-efficient inference with GPT-2 and BERT.
  • Analyzed surprisal scores (the negative log-probability of each word given its context) and fitted linear mixed-effects models to predict human reading time.
  • Provided quantitative evidence that GPT-2 outperforms BERT as a predictor of reading time, with an AIC improvement of 571 points and a p-value below 1e-100.
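
The core quantities in the steps above can be illustrated with a minimal, dependency-free sketch. This is not the project's code: the function names (`sliding_windows`, `surprisal_bits`, `aic`), the window and stride values, and the use of base-2 logarithms are all assumptions chosen for illustration. In a real pipeline the probabilities would come from GPT-2 or BERT and the log-likelihoods from the fitted mixed-effects models.

```python
import math


def sliding_windows(token_ids, window=1024, stride=512):
    """Split a token sequence into overlapping segments.

    Returns a list of (start_index, segment) pairs. Consecutive segments
    overlap by `window - stride` tokens, so later tokens keep some left
    context. The default sizes are illustrative assumptions.
    """
    if window <= 0 or not 0 < stride <= window:
        raise ValueError("require window > 0 and 0 < stride <= window")
    if not token_ids:
        return []
    segments = []
    start = 0
    while True:
        segments.append((start, token_ids[start:start + window]))
        if start + window >= len(token_ids):
            break  # this segment already reaches the end of the text
        start += stride
    return segments


def surprisal_bits(prob):
    """Surprisal of a word in bits: -log2 P(word | context)."""
    return -math.log2(prob)


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L. Lower is better."""
    return 2 * n_params - 2 * log_likelihood
```

Because each segment has a bounded length, segments can be padded and stacked into batches for GPU inference. Two models fitted to the same data are then compared by the difference in their AIC values, where the lower value indicates the better trade-off between fit and complexity.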

View project on GitHub