Predict Human Reading Time Using GPT-2 & BERT


Course project for Seminar in Computational Cognition.

Preprocessed the Natural Stories Dataset with more than 1M observations and engineered features for reading time prediction.
Built a sliding-window batching pipeline that splits text into segments for parallel, CUDA-efficient inference with GPT-2 and BERT.
Analyzed surprisal scores and fitted linear mixed-effects models to predict human reading time.
Provided quantitative evidence that GPT-2 outperforms BERT, with an AIC improvement of 571 points and a p-value below 1e-100.
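
As a rough illustration of the ideas named above, here is a minimal Python sketch of sliding-window segmentation, surprisal, and AIC. It is not the project's actual pipeline: the function names, the window size, and the stride are hypothetical, and plain integer lists stand in for real tokenizer output. Surprisal is computed as the negative log2 probability of a token, and AIC as 2k - 2 log L, where a lower AIC indicates a better trade-off between fit and complexity.

```python
import math


def sliding_windows(token_ids, window=4, stride=2):
    """Split a token sequence into overlapping segments.

    Returns a list of (start_index, segment) pairs. `window` and `stride`
    are illustrative defaults, not values taken from the project.
    """
    if window <= 0 or not 0 < stride <= window:
        raise ValueError("need window > 0 and 0 < stride <= window")
    if not token_ids:
        return []
    windows = []
    start = 0
    while True:
        windows.append((start, token_ids[start:start + window]))
        # Stop once the current window reaches the end of the sequence.
        if start + window >= len(token_ids):
            break
        start += stride
    return windows


def surprisal_bits(prob):
    """Surprisal of an event with probability `prob`, in bits."""
    return -math.log2(prob)


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 * log-likelihood."""
    return 2 * n_params - 2 * log_likelihood


if __name__ == "__main__":
    for start, segment in sliding_windows(list(range(10))):
        print(start, segment)
    print(surprisal_bits(0.25))
    print(aic(-100.0, 3))
```

The overlap between consecutive windows gives each token some left context even when it falls near a segment boundary. In a real setup, each segment would be padded and batched before being passed to the language model.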

Parkinson Classification & Symptoms Profiling with Accelerometer Data