Human researchers outperformed large language models across all major stages of systematic review preparation, particularly in study selection, synthesis, and final manuscript drafting. While LLMs demonstrated speed and partial accuracy in early screening and data extraction, they could not independently produce high-quality, guideline-compliant systematic reviews.