A systematic review finds that only 5% of healthcare evaluations of large language models use real patient data, highlighting gaps in how bias and tasks are assessed.