In September 2023, a new feature was added to ChatGPT (OpenAI) that allows the analysis of images, including those obtained through dermatoscopy. This version of ChatGPT might help clinicians identify the nature of skin lesions, but its ability to do so must be verified.
In a letter to the editor of the Journal of the American Academy of Dermatology, a team reported its experience in exploring the performance of ChatGPT Vision in diagnosing melanoma from dermatoscopic images.
Dermatoscopic images of histopathologically confirmed melanomas and benign nevi were extracted from the archives of the International Skin Imaging Collaboration. The images were submitted to ChatGPT Vision with a request for three differential diagnoses, ranked from most to least likely.
The diagnosis deemed most likely was compared with the histopathologic diagnosis. The researchers also determined whether the correct diagnosis was among ChatGPT's top three diagnoses. Finally, the ability of ChatGPT Vision to differentiate between benign and malignant lesions was investigated.
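The three comparisons described above can be sketched in a few lines of Python. This is only an illustrative sketch: the label strings, the `MALIGNANT` set, and both function names are hypothetical, since the letter does not specify how ChatGPT's free-text answers were coded.

```python
# Hypothetical label set; the exact wording of ChatGPT's answers is not reported.
MALIGNANT = {"melanoma"}

def top_k_correct(ranked, truth, k):
    """True if the histopathologic diagnosis is among the first k ranked answers."""
    return truth in ranked[:k]

def flags_malignancy(ranked, k):
    """True if any of the first k ranked answers is a malignant diagnosis."""
    return any(diagnosis in MALIGNANT for diagnosis in ranked[:k])

# Made-up example of a ranked differential for a histologically confirmed melanoma.
ranked = ["dysplastic nevus", "melanoma", "seborrheic keratosis"]

print(top_k_correct(ranked, "melanoma", 1))  # False: the top diagnosis is wrong
print(top_k_correct(ranked, "melanoma", 3))  # True: melanoma is in the top three
print(flags_malignancy(ranked, 1))           # False: the top diagnosis is benign
print(flags_malignancy(ranked, 3))           # True: a malignancy is listed
```

Widening the window from one to three answers can only add hits on malignant lesions, but it also gives benign lesions more chances to be flagged as malignant.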
A total of 100 melanocytic lesions, including 50 melanomas and 50 non-atypical benign nevi, were used in the study. Considering the first diagnosis provided by ChatGPT Vision, the sensitivity was 32%, specificity was 40%, and overall diagnostic accuracy was 36%. When the correct diagnosis was sought among the top three differential diagnoses, the sensitivity was 56%, specificity was 53%, and accuracy was 55%.
Regarding ChatGPT's ability to differentiate between malignant and benign lesions, the sensitivity, specificity, and accuracy were 46%, 78%, and 62%, respectively, for the diagnosis deemed most likely and 78%, 47%, and 62% for the top three diagnoses proposed by the chatbot.
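To make these figures concrete, the short Python sketch below recomputes the malignant-versus-benign metrics for the most likely diagnosis. The confusion-matrix counts are an assumption: they are back-calculated from the reported percentages and the 50/50 split and are not taken from the letter itself. The function name `binary_metrics` is likewise illustrative.

```python
def binary_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                 # melanomas correctly flagged
    specificity = tn / (tn + fp)                 # nevi correctly called benign
    accuracy = (tp + tn) / (tp + fn + tn + fp)   # all correct calls
    return sensitivity, specificity, accuracy

# Assumed counts: 46% of 50 melanomas = 23 detected, 78% of 50 nevi = 39 cleared.
sens, spec, acc = binary_metrics(tp=23, fn=27, tn=39, fp=11)
print(f"sensitivity={sens:.0%} specificity={spec:.0%} accuracy={acc:.0%}")
```

Because the two groups are the same size, the overall figure is simply the mean of sensitivity and specificity. Under these assumed counts, 27 of the 50 melanomas would be missed and 11 of the 50 nevi would be wrongly flagged as malignant.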
These results show that the performance of ChatGPT Vision is significantly below that of approved artificial intelligence algorithms for diagnosing melanoma from dermatoscopic images. The risk of missing a melanoma, or of incorrectly classifying a benign lesion as malignant, is too high to use ChatGPT in clinical practice.
Admittedly, this study is limited by the small number of lesions used, among which dysplastic lesions were absent, and by its failure to consider various essential data (such as the anatomical site of the tumor, type of nevus, and melanoma thickness).
Therefore, ChatGPT is not sufficiently effective for the diagnosis of melanoma, even if it can at least help describe images. Or should we say not yet sufficiently effective?
This story was translated from JIM, which is part of the Medscape Professional Network, using several editorial tools, including AI, as part of the process. Human editors reviewed this content before publication.