Based on the analysis presented in the article, here are some key takeaways and conclusions regarding the use of large language models (LLMs) for auditing smart contracts:
- High-Level Accuracy: All three LLMs (Claude, Gemini, and GPT-5.5) are highly accurate at identifying the correct DASP-10 vulnerability category in known vulnerable smart contracts, with recall rates ranging from 89% to 98%.
- Localization Precision:
  - Claude shows a large gap (16 points) between lenient and strict recall, indicating it often identifies the right vulnerability category but struggles to pinpoint the exact line of code.
  - GPT-5.5, by contrast, shows very little difference between lenient and strict recall, suggesting it both identifies the correct vulnerability type and locates it accurately.
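The lenient vs. strict distinction above can be sketched as a small scoring function. This is a hypothetical illustration, not the article's evaluation harness: the `(category, line)` finding format and the example data are assumptions.

```python
# Hypothetical sketch of lenient vs. strict recall for audit findings.
# A finding is a (category, line) pair; ground truth uses the same shape.
# Lenient recall credits a correct category reported anywhere in the
# contract; strict recall also requires the reported line to match.

def recall(findings, ground_truth, strict):
    hits = 0
    for category, line in ground_truth:
        if strict:
            found = (category, line) in findings
        else:
            found = any(c == category for c, _ in findings)
        if found:
            hits += 1
    return hits / len(ground_truth)

truth = [("reentrancy", 42), ("access_control", 7)]
reported = [("reentrancy", 42), ("access_control", 99)]  # wrong line for one

print(recall(reported, truth, strict=False))  # 1.0 (both categories found)
print(recall(reported, truth, strict=True))   # 0.5 (one line mislocated)
```

A model like Claude would score well on the lenient variant but lose points on the strict one, which is exactly the 16-point gap described above.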
- False Positives:
  - Claude reports a high number of false positives, often flagging stylistic issues as security vulnerabilities.
  - GPT-5.5 typically reports one finding per contract with higher precision, making it better suited to automated systems that need few false positives.
- Ensemble Approach: Combining the findings from all three models can offset their individual weaknesses, for example pairing Claude's broad recall with GPT-5.5's precise, low-noise localization.
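One simple way to combine per-model findings is a vote-threshold ensemble: take the union for maximum recall, or require agreement between models to cut false positives. This is a minimal sketch under assumed data shapes (sets of DASP-10 category labels per contract), not the article's method:

```python
# Hypothetical vote-threshold ensemble over per-model audit findings.
# Each model contributes a set of DASP-10 category labels for a contract;
# min_votes=1 is the union (max recall), higher thresholds trade recall
# for precision by requiring cross-model agreement.

def ensemble(per_model_findings, min_votes=1):
    votes = {}
    for findings in per_model_findings:
        for category in set(findings):
            votes[category] = votes.get(category, 0) + 1
    return {c for c, n in votes.items() if n >= min_votes}

claude = {"reentrancy", "front_running"}  # broad recall, some noise
gemini = {"reentrancy"}
gpt = {"reentrancy"}

print(ensemble([claude, gemini, gpt]))               # union of all findings
print(ensemble([claude, gemini, gpt], min_votes=2))  # only agreed-on findings
```

With `min_votes=2`, the noisy `front_running` flag (reported by one model) is dropped while the unanimous `reentrancy` finding survives.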
Read the full article at DEV Community
