Explainable Generative AI: Enhancing Stable Diffusion with Machine Learning and Generative AI
| Published in | 2025 5th International Conference on Pervasive Computing and Social Networking (ICPCSN), pp. 571-577 |
|---|---|
| Main Authors | , , |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 14.05.2025 |
| DOI | 10.1109/ICPCSN65854.2025.11035339 |
Summary: Explainable AI (XAI) is a set of techniques that increases the transparency of AI by helping us understand how machine learning algorithms make decisions; however, issues persist, particularly in image generation tasks. Numerical interpretability is the major issue: diffusion models struggle to understand numerical values in prompts, such as the number of subjects to render in a generated image. In this study, prompt engineering using generative AI and SHAP-based (Shapley Additive Explanations) explainability are used to enhance the prompt and to understand which features contribute to the generated image. The Gemini API refines prompts and model responses and increases numerical interpretability, because generative AI can reduce human error and enhance image features through structured prompts. Prompt engineering techniques guide AI models to produce the desired output; techniques such as dynamic prompting expand the prompt and improve the numerical interpretability and clarity of the actual input. Synthetic images were tested for distortion using the Peak Signal-to-Noise Ratio (PSNR: 9.11 dB, 20% better than the existing model), the Learned Perceptual Image Patch Similarity (LPIPS: 0.69, indicating 69% dissimilarity between images generated by our extensible diffusion model and the existing Stable Diffusion model), the Structural Similarity Index Measure (SSIM: 24%) for structural consistency, and the Contrastive Language-Image Pre-training (CLIP) score for semantic alignment between text and image. The results indicated reduced noise, better prompt alignment, and greater transparency. By combining structured prompting, explainability, and quantitative testing, this method improves generative model control, ensuring efficiency, interpretability, and better image quality for real-world applications.
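The paper itself does not include code, but a minimal sketch of the dynamic-prompting step described above might look like the following, using Google's `google-generativeai` Python client. The meta-prompt wording, model name, and function name are illustrative assumptions, not details from the paper:

```python
# Sketch of dynamic prompting: a terse user prompt is expanded by Gemini so
# that numeric quantities (e.g. number of subjects) are stated explicitly.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder credential
model = genai.GenerativeModel("gemini-1.5-flash")  # assumed model choice

def expand_prompt(user_prompt: str) -> str:
    """Rewrite an image-generation prompt with explicit counts and detail."""
    meta_prompt = (
        "Rewrite this image-generation prompt so every numeric quantity "
        "(e.g. the number of subjects) is stated explicitly, and add "
        "concrete visual detail without changing the meaning:\n\n"
        + user_prompt
    )
    response = model.generate_content(meta_prompt)
    return response.text.strip()

print(expand_prompt("three cats sitting on a sofa"))
```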
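The distortion metrics quoted in the summary can be reproduced with standard tooling. The sketch below, with placeholder image paths, assumes two same-size RGB images and computes PSNR as 10·log10(MAX²/MSE) with NumPy and SSIM with scikit-image:

```python
# Sketch of the PSNR and SSIM measurements reported in the abstract.
import numpy as np
from PIL import Image
from skimage.metrics import structural_similarity

def psnr(a: np.ndarray, b: np.ndarray, max_val: float = 255.0) -> float:
    """Peak Signal-to-Noise Ratio in dB: 10 * log10(MAX^2 / MSE)."""
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(max_val**2 / mse))

# Placeholder paths; both images must have identical dimensions.
ours = np.asarray(Image.open("generated_ours.png").convert("RGB"))
base = np.asarray(Image.open("generated_baseline.png").convert("RGB"))

print(f"PSNR: {psnr(ours, base):.2f} dB")
# channel_axis=2 treats the last axis as RGB; data_range matches uint8 images.
print(f"SSIM: {structural_similarity(ours, base, channel_axis=2, data_range=255):.2f}")
```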
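A CLIP score of the kind reported for text-image alignment is typically computed as the cosine similarity between CLIP text and image embeddings. The sketch below uses a public Hugging Face `transformers` CLIP checkpoint as an assumed stand-in for whichever CLIP variant the authors used:

```python
# Sketch of a CLIP score: cosine similarity of text and image embeddings.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("generated_ours.png").convert("RGB")  # placeholder path
text = "three cats sitting on a sofa"

inputs = processor(text=[text], images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**inputs)

# Normalize the projected embeddings, then take their dot product.
img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
print(f"CLIP score: {(img @ txt.T).item():.3f}")
```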