Controlled image generation and evaluation workflow
This project explores a diffusion-based image workflow that starts with SDXL text-to-image generation, applies Img2Img semantic editing, uses ControlNet Canny guidance to preserve structure, and measures perceptual and pixel-level change with LPIPS and PSNR.
Challenge
- Text-to-image models can generate strong visuals but do not always preserve structure across edits.
- Img2Img can change semantic content while also drifting from the original geometry.
- ControlNet gives a way to preserve structure while allowing style and lighting changes.
System architecture
Data and inputs
A futuristic city scene is used as the controlled test case, with Canny edges extracted from the baseline image for structural guidance.
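Edge extraction of this kind is commonly done with OpenCV. The sketch below is one possible version, not the project's actual code: it assumes `opencv-python`, NumPy, and Pillow are installed, and the function names and the 100/200 thresholds are illustrative choices that do not come from the project.

```python
import numpy as np


def to_three_channels(edges: np.ndarray) -> np.ndarray:
    """Repeat a single-channel edge map into an HxWx3 array.

    Image-conditioned pipelines usually expect an RGB-shaped control image.
    """
    if edges.ndim != 2:
        raise ValueError("expected a 2-D edge map")
    return np.repeat(edges[:, :, None], 3, axis=2)


def canny_control_image(image, low: int = 100, high: int = 200):
    """Turn a PIL image into a Canny control image (PIL, RGB).

    `low` and `high` are hypothetical hysteresis thresholds.
    """
    # Imported lazily so the helper above stays usable without OpenCV/Pillow.
    import cv2
    from PIL import Image

    rgb = np.array(image.convert("RGB"))          # HxWx3 uint8
    gray = cv2.cvtColor(rgb, cv2.COLOR_RGB2GRAY)  # HxW uint8
    edges = cv2.Canny(gray, low, high)            # 0 or 255 per pixel
    return Image.fromarray(to_three_channels(edges))
```

Lower thresholds keep more fine edges (windows, textures), while higher thresholds leave mostly strong contours such as the skyline, so the threshold pair is a practical knob for how much structure gets locked in.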
Technical approach
- Generate a baseline image with SDXL using a detailed prompt.
- Apply Img2Img to add new semantic content while keeping the general style.
- Use ControlNet Canny to transform the scene while preserving skyline geometry.
- Measure the change in the resulting image with LPIPS (perceptual) and PSNR (pixel-level).
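The generation steps above could be wired together with Hugging Face `diffusers` roughly as follows. This is a sketch under assumptions, not the project's code: the checkpoint IDs, the `strength` and `controlnet_conditioning_scale` values, the seed, and the names `RunConfig`, `run_workflow`, and `make_control_image` are illustrative. Only the 40 inference steps are taken from this write-up. It also assumes a CUDA GPU with enough memory for two SDXL pipelines, and that the ControlNet stage is a text-to-image pass conditioned on edges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunConfig:
    # Only num_inference_steps (40) comes from the write-up; the rest is assumed.
    base_model: str = "stabilityai/stable-diffusion-xl-base-1.0"
    controlnet_model: str = "diffusers/controlnet-canny-sdxl-1.0"
    num_inference_steps: int = 40
    img2img_strength: float = 0.5   # hypothetical value
    controlnet_scale: float = 0.5   # hypothetical value
    seed: int = 0                   # hypothetical value


def run_workflow(base_prompt, edit_prompt, restyle_prompt,
                 make_control_image, cfg=RunConfig()):
    """Run baseline -> Img2Img -> ControlNet and return the three PIL images.

    `make_control_image` is a caller-supplied function (hypothetical here)
    that maps a PIL image to a Canny control image.
    """
    # Heavy imports stay inside the function so this module loads without a GPU stack.
    import torch
    from diffusers import (
        AutoPipelineForImage2Image,
        ControlNetModel,
        StableDiffusionXLControlNetPipeline,
        StableDiffusionXLPipeline,
    )

    def generator():
        # A fresh generator per stage keeps each stage reproducible on its own.
        return torch.Generator(device="cpu").manual_seed(cfg.seed)

    # 1. Baseline text-to-image.
    base = StableDiffusionXLPipeline.from_pretrained(
        cfg.base_model, torch_dtype=torch.float16
    ).to("cuda")
    baseline = base(
        prompt=base_prompt,
        num_inference_steps=cfg.num_inference_steps,
        generator=generator(),
    ).images[0]

    # 2. Img2Img edit, reusing the already loaded base components.
    img2img = AutoPipelineForImage2Image.from_pipe(base)
    edited = img2img(
        prompt=edit_prompt,
        image=baseline,
        strength=cfg.img2img_strength,  # fraction of the schedule that is re-run
        num_inference_steps=cfg.num_inference_steps,
        generator=generator(),
    ).images[0]

    # 3. ControlNet pass conditioned on edges from the baseline.
    controlnet = ControlNetModel.from_pretrained(
        cfg.controlnet_model, torch_dtype=torch.float16
    )
    cn_pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
        cfg.base_model, controlnet=controlnet, torch_dtype=torch.float16
    ).to("cuda")
    restyled = cn_pipe(
        prompt=restyle_prompt,
        image=make_control_image(baseline),
        controlnet_conditioning_scale=cfg.controlnet_scale,
        num_inference_steps=cfg.num_inference_steps,
        generator=generator(),
    ).images[0]

    return baseline, edited, restyled
```

Keeping the settings in one frozen config object makes it easier to report exactly which values produced a given set of images.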
Evaluation and results
Settings: 40 SDXL inference steps, ControlNet Canny guidance.
Metrics: LPIPS 0.4527, PSNR 12.71 dB.
- ControlNet preserved skyline geometry more effectively than plain Img2Img.
- The LPIPS score of 0.4527 reflects a meaningful perceptual shift, yet the scene identity remained recognizable.
- The low PSNR (12.71 dB) was expected: PSNR penalizes every per-pixel difference, and the output changed lighting and colors substantially.
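To make the two metrics concrete, here is a minimal sketch of how they could be computed. It assumes 8-bit RGB arrays of identical shape, and that the reference is the baseline image. The function names are illustrative, and the AlexNet backbone for LPIPS is an assumption, since the write-up does not say which backbone was used. The LPIPS part relies on the `lpips` and `torch` packages.

```python
import math

import numpy as np


def psnr(reference: np.ndarray, output: np.ndarray, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two same-shaped images."""
    if reference.shape != output.shape:
        raise ValueError("images must have the same shape")
    diff = reference.astype(np.float64) - output.astype(np.float64)
    mse = float(np.mean(diff ** 2))
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def rmse_from_psnr(psnr_db: float, max_value: float = 255.0) -> float:
    """Invert the PSNR formula to get the root-mean-square pixel error."""
    return max_value / (10.0 ** (psnr_db / 20.0))


def lpips_distance(reference: np.ndarray, output: np.ndarray, net: str = "alex") -> float:
    """LPIPS between two HxWx3 uint8 images (backbone choice is an assumption)."""
    # Imported lazily so PSNR stays usable without torch or lpips installed.
    import lpips
    import torch

    def to_tensor(img: np.ndarray):
        # HxWx3 uint8 -> 1x3xHxW float32 scaled to [-1, 1], as LPIPS expects.
        scaled = img.astype(np.float32) / 127.5 - 1.0
        return torch.from_numpy(scaled).permute(2, 0, 1).unsqueeze(0)

    model = lpips.LPIPS(net=net)
    with torch.no_grad():
        return float(model(to_tensor(reference), to_tensor(output)).item())
```

Under the 8-bit assumption, `rmse_from_psnr(12.71)` gives an RMSE of about 59 intensity levels out of 255, which is what a broad change in lighting and color would produce even when edges stay in place.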
Implementation and code
Implementation focus
The implementation connects baseline generation, Img2Img editing, Canny-guided ControlNet generation, and LPIPS/PSNR evaluation in one structured workflow that keeps each technical decision visible.
Source code
The code is available for exploring implementation details and extending the experiment.
Scope and responsible use
The project is a focused modeling and evaluation study built around a single controlled test scene. Broader use should be supported by validation on additional data, robustness checks, monitoring, and domain-specific evaluation.
Future development
- Compare additional ControlNet conditions and conditioning strengths.
- Add seed sweeps to separate prompt effects from sampling variation.
- Build a small gallery that compares outputs side by side.
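As a rough illustration of the seed-sweep idea, the sketch below summarizes a metric per prompt across several seeds. Everything in it is hypothetical: `seed_sweep` is an illustrative name, and `score` stands in for a function that would generate an image for a given prompt and seed and return a metric such as LPIPS against the baseline.

```python
from statistics import mean, pstdev
from typing import Callable, Dict, Iterable, List


def seed_sweep(
    prompts: Iterable[str],
    seeds: List[int],
    score: Callable[[str, int], float],
) -> Dict[str, Dict[str, float]]:
    """Return the mean and standard deviation of `score` across seeds, per prompt."""
    summary = {}
    for prompt in prompts:
        values = [score(prompt, seed) for seed in seeds]
        summary[prompt] = {"mean": mean(values), "std": pstdev(values)}
    return summary
```

If the gap between two prompts' means is small relative to their per-prompt standard deviations, the difference is more plausibly sampling variation than a prompt effect.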
Technical contribution
The project shows how controlled generative-image workflows can combine creative editing with structural guidance and measurable image comparison.