Evaluation Framework
The evaluation framework is separate from the unit tests. It performs visual regression testing: each scene is rendered to a PNG and the result is compared against a golden PNG.
Structure
evaluation/
├── cases/     # Scene input JSON files
├── expected/  # Golden PNG images
├── output/    # Rendered PNGs (generated)
└── diffs/     # Diff images on failure
Browse on GitHub:
evaluation/
Running Evaluation
How It Works
flowchart TD
A[evaluation/cases/*.json] --> B[Render scene to PNG]
B --> C{Compare with\nexpected PNG}
C -->|within tolerance| D[Pass]
C -->|exceeds tolerance| E[Write diff image]
D --> F[JSON report\n+ text summary]
E --> F
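The flow above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the project's built-in runner: `run_evaluation`, `summarize`, and the four callables (`render`, `load_expected`, `compare`, `write_diff`) are hypothetical names, and pairing each case with `expected/<case name>.png` is an assumption about file naming.

```python
import json
from pathlib import Path
from typing import Any, Callable


def run_evaluation(
    root: Path,
    render: Callable[[dict], Any],                 # hypothetical: scene dict -> image
    load_expected: Callable[[Path], Any],          # hypothetical: golden PNG path -> image
    compare: Callable[[Any, Any], float],          # hypothetical: returns a difference score
    write_diff: Callable[[Any, Any, Path], None],  # hypothetical: saves a diff image
    tolerance: float,
) -> dict:
    """Render every case, compare it with its golden image, and build a report."""
    results = []
    for case_path in sorted((root / "cases").glob("*.json")):
        scene = json.loads(case_path.read_text())
        actual = render(scene)
        # Assumption: the golden image shares the case file's base name.
        expected = load_expected(root / "expected" / f"{case_path.stem}.png")
        score = compare(actual, expected)
        passed = score <= tolerance
        if not passed:
            # Only failing cases produce a diff image.
            write_diff(actual, expected, root / "diffs" / f"{case_path.stem}.png")
        results.append({"case": case_path.stem, "score": score, "passed": passed})
    failed = sum(1 for r in results if not r["passed"])
    return {"total": len(results), "failed": failed, "results": results}


def summarize(report: dict) -> str:
    """One-line text summary to accompany the JSON report."""
    passed = report["total"] - report["failed"]
    return f"{passed}/{report['total']} cases passed"
```

Passing the rendering and comparison steps in as callables keeps the loop independent of any particular image library. The returned dictionary can be written out with `json.dumps` to produce the JSON report.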
Comparison Methods
- Pixel diff: Percentage of differing pixels (configurable tolerance)
- Perceptual hash: Hamming distance between image hashes
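Both comparison methods can be illustrated with a short, standard-library-only sketch. The function names are hypothetical, and the source does not say which perceptual hash the framework uses. The example below assumes a simple average hash computed on an already downscaled grayscale grid, whereas a real implementation would typically resize the image to a small fixed size first.

```python
from typing import Sequence, Tuple

Pixel = Tuple[int, int, int]  # RGB channels, 0-255


def pixel_diff_percent(
    a: Sequence[Pixel], b: Sequence[Pixel], channel_threshold: int = 0
) -> float:
    """Percentage of pixels where any channel differs by more than the threshold."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    if not a:
        return 0.0
    differing = sum(
        1
        for pa, pb in zip(a, b)
        if any(abs(ca - cb) > channel_threshold for ca, cb in zip(pa, pb))
    )
    return 100.0 * differing / len(a)


def average_hash(gray: Sequence[Sequence[int]]) -> int:
    """Average hash (assumed variant): one bit per pixel, set when above the mean."""
    flat = [value for row in gray for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(h1: int, h2: int) -> int:
    """Number of bit positions at which two hashes differ."""
    return bin(h1 ^ h2).count("1")
```

A case would then pass when `pixel_diff_percent(...)` or `hamming_distance(...)` stays at or below its configured tolerance. For instance, two four-pixel images that differ in a single pixel give a pixel diff of 25.0 percent.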
Tests vs Evaluation
| | tests/ | evaluation/ |
|---|---|---|
| Type | Unit tests | Visual regression |
| Tool | pytest | Built-in runner |
| Checks | Logic correctness | Render correctness |
| Artifacts | - | PNG renders + diffs |