TLDR: Custom matcher for React testing frameworks that uses multi-modal AI models to enforce UI/UX patterns.
- Supports the Jest and Vitest testing frameworks and follows Jest's snapshot testing pattern
- Uses headless Chrome (via Puppeteer) to render HTML snapshots
- Supports OpenAI models for analysis
- Runs locally, via a Dockerfile, or against a free (rate-limited) shared backend
```jsx
import { render } from '@testing-library/react';
import { extendExpectDesignReviewer, DEFAULT_REVIEW_TIMEOUT } from '@vslint/jest';

expect.extend(extendExpectDesignReviewer({
  customStyles: ['./styles/globals.css'],
  rules: [{
    ruleid: "text-too-wide",
    description: "First write out how many words are on each line of text. If a single line of text, as it appears between line breaks (aka newlines), contains more than 30 words, excluding spaces and punctuation, mark it as true and explain which line is too long; otherwise, mark it as false."
  }],
  model: { modelName: 'gpt-4o-mini', key: process.env.OPENAI_API_KEY }
}));

test('text content that is too wide on desktop screens and is not legible', async () => {
  const { container } = render(<div>Incredibly long content potentially too long. Human readability is best when lines are not as long and have fewer words on a single line, this div should have fewer words, really. It's just rude.</div>);
  await expect(container).toPassDesignReview();
}, DEFAULT_REVIEW_TIMEOUT);
```
```bash
# for Jest
npm install @vslint/jest --save-dev

# for Vitest
npm install @vslint/vitest --save-dev
```
The first step is to extend the testing framework's `expect` with the matcher that performs the design review. This is best done in a setup file loaded via the `setupFilesAfterEnv` option in Jest (or `setupFiles` in Vitest):
```js
// jest.config.js
module.exports = {
  testEnvironment: "jsdom",
  setupFilesAfterEnv: ["@testing-library/jest-dom", "./setupTests.js"],
  // ...
};
```

```js
// or in vitest.config.js
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    environment: "jsdom",
    setupFiles: "./setupTests.js",
    globals: true,
    // ...
  },
});
```
```js
// setupTests.js
import { extendExpectDesignReviewer, DEFAULT_RULES } from '@vslint/jest';

expect.extend(extendExpectDesignReviewer({
  // global CSS paths that enable correct rendering
  customStyles: ['./styles/globals.css'],
  // model config that determines which provider performs the analysis
  model: { modelName: 'gpt-4o-mini', key: process.env.OPENAI_API_KEY },
  // optional, defaults to `DEFAULT_RULES` from '@vslint/shared/rules'
  rules: DEFAULT_RULES,
  // optional, sets a custom review endpoint. Override if you are self-hosting a review server
  reviewEndpoint: 'https://vslint-644118703752.us-central1.run.app/api/v1/design-review',
  // optional, sets the log level (or accepts a custom winston logger)
  log: 'debug'
}));
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| `customStyles` | `string[]` | | Paths to the CSS files needed to render snapshots correctly; their contents are also used when generating the snapshot hash. |
| `strict` | `boolean` | `true` | If true, tests fail if any of the rules fail. If false, the test passes and the snapshot is logged with the results. |
| `model` | `{ modelName: string; key: string }` | | API credentials for the design review model. Supported models are `gpt-4o` and `gpt-4o-mini`. |
| `reviewEndpoint` | `string` | `https://vslint-644118703752.us-central1.run.app/api/v1/design-review` | The endpoint to use for the review server. Defaults to the shared review server. |
| `log` | `string` or `winston.Logger` | `info` | Sets a log level or passes in a custom Winston logger. |
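For example, to have rule failures reported without failing the whole suite, you can disable strict mode globally. A minimal sketch, using only the `customStyles`, `model`, and `strict` options from the table above:

```js
// setupTests.js — non-strict setup: failing reviews are logged alongside
// the snapshot results instead of failing the test run.
import { extendExpectDesignReviewer } from '@vslint/jest';

expect.extend(extendExpectDesignReviewer({
  customStyles: ['./styles/globals.css'],
  model: { modelName: 'gpt-4o-mini', key: process.env.OPENAI_API_KEY },
  strict: false
}));
```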
Now that the matcher is set up, you can use it in your tests to check whether a snapshot passes design review. The `toPassDesignReview` matcher expects to be called on an `HTMLElement`. Semantics are the same for Jest and Vitest.
```jsx
import { render } from '@testing-library/react';
import { DEFAULT_REVIEW_TIMEOUT } from '@vslint/jest';

test('render text that is too long and hard to read', async () => {
  const { container } = render(<div>Incredibly long content potentially too long. Human readability is best at a maximum of 75 characters</div>);
  // it's important to always await the matcher as the design review call is asynchronous
  await expect(container).toPassDesignReview({
    // optional, sets the viewport size to render the content at
    atSize: 'md',
    // optional, sets the log level (or a custom winston logger)
    log: 'debug'
  });
}, DEFAULT_REVIEW_TIMEOUT);
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| `atSize` | `string` or `{ width: number; height: number }` | `{ width: 1920, height: 1080 }` | The viewport size to render the content at. |
| `log` | `string` or `winston.Logger` | `info` | Sets a log level or passes in a custom Winston logger. |
| `strict` | `boolean` | `true` | If true, this test fails if any of the rules fail. If false, the test passes and the snapshot is logged with the failing results. Overrides the global `strict` setting. |
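For example, you can pass an explicit viewport instead of a named size and override strict mode for a single assertion. A sketch; the 375x667 viewport is an illustrative choice:

```jsx
import { render } from '@testing-library/react';
import { DEFAULT_REVIEW_TIMEOUT } from '@vslint/jest';

test('content stays legible on small screens', async () => {
  const { container } = render(<div>Some content to review.</div>);
  // explicit viewport plus a per-test strict override: a failing review
  // is logged here rather than failing the suite
  await expect(container).toPassDesignReview({
    atSize: { width: 375, height: 667 },
    strict: false
  });
}, DEFAULT_REVIEW_TIMEOUT);
```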
UX rules are written as JavaScript objects and passed into the `extendExpectDesignReviewer` call; the default rules ship as `DEFAULT_RULES` in `@vslint/shared/rules`. Rules are evaluated as part of a multi-modal LLM call, so they can be as complex as you want. Here is an example of a rule that checks whether text is too wide:
```js
{
  ruleid: 'text-too-wide',
  description: 'First write out how many words are on each line of text. If a single line of text, as it appears between line breaks (aka newlines), contains more than 30 words, excluding spaces and punctuation, mark it as true and explain which line is too long; otherwise, mark it as false.'
}
```
As usual, the better you are at prompting, the more effective your rules will be. One trick for writing good rules is to first ask the model to "focus" on the relevant part of your design. For example, in the rule above we first ask the model to count the words on each line of text before evaluating whether or not the text is too wide.
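Here is a sketch of the same "focus first" pattern applied to a different concern; the rule id and wording are illustrative, not part of the default rule set:

```js
// Illustrative rule (not shipped with VSLint): enumerate first, judge second.
{
  ruleid: 'touch-target-too-small',
  description: 'First list every button and link visible in the screenshot. Then, for each one, estimate whether it is rendered smaller than roughly 44 by 44 pixels. If any interactive element is that small, mark it as true and name the element; otherwise, mark it as false.'
}
```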
Fun fact: vision model performance improves drastically with few-shot prompting, meaning the more samples you provide of the rule failing and passing, the better the model will be at evaluating your rules. You can add these via a `samples` property on your rule:
```js
{
  ruleid: '...',
  description: '...',
  // samples will be prepended to your model call
  samples: [
    {
      html: '...',
      viewport: { ... },
      fail: true
    },
    {
      html: '...',
      viewport: { ... },
      fail: false
    }
  ]
}
```
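For instance, a filled-in pair for the `text-too-wide` rule might look like the following; the HTML snippets and viewports are illustrative, not shipped with VSLint:

```js
{
  ruleid: 'text-too-wide',
  description: '...', // same description as above
  samples: [
    {
      // one unbroken line of ~40 words: the rule should flag this
      html: `<div style="width: 1800px">${'word '.repeat(40)}</div>`,
      viewport: { width: 1920, height: 1080 },
      fail: true
    },
    {
      // a short line well under the 30-word limit: the rule should pass
      html: '<div>A short, readable line of text.</div>',
      viewport: { width: 1920, height: 1080 },
      fail: false
    }
  ]
}
```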
VSLint ships with an evaluation tool for testing your design rules. You can run it via the `@vslint/server` package as follows:
```bash
npx @vslint/server eval --input path/to/evals --rules ./path/to/rules.json --model gpt-4o
```
For the tool to work, the evals input must be a directory with the following structure:
```
evals/
  rule-id/
    pass/
      rule-id-pass-1.json
      rule-id-pass-2.json
      ...
    fail/
      rule-id-fail-1.json
      rule-id-fail-2.json
      ...
```
Each of the `.json` files should be in the following format:
```json
{
  "html": "<html content here>",
  "viewport": {
    "width": 1920,
    "height": 1080
  }
}
```
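If you want to seed evals from components you already test, a small helper can serialize rendered markup into this format. A sketch; `writeEval` is a hypothetical helper, not part of VSLint:

```jsx
import { writeFileSync } from 'fs';
import { render } from '@testing-library/react';

// Hypothetical helper: render an element and write it out in the eval
// file format shown above.
function writeEval(path, element, viewport = { width: 1920, height: 1080 }) {
  const { container } = render(element);
  writeFileSync(path, JSON.stringify({ html: container.innerHTML, viewport }, null, 2));
}

writeEval('evals/text-too-wide/pass/text-too-wide-pass-1.json', <div>A short line.</div>);
```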
Evals for the default rule set live in the repository. Writing your own evals is highly recommended to ensure your rules work as expected.
Right now the default rules are not very good, and all contributions are welcome! Please include at least one new failing eval and one new passing eval with any PR that changes the default rules.
```bash
npx @vslint/server
```
Run the server on a custom port by setting the `PORT` environment variable. You can target this server by setting the `reviewEndpoint` parameter in the `extendExpectDesignReviewer` call to `DEFAULT_LOCAL_REVIEW_ENDPOINT`.
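For example, a setup file targeting a local server might look like this; the import location of `DEFAULT_LOCAL_REVIEW_ENDPOINT` is assumed to be `@vslint/jest`, alongside the other exported constants:

```js
// setupTests.js — assuming DEFAULT_LOCAL_REVIEW_ENDPOINT is exported from
// @vslint/jest like the other constants (verify the export location).
import { extendExpectDesignReviewer, DEFAULT_LOCAL_REVIEW_ENDPOINT } from '@vslint/jest';

expect.extend(extendExpectDesignReviewer({
  customStyles: ['./styles/globals.css'],
  model: { modelName: 'gpt-4o-mini', key: process.env.OPENAI_API_KEY },
  reviewEndpoint: DEFAULT_LOCAL_REVIEW_ENDPOINT
}));
```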
Deploy the Dockerfile at `packages/server/Dockerfile` to run your own design review server; for example, it can be deployed to Google Cloud Run.
You can also run the review inside your existing backend by importing the `runReview` call directly:

```js
import { runReview } from '@vslint/server';
```
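A minimal sketch of mounting this in an Express app. The exact `runReview` signature is not documented here, so passing the request body straight through is an assumption to verify against `@vslint/server`:

```js
import express from 'express';
import { runReview } from '@vslint/server';

const app = express();
app.use(express.json({ limit: '10mb' })); // rendered snapshots can be large

app.post('/api/v1/design-review', async (req, res) => {
  try {
    // assumption: runReview accepts the same JSON body the review endpoint
    // receives and resolves to a JSON-serializable result
    const result = await runReview(req.body);
    res.json(result);
  } catch (err) {
    res.status(500).json({ error: String(err) });
  }
});

app.listen(process.env.PORT || 8080);
```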
⚠️ Warning: the shared backend is not recommended for production use, as it is a shared, rate-limited resource.
You can use the shared backend by setting the `reviewEndpoint` parameter in the `extendExpectDesignReviewer` call to `DEFAULT_REVIEW_ENDPOINT` (this is also the default value).
The full logic for how tests pass or fail for both Jest and Vitest is shown below.
```mermaid
flowchart TD
    matchingSnapshot{Matching Snapshot?}
    matchingSnapshot -->|No| inCI{In CI?}
    inCI -->|Yes| failNoSnapshot[Fail: No Snapshot in CI]
    inCI -->|No| hasOpenAICreds{Has OpenAI Credentials?}
    hasOpenAICreds -->|Yes| runReview[Run Review]
    hasOpenAICreds -->|No| runtimeException[Runtime Exception: Missing Credentials]
    runReview --> reviewFails{Review Fails?}
    reviewFails -->|Yes| strict{Strict Mode?}
    strict -->|Yes| failReview[Fail: Review Failed]
    strict -->|No| logSnapshot[Pass: Log Snapshot]
    reviewFails -->|No| logSnapshot
    matchingSnapshot -->|Yes| markedFail{Marked Failing?}
    markedFail -->|Yes| strictFail{Strict Mode?}
    strictFail -->|Yes| failMarkedFail[Fail: Marked Failing in Strict Mode]
    strictFail -->|No| pass[Pass: Matches Snapshot]
    markedFail -->|No| pass
```
VSLint supports using OpenAI to perform the design review, either directly or through the shared review server. The benefit of the shared backend is that it's free, but be aware that your snapshots are sent to the OpenAI API and your API key passes through a third-party server.
This project is licensed under the MIT License - see the LICENSE file for details.