SVG Model

YEAR

2023

ABOUT

Our team at Poly built a next-generation SVG model that enabled designers to generate vector assets from text, sketches, or reference images, and even extrapolate full asset sets from a single style example.


At its core is a diffusion model conditioned on a vast array of signals, designed as a flexible, API-style system. We introduced novel contrastive learning methods for understanding asset sets, and advanced VLM-based approaches for structured SVG generation.
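
Poly's exact contrastive formulation isn't detailed here, but the core idea behind set understanding can be illustrated with a minimal InfoNCE-style sketch in PyTorch, where two assets drawn from the same set form a positive pair and every other asset in the batch acts as a negative (the function and tensor names are hypothetical, not the actual training code):

```python
import torch
import torch.nn.functional as F

def set_contrastive_loss(anchor_emb: torch.Tensor,
                         positive_emb: torch.Tensor,
                         temperature: float = 0.07) -> torch.Tensor:
    """InfoNCE-style loss over a batch of asset pairs.

    anchor_emb[i] and positive_emb[i] embed two different assets from the
    same set; all other rows in the batch serve as negatives.
    """
    anchor = F.normalize(anchor_emb, dim=-1)
    positive = F.normalize(positive_emb, dim=-1)
    logits = anchor @ positive.T / temperature            # (B, B) cosine similarities
    targets = torch.arange(anchor.size(0), device=anchor.device)
    return F.cross_entropy(logits, targets)               # matched pairs sit on the diagonal
```

Trained this way, assets from the same set cluster together in embedding space, which is one way a model could extrapolate a full set from a single style example.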

View ↗

KEY INITIATIVES

Large-scale data labeling projects, novel interaction patterns, actionable user studies

Problem Validation

[DESIGN MENTAL MODELS STUDY]

I began by running a comprehensive user study to understand designers' mental models around vector assets and identify the most effective forms of model conditioning we could offer.


I operated within a rapid validation loop to uncover the breadth of creative intents across different designer archetypes and roles, from spinning up Gradio demo environments for quick model prototypes to a steady stream of UI demos.

View ↗

[MODEL CONDITIONINGS]

These learnings were then translated directly into the global and channel conditions we wanted the diffusion side of our model to support, including the following (sketched as a single request payload after this list)…


⌨️ Text prompting (Describe your asset in words)

✍️ Sketch prompting (Draw a quick sketch as guidance)

🖌️ Region masking of conditions (Using a brush to define the areas you want to edit)

💬 Instruction-based editing (Describe the changes you want to make in words, mask-free)

🖼️ Image style embed (Upload an image as guidance)

📦 Set embedding (Create a set with similar visual properties)

🎨 Palette input (Select your own color palette)
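
As a concrete illustration of how these conditions could travel through the API-style system mentioned above, here is a hypothetical request payload bundling the channels in this list; the field names and types are illustrative assumptions, not Poly's actual API:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class GenerationRequest:
    """Hypothetical shape of a single conditioned generation request."""
    prompt: Optional[str] = None              # ⌨️ text prompting
    sketch_png: Optional[bytes] = None        # ✍️ rasterized sketch guidance
    region_mask_png: Optional[bytes] = None   # 🖌️ brush mask limiting which areas change
    edit_instruction: Optional[str] = None    # 💬 mask-free, instruction-based edit
    style_image_png: Optional[bytes] = None   # 🖼️ image whose style gets embedded
    set_id: Optional[str] = None              # 📦 set embedding for visually consistent sets
    palette: list[str] = field(default_factory=list)  # 🎨 hex colors, e.g. ["#0F172A", "#38BDF8"]

# Example: text prompt plus a palette, all other channels left empty.
req = GenerationRequest(prompt="flat duotone rocket icon",
                        palette=["#0F172A", "#38BDF8"])
```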

View ↗

Solution Validation

[CREATIVE INTENT EXERCISE]

With our conditionings defined, the next step was to run interactive sessions with users to collect ground-truth input/output examples for modeling and labeling. This included capturing live prompts and drawn mask inputs, and simulating other conditions to gather authentic data.

View ↗

[INTERACTIVE PROTOTYPE]

This foundational work culminated in a fully interactive prototype that showcased the model’s potential. It was our sole demo at the time and helped us raise an additional $4M (bringing total funding to $8M), which we needed to build out a 128+ GPU cluster for training.


Beyond fundraising, the prototype also laid the groundwork for the end-to-end design of the creative platform itself.

View ↗

Novel interaction patterns were discovered and implemented, particularly direct-manipulation techniques that let users prompt and manipulate media right at the cursor rather than being limited to traditional static text inputs.

View ↗

Data Labeling

[DATA COLLECTION SYSTEMS]

We built multiple workflows to collect different types of data based on our model conditionings. This included working with several data labeling firms, each trained on custom task guidelines and taught to use tools like Photoshop and Figma for labeling and review.


For several projects, we managed weekly volumes of 25k–35k labeled outputs, balancing speed with quality through constant iteration.

View ↗

[SOURCING & EVALUATION FRAMEWORKS]

We had to source and label highly specific, real-world use cases — the kind of data that simply doesn’t exist in off-the-shelf datasets, especially for things like design sets or sketches. To address this, we not only collected unique human-labeled data, but also trained embedding models on our specialized datasets to generate high-quality synthetic data at scale.
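
One plausible way an embedding model turns into synthetic data at scale is by mining pseudo "style sets" from unlabeled assets via embedding similarity. The sketch below is an assumption about the approach, not the production pipeline; the threshold, group size, and function name are illustrative:

```python
import numpy as np

def mine_pseudo_sets(embeddings: np.ndarray, threshold: float = 0.85, k: int = 8):
    """Group unlabeled assets into pseudo style sets by cosine similarity."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T                      # (N, N) cosine similarity matrix
    pseudo_sets = []
    for i in range(len(sims)):
        order = np.argsort(-sims[i])              # most similar assets first
        members = [j for j in order if j != i and sims[i, j] >= threshold][:k]
        if members:
            pseudo_sets.append([i, *members])     # anchor asset plus its nearest neighbors
    return pseudo_sets
```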

View ↗

[SUBMISSION PLATFORM]

We also designed and built a custom data annotation and submission platform tailored to our workflows. It allowed labelers to submit links for us to scrape, apply tags and categories, upload their work, and access task-specific docs and instructions — all in one place.
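
For a sense of what the platform tracked per task, here is a minimal record shape it might store for each submission; this schema is a hypothetical sketch, not the production data model:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class LabelSubmission:
    """Hypothetical per-submission record for the labeling platform."""
    labeler_id: str
    task_id: str                                            # links back to task-specific docs
    source_urls: list[str] = field(default_factory=list)    # links queued for scraping
    tags: list[str] = field(default_factory=list)           # e.g. ["icon", "duotone"]
    category: str = ""
    upload_paths: list[str] = field(default_factory=list)   # labeler-uploaded files
    status: str = "pending_review"                          # pending_review -> approved / rejected
    submitted_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
```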

View ↗