Case Study 01 /

MULTI-MODAL CREATIVE AGENT

Role

Lead PM - Agentic In-Canvas System for Prompting, Model Selection, and Complex Workflows. Built with Custom Testing and Training Sets.

Scope

Creative Reasoning, Agent Architecture, User Research

Tools

GPT-4o, GPT o3-mini, Moondream, Gemini

Superstudio is a powerful but complex creative platform. Users have access to multiple AI models, generation flows, and a deep set of creative parameters, but actually using them requires a level of technical fluency most people don't have.

The data told a clear story. Canvas engagement dropped sharply after day one and continued to decline steadily after day two. Only about 50% of users ever created a node, and just 43% started a media generation process. The onboarding tutorial had a 6% completion rate. The product was losing people before they ever reached the good part.

The friction came down to three things: prompt engineering, model selection, and the cognitive overhead of translating a creative idea into the right technical configuration. Even experienced creative teams reported "tool fatigue," a reluctance to invest the time learning yet another creative tool that might not meet their needs.

I led this from the PRD through launch, coordinating across research, engineering, design, and GTM. The central insight was a reframing of the problem: unlike Apple, we couldn't present users with a single curated path through the product. Superstudio is a modular canvas with branching decisions at every step. So instead of a static golden path, we needed a "neural" golden path, one that could respond to each user's creative intent and present the most efficient solution in the simplest available steps.

The core bet was that we could build an agent that understood the product deeply enough to operate it on behalf of the user. Not just answer questions, but take real actions inside the canvas. At the time, in-product agentic behaviors at this depth were nearly unprecedented in consumer creative tools.

Key product decisions I drove:

  • Test and develop against failed user actions. I worked directly with the support team to classify our most common support questions into the specific tasks and areas where users struggled. I turned this into a test set for the agent, ensuring we were prepared to handle the tasks we knew our users needed the most help with.
  • Prioritize creative intelligence over JSON accuracy. Getting the technical execution right was a solvable problem. The harder and more important thing was making sure Kiko suggested the right creative solution, not just a solution. We targeted 95%+ accuracy on interpreting user goals and recommending appropriate flows and models.
  • Design for a full spectrum of user intent. A new user asking "what is Superstudio?" and a power user saying "make me a music video with these reference images" needed to hit the same entry point and both get a useful response.
  • Ship in phases, expand capabilities incrementally. Phase 1 was the MVP: conceptual understanding and canvas organization. Phase 2 added media generation and image handling. Phase 3 brought the full multi-agent architecture with vision analysis and direct image manipulation.
  • Build personification into the product, not just the UI. Kiko needed to feel more accessible and alive than a generic chatbot. She got a visual identity, a personality, and a conversational style that made interacting with the canvas feel collaborative rather than transactional.

The architecture I designed with our research team was a multi-agent system with several coordinating components:

  • Creative reasoning layer (GPT o3-mini): The brain. Evaluates user queries, decides what actions to take, crafts creative direction, and orchestrates the other agents.
  • Vision analysis (Moondream): Analyzes uploaded images and canvas screenshots to understand the visual context of a request.
  • Knowledge library: A deep reference base covering all available flows and their parameters, model-specific prompting optimizations, creative tutorials for complex projects, a full Superstudio glossary, and 100+ real Kiko/user interaction examples.
  • kikoCoder (GPT-4o): A specialized subagent for flow execution. Translates the reasoning layer's decisions into JSON schemas to create and modify image and video flows via the Superstudio API.
  • kikoGemini (Gemini-ImageEdit): Handles direct image manipulation, letting users edit generated content through natural language.

This allowed Kiko to meet users wherever they were. Someone could ask for help with a single parameter, or ask her to produce a full multi-flow project. She'd craft the creative direction, select the right models, write optimized prompts, and execute everything in one conversational turn.
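The routing described above can be sketched in miniature. This is an illustrative Python sketch only: the `Plan` dataclass, the `route` dispatcher, and the intent labels are hypothetical names, since the real Superstudio APIs and Kiko's internal schemas are not public.

```python
# Hypothetical sketch of Kiko's multi-agent dispatch. All class, field, and
# subagent names here are illustrative stand-ins, not the real implementation.
from dataclasses import dataclass, field

@dataclass
class Plan:
    """Output of the creative reasoning layer (GPT o3-mini)."""
    intent: str                      # e.g. "generate", "edit", "explain"
    creative_direction: str          # crafted prompt / art direction
    needs_vision: bool = False      # route through Moondream first
    flow_params: dict = field(default_factory=dict)

def route(plan: Plan) -> str:
    """Dispatch a reasoning-layer plan to the right subagent."""
    if plan.intent == "generate":
        # kikoCoder (GPT-4o): translate the plan into a flow JSON schema
        # for the Superstudio API
        return f"kikoCoder -> flow schema for: {plan.creative_direction}"
    if plan.intent == "edit":
        # kikoGemini: natural-language edits on generated media
        return f"kikoGemini -> edit: {plan.creative_direction}"
    # Conceptual questions are answered from the knowledge library
    return f"knowledge_library -> answer: {plan.creative_direction}"

print(route(Plan(intent="generate", creative_direction="neon-noir music video")))
```

The point of the design is visible even in the sketch: one entry point, with the reasoning layer deciding whether a query needs execution, editing, or explanation before any subagent is invoked.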

Simple diagram of Kiko’s system architecture

To test the system, I turned to our support team. Rather than test a suite of generalized creative tasks, I wanted to ensure that Kiko was tuned to the specific creative needs that users often failed to accomplish on the canvas without assistance. Our support team and I categorized a rolling collection of the previous 60 days of support messages into task-oriented buckets, grouping queries by the type of difficulty the user encountered. These broadly fell into a few core categories:

  • Prompting quality and model choice confusion. Users frequently messaged support with vague complaints about the quality of their output. In the vast majority of cases, the root of the complaint could be traced back to either a poorly constructed prompt or a model choice that wasn’t built for the creative output the user desired.
  • Complex creative tasks in simple workflows. One of our most common support buckets centered on a single creative task: making a music video. Users tried in various ways to create music videos by generating single pieces of media, or struggled to maintain cohesion across a multi-model workflow. We noticed a number of other complex creative tasks that users struggled to systemize and included those queries in this bucket.
  • New user UX confusion. Regardless of their familiarity with the AI models themselves, we found that new users often hit a roadblock simply trying to set up a workflow.
Sample of the test set, used to ensure high ROI creative reasoning for user queries

These query buckets let us build a test set for Kiko that accurately represented what she would need to do to deliver the highest ROI for our users, and allowed us to evaluate her creative reasoning with significantly less internal bias than a generalized set would have.
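A support-derived test set like this lends itself to simple per-bucket scoring. The sketch below is purely illustrative: the category names mirror the buckets above, but the sample cases, the `expected_action` labels, and the grading-by-exact-match are hypothetical stand-ins for whatever rubric was actually used.

```python
# Illustrative per-category evaluation over a support-derived test set.
# Cases and labels are invented examples, not the real Kiko test data.
test_set = [
    {"category": "prompt_quality",   "query": "my renders look blurry",
     "expected_action": "rewrite_prompt"},
    {"category": "complex_workflow", "query": "make me a music video",
     "expected_action": "multi_flow_plan"},
    {"category": "new_user_ux",      "query": "how do I start a workflow?",
     "expected_action": "guided_setup"},
]

def evaluate(agent, cases):
    """Return per-category accuracy for an agent callable."""
    totals, correct = {}, {}
    for case in cases:
        cat = case["category"]
        totals[cat] = totals.get(cat, 0) + 1
        if agent(case["query"]) == case["expected_action"]:
            correct[cat] = correct.get(cat, 0) + 1
    return {cat: correct.get(cat, 0) / n for cat, n in totals.items()}

# A toy agent that always suggests a prompt rewrite scores 1.0 on the
# prompt_quality bucket and 0.0 on the others:
scores = evaluate(lambda q: "rewrite_prompt", test_set)
print(scores)
```

Scoring per bucket rather than in aggregate is what surfaces the bias a generalized test set hides: an agent can look strong overall while failing an entire category of real user need.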

Over her 9 months online, Kiko handled approximately 375,000 queries and processed over 2.8 billion tokens. In a 30-day snapshot, 43,000+ users engaged with Kiko, with daily usage holding steady at around 1,600 users. 12% of new users engaged with Kiko within their first session.

43k+
Users in a 30-day snapshot
2.8b
Tokens processed