Job Description
We are seeking a ComfyUI specialist to build a high-precision, sequential video generation workflow. The objective is to create a 15-second video generated in three distinct 5-second segments.

The Vision: Unlike standard AI video generators that "guess" the motion, this workflow must allow me to provide a Start Frame and an End Frame for every 5-second block. This ensures the video doesn't just wander; it follows a precise path from Point A to Point B. The aesthetic must mimic the "Zack D. Films" style: clean 3D character models, clinical/educational lighting, and smooth, snappy animations. By generating the keyframes (T0, T5, T10, T15) before the video, we ensure total character consistency and professional-grade storytelling across the full 15 seconds.

2. Core Logic & Workflow Structure
The developer must build the pipeline to follow these four specific stages:

Stage 1: Keyframe Storyboarding
- A module to generate 4 primary images: 0s, 5s, 10s, and 15s.
- Must use IP-Adapter or Wan-StandIn logic to ensure the character, clothing, and environment are identical in all 4 images.
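To illustrate how the keyframe module could be driven, here is a rough Python sketch that queues the four keyframe renders against a locally running ComfyUI instance over its /prompt HTTP endpoint. The workflow file name, node IDs, and prompt texts are placeholders and assumptions; the real graph would also carry the shared IP-Adapter reference image.

```python
# Rough sketch: queue the four keyframe renders (T0, T5, T10, T15) against a
# locally running ComfyUI instance. "keyframe_workflow_api.json" is an assumed
# workflow exported from ComfyUI in API format; node IDs below are placeholders.
import copy
import json
import urllib.request

COMFY_URL = "http://127.0.0.1:8188/prompt"   # default ComfyUI address
PROMPT_NODE_ID = "6"    # assumed ID of the positive-prompt CLIPTextEncode node
SAMPLER_NODE_ID = "3"   # assumed ID of the KSampler node

KEYFRAME_PROMPTS = {
    "T0":  "clean 3D character at a desk, clinical lighting, octane render",
    "T5":  "same character turning toward a monitor, clinical lighting, octane render",
    "T10": "same character pointing at a diagram, clinical lighting, octane render",
    "T15": "same character facing camera, arms crossed, clinical lighting, octane render",
}

with open("keyframe_workflow_api.json", "r", encoding="utf-8") as f:
    base_graph = json.load(f)

for tag, text in KEYFRAME_PROMPTS.items():
    graph = copy.deepcopy(base_graph)
    graph[PROMPT_NODE_ID]["inputs"]["text"] = text
    graph[SAMPLER_NODE_ID]["inputs"]["seed"] = 1234  # fixed seed aids consistency
    payload = json.dumps({"prompt": graph}).encode("utf-8")
    req = urllib.request.Request(COMFY_URL, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        print(tag, resp.read().decode("utf-8"))
```

Keeping a fixed seed and a shared reference image across the four queued renders is what drives the identity lock between keyframes.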
Stage 2: Sequential Rendering (The "Sandwich" Method)
- Segment 1 (0-5s): Uses Image 0 as the Start and Image 5 as the End.
- Segment 2 (5-10s): Uses Image 5 as the Start and Image 10 as the End.
- Segment 3 (10-15s): Uses Image 10 as the Start and Image 15 as the End (this pairing is illustrated in the segment-plan sketch after the stage list).

Stage 3: Seamless Transitions & Smoothing
- Implement Color Match nodes to prevent "flicker" between segments (see the color-matching sketch after the stage list).
- Use VFI (Video Frame Interpolation) to bring the native 16fps output up to a "snappy" 60fps.

Stage 4: Automated Assembly
- Automatically stitch the three clips into a single high-bitrate .mp4 file.
3. Required Technical Stack (Models & Nodes)

Primary Video Model:
- Wan2.1 / Wan2.2 (FLF2V version): specifically the First-Last-Frame 14B or 1.3B models. This is non-negotiable, as it is the only open-source model capable of dual-image conditioning (Start and End frames).

Essential Custom Nodes:
- ComfyUI-WanVideoStartEndFrames: for the WanVideoStartEndFramesSampler.
- ComfyUI-WanVideoWrapper (Kijai): for model loading and VRAM optimization.
- ComfyUI-VideoHelperSuite (VHS): for video concatenation and saving.
- ComfyUI-KJNodes: for ColorMatch and frame interpolation.
- IP-Adapter-Plus: to lock character identity across the segments.
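If the final stitch ever needs to happen outside ComfyUI rather than through VHS, a lossless concatenation of the three segment files can be sketched with ffmpeg's concat demuxer (file names below are assumptions; the clips must share codec, resolution, and frame rate for stream copy to work):

```python
# Rough sketch: stitch the three rendered segments into one mp4 with ffmpeg's
# concat demuxer. Assumes ffmpeg is on PATH and the segments share encoding settings.
import subprocess

segment_files = ["segment_1.mp4", "segment_2.mp4", "segment_3.mp4"]  # assumed names

with open("concat_list.txt", "w", encoding="utf-8") as f:
    for path in segment_files:
        f.write(f"file '{path}'\n")

subprocess.run(
    ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
     "-i", "concat_list.txt", "-c", "copy", "final_15s.mp4"],
    check=True,
)
```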
4. Technical Requirements & Performance

- VRAM Efficiency: The workflow should be optimized for a 24GB VRAM environment (using FP8 or GGUF quantization where necessary).
- Zack D. Aesthetic: The workflow must include a prompt-engineering block (or LoRA loader) pre-configured for the "3D Medical Animation / Octane Render" look.
- Modularity: Each 5-second segment should be able to be "frozen" or "muted" so I can re-roll one segment without re-rendering the whole 15 seconds.
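The modularity requirement can be met with ComfyUI's native mute/bypass on each segment group, or with a small caching layer around whatever renders a segment. The sketch below is only an assumption about how that orchestration could look; render_segment is a hypothetical stand-in for the actual ComfyUI call.

```python
# Rough sketch of "freezing" segments: a segment is only re-rendered when the
# hash of its inputs (start/end keyframes, prompt, seed) changes.
import hashlib
import json
import os

CACHE_DIR = "segment_cache"
os.makedirs(CACHE_DIR, exist_ok=True)

def render_segment(start_image: str, end_image: str, prompt: str, seed: int, out_path: str) -> None:
    """Hypothetical stand-in for the actual ComfyUI segment render call."""
    raise NotImplementedError("wire this up to the Wan FLF2V workflow")

def segment_fingerprint(start_image: str, end_image: str, prompt: str, seed: int) -> str:
    """Stable hash of everything that influences a segment's output."""
    blob = json.dumps(
        {"start": start_image, "end": end_image, "prompt": prompt, "seed": seed},
        sort_keys=True,
    ).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()[:16]

def get_or_render(index: int, start_image: str, end_image: str, prompt: str, seed: int) -> str:
    key = segment_fingerprint(start_image, end_image, prompt, seed)
    out_path = os.path.join(CACHE_DIR, f"segment_{index}_{key}.mp4")
    if os.path.exists(out_path):
        print(f"Segment {index} is frozen (cache hit), skipping render.")
        return out_path
    render_segment(start_image, end_image, prompt, seed, out_path)  # hypothetical call
    return out_path
```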
5. Deliverables

1. A .json or .png workflow file that is color-coded and organized into clear groups.
2. A simple "Setup Guide" listing the specific models and LoRAs to download.
3. A Test Render: a 15-second demonstration video showing a character moving through the three segments with zero identity drift.