🔍 AI & Graphics Research Report
May 1, 2025 · Compiled by Hermes Agent for Vladfx
📋 Table of Contents
- 🔴 AI Video Generation — Seedance 2.0, Kling, Runway Gen-4, Sora, Pika, Luma, Hailuo, Vidu
- 🟠 Large Language Models — GPT-4.1, Claude Opus 4, Gemini 2.5, Llama 4, Qwen 3, DeepSeek, Mistral
- 🟡 AI 3D & Graphics — Meshy v4, Tripo3D, Rodin, Gaussian Splatting, AI Texturing
- 🔵 VFX Pipeline Integration — Houdini 20.5, UE 5.4, Nuke 16, Silhouette 2025, AE 25.2
- 🔑 Key Takeaways
🎬 AI Video Generation
🟢 Seedance 2.0 NEW
Priority Tool · Background Replacement
- Release: April 2026 — ByteDance's flagship AI video generation model
- Key Features:
  - Background replacement preserving character identity, camera movement, and performance
  - Support for image + text prompt to video (5s and 10s clips)
  - Character consistency across shots via reference image input
  - Camera motion control (pan, tilt, dolly, orbit)
  - Native audio generation for dialogue and sound effects
  - Anti-slop motion refinement for natural movement
- Access: Available via Doubao app (China), international access through API/partners
- VFX Relevance: Best-in-class for background replacement work — your primary use case
🟢 Kling AI UPDATED
Priority Tool · Master Model
- Current Version: Kling 1.6 / Master model (April 2026)
- Key Updates:
  - Master model: Higher quality generation with better motion coherence
  - Improved character consistency across multi-shot sequences
  - Lip sync feature for dialogue scenes
  - Motion brush for directed character movement
  - Extended duration support (up to 10s at 1080p)
- API: Available via Kling Open Platform — REST API with SDK
- Pricing: Credit-based; Pro tier ~$7/mo for 660 credits, Premier ~$23/mo
- VFX Relevance: Strong competitor to Seedance for video generation; built-in lip sync is a differentiator (Pika also offers it)
🟢 Runway Gen-4 UPDATED
Priority Tool · Character Consistency
- Current Version: Gen-4 (released April 2026)
- Key Features:
  - Reference image system for consistent character/environment across shots
  - Scene-level control: define location + character + action separately
  - Camera direction with natural language prompts
  - 10s generation at 720p/1080p
  - Inpainting/outpainting for video regions
- API: Available — REST API for enterprise, with SDKs
- Pricing: Standard $15/mo (125 credits), Pro $35/mo (500 credits), Unlimited $95/mo
- VFX Relevance: Best reference system for multi-shot consistency; inpainting useful for cleanup
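Because these plans are credit-based, one quick way to compare them is the effective price per credit. The sketch below uses only the Standard and Pro figures quoted above; the helper name is illustrative, and since the number of credits a clip consumes is not stated here, the result is a per-credit figure, not a per-clip cost.

```python
# Effective price per credit for the Runway plans quoted above.
# Plan figures come from the pricing line; the helper name is illustrative.

def price_per_credit(monthly_usd: float, credits: int) -> float:
    """Return the monthly price divided by the monthly credit allowance."""
    if credits <= 0:
        raise ValueError("credits must be positive")
    return monthly_usd / credits


runway_plans = {"Standard": (15.0, 125), "Pro": (35.0, 500)}

for name, (usd, credits) in runway_plans.items():
    print(f"{name}: ${price_per_credit(usd, credits):.3f} per credit")
```

On these numbers the Pro plan works out noticeably cheaper per credit than Standard, which matters mainly if the monthly allowance is actually used up.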
🔵 OpenAI Sora LAUNCHED
- Status: Available in ChatGPT Plus/Pro since December 2024, continued updates through April 2026
- Key Features:
  - Text-to-video up to 20s, image-to-video, video-to-video remix
  - Storyboard mode for multi-scene generation
  - Loop and blend features for seamless transitions
  - Max resolution 1080p
- Pricing: Included in ChatGPT Plus ($20/mo, 50 vids/mo), Pro ($200/mo, unlimited + higher res)
- Limitations: No API yet; watermark on free/Plus tier; physics sometimes off
- VFX Relevance: Good for concept/previs; not production-ready for VFX pipelines yet
🔵 Pika UPDATED
- Current Version: Pika 2.0+ (April 2026)
- Key Features:
  - Pika Effects: diffuse, melt, explode, crush, and other visual effects
  - Lip sync with uploaded audio
  - Scene edit: Modify specific regions while preserving the rest
  - Outpainting to extend video frames
- Pricing: Free tier (250 credits), Standard $8/mo, Pro $28/mo, Unlimited $70/mo
- VFX Relevance: Pika Effects are unique for VFX-style transformations; scene edit useful for comp work
🔵 Luma Dream Machine UPDATED
- Current Model: Ray2 (April 2026)
- Key Features:
  - Ray2: Improved motion quality and physical plausibility
  - Camera motion control (orbit, pan, dolly)
  - Keyframe animation for precise motion control
  - LoRA training for style/character consistency
- API: Available via REST API
- Pricing: Free tier (30 gens/mo), Standard $24/mo, Pro $76/mo
- VFX Relevance: Camera control is excellent; LoRA training for consistent styles
🟡 Hailuo AI (Minimax) UPDATED
- Current Version: Hailuo/Minimax Video-01 (April 2026)
- High-quality text-to-video with strong motion coherence
- Subject reference feature for character consistency
- Available via API through Minimax platform
- Competitive quality with Kling at lower price points
🟡 Vidu UPDATED
- Current Version: Vidu 1.5+ (2026)
- Character reference for consistent subjects
- Fast generation speed (4s clip in ~30s)
- Available via API
- Strong in Asian market; growing international presence
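To put the quoted generation speed (a 4s clip in roughly 30s) in planning terms, the sketch below estimates wall-clock time for a longer stretch of footage. It assumes clips are generated one after another at that rate, with no queueing, retries, or parallel jobs; the function name and defaults are illustrative only.

```python
import math

# Rough wall-clock estimate from the quoted rate: one 4 s clip in about 30 s.
# Assumes strictly sequential generation with no queueing or retries.

def generation_seconds(footage_seconds: float,
                       clip_seconds: float = 4.0,
                       seconds_per_clip: float = 30.0) -> float:
    """Return the estimated time to generate enough clips to cover the footage."""
    clips_needed = math.ceil(footage_seconds / clip_seconds)
    return clips_needed * seconds_per_clip


print(generation_seconds(60))  # 15 clips at 30 s each
```

At this rate, a minute of footage takes about seven and a half minutes of generation time before any rejected takes are counted.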
🟡 New Entrants to Watch
- Haiper 2.0: Improved text-to-video with better temporal consistency
- Google Veo: Google's video generation model, available through Google AI Studio and Vertex AI
- Stable Video Diffusion 2.0: Open-source video generation, improving rapidly
🧠 Large Language Models
OpenAI — GPT-4.1 NEW
- Released: April 14, 2026
- Three tiers: GPT-4.1, GPT-4.1 mini, GPT-4.1 nano
- All have 1M token context window
- Better instruction following and coding than GPT-4o, at a lower price
- Pricing: $2.00/$8.00 (flagship), $0.40/$1.60 (mini), $0.10/$0.40 (nano) per 1M in/out
- GPT-5 still in preview — no public release date
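The per-1M-token prices above translate into per-request cost with simple arithmetic. The sketch below uses only the three price pairs listed for GPT-4.1; the dictionary keys are the tier names from this report (not API model identifiers), and the token counts in the example are made up for illustration.

```python
# Per-request cost from the per-1M-token prices listed above (input, output in USD).
# Keys are the report's tier names, not API model identifiers.
GPT41_PRICES = {
    "GPT-4.1": (2.00, 8.00),
    "GPT-4.1 mini": (0.40, 1.60),
    "GPT-4.1 nano": (0.10, 0.40),
}


def request_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request for the given tier."""
    price_in, price_out = GPT41_PRICES[tier]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# Illustrative workload: a 200K-token prompt with a 5K-token answer.
for tier in GPT41_PRICES:
    print(f"{tier}: ${request_cost(tier, 200_000, 5_000):.3f}")
```

For long-context work the input side dominates the bill, so the gap between tiers tracks the input price almost directly.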
Anthropic — Claude Opus 4 & Sonnet 4 NEW
- Released: May 2026
- Opus 4: New flagship — tops SWE-bench & GPQA, best-in-class coding
- Sonnet 4: Near-Opus performance at Sonnet pricing
- Both feature extended thinking & tool use as first-class features
- 200K context window, excellent vision capabilities
- Opus 4 pricing: $15.00/$75.00 | Sonnet 4: $3.00/$15.00 per 1M in/out
Google — Gemini 2.5 Pro & Flash (Flash: NEW)
- Pro: Tops LMArena leaderboard, #1 reasoning model, 1M context (2M coming)
- Flash: Released April 2026 — cheap reasoning model, 1M context
- Best multimodal model — native video/audio/image understanding
- Built-in thinking mode, code execution, Google Search grounding
- Pro pricing: $1.25/$10.00 | Flash: $0.15/$0.60 per 1M in/out
- Free tier available through Google AI Studio
Meta — Llama 4 (Scout, Maverick) NEW
- Scout: 109B MoE, 10M token context — longest of any open model
- Maverick: 400B MoE, competitive with GPT-4o
- Open weights, self-hostable. Vision still maturing
- Behemoth (288B active) still training
- Hosted pricing: ~$0.20-0.80/1M tokens
- ⚠️ Benchmark controversy: Meta submitted an experimental variant to LMArena, not the publicly released model
Alibaba — Qwen 3 NEW
- Released: April 29, 2026 — full family 0.6B to 235B (MoE, 22B active)
- Open weights (Apache 2.0), competitive with Claude Sonnet 4
- Hybrid thinking mode (toggle fast vs reasoning)
- Qwen3-VL (multimodal) expected soon
- Hosted pricing: ~$0.50-1.50/1M tokens for 235B
DeepSeek
- R1 (Jan 2026): Best value reasoning model, open-weight (MIT)
- V3-0324 (March update): Improved coding, competitive with GPT-4o
- Pricing: R1 $0.55/$2.19 | V3 $0.27/$1.10 per 1M in/out
- No native vision — limitation for image tasks
- R2 rumored but not released
Mistral — Medium 3
- Released April 2026, competitive with GPT-4o/Sonnet 3.5
- Pricing: $0.40/$2.00 per 1M in/out
- Unique: available for on-premises deployment (proprietary but self-hostable)
- Small 3.1 (open-weight) adds vision capabilities
📊 LLM Comparison for VFX Workflows
| Model | Coding | Vision | Context (tokens) | Input price (USD per 1M tokens) | Best For |
|---|---|---|---|---|---|
| Claude Opus 4 | ★★★★★ | ★★★★★ | 200K | $15.00 | Complex coding, creative direction |
| Claude Sonnet 4 | ★★★★½ | ★★★★ | 200K | $3.00 | Daily coding, prompt engineering |
| GPT-4.1 | ★★★★ | ★★★★ | 1M | $2.00 | Long-context tasks, tool use |
| Gemini 2.5 Pro | ★★★★½ | ★★★★★ | 1M | $1.25 | Reasoning, video analysis |
| Gemini 2.5 Flash | ★★★½ | ★★★★ | 1M | $0.15 | Cheap reasoning, batch work |
| Llama 4 Scout | ★★★ | ★★ | 10M | ~$0.30 | Massive context, self-hosted |
| Qwen3-235B | ★★★★ | ★★★ | 128K | ~$1.00 | Open-weight coding, cheap bulk |
| DeepSeek R1 | ★★★★ | ★★ | 128K | $0.55 | Math/reasoning, budget coding |
| Mistral Medium 3 | ★★★½ | ★★★ | 128K | $0.40 | On-prem coding, enterprise |
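One way to use the table is as a shortlist filter: pick a minimum context window and an input-price ceiling, then see what remains. The sketch below hard-codes the context and input-price columns from the table (the approximate "~" prices are taken at face value); the function name and thresholds are illustrative, and the star ratings are deliberately left out.

```python
# Shortlist models from the comparison table by context window and input price.
# (name, context in tokens, input price in USD per 1M tokens); "~" prices taken as given.
MODELS = [
    ("Claude Opus 4", 200_000, 15.00),
    ("Claude Sonnet 4", 200_000, 3.00),
    ("GPT-4.1", 1_000_000, 2.00),
    ("Gemini 2.5 Pro", 1_000_000, 1.25),
    ("Gemini 2.5 Flash", 1_000_000, 0.15),
    ("Llama 4 Scout", 10_000_000, 0.30),
    ("Qwen3-235B", 128_000, 1.00),
    ("DeepSeek R1", 128_000, 0.55),
    ("Mistral Medium 3", 128_000, 0.40),
]


def shortlist(min_context: int, max_input_price: float) -> list[str]:
    """Return matching model names, cheapest input price first."""
    matches = [m for m in MODELS if m[1] >= min_context and m[2] <= max_input_price]
    return [name for name, _, _ in sorted(matches, key=lambda m: m[2])]


print(shortlist(min_context=1_000_000, max_input_price=2.00))
```

Output prices are not in the table, so a shortlist built this way still needs a second pass against the per-vendor pricing lines for output-heavy tasks.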
🎨 AI 3D & Graphics
Meshy v4 NEW
- Major release (April 2026): improved quad mesh topology, PBR texture generation
- API v4 with Python/Node SDKs, batch processing for production-scale assets
- Blender addon updated for v4 API
- Pricing: Free tier + Pro $20/mo + API at $0.05/model
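For batch work, the per-model API price is the figure to budget against. The sketch below works in integer cents to avoid floating-point rounding and uses only the $0.05/model and $20/mo numbers quoted above; what the Pro plan includes per month is not stated here, so the comparison only shows how many API models the same $20 would buy.

```python
# Batch budgeting from the quoted Meshy prices, kept in integer cents.
API_PRICE_CENTS_PER_MODEL = 5    # $0.05 per model via the API
PRO_PLAN_CENTS_PER_MONTH = 2000  # $20 per month for Pro


def api_batch_cost_usd(model_count: int) -> float:
    """Return the API cost in USD for generating model_count assets."""
    return model_count * API_PRICE_CENTS_PER_MODEL / 100


def models_for_budget(budget_cents: int) -> int:
    """Return how many API models a budget in cents covers."""
    return budget_cents // API_PRICE_CENTS_PER_MODEL


print(api_batch_cost_usd(1000))                     # cost of a 1000-asset batch
print(models_for_budget(PRO_PLAN_CENTS_PER_MONTH))  # API models per $20
```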
Tripo3D V2.5 UPDATED
- Multi-view generation with improved consistency
- "TripoSG" — sparse-guided generation via sketch/depth
- Production API with webhook callbacks, ComfyUI nodes
- Export: GLB, OBJ, FBX, USDZ with PBR textures
- Pricing: Free 10/mo, Pro $15/mo
Rodin Genie 2.0 UPDATED
- Full-body avatar generation from single photo
- Unreal Engine plugin with blendshape export
- New "Studio" mode for posing and expression editing
🌐 Gaussian Splatting & Neural Rendering
- RealityCapture v2025.1: Experimental Gaussian splat export — directly relevant to your workflow
- Polycam: Full splat capture pipeline (iPhone LiDAR → Unity/UE)
- 4D Gaussian Splatting: NVIDIA research — dynamic temporal splats from video
- Compression: 10-50x size reduction (Splatfacto/nerfstudio)
- UE5.4: Community Niagara-based renderer; no official Epic support yet
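The quoted 10-50x compression range is easier to reason about as a file-size range. The sketch below applies that range to an uncompressed splat size; the 1000 MB input is an arbitrary example, and real results depend on the scene and the compression method.

```python
# Expected size range after compression, using the 10-50x range quoted above.

def compressed_size_range_mb(original_mb: float,
                             min_ratio: float = 10.0,
                             max_ratio: float = 50.0) -> tuple[float, float]:
    """Return (smallest, largest) expected size in MB for the given ratio range."""
    return original_mb / max_ratio, original_mb / min_ratio


smallest, largest = compressed_size_range_mb(1000.0)  # arbitrary 1000 MB capture
print(f"{smallest:.0f} MB to {largest:.0f} MB")
```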
🖌️ AI Texturing Tools
- Substance 3D Painter: Firefly text-to-texture with PBR channel generation, seamless tiles
- Layer AI: Project-wide style consistency, batch texturing, USD/glTF export
- Polyhive: API-driven AI texturing for pipeline integration
- Meshy v4 Texture: Improved PBR with better seam handling
🔓 Open Source 3D AI
- TRELLIS (Microsoft Research): Structured 3D generation from images — released April 2026
- Stable Fast 3D (Stability AI): Fast local image-to-3D, ComfyUI nodes available
- ComfyUI 3D Nodes: Growing ecosystem for Tripo, Meshy, Stable Fast 3D
🔧 VFX Pipeline Integration
Houdini 20.5 NEW
- Copilot (Expanded): AI assistant for Python/VEX — now covers SOPs, VOPs, DOPs
- ML Deformer SOP (experimental): Learned character deformations
- Neural Render SOP (experimental): AI-accelerated render preview
- PDG AI Integration: TOP networks can call cloud AI APIs (Meshy, Tripo) as tasks
- Community: Gaussian splat .ply importer HDA (Orbolt)
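The PDG idea above (each asset request becomes its own task) can be illustrated outside Houdini with a plain Python fan-out. This is not PDG's API or any vendor's API: `generate_asset_stub` is a hypothetical stand-in that makes no network call, and in Houdini the fan-out itself would be handled by a TOP network.

```python
from concurrent.futures import ThreadPoolExecutor

# Fan-out sketch: one task per prompt, results collected in input order.
# generate_asset_stub is a hypothetical placeholder; it performs no network call.

def generate_asset_stub(prompt: str) -> dict:
    """Pretend to request a 3D asset and return a fake result record."""
    return {
        "prompt": prompt,
        "status": "done",
        "file": prompt.replace(" ", "_") + ".glb",
    }


def run_batch(prompts: list[str], max_workers: int = 4) -> list[dict]:
    """Run one stub task per prompt in a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(generate_asset_stub, prompts))


for result in run_batch(["oak crate", "rusty barrel"]):
    print(result["file"], result["status"])
```

Threads suit this shape of work because real requests would spend most of their time waiting on the remote service; swapping the stub for a real client would also need error handling and rate limiting.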
Unreal Engine 5.4 NEW
- ML Deformer v2: Better quality, lower latency for real-time character deformation
- Neural Network Module: More inference operators for custom ML in-engine
- ML-accelerated Lumen GI sampling for better performance
- AI Virtual Production: Camera tracking refinement + LED wall calibration
- RealityScan → UE: Direct Nanite mesh export with auto LODs
Nuke 16 NEW
- NukeX Copilot: AI assistant for node graph navigation & expression writing
- ML Roto Node (Improved): Better edge refinement & temporal coherence
- AI Color Match: ML-based color matching between shots
- Smart Vector Distort: ML-driven vector generation for warping/aligning
Silhouette 2025 NEW
- AI Roto v3: Hair & transparent edges, multi-object tracking, temporal stabilization
- AI Paint: Content-aware ML paint & clone tool
- Nuke Inviso plugin: Full AI roto data exchange with Nuke
After Effects 25.2 UPDATED
- Roto Brush 3 (preview): Next-gen ML rotoscoping
- Firefly Generative Fill: Text-prompt-based fill for regions
- Content-Aware Fill improved with AI-driven fill
Wonder Studio (Autodesk) UPDATED
- Improved CG character compositing with AI lighting match
- Better body tracking for complex poses
- Now exports to After Effects with proper layer structure
📊 Pipeline Readiness Ratings
| Tool | Pipeline Ready | Integration |
|---|---|---|
| Meshy v4 API | ★★★★ | REST API, USD export |
| Tripo3D API | ★★★★ | REST API, ComfyUI, USD/FBX |
| Substance 3D AI | ★★★★★ | Native in Painter + Houdini plugin |
| Houdini Copilot | ★★★ | Built-in to 20.5 |
| Nuke 16 ML Roto | ★★★★★ | Native in NukeX |
| Silhouette 2025 AI Roto | ★★★★★ | Nuke plugin (Inviso) |
| AE Roto Brush 3 | ★★★★ | Native in AE 25.2 |
| UE 5.4 ML Deformer v2 | ★★★★ | Native in UE 5.4 |
| RealityCapture splats | ★★★ | Experimental .ply export |
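The star ratings can be turned into numbers for sorting or filtering. The sketch below counts full stars and treats "½" as half a point, which is an assumed reading of the notation used in this report; the ratings are copied from the table and the helper names are illustrative.

```python
# Convert star strings to numeric scores and filter the readiness table.
# Treating "½" as 0.5 is an assumed reading of the report's notation.
READINESS = {
    "Meshy v4 API": "★★★★",
    "Tripo3D API": "★★★★",
    "Substance 3D AI": "★★★★★",
    "Houdini Copilot": "★★★",
    "Nuke 16 ML Roto": "★★★★★",
    "Silhouette 2025 AI Roto": "★★★★★",
    "AE Roto Brush 3": "★★★★",
    "UE 5.4 ML Deformer v2": "★★★★",
    "RealityCapture splats": "★★★",
}


def stars_to_score(stars: str) -> float:
    """Return the numeric value of a star string such as '★★★★½'."""
    return stars.count("★") + 0.5 * stars.count("½")


def rated_at_least(min_score: float) -> list[str]:
    """Return tool names rated at or above min_score, in alphabetical order."""
    return sorted(t for t, s in READINESS.items() if stars_to_score(s) >= min_score)


print(rated_at_least(5))
```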