Table of Contents
- Overview
- Key Features of the AI Faceless Video Generator
- 1. Tech Stack
- 2. AI APIs, Models To Use
- 3. Frontend and User Interface
- 4. Backend and Video Generation
- 5. Total Monthly Cost (Hosting + API Fees)
- 6. Boilerplates/Templates to Build this Tool
- 7. Is there room for a new player?
- Current Players:
- Free Resources
Overview
An AI faceless video generator SaaS lets users create professional-quality videos programmatically with AI, based on the description and theme they provide.
In this guide, we’ll focus on integrating Editly, a versatile open-source video editing library that lets you programmatically generate videos by combining different types of media clips (text, audio, video, images) and adding transitions and effects.
Key Features of the AI Faceless Video Generator
Before we get into the technical details, here are the key features your platform will offer:
- Script Generation: Based on a user’s topic, an AI model (GPT-4o) will generate a script for the video.
- Voiceover: A text-to-speech API (ElevenLabs) will turn the script into a professional voiceover.
- Image Generation: Visuals will be created using generative AI models like FLUX 1.1 or Stable Diffusion 3.5. Optionally, a video generation model like LTX-Video can convert the generated images into short video clips.
- Video Composition: Editly will combine the generated elements (script, voiceover, images) into a cohesive video, applying transitions, text overlays, and effects.
- Cloud Hosting & Storage: The final videos will be hosted in the cloud, and users will receive a downloadable link to access their videos.
1. Tech Stack
Purpose | Tool/Framework Name |
Web App | Next.js |
Auth & Database | Supabase |
Programmatic Video Creation | Editly |
Advanced Programmatic Video Creation (Alternatives) | Diffusion Studio, Remotion |
2. AI APIs, Models To Use
API/Model Name | Description |
OpenAI API | Script and caption generation |
Flux | AI image generation |
LTX-Video (Optional) | Image-to-video generation |
ElevenLabs | Text-to-speech generation for voiceovers |
3. Frontend and User Interface
- New Video Gen Section - For the MVP, build a simple form UI that lets users enter their video topic and select options for voiceover, visuals, and more.
- Overview Section - Show users the status of video creation (e.g., generating script, creating visuals, processing video) and the number of video generation credits left in the current billing cycle.
- History Section - A collection of the user's previously generated videos, each with a link to download or share the video.
- Billing Section - Allow users to upgrade or downgrade their plan.
- Profile Section - Allow users to update their personal information, password, and email.
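The Overview section can be driven by a small status model. The sketch below is one possible shape, not a prescribed design: the stage names, labels, and the `monthlyCredits` / `creditsUsed` fields are all hypothetical and would need to match whatever your backend actually reports.

```javascript
// Hypothetical job stages, in the order the backend is assumed to report them.
const STAGES = [
  'queued',
  'generating_script',
  'creating_voiceover',
  'creating_visuals',
  'processing_video',
  'done',
];

// User-facing labels for each stage (assumed wording).
const STAGE_LABELS = {
  queued: 'Waiting in queue',
  generating_script: 'Generating script',
  creating_voiceover: 'Creating voiceover',
  creating_visuals: 'Creating visuals',
  processing_video: 'Processing video',
  done: 'Video ready',
};

// Turns a job record and a plan record into what the Overview section shows.
function describeJob(job, plan) {
  const index = STAGES.indexOf(job.stage);
  if (index === -1) {
    throw new Error(`Unknown stage: ${job.stage}`);
  }
  return {
    label: STAGE_LABELS[job.stage],
    // Progress is a rough stage-based estimate, not a time-based one.
    percent: Math.round((index / (STAGES.length - 1)) * 100),
    creditsLeft: Math.max(0, plan.monthlyCredits - plan.creditsUsed),
  };
}
```

For example, `describeJob({ stage: 'creating_visuals' }, { monthlyCredits: 30, creditsUsed: 12 })` yields the label "Creating visuals" with 18 credits left.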
4. Backend and Video Generation
The backend for Video Creation with Editly involves several important steps:
- Processing requests from the frontend.
- Generating content (script, voiceover, images) using AI APIs.
- Video Composing with Editly
- Task Management for Video Processing (Optional)
Here’s a step-by-step breakdown of the backend process:
1. Processing requests from Frontend
The frontend sends a request (via an HTTP POST) to your backend with the necessary information (e.g., topic for the video, which could be used to generate a script). This data is received by the API layer.
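As a rough illustration of what the API layer might do with that POST body, here is a minimal validation sketch. The field names (`topic`, `voice`, `useVideoClips`), the allowed voice list, and the length limits are assumptions for the example, not part of any specific framework.

```javascript
// Hypothetical voice options exposed by the form.
const ALLOWED_VOICES = ['narrator', 'casual'];

// Validates the parsed JSON body of the "create video" request.
// Returns either a normalized job description or a list of errors.
function parseVideoRequest(body) {
  const errors = [];

  const topic = typeof body?.topic === 'string' ? body.topic.trim() : '';
  if (topic.length < 3 || topic.length > 200) {
    errors.push('topic must be between 3 and 200 characters');
  }

  const voice = body?.voice ?? 'narrator';
  if (!ALLOWED_VOICES.includes(voice)) {
    errors.push(`voice must be one of: ${ALLOWED_VOICES.join(', ')}`);
  }

  // Only an explicit `true` enables the optional image-to-video step.
  const useVideoClips = body?.useVideoClips === true;

  if (errors.length > 0) {
    return { ok: false, status: 400, errors };
  }
  // 202 Accepted: generation continues after the response is sent.
  return { ok: true, status: 202, job: { topic, voice, useVideoClips } };
}
```

A route handler would call `parseVideoRequest` on the request body, return the errors with the given status if `ok` is false, and otherwise hand `job` to the generation step.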
2. Generating Content
Once the backend receives the request, it triggers calls to different AI APIs to generate the required assets:
- Generate a Script: OpenAI GPT-4o
- Generate Voiceover: ElevenLabs API
- Generate Visuals: FLUX 1.1, LTX-Video (optional)
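The order of these calls matters: the voiceover and visuals both depend on the script, but not on each other. The sketch below shows one way to arrange that. The `clients` object and its four functions are hypothetical wrappers you would write around the real APIs, and the `scenes` / `text` / `imagePrompt` fields are an assumed script shape, not something any of these APIs returns by default.

```javascript
// Orchestrates asset generation for one video.
// `clients` is a hypothetical object wrapping the real API calls:
//   generateScript(topic)      -> { scenes: [{ text, imagePrompt }] }
//   generateVoiceover(text)    -> path of the audio file
//   generateImage(prompt)      -> path of the image file
//   animateImage(imagePath)    -> path of a short video clip (optional step)
async function generateAssets(topic, clients, options = {}) {
  // 1. The script comes first: everything else is derived from it.
  const script = await clients.generateScript(topic);
  const narration = script.scenes.map((scene) => scene.text).join(' ');

  // 2. Voiceover and images are independent, so run them concurrently.
  const [voiceoverPath, imagePaths] = await Promise.all([
    clients.generateVoiceover(narration),
    Promise.all(script.scenes.map((scene) => clients.generateImage(scene.imagePrompt))),
  ]);

  // 3. Optional image-to-video step, only when requested.
  const clipPaths = options.useVideoClips
    ? await Promise.all(imagePaths.map((path) => clients.animateImage(path)))
    : null;

  return { script, voiceoverPath, imagePaths, clipPaths };
}
```

Firing all image requests at once is the simplest approach, but it may run into provider rate limits; batching the requests would be a reasonable refinement.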
3. Video Composing with Editly
Once all the necessary assets (script, voiceover, images) are generated, the backend uses Editly to combine them into a video.
- Input Configuration: The backend sets up the video with the required clips:
  - Title clips for any text that appears.
  - Image clips for visuals (static images or generated art).
  - Audio clips for the voiceover and any background music.
- Video Composition: The backend invokes Editly to create the video. This involves defining the layout (e.g., the duration of each clip, transitions, and effects like fades or zooms).
  - To make this step easier, instruct the OpenAI model to return the script as structured JSON output with timestamps, so each scene's duration can be read directly from the script.
- Render the Video: The backend then renders the final video in a specific format (e.g., MP4) at the desired resolution (e.g., 1080p or 4K).
Here’s a basic example of how you define a video composition:
{
  width: 900,
  height: 1600,
  outPath: './newsTitle.mp4',
  defaults: {
    layer: { fontPath: './assets/Patua_One/PatuaOne-Regular.ttf' },
  },
  clips: [
    { duration: 10, layers: [
      { type: 'image', path: './assets/91083241_573589476840991_4224678072281051330_n.jpg' },
      { type: 'news-title', text: 'BREAKING NEWS' },
      { type: 'subtitle', text: 'Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.', backgroundColor: 'rgba(0,0,0,0.5)' }
    ] },
  ],
}
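To connect the generated assets to a composition like the one above, the backend can build the spec from the timestamped script. The following is a sketch under stated assumptions: the script shape (`scenes` with `start`, `end`, and `text` in seconds) is hypothetical and depends on the JSON schema you ask the model for, and `buildEditlySpec` is an illustrative helper, not part of Editly. The spec fields it emits (`width`, `height`, `outPath`, `audioFilePath`, `defaults.transition`, `clips`, and the `image` / `subtitle` layers) follow Editly's documented options.

```javascript
// Builds an Editly spec from a timestamped script and generated asset paths.
// Assumed inputs (hypothetical shapes):
//   script = { scenes: [{ start: 0, end: 4, text: '...' }, ...] }  // seconds
//   assets = { imagePaths: ['a.png', ...], voiceoverPath: 'voice.mp3' }
function buildEditlySpec(script, assets, outPath) {
  if (script.scenes.length !== assets.imagePaths.length) {
    throw new Error('Expected exactly one image per scene');
  }

  const clips = script.scenes.map((scene, i) => ({
    // Each clip lasts as long as its narration segment.
    duration: scene.end - scene.start,
    layers: [
      { type: 'image', path: assets.imagePaths[i] },
      { type: 'subtitle', text: scene.text, backgroundColor: 'rgba(0,0,0,0.5)' },
    ],
  }));

  return {
    width: 900,
    height: 1600,
    outPath,
    // The voiceover plays across the whole video.
    audioFilePath: assets.voiceoverPath,
    // Assumption: transitions are disabled so clip durations add up to the
    // voiceover length; cross-fades overlap neighboring clips.
    defaults: { transition: null },
    clips,
  };
}
```

The returned object can then be passed to Editly's entry point, for example `await editly(spec)` after importing `editly`. If you want fades between scenes, you would need to account for the overlap when computing durations.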
4. Task Management for Video Processing (Optional)
Video generation is resource-intensive and can take a long time (minutes to hours), so consider using a task queue to handle it asynchronously:
- Task Queue Option: AWS SQS + Lambda. Use AWS SQS to queue video generation tasks, then process them asynchronously with Lambda functions.
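A minimal sketch of the SQS + Lambda flow might look like the following. The message fields (`jobId`, `topic`) and the injected `renderVideo` function are hypothetical. The handler reads `event.Records[].body` and returns `batchItemFailures`, which matches Lambda's partial batch response format for SQS; that only takes effect if `ReportBatchItemFailures` is enabled on the event source mapping.

```javascript
// Serializes one video job as an SQS message body (hypothetical fields).
function toQueueMessage(job) {
  return JSON.stringify({ jobId: job.jobId, topic: job.topic });
}

// Creates a Lambda-style handler for SQS events.
// `renderVideo` is a hypothetical async function that runs the full pipeline
// (generate assets, compose with Editly, upload) for one job.
function makeHandler(renderVideo) {
  return async function handler(event) {
    const batchItemFailures = [];

    for (const record of event.Records) {
      try {
        const job = JSON.parse(record.body);
        await renderVideo(job);
      } catch (err) {
        // Report only the failed message so SQS retries it alone.
        batchItemFailures.push({ itemIdentifier: record.messageId });
      }
    }

    return { batchItemFailures };
  };
}
```

The API layer would send `toQueueMessage(job)` to the queue and return immediately, while the handler works through messages in the background. Note that a single Lambda invocation is capped at 15 minutes, so renders that take longer would need a different worker (for example a container or VM consuming the same queue).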
5. Total Monthly Cost (Hosting + API Fees)
Tool | Cost | Cost per Video (approx., for a 1-minute video) | Notes |
OpenAI API (GPT-4o model) | $0.01 / 1k tokens | $0.065 - $0.08 | 2k input tokens; 5-6k tokens to generate structured JSON output |
Flux | $0.025 / image | $0.25 - $0.3 | 10-12 images per video |
LTX-Video (Optional) | $0.026 / video (6-second clip) | $0.26 - $0.3 | 10-12 video clips per video |
ElevenLabs | $0.11 / minute | $0.11 - $0.16 | Cost may vary depending on your subscription plan |
Total Cost | | $0.6 - $0.8 per video | |
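To turn the per-video figures into a monthly API estimate, you can sum the ranges from the table and multiply by your expected volume. The sketch below uses the table's per-video numbers as-is, stored in thousandths of a dollar to avoid floating-point rounding; the function names and the monthly video count are hypothetical, and hosting costs are not included.

```javascript
// Per-video cost ranges from the table above, in thousandths of a dollar.
const COSTS = {
  script: { low: 65, high: 80 },       // OpenAI GPT-4o
  images: { low: 250, high: 300 },     // Flux
  videoClips: { low: 260, high: 300 }, // LTX-Video (optional)
  voiceover: { low: 110, high: 160 },  // ElevenLabs
};

// Sums the per-video range, optionally including the image-to-video step.
function costPerVideo({ includeVideoClips }) {
  const parts = [COSTS.script, COSTS.images, COSTS.voiceover];
  if (includeVideoClips) parts.push(COSTS.videoClips);
  return {
    low: parts.reduce((sum, part) => sum + part.low, 0),
    high: parts.reduce((sum, part) => sum + part.high, 0),
  };
}

// Scales the per-video range to a monthly API bill (still in thousandths).
function monthlyApiCost(videosPerMonth, options) {
  const perVideo = costPerVideo(options);
  return {
    low: perVideo.low * videosPerMonth,
    high: perVideo.high * videosPerMonth,
  };
}

// Formats thousandths of a dollar as a dollar string, e.g. 685 -> "$0.685".
function formatDollars(thousandths) {
  return `$${(thousandths / 1000).toFixed(3)}`;
}
```

With the optional LTX-Video step included, the sum comes to roughly $0.685 - $0.84 per video, close to the table's rounded total; without it, roughly $0.425 - $0.54.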
6. Boilerplates/Templates to Build this Tool
7. Is there room for a new player?
The broader market is crowded, but niches often remain underserved. Focus on a specific audience or use case, such as:
- Podcast Clips: Generate videos based on stories told in podcasts.
- Educators: Create lecture videos or explainer content.
- Children's Stories: Create Disney-style bedtime stories.
- Horror Story Generator: Create engaging horror story videos.
Current Players:
Free Resources
AI Startup Ideas
Curated collection of 80+ simple AI mobile apps that make more than $50k/month
AI Mobile Apps List
Curated collection of 80+ simple AI mobile apps that make more than $50k/month
Marketing Resources
Subreddits to promote in, high-DA backlink sites, guest post sites, and startup directories