How to build an AI Faceless Video Generator SaaS

Author: Nandha KT
Category: How To Build
Published on: January 30, 2025

Overview

An AI faceless video generator SaaS lets users create professional-quality videos, with no one appearing on camera, generated programmatically with AI from a description and theme the user provides.

In this guide, we’ll focus on integrating Editly, a versatile open-source Node.js library for programmatic video editing. It lets you generate videos by combining different types of media clips (text, audio, video, images) and adding transitions and effects.

Key Features of the AI Faceless Video Generator

Before we get into the technical details, here are the key features your platform will offer:

  • Script Generation: Based on the user’s topic, an AI model (GPT-4o) will generate a script for the video.
  • Voiceover: A text-to-speech API (ElevenLabs) will turn the script into a professional voiceover.
  • Image Generation: Generative AI models such as FLUX 1.1 or Stable Diffusion 3.5 will create the images and visuals. Optionally, an image-to-video model such as LTX-Video can turn the generated images into short video clips.
  • Video Composition: Editly will combine the generated elements (script, voiceover, images) into a cohesive video, applying transitions, text overlays, and effects.
  • Cloud Hosting & Storage: The final videos will be hosted in the cloud, and users will receive a download link for each video.

1. Tech Stack

  • Web App: Next.js
  • Auth & Database: Supabase
  • Programmatic Video Creation: Editly
  • Advanced Programmatic Video Creation (alternatives): Diffusion Studio, Remotion

2. AI APIs and Models to Use

  • OpenAI API: script and caption generation
  • Flux: AI image generation
  • LTX-Video (optional): image-to-video generation
  • ElevenLabs: text-to-speech generation for voiceovers
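As a hedged sketch of the voiceover call, the function below builds (and optionally sends) a request to the ElevenLabs text-to-speech REST endpoint. The endpoint path, the `xi-api-key` header, and the `eleven_multilingual_v2` model ID reflect ElevenLabs' public v1 API as we understand it; the helper names and the voice ID are placeholders, so check the current ElevenLabs API reference before relying on them.

```javascript
// Hypothetical helper: builds a fetch() request for ElevenLabs text-to-speech.
// Assumes the v1 endpoint POST /v1/text-to-speech/{voice_id}, which returns MP3 audio.
function buildTtsRequest(text, { apiKey, voiceId, modelId = "eleven_multilingual_v2" }) {
  if (!text || !text.trim()) throw new Error("Script text is empty");
  if (!apiKey || !voiceId) throw new Error("apiKey and voiceId are required");
  return {
    url: `https://api.elevenlabs.io/v1/text-to-speech/${encodeURIComponent(voiceId)}`,
    init: {
      method: "POST",
      headers: {
        "xi-api-key": apiKey, // ElevenLabs authenticates with this header
        "Content-Type": "application/json",
        Accept: "audio/mpeg", // ask for MP3 bytes back
      },
      body: JSON.stringify({ text, model_id: modelId }),
    },
  };
}

// Sends the request with Node 18+'s global fetch and returns the audio as a Buffer.
async function synthesizeVoiceover(text, options) {
  const { url, init } = buildTtsRequest(text, options);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`ElevenLabs request failed: ${res.status}`);
  return Buffer.from(await res.arrayBuffer());
}
```

You would write the returned Buffer to a file and pass that path to Editly as the voiceover audio. Keeping the request builder separate from the network call makes it easy to test without spending API credits.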

3. Frontend and User Interface

  1. New Video Gen Section - For the MVP, build a simple form that lets users enter a video topic and select options for the voiceover, visuals, and more.
  2. Overview Section - Show users the status of video creation (e.g., generating script, creating visuals, processing video) and the number of video-generation credits left in the current billing cycle.
  3. History Section - A list of the user's previously generated videos, each with a link to download or share it.
  4. Billing Section - Lets users upgrade or downgrade their plan.
  5. Profile Section - Lets users update their personal information, password, and email.
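To back the Overview section, the backend can store each job's current stage (for example, in a Supabase table) and the frontend can map it to a label and a progress value. The sketch below assumes a hypothetical set of stage names and a simple per-plan credit count; neither comes from a specific library.

```javascript
// Hypothetical job stages, in pipeline order; adjust to match your backend.
const STAGES = ["queued", "generating_script", "generating_assets", "rendering", "completed"];

// User-facing labels for the Overview section.
const STAGE_LABELS = {
  queued: "Waiting in queue",
  generating_script: "Generating script",
  generating_assets: "Creating voiceover and visuals",
  rendering: "Processing video",
  completed: "Ready to download",
};

// Maps a stored stage to a label and a rough progress percentage.
function describeStatus(stage) {
  if (stage === "failed") return { label: "Generation failed", percent: 0 };
  const index = STAGES.indexOf(stage);
  if (index === -1) throw new Error(`Unknown stage: ${stage}`);
  // "queued" maps to 0% and "completed" to 100%, evenly spaced in between.
  const percent = Math.round((index / (STAGES.length - 1)) * 100);
  return { label: STAGE_LABELS[stage], percent };
}

// Credits left in the current billing cycle, never negative.
function creditsLeft(planCredits, videosThisCycle) {
  return Math.max(0, planCredits - videosThisCycle);
}
```

Evenly spaced percentages are a simplification; rendering usually takes longer than the other stages, so you may prefer weighted steps once you have real timing data.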

4. Backend and Video Generation

The video-creation backend built around Editly involves four steps:

  1. Processing requests from the frontend.
  2. Generating content (script, voiceover, images) using AI APIs.
  3. Composing the video with Editly.
  4. Managing video-processing tasks (optional).
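The four steps can be wired together in a single orchestration function. This is a minimal sketch, assuming hypothetical injected functions (`generateScript`, `generateVoiceover`, `generateImages`, `renderVideo`, `onStatus`) that wrap OpenAI, ElevenLabs, FLUX, Editly, and your database; the stage names are also our own.

```javascript
// Minimal pipeline sketch. All functions in `deps` are hypothetical wrappers
// around the AI APIs, Editly, and a status store (e.g. a database row).
async function runVideoPipeline(request, deps) {
  const { generateScript, generateVoiceover, generateImages, renderVideo, onStatus } = deps;
  try {
    await onStatus("generating_script");
    const script = await generateScript(request.topic);

    // The voiceover and the visuals both depend only on the script,
    // so they can be generated concurrently.
    await onStatus("generating_assets");
    const [voiceoverPath, imagePaths] = await Promise.all([
      generateVoiceover(script, request),
      generateImages(script, request),
    ]);

    await onStatus("rendering");
    const videoPath = await renderVideo({ script, voiceoverPath, imagePaths });

    await onStatus("completed");
    return videoPath;
  } catch (err) {
    await onStatus("failed"); // surface the failure to the user
    throw err; // let the caller (e.g. a queue worker) decide whether to retry
  }
}
```

Injecting the dependencies keeps the pipeline testable with stubs and lets you swap FLUX for Stable Diffusion, or Editly for another renderer, without touching the control flow.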

Here’s a step-by-step breakdown of the backend process:

1. Processing Requests from the Frontend

The frontend sends an HTTP POST request to your backend with the necessary information, such as the video topic used to generate the script. The API layer (for example, a Next.js Route Handler) receives this data and should validate it before any AI calls are made.
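Validating the request up front avoids paying for AI calls on bad input. The sketch below assumes a hypothetical request shape (`topic`, `voice`, `style`, `durationSec`) and illustrative limits; the allowed values are placeholders, not part of any API.

```javascript
// Hypothetical options for the "New Video" form; replace with your own.
const VOICES = new Set(["male", "female"]);
const STYLES = new Set(["realistic", "cartoon", "cinematic"]);

// Validates a parsed JSON body and returns normalized options, or throws an
// Error whose message the API layer can return as a 400 response.
function parseVideoRequest(body) {
  if (typeof body !== "object" || body === null) throw new Error("Body must be a JSON object");

  const topic = typeof body.topic === "string" ? body.topic.trim() : "";
  if (topic.length < 3 || topic.length > 200) throw new Error("topic must be 3-200 characters");

  const voice = body.voice ?? "female"; // default when the field is omitted
  if (!VOICES.has(voice)) throw new Error(`voice must be one of: ${[...VOICES].join(", ")}`);

  const style = body.style ?? "realistic";
  if (!STYLES.has(style)) throw new Error(`style must be one of: ${[...STYLES].join(", ")}`);

  const durationSec = Number(body.durationSec ?? 60);
  if (!Number.isInteger(durationSec) || durationSec < 15 || durationSec > 180) {
    throw new Error("durationSec must be an integer between 15 and 180");
  }
  return { topic, voice, style, durationSec };
}
```

In a Next.js Route Handler, you would call this on `await request.json()` inside `POST`, respond with status 400 and the error message on failure, and create or enqueue the job on success.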

2. Generating Content

Once the backend receives the request, it calls several AI APIs to generate the required assets. The script comes first, because both the voiceover and the visuals are derived from it:

  • Generate a Script: OpenAI GPT-4o
  • Generate Voiceover: ElevenLabs API
  • Generate Visuals: FLUX 1.1, plus LTX-Video (optional) for image-to-video
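For the script step, OpenAI's Structured Outputs can force GPT-4o to return JSON that matches a schema, which makes the later Editly step mechanical. The sketch below only builds the Chat Completions request body; the schema's field names (`scenes`, `narration`, `imagePrompt`, `start`, `end`) are our own assumptions.

```javascript
// JSON Schema for a timestamped script. Field names are assumptions, not an
// OpenAI convention. Strict mode requires every property to be listed in
// `required` and `additionalProperties: false` on every object.
const scriptSchema = {
  type: "object",
  properties: {
    title: { type: "string" },
    scenes: {
      type: "array",
      items: {
        type: "object",
        properties: {
          narration: { type: "string" },   // text for the voiceover and captions
          imagePrompt: { type: "string" }, // prompt sent to the image model
          start: { type: "number" },       // seconds from the start of the video
          end: { type: "number" },
        },
        required: ["narration", "imagePrompt", "start", "end"],
        additionalProperties: false,
      },
    },
  },
  required: ["title", "scenes"],
  additionalProperties: false,
};

// Builds a Chat Completions request body that asks GPT-4o for Structured Outputs.
function buildScriptRequest(topic, targetSeconds = 60) {
  return {
    model: "gpt-4o",
    messages: [
      {
        role: "system",
        content:
          "You write narration scripts for faceless short videos. " +
          `Split the script into scenes covering about ${targetSeconds} seconds in total, ` +
          "with contiguous start/end timestamps in seconds.",
      },
      { role: "user", content: `Topic: ${topic}` },
    ],
    response_format: {
      type: "json_schema",
      json_schema: { name: "video_script", strict: true, schema: scriptSchema },
    },
  };
}
```

You can pass this object to `client.chat.completions.create(...)` in the official `openai` Node SDK (or POST it to the Chat Completions endpoint) and then `JSON.parse` the returned `choices[0].message.content`. The model's timestamps are estimates, so it is worth re-deriving scene durations from the actual voiceover length when possible.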

3. Video Composing with Editly

Once all the necessary assets (script, voiceover, images) are generated, the backend uses Editly to combine them into a video.

  • Input Configuration: The backend defines the clips, each made up of layers:
    • Title or subtitle layers for any on-screen text.
    • Image layers for the visuals (static images or generated art).
    • Audio tracks for the voiceover and any background music.
  • Video Composition: The backend invokes Editly to create the video. This involves defining the layout (e.g., duration of each clip, transitions, and effects like fades or zooms) and rendering the final video.
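    1. Here’s a basic example of an Editly composition: one 10-second vertical (900x1600) clip with a background image, a news-style title, and a subtitle:

      // Run as an ES module (e.g. compose.mjs); Editly also needs FFmpeg installed.
      import editly from 'editly';

      const editSpec = {
        width: 900,
        height: 1600,
        outPath: './newsTitle.mp4',
        defaults: {
          // Font applied to every text layer unless a layer overrides it
          layer: { fontPath: './assets/Patua_One/PatuaOne-Regular.ttf' },
        },
        clips: [
          { duration: 10, layers: [
            // Background visual, e.g. a generated FLUX image
            { type: 'image', path: './assets/91083241_573589476840991_4224678072281051330_n.jpg' },
            // News-style headline banner
            { type: 'news-title', text: 'BREAKING NEWS' },
            // Caption text on a semi-transparent background
            { type: 'subtitle', text: 'Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.', backgroundColor: 'rgba(0,0,0,0.5)' },
          ] },
        ],
      };

      // Renders the video and writes it to outPath
      await editly(editSpec);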
    2. Instruct OpenAI to return the script as Structured Outputs (JSON) with per-scene timestamps, so each scene maps directly to one Editly clip and its duration.
  • Render the Video: The backend then renders the video in a specific format (e.g., MP4) at the desired resolution (e.g., 1080p or 4K).
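Once the script arrives as timestamped JSON, turning it into an Editly spec is a straightforward mapping: one clip per scene, with the clip's duration taken from the scene's timestamps. This is a sketch under assumptions: the `script.scenes[].narration/start/end` fields and the option names are hypothetical, while `clips`, `layers`, `audioTracks` (with `path` and `mixVolume`), `fps`, and `outPath` are Editly spec fields.

```javascript
// Converts a timestamped script plus generated asset paths into an Editly edit
// spec. The shape of `script` and of the options object is an assumption.
function buildEditlySpec(script, { imagePaths, voiceoverPath, musicPath, outPath }) {
  if (imagePaths.length !== script.scenes.length) {
    throw new Error("Need exactly one image per scene");
  }

  const clips = script.scenes.map((scene, i) => ({
    // Clip length follows the scene's timestamps so visuals line up with narration.
    duration: Math.max(0.5, scene.end - scene.start),
    layers: [
      { type: "image", path: imagePaths[i] },      // generated visual for this scene
      { type: "subtitle", text: scene.narration }, // on-screen caption
    ],
  }));

  // Voiceover at full volume; optional background music mixed in quieter.
  const audioTracks = [{ path: voiceoverPath, mixVolume: 1 }];
  if (musicPath) audioTracks.push({ path: musicPath, mixVolume: 0.2 });

  return {
    width: 1080,
    height: 1920, // 9:16 vertical output for short-form platforms
    fps: 30,
    outPath,
    clips,
    audioTracks,
  };
}
```

The result can be passed straight to `editly(...)`. Editly may add a transition between clips by default, and transitions can change the total duration, so check the output against the voiceover length in your Editly version and adjust `defaults.transition` if the captions drift.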

4. Task Management for Video Processing (Optional)

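Video generation is resource-intensive and can take a long time (minutes to hours), so avoid rendering inside the HTTP request. Instead, use a task queue to handle video generation asynchronously:

  • Task Queue Options: AWS SQS + Lambda. Queue video-generation tasks in SQS, then process them asynchronously with Lambda functions. A single Lambda invocation can run for at most 15 minutes, so longer renders need a longer-running worker instead (for example, a container-based service).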
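An SQS-triggered Lambda receives a batch of messages in `event.Records`, each with a `messageId` and a string `body`. The sketch below processes each job and reports only the failed messages back to SQS, which relies on "ReportBatchItemFailures" being enabled on the event source mapping. `processVideoJob` is a hypothetical function that runs your generation pipeline.

```javascript
// Sketch of an SQS-triggered Lambda worker. `processVideoJob` is a hypothetical,
// injected function that generates one video.
function createSqsHandler(processVideoJob) {
  return async function handler(event) {
    const batchItemFailures = [];
    for (const record of event.Records) {
      try {
        const job = JSON.parse(record.body); // e.g. { jobId, topic, voice, style }
        await processVideoJob(job);
      } catch (err) {
        console.error(`Message ${record.messageId} failed:`, err);
        // Only these messages return to the queue for retry.
        batchItemFailures.push({ itemIdentifier: record.messageId });
      }
    }
    return { batchItemFailures };
  };
}
```

You would export it as `exports.handler = createSqsHandler(runJob)`. For long renders, a batch size of 1 keeps one slow video from delaying others in the same batch, and a dead-letter queue catches jobs that keep failing.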

5. Estimated Cost per Video (API Fees)

Approximate API costs for a 1-minute video (hosting costs are not included):

  • OpenAI API (GPT-4o model): $0.01 / 1K tokens. About $0.065 - $0.08 per video (2K input tokens plus 5-6K tokens to generate the structured JSON output).
  • Flux: $0.025 / image. About $0.25 - $0.30 per video (10-12 images).
  • LTX-Video (optional): $0.026 / video (6-second clip). About $0.26 - $0.30 per video (10-12 clips).
  • ElevenLabs: $0.11 / minute. About $0.11 - $0.16 per video; the cost varies with your subscription plan.
  • Total: about $0.69 - $0.84 per video with LTX-Video, or about $0.43 - $0.54 without it.
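The per-video arithmetic can be captured in a small estimator, which helps when pricing credit packs. This sketch uses the unit prices listed above as defaults and assumes one LTX-Video clip per generated image; all prices are placeholders that you should keep up to date.

```javascript
// Unit prices from the cost list above; treat them as placeholders.
const DEFAULT_PRICES = {
  llmPer1kTokens: 0.01,  // OpenAI GPT-4o, flat rate as listed
  imageEach: 0.025,      // Flux, per image
  videoClipEach: 0.026,  // LTX-Video, per 6-second clip (optional)
  ttsPerMinute: 0.11,    // ElevenLabs, per minute of audio
};

// Estimates API fees for one video and returns a per-service breakdown.
function estimateVideoCost({ tokens, images, useLtx = false, minutes }, prices = DEFAULT_PRICES) {
  const breakdown = {
    script: (tokens / 1000) * prices.llmPer1kTokens,
    images: images * prices.imageEach,
    // Assumes one image-to-video clip per generated image when LTX-Video is used.
    videoClips: useLtx ? images * prices.videoClipEach : 0,
    voiceover: minutes * prices.ttsPerMinute,
  };
  const total = Object.values(breakdown).reduce((sum, cost) => sum + cost, 0);
  // Round to a tenth of a cent to hide floating-point noise.
  return { breakdown, total: Math.round(total * 1000) / 1000 };
}
```

For example, `estimateVideoCost({ tokens: 8000, images: 12, useLtx: true, minutes: 1 })` gives a total of $0.802, while a lighter video with 7K tokens, 10 images, and no LTX-Video comes to $0.43.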

6. Boilerplates/Templates to Build this Tool

7. Is there room for a new player?

The broader market is crowded, but niches often remain underserved. Focus on a specific audience or use case, such as:

  • Podcast Clips: Generates videos based on stories told in podcasts.
  • Educators: Creates lecture videos or explainer content.
  • Children's Stories: Creates Disney-style bedtime stories.
  • Horror Story Generator: Creates engaging horror-story videos.

Current Players:

  • Crayo.ai
  • Revid.ai
  • Autoshorts.ai
  • Invideo AI
