How to Make an Audiobook: Complete Guide for Authors

Making an audiobook used to mean a $3,000–$5,000 studio bill, six weeks of scheduling around a narrator, and a stack of revision rounds. AI narration collapsed that to an afternoon of work and a few dollars per finished hour — if you pick the right voice and edit with intent.

This guide walks the full pipeline: prep your manuscript, choose a narrator, generate audio chapter by chapter, fix the inevitable mispronunciations, and export a retailer-ready file. We'll use AuthorVoices.ai for the screenshots, but the workflow is the same wherever you produce.

Before you start: decide what "done" looks like

An audiobook isn't a single file format; the right deliverable depends on where you plan to sell it. Pin yours down before you upload anything:

  • MP3 ZIP, ACX-mastered — one MP3 per chapter, RMS and peak normalized to retailer specs. This is what most distributors (Findaway, Kobo, Google Play, Apple Books, Spotify) accept.
  • M4B with chapter markers — single-file audiobook with embedded cover art and a clickable chapter list. Best for direct sales, BookFunnel, or your own website.
  • Raw WAV — only if a sound engineer is doing post-production for you.
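
To make "chapter markers" concrete: outside any particular platform, one common way to describe them is ffmpeg's plain-text metadata format, which a muxer can embed into an M4B. The sketch below assumes you already know each chapter's title and duration; ffmpeg itself and the helper names `escape` and `chapter_metadata` are illustrative assumptions, not something the workflow in this guide requires.

```python
def escape(value: str) -> str:
    """Backslash-escape the characters the FFMETADATA format treats as special."""
    for ch in ("\\", "=", ";", "#", "\n"):
        value = value.replace(ch, "\\" + ch)
    return value


def chapter_metadata(chapters: list[tuple[str, float]]) -> str:
    """Build an FFMETADATA1 document from (title, duration_seconds) pairs."""
    lines = [";FFMETADATA1"]
    start_ms = 0
    for title, seconds in chapters:
        end_ms = start_ms + round(seconds * 1000)
        lines += [
            "[CHAPTER]",
            "TIMEBASE=1/1000",      # START and END are in milliseconds
            f"START={start_ms}",
            f"END={end_ms}",
            f"title={escape(title)}",
        ]
        start_ms = end_ms           # the next chapter begins where this one ends
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(chapter_metadata([("Chapter One", 90.5), ("Chapter Two", 60)]))
```

Each chapter's start is the previous chapter's end, so the clickable list stays gapless even when chapter lengths vary.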

Step 1: Prep your manuscript

Clean your EPUB or DOCX before upload. Five minutes here saves an hour later.

  • Strip front matter you don't want narrated (copyright pages, dedications you'd rather skip, ISBN blocks).
  • Spell out things the narrator will mangle: "Dr." → "Doctor," "St." → "Saint" or "Street" depending on context, Roman numerals → words.
  • Add a short audiobook-only intro line ("This is the audiobook edition of...") if you want one.
  • Make sure chapter headings use proper heading styles in DOCX, or <h1>/<h2> in EPUB. That's how the parser splits chapters.
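
The text cleanup above can be partly scripted before upload. This is a minimal sketch, assuming you have the manuscript as plain text; the abbreviation table, the 1 to 99 range, and the function names are illustrative assumptions. It expands unambiguous abbreviations, spells out Roman numerals only in "Chapter" or "Part" headings at the start of a line (so the pronoun "I" is left alone), and flags every "St." for a human decision instead of guessing between Saint and Street.

```python
import re

# Illustrative table only; extend it with the abbreviations in your own book.
UNAMBIGUOUS = {"Dr.": "Doctor", "Mr.": "Mister", "Prof.": "Professor"}

ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100}
ONES = ["", "One", "Two", "Three", "Four", "Five", "Six", "Seven", "Eight",
        "Nine", "Ten", "Eleven", "Twelve", "Thirteen", "Fourteen", "Fifteen",
        "Sixteen", "Seventeen", "Eighteen", "Nineteen"]
TENS = ["", "", "Twenty", "Thirty", "Forty", "Fifty", "Sixty", "Seventy",
        "Eighty", "Ninety"]


def roman_to_int(numeral: str) -> int:
    """Convert a Roman numeral built from I, V, X, L, C to an integer."""
    total = 0
    for i, ch in enumerate(numeral):
        value = ROMAN_VALUES[ch]
        # A smaller symbol before a larger one is subtracted (IV = 4).
        if i + 1 < len(numeral) and value < ROMAN_VALUES[numeral[i + 1]]:
            total -= value
        else:
            total += value
    return total


def number_to_words(n: int) -> str:
    """Spell out 1..99, which covers typical chapter and part numbers."""
    if not 1 <= n <= 99:
        raise ValueError("only 1..99 supported in this sketch")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("-" + ONES[ones].lower() if ones else "")


def prep_for_narration(text: str) -> tuple[str, list[str]]:
    """Return cleaned text plus snippets that need a human decision."""
    for short, spoken in UNAMBIGUOUS.items():
        text = text.replace(short, spoken)

    def heading(match: re.Match) -> str:
        n = roman_to_int(match.group(2))
        if not 1 <= n <= 99:
            return match.group(0)  # out of range for this sketch: leave as written
        return f"{match.group(1)} {number_to_words(n)}"

    # Only headings at the start of a line are converted.
    text = re.sub(r"^(Chapter|Part) ([IVXLC]+)\b", heading, text, flags=re.MULTILINE)

    # "St." is context-dependent (Saint vs. Street): flag it, don't guess.
    review = [text[max(0, m.start() - 15): m.end() + 15]
              for m in re.finditer(r"\bSt\.", text)]
    return text, review
```

Returning the "St." snippets separately keeps the script honest: it automates only the substitutions that cannot change meaning and leaves the judgment calls to you.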

Step 2: Create the project and upload

Start a new project and drop in your file. EPUB gives the cleanest chapter detection; DOCX works fine if your headings are styled. Plain text paste is the fallback for short pieces.

Upload your EPUB or DOCX to start a new project

The parser splits your book into chapters and then into sections (paragraph-sized chunks of roughly 200–600 words). Sections are the unit of narration — small enough to regenerate cheaply when one line goes sideways, large enough that you're not clicking forever.
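
To illustrate the idea of sections, here is a rough sketch of how a chapter could be packed into chunks of about 200 to 600 words. It is not the platform's actual parser; the greedy strategy, the blank-line paragraph delimiter, and the function name are all assumptions made for illustration.

```python
def split_into_sections(chapter_text: str, min_words: int = 200,
                        max_words: int = 600) -> list[str]:
    """Greedily pack whole paragraphs into sections of roughly min..max words."""
    paragraphs = [p.strip() for p in chapter_text.split("\n\n") if p.strip()]
    sections: list[str] = []
    current: list[str] = []
    count = 0
    for para in paragraphs:
        words = len(para.split())
        # Close the current section if this paragraph would push it past the
        # maximum and the section is already long enough to stand alone.
        if current and count + words > max_words and count >= min_words:
            sections.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += words
    if current:
        # The final section may come in under min_words.
        sections.append("\n\n".join(current))
    return sections
```

Paragraphs are never split, so a section can run past the upper bound when a single paragraph is very long or when a short section meets a long paragraph; that matches the "roughly" in the range above.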

Step 3: Audition narrators

This is the step most first-timers rush. Don't. The wrong voice tanks completion rate more than any other production decision.

Filter the 54 narrators by gender, language, and Studio-eligibility

Filter the catalog by gender, language, and accent. Play the previews against a paragraph of your prose, not the demo script. A voice that sounds great reading thriller dialogue can feel wrong on a slow literary scene. Shortlist three, then narrate the same 500-word section with each one and listen back.

Want something nobody else has? Clone your own voice from a 30-second clean sample. Record in a quiet room, no music, no plosives.

Clone your own voice from a 30-second sample for a private narrator

Step 4: Narrate chapter by chapter (or batch the whole book)

You have two paths:

  1. Section-by-section with Instant Credits. Pay-as-you-go, credits never expire. Best when you're still tuning the voice or writing as you go.
  2. Whole Book batch queue (Studio subscription). Queue the entire manuscript, walk away, come back to a finished draft. Best once you've locked the narrator and want to move fast.
Narrate section-by-section or queue the whole book at once

Most authors run a hybrid: narrate chapter 1 section-by-section to confirm the voice is right, then batch the rest of the book overnight.

Step 5: Listen, fix, mark proofed

This is the actual work. Budget roughly 1.5× the runtime of your book for proofing — a 6-hour audiobook takes about 9 hours to listen through with a finger on the edit button.

For each section:

  • Listen at 1× speed, headphones on. Mispronunciations and weird emphasis hide at 1.5×.
  • When something's wrong, use Quick Fix to select just the broken phrase and regenerate that selection — not the whole section. Cheaper, faster, and the surrounding audio stays identical.
  • Mark the section Proofed once it's good. The flag is your save-your-place mechanism for a multi-day proof.
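
The 1.5× budget and the Proofed flag combine into a simple progress estimate. A minimal sketch, assuming you track each section's narrated length yourself; the `Section` class and its fields are hypothetical, not an export format from any tool.

```python
from dataclasses import dataclass


@dataclass
class Section:
    minutes: float          # narrated length of the section
    proofed: bool = False   # mirrors the Proofed flag


def remaining_proof_hours(sections: list[Section], factor: float = 1.5) -> float:
    """Estimate proofing time left: unproofed runtime times the 1.5x rule of thumb."""
    unproofed_minutes = sum(s.minutes for s in sections if not s.proofed)
    return unproofed_minutes / 60 * factor


if __name__ == "__main__":
    # A 6-hour book with nothing proofed yet.
    print(remaining_proof_hours([Section(minutes=360)]))
```

For the 6-hour example in this step, the estimate starts at 9 hours and shrinks as sections are marked Proofed.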

Step 6: Master and export

When every section is proofed, export. You'll get a choice of formats:

  • MP3 ZIP — one file per chapter, ACX-mastered (−23 to −18 dB RMS, −3 dB peak, ≤ −60 dB noise floor). Use this for retailer distribution.
  • M4B — single file with embedded chapter markers and your cover art. Use this for direct sales and review copies.

If you only have raw audio from somewhere else and need it mastered to spec, the standalone Distribution Ready tool will do it for you.
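
If you want to sanity-check a file against the numbers above before uploading, the arithmetic is simple. The sketch below assumes you have already decoded the audio into floating-point samples between -1.0 and 1.0 and that you supply a stretch of silence ("room tone") separately for the noise-floor check. The function names are hypothetical, and real mastering tools may measure these values differently, so treat the result as a rough indication only.

```python
import math

# Thresholds taken from the spec quoted in this step, in dB relative to full scale.
SPEC = {"rms_min": -23.0, "rms_max": -18.0, "peak_max": -3.0, "noise_max": -60.0}


def to_db(amplitude: float) -> float:
    """Convert a linear amplitude (1.0 = full scale) to decibels."""
    return 20 * math.log10(amplitude) if amplitude > 0 else float("-inf")


def rms_db(samples: list[float]) -> float:
    """Root-mean-square level of the samples, in dB."""
    return to_db(math.sqrt(sum(s * s for s in samples) / len(samples)))


def peak_db(samples: list[float]) -> float:
    """Loudest single sample, in dB."""
    return to_db(max(abs(s) for s in samples))


def check_chapter(speech: list[float], room_tone: list[float]) -> dict[str, bool]:
    """Report which of the three requirements the audio meets."""
    return {
        "rms": SPEC["rms_min"] <= rms_db(speech) <= SPEC["rms_max"],
        "peak": peak_db(speech) <= SPEC["peak_max"],
        "noise_floor": rms_db(room_tone) <= SPEC["noise_max"],
    }


def sine(amplitude: float, n: int = 44100, freq: float = 440.0,
         rate: int = 44100) -> list[float]:
    """Synthetic test tone standing in for decoded audio samples."""
    return [amplitude * math.sin(2 * math.pi * freq * i / rate) for i in range(n)]


if __name__ == "__main__":
    print(check_chapter(sine(0.15), sine(0.0005)))
```

A chapter that fails any of the three checks is a candidate for remastering before submission.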

Master raw audio to retailer specs with the Distribution Ready tool

Step 7: Distribute

Upload your MP3 ZIP to a distributor that accepts AI narration. AuthorVoices pushes through SelfPublishing.pro to 50+ retailers in a single submission — Findaway Voices' wider network, Kobo, Google Play, Apple Books, Spotify, Storytel, and others. You set the price once; they handle the per-retailer formatting.

For a deeper walkthrough of the upload-to-distribution arc, see How to Turn a Book Into an Audiobook. If you're specifically wondering about Audible, read How to Make an Audible Book (and Why You Probably Shouldn't) before you spend any time on it. Coming from an existing ebook? How to Convert an Ebook to an Audiobook covers the EPUB-specific quirks.

What it actually costs

For a 60,000-word novel (roughly 6.5 hours of audio):

  • Instant Credits — pay per section narrated, credits never expire. Best for one-off projects or short works.
  • Studio subscription — $49/$99/$149 per month with 17% off annual, includes the Whole Book batch queue and the 36 Studio-eligible premium narrators. Pays off above one full-length book per quarter.

Compare against $200–$400 per finished hour for a human narrator (so $1,300–$2,600 for the same book), and the math is hard to argue with — provided the voice you pick actually fits the book.
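
The human-narrator figures above are straightforward to reproduce for your own word count. A minimal sketch; the narration pace of about 154 words per minute is an assumption back-derived from the 60,000 words and 6.5 hours quoted here, and the function names are illustrative.

```python
def runtime_hours(word_count: int, words_per_minute: float = 154) -> float:
    """Estimate finished audio length from manuscript length."""
    return word_count / words_per_minute / 60


def human_cost_range(hours: float, low_rate: float = 200,
                     high_rate: float = 400) -> tuple[float, float]:
    """Cost range at a per-finished-hour rate, using the $200-$400 band above."""
    return hours * low_rate, hours * high_rate


if __name__ == "__main__":
    hours = round(runtime_hours(60_000), 1)
    print(hours, human_cost_range(hours))
```

Swap in your own word count and quoted rates to see where your book lands before committing to either route.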

Frequently asked

How do I make an audiobook from my own book?
Export your manuscript as EPUB or DOCX with proper chapter headings, upload it to an AI narration platform like AuthorVoices.ai, audition narrators against a sample paragraph of your prose, and narrate either section-by-section or as a Whole Book batch. Proof every section with headphones at 1× speed, fix mispronunciations with selection-level edits, then export an ACX-mastered MP3 ZIP for retailers or an M4B for direct sales. Plan on roughly 1.5× the finished runtime for proofing time.
How do I create my own audiobook without a recording studio?
Use AI narration. Upload your EPUB or DOCX, pick from a curated voice catalog (or clone your own from a 30-second clean sample), and let the system generate the audio. You skip the booth rental, the engineer, and the schedule juggling, but you take on the proofing work yourself. Budget an afternoon to audition voices, a day or two to narrate and queue, and 1.5× your book's runtime to proof. Output is mastered to retailer spec automatically.
How do I make an audiobook file that retailers will accept?
Most non-Audible retailers want ACX-mastered MP3s — one file per chapter, RMS between −23 and −18 dB, peaks at or below −3 dB, noise floor under −60 dB, and 192 kbps or higher. Some accept M4B with embedded chapter markers. AuthorVoices.ai exports both formats already mastered to spec. If you have raw audio from elsewhere, run it through the Distribution Ready tool to bring it into compliance before you upload.
How do I produce an audiobook for Audible or ACX?
Audible and ACX prohibit AI-narrated audiobooks unless they're produced through Audible's own internal tools. If you're using a third-party AI narrator, don't target ACX — your submission will be rejected at quality control. Distribute through the 50+ other retailers instead (Apple Books, Google Play, Kobo, Spotify, Storytel, Findaway's network). For human-narrated ACX submissions, you'd hire a narrator on the ACX marketplace, which is a different process entirely.
How long does it take to make an audiobook?
With AI narration, generation itself is fast — a 60,000-word novel batches in a few hours unattended. The bottleneck is proofing. Plan on roughly 1.5× the finished runtime to listen through carefully and catch mispronunciations, so a 6-hour audiobook takes around 9 hours of focused proof time. Compare with traditional production, where booking a narrator, recording, and post takes 4–8 weeks. Most authors using AI go from manuscript to retailer-ready file in under a week.
How can I make an audiobook recording sound natural?
Three things matter most. First, audition narrators against your actual prose, not the platform's demo script — a voice that fits a thriller can feel off in literary fiction. Second, edit the source text for the ear: spell tricky names phonetically, expand abbreviations, add commas where you want a breath. Third, use selection-level regeneration (Quick Fix) to repair single phrases instead of re-narrating whole sections, so the surrounding pacing stays intact. Avoid stacking too many edits in one section — small fixes preserve flow.