AI & Script | By HitCue | June 19, 2026 | 6 min read

OCR a Script PDF: When Your Script Is Scanned and Nothing Copies

If you can't select text in your script PDF, it's not a document — it's an image. OCR turns those scanned pages into readable, copyable text. But running OCR alone won't give you a rehearsal-ready script. The output usually has mangled character names, missing scene headings, and dialogue merged with stage directions. What you do before OCR — and the cleanup pass after — decides whether you end up with a usable script or 90 pages of garbled text. Here's how to check whether your PDF needs OCR, prep the scan, verify the output, and clean character names before rehearsal study.

How to Tell If Your Script PDF Needs OCR

The fastest test: try to select a line of text on any page. If your cursor or finger drags a box across the page instead of highlighting individual words, the PDF is a scan. Three other signs:

  • The file size is much larger than expected. A scanned PDF is usually far heavier than a text-based PDF of the same page count because every page is stored as an image.
  • Text looks crooked or has speckles in the background. Scanned photocopies show scanner artifacts, page edges, and slight tilt. Text-based PDFs don't.
  • A keyword search returns nothing. Search for a character's name. If the PDF finds zero matches even though the name is clearly on multiple pages, there is no text layer to search.

If any of these are true, OCR is the only path from your PDF to a script you can actually study.
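
The "no text layer" check can also be scripted. The sketch below is a minimal illustration, not a definitive test: it assumes you have already pulled the text of each page into a list of strings with whatever PDF library you use, and the function name `needs_ocr` and both thresholds are hypothetical values chosen for the example.

```python
def needs_ocr(page_texts, min_chars_per_page=20, max_empty_ratio=0.5):
    """Guess whether a PDF is a scan, given the text extracted from each page.

    page_texts: list of strings (or None), one entry per page.
    Both thresholds are illustrative assumptions, not tuned values.
    """
    if not page_texts:
        # Nothing was extracted at all, so there is no text layer to use.
        return True
    # Count pages whose extracted text is missing or too short to be real script.
    empty_pages = sum(
        1 for text in page_texts
        if len((text or "").strip()) < min_chars_per_page
    )
    return empty_pages / len(page_texts) > max_empty_ratio


if __name__ == "__main__":
    scanned = ["", "", None]
    print(needs_ocr(scanned))
```

A script with a few genuinely blank pages still passes, because only the share of empty pages matters, not any single page.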

What to Do Before You Run OCR

Most OCR mistakes happen because the input was bad, not because the tool was bad. A few minutes of prep prevents an hour of cleanup.

  1. Strip out non-script pages. Remove covers, programme inserts, blank pages, and casting pages. OCR processes every page, and any page that isn't script slows the job and adds noise to the output.
  2. Straighten skewed pages. Most PDF apps have a "deskew" or "auto-rotate" function. A tilted page lowers OCR accuracy noticeably — especially on character names in all caps.
  3. Increase contrast on faint scans. If you received a photocopy of a photocopy, run a contrast filter first. Faded text trips OCR more than any other factor.
  4. Check resolution. Use the clearest scan you can get. Low-resolution pages produce broken words, missing punctuation, and character names that need more cleanup.

If your director sent the script via WhatsApp or as phone photos, treat it like a scan and run OCR — even if the file looks like a normal PDF when you open it.

A Step-by-Step OCR Workflow for Actors

This works the same whether you use Adobe Acrobat, Preview on Mac, an online tool, or a dedicated OCR app. The order is what matters.

  1. Save a working copy of the original PDF. Never OCR your only copy. If something goes wrong with the conversion, you want the source untouched.
  2. Run OCR on the working copy. Most tools let you choose between "searchable PDF" (text layer added to the existing images) and "text export" (plain text only). For rehearsal use, the searchable PDF is more useful — it keeps the layout while making text selectable.
  3. Verify the output on three test pages. Open the OCR'd PDF, try to select a line of dialogue, and paste it into a notes app. If the text comes through clean, the OCR worked. If you see boxes, missing letters, or random characters, the input quality needs to improve before continuing.
  4. Export plain text or upload the OCR'd PDF to a parsing tool. Once you trust the OCR output, you can either keep it as a searchable PDF for reading or pull the text out for restructuring into acts, scenes, and characters.
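
The three-page spot check in step 3 can be roughed out in code once you have pasted or exported the sample text. This is a sketch under stated assumptions: the punctuation set, the 2% threshold, and the names `suspicious_ratio` and `sample_looks_clean` are all hypothetical, and the check only flags stray symbols, not plausible-looking slips such as "JUL1A".

```python
# Punctuation you would expect in a script; anything else that is not a
# letter, digit, or whitespace is treated as a possible OCR artifact.
EXPECTED_PUNCTUATION = set(".,;:!?'\"()[]-\u2014\u2018\u2019\u201c\u201d")


def suspicious_ratio(text):
    """Return the share of characters that look like OCR debris (0.0 to 1.0)."""
    if not text:
        return 1.0  # an empty sample means nothing was recognised
    odd = sum(
        1 for ch in text
        if not (ch.isalnum() or ch.isspace() or ch in EXPECTED_PUNCTUATION)
    )
    return odd / len(text)


def sample_looks_clean(samples, threshold=0.02):
    """True when every sampled passage stays under the debris threshold."""
    return all(suspicious_ratio(sample) <= threshold for sample in samples)


if __name__ == "__main__":
    pages = ["JULIA: Tomorrow then.", "[She exits.]"]
    print(sample_looks_clean(pages))
```

If the check fails on any sample, go back to the prep steps and improve the scan before continuing.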

When the text is clean, the next step is turning it into a rehearsal-ready structure: acts, scenes, character lines, cue points. A script-aware parser can take that clean OCR output and break it into navigable acts and scenes. Afterward, and before your first off-book pass, run a character-name pass to fix any OCR slips that made it through.

This sits inside the broader digital script workflow for actors, which starts with file conversion and ends with character-level practice.

Cleanup Pass: Common OCR Slips to Fix First

Even good OCR makes the same handful of mistakes on theatre scripts. Catch them once, and the rest of the script becomes navigable.

OCR slip | What it looks like | Quick fix
Character name mis-spelled | "JUL1A" instead of "JULIA", or "JU LIA" with a stray space | Find-replace by character; verify with a line count per name
Scene headings missing | "ACT II SCENE 3" merged into the previous dialogue paragraph | Add a line break before the heading; mark with a consistent format
Stage directions merged with dialogue | "[She exits.] Tomorrow then." on one line | Split into two lines; brackets keep the direction visible
Page numbers inside the text | A scene that ends with "— 47 —" mid-line | Search for digit patterns and remove; common in older typeset scripts
Em dashes converted to hyphens | "I just — wait" becomes "I just - wait" | Fix broken pauses manually instead of running a global find-replace
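
Three of the structural fixes above lend themselves to simple pattern matching. The sketch below is one possible approach, assuming headings look like "ACT II SCENE 3", stage directions sit in square brackets, and page numbers appear between dashes; the function names and patterns are hypothetical and will need adjusting to your script's layout.

```python
import re

# A page number wrapped in dashes, e.g. "\u2014 47 \u2014" or "- 47 -", not glued to a word.
PAGE_NUMBER = re.compile(r"\s*(?<!\w)[\u2014-]\s*\d{1,3}\s*[\u2014-](?!\w)\s*")
# A scene heading that follows other text on the same line.
SCENE_HEADING = re.compile(r"(?<=\S)[ \t]+(ACT\s+[IVX]+\s+SCENE\s+\d+)")
# A bracketed stage direction followed by more text on the same line.
DIRECTION = re.compile(r"(\[[^\]]*\])[ \t]+(?=\S)")


def strip_page_numbers(text):
    """Remove dash-wrapped page numbers, leaving untouched lines as they were."""
    cleaned = []
    for line in text.split("\n"):
        new_line, count = PAGE_NUMBER.subn(" ", line)
        cleaned.append(new_line.strip() if count else line)
    return "\n".join(cleaned)


def break_before_headings(text):
    """Put each merged scene heading on its own new line."""
    return SCENE_HEADING.sub(r"\n\1", text)


def split_directions(text):
    """Move the text after a bracketed direction onto the next line."""
    return DIRECTION.sub(r"\1\n", text)


def clean_structure(text):
    """Apply the three fixes in order: page numbers, headings, directions."""
    return split_directions(break_before_headings(strip_page_numbers(text)))


if __name__ == "__main__":
    print(clean_structure("[She exits.] Tomorrow then. \u2014 47 \u2014 ACT II SCENE 3"))
```

Broken dashes are deliberately left alone here, in line with the table's advice to fix pauses by hand rather than with a global replace.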

Don't try to fix every error in one pass. Fix character names first — that's what every downstream tool will use to organise lines. Scene headings second. The rest can be cleaned as you encounter them in study.
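
For the character-name pass, a line count per name shows quickly whether any mangled variants are still hiding in the text. The following is a minimal sketch that assumes speaker lines take the form "NAME: dialogue" and that you supply the cast list yourself; the digit-to-letter map and the names `normalize_name` and `count_lines_per_name` are illustrative assumptions, not part of any particular tool.

```python
import re
from collections import Counter

# Digit-for-letter confusions in all-caps names (an assumed, incomplete list).
OCR_DIGIT_FIXES = str.maketrans({"1": "I", "0": "O", "5": "S"})

# Assumes each speech starts with an all-caps name followed by a colon.
SPEAKER = re.compile(r"^([A-Z0-9][A-Z0-9 .']*?):", re.MULTILINE)


def normalize_name(raw, cast):
    """Map an OCR'd speaker name onto the cast list when the match is obvious."""
    fixed = raw.translate(OCR_DIGIT_FIXES).strip()
    if fixed in cast:
        return fixed
    # Handle stray spaces such as "JU LIA" by comparing with spaces removed.
    squeezed = fixed.replace(" ", "")
    for name in cast:
        if name.replace(" ", "") == squeezed:
            return name
    return fixed  # unknown names are left as-is for manual review


def count_lines_per_name(text, cast):
    """Count speeches per normalized speaker name."""
    counts = Counter()
    for match in SPEAKER.finditer(text):
        counts[normalize_name(match.group(1), cast)] += 1
    return counts


if __name__ == "__main__":
    script = "JULIA: Hello.\nJUL1A: Wait.\nJU LIA: Tomorrow then.\nMARK: Fine.\n"
    print(count_lines_per_name(script, {"JULIA", "MARK"}))
```

Any name in the result that is not in your cast list is a candidate for a manual find-replace.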

If a long stretch of text comes out unreadable, that section probably had worse scan quality than the rest. Rescan or rephotograph just those pages and re-run OCR on the patch — it's faster than fighting a bad input line by line.

Do it in HitCue

  • Automatic AI parsing: Takes sufficiently cleaned OCR text and turns it into a navigable script with acts, scenes, and character lines.
  • Character resolution: Catches the names OCR mangled (broken across lines, mis-spelled, duplicated) and lets you merge them into the right character without losing any dialogue.
  • Line editing: Repair the OCR slips you spot while studying — merged stage directions, missing words, broken dashes — inline, without re-uploading the whole script.

Upload your OCR'd script to HitCue, let Automatic AI parsing do the structural work, then sweep up the remaining OCR slips with Character resolution and Line editing before your first off-book pass.

Related Articles

Extract Dialogue From a PDF Script (So You Can Practice by Character)

The fastest way to practice lines isn't reading the script from top to bottom. It's isolating your character's dialogue — just your cues and your responses — so every session stays focused on your part. The problem: PDF scripts aren't organized that way. You scroll pages of mixed dialogue, scan for your character's name, lose the cue context, and end up reading everything instead of practicing.

7 min read

Convert a Script PDF to Text: Clean Structure for Rehearsal (Including Scanned PDFs)

To convert a script PDF to text you can actually use in rehearsal, the first step is knowing what kind of PDF you have — embedded text or scanned image. Each type needs a different approach, and skipping that check is why standard converters leave you with a mess: character names merged with dialogue, scene headings missing, stage directions inline with spoken lines. This guide gives you the full workflow, from identifying your file type through verifying the output before you start drilling.

6 min read

Community Theatre Tips for Actors: Fast Prep When Rehearsal Time Is Limited

In community theatre, time is the constraint you can't fix. You get two, maybe three rehearsal evenings a week, and everything in the community theatre prep cycle — learning your lines, tracking blocking notes, running cues — has to happen around a day job. The mistake most actors make is treating that as a willpower problem. It's not. It's a systems problem. The four systems below won't give you more hours. They'll make the hours you have stop going to waste.

7 min read