If you can't select text in your script PDF, it's not a document — it's an image. OCR turns those scanned pages into readable, copyable text. But running OCR alone won't give you a rehearsal-ready script. The output usually has mangled character names, missing scene headings, and dialogue merged with stage directions. What you do before OCR — and the cleanup pass after — decides whether you end up with a usable script or 90 pages of garbled text. Here's how to check whether your PDF needs OCR, prep the scan, verify the output, and clean character names before rehearsal study.
How to Tell If Your Script PDF Needs OCR
The fastest test: try to select a line of text on any page. If your cursor or finger drags a box across the page instead of highlighting individual words, the PDF is a scan. Three other signs:
- The file size is much larger than expected. Scanned PDFs are often far heavier than text-based ones because every page is stored as an image.
- Text looks crooked or has speckles in the background. Scanned photocopies show scanner artifacts, page edges, and slight tilt. Text-based PDFs don't.
- A keyword search returns nothing. Search for a character's name. If the PDF finds zero matches even though the name is clearly on multiple pages, there is no text layer to search.
If the selection test fails or any of these signs show up, the PDF most likely has no text layer, and OCR is the practical route from your PDF to a script you can actually study.
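If you'd rather check a batch of files than click through each one, the selection and keyword tests can be approximated in code. This is a minimal Python sketch, assuming you've already pulled each page's text out with a PDF library of your choice (that step isn't shown). The names `needs_ocr` and `page_texts` and the 20-character threshold are illustrative choices, not part of any tool.

```python
def needs_ocr(page_texts, search_term=None, min_chars=20):
    """Guess whether a PDF is a scan, given the text extracted from each page.

    page_texts: list of strings, one per page (empty string if nothing came out).
    search_term: optional character name you can see on the pages by eye.
    min_chars: assumed threshold below which a page counts as having no text.
    """
    if not page_texts:
        return True  # nothing extractable at all

    # Pages with almost no extractable text are most likely stored as images.
    empty_pages = sum(1 for text in page_texts if len(text.strip()) < min_chars)
    mostly_empty = empty_pages > len(page_texts) / 2

    # A name that is visibly on the page but never found suggests no text layer.
    name_missing = False
    if search_term:
        needle = search_term.lower()
        name_missing = not any(needle in text.lower() for text in page_texts)

    return mostly_empty or name_missing
```

A result of `True` only means "probably a scan". The manual selection test is still the quickest confirmation.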
What to Do Before You Run OCR
Most OCR mistakes happen because the input was bad, not because the tool was bad. A few minutes of prep prevents an hour of cleanup.
- Strip out non-script pages. Remove covers, programme inserts, blank pages, and casting pages. OCR processes every page, and any page that isn't script slows the job and adds noise to the output.
- Straighten skewed pages. Many scanning and OCR tools have a "deskew" or "auto-rotate" function. A tilted page lowers OCR accuracy noticeably, especially on character names in all caps.
- Increase contrast on faint scans. If you received a photocopy of a photocopy, run a contrast filter first. Faded text is one of the most common causes of OCR errors.
- Check resolution. Use the clearest scan you can get; if you control the scanner settings, 300 DPI is a commonly recommended baseline for OCR. Low-resolution pages produce broken words, missing punctuation, and character names that need more cleanup.
If your director sent the script via WhatsApp or as phone photos, treat it like a scan and run OCR — even if the file looks like a normal PDF when you open it.
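To make the contrast step less abstract, here is a rough sketch of what a basic contrast filter does: it stretches the page's darkest and lightest grays out to full black and white. It assumes the page is already in memory as rows of grayscale values from 0 (black) to 255 (white). `stretch_contrast` is an illustrative name, and real scanning apps typically use more robust methods than this simple min/max stretch.

```python
def stretch_contrast(pixels):
    """Linearly stretch grayscale values so the darkest becomes 0 and the lightest 255.

    pixels: list of rows, each a list of ints in 0..255 (an assumed in-memory format).
    """
    flat = [value for row in pixels for value in row]
    if not flat:
        return []
    low, high = min(flat), max(flat)
    span = high - low
    if span == 0:
        # A perfectly flat page has no contrast to stretch; return a copy unchanged.
        return [list(row) for row in pixels]

    # Integer arithmetic with rounding to the nearest value.
    return [[((value - low) * 255 + span // 2) // span for value in row] for row in pixels]
```

On a faded page where the "black" ink is really mid-gray, this pushes the ink back toward black and the paper toward white, which is the gap OCR relies on.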
A Step-by-Step OCR Workflow for Actors
This works the same whether you use Adobe Acrobat, Preview on Mac, an online tool, or a dedicated OCR app. The order is what matters.
- Save a working copy of the original PDF. Never OCR your only copy. If something goes wrong with the conversion, you want the source untouched.
- Run OCR on the working copy. Most tools let you choose between "searchable PDF" (text layer added to the existing images) and "text export" (plain text only). For rehearsal use, the searchable PDF is more useful — it keeps the layout while making text selectable.
- Verify the output on three test pages. Open the OCR'd PDF, try to select a line of dialogue, and paste it into a notes app. If the text comes through clean, the OCR worked. If you see boxes, missing letters, or random characters, the input quality needs to improve before continuing.
- Export plain text or upload the OCR'd PDF to a parsing tool. Once you trust the OCR output, you can either keep it as a searchable PDF for reading or pull the text out for restructuring into acts, scenes, and characters.
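The first and third steps above are easy to script. The sketch below copies the original before anything touches it and picks three pages to spot-check. Choosing the first, middle, and last page is an assumption of this example, since any three pages spread across the script would do. `make_working_copy` and `pick_test_pages` are hypothetical helper names.

```python
import shutil
from pathlib import Path


def make_working_copy(original, suffix="_working"):
    """Copy the original PDF next to itself and return the copy's path.

    The original file is never modified; point your OCR tool at the copy.
    """
    original = Path(original)
    working = original.with_name(f"{original.stem}{suffix}{original.suffix}")
    if working.exists():
        raise FileExistsError(f"{working} already exists; not overwriting it")
    shutil.copy2(original, working)
    return working


def pick_test_pages(page_count):
    """Choose up to three 1-based page numbers: first, middle, and last."""
    if page_count <= 0:
        return []
    candidates = [1, (page_count + 1) // 2, page_count]
    return sorted(set(candidates))
```

For a 90-page script, `pick_test_pages(90)` suggests pages 1, 45, and 90. Select a line of dialogue on each and paste it into a notes app, as described above.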
When the text is clean, the next step is turning it into a rehearsal-ready structure — acts, scenes, character lines, cue points. A script-aware parser can take that clean OCR output and break it into navigable acts and scenes; run a character-name pass afterward to fix any OCR slips that made it through before your first off-book pass.
This sits inside the broader digital script workflow for actors, which starts with file conversion and ends with character-level practice.
Cleanup Pass: Common OCR Slips to Fix First
Even good OCR makes the same handful of mistakes on theatre scripts. Catch them once, and the rest of the script becomes navigable.
| OCR slip | What it looks like | Quick fix |
|---|---|---|
| Character name misspelled | "JUL1A" instead of "JULIA", or "JU LIA" with a stray space | Find-replace by character; verify with a line count per name |
| Scene headings missing | "ACT II SCENE 3" merged into the previous dialogue paragraph | Add a line break before the heading; mark with a consistent format |
| Stage directions merged with dialogue | "[She exits.] Tomorrow then." on one line | Split into two lines; brackets keep the direction visible |
| Page numbers inside the text | A scene that ends with "— 47 —" mid-line | Search for digit patterns and remove; common in older typeset scripts |
| Em dashes converted to hyphens | "I just — wait" becomes "I just - wait" | Fix broken pauses manually instead of running a global find-replace |
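Three of the fixes in the table are mechanical enough to script if you've exported plain text: merged scene headings, merged stage directions, and stray page numbers. The sketch below is one way to do it. The patterns are assumptions about how a script might be typeset (headings like "ACT II SCENE 3", directions in square brackets, page numbers set between dashes), so adjust them to match yours and check the result by eye.

```python
import re

# Patterns are assumptions about how this particular script is typeset.
HEADING = re.compile(r"\s*\b(ACT\s+[IVXLC]+(?:,?\s+SCENE\s+\d+)?)\b\s*")
PAGE_NUMBER = re.compile(r"\s*[—–-]\s*\d{1,3}\s*[—–-]\s*")
DIRECTION = re.compile(r"\s*(\[[^\]]*\])\s*")


def clean_line(line):
    """Return the cleaned line(s) for one line of OCR text, as a list of strings."""
    # Drop page-number markers such as "— 47 —".
    line = PAGE_NUMBER.sub(" ", line)
    # Put scene headings and bracketed stage directions on their own lines.
    line = HEADING.sub(r"\n\1\n", line)
    line = DIRECTION.sub(r"\n\1\n", line)
    return [part.strip() for part in line.split("\n") if part.strip()]
```

For example, `clean_line("[She exits.] Tomorrow then.")` gives two lines, the direction and then the dialogue. Dashes inside dialogue are left alone because the page-number pattern only fires when digits sit between two dashes.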
Don't try to fix every error in one pass. Fix character names first — that's what every downstream tool will use to organise lines. Scene headings second. The rest can be cleaned as you encounter them in study.
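For the character-name pass, a small script can do the find-replace and the line count together. This sketch assumes you already have a cast list and the speaker names as OCR produced them. It uses Python's standard `difflib` for fuzzy matching. The look-alike map and the 0.8 cutoff are illustrative guesses, and `fix_name` and `lines_per_character` are hypothetical helpers.

```python
import difflib
import re
from collections import Counter

# Digits OCR commonly confuses with capital letters (an illustrative, incomplete map).
LOOKALIKES = str.maketrans({"1": "I", "0": "O", "5": "S"})


def fix_name(raw, cast, cutoff=0.8):
    """Map an OCR'd speaker name onto the known cast list, or return it unchanged."""
    candidate = re.sub(r"\s+", "", raw.upper().translate(LOOKALIKES))
    # Compare with spaces removed so "JU LIA" and "LADY ANNE" both resolve.
    squashed = {name.replace(" ", ""): name for name in cast}
    match = difflib.get_close_matches(candidate, list(squashed), n=1, cutoff=cutoff)
    return squashed[match[0]] if match else raw


def lines_per_character(speaker_names, cast):
    """Count speeches per corrected name, to spot names that are still mangled."""
    return Counter(fix_name(name, cast) for name in speaker_names)
```

With a cast of `["JULIA", "MARCO", "LADY ANNE"]`, both "JUL1A" and "JU LIA" resolve to "JULIA". Any name in the final count that isn't in your cast list is a slip the matcher couldn't place, so fix those by hand.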
If a long stretch of text comes out unreadable, that section probably had worse scan quality than the rest. Rescan or rephotograph just those pages and re-run OCR on the patch — it's faster than fighting a bad input line by line.
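To find those stretches without reading all 90 pages, you can score each page by how much of it looks like real words. A minimal sketch, assuming you have the OCR text split per page. The 0.3 threshold is a guess to tune against your own script, and `garble_ratio` and `pages_to_rescan` are illustrative names.

```python
PUNCTUATION = ".,;:!?\"“”()[]—–-"


def looks_like_word(token):
    """True if the token is letters only, allowing apostrophes and hyphens inside."""
    core = token.replace("'", "").replace("’", "").replace("-", "")
    return core.isalpha()


def garble_ratio(page_text):
    """Fraction of tokens on a page that don't look like ordinary words."""
    tokens = [t.strip(PUNCTUATION) for t in page_text.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return 1.0  # an empty page is treated as fully unreadable
    bad = sum(1 for t in tokens if not looks_like_word(t))
    return bad / len(tokens)


def pages_to_rescan(page_texts, threshold=0.3):
    """Return 1-based page numbers whose garble ratio exceeds the assumed threshold."""
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if garble_ratio(text) > threshold
    ]
```

The returned page numbers are candidates for a rescan, not a verdict. Look at each flagged page before deciding.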
Do it in HitCue
- Automatic AI parsing: Takes sufficiently cleaned OCR text and turns it into a navigable script with acts, scenes, and character lines.
- Character resolution: Catches the names OCR mangled (broken across lines, misspelled, duplicated) and lets you merge them into the right character without losing any dialogue.
- Line editing: Repair the OCR slips you spot while studying — merged stage directions, missing words, broken dashes — inline, without re-uploading the whole script.
Upload your OCR'd script to HitCue, let Automatic AI parsing do the structural work, then sweep up the remaining OCR slips with Character resolution and Line editing before your first off-book pass.



