How to Extract Text From an Image or Scanned PDF (OCR Explained)

6 min read · June 2026

Try to select a sentence in a scanned document and nothing highlights. That is because a scan is a photograph of text, not text itself — to your computer it is a grid of coloured dots that happen to look like letters. Optical Character Recognition, or OCR, is the technology that reads those dots and turns them back into real words you can copy, search and edit. This is how it works and how to get a clean result out of it.

What OCR actually does

OCR scans an image looking for shapes that match known characters. It finds the regions that contain writing, separates them into lines and individual letters, compares each shape against a model of what every character looks like, and assembles the best guesses into words and sentences. Modern OCR adds a language model on top, so when a letter is ambiguous it leans on context — recognising that "c1ear" is almost certainly "clear." The output is plain, selectable text that you can paste into a document.
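Here is a minimal Python sketch of the context step. It does not show how any particular OCR engine works. It assumes a small, hypothetical table of easily confused characters and a tiny word list that stands in for the language model. It tries every swap of look-alike characters and keeps the first spelling that is a real word.

```python
from itertools import product

# Hypothetical table of characters OCR commonly confuses.
# Each entry lists the original character first, then plausible alternatives.
CONFUSABLE = {
    "1": "1li",
    "l": "l1i",
    "0": "0o",
    "o": "o0",
    "5": "5s",
    "s": "s5",
}

# Hypothetical vocabulary standing in for a real language model.
VOCABULARY = {"clear", "soil", "also", "look", "sell"}


def candidates(token: str):
    """Yield every spelling reachable by swapping confusable characters."""
    options = [CONFUSABLE.get(ch, ch) for ch in token.lower()]
    for combo in product(*options):
        yield "".join(combo)


def correct(token: str) -> str:
    """Return the first dictionary word among the candidates, else the token unchanged."""
    if token.lower() in VOCABULARY:
        return token
    for cand in candidates(token):
        if cand in VOCABULARY:
            return cand
    return token


print(correct("c1ear"))  # clear
```

Real engines score many candidates by probability instead of accepting the first dictionary hit. The number of candidates here also grows quickly with word length, so treat this as an illustration of the idea, not a usable corrector.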

When you need it

Copying text out of a screenshot, photo or scanned page.
Making a scanned PDF searchable so you can find a name or number inside it.
Digitising printed notes, receipts, or an old document into editable text.
Pulling a quote, address or code out of an image without retyping it.

How to extract the text

Open the Image to Text (OCR) tool.
Upload the image or photo containing the text — JPG, PNG and similar formats all work.
Let it process; the recognised text appears ready to copy.
Paste it into your document and skim for the occasional misread, especially in numbers and unusual names.
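When the recognised text is long, a short script can point your proofreading at the riskiest spots. This Python sketch is our own illustration, not a feature of any tool. It lists tokens that mix digits with letters that resemble digits, such as O, l, I, S, B and Z.

```python
import re

# Letters that OCR often swaps with digits: O/0, l/1, I/1, S/5, B/8, Z/2.
LOOKALIKE_LETTERS = set("OolISBZ")


def flag_for_review(text: str) -> list[str]:
    """Return whitespace-separated tokens that mix digits with digit-lookalike letters."""
    flagged = []
    for token in re.findall(r"\S+", text):
        has_digit = any(ch.isdigit() for ch in token)
        has_lookalike = any(ch in LOOKALIKE_LETTERS for ch in token)
        if has_digit and has_lookalike:
            flagged.append(token)
    return flagged


print(flag_for_review("Total: 1O5.00 paid to Cal1 on 2026-06-01"))
# ['1O5.00', 'Cal1']
```

The script will also flag legitimate codes, such as serial numbers that happen to mix the two. It narrows down where to look but does not decide what is wrong.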

If your source is a scanned PDF rather than a single image, convert the pages to images first with PDF to JPG, then run each through OCR.

Getting the cleanest possible result

OCR is only as good as the image you feed it. The quality of the input decides almost everything:

Sharp and in focus. A blurry photo gives blurry guesses. If a letter is hard for you to make out, expect the OCR to struggle with it too.
Straight, not skewed. Rotate or de-skew a tilted scan first — crooked lines confuse the line-detection step.
Good contrast. Dark text on a light background reads best. Faded print and busy backgrounds hurt accuracy.
High enough resolution. Tiny text in a low-resolution image has too few pixels per letter to identify confidently.
Clean printed type beats handwriting. Standard fonts are recognised almost perfectly; cursive handwriting is far harder and may need correcting by hand.
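Two of these points can be made concrete.

Contrast is often improved by binarising the image, which turns every pixel pure black or pure white. Otsu's method is one standard way to choose the cut-off grey level automatically.

Resolution comes down to simple arithmetic. A point is 1/72 of an inch, so a font's nominal size in pixels is its point size multiplied by the scan's DPI and divided by 72.

The Python sketch below shows both. It assumes the image has already been loaded as a flat list of grey values from 0 to 255. Loading the image is not shown.

```python
def otsu_threshold(pixels: list[int]) -> int:
    """Pick the grey level that best separates dark ink from light paper (Otsu's method).

    Pixels at or below the returned value count as ink, pixels above it as background.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    sum_dark = 0.0   # running sum of grey levels at or below t
    weight_dark = 0  # running count of pixels at or below t
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_dark += hist[t]
        sum_dark += t * hist[t]
        if weight_dark == 0:
            continue
        weight_light = total - weight_dark
        if weight_light == 0:
            break
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        # Between-class variance: largest when the two groups are far apart.
        between_var = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarize(pixels: list[int], threshold: int) -> list[int]:
    """Map ink to black (0) and everything else to white (255)."""
    return [0 if p <= threshold else 255 for p in pixels]


def em_height_px(point_size: float, dpi: float) -> float:
    """Nominal font size in pixels; one point is 1/72 inch."""
    return point_size * dpi / 72


print(em_height_px(10, 72))   # 10.0
print(em_height_px(12, 300))  # 50.0
```

At 72 DPI, 10 pt text gets only 10 pixels for the whole em. Individual lowercase letters get fewer still, which is why small print in a low-resolution image is hard to read reliably.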

Where OCR still struggles

Even a good engine has weak spots. Decorative or very stylised fonts, low-light phone snaps, multi-column layouts that get read across instead of down, and tables where alignment carries meaning can all trip it up. Treat OCR output as a strong first draft rather than a finished one: it saves you from retyping a whole page, but a quick proofread — paying special attention to digits, where a misread 0/O or 1/l matters most — is always worth the minute it takes.
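The multi-column problem is about reading order. Sorting recognised words strictly top to bottom, then left to right, interleaves lines from neighbouring columns. The fix is to sort by column first and only then by position within the column.

The Python sketch below assumes you already have each word's position and know where each column starts. The `Word` type and the `column_edges` list are hypothetical, and detecting the columns is not shown.

```python
from typing import NamedTuple


class Word(NamedTuple):
    x: int     # left edge in pixels
    y: int     # top edge in pixels
    text: str


def reading_order(words: list[Word], column_edges: list[int]) -> str:
    """Join words column by column, each column top to bottom, instead of straight across."""

    def column_of(word: Word) -> int:
        # Index of the column: how many column start positions lie at or left of the word.
        return sum(1 for edge in column_edges if word.x >= edge)

    ordered = sorted(words, key=lambda w: (column_of(w), w.y, w.x))
    return " ".join(w.text for w in ordered)


words = [
    Word(10, 10, "Left"), Word(320, 10, "Right"),
    Word(60, 10, "column"), Word(380, 10, "side."),
    Word(10, 40, "continues."),
]
print(reading_order(words, [300]))  # Left column continues. Right side.
```

The sketch also assumes that words on the same line share one y value. Real bounding boxes jitter by a few pixels, so in practice words are grouped into lines with some tolerance before sorting.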

OCR closes the gap between a picture of text and text you can actually use. Feed it a clean, sharp, straight image of clearly printed words and it will hand back almost exactly what is on the page — ready to copy, search and edit, no retyping required.
