How OCR Works: Extracting Text from Images Explained
A complete guide to Optical Character Recognition — how Tesseract OCR works, what affects accuracy, use cases for image-to-text conversion, and tips for getting the best results from any image.
February 22, 2025
You have a scanned contract, a photo of a whiteboard, or a screenshot full of text you need to edit. Copy-pasting from an image is impossible — images store pixels, not characters. This is exactly the problem that Optical Character Recognition (OCR) solves. Once a niche technology used only in expensive enterprise software, OCR is now available free, directly in your browser, with no file uploads and no account required. This guide explains how OCR works, when to use it, and how to get the best results.
What is OCR?
Optical Character Recognition is the process of detecting and extracting text from images, converting visual representations of characters into machine-readable, editable text. When you scan a document, photograph a sign, or take a screenshot of a PDF, the result is a raster image — a grid of colored pixels. OCR analyzes that grid and identifies which pixel patterns correspond to which letters, numbers, and symbols.
OCR has existed since the 1950s, when it was used to automate postal sorting and banking. Early systems required special fonts and controlled lighting. Modern OCR engines like Tesseract use machine learning and can recognize printed text in a wide range of fonts and in more than 100 languages, with high accuracy on clean images. Handwriting remains much harder, as the limitations section below explains.
How Tesseract OCR Works
Tesseract is one of the most widely used open-source OCR engines. Originally developed by Hewlett-Packard in the 1980s, it was open-sourced in 2005, and Google sponsored its development for more than a decade starting in 2006. Today it is maintained by an open-source community and is built into countless third-party applications and libraries.
Tesseract processes an image in several stages:
- Binarisation — The image is converted to black and white. Tesseract picks a threshold automatically (by default using Otsu's method): pixels brighter than the threshold become white (background), and darker pixels become black (text). This strips color noise and prepares the image for analysis.
- Layout analysis — Tesseract identifies regions of the image that likely contain text, separating them from images, tables, and empty space. It detects columns, paragraphs, and line boundaries.
- Line and word segmentation — Within text regions, the engine identifies individual lines and words by detecting gaps between character groups.
- Character recognition — In Tesseract 4 and later, an LSTM neural network reads each segmented text line as a sequence and outputs the most likely characters, rather than classifying characters one at a time as the older engine did. The model was trained on large amounts of text rendered in many fonts.
- Post-processing — The recognized characters are assembled into words and sentences. Tesseract uses a dictionary to improve accuracy — if the letter-level recognition is ambiguous, real words are preferred over nonsense strings.
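The binarisation step is easiest to see in a minimal sketch of Otsu's method, which Tesseract uses by default to choose the threshold. The sketch assumes the image is a plain list of rows of 8-bit grayscale values, a simplification for illustration. It picks the threshold that best separates dark "ink" pixels from light "paper" pixels by maximising the variance between the two groups.

```python
from typing import List

Gray = List[List[int]]  # rows of 8-bit grayscale values (0 = black, 255 = white)


def otsu_threshold(gray: Gray) -> int:
    """Return the threshold t that maximises between-class variance.

    Pixels <= t form the dark class (ink); pixels > t form the light class.
    """
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * count for i, count in enumerate(hist))

    best_t, best_between = 0, -1.0
    n_dark, sum_dark = 0, 0
    for t in range(256):
        n_dark += hist[t]
        sum_dark += t * hist[t]
        n_light = total - n_dark
        if n_dark == 0 or n_light == 0:
            continue  # one class is empty, so there is no split to score
        mean_dark = sum_dark / n_dark
        mean_light = (sum_all - sum_dark) / n_light
        between = n_dark * n_light * (mean_dark - mean_light) ** 2
        if between > best_between:
            best_t, best_between = t, between
    return best_t


def binarise(gray: Gray) -> Gray:
    """Map dark pixels to 0 (text) and light pixels to 255 (background)."""
    t = otsu_threshold(gray)
    return [[0 if v <= t else 255 for v in row] for row in gray]


page = [
    [200, 210, 205, 198],
    [40, 220, 35, 215],
    [30, 45, 208, 202],
]
print(otsu_threshold(page))  # 45
print(binarise(page))  # [[255, 255, 255, 255], [0, 255, 0, 255], [0, 0, 255, 255]]
```

Because the threshold comes from the image's own histogram, the same code adapts to dark grey text on cream paper as readily as to black on white. It cannot handle uneven lighting, because it picks one threshold for the whole page.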
Alongside the text, Tesseract produces confidence scores on a 0-100 scale for each recognized word (and can report them per character), plus a mean confidence for the whole page. That mean is typically above 90 for clean, printed text and significantly lower for handwriting or poor-quality images.
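A page-level figure like this is typically an average of the word scores. The sketch below does not call any particular API. It assumes a hypothetical list of (word, confidence) pairs, computes their mean, and flags words under an arbitrary cutoff of 60 for manual review. Low-scoring words are a good place to start proofreading.

```python
from statistics import mean
from typing import List, Tuple

# Hypothetical word-level OCR output: (recognised text, confidence on a 0-100 scale).
# Entries with empty text stand in for non-word regions and are ignored.
Words = List[Tuple[str, float]]


def page_confidence(words: Words) -> float:
    """Average confidence over entries that contain text."""
    scores = [conf for text, conf in words if text.strip()]
    return mean(scores) if scores else 0.0


def needs_review(words: Words, cutoff: float = 60.0) -> List[str]:
    """Words scoring below an arbitrary cutoff, worth checking by hand."""
    return [text for text, conf in words if text.strip() and conf < cutoff]


ocr_words = [("Invoice", 96.0), ("total:", 91.0), ("$1,2S0", 42.0), ("", -1.0)]
print(round(page_confidence(ocr_words), 1))  # 76.3
print(needs_review(ocr_words))  # ['$1,2S0']
```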
Browser-Based OCR: How It Works Without a Server
Traditional OCR services upload your image to a server, process it remotely, and return the result. This creates two problems: latency (your image travels over the internet) and privacy (your document is processed on someone else's hardware).
Modern browsers support WebAssembly — a binary instruction format that lets complex native applications run directly in the browser at near-native speed. Tesseract has been compiled to WebAssembly, meaning the full OCR engine — including the LSTM neural network model — runs inside your browser tab. Your image is processed locally, never leaving your device.
The trade-off is initial load time: the first time you run OCR, the browser downloads the Tesseract WASM binary and the language model (roughly 10-20MB for English, depending on the model variant). Subsequent runs reuse the cached engine and model and are much faster.
What Affects OCR Accuracy?
OCR accuracy is not uniform. Several factors have a major impact on how well the engine can read your image:
Image Resolution
Resolution is the single biggest factor. OCR engines need enough pixels to distinguish character shapes, and the recommended minimum for scanned documents is 300 DPI (dots per inch). At this resolution, lowercase letters in typical 10-12pt body text are roughly 20-30 pixels tall, which is enough for reliable recognition. At 72 DPI (a traditional screen resolution), the same letters might be only 5-7 pixels tall, making accurate recognition difficult.
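The figures above follow from simple arithmetic. A point is 1/72 inch, so a font's em size in pixels is point_size / 72 × DPI. This sketch assumes the x-height (the height of a lowercase "x") is about half the em size, a rough typical ratio that varies from font to font. The upscale helper's default target of 20 pixels is an assumed rule of thumb, matching the x-height that 10pt text has at 300 DPI.

```python
def x_height_px(point_size: float, dpi: float, x_height_ratio: float = 0.5) -> float:
    """Approximate lowercase x-height in pixels.

    One point is 1/72 inch, so the em size in pixels is point_size / 72 * dpi.
    x_height_ratio is an assumed typical value; real fonts vary.
    """
    em_px = point_size / 72 * dpi
    return em_px * x_height_ratio


def upscale_factor(current_px: float, target_px: float = 20.0) -> float:
    """How much to enlarge an image so its x-height reaches target_px (never shrink)."""
    return max(1.0, target_px / current_px)


for dpi in (72, 150, 300):
    print(dpi, round(x_height_px(10, dpi), 1))
# 72 5.0
# 150 10.4
# 300 20.8
print(round(upscale_factor(x_height_px(10, 72)), 2))  # 4.0
```

Enlarging a low-resolution image cannot add detail that was never captured, so rescanning at a higher DPI is always preferable to upscaling after the fact.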
If you are photographing a document with a phone, move closer or crop tightly to the text to maximize effective resolution. Screenshots of digital text are usually sharp enough, but very small on-screen text may benefit from zooming in before you take the screenshot.
Contrast
OCR requires high contrast between text and background. Black text on white paper is ideal. Problems arise with:
- Colored backgrounds (especially mid-tones)
- Shadows falling across text
- Watermarked or stamped documents where marks overlap text
- Faded or lightly printed text
- Text on textured surfaces (wood, fabric, brick)
If your image has poor contrast, try increasing contrast or converting to grayscale in any basic image editor before running OCR.
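Both fixes are simple per-pixel operations. This sketch converts RGB to grayscale using the common ITU-R BT.601 luma weights. It then applies a linear contrast stretch that maps the darkest and lightest values present onto 0 and 255. Representing the image as nested lists of pixel tuples is an assumption for illustration; an image editor applies the same idea to real files.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]


def to_grayscale(pixels: List[List[RGB]]) -> List[List[int]]:
    """Convert RGB pixels to 8-bit grayscale with ITU-R BT.601 luma weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in pixels]


def stretch_contrast(gray: List[List[int]]) -> List[List[int]]:
    """Linearly map the image's darkest..lightest range onto 0..255."""
    lo = min(min(row) for row in gray)
    hi = max(max(row) for row in gray)
    if hi == lo:  # flat image: nothing to stretch
        return [row[:] for row in gray]
    scale = 255 / (hi - lo)
    return [[round((v - lo) * scale) for v in row] for row in gray]


# Faded grey text (100) on grey paper (180), plus one mid-tone pixel (120).
faded = to_grayscale([[(100, 100, 100), (180, 180, 180), (120, 120, 120)]])
print(stretch_contrast(faded))  # [[0, 255, 64]]
```

A global stretch like this helps with uniformly faded text. It cannot fix a shadow that darkens only part of the page; that needs a local, region-by-region adjustment.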
Font and Text Type
Standard printed fonts (serif, sans-serif) at 10pt or larger achieve the highest accuracy. Accuracy degrades with:
- Decorative or script fonts
- Italicized text (especially in low-resolution images)
- Very small text (below 8pt at 300 DPI)
- Text at an angle or curve (common in photos of physical objects)
- Handwriting — especially cursive
Orientation and Skew
Text should be horizontal and straight. Tesseract tolerates some skew (a few degrees of rotation) but performs best on level text. If your scanned document is tilted, rotating it before OCR significantly improves results. Many modern scanners and phone camera apps offer automatic deskewing.
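One classic way to estimate skew is the projection-profile method. It tries a range of small rotations and keeps the one where the dark pixels pile into the fewest, densest rows. This sketch works on a hypothetical list of dark-pixel coordinates, such as the black pixels of a binarised image. It illustrates the technique only and is not necessarily how Tesseract or any particular scanner app deskews.

```python
import math
from collections import Counter
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) of a dark pixel; y grows downwards


def profile_score(points: List[Point], angle_deg: float) -> int:
    """Undo a tilt of angle_deg and score how sharply points pile into rows.

    Summing squared row counts rewards a few dense rows, which is what
    level text lines produce once the tilt is removed.
    """
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    rows = Counter(round(y * cos_a - x * sin_a) for x, y in points)
    return sum(count * count for count in rows.values())


def estimate_skew(points: List[Point], max_angle: float = 5.0, step: float = 0.25) -> float:
    """Try angles in [-max_angle, max_angle] and return the best-scoring one."""
    n = int(max_angle / step)
    candidates = [i * step for i in range(-n, n + 1)]
    return max(candidates, key=lambda angle: profile_score(points, angle))


# Synthetic example: three text lines drawn with a 2-degree tilt.
tilt = math.tan(math.radians(2.0))
points = [(x, y0 + x * tilt) for y0 in (10, 30, 50) for x in range(200)]
print(estimate_skew(points))  # 2.0
```

Once the angle is known, rotating the image by the same amount in the opposite direction levels the text. The step size trades precision for speed.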
Language Selection
OCR engines are trained on specific language datasets. Selecting the correct language tells Tesseract which character set to expect and which dictionary to use for post-processing. Running English OCR on a German document is likely to misread umlauts (ä, ö, ü) and produce poor word-level corrections, because German words will not match the English dictionary. Always select the language that matches your document.
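A toy version of dictionary-based correction shows why the language matters. This sketch is not Tesseract's actual algorithm. It keeps words found in the selected word list and otherwise snaps a word to the nearest listed word by Levenshtein edit distance, provided it is at most one edit away. The tiny word lists and the one-edit limit are made-up assumptions.

```python
from typing import Collection


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one dynamic-programming row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def correct(word: str, dictionary: Collection[str], max_dist: int = 1) -> str:
    """Keep known words; otherwise snap to the closest dictionary word if close enough."""
    if word in dictionary or not dictionary:
        return word
    # sorted() makes tie-breaking deterministic (alphabetical).
    best = min(sorted(dictionary), key=lambda w: edit_distance(word, w))
    return best if edit_distance(word, best) <= max_dist else word


# Hypothetical: the umlaut in "Tür" was lost during recognition.
german = {"Haus", "Tür", "über"}
english = {"Tour", "Tub", "Turn"}
print(correct("Tur", german))   # Tür
print(correct("Tur", english))  # Tour (a confident but wrong "fix")
```

With the wrong word list, the corrector does not just fail to help; it replaces the misread word with a plausible word from the wrong language.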
Practical Use Cases for OCR
Digitizing Scanned Documents
Physical documents — contracts, letters, forms, invoices — become fully searchable and editable text through OCR. This is the original and most common use case. Once digitized, documents can be indexed, searched, and archived without manual transcription.
Extracting Text from Screenshots
When you need to copy text from an image where copy-paste is disabled — a locked PDF, a web app that blocks selection, an error message captured in a screenshot — OCR extracts it in seconds. This is especially useful for technical error messages, software dialogs, and mobile app screenshots.
Processing Business Cards
Photographing a business card and running OCR extracts name, email, phone number, and address into editable text, ready to paste into a contacts app or CRM. This eliminates manual transcription of contact details.
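Once the card's text has been extracted, pulling out the email address and phone number is mostly pattern matching. The regular expressions below are deliberately simple assumptions covering common formats, not validators for every address or international number. Treating the first line as the name is also a hypothetical heuristic that many card layouts will break.

```python
import re
from typing import Dict, Optional

# Deliberately simple patterns; they cover common formats, not every valid one.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d \t().-]{6,}\d")  # digits with spaces, dots, dashes, parens


def parse_card(text: str) -> Dict[str, Optional[str]]:
    """Pull a name, email and phone number out of OCR'd business-card text."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    email = EMAIL_RE.search(text)
    phone = PHONE_RE.search(text)
    return {
        "name": lines[0] if lines else None,  # assumption: name is the first line
        "email": email.group(0) if email else None,
        "phone": phone.group(0) if phone else None,
    }


card = """Jane Doe
Senior Engineer
jane.doe@example.com
+1 (555) 010-2345"""
print(parse_card(card))
# {'name': 'Jane Doe', 'email': 'jane.doe@example.com', 'phone': '+1 (555) 010-2345'}
```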
Accessibility and Archiving
Many historical documents exist only as scans or photographs. OCR transforms them into accessible text that screen readers can read aloud for visually impaired users, and that search engines can index for researchers.
Receipts and Expense Tracking
Photographing receipts and running OCR extracts line items, amounts, and dates that can be pasted directly into expense reports or accounting software, eliminating manual data entry.
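Turning receipt text into structured data usually comes down to matching lines that end in a price. This sketch makes several assumptions. Each line is a description followed by an amount with a "." decimal separator, dates use one of two formats, and one line is labelled TOTAL. It uses Decimal to avoid floating-point rounding when summing money. Checking that the items add up to the printed total is a cheap way to catch OCR misreads such as "S" for "5".

```python
import re
from decimal import Decimal
from typing import List, Optional, Tuple

# Assumed layout: description, whitespace, then an amount such as 3.49 at line end.
LINE_RE = re.compile(r"^(?P<label>.+?)\s+(?P<amount>\d+\.\d{2})$")
# Assumed date formats: 2025-02-22 or 22/02/2025.
DATE_RE = re.compile(r"\b(?:\d{4}-\d{2}-\d{2}|\d{2}/\d{2}/\d{4})\b")

Items = List[Tuple[str, Decimal]]


def parse_receipt(text: str) -> Tuple[Items, Optional[Decimal], Optional[str]]:
    """Return (line items, printed total, date) from OCR'd receipt text."""
    items: Items = []
    total: Optional[Decimal] = None
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # header, date or unparseable line
        label, amount = m.group("label"), Decimal(m.group("amount"))
        if label.upper() == "TOTAL":
            total = amount
        else:
            items.append((label, amount))
    date = DATE_RE.search(text)
    return items, total, date.group(0) if date else None


receipt = """Corner Cafe
2025-02-22
Coffee 3.50
Croissant 2.75
Orange juice 4.20
TOTAL 10.45"""
items, total, date = parse_receipt(receipt)
print(date, total, sum(amount for _, amount in items) == total)  # 2025-02-22 10.45 True
```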
Multilingual Documents
When working with documents in foreign languages, OCR extracts the text so it can be pasted into a translation tool. This is far faster than manually typing characters from an unfamiliar alphabet.
OCR Limitations to Understand
OCR is not perfect. Understanding its limitations helps you manage expectations and prepare images appropriately:
- Handwriting accuracy is low — Tesseract is primarily trained on printed text. As a rough guide, clear printed block letters may reach 70-80% accuracy, while cursive is often 40-60% or lower. Dedicated handwriting recognition engines (like those in Google Keep or Apple Notes) are purpose-trained for this use case and perform significantly better.
- Complex layouts may lose structure — Multi-column layouts, tables, and forms can have their structure flattened during extraction. The text is present but the column or cell alignment may not be preserved.
- Mathematical and scientific notation is difficult — Superscripts, subscripts, fractions, and special symbols (∑, ∫, ≤) are poorly handled by general-purpose OCR.
- Right-to-left languages require correct settings — Arabic and Hebrew read right to left. Select the correct language to ensure the engine handles text direction properly.
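When tables are flattened, one partial remedy is to rebuild rows from word bounding boxes, which OCR engines can generally report alongside the text. This sketch assumes a hypothetical list of boxes and groups words with close vertical centres into the same row. It then sorts each row left to right and inserts a tab wherever the horizontal gap is wide, so table columns survive a paste into a spreadsheet. The 1.5× gap threshold is an arbitrary choice. Multi-column page layouts would also need their columns detected before rows.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """A recognised word and its bounding box in pixels (hypothetical input format)."""
    text: str
    left: int
    top: int
    width: int
    height: int


def centre_y(box: Box) -> float:
    return box.top + box.height / 2


def rebuild_rows(boxes: List[Box], gap_factor: float = 1.5) -> List[str]:
    """Group word boxes into rows and join them, using tabs for wide gaps.

    A gap wider than gap_factor times the word height is treated as a
    column break, so table cells stay separated.
    """
    rows: List[List[Box]] = []
    for box in sorted(boxes, key=lambda b: b.top):
        anchor = rows[-1][0] if rows else None
        # Same row if vertical centres are within half a word height.
        if anchor is not None and abs(centre_y(box) - centre_y(anchor)) < anchor.height / 2:
            rows[-1].append(box)
        else:
            rows.append([box])

    lines = []
    for row in rows:
        row.sort(key=lambda b: b.left)
        parts = [row[0].text]
        for prev, cur in zip(row, row[1:]):
            gap = cur.left - (prev.left + prev.width)
            parts.append(("\t" if gap > gap_factor * prev.height else " ") + cur.text)
        lines.append("".join(parts))
    return lines


words = [
    Box("Qty", 300, 10, 30, 12), Box("Item", 10, 10, 40, 12),
    Box("Blue", 10, 30, 40, 12), Box("pens", 55, 31, 40, 12), Box("12", 300, 30, 20, 12),
]
print(rebuild_rows(words))  # ['Item\tQty', 'Blue pens\t12']
```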
Privacy: Why Browser-Based OCR Matters
Many OCR services — including free cloud-based tools — upload your documents to remote servers for processing. For everyday images this is fine, but for sensitive content it creates real risk: legal documents, financial records, medical reports, and personal identification documents should not be uploaded to third-party servers you do not control.
Browser-based OCR avoids this risk: the image is processed locally in your browser and never transmitted. This makes it safe to use with confidential documents, and it also means the tool can work without an internet connection once the page and language model have loaded.
Getting the Best Results: A Quick Checklist
- Use the highest resolution image available — aim for at least 300 DPI for scans
- Ensure strong contrast between text and background
- Straighten the image before processing — no tilt or skew
- Select the correct language from the dropdown
- Crop tightly to the text area to reduce noise from surrounding content
- Avoid heavy shadows, glare, or reflections
- For handwritten text, print clearly in block capitals for best results
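Tight cropping can also be automated when the background is plain. This sketch finds the bounding box of every pixel darker than a threshold in a grayscale image and crops to it. The image is assumed to be nested lists of values. A small margin is kept because text touching the very edge of an image can also hurt recognition. The threshold of 128 and the default 10-pixel margin are arbitrary starting points. On noisy photos, a single stray dark pixel will widen the box.

```python
from typing import List, Optional, Tuple

Gray = List[List[int]]  # rows of 8-bit grayscale values


def ink_bbox(gray: Gray, threshold: int = 128) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right), inclusive, of pixels darker than threshold."""
    rows = [y for y, row in enumerate(gray) if any(v < threshold for v in row)]
    if not rows:
        return None
    cols = [x for x in range(len(gray[0])) if any(row[x] < threshold for row in gray)]
    return rows[0], cols[0], rows[-1], cols[-1]


def crop_to_text(gray: Gray, threshold: int = 128, margin: int = 10) -> Gray:
    """Crop to the dark pixels plus a margin, clamped to the image edges."""
    box = ink_bbox(gray, threshold)
    if box is None:
        return [row[:] for row in gray]  # nothing dark found; leave image as is
    top, left, bottom, right = box
    top, left = max(0, top - margin), max(0, left - margin)
    bottom = min(len(gray) - 1, bottom + margin)
    right = min(len(gray[0]) - 1, right + margin)
    return [row[left:right + 1] for row in gray[top:bottom + 1]]


page = [[255] * 6 for _ in range(6)]
for y, x in [(2, 3), (2, 4), (3, 3), (3, 4)]:
    page[y][x] = 0  # a small dark blob standing in for text
for row in crop_to_text(page, margin=1):
    print(row)
# [255, 255, 255, 255]
# [255, 0, 0, 255]
# [255, 0, 0, 255]
# [255, 255, 255, 255]
```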
Summary
OCR converts images of text into machine-readable, editable text. Modern engines like Tesseract use LSTM neural networks trained on large amounts of text in many fonts, and they achieve high accuracy on clean, printed documents. The same engine now runs directly in your browser via WebAssembly, making high-quality OCR available for free, with no uploads, no accounts, and your documents kept on your own device. For best results, use high-resolution, high-contrast images and select the language that matches your document.