Drop in any PDF and our engine detects exact, text-identical, and visually similar duplicate pages, each with a confidence score. Preview every group, choose which copy to keep, and download your optimized PDF in seconds.
Four steps. Under two minutes. Files never leave your browser.
Drag and drop any PDF or click to browse. Works entirely in your browser — no server, no upload, complete privacy.
Select Fast for instant exact-duplicate detection, Balanced for text-identical pages, or Deep Scan to catch visually similar and scanned duplicates.
Preview every duplicate group with page thumbnails and confidence scores. Click any page for side-by-side comparison, then choose which copy to keep.
Click Download and get your deduplicated PDF instantly. No watermarks, no quality loss. Your original file on disk is untouched.
Fast (pixel hash), Balanced (text comparison), and Deep Scan (perceptual hash) modes give you full control over detection accuracy vs speed.
Every duplicate gets a confidence percentage (100% for exact matches, down to 80% for visually similar pages) so you know exactly how certain the detection is.
Pages are grouped by similarity using a union-find algorithm, so a page that appears 10 times is shown as a single group of 10, not as 45 separate pairs.
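The grouping step above can be sketched as follows. This is a minimal illustration, not PDFcrest's actual code: the `UnionFind` class and the `groupDuplicates` helper are hypothetical names, and pages are assumed to be identified by zero-based index.

```typescript
// Minimal union-find (disjoint-set) over page indices 0..size-1.
class UnionFind {
  private parent: number[] = [];

  constructor(size: number) {
    // Every page starts as its own representative.
    for (let i = 0; i < size; i++) this.parent.push(i);
  }

  // Find the representative of x, compressing the path on the way.
  find(x: number): number {
    let root = x;
    while (this.parent[root] !== root) root = this.parent[root];
    while (this.parent[x] !== root) {
      const next = this.parent[x];
      this.parent[x] = root;
      x = next;
    }
    return root;
  }

  // Merge the sets containing a and b; the lower index becomes the root.
  union(a: number, b: number): void {
    const ra = this.find(a);
    const rb = this.find(b);
    if (ra !== rb) this.parent[Math.max(ra, rb)] = Math.min(ra, rb);
  }
}

// Hypothetical helper: turn matched page pairs into duplicate groups.
function groupDuplicates(pageCount: number, pairs: Array<[number, number]>): number[][] {
  const uf = new UnionFind(pageCount);
  for (const [a, b] of pairs) uf.union(a, b);

  const buckets: number[][] = [];
  for (let i = 0; i < pageCount; i++) buckets.push([]);
  for (let page = 0; page < pageCount; page++) buckets[uf.find(page)].push(page);

  // Only sets with two or more pages count as duplicate groups.
  return buckets.filter((group) => group.length > 1);
}
```

Feeding in all 45 pairwise matches among 10 identical pages yields one group of 10, which is what the interface would then show as a single card.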
Click any thumbnail to open a full-resolution side-by-side comparison modal. Visually verify duplicates before making removal decisions.
PDF.js and pdf-lib run entirely inside your browser. Your documents never leave your device — no uploads, no cloud, no tracking, no storage.
Fast mode uses FNV-1a hashing to scan hundreds of pages in seconds. Deep Scan's dHash comparison processes 500-page PDFs in under 2 minutes.
Everything you need to know about why duplicate pages appear in PDF documents and how our three-stage detection engine finds and removes them without touching the rest of your file.
Duplicate pages sneak into PDF documents more often than most people realise. The most common cause is merging files — when you combine two PDFs that share a cover page, a terms-and-conditions section, or a repeating template, every shared page appears twice in the merged output. This is especially common in legal, financial, and academic workflows.
Scanning errors are another major source. An automatic document feeder occasionally pulls the same sheet twice, producing visual near-duplicates that look almost identical but differ at the pixel level due to scanner noise. Standard duplicate finders miss these — PDFcrest's Deep Scan mode catches them.
Template reuse in assembled reports and proposals means a disclaimer, executive summary, or appendix can appear verbatim in multiple included sections. Even converting a Word document to PDF can re-insert section headers or footers as full pages across a multi-section document.
When you remove duplicate pages, PDFcrest uses pdf-lib to copy each remaining page's raw PDF stream directly into a new document. This is not a re-render — no page is redrawn, re-compressed, or converted to an image. Every font, vector graphic, form field, annotation, link, and embedded metadata entry survives intact.
The output file is structurally identical to your original except that the unwanted pages are gone. File size shrinks roughly in proportion to the pages removed. No watermarks are added. No quality is lost.
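As a rough sketch of the selection that feeds the copy step described above, the function below works out which pages remain once the user has picked a copy to keep in each group. The `DuplicateGroup` shape and the `pagesToKeep` name are assumptions made for illustration; page numbers are zero-based.

```typescript
// Hypothetical shape: one duplicate group plus the page the user chose to keep.
interface DuplicateGroup {
  pages: number[]; // zero-based indices of all pages in the group
  keep: number;    // the one index from `pages` that stays in the output
}

// Returns the zero-based indices of the pages that survive, in original order.
function pagesToKeep(pageCount: number, groups: DuplicateGroup[]): number[] {
  const drop: boolean[] = [];
  for (let i = 0; i < pageCount; i++) drop.push(false);

  // Mark every group member except the chosen copy for removal.
  for (const group of groups) {
    for (const page of group.pages) {
      if (page !== group.keep) drop[page] = true;
    }
  }

  const kept: number[] = [];
  for (let page = 0; page < pageCount; page++) {
    if (!drop[page]) kept.push(page);
  }
  return kept;
}
```

With pdf-lib, a list like this could be passed to the destination document's `copyPages(sourceDoc, indices)`, and each returned page appended with `addPage`, so that pages are copied rather than re-rendered.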
Fast mode renders each page at low resolution and computes an FNV-1a hash of its pixels. If two hashes match, the pages are treated as pixel-identical at that resolution and reported at 100% confidence; a false positive would require a hash collision, which is very unlikely. It processes 500 pages in under 30 seconds.
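A minimal sketch of the pixel-hash comparison described above. It assumes the 32-bit variant of FNV-1a (the page does not state the hash width) and assumes each page has already been rendered to a byte buffer, for example RGBA data read from a canvas. `fnv1a32` and `exactDuplicatePairs` are illustrative names.

```typescript
// 32-bit FNV-1a parameters.
const FNV_OFFSET_BASIS = 0x811c9dc5;

// Hash a byte buffer with 32-bit FNV-1a: XOR the byte in, then multiply by the prime.
function fnv1a32(bytes: Uint8Array): number {
  let hash = FNV_OFFSET_BASIS;
  for (let i = 0; i < bytes.length; i++) {
    hash ^= bytes[i];
    // Multiply by the FNV prime 16777619 (2^24 + 2^8 + 0x93) modulo 2^32,
    // written as shifts and adds so the result stays exact in JavaScript numbers.
    hash = (hash + (hash << 1) + (hash << 4) + (hash << 7) + (hash << 8) + (hash << 24)) >>> 0;
  }
  return hash >>> 0;
}

// Pair each page with the first earlier page whose render has the same hash.
function exactDuplicatePairs(renders: Uint8Array[]): Array<[number, number]> {
  const firstSeen: { [hash: number]: number } = {};
  const pairs: Array<[number, number]> = [];
  for (let page = 0; page < renders.length; page++) {
    const h = fnv1a32(renders[page]);
    if (h in firstSeen) pairs.push([firstSeen[h], page]);
    else firstSeen[h] = page;
  }
  return pairs;
}
```

Because only one hash per page is stored, the scan is a single pass over the document, which fits the speed claims made for this mode.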
Use when: merging digital PDFs, Word-to-PDF exports, programmatically assembled documents.
Balanced mode adds text extraction and normalisation via PDF.js. Two pages with the same text content are detected as duplicates even if their fonts, font sizes, or margins differ, and are reported at 97% confidence. Catches reformatted legal clauses, template-derived pages, and style-changed sections.
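The text comparison above might look roughly like this. The normalisation rules shown (lower-casing and whitespace collapsing) are assumptions, since the page does not list the exact rules, and the function names are illustrative. Each input string is assumed to be one page's text, for instance the `str` fields of the items returned by PDF.js `getTextContent()` joined together.

```typescript
// Assumed normalisation: ignore case and collapse all runs of whitespace.
function normalisePageText(raw: string): string {
  return raw.toLowerCase().replace(/\s+/g, " ").trim();
}

// Pair each page with the first earlier page that has the same normalised text.
function textIdenticalPairs(pageTexts: string[]): Array<[number, number]> {
  const keys = pageTexts.map((text) => normalisePageText(text));
  const pairs: Array<[number, number]> = [];
  for (let page = 0; page < keys.length; page++) {
    // Blank or image-only pages carry no text, so they are never matched here.
    if (keys[page] === "") continue;
    const first = keys.indexOf(keys[page]);
    if (first < page) pairs.push([first, page]);
  }
  return pairs;
}
```

Skipping empty pages matters for scanned documents: without that check, every image-only page would match every other one, which is the case the visual comparison is meant to handle instead.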
Use when: legal contracts, academic papers, business reports assembled from templates.
Deep Scan adds a perceptual difference hash (dHash): a 64-bit visual fingerprint that captures page structure while being resistant to scanner noise, JPEG compression, and slight brightness variation. Pages are compared by Hamming distance; your sensitivity slider controls the threshold. Matches are reported at 80–95% confidence, depending on similarity.
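A minimal sketch of a 64-bit dHash and the Hamming-distance comparison described above. It assumes each page has already been downscaled to a 9x8 grayscale grid (72 values, row by row), which is the usual input for a 64-bit difference hash; how the tool downsamples is not stated. The function names and the `maxDistance` parameter (standing in for the sensitivity slider) are illustrative.

```typescript
const DHASH_WIDTH = 9;  // 9 columns give 8 left/right comparisons per row
const DHASH_HEIGHT = 8; // 8 rows x 8 comparisons = 64 bits

// Returns the 64-bit fingerprint as two unsigned 32-bit halves [high, low].
function dHash(gray: Uint8Array): [number, number] {
  if (gray.length !== DHASH_WIDTH * DHASH_HEIGHT) {
    throw new Error("expected a 9x8 grayscale grid");
  }
  let hi = 0;
  let lo = 0;
  let bit = 0;
  for (let y = 0; y < DHASH_HEIGHT; y++) {
    for (let x = 0; x < DHASH_WIDTH - 1; x++) {
      // Each bit records whether a pixel is brighter than its right-hand neighbour.
      const b = gray[y * DHASH_WIDTH + x] > gray[y * DHASH_WIDTH + x + 1] ? 1 : 0;
      if (bit < 32) hi = ((hi << 1) | b) >>> 0;
      else lo = ((lo << 1) | b) >>> 0;
      bit++;
    }
  }
  return [hi, lo];
}

// Count set bits by repeatedly clearing the lowest one.
function popcount32(v: number): number {
  let count = 0;
  while (v !== 0) {
    v &= v - 1;
    count++;
  }
  return count;
}

// Number of differing bits between two fingerprints (0 to 64).
function hammingDistance(a: [number, number], b: [number, number]): number {
  return popcount32(a[0] ^ b[0]) + popcount32(a[1] ^ b[1]);
}

// maxDistance is a hypothetical stand-in for the sensitivity threshold.
function isNearDuplicate(a: [number, number], b: [number, number], maxDistance: number): boolean {
  return hammingDistance(a, b) <= maxDistance;
}
```

Because the hash stores only the direction of each brightness change, adding the same offset to every pixel leaves the fingerprint unchanged, which is why uniform brightness shifts from a scanner do not break the match.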
Use when: scanned physical documents, faxed files, photo-captured PDFs, or documents with OCR artefacts.
PDFcrest's duplicate page remover is built on PDF.js and pdf-lib, two industry-standard open-source libraries that run directly inside your browser. When you select a PDF, it is read into browser memory and never transmitted over the internet. Your document exists only on your device.
Free, private, and instant. No signup. No uploads. No limits.