PDF Compression Deep Dive: Techniques Beyond Basic Optimization
Basic PDF compression only scratches the surface. Advanced techniques like object stream compression, image resampling strategies, and font deduplication can achieve dramatically smaller files for specific document types.
Key Takeaways
- A PDF file consists of objects: page descriptions, fonts, images, metadata, cross-references, and more.
- PDF supports both JPEG and JPEG 2000 internally.
- Full font embedding includes all glyphs (sometimes 10,000+ for CJK fonts).
- PDF 1.5+ supports object streams, which group multiple small objects into a single compressed stream.
- Linearization restructures a PDF so the first page loads immediately while remaining pages download in the background.
Understanding PDF Internal Structure
A PDF file consists of objects: page descriptions, fonts, images, metadata, cross-references, and more. Stream objects, which hold the bulk data such as images, embedded fonts, and page content, can each be compressed or left uncompressed. Understanding which objects contribute most to file size reveals where to focus optimization effort.
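One way to see where the bytes go is to tally object sizes by rough category. The sketch below is a minimal illustration, not a real parser. It assumes classic "N G obj ... endobj" syntax, cannot see objects packed into object streams, and can be fooled by binary stream data that happens to contain the bytes "endobj". The function names and the category heuristics are hypothetical choices made for this example.

```python
import re
from collections import Counter

# Matches classic indirect objects: "<num> <gen> obj ... endobj".
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)


def classify(body: bytes) -> str:
    """Guess a category from keys in the object body (heuristic only)."""
    if re.search(rb"/Subtype\s*/Image\b", body):
        return "image"
    if b"/Length1" in body:
        # /Length1 appears in the dictionaries of embedded Type 1 and
        # TrueType font programs; other font formats are not detected here.
        return "font program"
    if b"endstream" in body:
        return "other stream"
    return "other"


def size_by_category(pdf_bytes: bytes) -> Counter:
    """Sum the byte length of each object body, grouped by category."""
    totals = Counter()
    for match in OBJ_RE.finditer(pdf_bytes):
        body = match.group(3)
        totals[classify(body)] += len(body)
    return totals


def report(pdf_bytes: bytes) -> None:
    """Print categories from largest to smallest."""
    for category, size in size_by_category(pdf_bytes).most_common():
        print(f"{category:>14}: {size} bytes")
```

Calling report() on the raw bytes of a file gives a quick first impression of whether images, fonts, or other streams dominate. A dedicated PDF library would be needed for accurate numbers.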
Image Compression Strategies
JPEG vs JPEG 2000
PDF supports both JPEG (the DCTDecode filter) and, since PDF 1.5, JPEG 2000 (the JPXDecode filter) internally. JPEG 2000 typically produces 20-30% smaller files at equivalent quality but is slower to decode. For PDFs viewed primarily on screen in current viewers, JPEG 2000 is usually the better choice. For PDFs destined for print workflows with older RIPs, stick with standard JPEG.
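To check which of the two codecs a file already uses, one option is to count the image filters named in its stream dictionaries. In PDF syntax, JPEG data is marked with the DCTDecode filter and JPEG 2000 data with JPXDecode. The sketch below is a rough byte-level scan under simplifying assumptions: it only recognizes a filter that appears first after the /Filter key, so a JPEG filter listed second in a filter array is missed. The function name is a hypothetical one chosen for this example.

```python
import re

# "/Filter /DCTDecode", "/Filter/JPXDecode", or "/Filter [/DCTDecode ...]".
FILTER_RE = re.compile(rb"/Filter\s*\[?\s*/(DCTDecode|JPXDecode)\b")


def count_jpeg_filters(pdf_bytes: bytes) -> dict:
    """Count streams whose first filter is JPEG or JPEG 2000."""
    counts = {"DCTDecode": 0, "JPXDecode": 0}
    for match in FILTER_RE.finditer(pdf_bytes):
        counts[match.group(1).decode("ascii")] += 1
    return counts
```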
Resolution-Based Strategy
Not all images in a PDF need the same resolution. Apply different DPI targets based on image type:
- Photographs: 150 DPI for screen, 300 DPI for print
- Line art and diagrams: 300 DPI minimum (lower resolutions cause jagged edges)
- Logos and icons: Keep as vector when possible; if rasterized, maintain original resolution
- Background images: 72-96 DPI is usually sufficient
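The targets above can be turned into a small decision helper. The sketch below computes the effective DPI of an image from its pixel width and the width it occupies on the page (72 points per inch), then returns the pixel width to resample to. The category names, the choice of 96 DPI for backgrounds in both outputs, and the function names are assumptions made for this example. Image kinds missing from the table, such as logos, are left at their original resolution.

```python
# Target DPI per (image kind, output), following the list above.
# Kinds not listed here (for example "logo") are never downsampled.
TARGET_DPI = {
    ("photo", "screen"): 150,
    ("photo", "print"): 300,
    ("line_art", "screen"): 300,
    ("line_art", "print"): 300,
    ("background", "screen"): 96,  # assumed: upper end of the 72-96 range
    ("background", "print"): 96,   # assumed: same target for print
}


def effective_dpi(pixel_width: int, placed_width_pt: float) -> float:
    """DPI of an image as placed on the page (72 points = 1 inch)."""
    return pixel_width / (placed_width_pt / 72.0)


def resample_width(pixel_width: int, placed_width_pt: float,
                   kind: str, output: str) -> int:
    """Return the pixel width to resample to; never upsample."""
    target = TARGET_DPI.get((kind, output))
    if target is None:
        return pixel_width
    current = effective_dpi(pixel_width, placed_width_pt)
    if current <= target:
        return pixel_width
    return round(pixel_width * target / current)
```

For instance, a 3000-pixel-wide photograph placed 6 inches (432 points) wide has an effective resolution of 500 DPI, so resample_width(3000, 432, "photo", "screen") returns 900.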
Duplicate Image Detection
PDFs created by merging multiple documents often contain duplicate images, such as the same logo or header graphic embedded separately on each page. Deduplication replaces byte-identical images with references to a single shared object, sometimes reducing file size by 30-50% in merged documents.
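The core of deduplication is grouping streams by a content hash and pointing every duplicate at the first copy. The sketch below assumes the image streams have already been extracted into a mapping from object number to raw bytes. That mapping and both function names are hypothetical. A real tool would also need to compare the image dictionaries (dimensions, color space, filter) before merging, and then rewrite the references in the file.

```python
import hashlib
from typing import Dict


def deduplicate_streams(streams: Dict[int, bytes]) -> Dict[int, int]:
    """Map each object number to the lowest-numbered object with identical bytes."""
    first_seen: Dict[str, int] = {}
    remap: Dict[int, int] = {}
    for obj_num in sorted(streams):
        digest = hashlib.sha256(streams[obj_num]).hexdigest()
        # setdefault keeps the first object seen for each digest.
        remap[obj_num] = first_seen.setdefault(digest, obj_num)
    return remap


def bytes_saved(streams: Dict[int, bytes], remap: Dict[int, int]) -> int:
    """Total size of the streams that become redundant after remapping."""
    return sum(len(data) for num, data in streams.items() if remap[num] != num)
```

With a logo stored in objects 4, 9, and 15, the remap sends 9 and 15 to object 4, and the two redundant copies count toward the savings.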
Font Optimization
Subsetting vs Full Embedding
Full font embedding includes all glyphs (sometimes 10,000+ for CJK fonts). Subsetting includes only the glyphs actually used in the document. For example, a document using 50 Chinese characters from a font with 30,000 glyphs can save approximately 4 MB through subsetting, because nearly all of the font's glyph data is dropped.
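The arithmetic behind a figure like this can be approximated with a back-of-the-envelope estimate. The sketch below assumes glyph data dominates the font file and is spread evenly across glyphs, which real fonts only roughly satisfy. The 4.2 MB font size used in the test values is an assumed figure chosen for illustration, and the function names are hypothetical. Note also that the number of distinct characters is only a proxy for the number of glyphs a subset needs.

```python
def used_codepoints(text: str) -> set:
    """Distinct non-whitespace code points appearing in the document text."""
    return {ord(ch) for ch in text if not ch.isspace()}


def estimate_subset_savings(font_bytes: int, total_glyphs: int,
                            used_glyphs: int) -> int:
    """Bytes saved if font size scaled linearly with glyph count (rough model)."""
    if not 0 <= used_glyphs <= total_glyphs:
        raise ValueError("used_glyphs must be between 0 and total_glyphs")
    unused = total_glyphs - used_glyphs
    return font_bytes * unused // total_glyphs
```

Under this model, keeping 50 of 30,000 glyphs from an assumed 4,200,000-byte font saves 4,193,000 bytes, which is in line with the roughly 4 MB figure quoted above.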
Font Deduplication
Merged PDFs may embed the same font multiple times, once from each source document. Identifying and merging duplicate fonts can save 500 KB to 5 MB, depending on the fonts involved and the number of source documents.
Object Stream Compression
PDF 1.5+ supports object streams, which group multiple small non-stream objects into a single compressed stream. This is particularly effective for documents with many bookmarks, form fields, or annotations. The companion cross-reference stream feature, also introduced in PDF 1.5, compresses the cross-reference table itself. Object stream compression typically saves 5-15% on top of image compression.
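The benefit comes from letting the compressor see many similar small objects at once. The sketch below illustrates this with zlib (the Deflate implementation behind PDF's FlateDecode filter) on 200 made-up bookmark-like dictionaries. It is only a demonstration of the compression effect: a real object stream also carries a header of object numbers and offsets, which is omitted here, and the sample objects and function names are hypothetical.

```python
import zlib

# Hypothetical sample: 200 small, similar dictionaries, like outline entries.
objects = [
    b"<< /Title (Chapter %d) /Parent 2 0 R /Dest [%d 0 R /Fit] >>" % (i, 100 + i)
    for i in range(200)
]


def raw_size(objs) -> int:
    """Size when every object is stored uncompressed, as before PDF 1.5."""
    return sum(len(o) for o in objs)


def individually_compressed_size(objs) -> int:
    """Size if each object were deflated on its own (for comparison only)."""
    return sum(len(zlib.compress(o)) for o in objs)


def grouped_compressed_size(objs) -> int:
    """Size when all objects are concatenated and deflated as one stream."""
    return len(zlib.compress(b"\n".join(objs)))


if __name__ == "__main__":
    print("raw:       ", raw_size(objects))
    print("individual:", individually_compressed_size(objects))
    print("grouped:   ", grouped_compressed_size(objects))
```

Compressing each tiny object separately gains little or nothing, because Deflate has no repetition to exploit within a 60-byte dictionary and adds its own framing. Grouping exposes the repeated keys across objects, which is where the savings come from.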
Linearization (Fast Web View)
Linearization restructures a PDF so the first page loads immediately while remaining pages download in the background. It doesn't reduce file size but dramatically improves perceived load time for large documents served over the web. The linearization process adds a small overhead (1-3% file size increase) but is worthwhile for any PDF over 1 MB served online.
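Whether a file is already linearized can be checked cheaply, because a linearized PDF begins with a linearization parameter dictionary containing the /Linearized key within the first 1024 bytes. The sketch below performs that check and applies the 1 MB rule of thumb from the paragraph above. It only detects the marker and does not validate the hint tables, so a file modified after linearization may still pass. The function names and the choice of 1,000,000 bytes for "1 MB" are assumptions made for this example.

```python
ONE_MB = 1_000_000  # assumed interpretation of the "1 MB" rule of thumb


def looks_linearized(pdf_bytes: bytes) -> bool:
    """True if the /Linearized key appears in the first 1024 bytes."""
    return b"/Linearized" in pdf_bytes[:1024]


def worth_linearizing(pdf_bytes: bytes, served_online: bool) -> bool:
    """Apply the rule of thumb: large, web-served, not yet linearized."""
    return (
        served_online
        and len(pdf_bytes) > ONE_MB
        and not looks_linearized(pdf_bytes)
    )
```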