Who is this guide for?

This guide is designed for beginner-level users and takes about 2 minutes to read.

PDF Compression Deep Dive: Techniques Beyond Basic Optimization

Basic PDF compression only scratches the surface. Advanced techniques like object stream compression, image resampling strategies, and font deduplication can achieve dramatically smaller files for specific document types.

Key Takeaways

A PDF file consists of objects: page descriptions, fonts, images, metadata, cross-references, and more.
PDF supports both JPEG and JPEG 2000 internally.
Full font embedding includes all glyphs (sometimes 10,000+ for CJK fonts).
PDF 1.5+ supports object streams, which group multiple small objects into a single compressed stream.
Linearization restructures a PDF so the first page loads immediately while remaining pages download in the background.

Understanding PDF Internal Structure

A PDF file consists of objects: page descriptions, fonts, images, metadata, cross-references, and more. Stream objects such as page content, images, and embedded fonts can each be compressed with a filter (for example Flate) or left uncompressed. Knowing which objects contribute most to file size shows where optimization effort should be focused.
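To see where the bytes go, a rough tally of object sizes can help. The following is a minimal sketch, assuming a classic PDF whose objects are written as plain "N G obj ... endobj" blocks; the function names and categories are illustrative, and a real audit would use a proper PDF library rather than a regular expression.

```python
import re
from collections import Counter

# Matches classic "N G obj ... endobj" bodies. Objects packed inside
# object streams (PDF 1.5+) are invisible to this simple scan, and binary
# stream data that happens to contain "endobj" will confuse it.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)
IMAGE_RE = re.compile(rb"/Subtype\s*/Image\b")


def classify(body: bytes) -> str:
    """Heuristic category for one object body (not a real PDF parser)."""
    if IMAGE_RE.search(body):
        return "image stream"
    if b"endstream" in body:
        return "other stream"  # page content, embedded fonts, metadata, ...
    return "non-stream"


def size_by_category(pdf_bytes: bytes) -> Counter:
    """Sum the byte length of object bodies per heuristic category."""
    totals = Counter()
    for match in OBJ_RE.finditer(pdf_bytes):
        body = match.group(3)
        totals[classify(body)] += len(body)
    return totals
```

Reading a file in binary mode and passing its bytes to size_by_category gives a first impression of whether images, other streams, or small non-stream objects dominate the file.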

Image Compression Strategies

JPEG vs JPEG 2000

PDF supports both JPEG and JPEG 2000 internally. JPEG 2000 often produces 20-30% smaller files at equivalent quality but is slower to decode. For PDFs viewed primarily on screen, JPEG 2000 is usually the better choice, provided the target viewers handle it well. For PDFs destined for print workflows with older RIPs, stick with standard JPEG.
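When auditing a file, it helps to know which codec each image already uses. Inside a PDF, JPEG data is stored with the DCTDecode filter and JPEG 2000 data with the JPXDecode filter. The sketch below guesses the matching filter from the leading bytes of an image; the function name is hypothetical and the check covers only these two formats.

```python
from typing import Optional

JPEG_MAGIC = b"\xff\xd8\xff"                                # JPEG start-of-image marker
JP2_MAGIC = bytes.fromhex("0000000c6a5020200d0a870a")       # JP2 signature box
J2K_MAGIC = b"\xff\x4f\xff\x51"                             # raw JPEG 2000 codestream


def pdf_filter_for(image_bytes: bytes) -> Optional[str]:
    """Return the PDF filter name matching the image data, or None if unknown."""
    if image_bytes.startswith(JPEG_MAGIC):
        return "DCTDecode"
    if image_bytes.startswith(JP2_MAGIC) or image_bytes.startswith(J2K_MAGIC):
        return "JPXDecode"
    return None
```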

Resolution-Based Strategy

Not all images in a PDF need the same resolution. Apply different DPI targets based on image type:

Photographs: 150 DPI for screen, 300 DPI for print
Line art and diagrams: 300 DPI minimum (lower resolutions cause jagged edges)
Logos and icons: Keep as vector when possible; if rasterized, maintain original resolution
Background images: 72-96 DPI is usually sufficient
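The targets above can be turned into a downsampling decision once the effective resolution of an image is known. An image's effective DPI depends on how large it is placed on the page, not only on its pixel count. The sketch below is one way to express this; the category names and function names are assumptions made for illustration, and the background target uses the upper end of the 72-96 range.

```python
from typing import Optional


def effective_dpi(pixel_width: int, placed_width_pt: float) -> float:
    """Resolution the image actually has on the page (1 pt = 1/72 inch)."""
    return pixel_width / (placed_width_pt / 72.0)


def target_dpi(kind: str, for_print: bool = False) -> Optional[int]:
    """DPI target per image type; None means keep the original resolution."""
    if kind == "photo":
        return 300 if for_print else 150
    if kind == "line_art":
        return 300
    if kind == "background":
        return 96
    if kind == "logo":
        return None
    raise ValueError(f"unknown image kind: {kind}")


def downsample_scale(pixel_width: int, placed_width_pt: float,
                     kind: str, for_print: bool = False) -> float:
    """Scale factor to apply to the pixel dimensions (never upsamples)."""
    target = target_dpi(kind, for_print)
    if target is None:
        return 1.0
    return min(1.0, target / effective_dpi(pixel_width, placed_width_pt))
```

For example, a 3000-pixel-wide photograph placed 432 pt (6 inches) wide has an effective resolution of 500 DPI, so a screen target of 150 DPI means scaling it to 30% of its pixel dimensions.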

Duplicate Image Detection

PDFs created by merging multiple documents often contain duplicate images, such as the same logo or header graphic embedded separately on each page. Deduplication replaces identical images with references to a single shared object, sometimes reducing file size by 30-50% in merged documents.
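The core of deduplication can be sketched with a content hash. This is a simplified illustration, assuming the image streams have already been extracted into a dictionary keyed by object number; the function name is hypothetical, and a real tool would also compare the image dictionaries (dimensions, color space, filters) before treating two streams as identical.

```python
import hashlib


def deduplicate(streams: dict[int, bytes]) -> tuple[dict[int, bytes], dict[int, int]]:
    """Return (kept streams, remap from every object number to its canonical one)."""
    first_seen: dict[str, int] = {}
    kept: dict[int, bytes] = {}
    remap: dict[int, int] = {}
    for obj_num in sorted(streams):
        digest = hashlib.sha256(streams[obj_num]).hexdigest()
        # The lowest-numbered object with this content becomes the shared copy.
        canonical = first_seen.setdefault(digest, obj_num)
        remap[obj_num] = canonical
        if canonical == obj_num:
            kept[obj_num] = streams[obj_num]
    return kept, remap
```

Every page reference to a duplicate object is then rewritten to point at the canonical object from the remap, and the unreferenced copies are dropped.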

Font Optimization

Subsetting vs Full Embedding

Full font embedding includes all glyphs (sometimes 10,000+ for CJK fonts). Subsetting includes only the glyphs actually used in the document. For example, a document using 50 Chinese characters from a font with 30,000 glyphs can save approximately 4 MB through subsetting, because nearly all of the glyph data is dropped.
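The arithmetic behind that figure can be approximated with a back-of-the-envelope estimate. The sketch below assumes glyph data is spread evenly across glyphs and that a fixed amount of table overhead survives subsetting; the 4.2 MB font size and the 20 KB overhead are illustrative assumptions, not measurements.

```python
def estimated_subset_size(full_size: int, total_glyphs: int, used_glyphs: int,
                          fixed_overhead: int = 20_000) -> int:
    """Rough subset size in bytes, assuming evenly sized glyphs (an assumption)."""
    per_glyph = (full_size - fixed_overhead) / total_glyphs
    return fixed_overhead + round(per_glyph * used_glyphs)


# Hypothetical CJK font: 4.2 MB, 30,000 glyphs, 50 of them used.
full_size = 4_200_000
subset_size = estimated_subset_size(full_size, total_glyphs=30_000, used_glyphs=50)
savings = full_size - subset_size
```

Under these assumptions the subset is a few tens of kilobytes, so the savings are close to the full size of the font.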

Font Deduplication

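Merged PDFs may embed the same font multiple times, once from each source document. Identifying and merging duplicate fonts can save anywhere from 500 KB to 5 MB, depending on the fonts involved. When the copies are subsets with different glyph coverage, they have to be merged into a single subset covering all used glyphs rather than simply discarded.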
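Finding candidates for font deduplication usually starts with the font names. Subset fonts in a PDF carry a tag of six uppercase letters and a plus sign in front of the base name (for example ABCDEF+Helvetica), so stripping the tag reveals fonts that share a base. The sketch below assumes the font names have already been collected into a dictionary keyed by object number; the function names are hypothetical.

```python
import re
from collections import defaultdict

# Subset tag: exactly six uppercase letters followed by "+".
SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")


def base_font_name(name: str) -> str:
    """Strip a subset tag such as 'ABCDEF+' from a font name, if present."""
    return SUBSET_TAG.sub("", name)


def group_duplicate_fonts(fonts: dict[int, str]) -> dict[str, list[int]]:
    """Map each base font name to the object numbers that embed it more than once."""
    groups: dict[str, list[int]] = defaultdict(list)
    for obj_num, name in sorted(fonts.items()):
        groups[base_font_name(name)].append(obj_num)
    # Same name is only a hint: the font programs still need to be compared,
    # and differing subsets need their glyph sets merged.
    return {name: nums for name, nums in groups.items() if len(nums) > 1}
```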

Object Stream Compression

PDF 1.5+ supports object streams, which group multiple small objects into a single compressed stream. This is particularly effective for documents with many small objects such as bookmarks, form fields, or annotations. Files that use object streams also store the cross-reference table as a compressed cross-reference stream. Object stream compression typically saves 5-15% on top of image compression.
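The gain comes from the fact that small dictionary objects are otherwise stored as plain, uncompressed text. The sketch below imitates the effect with zlib (the same Deflate algorithm used by the Flate filter) on a set of made-up annotation dictionaries. It is a simplification: a real object stream also carries a small index of object numbers and offsets.

```python
import zlib

# Hypothetical small objects, similar to link annotations in a long document.
objects = [
    f"<< /Type /Annot /Subtype /Link /Rect [0 {i} 100 {i + 20}] "
    f"/Dest (page{i}) >>".encode("ascii")
    for i in range(200)
]

# Outside an object stream, each of these is written to the file as-is.
raw_size = sum(len(obj) for obj in objects)

# Inside an object stream, they are concatenated and compressed together,
# so the repeated keys are encoded only once by the compressor.
packed_size = len(zlib.compress(b"\n".join(objects)))

print(f"raw: {raw_size} bytes, packed: {packed_size} bytes")
```

The more repetitive the small objects are, the larger the difference between the two numbers.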

Linearization (Fast Web View)

Linearization restructures a PDF so the first page loads immediately while remaining pages download in the background, provided the server and the viewer support byte-range requests. It doesn't reduce file size but dramatically improves perceived load time for large documents served over the web. The linearization process adds a small overhead (1-3% file size increase) but is worthwhile for most PDFs over 1 MB served online.
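A linearized file announces itself with a linearization parameter dictionary, containing a /Linearized key, near the very start of the file. The sketch below checks for that marker and applies the 1 MB rule of thumb from this guide; the function names and the exact threshold value are assumptions made for illustration.

```python
def is_linearized(pdf_bytes: bytes) -> bool:
    """Check for the /Linearized marker in the first 1024 bytes.

    This only detects the marker. A file that was linearized and later
    modified by an incremental update may still carry it without being
    validly linearized any more.
    """
    return b"/Linearized" in pdf_bytes[:1024]


def should_linearize(pdf_bytes: bytes, served_online: bool,
                     threshold: int = 1_000_000) -> bool:
    """Rule of thumb: linearize web-served PDFs over about 1 MB that are not already."""
    return (served_online
            and len(pdf_bytes) > threshold
            and not is_linearized(pdf_bytes))
```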