Miano - Compressed Image File Formats: JPEG, PNG, GIF, XBM, BMP (ACM Press, 1999)

  • Compressed Image

    File Formats: JPEG, PNG, GIF, XBM, BMP

    Your guide to graphics files on the Web

    John Miano

  • Compressed Image File Formats

  • Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and Addison-Wesley was aware of a trademark claim, the designations have been printed in initial capital letters or all capital letters. The author and publisher have taken care in the preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein. The publisher offers discounts on this book when ordered in quantity for special sales. For more information, please contact:

    Corporate, Government, and Special Sales Group
    Addison Wesley Longman, Inc.
    One Jacob Way
    Reading, Massachusetts 01867

    Library of Congress Cataloging-in-Publication Data

    Miano, John, 1961-
    Compressed image file formats : JPEG, PNG, GIF, XBM, BMP / by John Miano
    p. cm.
    Includes bibliographical references.
    ISBN 0-201-60443-4. ISBN 0-201-61657-2 (CD-ROM)
    1. Image processing--Computer programs. 2. Data compression (Computer science) 3. Computer programming. 4. File organization (Computer science) I. Title

    TA1637.M53 1999
    005.74'6--dc21 99-15179

    CIP

    Copyright 1999 by ACM Press, a division of the Association for Computing Machinery, Inc. (ACM). All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior consent of the publisher. Printed in the United States of America. Published simultaneously in Canada.

    ISBN 0-201-60443-4
    Text printed on recycled and acid-free paper.
    1 2 3 4 5 6 7 8 9-MA-04 03 02 01 00 99
    First printing, July 1999

  • Contents

    Preface ix
    Acknowledgments xi

    Chapter 1  Introduction 1

      The Representation of Images 1
      Vector and Bitmap Graphics 3
      Color Models 5
      True Color versus Palette 9
      Compression 10
      Byte and Bit Ordering 13
      Color Quantization 16
      A Common Image Format 18
      Conclusion 21

    Chapter 2  Windows BMP 23

      Data Ordering 23
      File Structure 24
      Compression 28
      Conclusion 29

    Chapter 3  XBM 31

      File Format 31
      Reading and Writing XBM Files 33
      Conclusion 36

    Chapter 4  Introduction to JPEG 35

      JPEG Compression Modes 36
      What Part of JPEG Will Be Covered in This Book? 39
      What Are JPEG Files? 40
      SPIFF File Format 40
      Byte Ordering 41
      Sampling Frequency 41
      JPEG Operation 44
      Interleaved and Noninterleaved Scans 45
      Conclusion 46

    Chapter 5  JPEG File Format 47

      Markers 47
      Compressed Data 49
      Marker Types 49
      JFIF Format 55
      Conclusion 57

    Chapter 6  JPEG Huffman Coding 61

      Usage Frequencies 61
      Huffman Coding Example 63
      Huffman Coding Using Code Lengths 65
      Huffman Coding in JPEG 71
      Limiting Code Lengths 72
      Decoding Huffman Codes 73
      Conclusion 75

    Chapter 7  The Discrete Cosine Transform 77

      DCT in One Dimension 78
      DCT in Two Dimensions 84
      Basic Matrix Operations 85
      Using the Two-Dimensional Forward DCT 87
      Quantization 88
      Zigzag Ordering 89
      Conclusion 90

    Chapter 8  Decoding Sequential-Mode JPEG Images 91

      MCU Dimensions 91
      Decoding Data Units 94
      Decoding Example 97
      Processing DCT Coefficients 98
      Up Sampling 99
      Restart Marker Processing 99
      Overview of JPEG Decoding 100
      Conclusion 100

    Chapter 9  Creating Sequential JPEG Files 105

      Compression Parameters 105
      Output File Structure 111
      Doing the Encoding 111
      Down Sampling 112
      Interleaving 113
      Data Unit Encoding 115
      Huffman Table Generation 117
      Conclusion 119

    Chapter 10  Optimizing the DCT 121

      Factoring the DCT Matrix 121
      Scaled Integer Arithmetic 137
      Merging Quantization and the DCT 138
      Conclusion

    Chapter 11  Progressive JPEG 149

      Component Division in Progressive JPEG 149
      Processing Progressive JPEG Files 151
      Processing Progressive Scans 152
      MCUs in Progressive Scans 153
      Huffman Tables in Progressive Scans 153
      Data Unit Decoding 154
      Preparing to Create Progressive JPEG Files 160
      Encoding Progressive Scans 162
      Huffman Coding 162
      Data Unit Encoding 162

    Chapter 12  GIF 171

      Byte Ordering 172
      File Structure 172
      Interlacing 178
      Compressed Data Format 178
      Animated GIF 186
      Legal Problems 187
      Uncompressed GIF 188
      Conclusion 188

    Chapter 13  PNG 189

      History 190
      Byte Ordering 190
      File Format 190
      File Organization 195
      Color Representation in PNG 195
      Device-Independent Color 197
      Gamma 201
      Interlacing 202
      Critical Chunks 203
      Noncritical Chunks 206
      Conclusion

    Chapter 14  Decompressing PNG Image Data 215

      Decompressing the Image Data 215
      Huffman Coding in Deflate 221
      Compressed Data Format 222
      Compressed Data Blocks 223
      Writing the Decompressed Data to the Image 227
      Conclusion

    Chapter 15  Creating PNG Files 233

      Overview 233
      Deflate Compression Process 234
      Huffman Table Generation 238
      Filtering 241
      Conclusion 243

    Glossary 246
    Bibliography 249
    Index 253

  • Preface

    The purpose of this book is to show the reader how to write software to read and write files using various 2-D image formats. I wanted to write a book that explains the most frequently used file formats with enough depth for the reader to implement them, as opposed to one that covers many different formats at a high level or that avoids the more difficult image formats. As a result, I chose to cover the image file formats associated with Web browsers. The formats covered in this book (BMP, XBM, JPEG, GIF, and PNG) account for the vast majority of the image files found on the Internet. They employ a wide range of encoding techniques and range in implementation difficulty from simple to very complex.

    The inspiration for this book was my own frustration with the lack of information on how to implement encoders and decoders for the more complex file formats. Most of the available information was at too high a level, left gaps, or was very difficult to decipher. I have tried to create a bridge between the programmer and the standards documents.

    At the start of this project the question arose as to which programming language to use for the examples. The intention was to create a book about graphics file formats rather than one about how to write programs to read and write graphics files in a particular language. Consequently, I debated between using a language that is easy to read (such as Pascal or Ada) and the one most people are likely to use (C++). In the end I felt that its widespread use made C++ the best choice. To make the examples more understandable for non-C++ programmers, I have carefully avoided certain C++ language constructs (such as expressions with side effects and integer/boolean interchangeability) that would make the code hard to follow.


  • To make the encoding and decoding processes as clear as possible, I have used a Pascal-like pseudo-code. C++ is used for complete, functional implementations and pseudo-code for illustrative fragments. These fragments generally contain no error checking.

    Because of their generally large size, it was not possible to include working source code for the formats in the book itself. The accompanying CD-ROM contains the complete source code for encoders and decoders for most of the image formats covered.¹ The reader should use the pseudo-code in the text to learn how the processes work and the C++ examples on the CD to see how to implement them.

    Generally, the decoders implement more features than the encoders. In the decoders I have implemented all the features needed to decode the files a reader is likely to encounter on the Internet. For the sake of clarity, the encoders generally implement a smaller subset of features.

    In writing the programming examples I have given clarity precedence over execution efficiency and instant portability. The source examples will compile, without modification, under Microsoft Windows using both Borland C++ Builder V3.0 and Microsoft Visual C++ V5.0. Other compilers will generally require some modifications to the code.

    The descriptions of the encoders and decoders for the various file formats frequently use the term "user" to describe the source of certain input parameters to the encoding or decoding process. By this I mean the user of the encoder or decoder, not necessarily the person typing at the keyboard. Since image encoders and decoders are incorporated into other applications, such as image viewers and editors, the "user" in this case would most likely be another piece of software. However, in many situations the application "user" may get some of these parameters directly from a human.

    Just as this is not a book on C++ programming, it is also not a book on programming in a specific environment. For that information, readers will need a book for their particular system.

    ¹The unfortunate exception is GIF, because of legal issues.


  • Acknowledgments

    A project as large as producing a book requires the involvement of many people. Mike Bailey, Eric Haines, Tom Lane, Shawn Neely, and Glenn Randers-Pehrson reviewed the manuscript and provided many invaluable suggestions. Glenn also arranged for me to get the latest proposed PNG standards for the CD. My fellow aviator Charlie Baumann was kind enough to provide several of the photographs. Ralph Miano and Margaret Miano assisted in preparing the manuscript. Jean-Loup Gailly answered all my questions about ZLIB. Albert "The Chipster" Copper compiled examples on systems I did not have access to. Most important, Helen Goldstein at AWL guided the process from start to finish.

    John M. Miano
    Summit, New Jersey
    miano@colosseumbuilders.com


  • In this chapter we cover the fundamental aspects of image file formats. Here you will be introduced to bitmap images, the methods used to display images, the representation of color, and compression methods.

    The Representation of Images

    On most computer displays, the screen image is composed of discrete units called pixels. Each pixel occupies a small rectangular region on the screen and displays one color at a time. The pixels are arranged so that they form a 2-dimensional array.

    Objects are drawn on the screen by adjusting the color of individual pixels. Figure 1.1 shows an ideal triangle and one broken down into pixels. The pixel representation has jagged edges and is not very pleasing to the eye. The more densely pixels are packed on a display device, the less noticeable the jagged edges become.

    Over the years the number of pixels displayed on PC monitors has increased dramatically. Not too long ago 640 × 480 displays (307,200 pixels) were standard. Now display resolutions of 1024 × 768 (786,432), 1280 × 1024 (1,310,720), and even higher are common. The amount of video memory and the capabilities of

    Figure 1.1  Ideal Image and Pixel Image

    Chapter 1

    Introduction


  • the monitor and the video card limit the number of pixels a computer system can display.

    Figure 1.2 shows the components of a typical video system. The frame buffer is a block of video memory that controls the color of each pixel on the monitor. Each pixel has a corresponding memory location, usually ranging in size from 1 to 32 bits. On many systems, video memory can be read from and written to just like any other memory location. An application can change the color displayed on the monitor simply by changing a memory value. The video controller converts the values in the frame buffer into a signal that can be displayed by the monitor.

    Computer printers are also used to display images. These days most printers employ a mechanism similar to that of a video system. They divide the image into a number of pixels, but at a much higher density than a computer screen does, typically 300 or 600 pixels per inch for an ordinary desktop printer. Printers designed for typesetting applications use even higher densities. The printer contains memory analogous to the frame buffer, except that the data is transmitted over a serial or parallel cable rather than directly through the system bus. The image is built up in the printer's memory and then written to the printed page.

    Not all printers work by mapping memory to pixels. Plotters used for drafting and other engineering work have pens controlled by commands, such as draw a line from one point to another, draw an ellipse within a specified rectangle, or draw text using a specified font at a given location.¹

    ¹In the early days of computing, display devices for graphics worked this way as well.

    Figure 1.2  Simple Video System


  • Vector and Bitmap Graphics

    Just as display devices have two general methods of operation, graphics file formats can be divided into two general classes, vector and bitmap.² Vector graphics formats use a series of drawing commands to represent an image. A Windows metafile is a commonly used vector graphics format. Figure 1.3 contains a simple vector image created using commands to draw two arcs and a rectangle.

    Vector graphics formats are not limited to output devices, such as plotters, that create images through drawing commands. Monitors and laser printers usually have software that converts vector commands into pixels.

    There are two major drawbacks with vector graphics. First, they are not suitable for reproducing photographs or paintings. A painting such as Whistler's Mother would require tens of thousands of drawing commands; simply determining which commands to use to represent the painting would be a monumental task. Second, complex images take a long time to display. On most display systems, each vector object has to be converted to a pixel image.

    All of the image formats covered in this book are bitmap image formats. Such formats represent images as 2-dimensional arrays, where each array element represents a color to be displayed at a specific location. When displayed on a computer screen, each element is generally mapped to a single screen pixel. If the pixels are close enough together on the display device, it becomes difficult for the human eye to detect the array structure that composes the image.

    The greatest advantage of bitmap images is their quality. As the amount of disk space and memory has increased along with the speed of processors, the use of bitmap images has expanded as well. One of the most visible examples of this is in the computer game industry. Currently even games that require high performance, such as flight simulators and shoot-em-ups, use bitmap graphics. Compare the graphics in games like Quake and Doom to the vector graphics of Tank or even the Death Star graphics in the original Star Wars movie.

    A major drawback of bitmap images is the amount of data required to hold them. The size of an image in bytes (not counting overhead) is

    width × height × bits per pixel / 8

    Thus, an 800 × 600 image with 24 bits per pixel requires 1,440,000 bytes of memory to display or disk space to store. As the amount of memory on computers has grown, so has the number and size of the images that can be displayed at the same time.

    ²Raster graphics format is a common synonym for bitmap graphics format.

    Figure 1.3  Simple Vector Image
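    The size formula above can be sketched in a few lines of C++ (the function name `ImageSizeBytes` is invented for this illustration):

```cpp
#include <cstdint>

// Bytes needed to hold an uncompressed bitmap image, not counting
// overhead: width x height x bits per pixel, divided by 8 bits per byte.
std::uint64_t ImageSizeBytes(std::uint32_t width, std::uint32_t height,
                             std::uint32_t bitsPerPixel)
{
    return static_cast<std::uint64_t>(width) * height * bitsPerPixel / 8;
}
```

    For the 800 × 600, 24-bit example above this gives 1,440,000 bytes.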

  • Compression is usually used to reduce the space an image file occupies on disk, allowing a larger number of images to be stored.

    Another drawback of bitmap images is that they are size dependent and not well suited for extensive editing. With vector formats it is easy for drawing programs to add, delete, and modify individual elements. It is also simple to perform transformations such as perspective, enlarging, and scaling on a vector image.

    With bitmap images, even changing the size creates problems. Reducing them requires throwing away information; enlarging them produces blocking effects. Figure 1.4 illustrates the problem of increasing the size of a bitmap image by duplicating pixels. Smoothing techniques exist to improve the appearance of resized images.

    Table 1.1 summarizes the relative advantages of vector and bitmap graphics. The important thing to notice is that neither method is better than the other; they simply have different uses. In fact, some applications use a combination of the two techniques.

    Figure 1.4  Bitmap Image and Enlargement

    Table 1.1  Bitmap Graphics versus Vector Graphics

                            Bitmap Graphics    Vector Graphics
    Display speed                  X
    Image quality                  X
    Memory usage                                      X
    Ease of editing                                   X
    Display independence                              X


  • Color Models

    In the previous section we saw that in a bitmap image each pixel has a value that specifies its color. So, how does a numerical value get translated to a color?

    There are many ways to represent color numerically. A system for representing color is called a color model. Color models are usually designed to take advantage of a particular type of display device.

    On most color monitors there are three phosphors (red, green, and blue), or light emitters, for each pixel. Adjusting the intensities of the individual phosphors controls the color of the pixel. When all three phosphors are at their minimum intensity, the pixel appears black. If the red phosphor is the only one active, the pixel appears red. When the red and green phosphors are on, they combine to produce shades of yellow, and when all three phosphors are at full intensity, the pixel appears white.

    The most common color model used in computer applications is known as RGB (red-green-blue). The RGB model mimics the operation of computer displays. In RGB, colors are composed of three component values that represent the relative intensities of red, green, and blue. Figure 1.5 shows the relationship of colors in the RGB color model. The range of colors that can be represented by a color model is known as a color space. In Figure 1.5, the RGB color space is the cube in the diagram.

    In mathematical discussions of color, component values are often represented as normalized real numbers in the range 0.0 to 1.0. In programming and in image formats, unsigned integer component values are almost always used. The range of values for a color component is determined by the sample precision, which is the number of bits used to represent a component. For photographic

    Figure 1.5  RGB Color Model


  • images, 8 is the most commonly used sample precision. However, 1, 2, 4, 12, and 16 are also common.

    Integer component values can range from 0 to 2^(Sample Precision) - 1. To convert from the normalized real-number representation of a color to the integer representation, simply multiply by 2^(Sample Precision) - 1.
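    As a small illustration (the function name is invented here), the normalized-to-integer conversion for an arbitrary sample precision might look like this:

```cpp
#include <cmath>
#include <cstdint>

// Convert a normalized component in [0.0, 1.0] to an unsigned integer
// sample by multiplying by 2^precision - 1 and rounding.
std::uint32_t ToIntegerSample(double normalized, unsigned precision)
{
    const double maxValue = std::pow(2.0, precision) - 1.0;
    return static_cast<std::uint32_t>(std::lround(normalized * maxValue));
}
```

    With 8-bit precision, 1.0 maps to 255 and 0.0 to 0; with 12-bit precision, 1.0 maps to 4095.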

    Under Windows, the sample precision is almost always 8 bits. The operating system (though not necessarily the underlying hardware) therefore recognizes 256 different shades of each primary color. Other systems may use larger or smaller sample precisions.

    Grayscale

    Some display devices, such as laser printers, cannot display colors at all but only shades of gray. These are known as grayscale devices. Shades of gray can be represented by a single component in the range 0 (black) to 2^(Sample Precision) - 1 (white). In Figure 1.5 you can see that shades of gray occur in the RGB model along the line where R = G = B.

    YCbCr Color Model

    RGB is not the only color model in use. At one time the HSB (hue-saturation-brightness) color model was commonly used in computer systems, and it is still used by some image processing applications. JPEG images are almost always stored in a three-component color space known as YCbCr. The Y, or luminance, component represents the intensity of the image. Cb and Cr are the chrominance components: Cb specifies the blueness of the image and Cr gives the redness.

    The YCbCr color model is similar to the one used in television sets, which allows color images to remain compatible with black-and-white sets. In the YCbCr color model, the Y component on its own is a grayscale representation of the color image.

    The relationship between the YCbCr and RGB models as used in JPEG is shown in Equation 1.1.

    Figure 1.6 shows a color image separated into its Y, Cb, and Cr components. You can see that the Y component contributes the most information to the image. Unlike the RGB color model, where all the components are roughly

    Equation 1.1  YCbCr/RGB Color Space Conversion

    Y  = 0.299R + 0.587G + 0.114B
    Cb = -0.1687R - 0.3313G + 0.5B + 2^(Sample Precision)/2
    Cr = 0.5R - 0.4187G - 0.0813B + 2^(Sample Precision)/2

    R = Y + 1.402(Cr - 2^(Sample Precision)/2)
    G = Y - 0.34414(Cb - 2^(Sample Precision)/2) - 0.71414(Cr - 2^(Sample Precision)/2)
    B = Y + 1.772(Cb - 2^(Sample Precision)/2)
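    Equation 1.1 translates directly into code. The sketch below assumes 8-bit samples, so the offset 2^(Sample Precision)/2 is 128; the struct and function names are invented for this illustration:

```cpp
struct Rgb   { double r, g, b; };
struct YCbCr { double y, cb, cr; };

// Forward conversion from Equation 1.1 with sample precision 8,
// so the chrominance offset 2^8 / 2 is 128.
YCbCr RgbToYCbCr(const Rgb &p)
{
    YCbCr out;
    out.y  =  0.299 * p.r + 0.587 * p.g + 0.114 * p.b;
    out.cb = -0.1687 * p.r - 0.3313 * p.g + 0.5 * p.b + 128.0;
    out.cr =  0.5 * p.r - 0.4187 * p.g - 0.0813 * p.b + 128.0;
    return out;
}

// Inverse conversion, also from Equation 1.1.
Rgb YCbCrToRgb(const YCbCr &p)
{
    Rgb out;
    out.r = p.y + 1.402 * (p.cr - 128.0);
    out.g = p.y - 0.34414 * (p.cb - 128.0) - 0.71414 * (p.cr - 128.0);
    out.b = p.y + 1.772 * (p.cb - 128.0);
    return out;
}
```

    Note that any pure gray (R = G = B) gives Cb = Cr = 128, matching the observation that Y alone is a grayscale image.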


  • Figure 1.6  Color Image Separated into Its Y, Cb, and Cr Components


  • equal in importance, YCbCr concentrates the most important information in one component. This makes greater compression possible, by including more data from the Y component than from the Cb and Cr components.

    CMYK Color Model

    Another color model worth mentioning at this point is a four-component model known as CMYK (cyan, magenta, yellow, black), which is frequently used in color printing. Most printing is done on white paper, with inks added to create darker colors. This is the opposite of what happens on a computer monitor. The CMYK color space follows the printing model: its components represent the four inks commonly used in color printing.

    The color models we have looked at so far are known as additive, meaning that the components add light to the image. The larger the component values, the closer the color is to white. In CMYK, however, larger component values represent colors closer to black. This is known as subtractive. Cyan, magenta, and yellow are the complements of red, green, and blue. A pure cyan surface absorbs all the red light directed at it. If yellow and magenta inks are combined, they absorb the green and blue light, leaving only red. Cyan, magenta, and yellow combine to absorb all light, resulting in black, in theory anyway.

    In practice, cyan, magenta, and yellow inks do not combine on a piece of white paper to produce a pure black. Even if you could get a good shade of black by combining colors, it would require three times as much ink as simply using black alone. Since most printing is black anyway, it makes sense to use black ink to produce black and shades of gray.

    On a computer monitor, the relationship between RGB and CMYK can be approximated as shown in Equation 1.2.

    Equation 1.2  CMYK/RGB Color Space Conversion

    K = (2^(Sample Precision) - 1) - MAX(R, G, B)
    C = (2^(Sample Precision) - 1) - R - K
    M = (2^(Sample Precision) - 1) - G - K
    Y = (2^(Sample Precision) - 1) - B - K

    R = (2^(Sample Precision) - 1) - K - C
    G = (2^(Sample Precision) - 1) - K - M
    B = (2^(Sample Precision) - 1) - K - Y
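    The RGB-to-CMYK direction of Equation 1.2 can be sketched as follows for 8-bit samples, so 2^(Sample Precision) - 1 = 255 (the struct and function names are invented for this illustration):

```cpp
#include <algorithm>

struct Cmyk { int c, m, y, k; };

// RGB to CMYK per Equation 1.2 with 8-bit samples. Black ink (K)
// replaces the gray component the three colored inks would share.
Cmyk RgbToCmyk(int r, int g, int b)
{
    Cmyk out;
    out.k = 255 - std::max({r, g, b});
    out.c = 255 - r - out.k;
    out.m = 255 - g - out.k;
    out.y = 255 - b - out.k;
    return out;
}
```

    As the text notes, at least one of the C, M, and Y components comes out zero: the one corresponding to the largest RGB value.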

    When the C, M, and Y component values are equal, the color is a shade of gray. Notice that the conversion from RGB to CMYK replaces cyan, magenta, and yellow ink with shades of gray produced by black ink. The result of this substitution is that at least one of the C, M, and Y components will always be zero if this conversion process is followed exactly as shown here. The CMYK color model does not require a zero component value; this is simply a result


  • of converting from RGB. Applications that use the CMYK color model will allow any combination of component values, to give complete control over the printed colors and to allow for variations among types of ink.

    Another thing to notice about the CMYK color model is that there is not a one-to-one correspondence between it and RGB. Instead, multiple CMYK values map to the same RGB value.

    True Color versus Palette

    The examples in this book assume that the output device uses the RGB color model to display images and that each component is represented with 8 bits and is in the range 0 to 255.³ This is the color representation used by most personal computer systems. Such a system can produce 16,777,216 (256³) distinct colors. There are computer systems that use more bits to represent color, for example the 12-bit grayscale frequently used for medical images. Some image formats support data with more than 8 bits per component (12 for JPEG, 16 for PNG). In the rest of this discussion we will assume that you are working on a system that uses 8 bits per component.

    Two methods are commonly used to assign one of the possible colors to a pixel. The simplest is to store the color value for each pixel in the compressed file. For images with 24 bits per pixel, each pixel has a 3-byte color value associated with it. Images that use 24 bits or more are called true color because, over the range of colors a monitor can display, 24 bits per pixel is the limit of color differences a human can distinguish.

    The problem with 24-bit graphics is that while a system may be capable of displaying 16,777,216 different colors, it may not be able to do so simultaneously. Older computers may not even have a video card capable of using a 24-bit display mode. Newer computers may not have enough video memory to operate in 24-bit mode at higher screen resolutions. A display on a personal computer with a resolution of 1024 × 768 pixels would require 1024 × 768 × 3 = 2,359,296 bytes (2.25 MB) of video memory to display 24-bit images. If the computer had only 2 MB of video memory, it could not display 24-bit images at that resolution, although it could at the lower resolution of 800 × 600 (800 × 600 × 3 = 1.4 MB).

    The solution devised to represent colors before the advent of 24-bit-per-pixel displays was to define a color palette that selects a subset of the possible colors. Conceptually, the palette is a 1-dimensional array of 3-byte elements that specify a color. Rather than specifying the color directly, each pixel value is an index into the color palette. The most common size for a palette is 256 entries, with each pixel value consisting of 8 bits. Most computers today can

    ³To avoid dwelling too much on specific system implementations, this section contains some simplifications as to the behavior of display devices.


  • display 8-bit graphics in all of their display resolutions, but very old computers were limited to even smaller palette sizes.
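    The palette indirection described above can be sketched in a few lines (the names are invented for this illustration; the formats covered later store the palette in the file itself):

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// A 3-byte palette entry.
struct Color { std::uint8_t red, green, blue; };

// Each pixel stores an 8-bit index into a 256-entry palette rather
// than a full 3-byte color value.
Color LookUpPixelColor(const std::array<Color, 256> &palette,
                       const std::vector<std::uint8_t> &pixelIndices,
                       std::size_t pixel)
{
    return palette[pixelIndices[pixel]];
}
```

    An image stored this way needs 1 byte per pixel plus 768 bytes for the palette, instead of 3 bytes per pixel.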

    Bitmap image file formats represent colors in essentially the same ways computer displays do. Some specify the color value for each pixel, some use a color palette, and others support both methods. Table 1.2 shows the methods used to represent colors in the various image formats.

A file format that uses a palette may use pixel values of fewer than 8 bits in order to reduce the size of the file. An image with 4 bits per pixel requires half as much storage as an 8-bit-per-pixel image of the same size. For images that use a limited number of colors, such as cartoons, a simple method for reducing the file

Table 1.2    Methods for Representing Colors

                 BMP   JPEG   GIF   PNG
Palette           X            X     X
Color value       X     X            X

    Compression

Since color bitmap images typically require over a megabyte of storage, most image file formats incorporate compression techniques. Compression techniques take advantage of patterns within the image data to find an equivalent representation that occupies less space. Completely random data cannot be compressed.

Below is a brief description of the compression techniques used by the image formats covered in this book. Table 1.3 shows the techniques each format uses.

Table 1.3    Compression Methods Used by Different File Formats

         RLE   LZ   Huffman   DCT
BMP       X
GIF             X
PNG             X      X
JPEG      X            X       X


Run Length Encoding (RLE). Consecutive pixels with the same value are encoded using a run length and value pair. For example, an image with the pixel value 8 repeated 9 times could be represented by the 2-byte sequence

09₁₆ 08₁₆

rather than

08₁₆ 08₁₆ 08₁₆ 08₁₆ 08₁₆ 08₁₆ 08₁₆ 08₁₆ 08₁₆

LZ Encoding. The compressor maintains a dictionary containing sequences of pixel values that have already been encountered. The compressed stream contains codes that represent entries in the dictionary.

Huffman Coding. Rather than using a fixed number of bits to represent component values, variable-length codes are used. Shorter codes are assigned to the most frequently occurring values.

Discrete Cosine Transform (DCT). Blocks of pixels are represented using cosine functions of different frequencies. The high frequencies, which generally contribute less information to the image, are discarded.

The effectiveness of a compression technique depends upon the type of data. Figure 1.7 is a photograph and Figure 1.8 is a drawing. The photograph contains many areas with small changes in color, while the drawing has large areas with the same color.

Figure 1.7    IRENE.BMP


Figure 1.8    STARS.BMP

Figure 1.9 shows the relative file sizes when the photograph in Figure 1.7 is compressed using various formats: BMP without compression, BMP with RLE encoding, GIF, PNG, and JPEG. Notice that BMP-RLE and GIF produce very little compression, while PNG and, especially, JPEG reduce the file size substantially.

Figure 1.10 contains a similar graph, this time using the drawing in Figure 1.8. You can see that the other file formats nearly catch up to JPEG with this image.

Lossless versus Lossy Compression

Most image file formats use what is known as lossless compression. This means that if we take an image, compress it using a lossless technique, and expand it again, the resulting image is identical, pixel for pixel, to the original.

Some compression methods (notably JPEG) are lossy. Using the compression sequence described above, lossy compression produces an image that is

Figure 1.9    Percentage of Original Size Compressing IRENE.BMP


Figure 1.10    Percentage of Original Size Compressing STARS.BMP

close to the original but not an exact match. That is, a pixel with an RGB color value of (128,243,118) in an image that is compressed may produce (127,243,119) when the image is expanded. In image compression, lossy techniques take advantage of the fact that the eye has a hard time distinguishing nearly identical colors.

The reason for using lossy compression is that it generally gives far greater compression than lossless methods. In many situations, small losses of data are acceptable in exchange for increased compression.

Byte and Bit Ordering

All bitmap image files contain integers stored in binary format. For single-byte integers there are no compatibility issues among different processor types. This is not the case with multi-byte integers. When reading multi-byte integers, the question arises of how to order the bytes from the input stream within the integer. Suppose an image file contains these two bytes in succession.

0 1 1 0 0 0 1 1 (63₁₆)    0 0 0 1 1 1 0 1 (1D₁₆)

Most significant bit    Least significant bit

If these bytes represent a 2-byte integer, should they be interpreted as

0 0 0 1 1 1 0 1 0 1 1 0 0 0 1 1 (1D63₁₆) = 7,523₁₀

or

0 1 1 0 0 0 1 1 0 0 0 1 1 1 0 1 (631D₁₆) = 25,373₁₀

In other words, in a multi-byte integer, does the most significant byte occur first or last in the file?


The problem is that different types of processors order the bytes of integers differently, and blindly reading raw bytes into an integer variable will not work for all of them. Most processors in common use, including the Motorola 680x0 and Sun SPARC families, store integers with the most significant byte first. This byte ordering is known as big-endian. It is also called network order because it is used by the Internet Protocol. Processors that store integers with the least significant byte first are called little-endian. The Intel 80x86 family of processors used in personal computers is the most common little-endian processor.

The answer to this question depends upon the type of file. Some formats require that the most significant byte be stored first; others require that the least significant be first.

Why not order bytes in an image file so that a simple read operation like this reads integers correctly?

unsigned int value ;
inputstream.read ((char *) &value, sizeof (value)) ;

This code example produces different output depending upon whether it is run on a big-endian or a little-endian system.

#include <iostream>
using namespace std ;
main ()
{
   unsigned short value = 1 ;
   unsigned char *bytes = (unsigned char *) &value ;
   if (bytes [0] == 1)
      cout << "Little-endian" << endl ;
   else
      cout << "Big-endian" << endl ;
   return 0 ;
}


The encoding of data units is essentially the reverse of the decoding process shown in Chapter 8. There the Extend() function converted a magnitude value and extra bits into a coefficient value. ReverseExtend() does the opposite, converting a coefficient value into a magnitude and extra bits.

For each data unit, the DC coefficient is encoded first, followed by the AC coefficients in zigzag order. The encoded DC value is actually the difference between the DC coefficient value in the current data unit and the DC value from the last data unit processed for the same component. The DC coefficient is encoded as a Huffman-encoded magnitude value (Table 9.1) followed by a string of unencoded bits. The magnitude value specifies the number of literal bits to follow.

Only nonzero AC coefficients are encoded. They are stored as a Huffman-encoded byte value followed by a number of literal bits. The encoded byte is divided into two 4-bit fields, with the 4 low-order bits giving the coefficient magnitude value (Table 9.2) and the 4 high-order bits giving the number of zero-valued coefficients to skip. Both bit fields are set to zero when all the remaining AC coefficients are zero. The code F0₁₆ is used to represent a run of 16 zero-valued coefficients and is not followed by any literal bits.

Algorithm 9.3 shows how a data unit is encoded using sequential JPEG. The input parameter is an array of quantized DCT coefficients arranged in the JPEG zigzag order.

  • 116 Creating Sequential JPEG Images

Table 9.1    DC Difference Magnitude Codes and Ranges

Encoded Value    DC Difference Range
0                0
1                -1, 1
2                -3, -2, 2, 3
3                -7 .. -4, 4 .. 7
4                -15 .. -8, 8 .. 15
5                -31 .. -16, 16 .. 31
6                -63 .. -32, 32 .. 63
7                -127 .. -64, 64 .. 127
8                -255 .. -128, 128 .. 255
9                -511 .. -256, 256 .. 511
10               -1023 .. -512, 512 .. 1023
11               -2047 .. -1024, 1024 .. 2047

Algorithm 9.3    Sequential-Mode Data Unit Encoding

Global LAST_DC_VALUE
Procedure EncodeDataUnit (DATAUNIT (0..63))
Begin
   DIFFERENCE = DATAUNIT (0) - LAST_DC_VALUE
   LAST_DC_VALUE = DATAUNIT (0)
   ReverseExtend (DIFFERENCE, MAGNITUDE, BITS)
   HuffmanEncodeUsingDCTable (MAGNITUDE)
   WriteRawBits (MAGNITUDE, BITS)

   ZERORUN = 0
   II = 1
   While II < 64 Do
   Begin
      If DATAUNIT (II) <> 0 Then
      Begin
         While ZERORUN >= 16 Do
         Begin
            HuffmanEncodeUsingACTable (F0₁₆)
            ZERORUN = ZERORUN - 16
         End
         ReverseExtend (DATAUNIT (II), MAGNITUDE, BITS)
         HuffmanEncodeUsingACTable ((ZERORUN LeftShift 4) Or MAGNITUDE)
         WriteRawBits (MAGNITUDE, BITS)
         ZERORUN = 0
      End
      Else
      Begin
         ZERORUN = ZERORUN + 1
      End
      II = II + 1
   End
   If ZERORUN <> 0 Then
   Begin
      HuffmanEncodeUsingACTable (00₁₆)
   End
End


Table 9.2    AC Magnitude Codes and Ranges

Encoded Value    AC Difference Value Range
1                -1, 1
2                -3, -2, 2, 3
3                -7 .. -4, 4 .. 7
4                -15 .. -8, 8 .. 15
5                -31 .. -16, 16 .. 31
6                -63 .. -32, 32 .. 63
7                -127 .. -64, 64 .. 127
8                -255 .. -128, 128 .. 255
9                -511 .. -256, 256 .. 511
10               -1023 .. -512, 512 .. 1023

    Huffman Table Generation

The code in the previous section makes use of Huffman coding, but where do we get the Huffman tables used to encode the coefficients? The JPEG standard does not specify how Huffman codes are generated. Any set of Huffman codes, where no code is longer than 16 bits and no code consists of all 1-bits, can be used. There is no requirement that the codes be assigned to values based upon usage frequencies. Thus, the simplest method for Huffman encoding is to use a predefined Huffman table. The JPEG standard includes two sets of sample Huffman codes for use with DC and AC coefficients,² and an encoder can be implemented so that it uses these or any other set of predefined tables. Nevertheless, while this method has the advantage of being fast and easy to implement, it obviously does not compress as well as using Huffman codes based upon usage frequencies.

Unless compression speed is a major issue, it makes sense to create the Huffman tables from usage frequencies, which requires the encoder to make two passes over the DCT coefficients in a scan. The obvious implementation method is to have separate functions for gathering statistics and outputting codes. The problem with this method is that it requires two sets of nearly identical functions that must be kept in strict synchronization. The slightest implementation change made to one set of functions would have to be made to the other, creating a maintenance nightmare.

A better solution is to use separate functions for processing individual codes. An encoder could implement two pairs of procedures similar to those shown in Algorithm 9.4.

²Section K.3 in the JPEG standard.



Algorithm 9.4    AC and DC Coefficient Functions

Procedure GatherDC (VALUE, EXTRABITS)
Begin
   // EXTRABITS is not used
   IncrementFrequency (VALUE)
End

Procedure PrintDC (VALUE, EXTRABITS)
Begin
   FindHuffmanEncode (VALUE, CODE, CODELENGTH)
   WriteRawBits (CODELENGTH, CODE)
   If VALUE <> 0 Then
      WriteRawBits (VALUE, EXTRABITS)
End

Procedure GatherAC (VALUE, EXTRABITS)
Begin
   // EXTRABITS is not used
   IncrementFrequency (VALUE)
End

Procedure PrintAC (VALUE, EXTRABITS)
Begin
   FindHuffmanEncode (VALUE, CODE, CODELENGTH)
   WriteRawBits (CODELENGTH, CODE)
   If (VALUE And 0F₁₆) <> 0 Then
      WriteRawBits (VALUE And 0F₁₆, EXTRABITS)
End

Each procedure has an identical interface, so the encoding process code can be modified to look like the procedures in Algorithm 9.5, where DCPROCEDURE is a pointer to either GatherDC or PrintDC and ACPROCEDURE is a pointer to either GatherAC or PrintAC.

Using pointers to procedures allows the same function to be used both for gathering Huffman usage statistics and for encoding the actual data.

Global LAST_DC_VALUE
Procedure EncodeDataUnit (DATAUNIT (0..63), DCPROCEDURE, ACPROCEDURE)
Begin
   DIFFERENCE = DATAUNIT (0) - LAST_DC_VALUE
   LAST_DC_VALUE = DATAUNIT (0)
   ReverseExtend (DIFFERENCE, MAGNITUDE, BITS)
   DCPROCEDURE (MAGNITUDE, BITS)

   ZERORUN = 0
   II = 1
   While II < 64 Do
   Begin
      If DATAUNIT (II) <> 0 Then
      Begin
         While ZERORUN >= 16 Do
         Begin
            ACPROCEDURE (F0₁₆, 0)
            ZERORUN = ZERORUN - 16
         End
         ReverseExtend (DATAUNIT (II), MAGNITUDE, BITS)
         ACPROCEDURE ((ZERORUN LeftShift 4) Or MAGNITUDE, BITS)
         ZERORUN = 0
      End
      Else
      Begin
         ZERORUN = ZERORUN + 1
      End
      II = II + 1
   End
   If ZERORUN <> 0 Then
   Begin
      ACPROCEDURE (00₁₆, 0)
   End
End

Algorithm 9.5    Data Unit Encoding Using Function Pointers

    Conclusion

This chapter covered the process for encoding sequential-mode JPEG images. The process for encoding baseline sequential and extended sequential files is basically the same for 8-bit images, except for restrictions on the number of tables that can be defined. The JPEG standard (JPEG 1994) contains sample Huffman and quantization tables.



The source code example for this chapter is an encoder for sequential JPEG images. The encoder application is a simple one for converting a Windows BMP file to a sequential JPEG file. The command format is

ENCODER input.bmp output.jpg

to create a color JPEG file, or

ENCODER -g input.bmp output.jpg

to create a grayscale file.

Component Class

The JpegEncoderComponent class represents a single component during the encoding process. Its main functions are sampling and data unit encoding.

The EncodeSequential member function encodes data units. Two of its parameters are pointers to member functions that process codes generated from the image. This allows the same function to be used both for gathering value usage statistics for Huffman code generation and for Huffman encoding the data. These parameters will be pointers to either the GatherDcData and GatherAcData functions or the PrintDcData and PrintAcData functions. The first pair gathers statistics; the second writes Huffman-encoded values to the output stream.

Encoder Class

The encoder class is JpegEncoder. Two of its member functions control image compression. The compression-versus-quality tradeoff is specified using SetQuality. The quality value can be in the range 1-100 and determines the amount to scale the sample quantization table values. SetSamplingFrequency sets the horizontal and vertical sampling frequency (1-4) for a component.

By default the encoder places all components in a single scan. The SetScanAttributes member function can be used to place components in different scans. The last two parameters to this function are used only for progressive JPEG (Chapter 10). For now they should always be 0.

The InterleavedPass and NoninterleavedPass functions are the heart of the encoder. They control the order in which data units are encoded and when restart markers are written. These functions use pointers to member functions; thus, they can be used both for gathering Huffman usage statistics and for Huffman encoding.


  • Optimizing the DCT

At the start of the book we stated that we would strive for clarity rather than programming efficiency. This is the only chapter that deals with execution efficiency. Calculating the IDCT is the most time-consuming part of decoding a JPEG image file. Therefore, this is the best place to optimize for speed. The techniques for optimizing the IDCT work for the DCT as well. However, speed is generally more of an issue when decoding JPEG files than when encoding them.

To optimize the IDCT and DCT calculations we are going to use mathematics, not programming tricks. A basic knowledge of linear algebra and trigonometry will be very useful in getting through the derivations. Many people find it frustrating when explanations of mathematical processes leave out just enough steps to get you lost, so the explanations here are rather detailed. If you have no interest in the mathematics or find this tedious, you can simply skip to the end of the chapter to find the end result.

    Factoring the DCT Matrix

In Chapter 6 we explained that two matrices are multiplied by taking the dot product of the rows of the first matrix with the columns of the second matrix. When we use a matrix to calculate the IDCT we use Equation 10.1, where M is the transform matrix and T is the input data to the transform.

Equation 10.1    V = MᵀTM    (Inverse DCT)

Since matrix multiplication is associative, we can perform the two multiplication operations in either order. For consistency, in this chapter we are going to perform the multiplication as V = Mᵀ(TM). In other words, we are


Chapter 10


    Equation 10.2

going to multiply the rows of T by the columns of M to create a temporary matrix. Then we are going to multiply the columns of the temporary matrix by the rows of Mᵀ.

When we multiply TM, each output row depends only on the corresponding row in T, so we can treat the calculation of each row separately. Similarly, when we multiply Mᵀ and the temporary matrix, each column in the output depends only on the corresponding column in the temporary matrix.

The DCT transform matrix as originally presented in Chapter 7 is repeated with the substitution in Equation 10.2. Each row/column dot product requires 8 multiplication operations and 7 additions; therefore, transforming each row requires 64 multiplication operations and 56 additions.

Notice that there is much symmetry in the matrix. We will exploit these symmetries by factoring the DCT transform matrix into the product of several sparse matrices.

The first transformation we are going to make is to factor out the constant value 1/(2√2) from each element in the matrix and redefine the IDCT using the equivalent definition shown in Equation 10.3.

Equation 10.3    V = (1/8) MᵀTM

    M =

  • Equation 10.4


    M =

The next few simplifications take advantage of the symmetries in the values of the cosine function to reduce the number of unique values in the transform matrix. The properties of the cosine function can be found in any mathematical handbook.

If you refer to Figure 10.1 you can see that the cosine function is cyclical, such that

cos x = cos (x + 2π)

Using Equation 10.5 we can replace every cosine argument in Equation 10.4 that is 2π or greater with an equivalent argument less than 2π, giving Equation 10.6.

    Equation 10.5

Figure 10.1    Cosine Function

For now we will ignore the 1/8 factor and simply work with the matrix shown in Equation 10.4.


    Equation 10.6

    Equation 10.7

    Equation 10.8

    M =

    Equation 10.9

Again referring to Figure 10.1, the cosine function is symmetric along the x-axis such that

Using Equation 10.7 we can replace all the matrix elements in Equation 10.6 whose arguments to the cosine function are greater than π, giving Equation 10.8.

    The cosine function is also symmetric along the y-axis such that

    M =

Using Equation 10.9 we can replace all arguments to the cosine function in Equation 10.8 that are greater than π/2, giving Equation 10.10.


Equation 10.10

The value of the cosine function at π/4 is well known.

Substituting Equation 10.11 into Equation 10.10 gives Equation 10.12.

Equation 10.11    cos (π/4) = 1/√2

    Equation 10.12

    M =

    M =


Disregarding the sign, only seven distinct values remain in the transform matrix. Now that the values within the transform matrix have been simplified, we will factor the matrix into the product of several sparse matrices. The primary goal in the factorization of the transform matrix is to create matrix factors with as many zero values as possible. The secondary goal is to create matrices with the values ±1. Zeros are great, ones are so-so, and everything else is bad.

The process used to factor the matrix is called Gaussian elimination. Gaussian elimination is beyond the scope of this book. However, we have attempted to include enough steps for a reader with a basic knowledge of linear algebra to clearly see how the factors are obtained.

Notice that if the matrix in Equation 10.12 is divided in half vertically, the left half of each row is either a mirror image of the right or a negative mirror image. We can factor the matrix to group the mirror image rows together and the negative mirror image rows together (Equation 10.13). This first factorization is not strictly necessary. Its only purpose is to make the remaining factorization steps clearer.

The following examples of matrix multiplication operations illustrate the principles of row reduction used to factor the DCT matrix.

    126

[Matrix examples illustrating Row Interchange, Row Addition, Column Interchange, and Column Addition]

  • M =

Equation 10.13



In Equation 10.14, notice that the nonzero elements at the upper left corner of the center matrix form the same mirror pattern as the rows of the matrix in Equation 10.13. We factor again in a similar manner to attack that corner (Equation 10.15).

    Equation 10.14

    M =


Take a look at the 4 × 4 submatrix at the lower right corner of the second matrix in Equation 10.15 and in Equation 10.16.

    Equation 10.15

    M =

    Equation 10.16

    S =


    Equation 10.18

    Equation 7.11

    Equation 7.20


This matrix can be factored even more if we take advantage of these relations, which can be found in any book of standard mathematical formulas.

    Using Equation 10.17 we find that

A derivation from Equation 10.18. The other values are derived in a similar manner.

    Equation 10.17


    S =

    S =

    Equation 10.21

    Equation 10.20

    Equation 10.19

    S =

Substituting Equation 10.18 into Equation 10.16 gives the results in Equation 10.19.

    Equation 10.19 can be factored into Equation 10.20.


Putting Equation 10.21 into Equation 10.15 gives the matrix shown in Equation 10.22.

    Equation 10.22

    M =


Equation 10.22 appears to be much more complicated than the original in Equation 10.2, but it actually requires fewer steps to calculate. Multiplying a row vector and M requires 64 multiplication and 56 addition operations. In the factored matrices any nonzero value that is not equal to ±1 represents a multiplication, and all but one nonzero value in each column represents an addition. A zero in a factor represents no operation, and almost all of the array elements are zeros. Table 10.1 shows the operations required for each matrix multiplication with the factored matrices.

Table 10.1    Operations Required After Factorization

Matrix           1    2    3    4    5    6    7   Total
Addition         0    8    4    2    0    4    8    26
Multiplication   0   12    0    0    2    0    0    14

    Equation 10.23

Most processors take longer to execute multiplication instructions than additions. For example, on my system the speed advantage is about 4:1 in favor of addition. Therefore, it is usually advantageous to replace multiplication operations with addition. In the factored matrix most of the multiplication operations are in groups of the form

X = A cos θ + B sin θ
Y = A sin θ – B cos θ

This form is well known in computer graphics because it represents the rotation of the point (A, B) by an angle θ. A rotation uses four multiplication operations and two additions, but if it is calculated as shown in Equation 10.23, it requires 3 multiplications and 3 additions. If this method is used for multiplying the matrix, the total number of operations required to multiply a row by the DCT matrix becomes 11 multiplications and 29 additions.


Equation 10.24
T = cos θ (A + B)
X = T – (cos θ – sin θ) B
Y = –T + (cos θ + sin θ) A

The following code example is an implementation of an IDCT function that slavishly follows the factorization in Equation 10.22 so that you can clearly see how the factored matrix multiplications are implemented in code.

typedef double MATRIX [8][8] ;

const double C1 = (sqrt (2.0) * cos (M_PI/16.0)) ;
const double C2 = (sqrt (2.0) * cos (2.0*M_PI/16.0)) ;
const double C3 = (sqrt (2.0) * cos (3.0*M_PI/16.0)) ;
const double S1 = (sqrt (2.0) * sin (M_PI/16.0)) ;
const double S2 = (sqrt (2.0) * sin (2.0*M_PI/16.0)) ;
const double S3 = (sqrt (2.0) * sin (3.0*M_PI/16.0)) ;
const double SQRT2 = (1.0 / sqrt (2.0)) ;

unsigned int Limit (double input)
{
   double value = input + 128.5 ;
   if (value < 0)
      return 0 ;
   else if (value > 255)
      return 255 ;
   else
      return (unsigned int) value ;
}

void InverseDCT (MATRIX input, MATRIX output)
{
   double tmp [8][8] ;
   for (int row = 0 ; row < 8 ; ++ row)
   {
      double a0 = input [row][0] ;
      double a1 = input [row][4] ;
      double a2 = input [row][2] ;
      double a3 = input [row][6] ;
      double a4 = input [row][1] ;
      double a5 = input [row][5] ;
      double a6 = input [row][3] ;
      double a7 = input [row][7] ;

      double b0 = (a0 + a1) ;
      double b1 = (a0 - a1) ;

      // b2 = S2 * a2 - C2 * a3
      // b3 = C2 * a2 + S2 * a3
      double r0 = S2 * (a2 + a3) ;
      double b2 = r0 - (S2+C2) * a3 ;
      double b3 = r0 - (S2-C2) * a2 ;

      // b4 = S1 * a4 - C1 * a7
      // b7 = C1 * a4 + S1 * a7
      double r1 = S1 * (a4 + a7) ;
      double b4 = r1 - (S1+C1) * a7 ;
      double b7 = r1 - (S1-C1) * a4 ;

      // b5 = C3 * a5 - S3 * a6
      // b6 = S3 * a5 + C3 * a6
      double r2 = C3 * (a5 + a6) ;
      double b5 = r2 - (C3+S3) * a6 ;
      double b6 = r2 - (C3-S3) * a5 ;

      double c0 = b0 ;
      double c1 = b1 ;
      double c2 = b2 ;
      double c3 = b3 ;
      double c4 = b4 + b5 ;
      double c5 = b5 - b4 ;
      double c6 = b7 - b6 ;
      double c7 = b7 + b6 ;

      double d0 = c0 ;
      double d1 = c1 ;
      double d2 = c2 ;
      double d3 = c3 ;
      double d4 = c4 ;
      double d5 = c6 + c5 ;
      double d6 = c6 - c5 ;
      double d7 = c7 ;

      double e0 = d0 ;
      double e1 = d1 ;
      double e2 = d2 ;
      double e3 = d3 ;
      double e4 = d4 ;
      double e5 = SQRT2 * d5 ;
      double e6 = SQRT2 * d6 ;
      double e7 = d7 ;

      double f0 = e0 + e3 ;
      double f1 = e1 + e2 ;
      double f2 = e1 - e2 ;
      double f3 = e0 - e3 ;
      double f4 = e4 ;
      double f5 = e5 ;
      double f6 = e6 ;
      double f7 = e7 ;

      tmp [row][0] = (f0 + f7) ;
      tmp [row][1] = (f1 + f6) ;
      tmp [row][2] = (f2 + f5) ;
      tmp [row][3] = (f3 + f4) ;
      tmp [row][4] = (f3 - f4) ;
      tmp [row][5] = (f2 - f5) ;
      tmp [row][6] = (f1 - f6) ;
      tmp [row][7] = (f0 - f7) ;
   }

   for (int col = 0 ; col < 8 ; ++ col)
   {
      double a0 = tmp [0][col] ;
      double a1 = tmp [4][col] ;
      double a2 = tmp [2][col] ;
      double a3 = tmp [6][col] ;
      double a4 = tmp [1][col] ;
      double a5 = tmp [5][col] ;
      double a6 = tmp [3][col] ;
      double a7 = tmp [7][col] ;

      double b0 = (a0 + a1) ;
      double b1 = (a0 - a1) ;

      // b2 = S2 * a2 - C2 * a3
      // b3 = C2 * a2 + S2 * a3
      double r0 = S2 * (a2 + a3) ;
      double b2 = r0 - (S2+C2) * a3 ;
      double b3 = r0 - (S2-C2) * a2 ;

      // b4 = S1 * a4 - C1 * a7
      // b7 = C1 * a4 + S1 * a7
      double r1 = S1 * (a4 + a7) ;
      double b4 = r1 - (S1+C1) * a7 ;
      double b7 = r1 - (S1-C1) * a4 ;

      // b5 = C3 * a5 - S3 * a6
      // b6 = S3 * a5 + C3 * a6
      double r2 = C3 * (a5 + a6) ;
      double b5 = r2 - (C3+S3) * a6 ;
      double b6 = r2 - (C3-S3) * a5 ;

      double c0 = b0 ;
      double c1 = b1 ;
      double c2 = b2 ;
      double c3 = b3 ;
      double c4 = b4 + b5 ;
      double c5 = b5 - b4 ;
      double c6 = b7 - b6 ;
      double c7 = b7 + b6 ;

      double d0 = c0 ;
      double d1 = c1 ;
      double d2 = c2 ;
      double d3 = c3 ;
      double d4 = c4 ;
      double d5 = c6 + c5 ;
      double d6 = c6 - c5 ;
      double d7 = c7 ;


      double e0 = d0 + (128*8) ;   // fold in the +128 level shift, scaled by 8
      double e1 = d1 + (128*8) ;
      double e2 = d2 ;
      double e3 = d3 ;
      double e4 = d4 ;
      double e5 = SQRT2 * d5 ;
      double e6 = SQRT2 * d6 ;
      double e7 = d7 ;

      double f0 = e0 + e3 ;
      double f1 = e1 + e2 ;
      double f2 = e1 - e2 ;
      double f3 = e0 - e3 ;
      double f4 = e4 ;
      double f5 = e5 ;
      double f6 = e6 ;
      double f7 = e7 ;

      double g0 = f0 + f7 ;
      double g1 = f1 + f6 ;
      double g2 = f2 + f5 ;
      double g3 = f3 + f4 ;
      double g4 = f3 - f4 ;
      double g5 = f2 - f5 ;
      double g6 = f1 - f6 ;
      double g7 = f0 - f7 ;

      output [0][col] = Limit (g0/8.0) ;
      output [1][col] = Limit (g1/8.0) ;
      output [2][col] = Limit (g2/8.0) ;
      output [3][col] = Limit (g3/8.0) ;
      output [4][col] = Limit (g4/8.0) ;
      output [5][col] = Limit (g5/8.0) ;
      output [6][col] = Limit (g6/8.0) ;
      output [7][col] = Limit (g7/8.0) ;
   }
}

Scaled Integer Arithmetic

On most processors, floating-point operations take much longer than integer operations. Another method for speeding up the IDCT and DCT calculations is to use only integer operations. To simulate real numbers we scale the integer values by multiplying them by a power of 2.

If we were to scale all values by 2, then we could represent the values … -2, -1.5, -1, -.5, 0, .5, 1, 1.5, 2 … using integers. If we scale them by 4, we can represent … -1.25, -1, -.75, -.5, -.25, 0, .25, .5, .75, 1, 1.25 …. The more we scale the integers, the more precision we get in the calculation. Unfortunately,


The problem with using scaled integers rather than floating-point values is that unless you have a system with 64-bit integers you can never get the same precision you can with floating-point numbers. Generally the difference between the two methods is very small, but a difference does exist that can produce greater rounding errors.

If you are implementing a JPEG application, you may wish to use floating-point values when you are compressing images and scaled integers when you are decompressing them. Generally speed is more important when you want to view a compressed image than when you are compressing one. This would give you more accuracy when you create an image and more speed when you are viewing one.


const int scale = 5 ;
long v1 = 2 << scale ;
// Division
long v5 = (v1


    Merging Quantization and the DCT

Equation 10.25    cos A cos B = (cos (A + B) + cos (A – B)) / 2

Something we did not account for in the operation totals in Table 10.1 is that we factored out the 1/8 value from the IDCT equation. As you can see in Equation 10.3, this means that we have another 64 multiplication operations to add in at the end. If the IDCT is implemented using scaled integers, then dividing by 8 can be done as a bit shift that can be combined with descaling the final result. Unfortunately, this does not work with floating-point numbers.

Another way to get rid of the term is to merge this value with the quantization values. In fact, if we factor the DCT matrix a little differently, it is possible to save even more than the 64 operations resulting from this term. For this factorization of the DCT matrix we are going to follow the same steps as in the previous one until we get to Equation 10.15. From there we will branch off on another path.

A formula that will play a key role in the factorization is the product of cosines formula shown in Equation 10.25.

In the factored matrix in Equation 10.15 the cosine terms in each row or column can be grouped into pairs

whose sum is π/2, where the value of the cosine function is zero. If we extract this submatrix from Equation 10.15 (Equation 10.26) it can be factored as in Equation 10.27 using the cosine sum formula in Equation 10.25.

    Equation 10.26


    Equation 10.27


The trick in this factorization is to choose the cosine row scaling values in such a way that the cos (0) terms end up in the same column.

If we create another submatrix for the remaining cosine terms in Equation 10.15, we can follow a similar factorization process (Equation 10.28).

    Equation 10.28

Table 10.2 Operations Required for the New Factorization

Matrix          1   2   3   4   5   6   7   8   9  10   Total
Addition        0   0   6   2   2   2   1   4   4   8      29
Multiplication  6   0   0   2   0   3   0   0   0   0      11


Once again, the cosine scaling terms have been chosen so that all of the cos (0) terms end up in the same column. Equation 10.29 is the remainder of the factorization.

    Equation 10.29


We can repeat the factorization process with the matrix with the cosine terms to give Equation 10.30.

    Equation 10.30

Putting it all together by substituting Equation 10.27, Equation 10.28, and Equation 10.30 into Equation 10.15 gives Equation 10.31.


Equation 10.31

  • Equation 10.32

Since Equation 10.32 holds true, the matrix multiplication operation can be reordered so that the scaling is the first step when calculating the IDCT and the last step when calculating the FDCT.

Substituting Equation 10.32 into Equation 10.31 gives Equation 10.33. The number of operations needed to calculate each row or column using the factorization in Equation 10.33 is shown in Table 10.2 (page 141).



    Equation 10.36

In JPEG decoding, the IDCT is immediately preceded by dequantization, which multiplies each of the elements in T by a constant value. If we merge the IDCT and dequantization processes by scaling each quantization value by Sij, then the steps in Equation 10.35 can be eliminated. In the FDCT the operations in Equation 10.35 become the final step before quantization. If each element in the quantization table used for encoding is divided by Sij, then these operations can be eliminated from the DCT as well. This leaves 29 addition operations and 5 multiplication operations per data unit.
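The merging step can be sketched as follows. The function names are illustrative stand-ins, not the book's classes; the point is only that multiplying each quantization value by its S value once, up front, makes the per-coefficient multiplications of Equation 10.35 disappear from the IDCT itself:

```cpp
// Sketch: merge the IDCT scaling into dequantization.  Instead of
// computing (coefficient * quant) * s for every data unit, the product
// quant * s is computed once when the quantization table is processed.
// Names are illustrative, not the book's actual classes.
double ScaledQuantizationValue(int quant, double s)
{
    return quant * s ;
}

double Dequantize(int coefficient, double scaledquant)
{
    return coefficient * scaledquant ;
}
```

The book's BuildScaledTables member function performs this one-time table scaling.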

    Conclusion

In this chapter you have seen how matrix factorization can be used to dramatically reduce the number of operations required to implement the DCT and IDCT. Efficiency is a product of design, not a result of coding. Before cluttering up your code with performance enhancements, be sure to do measurements to ensure that the performance benefit, if any, is worth the loss of clarity.

The code examples for this chapter are new implementations of the JpegEncoderDataUnit, JpegDecoderDataUnit, JpegEncoderQuantizationTable, and JpegDecoderQuantizationTable classes that were originally presented in Chapter 7. The new classes merge the DCT and IDCT with quantization using the process described in this chapter.

These new classes are structured so that they can replace the previous versions in the JpegDecoder (Chapter 8) and JpegEncoder (Chapter 9) classes with few modifications. JpegDecoder needs no modifications to use the new classes. JpegEncoder needs to add a call to the BuildScaledTables member function of the quantization table class before performing the DCT.

The effect of this expression is to multiply each element in the matrix V by the constant value Sij, where

Sij = F(i)F(j)
F(0) = 1
F(n) = , n = 1, 2, 3, 4, 5, 6, 7


The additional benefit in this factorization is that we can eliminate the six multiplication operations in the first matrix by merging them with the quantization or dequantization processes. Matrix multiplication is associative and matrix multiplication by a scalar is commutative. Therefore, if M is factored as shown in Equation 10.31, the matrix operations in the IDCT calculation (Equation 10.34) can be ordered so that these three operations take place first (Equation 10.35).

    Equation 10.34

    Equation 10.35

V = MᵀTM   (IDCT)
T = MVMᵀ   (FDCT)

  • Progressive JPEG

This chapter covers the encoding and decoding of progressive JPEG images, which, while infrequently used, are becoming more common. One driving force behind the use of progressive JPEG is the World Wide Web, an ideal use for progressive JPEG images. Using progressive images in Web pages makes it possible for users to get a good idea of the contents of a Web page before the images are completely downloaded. The other major force behind progressive JPEG is the increasing availability of software and libraries (notably the IJG's library) that support it.

    Component Division in Progressive JPEG

A sequential JPEG file may contain a single scan, or the data may be organized into multiple scans, each containing one or more components. However, in sequential JPEG, each component is completely encoded within one scan. In progressive JPEG, components are encoded across multiple scans. Each component is contained in at least two scans and possibly in as many as 896¹ (64 possible spectral bands times up to 14 successive approximation scans per band).

Components are divided across scans in two distinct ways. One of these is known as spectral selection. Spectral selection refers to the division of the component into bands of DCT coefficients. Each band is a continuous range of DCT coefficients using the zigzag order. The only restriction on the coefficients in a band, other than that the band must contain a continuous range, is that the DC component must be in a band by itself. At a minimum, a component will be divided into two scans: one containing the DC coefficient and the other the AC

¹In practice the number of scans is never anywhere near the high end of this range.

Chapter 11



coefficients. At the most extreme the component can be divided into 64 bands with one coefficient in each.

The first scan for a component must contain the DC coefficients. The bands containing the AC coefficients may be encoded in any order. Progressive scans containing the DC coefficient may be interleaved while scans containing AC coefficients must be noninterleaved. Figure 11.1 shows an example of a data unit divided into four bands.

The other component division in progressive scans is known as successive approximation, which is used in conjunction with spectral selection to divide individual bands into a number of scans. Unlike spectral selection, successive approximation in a progressive image is completely optional. When it is used, the precision of the initial band is reduced by a number of bits. Subsequent scans for the band increase the precision one bit at a time.

The conversion function used with successive approximation is called the point transform in the JPEG standard. For DC coefficients the point transform is simply a right shift by the number of bits specified by the successive approximation value. For AC coefficients the point transform is

Output = Input / 2^SuccessiveApproximation

where the division is integer division, truncating toward zero.

At first glance it may appear that the AC and DC point transforms are the same, but this is not the case. Table 11.1 shows the AC and DC point transforms with a successive approximation value of 1. Notice that the AC and DC point

Figure 11.1 Example of a Data Unit Divided into Spectral Selection Bands


Table 11.1 DC and AC Values with a Successive Approximation Value of 1

Input   5   4   3   2   1   0  -1  -2  -3  -4  -5
DC      2   2   1   1   0   0  -1  -1  -2  -2  -3
AC      2   2   1   1   0   0   0  -1  -1  -2  -2
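The two transforms can be sketched in C++. The function names are mine, and the DC version assumes the arithmetic right shift of the common two's-complement implementations:

```cpp
// Sketch of the two point transforms for a successive approximation
// value 'al'.  DC uses an arithmetic right shift (rounds toward minus
// infinity); AC uses integer division (truncates toward zero), which
// is why F(X) = -F(-X) holds only for the AC transform.
int PointTransformDC(int input, int al)
{
    return input >> al ;          // assumes arithmetic shift for negatives
}

int PointTransformAC(int input, int al)
{
    return input / (1 << al) ;   // C++ division truncates toward zero
}
```

With al = 1 these reproduce the rows of Table 11.1: DC maps -1 to -1, while AC maps -1 to 0.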

Figure 11.2 Sample Scan Division by Spectral Selection and Successive Approximation

transform values may be different for negative values and, for the AC point transform, F(X) = -F(-X).

Figure 11.2 is an example of a component divided into 12 scans using a combination of spectral selection and successive approximation. The spectral selection division is the same as that shown in Figure 11.1. The first scan in this example contains the DC coefficient with a successive approximation of 3. The remaining three spectral selection bands can be output in any order. However, within each band, the scans that refine coefficients using successive approximation must be in strict order.

    Processing Progressive JPEG Files

The overall processing of progressive JPEG files can be implemented in the same manner that is commonly used for sequential files. The major difference is in the processing of the scans. Figure 11.3 illustrates the difference between the processing of sequential files and progressive files.

It should be apparent that displaying progressive JPEG files on the fly requires significantly more processing than displaying sequential files or even displaying progressive files only after they have been completely decoded. This is why displaying progressive JPEG files on the fly only makes sense if the data


is being received over a network at a relatively slow rate compared to the speed of the processor.

Figure 11.3 shows progressive JPEG images being updated after every scan, but this is not necessary. It is possible for a decoder to be implemented so that it updates and displays the image only when the decoder determines that it has received enough new data for updating. When displaying the image on the fly, the process for updating is the same as for a sequential file. If a progressive image is not displayed on the fly, the overall decoding process is basically the same as for sequential JPEG and there is little difference in the amount of processing required.

    Processing Progressive Scans

The first step in decoding a progressive scan is to determine which of these types the scan is. There are four types of progressive scans and each is processed in a different manner.

²See Chapter 5.


Figure 11.3 Sequential and Progressive JPEG Processing

        First scan for a band    Refining scan for a band
DC      1. First DC scan         2. Refining DC scan
AC      3. First AC scan         4. Refining AC scan

All of the information needed to determine the scan type is stored in the SOS marker,² where the spectral selection start and end fields specify the coefficients in the band. If both of these values are zero, the scan is a DC scan. If both values are nonzero, then it is an AC scan.

The successive approximation value is stored in two 4-bit fields packed into 1 byte. If the 4 high-order bits are zero, the scan is the first scan for the band. Otherwise, when this field is nonzero, it is a refining scan. If both of these bit fields are zero, then successive approximation is not used for the band.
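The classification just described can be sketched as a small helper. The names are illustrative; the inputs are the SOS spectral selection start and end fields and the high 4 bits of the successive approximation byte:

```cpp
// Sketch: classify a progressive scan from its SOS marker fields.
enum ScanType { DCFIRST, DCREFINE, ACFIRST, ACREFINE } ;

ScanType ClassifyScan(int spectralstart, int spectralend, int approximationhigh)
{
    const bool isdc = (spectralstart == 0 && spectralend == 0) ;
    if (isdc)
        return approximationhigh == 0 ? DCFIRST : DCREFINE ;
    else
        return approximationhigh == 0 ? ACFIRST : ACREFINE ;
}
```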


The validations on these fields that decoders should perform include the following:

• If the spectral selection start is zero, the spectral selection end must be zero.
• If the spectral selection end is zero, the spectral selection start must be zero.
• The spectral selection start must not be greater than the spectral selection end.
• If the spectral selection start is not zero, the scan may contain only one component.
• The spectral selection end may not be greater than 63.
• The low-order and high-order successive approximation bit fields may not be greater than 13.
• The high-order successive approximation bits must be either zero or one greater than the low-order bits.
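The list above can be collapsed into a single check, sketched here with an illustrative helper name:

```cpp
// Sketch of the scan-header validations for a progressive scan.
// Returns true when the spectral selection range and the successive
// approximation bit fields satisfy all of the rules listed above.
bool ValidateProgressiveScan(int start, int end,
                             int lowbits, int highbits,
                             int componentcount)
{
    if ((start == 0) != (end == 0)) return false ; // zero together or not at all
    if (start > end) return false ;
    if (start != 0 && componentcount != 1) return false ;
    if (end > 63) return false ;
    if (lowbits > 13 || highbits > 13) return false ;
    if (highbits != 0 && highbits != lowbits + 1) return false ;
    return true ;
}
```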

    MCUs in Progressive Scans

Data in progressive scans is organized into MCUs in exactly the same manner as in sequential scans. DC progressive scans can be interleaved or noninterleaved. AC scans are always noninterleaved, so there will always be one data unit per MCU. For progressive scans containing AC data, the number and position of the data units are the same as for a noninterleaved sequential scan.

Progressive scans may include restart markers. The restart interval specifies the number of MCUs between restart markers just as in sequential scans. In DC scans, restart marker processing is the same as in sequential scans. For AC scans, the end-of-band runs may not cross restart markers. End-of-band runs will be explained shortly.

    Huffman Table Usage In Progressive Scans

The SOS marker specifies the numeric identifier of the Huffman tables used by each component in the scan. The JPEG standard requires that all of the Huffman tables required by a scan be defined before the SOS marker. When a decoder is handling progressive scans it must validate the existence of the Huffman tables used by the scan differently from the way it does with a sequential scan.



Each component in a sequential scan requires two Huffman tables (DC and AC). In progressive JPEG a scan will use either DC tables or AC tables, but not both. In fact, a refining progressive DC scan does not use Huffman tables at all. It is entirely possible for a progressive DC scan to occur in a JPEG file before any AC Huffman tables have been defined, something that is illegal in sequential JPEG.

    Data Unit Decoding

First DC Scans

The process for decoding the first scan for DC coefficients is nearly identical to that for DC coefficients in a sequential scan (see Chapter 8). The only difference is that the point transform is applied to the decoded DC value (left-shifted by the successive approximation value) before being stored as a coefficient value. Algorithm 11.1 illustrates the decoding of a data unit.

Algorithm 11.1 Decoding Progressive DC Coefficients

GLOBAL SUCCESSIVEAPPROXIMATION
GLOBAL LAST_DC_VALUE

Procedure FirstDCDataUnit (COEFFICIENTS (0..63))
Begin
   BITCOUNT = DecodeUsingDCHuffmanTable ()
   BITS = ReadLiteralBits (BITCOUNT)
   DCDIFFERENCE = Extend (BITS, BITCOUNT)
   DCVALUE = DCDIFFERENCE + LAST_DC_VALUE
   LAST_DC_VALUE = DCVALUE
   COEFFICIENTS (0) = DCVALUE LeftShift SUCCESSIVEAPPROXIMATION
End

Refining DC Scans

Refining DC scans are the simplest of all JPEG scans to handle. The scan's compressed data consists entirely of raw data bits, one bit per data unit. Algorithm 11.2 shows all the processing required to process a data unit in this type of scan.

Algorithm 11.2 Refining DC Coefficients

GLOBAL SUCCESSIVEAPPROXIMATION

Procedure RefineDCDataUnit (COEFFICIENTS (0..63))
Begin
   BIT = ReadLiteralBits (1)
   COEFFICIENTS (0) = COEFFICIENTS (0) Or (BIT LeftShift SUCCESSIVEAPPROXIMATION)
End

Table 11.2 AC Codes and Corresponding EOB Run Lengths

First AC Scans

The simplicity of decoding DC coefficients is more than compensated for by the complexity of decoding AC coefficients. For the first scan of an AC band the encoding is similar to sequential with some additions.

Progressive JPEG adds the concept of an end-of-band (EOB) run. This is a run of data units where the AC coefficients within the band are all zero. In sequential JPEG each data unit is encoded independently from every other. Because of EOB runs, data units are not independent in AC scans.

In sequential JPEG the Huffman-encoded value 00₁₆ is used to set the remaining coefficients in a data unit to zero. This is equivalent to an EOB run of 1. Progressive scans can contain much longer EOB runs. Table 11.2 lists the EOB codes and the possible EOB run lengths associated with them.

Raw bits following the Huffman-encoded byte are used to specify EOB runs just as they are with literal values. The 4 high-order bits specify the number of additional bits when used as part of an EOB code. Since EOB runs are always positive, the Extend() function is not used. Instead, the conversion from raw bits to the EOB run length is

    EOBRUN = (1 LeftShift HIGHBITS) + ReadRawBits (HIGHBITS)

If an EOB code occurs in the compressed stream when a band has been partially processed, the current band becomes the first data unit in the EOB run and the remaining coefficients are set to zero. Be sure to count the current band when processing an EOB run.
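The conversion can be sketched directly (the helper name is hypothetical):

```cpp
// Sketch: convert an EOB code's high 4 bits plus its raw bits into the
// run length from Table 11.2.  HIGHBITS raw bits follow the code, so a
// code n0 covers run lengths 2^n through 2^(n+1) - 1.
int EobRunLength(int highbits, int rawbits)
{
    return (1 << highbits) + rawbits ;
}
```

For example, code 40₁₆ (highbits = 4) followed by raw bits 15 gives a run of 31, the top of its range in Table 11.2.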

Code Value   EOB Run Length
00₁₆         1
10₁₆         2-3
20₁₆         4-7
30₁₆         8-15
40₁₆         16-31
50₁₆         32-63
60₁₆         64-127
70₁₆         128-255
80₁₆         256-511
90₁₆         512-1,023
A0₁₆         1,024-2,047
B0₁₆         2,048-4,095
C0₁₆         4,096-8,191
D0₁₆         8,192-16,383
E0₁₆         16,384-32,767



Algorithm 11.3 shows how the first scan for an AC band is decoded. The main differences between sequential and progressive processing are:

• Only coefficients within the spectral selection band are updated.
• Coefficient values are left-shifted by the successive approximation value.
• EOB run processing

Refining AC Scans

Refining AC scans is the nightmare of progressive JPEG decoding. The implementation problem is that these scans contain data to refine all previously nonzero coefficients that are skipped as a result of a zero run or EOB run.

In a refining AC scan, the 4 low-order bits of the Huffman-encoded values can only be 0 or 1. This should make sense since a refining scan only adds 1 bit per coefficient. If a coefficient needed more than 1 bit it would have been encoded in earlier scans.

The processing of refining AC scans is nearly identical to that of initial scans. Whenever the value of the 4 low-order bits of the Huffman-encoded value is 1, a coefficient is being made nonzero for the first time. This value is immediately followed in the input stream by 1 raw sign bit. The new coefficient value is


If SignBit = 1 Then
   CoefficientValue = 1 LeftShift ScanSuccessiveApproximation
Else If SignBit = 0 Then
   CoefficientValue = -1 LeftShift ScanSuccessiveApproximation

This is essentially the same as using the Extend function.

The major difference in refining scans is the processing of zero and EOB runs. Zero runs in a refining scan only count zero-valued coefficients. This sequence of coefficients would be skipped by a zero run of 4 (not 6):

0 0 4 0 2 0

Whenever a nonzero coefficient is skipped, as a result of a zero or EOB run, the input stream contains 1 raw bit to refine that coefficient.

In our previous example, suppose that the Huffman-encoded value were 41₁₆. Three raw bits would follow this code: the first bit would be the sign bit for the coefficient being made nonzero; the next bit would refine the coefficient with the value 4, and the last would refine the coefficient with the value 2.

Algorithm 11.4 shows the process for refining a previously nonzero coefficient.

Once the correct number of zero-valued coefficients has been skipped, the next coefficient is assigned the value found at the start of the procedure. Notice that the data for coefficients is not stored strictly in order. The data for the last coefficient comes first in the input stream.

Global SSS // Spectral Selection Start
Global SSE // Spectral Selection End
Global SUCCESSIVEAPPROXIMATION
Global EOBRUN

Procedure ACFirstDataUnit (COEFFICIENTS (0..63))
Begin
   For II = SSS To SSE Do
      COEFFICIENTS (II) = 0
   If EOBRUN > 0 Then
   Begin
      EOBRUN = EOBRUN - 1
      Return
   End
   II = SSS
   While II <= SSE Do


Algorithm 11.4 Refining AC Coefficient Values

Procedure RefineAC (COEFFICIENT)
Begin
   If COEFFICIENT > 0 Then
   Begin
      If ReadRawBits (1) <> 0 Then
         COEFFICIENT = COEFFICIENT + (1 LeftShift SUCCESSIVEAPPROXIMATION)
   End
   Else If COEFFICIENT < 0 Then
   Begin
      If ReadRawBits (1) <> 0 Then
         COEFFICIENT = COEFFICIENT + (-1 LeftShift SUCCESSIVEAPPROXIMATION)
   End
End

If the 4 low-order bits of the Huffman-encoded byte are zero, the code does not create any new nonzero coefficients. However, the process of updating existing nonzero coefficients still takes place. When the value of the Huffman-encoded byte is F0₁₆ the next 16 zero-valued coefficients are skipped but all intervening nonzero-valued coefficients are updated using the previous procedure.

As with initial scans, the code values 00₁₆, 10₁₆, … E0₁₆ instruct the decoder to skip to the end of a number of bands. The 4 high-order bits specify the number of additional bits to read and the EOB count is calculated in the same way as before. All nonzero-valued coefficients that are skipped as a result of an EOB skip are refined using raw bits.

An important thing to remember when skipping zero-valued coefficients is that the decoder should end up either at the end of a band or on a zero-valued coefficient.

Suppose a band contained 20 coefficients with these values

0 0 0 0 0 8 0 0 0 8 0 8 0 0 0 0 0 0 8 8

at the start of a scan. If the first byte decoded for this band were the value 81₁₆, this would instruct the decoder to skip 8 zero values. After skipping 5 coefficients the decoder would read 1 raw bit to refine the first value of 8.

0 0 0 0 0 [8] 0 0 0 8 0 8 0 0 0 0 0 0 8 8   ([8] = refined)

  • Data Unit Decoding 159

The decoder would skip the next 3 zero-valued coefficients (for a total of 8), at which point it would be positioned here.

0 0 0 0 0 8 0 0 0 [8] 0 8 0 0 0 0 0 0 8 8   ([8] = position)

Since this coefficient is already nonzero, all we can do is refine it. The decoder should advance to the next zero-valued coefficient while refining the nonzero coefficients it skips.

0 0 0 0 0 8 0 0 0 [8] [4] 8 0 0 0 0 0 0 8 8   ([8] = refined, [4] = new coefficient)
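The walk just described can be sketched as follows. The helper is illustrative; a real decoder would read one refining bit at each index this sketch records:

```cpp
#include <vector>

// Sketch of the skip rule in a refining AC scan: a zero run counts only
// zero-valued coefficients, and every nonzero coefficient passed over
// is refined with one raw bit (recorded here by index).  Returns the
// index of the zero-valued coefficient where the decoder stops.
int SkipZeros(const std::vector<int> &coefficients, int start, int zerocount,
              std::vector<int> &refined)
{
    int ii = start ;
    while (zerocount > 0 || coefficients [ii] != 0)
    {
        if (coefficients [ii] != 0)
            refined.push_back (ii) ;   // would read one refining bit here
        else
            -- zerocount ;
        ++ ii ;
    }
    return ii ;
}
```

Run on the 20-coefficient band above with a zero run of 8, it refines the coefficients at indices 5 and 9 and stops at index 10, where the new coefficient is written.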

Algorithm 11.5 illustrates the process for refining AC coefficients in a progressive scan.

Algorithm 11.5 Refining Progressive AC Coefficients

Global SSS // Spectral Selection Start
Global SSE // Spectral Selection End
Global SUCCESSIVEAPPROXIMATION
Global EOBRUN

Procedure ACRefineDataUnit (COEFFICIENTS (0..63))
Begin
   II = SSS
   While II <= SSE Do
   Begin
      VALUE = DecodeUsingACHuffmanTable ()
      LOBITS = VALUE And F₁₆
      HIGHBITS = VALUE RightShift 4
      If LOBITS <> 0 Then
      Begin
         EXTRABIT = ReadRawBits (1)
         While II <= SSE And HIGHBITS > 0 Or COEFFICIENTS (II) <> 0 Do
         Begin
            If COEFFICIENTS (II) <> 0 Then
               RefineAC (COEFFICIENTS (II))
            Else
               HIGHBITS = HIGHBITS - 1
            II = II + 1
         End
         If EXTRABIT <> 0 Then
            COEFFICIENTS (II) = 1 LeftShift SUCCESSIVEAPPROXIMATION
         Else
            COEFFICIENTS (II) = -1 LeftShift SUCCESSIVEAPPROXIMATION
         II = II + 1
      End
      Else If LOBITS = 0 Then
      Begin
         If HIGHBITS = F₁₆ Then // Run of 16 Zeros
         Begin
            While HIGHBITS >= 0 Do
            Begin
               If COEFFICIENTS (II) <> 0 Then
                  RefineAC (COEFFICIENTS (II))
               Else
                  HIGHBITS = HIGHBITS - 1
               II = II + 1
            End
         End
         Else If HIGHBITS = 0 Then
            EOBRUN = 1
         Else
            EOBRUN = (1 LeftShift HIGHBITS) + ReadRawBits (HIGHBITS)
      End
   End
End

    Preparing to Create Progressive JPEG Files

The biggest issue in implementing a progressive JPEG encoder is figuring out how to specify what goes into each scan. For each scan, the encoder needs to identify:

• The components to include (multiple components in DC bands only)
• The spectral range
• The successive approximation

Requiring the user to specify all the parameters for each scan would be tedious and error prone. A possible approach would be to predefine a set of scan sequences that the user could select from. A method that we have found works well is to define a default scan sequence for progressive scans and allow the user to make modifications if desired. For each scan, the user can assign the components, the last value in the spectral range, and the initial successive approximation value. At the start of the encoding process, the spectral range of the earlier scans is used to determine the initial spectral range of later scans. If the user has defined the scans for a component with the last values in the spectral range set to

    0 5 20 63

then the encoder automatically assigns these ranges

0-0   1-5   6-20   21-63

as the spectral bands for the scan.

The encoder processes the scans in the order defined by the user. If successive approximation is specified for any of the scans, the encoder repeats the scan list and reduces the successive approximation by 1. It then outputs all the scans that contain spectral bands that still have to be refined. This process is intermediate in complexity. It gives the user many options for outputting progressive scans, but it does not allow the user to have complete control over scan content and ordering.
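The band assignment described above can be sketched as follows (the helper name is illustrative):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Sketch: derive spectral selection bands from the user's list of
// "last value in the spectral range" entries.  Each band starts one
// coefficient past the previous band's end.
std::vector<std::pair<int, int> > AssignSpectralBands(
    const std::vector<int> &lastvalues)
{
    std::vector<std::pair<int, int> > bands ;
    int start = 0 ;
    for (std::size_t ii = 0 ; ii < lastvalues.size () ; ++ ii)
    {
        bands.push_back (std::make_pair (start, lastvalues [ii])) ;
        start = lastvalues [ii] + 1 ;
    }
    return bands ;
}
```

Given the list 0, 5, 20, 63 it produces the bands 0-0, 1-5, 6-20, and 21-63 from the example.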

An encoder should not be overzealous when it comes to breaking image data into scans. Each scan in a progressive image should contribute meaningful data to it. Since the last AC coefficient is almost always zero, creating scans with a spectral range of 63 to 63 or even 61 to 63 is not very useful. The deeper into the zigzag order you go, the more coefficients should be included in a scan. As a general guideline, the larger the quantization values, the larger the number of coefficients per band.

Successive approximation has even greater potential for abuse. Using successive approximation, a band can be divided into up to 14 scans, where the last 13 scans contribute 1 bit to each coefficient in the band. With 8-bit sample data, DC coefficients require no more than 11 bits to encode (for 12-bit samples this number is 15). Using a successive approximation value of 13 to encode 8-bit data makes no sense since many scans will contribute no data to the image.

Because the magnitude of AC coefficients is generally much smaller than that of DC coefficients, using successive approximation to divide AC coefficients into a large number of scans makes even less sense. Reasonable successive approximation values are determined by the magnitude of the quantization values and how far the band is into the zigzag order.

If the encoder allows the user to specify the contents of the scans, it needs to perform several validations to ensure that the output file will contain a valid JPEG image. In addition to the sequential validations, a progressive encoder should ensure that:

• The scans contain the entire spectral range for each component.
• The spectral bands for a component do not overlap.
• Spectral bands containing the DC coefficient do not contain any AC coefficients.
• The successive approximation value is in the range 0-13.



    Encoding Progressive Scans

Encoding a progressive JPEG image is very similar to encoding a sequential image using multiple scans. As with progressive decoding, the significant differences are in the encoding of data units.

As you saw earlier in this chapter, there are four distinct types of scans within a progressive JPEG file. The JPEG encoder in Chapter 9 used InterleavedPass() and Noninterleaved() to drive the creation of scans. These functions handled division of the image into MCUs and the processing of restart markers. They were implemented so that they took pointers to member functions to do the actual data unit encoding. This may have looked odd with sequential images, but this same function is used to control progressive DC scans once integrated with progressive JPEG code, thus allowing the same code to drive a total of three types of scans.

    Huffman Coding

Progressive images may have up to four DC and four AC Huffman tables defined at any time. The progressive JPEG source code at the end of this chapter only uses a maximum of two of each type, just like the baseline sequential code presented earlier. As before, two passes are made to encode each scan: the first is used to gather Huffman frequency statistics, and the second is used to encode and output the data. The same driver code is used for each pass, with pointers to functions controlling which of these two operations is performed for a given pass.

    Data Unit Encoding

DC First Scans

With the exception of applying the point transform to the coefficient value, the encoding of DC coefficients in the first scan of a band is identical to the encoding of DC coefficients in a sequential scan. DC coefficients are encoded as the difference between the current coefficient and the value of the last encoded value for the same component. The difference is encoded as a Huffman-encoded magnitude value followed by a number of literal bits.

For simplicity, the example below shows only one last-DC-value variable. An actual implementation requires a separate variable for each component. Just as with sequential JPEG, the last DC coefficient value for each component should be set to zero at the start of the scan and whenever a restart marker is processed.

Algorithm 11.6 illustrates the process for encoding a data unit in a first DC scan.



Algorithm 11.6 Encoding DC Coefficients in Initial Scans

Global SUCCESSIVEAPPROXIMATION
Global LAST_DC_VALUE

Function CountBits (VALUE)
Begin
   COUNT = 0
   While VALUE <> 0 Do
   Begin
      COUNT = COUNT + 1
      VALUE = VALUE RightShift 1
   End
   Return COUNT
End

Procedure EncodeDCFirst (COEFFICIENTS (0..63))
Begin
   // Point Transform
   VALUE = COEFFICIENTS (0) RightShift SUCCESSIVEAPPROXIMATION
   DIFFERENCE = VALUE - LAST_DC_VALUE
   LAST_DC_VALUE = VALUE
   If DIFFERENCE >= 0 Then
   Begin
      BITCOUNT = CountBits (DIFFERENCE)
      HuffmanEncodeDC (BITCOUNT)
      OutputLiteralBits (BITCOUNT, DIFFERENCE)
   End
   Else
   Begin
      BITCOUNT = CountBits (-DIFFERENCE)
      HuffmanEncodeDC (BITCOUNT)
      OutputLiteralBits (BITCOUNT, DIFFERENCE Xor FFFFFFFF₁₆)
   End
End

Refining DC Scans

Encoding refining DC coefficients for a data unit is trivial, as shown in Algorithm 11.7. A scan contains a single bit for refining a DC coefficient. No Huffman coding is required.

Algorithm 11.7 Encoding DC Coefficients in Refining Scans

    Global SUCCESSIVEAPPROXIMATION

Procedure EncodeDCRefine (COEFFICIENTS (0..63))
Begin
   VALUE = (COEFFICIENTS (0) RightShift SUCCESSIVEAPPROXIMATION) And 1
   OutputLiteralBits (1, VALUE)
End


AC First Scans

AC coefficients are encoded using a sequence of Huffman-encoded bytes followed by a string of literal bits. The Huffman-encoded byte is divided into two 4-bit fields. If the 4 low-order bits are not zero, the code represents a nonzero literal coefficient. The low-order bits are the magnitude of the code and specify the number of literal bits that follow. The 4 high-order bits specify the number of zero-valued coefficients to skip before writing the nonzero coefficient.

The byte code F0₁₆ means that the next 16 coefficient values are zero. Any other value with the 4 low-order bits set to zero represents the magnitude of a run of bands that are all zero (see Table 11.2). The 4 high-order bits specify the number of raw bits written to the output stream used to specify the exact EOB run length.

An encoder processes a data unit by starting at the first coefficient in the band and working to the last coefficient, proceeding in zigzag order. Only nonzero coefficients are processed in AC encoding. The encoder needs to maintain a count of the number of consecutive zero-valued coefficients within the data unit and a count of blocks where all the coefficients are zero.
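Composing the code byte can be sketched directly (the helper name is hypothetical):

```cpp
// Sketch: compose the Huffman-encoded byte for a literal AC coefficient.
// The high 4 bits hold the preceding zero run (0-15); the low 4 bits
// hold the magnitude (bit count) of the literal bits that follow.
int ComposeAcCode(int zerorun, int bitcount)
{
    return (zerorun << 4) | bitcount ;
}
```

For example, a run of four zeros followed by a 1-bit literal produces 41₁₆, the code used in the refining-scan example earlier.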

Algorithm 11.8 illustrates how to encode a data unit in a progressive AC first scan. The EncodeACFirst procedure calls the PrintEOBRun procedure to output any outstanding EOB runs right before encoding a literal coefficient. What is not shown in this example is that PrintEOBRun must be called whenever a restart marker is output. EOB runs may not cross restart markers. PrintEOBRun must also be called at the end of the data unit encoding to ensure that all EOB runs have been flushed to the output stream.

Algorithm 11.8 Encoding Initial AC Scans

Global SUCCESSIVEAPPROXIMATION
Global SSS
Global SSE
Global EOBRUN

Procedure PrintEOBRun
Begin
    If EOBRUN = 0 Then
        Return
    BITCOUNT = CountBits (EOBRUN RightShift 1)
    HuffmanEncodeAC (BITCOUNT LeftShift 4)
    OutputLiteralBits (BITCOUNT, EOBRUN)
End

Procedure EncodeACFirst (COEFFICIENTS (0..63))
Begin
    ZERORUN = 0  // Number of sequential zero coefficients
    For II = SSS To SSE Do
    Begin
        // Point Transform
        VALUE = COEFFICIENTS (II) / (1 LeftShift SUCCESSIVEAPPROXIMATION)
        If VALUE = 0 Then
            ZERORUN = ZERORUN + 1
        Else
        Begin
            // We have a literal value, so any EOB run started with a
            // previous block is over, as is a zero run within this block.
            PrintEOBRun
            // The longest zero run we can put out with a literal is 15.
            // If we have a longer run, we need to output the code for
            // 16 zeros before we write the literal.
            While ZERORUN >= 16 Do
            Begin
                HuffmanEncodeAC (F0₁₆)
                ZERORUN = ZERORUN - 16
            End
            If VALUE > 0 Then
            Begin
                BITCOUNT = CountBits (VALUE)
                HuffmanEncodeAC ((ZERORUN LeftShift 4) Or BITCOUNT)
                OutputLiteralBits (BITCOUNT, VALUE)
            End
            Else
            Begin
                BITCOUNT = CountBits (-VALUE)
                HuffmanEncodeAC ((ZERORUN LeftShift 4) Or BITCOUNT)
                OutputLiteralBits (BITCOUNT, VALUE Xor FFFFFFFF₁₆)
            End
            ZERORUN = 0
        End
    End

    // If the band ended in a run of zeros, convert them
    // into an EOB run.
    If ZERORUN <> 0 Then
    Begin
        EOBRUN = EOBRUN + 1
        // Make sure we do not exceed the maximum EOB run.
        If EOBRUN = 7FFF₁₆ Then
            PrintEOBRun
    End
End
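The EOB-run encoding in PrintEOBRun packs the bit length of half the run into the high nibble of the Huffman symbol and then sends the low bits of the run count as literal bits. A minimal Python sketch of that step follows; `huffman_encode_ac` and `output_literal_bits` are hypothetical stand-ins for the encoder's actual output routines, not functions from the book's source code.

```python
def count_bits(value):
    """Number of bits needed to represent value (0 -> 0 bits)."""
    bits = 0
    while value != 0:
        bits += 1
        value >>= 1
    return bits

def print_eob_run(eob_run, huffman_encode_ac, output_literal_bits):
    """Flush a pending EOB run and return the new (zero) run length.

    The Huffman symbol carries CountBits(run >> 1) in its high nibble;
    the low CountBits(run >> 1) bits of the run follow as literal bits.
    """
    if eob_run == 0:
        return 0
    bit_count = count_bits(eob_run >> 1)
    huffman_encode_ac(bit_count << 4)          # e.g. run 5 -> symbol 0x20
    output_literal_bits(bit_count, eob_run)    # low bit_count bits of the run
    return 0
```

For example, an EOB run of 5 produces the symbol 20₁₆ followed by the two low-order bits of 5.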


  • Progressive JPEG

Refining AC Scans
Refining AC scans is one of the most complex processes in JPEG encoding. The encoding process for refining AC scans follows the basic pattern used by the first scan for a band. An important issue with refining scans is that Huffman-encoded value/literal-bits sequences are used only to encode coefficients that the current scan will make nonzero for the first time. Coefficients that were made nonzero in a previous scan are not included in zero runs. Instead, any code that causes zero values to be skipped is followed by a refining bit for each of the intervening nonzero values.

Algorithm 11.9 illustrates the process for encoding refining AC scans. The RefineBand procedure contains most of the logic specific to refining scans. It outputs a refining bit for each nonzero coefficient that is skipped.

The codes used to encode the scans are identical to those used with the first AC scan for a band. The only difference in the encoded data is that Huffman-encoded value/literal-bits pairs are followed by a series of bits that refine each coefficient value skipped. If a zero or EOB run causes the coefficients from M to N to be skipped, RefineBand (M, N) outputs the refining bits.

The PrintEOBRun procedure for refining scans is slightly different from the one for first scans because it has to refine all of the nonzero coefficients that are skipped. The skipping of coefficients because of EOB runs requires additional global data. The example uses the variables RUNSTARTDATAUNIT and RUNSTARTCOEFFICIENT to mark the data unit and coefficient where the EOB run began. (An EOB run can start in the middle of a data unit.)

The encoding of data units is also more complicated. Since the zero run is the count of zero-valued coefficients, not the number of coefficients skipped in the zero run, the code for processing zero runs of greater than 16 is a bit more complicated as well.
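The refining bits themselves are simple to generate: after the point transform, each coefficient that is already nonzero contributes its new low-order bit, and zero coefficients contribute nothing. A Python sketch of the RefineBand logic follows; `output_literal_bits` is again a hypothetical output callback, and the truncating point transform mirrors the integer division in the book's pseudocode.

```python
def point_transform(coefficient, approximation):
    """Divide by 2^approximation, truncating toward zero as in the
    pseudocode (Python's // floors negatives, so handle the sign)."""
    magnitude = abs(coefficient) >> approximation
    return magnitude if coefficient >= 0 else -magnitude

def refine_band(coefficients, start, end, approximation, output_literal_bits):
    """Emit one correction bit per already-nonzero coefficient in
    [start, end]; zero-valued coefficients produce no output."""
    for i in range(start, end + 1):
        value = point_transform(coefficients[i], approximation)
        if value != 0:
            output_literal_bits(1, abs(value) & 1)
```

With a successive approximation of 1, coefficients 3, -2, and 5 refine to bits 1, 1, and 0 respectively.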

Algorithm 11.9 Encoding Refining AC Scans

Global DATAUNITS (0..*)(0..63)
Global CURRENTDATAUNIT  // Index of the current data unit in DATAUNITS
Global EOBRUN
Global RUNSTARTDATAUNIT
Global RUNSTARTCOEFFICIENT
Global SSS  // Spectral Selection Start
Global SSE  // Spectral Selection End
Global SUCCESSIVEAPPROXIMATION

Procedure RefineBand (COEFFICIENTS (0..63), START, END)
Begin
    For II = START To END Do
    Begin
        VALUE = COEFFICIENTS (II) / (1 LeftShift SUCCESSIVEAPPROXIMATION)
        If VALUE <> 0 Then
        Begin
            If VALUE < 0 Then
                VALUE = -VALUE
            OutputLiteralBits (1, VALUE And 1)
        End
    End
End

Procedure PrintEOBRun
Begin
    If EOBRUN = 0 Then
        Return
    BITCOUNT = CountBits (EOBRUN RightShift 1)
    HuffmanEncodeAC (BITCOUNT LeftShift 4)
    OutputLiteralBits (BITCOUNT, EOBRUN)
    For II = 1 To EOBRUN Do
    Begin
        RefineBand (DATAUNITS (RUNSTARTDATAUNIT + II - 1),
                    RUNSTARTCOEFFICIENT, SSE)
        RUNSTARTCOEFFICIENT = SSS
    End
    EOBRUN = 0
End

Procedure EncodeACRefine (COEFFICIENTS (0..63))
Begin
    ZERORUN = 0
    For II = SSS To SSE Do
    Begin
        VALUE = COEFFICIENTS (II) / (1 LeftShift SUCCESSIVEAPPROXIMATION)
        If VALUE = 0 Then
        Begin
            If ZERORUN = 0 Then
                ZEROSTART = II
            ZERORUN = ZERORUN + 1
        End
        Else If VALUE = 1 Or VALUE = -1 Then
        Begin
            PrintEOBRun
            While ZERORUN >= 16 Do
            Begin
                ZEROLIMIT = ZEROSTART
                ZEROCOUNT = 0
                While ZEROCOUNT < 16 Do
                Begin
                    OLDVALUE = COEFFICIENTS (ZEROLIMIT)
                               / (1 LeftShift SUCCESSIVEAPPROXIMATION)
                    If OLDVALUE = 0 Then
                        ZEROCOUNT = ZEROCOUNT + 1
                    ZEROLIMIT = ZEROLIMIT + 1
                End
                HuffmanEncodeAC (F0₁₆)
                RefineBand (COEFFICIENTS, ZEROSTART, ZEROLIMIT - 1)
                ZEROSTART = ZEROLIMIT
                ZERORUN = ZERORUN - 16
            End
            If VALUE > 0 Then
            Begin
                BITCOUNT = CountBits (VALUE)
                HuffmanEncodeAC ((ZERORUN LeftShift 4) Or BITCOUNT)
                OutputLiteralBits (BITCOUNT, VALUE)
            End
            Else
            Begin
                BITCOUNT = CountBits (-VALUE)
                HuffmanEncodeAC ((ZERORUN LeftShift 4) Or BITCOUNT)
                OutputLiteralBits (BITCOUNT, VALUE Xor FFFFFFFF₁₆)
            End
            RefineBand (COEFFICIENTS, ZEROSTART, II - 1)
            ZERORUN = 0
        End
    End

    If ZERORUN <> 0 Then
    Begin
        If EOBRUN = 0 Then
        Begin
            RUNSTARTDATAUNIT = CURRENTDATAUNIT
            RUNSTARTCOEFFICIENT = ZEROSTART
        End
        EOBRUN = EOBRUN + 1
        If EOBRUN = 7FFF₁₆ Then
            PrintEOBRun
    End
End



    Conclusion

This chapter on progressive mode brings to a close our coverage of JPEG. In this and preceding chapters, we have discussed all of the JPEG features that are in common use. The material presented covers all of the JPEG modes that you are likely to encounter in JPEG files on the Internet.

The source code for this chapter is not a complete example, but rather the additional code required to expand the sample applications from Chapters 8 and 9 to include progressive encoding and decoding.

Chapter 12

    GIF

This chapter describes the CompuServe GIF format and the LZW compression method used to compress image data in this format. Until recently, CompuServe GIF (Graphics Interchange Format) was the most widely used format for image storage.

In 1987 CompuServe published the first GIF specification, called GIF87a. This specification was freely distributed, and the format was adopted by practically every image processing application. CompuServe later released an enhanced, upwardly compatible version of the standard known as GIF89a. However, most GIF images use only the features in GIF87a.

    The main features of GIF are:

• Up to 256 colors using 1 to 8 bits per pixel
• Multiple images per file

Because of its better compression and greater color depth, JPEG has generally replaced GIF for photographic images. GIF continues to be used for other applications, but legal entanglements have certainly condemned it to obsolescence.



    Byte Ordering

The GIF format stores multi-byte integers with the least significant byte first (little-endian). Bit strings are read from the least significant bit to the most significant bit. In bit strings that cross byte boundaries, the bits in the second byte are more significant than the bits in the first byte.
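These two conventions can be made concrete with a short Python sketch. The bit reader below is a simplified illustration of LSB-first reading across byte boundaries, not the book's implementation.

```python
import struct

def read_u16_le(data, offset):
    """GIF multi-byte integers are little-endian: least significant
    byte first."""
    return struct.unpack_from('<H', data, offset)[0]

def read_bits_lsb_first(data, bit_position, count):
    """Read `count` bits starting at `bit_position`, least significant
    bit first.  Bits taken from a following byte are more significant
    than the bits taken from the current byte."""
    result = 0
    for i in range(count):
        byte = data[(bit_position + i) // 8]
        bit = (byte >> ((bit_position + i) % 8)) & 1
        result |= bit << i
    return result, bit_position + count
```

So the byte pair 34₁₆ 12₁₆ is the integer 1234₁₆, and a 9-bit string starting at bit 0 takes all eight bits of the first byte plus the low bit of the second byte as its most significant bit.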

    File Structure

Figure 12.1 GIF File Structure

A GIF file consists of a fixed area at the start of the file, followed by a variable number of blocks, and ends with an image trailer. In the GIF87a format the variable area consists solely of image definitions. In the GIF89a format these can be either images or extension blocks. The general format of a GIF file is shown in Figure 12.1.

GIF Header
The GIF header is required and must occur at the very start of the file. The header allows an application to identify the format as GIF and determine the version. Table 12.1 shows the structure of the GIF header. It is still common for applications that do not use the features added in GIF89a to create GIF87a headers in order to ensure the image is compatible with older decoders.

Logical Screen Descriptor
The logical screen descriptor defines the logical screen area in which the individual images in the GIF file are displayed. It specifies the dimensions of the area as well as the background color (selected from the global color table). The individual image descriptors specify where the image is to be placed within the logical screen. The screen descriptor structure is 7 bytes and is shown in Table 12.2.

Figure 12.2 illustrates the relationship between the logical screen and the individual images in a GIF file.

Global Color Table
The individual images within the file can either use the global color table or define a color table of their own. Having the images in a file share the global

Table 12.1 GIF Header Structure

Field Name  Size     Description
Signature   3 bytes  Must be the ASCII string GIF.
Version     3 bytes  Must be the ASCII string 87a or 89a.


Figure 12.2 Logical Screen/Image Relationship (individual images positioned within the logical screen)

Table 12.2 Logical Screen Descriptor Format

Field Name                 Size      Description
Logical Screen Width       2 bytes
Logical Screen Height      2 bytes
Bit Fields                 1 byte
  Global Color Table Size  Bits 0-2  2^(N+1) gives the number of entries in the global color table.
  Color Table Sort Flag    Bit 3     Set when the colors in the global color table are sorted in order of importance.
  Bits Per Pixel           Bits 4-6  Bits per pixel minus 1.
  Global Color Table Flag  Bit 7     Set when there is a global color table.
Background Color           1 byte    Index into the global color table.
Pixel Aspect Ratio         1 byte    If this value is nonzero, the pixel width and height are not equal. (N+15)/64 is the pixel width divided by the pixel height.
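As a sketch of how the fields in Table 12.2 map onto the 7 bytes, the descriptor can be unpacked with Python's struct module. The dictionary keys here are illustrative names, not part of the format.

```python
import struct

def parse_screen_descriptor(data):
    """Unpack the 7-byte GIF logical screen descriptor (Table 12.2)."""
    width, height, flags, background, aspect = struct.unpack(
        '<HHBBB', data[:7])
    return {
        'width': width,
        'height': height,
        # Bits 0-2: 2^(N+1) entries in the global color table
        'global_color_table_size': 2 ** ((flags & 0x07) + 1),
        'sort_flag': (flags >> 3) & 1,            # bit 3
        'bits_per_pixel': ((flags >> 4) & 0x07) + 1,  # bits 4-6, minus 1
        'global_color_table_flag': (flags >> 7) & 1,  # bit 7
        'background_color': background,
        'pixel_aspect_ratio': aspect,
    }
```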


Table 12.3 Color Table Entry Format

Field Name  Size    Description
Red         1 byte  Red component value
Green       1 byte  Green component value
Blue        1 byte  Blue component value

Figure 12.3 Image Definition Structure

color table reduces the file size and makes it easier for systems that can display a limited number of colors. In addition, the global color table supplies the background color for the logical screen.

If the global color table bit is set in the screen descriptor, the global color table immediately follows the screen descriptor. The global color table is an array of structures whose format is shown in Table 12.3. The number of entries in the array is determined from the Global Color Table Size field in the screen descriptor.

Block Types
After the global color table, the variable part of the GIF file begins. The file contains a sequence of blocks that are identified by a 1-byte code at the start of the block. Table 12.4 lists the block types and the associated block codes.

Terminator
The terminator block marks the end of the GIF file. This block is 1 byte long and consists solely of the block code.

Image Block
An image block defines an image within the GIF file. The image starts with an image header structure that defines its size and placement. The format of this structure is given in Table 12.5. If the Local Color Table flag in the image header is set, the image uses a local color table rather than the global color table. Just like the global color table, the local color table is an array of color entries (Table 12.3). The number of elements in the array is determined from the Local Color Table Size field in the header.

Figure 12.3 shows the general layout of an image definition within a GIF file.

Table 12.4 Block Codes

Code  Type
21₁₆  Extension
2C₁₆  Image block
3B₁₆  GIF terminator


Figure 12.4 Extension Block Format

The color table (or the image header if there is no color table) is followed by one byte that contains the initial code size used in the compressed data. The value in this field is usually 8.

Data Blocks
The code size byte is immediately followed by an uninterrupted series of data blocks that contain the compressed image data. A data block consists of a 1-byte count field, followed by 1 to 255 data bytes.

The chain of data blocks for an image is always terminated by a data block with zero data bytes (in other words, a single zero-valued byte). We will postpone the discussion of the compressed block format until later in the chapter.

Extension Blocks
Extension blocks were added to GIF in the GIF89a specification. The layout of an extension block is shown in Figure 12.4. The first byte in the extension block contains the code 21₁₆. This is followed by a second byte that contains a code specifying the extension type. The extension type codes are listed in Table 12.6.

The format of the extension header is specific to the extension type. The first byte in the header gives the size of the header, not including the size byte. The header is followed by a list of data blocks. Each block consists of a count byte followed by 1 to 255 data bytes. The list of data blocks is terminated by a zero byte (a data block with zero data bytes).

The structure of the extension blocks is such that an application can skip over them without having to understand the structure of each individual extension type. Algorithm 12.1 illustrates how to skip over an extension.

Table 12.5 Image Header Structure

Field Name                Size      Description
Left Position             2 bytes   Left offset of the image within the logical screen.
Top Position              2 bytes   Top offset of the image within the logical screen.
Image Width               2 bytes
Image Height              2 bytes
Bit Fields                1 byte
  Local Color Table Size  Bits 0-2  2^(N+1) is the number of entries in the local color table.
  Reserved                Bits 3-4  Unused.
  Sort Flag               Bit 5     If set, the colors in the local color table are sorted in order of importance.
  Interlace Flag          Bit 6     If set, the image data is interlaced.
  Local Color Table Flag  Bit 7     If set, the image uses a local color table.


Algorithm 12.1 Skipping GIF Extension Blocks

DATA = ReadByte ()  // Extension Type
DATA = ReadByte ()  // Count
While DATA <> 0 Do
Begin
    For II = 1 To DATA Do
        ReadByte ()
    DATA = ReadByte ()
End

Plain Text Extension
The plain text extension block is used to draw a grid of fixed-spaced text on the logical screen. This extension consists of a header followed by a series of data blocks containing the text to display. The format of the plain text extension header is shown in Table 12.7.

Graphic Control Extension
The graphic control extension affects how the next image in the GIF file is to be drawn. Graphic control extensions are commonly used to specify how the

Table 12.6 Graphic Extension Type Codes

Code  Type
01₁₆  Plain text extension
F9₁₆  Graphic control extension
FE₁₆  Comment extension
FF₁₆  Application extension

Table 12.7 Plain Text Extension Format

Field Name             Size     Description
Block Size             1 byte   The value 12
Text Grid Left         2 bytes  Text position in the logical screen
Text Grid Top          2 bytes  Text position in the logical screen
Text Grid Width        2 bytes  Size of the text block in pixels
Text Grid Height       2 bytes  Size of the text block in pixels
Character Cell Width   1 byte   Width in pixels of each character
Character Cell Height  1 byte   Height in pixels of each character
Text Foreground Color  1 byte   Index into the global color table for the text color
Text Background Color  1 byte   Index into the global color table for the background color


Table 12.8 Graphic Control Extension Header

individual frames in a GIF animation are drawn. There can be only one graphic control extension per image. This GIF extension block has no data, so a terminating data byte with a value of zero immediately follows the header. The graphic control extension header is shown in Table 12.8.

Comment Extension
The comment extension allows an encoder to store information of any type within a GIF file. An application should not use a comment extension to store data that it uses to control how the GIF stream is to be processed. The comment extension has no header, not even a count byte. The data blocks containing the comment text immediately follow the extension type code.

Application Extension
An encoder can use an application extension block to store data within a GIF stream that is specific to the application. Its purpose is similar to that of a comment extension, except that it may be used to hold data that affects the decoding process. The application-specific data is stored within the data blocks that follow the header. The format of the application extension header is shown in Table 12.9.

Field Name                Size      Description
Block Size                1 byte    Must be 4.
Bit Fields                1 byte
  Transparent Color Flag  Bit 0     Set when the Transparent Color Index is used.
  User Input Flag         Bit 1     When set, the application should wait for user input before displaying the next image.
  Disposal Method         Bits 2-4  Specifies what the decoder is to do after the image is displayed. 0: No action. 1: Leave the image in place. 2: Restore the background color. 3: Restore what was in place before the image was drawn.
  Reserved                Bits 5-7
Delay Time                2 bytes   The amount of time the decoder should wait before continuing to process the stream, in 1/100ths of a second.
Transparent Color Index   1 byte    If the Transparent Color Flag is set, pixels with this color value are not written to the display.


    Interlacing

In general, GIF images are stored from top to bottom, left to right. However, if the interlace flag in the image header is set, the rows of pixel data are not transmitted in strict top-to-bottom order. Instead, the image is transmitted in four passes. The first pass contains the pixel data for every eighth row, starting with the topmost row (row zero). The remaining passes fill in the rows omitted in the previous passes. Table 12.10 shows the rows included in each pass.

The interlacing process serves the same function as progressive JPEG. When an image is transmitted across a network, it allows the user to get an idea of what it contains before all the image data is downloaded. Applications that display interlaced GIFs on the fly will usually fill in the missing rows in a pass with copies of the rows already received: each row is copied eight times on the first pass, four times on the second pass, and twice on the third pass. This gives the effect of having the image fade in, rather than having disconnected rows on the screen.

Table 12.10 Row Interlacing

Pass  Starting Row  Interval
1     0 (top row)   8
2     4             8
3     2             4
4     1             2

    Compressed Data Format

The pixel data within a GIF image is compressed using a process known as LZW. LZW is what is known as a dictionary-based compression scheme. By that, we mean a compression method that maintains a list or dictionary of sequences within the uncompressed data. During compression, when these sequences occur in the uncompressed data they are replaced by a code that references the sequence in the dictionary. The longer the dictionary sequences are and the more frequently they are repeated in the uncompressed data, the greater the compression.

Table 12.9 Application Extension Header

Field Name           Size     Description
Block Size           1 byte   11
Application ID       8 bytes  An ASCII string that identifies the application that created the block
Authentication Code  3 bytes  Three bytes an application may use to authenticate the block

The trick in dictionary-based compression is how to transmit the dictionary in the compressed data. The most common dictionary-based compression schemes in use are based upon those described by Abraham Lempel and Jacob Ziv (1977 and 1978), and known as LZ77 and LZ78, respectively. LZ77 uses a sliding window into the uncompressed data to implement the dictionary.¹ LZ78 builds a dictionary dynamically from the uncompressed data.

GIF Compression
LZW is a variant of the LZ78 process that was described in a paper by Terry Welch of Sperry (now Unisys) in 1984. CompuServe adopted it for use in the GIF format shortly afterwards. In the LZW method, the compressed data stream consists entirely of codes that identify strings in a dictionary. The dictionary is initialized so that it contains each possible data value as a predefined string. For example, if 8-bit data is being encoded, the dictionary initially contains 256 1-byte strings containing the values 0-255.

The compression process reads characters from the input stream and appends them to the current string until the current string no longer has a match in the dictionary. At that point it outputs the code for the longest matching string, adds the nonmatching string (the matching string plus one character) to the dictionary, and finally starts a new string that contains the first nonmatching character. Each time a code is written to the output stream, an entry is added to the dictionary.

Algorithm 12.2 illustrates how the dictionary is created in the LZW process. Here we are assuming that 8-bit data is being used and that the function Output writes 9-bit codes to the output stream.

Figure 12.5 shows how a sample string would be compressed using the LZW process. The first column shows the string at each stage in the compression process; the second column shows the value written to the output stream at each stage; and the third column shows the new code that is created.

The input string in Figure 12.5 consists of 27 characters. Using 8 bits per character, the string requires 216 bits. The LZW process uses 20 literal values and dictionary codes to represent the string, so 9 bits are required to encode each LZW value, giving a total of 180 bits, a reduction of 17% in the compressed version. It takes only 33 codes to encode the same string repeated twice (a 31% reduction) and 42 codes to encode it repeated three times (a 41% reduction). The more repetition in the input stream, the greater the compression the LZW process gives.

¹ The LZ77 method will be described in greater detail in the chapters on PNG.



Algorithm 12.2 Simplified LZW Compression

Global String DICTIONARY (0..511)
Global NEXTCODE = 256

Procedure Initialize
Begin
    For I = 0 To NEXTCODE - 1 Do
        DICTIONARY (I) = CHARACTER (I)
End

Function SearchDictionary (String SEARCH)
Begin
    For I = 0 To NEXTCODE - 1 Do
    Begin
        If DICTIONARY (I) = SEARCH Then
            Return I
    End
    Return -1
End

Procedure Compress (String DATA)
Begin
    Initialize
    LASTSTRING = NULL
    For I = 1 To Length (DATA) Do
    Begin
        CURRENTSTRING = LASTSTRING + DATA (I)
        CODE = SearchDictionary (CURRENTSTRING)
        If CODE < 0 Then
        Begin
            // We now have a string with no match in the
            // dictionary.  Output the code for the longest
            // string with a match.
            CODE = SearchDictionary (LASTSTRING)
            Output (CODE)
            // Add the nonmatching string to the dictionary.
            DICTIONARY (NEXTCODE) = CURRENTSTRING
            NEXTCODE = NEXTCODE + 1
            // Start a new string, beginning at the point
            // where we no longer had a match.
            LASTSTRING = DATA (I)
        End
        Else
        Begin
            // The current string has a match in the dictionary.
            // Keep adding to it until there is no match.
            LASTSTRING = CURRENTSTRING
        End
    End
    // Output what is left over.
    Output (SearchDictionary (LASTSTRING))
End


Figure 12.5 LZW Compression Example (input: A MAN A PLAN A CANAL PANAMA)

Output  New Code
'A'     256 = 'A '
' '     257 = ' M'
'M'     258 = 'MA'
'A'     259 = 'AN'
'N'     260 = 'N '
' '     261 = ' A'
256     262 = 'A P'
'P'     263 = 'PL'
'L'     264 = 'LA'
259     265 = 'AN '
261     266 = ' A '
' '     267 = ' C'
'C'     268 = 'CA'
259     269 = 'ANA'
'A'     270 = 'AL'
'L'     271 = 'L '
' '     272 = ' P'
'P'     273 = 'PA'
269     274 = 'ANAM'
258     (end of input)

GIF Decompression
An LZW decompressor reads one code at a time from the compressed stream while maintaining the dictionary in the same way the compressor does. Each code simply gets translated from the dictionary.

The only subtlety in decompression is that it is possible for the compressed stream to contain codes that have not been defined in the dictionary. Figure 12.6 contains a compression example that illustrates how this situation can occur. Notice that the code 259 is output at the same time the code is defined. A decoder processing this sequence would read 259 before code 259 is defined.

This situation can only occur when the new dictionary string consists of the string for the last code processed with the last character of the last code defined added. In Figure 12.6, the last code output before 259 is 256, whose value is 'AB'. The last character of the previous code (258) is 'A', so the dictionary string for 259 is 'AB' + 'A' = 'ABA'.

    Algorithm 12.3 illustrates the LZW expansion process.


Algorithm 12.3 Simplified LZW Expansion

Code Sizes
Up to now we have assumed that we are using 9 bits to represent dictionary codes and literal values in the compressed data stream. If we have an input stream larger than 256 bytes, we need to use 10 bits to represent each value. Larger input streams can require even larger code sizes. Since choosing a code size that is too large gives poor compression, how would you select a code size that is suitable for most situations?

The solution to this problem in LZW compression is to use varying code sizes. At the start of the compression process, each value is stored using the

Figure 12.6 LZW Example with a Code Used Before It Is Defined

Procedure Expand
Begin
    LASTCODE = InputCode ()
    Output (DICTIONARY (LASTCODE))
    While Not EndOfStream Do
    Begin
        CODE = InputCode ()
        If CODE < NEXTCODE Then
        Begin
            Output (DICTIONARY (CODE))
            DICTIONARY (NEXTCODE) = DICTIONARY (LASTCODE)
                                    + DICTIONARY (CODE)(1)
            NEXTCODE = NEXTCODE + 1
            LASTCODE = CODE
        End
        Else
        Begin
            // Special case where the code is used before it is defined.
            DICTIONARY (NEXTCODE) = DICTIONARY (LASTCODE)
                                    + DICTIONARY (LASTCODE)(1)
            NEXTCODE = NEXTCODE + 1
            Output (DICTIONARY (CODE))
            LASTCODE = CODE
        End
    End
End
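The same logic in runnable Python, with the dictionary held as byte strings; the `else` branch is the code-used-before-defined case illustrated in Figure 12.6.

```python
def lzw_expand(codes):
    """Simplified LZW expansion.  When a code arrives before it has
    been defined, the new string must be the last string plus its own
    first character (the Figure 12.6 special case)."""
    dictionary = {i: bytes([i]) for i in range(256)}
    next_code = 256
    last = dictionary[codes[0]]
    result = bytearray(last)
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        else:                       # code used before it is defined
            entry = last + last[:1]
        # Define the next code: last string + first char of this one.
        dictionary[next_code] = last + entry[:1]
        next_code += 1
        result.extend(entry)
        last = entry
    return bytes(result)
```

Feeding it the output codes from Figure 12.6 recovers ABYABABAX, with code 259 correctly resolved to 'ABA' even though it is read before it is defined.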

Input Remaining  Output  New Code
ABYABABAX        'A'     256 = 'AB'
BYABABAX         'B'     257 = 'BY'
YABABAX          'Y'     258 = 'YA'
ABABAX           256     259 = 'ABA'
ABAX             259     260 = 'ABAX'
X                'X'


Figure 12.7 Tree Representation of an LZW Dictionary

smallest possible number of bits (almost always 9). When the number of codes becomes too large to be represented by the current code size, the code size is increased by 1 bit. If the initial code size is 9, codes are output using 9 bits until code 512 is created. Then the code size is automatically bumped up to 10 bits. Likewise, after reaching 1024 it is increased to 11 bits. Twelve bits is the maximum code size allowed by GIF. When the code values reach 2^12 - 1, GIF encoders and decoders stop adding to the dictionary.

A GIF encoder can output a special clear code to instruct the decoder to reset the dictionary to its initial state. An encoder may wish to output a clear code whenever it detects a situation where compression would be improved by using a new dictionary. An obvious situation where this might be desirable is after the maximum code size is reached. Repeated clear codes are legal, but wasteful.

The value for the clear code is not fixed, but rather depends upon the minimum code size that follows the image header. The minimum code size gives the number of bits used to encode the pixel data. Usually this value is 8 bits per pixel, but pixel values for images that do not use 256 colors can be compressed using fewer bits.

Table 12.11 shows the relationship of the minimum code size to the code usage within a GIF image. The end code is used to mark the end of a compressed stream.

Dictionary Structure
In the GIF compression and decompression examples we used an array of strings to represent the dictionary. Normally the dictionary is represented as a tree structure. Figure 12.7 shows how the strings generated from compressing the string ABABCABD can be arranged in a tree structure. A code can be translated to a string by starting at the tree node for the code and working to the tree root.

Translating the code by walking the tree from a leaf to the root produces the string in reverse order. In Figure 12.7, following the code 260 gives "DBA". A stack is used to put the string in the correct order. While walking the tree, each time a node is visited, the corresponding character is pushed on the stack. When you reach the root of the tree, you pop characters off in the correct order.

The tree can be represented as an array of structures. Since the maximum code length is 12 bits, the maximum number of tree nodes is 2^12. The size of the stack is also 2^12 characters.

Table 12.11 GIF Code Usage

Encoded Value Range                            Usage
0 to 2^(Minimum Code Size) - 1                 Literal codes
2^(Minimum Code Size)                          Clear code
2^(Minimum Code Size) + 1                      End code
2^(Minimum Code Size) + 2 to 2^12 - 1          String codes
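The relationships in Table 12.11 in executable form; the initial code size of one more than the minimum code size follows from the clear code needing one extra bit to represent.

```python
def gif_code_values(minimum_code_size):
    """Special code values implied by a GIF minimum code size
    (Table 12.11)."""
    clear_code = 1 << minimum_code_size
    return {
        'clear_code': clear_code,               # 2^mcs
        'end_code': clear_code + 1,             # 2^mcs + 1
        'first_string_code': clear_code + 2,    # 2^mcs + 2
        'initial_code_size': minimum_code_size + 1,
        'max_code': (1 << 12) - 1,              # 12 bits is the GIF maximum
    }
```

For the usual minimum code size of 8, the clear code is 256, the end code is 257, string codes start at 258, and reading begins with 9-bit codes.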


Algorithm 12.4 illustrates how LZW decompression can be implemented using a tree structure for a dictionary and using variable-length codes.

Algorithm 12.4 GIF Expansion

Global Structure DICTIONARYTREE Array (0..2^12-1)
Begin
    Byte CHARACTER
    Integer PARENT
End

Global STACK Array (1..2^12) Of Byte
Global STACKPOINTER
Global MINIMUMCODESIZE = 8
Global CODESIZE
Global ENDCODE
Global CLEARCODE
Global NEXTCODE
Global FIRSTCHARACTER

Procedure InitializeDictionary
Begin
    For II = 0 To 2^MINIMUMCODESIZE - 1 Do
    Begin
        DICTIONARYTREE (II).CHARACTER = II
        DICTIONARYTREE (II).PARENT = 0
    End
    STACKPOINTER = 0
    CODESIZE = MINIMUMCODESIZE + 1
    CLEARCODE = 2^MINIMUMCODESIZE
    ENDCODE = CLEARCODE + 1
    NEXTCODE = ENDCODE + 1
End

Procedure OutputCode (CODE)
Begin
    // Push the characters of the dictionary entry
    // on the stack in reverse order.
    While CODE >= CLEARCODE Do
    Begin
        STACKPOINTER = STACKPOINTER + 1
        STACK (STACKPOINTER) = DICTIONARYTREE (CODE).CHARACTER
        CODE = DICTIONARYTREE (CODE).PARENT
    End
    STACKPOINTER = STACKPOINTER + 1
    STACK (STACKPOINTER) = DICTIONARYTREE (CODE).CHARACTER

    // The tree structure makes it more difficult to
    // find the first character in the last code processed.
    // We remember it here.

    FIRSTCHARACTER = STACK (STACKPOINTER)



    // Pop the values off the stack so the string comes out
    // in the correct order.
    While STACKPOINTER > 0 Do
    Begin
        Output (STACK (STACKPOINTER))
        STACKPOINTER = STACKPOINTER - 1
    End
End

Procedure Expand (OUTPUT : String)
Begin
    InitializeDictionary ()
    CODE = ReadBits (CODESIZE)
    While CODE = CLEARCODE Do
        CODE = ReadBits (CODESIZE)
    OutputCode (CODE)
    While TRUE Do
    Begin
        LASTCODE = CODE
        If NEXTCODE >= 2^CODESIZE And CODESIZE < 12 Then
            CODESIZE = CODESIZE + 1
        CODE = ReadBits (CODESIZE)
        If CODE = ENDCODE Then
            Return
        Else If CODE = CLEARCODE Then
        Begin
            InitializeDictionary ()
            CODE = ReadBits (CODESIZE)
            While CODE = CLEARCODE Do
                CODE = ReadBits (CODESIZE)
            If CODE = ENDCODE Then
                Return
            OutputCode (CODE)
        End
        Else If CODE < NEXTCODE Then
        Begin
            OutputCode (CODE)
            DICTIONARYTREE (NEXTCODE).PARENT = LASTCODE
            DICTIONARYTREE (NEXTCODE).CHARACTER = FIRSTCHARACTER
            NEXTCODE = NEXTCODE + 1
        End
        Else
        Begin
            // Special case of a code used before it is defined.
            DICTIONARYTREE (NEXTCODE).PARENT = LASTCODE
            DICTIONARYTREE (NEXTCODE).CHARACTER = FIRSTCHARACTER
            NEXTCODE = NEXTCODE + 1
            OutputCode (CODE)
        End
    End
End



    Animated GIF

Unlike all of the other formats discussed in this book, GIF allows multiple images to be stored in a single file. Web browsers have taken advantage of this capability to store simple animations. There is no official standard that describes how multi-image GIF files should be displayed. Many image viewers will just display the first image in a GIF file. However, among the major Web browsers, there is some consistency in how GIF animations are displayed.

Figure 12.8 shows a series of four frames that compose a simple animation sequence. To store this as a GIF animation, the four images are encoded one after another in separate image blocks. Typically a Web browser will display each image in sequence as rapidly as it can. You can use a graphic control extension (Table 12.8) to control the amount of time between the individual frames. A graphic control extension only affects the image that immediately follows it, so you will probably need one for each image in the animation sequence.

Most applications will only play an animation once. To create an animation that repeats in a loop you have to include a special application extension (Table 12.9), whose format is shown in Table 12.12. Although the application ID is "NETSCAPE", this only reflects the creator of the block. Other Web browsers recognize this block format as well.

The Repeat Count field in the application extension gives the number of times the animation is played. A value of zero specifies that the animation is repeated indefinitely. Since displaying images can take up a significant amount of CPU time on home computers, especially in animations with no delays, it is generally not a good idea to have animations loop forever.

Figure 12.8 Sample GIF Animation

Table 12.12 Loop Application Extension Format

Field Name           Size     Description
Block Size           1 byte   11
Application ID       8 bytes  "NETSCAPE"
Authentication Code  3 bytes  "2.0"
Block Size           1 byte   3
Extension Type       1 byte   1
Repeat Count         2 bytes  Number of times the animation should repeat
Terminator           1 byte   0

Figure 12.9 An Animation That Partially Updates the Logical Screen

The logical screen descriptor (Table 12.2) specifies the dimensions of the animation. Each image does not have to update the entire logical screen. The dimensions and position fields in the image header (Table 12.5) can specify a region within the logical screen for an image to update. Not updating the screen in every image can reduce the size of the GIF file significantly.

Figure 12.9 shows how the previous animation can be set up so that it only updates part of the logical screen.

    Legal Problems

Welch did not mention in his 1984 IEEE Computer paper that he had filed for a patent in 1983 on the process he described. The patent was subsequently granted (4,558,302) in 1985 and assigned to Sperry (now Unisys). For whatever reason (software patents were new and untested legal territory at the time), CompuServe never checked to see if LZW was patented when it issued the GIF specification.

Over time the use of GIF grew and the patent issues remained theoretical until 1994, when Unisys started to demand licensing fees for LZW from GIF users. This made the use of GIF impossible in freely distributed applications.

Solving the GIF patent issue is not simply an issue of getting a license from Unisys. There are numerous other patents related to LZ compression that an implementation could infringe upon. A notable GIF-related patent was awarded to Victor Miller and Mark Wegman of IBM in 1989 (4,814,726). Their patent describes a process that is almost identical to the LZW process. As a result of the patent situation, anyone using GIF in any type of application needs the advice of an attorney.

The fundamental problem is that the LZW process itself is a derivative of earlier work. With the patent database full of LZ derivations, it is impossible to determine what patents an innovative GIF implementation might infringe upon.


Worse than simply their numbers, these patents are written in legalese, making them difficult to understand. With the legal mess surrounding GIF, the best solution for developers is to use other formats. Unlike JPEG, the patent situation with GIF is unlikely to be satisfactorily resolved.

    Uncompressed GIF

It is the LZW process, not GIF, that is covered by patents, and it is entirely possible to create GIF files that do not use LZW compression. The easiest way to implement a GIF encoder without using LZW is to simply encode each data byte using 9 bits and output a clear code after every 254 codes. While this gives negative compression, a conforming GIF decoder will correctly interpret the data.
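The 9-bit scheme above can be sketched as follows, assuming 8-bit image data (so the clear code is 256 and the end code is 257). The function and structure names are ours, not part of the GIF specification, and the division of the output into GIF data sub-blocks is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of "uncompressed" GIF encoding for 8-bit data: every byte is
   emitted as a 9-bit literal code, and a clear code is sent after every
   254 codes so the code size never needs to grow past 9 bits.
   GIF packs code bits least significant bit first. */

#define CLEAR_CODE 256  /* for an initial code size of 8 */
#define END_CODE   257

typedef struct {
    unsigned char *out;     /* caller-supplied output buffer */
    size_t bytes;           /* bytes written so far */
    unsigned long bitbuf;   /* pending bits, LSB first */
    unsigned int bitcount;
} BitWriter;

static void PutCode(BitWriter *w, unsigned long code)
{
    w->bitbuf |= code << w->bitcount;   /* append 9 bits above pending */
    w->bitcount += 9;
    while (w->bitcount >= 8) {          /* flush whole bytes */
        w->out[w->bytes++] = (unsigned char)(w->bitbuf & 0xFF);
        w->bitbuf >>= 8;
        w->bitcount -= 8;
    }
}

/* Returns the number of bytes produced. */
size_t EncodeUncompressed(const unsigned char *data, size_t length,
                          unsigned char *out)
{
    BitWriter w = { out, 0, 0, 0 };
    size_t ii;
    PutCode(&w, CLEAR_CODE);            /* conventional leading clear */
    for (ii = 0; ii < length; ++ii) {
        PutCode(&w, data[ii]);
        if ((ii + 1) % 254 == 0)        /* reset before codes would grow */
            PutCode(&w, CLEAR_CODE);
    }
    PutCode(&w, END_CODE);
    if (w.bitcount > 0)                 /* flush final partial byte */
        w.out[w.bytes++] = (unsigned char)(w.bitbuf & 0xFF);
    return w.bytes;
}
```

For 254 data bytes this emits 257 nine-bit codes (clear, 254 literals, clear, end), or 2,313 bits, which rounds up to 290 output bytes: about a 14 percent expansion, which is the negative compression the text describes.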

Other GIF-compatible compression schemes have been implemented. It is possible to count runs of the same pixel value or alternating pixel values and compress them in the GIF format. What is unknown is when these methods cross the bounds of any of the various patents.

    Conclusion

The GIF format was the first image format to be universally accepted. Unfortunately, legal problems have ended GIF development. Unlike other major graphics formats, no enhancements to GIF are under way. This, coupled with inherent limitations compared to other formats, has made it essentially dead.

Nelson (1992) gives an excellent introduction to the LZ compression methods and their general implementation. The LZ77, LZ78, and LZW algorithms were originally described in Ziv (1977, 1978) and Welch (1984).

We can tell you how GIF works. Unfortunately, we cannot show you as well. Because of the patent issue, there is no GIF source code on the accompanying CD. Simply writing about GIF in a book requires involving lawyers in the process. Shakespeare was right.

Chapter 13

PNG

Portable Network Graphics (PNG) is the newest graphics format in this book. It is just now starting to receive widespread use by the Internet community, and is already supported by any image viewing application worth using.

The PNG format uses a lossless compression process and supports the following:

- Up to 48 bits per pixel in color images
- 1-, 2-, 4-, 8-, and 16-bit sample precision
- Alpha channel for full control of transparency
- Sophisticated color matching

Because of the legal issues surrounding the use of GIF, PNG should now be used instead of GIF in those applications where JPEG is not a suitable alternative. In situations where you need lossless compression for 24-bit images, such as the intermediate format for images that are repeatedly edited, PNG is much better suited than JPEG.


In this chapter we are going to cover the structure of PNG files and the format of the individual chunks that are defined by the PNG standard. In the two following chapters we will cover the format of the compressed data and how to read and write PNG files.

History

When Unisys began demanding license fees from users of GIF it became impossible to use GIF in many situations, notably in free software. After Unisys's action, the JPEG format quickly replaced GIF for photographic images. However, JPEG does not compress certain types of images well, so it could not replace GIF for all applications.

Thomas Boutell organized what was to be called the PNG Development Group, started within days of Unisys's announcement that they would demand licenses for GIF usage. Development of the PNG standard proceeded at a rapid pace, with several drafts issued to the public. The final version was released on October 1, 1996, just over a year and a half after the project began.

Byte Ordering

The PNG format stores multi-byte integers with the most significant byte first (big-endian). Bit strings are read from the least to the most significant bit. When a bit string crosses a byte boundary the bits in the second byte are the most significant.

Huffman codes within compressed data are stored with the code bits in reverse order. The most significant bits of the Huffman code are stored in the least significant bits of the data bytes.
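Reading PNG's multi-byte integers is then a matter of assembling bytes from most to least significant. A minimal sketch (the function name is ours):

```c
#include <assert.h>

/* Assemble a big-endian 32-bit integer from 4 bytes, as PNG stores
   chunk lengths and other multi-byte fields. */
unsigned long ReadBigEndian32(const unsigned char *p)
{
    return ((unsigned long) p[0] << 24)
         | ((unsigned long) p[1] << 16)
         | ((unsigned long) p[2] << 8)
         |  (unsigned long) p[3];
}
```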

File Format

A PNG file is organized into a sequence of blocks referred to as chunks in the PNG standard. Chunk types are defined by three sources. Some are defined by the PNG standard; these are the most important chunks a decoder has to deal with. The PNG Development Group also maintains a list of registered public chunk types. If someone creates a chunk type that may be of use to the general public, the creator can submit it to be considered for addition. Finally, there are private chunks defined by applications.

Chunks follow the format shown in Table 13.1. This format allows a decoder to skip over those chunks that it does not know how to process and those that the implementers feel are not important enough to implement. Being able to ignore unknown chunks is essential because decoders need to be able to process files containing private chunks created by other applications and new public chunks.

Table 13.1 PNG Chunk Format

Field   Size          Description
Length  4 bytes       Number of bytes in the Data field, 0-2,147,483,647 (2^31 - 1) bytes.[1]
Type    4 bytes       Chunk name.
Data    Length bytes  Chunk data. The format depends upon the chunk type.
CRC     4 bytes       CRC-32 value calculated from the data.

Chunk Naming

PNG chunks are given unique names that consist of four ASCII letters. The first, second, and last characters in a chunk type can be either upper or lower case. The case used for these characters follows the convention described below, which allows a decoder to determine information about the chunk from the name alone.

The third character in the chunk name must be in upper case. If an application encounters a chunk type containing any values other than ASCII letters, it should consider it invalid. Figure 13.1 shows some sample chunk names and their meanings.

Critical Chunks

If the first character of the chunk type is in upper case (bit 5 clear), the chunk is referred to as critical. A critical chunk is one that the decoder absolutely must process in order to decode the image. If a decoder encounters a critical chunk that it does not recognize, it should report an error. The PNG standard only defines four critical chunks: IHDR, PLTE, IDAT, and IEND.

Public and Private Chunks

The second character in the chunk type is upper case for chunks that are publicly defined. Public chunk types include all those defined by the PNG standard as well as additional chunk types that are registered with the PNG Development Group.

Applications can create private chunks of their own to store data that is specific to the application. These should have the second character in lower case to ensure that they do not conflict with publicly defined chunks.

[1] The size limitations in the PNG specification take into account programming languages that cannot handle unsigned integers.


Figure 13.1 Sample PNG Chunk Names

IHDR    Critical, public, unsafe to copy
gAMA    Noncritical, public, unsafe to copy
pHYs    Noncritical, public, safe to copy
apPx    Noncritical, private, safe to copy
A1PX    Invalid
ApPx    Critical, private, safe to copy
apPX    Noncritical, private, unsafe to copy
aaaX    Invalid

Safe-to-Copy Chunks

The last character in the chunk type should be in lower case if the chunk is safe to copy and in upper case if it is not. An application should not copy chunks that it does not recognize if the fourth character in the chunk type is in upper case.
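One way to decode the case conventions described above is to test each character's case; in ASCII this is exactly a test of bit 5, which is what `isupper`/`islower` do for letters. A sketch with function names of our own choosing:

```c
#include <assert.h>
#include <ctype.h>

/* Classify a 4-letter chunk name by the case of its characters.
   In ASCII, bit 5 is the lower-case bit, so isupper()/islower()
   are equivalent to testing bit 5 of each character. */

int IsValidChunkName(const char name[4])
{
    int ii;
    for (ii = 0; ii < 4; ++ii)
        if (!isalpha((unsigned char) name[ii]))
            return 0;                             /* letters only */
    return isupper((unsigned char) name[2]) != 0; /* reserved: upper */
}

int IsCritical(const char name[4])   { return isupper((unsigned char) name[0]) != 0; }
int IsPublic(const char name[4])     { return isupper((unsigned char) name[1]) != 0; }
int IsSafeToCopy(const char name[4]) { return islower((unsigned char) name[3]) != 0; }
```

Applied to the names in Figure 13.1, IHDR comes out critical, public, and unsafe to copy, while A1PX and aaaX are rejected as invalid.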

Suppose you are developing a PNG editor application that automatically puts a border and logo on an image and then saves the image to a new file. If the application encounters an unknown chunk it has two choices: it could pass that chunk on to the output file without making any modifications to it, or discard it.

If the editing application encountered a private chunk created by an archiving program that stored indexing information within the image (e.g., subject, date, and photographer), it would not know how to interpret the information within it. However, copying the chunk to the output file would produce perfectly valid results. Such a private chunk would probably be made a safe-to-copy chunk.

On the other hand, suppose the editor encountered a private chunk that contained information on the usage of colors within the image. After the border was added, the information in the chunk would no longer be valid. Such a private chunk should be unsafe to copy.

While it is legal for an application to create private critical chunks, using such chunks will most likely make the images unreadable by other applications.

Cyclic Redundancy Check

Each PNG chunk contains a 32-bit CRC (Cyclic Redundancy Check) value that has been calculated from the chunk type code and chunk data. The CRC is a mathematical function that is commonly used in networking software to ensure that data in a network packet has been received correctly. Before sending a data packet the transmitter applies the CRC function to the data and then appends the CRC value to the packet. The receiver of the packet applies the CRC function to the data and then compares the calculated CRC value to the value in the packet. If the two values are not the same, the receiver can send a negative acknowledgment to the transmitter to request that the packet be resent.

The PNG file format applies the CRC function to each chunk so that decoders can verify that the data in the chunk has not been corrupted since the file was created. A decoder should calculate the CRC value for every chunk in a PNG file and ensure that the calculated value matches the CRC value stored in the chunk. A chunk where these two values do not match should be considered invalid.

The CRC function is based upon performing modulo 2 polynomial division on the input data, where each bit in the input stream is treated as a coefficient in a giant polynomial. The CRC function value is the remainder from the division operation. The choice of the polynomial determines the type of bit errors that can be detected. Both 16-bit and 32-bit CRC functions are in common use. A 32-bit CRC can detect more errors in larger packet sizes. PNG uses the 32-bit version, which is known as CRC-32. The polynomial used by PNG is

x^32 + x^26 + x^23 + x^22 + x^16 + x^12 + x^11 + x^10 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1

which is essentially the binary value

1 0000 0100 1100 0001 0001 1101 1011 0111

Software implementations of the CRC function invariably use a table lookup to calculate the CRC function. As each byte in the input stream is processed, a value known as the CRC register is updated using a value in the table. In the CRC function used by PNG, the CRC register is initialized with all bits set to 1. After the last byte has been processed the final CRC value is the 1s-complement of the value in the CRC register.

The CRC process used by PNG, when implemented using a lookup table, looks like this.

unsigned long CrcRegister ;

void CrcByte (unsigned char data)
{
    unsigned int index = (CrcRegister ^ data) & 0xFF ;
    CrcRegister = CrcTable [index] ^ ((CrcRegister >> 8) & 0x00FFFFFF) ;
    return ;
}

unsigned long Crc (unsigned char buffer [], unsigned int length)
{
    CrcRegister = 0xFFFFFFFFL ;
    for (unsigned int ii = 0 ; ii < length ; ++ ii)
        CrcByte (buffer [ii]) ;
    return ~CrcRegister ;
}

Before making any CRC calculations, the lookup table containing precalculated values for each possible byte value needs to be initialized using a function like this.

unsigned long CrcTable [256] ;

void MakeCrcTable ()
{
    for (unsigned int ii = 0 ; ii < 256 ; ++ ii)
    {
        CrcTable [ii] = ii ;
        for (unsigned int jj = 0 ; jj < 8 ; ++ jj)
        {
            if ((CrcTable [ii] & 0x1) == 0)
                CrcTable [ii] >>= 1 ;
            else
                CrcTable [ii] = 0xEDB88320L ^ (CrcTable [ii] >> 1) ;
        }
    }
    return ;
}

The mathematics of the CRC process is outside the scope of this book. However, to give you some idea of how it is done, compare the constant

EDB88320 (hexadecimal) = 1110 1101 1011 1000 1000 0011 0010 0000 (binary)

used in generating the CRC lookup table to the value that we said was equivalent to the CRC polynomial. If you take this value, reverse the order of the bits, then prepend a 1-bit, you have the CRC-32 polynomial value

1 0000 0100 1100 0001 0001 1101 1011 0111

Chunk Processing

Most PNG decoders will probably be implemented with a common function for reading the chunk data into memory. This common process would follow these steps:

1. Read the chunk data size.
2. Read and save the chunk type.
3. If the chunk data size is larger than the data buffer, allocate a larger buffer.
4. Read the chunk data.
5. Calculate the CRC value of the chunk data.
6. Read the chunk CRC from the file.


7. Compare the calculated CRC to the CRC read from the file. If they are not the same, the chunk is invalid.

After the last step the decoder can call a function to handle the specific chunk type.
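The steps above might be sketched as follows for a chunk that is already in memory. The struct and function names are ours, not from the PNG standard, and the CRC-32 routine simply repeats the table-driven process shown earlier.

```c
#include <assert.h>
#include <string.h>

static unsigned long CrcTable[256];

static void MakeCrcTable(void)
{
    unsigned int ii, jj;
    for (ii = 0; ii < 256; ++ii) {
        CrcTable[ii] = ii;
        for (jj = 0; jj < 8; ++jj) {
            if ((CrcTable[ii] & 0x1) == 0)
                CrcTable[ii] >>= 1;
            else
                CrcTable[ii] = 0xEDB88320L ^ (CrcTable[ii] >> 1);
        }
    }
}

/* CRC-32 over the chunk type and data fields, as PNG requires. */
static unsigned long CrcChunk(const unsigned char *typeanddata,
                              unsigned long length)
{
    unsigned long crc = 0xFFFFFFFFL;
    unsigned long ii;
    for (ii = 0; ii < length; ++ii)
        crc = CrcTable[(crc ^ typeanddata[ii]) & 0xFF]
            ^ ((crc >> 8) & 0x00FFFFFFL);
    return ~crc & 0xFFFFFFFFL;
}

typedef struct {
    unsigned long length;
    char type[5];
    const unsigned char *data;
} Chunk;

/* Reads one chunk starting at stream[*position]. Returns nonzero if
   the stored CRC matches the calculated one (step 7). */
int ReadChunk(const unsigned char *stream, unsigned long *position,
              Chunk *chunk)
{
    const unsigned char *p = stream + *position;
    unsigned long storedcrc, calculatedcrc;
    chunk->length = ((unsigned long) p[0] << 24) | ((unsigned long) p[1] << 16)
                  | ((unsigned long) p[2] << 8)  |  (unsigned long) p[3];
    memcpy(chunk->type, p + 4, 4);                       /* steps 1-2 */
    chunk->type[4] = '\0';
    chunk->data = p + 8;                                 /* step 4 */
    calculatedcrc = CrcChunk(p + 4, chunk->length + 4);  /* step 5 */
    p += 8 + chunk->length;                              /* step 6 */
    storedcrc = ((unsigned long) p[0] << 24) | ((unsigned long) p[1] << 16)
              | ((unsigned long) p[2] << 8)  |  (unsigned long) p[3];
    *position += 12 + chunk->length;
    return storedcrc == calculatedcrc;                   /* step 7 */
}
```

A convenient sanity check is the IEND chunk, whose 12 bytes are the same in every PNG file: a zero length, the name IEND, and the CRC-32 of those four name bytes.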

    File Organization

Figure 13.2 shows the general organization of a PNG file. A PNG file must start with a PNG signature followed by an IHDR chunk and end with an IEND chunk. The ordering of the other chunks within a file is somewhat flexible. The ordering restrictions are covered in the discussions of chunk formats.

The PNG signature consists of 8 bytes that must have the values 137, 80, 78, 71, 13, 10, 26, and 10. These are the ASCII values 137, P, N, G, <Carriage Return>, <Line Feed>, <Control-Z>, and <Line Feed>. There is a bit of hidden logic in using these values in the signature. Other than the obvious device of including the string "PNG" to identify the format, most of the reasoning is rather subtle.

On Unix, a <Line Feed> character is used to separate records in a text file. In MS-DOS, records are separated by a <Carriage Return><Line Feed> pair. Many file transfer programs can operate in either binary or text mode. In binary mode these applications make a byte-to-byte copy, but in text mode they replace <Line Feed> characters with <Carriage Return><Line Feed> pairs when going from Unix to DOS and replace <Carriage Return><Line Feed> pairs with <Line Feed> characters when going from DOS to Unix. If a PNG file is transferred employing text mode using one of these programs, either the <Carriage Return><Line Feed> pair or the <Line Feed> character will be corrupted, so a decoder will have to go no further than the signature to know it has a bad file.

The first byte in the signature is not a displayable ASCII value, making it less likely that a decoder will confuse a text file with a PNG file. If you accidentally type a PNG file at the DOS command line, the <Control-Z> in the header stops it from printing beyond the signature.
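A signature check is then a straight byte comparison; a minimal sketch (the function name is ours):

```c
#include <assert.h>
#include <string.h>

/* Validate the 8-byte PNG signature described above. */
int IsPngSignature(const unsigned char *bytes)
{
    static const unsigned char signature[8] =
        { 137, 80, 78, 71, 13, 10, 26, 10 }; /* 137 P N G CR LF ^Z LF */
    return memcmp(bytes, signature, 8) == 0;
}
```

Note that the check rejects a file whose <Carriage Return> was stripped by a text-mode transfer, which is exactly the failure the signature was designed to expose.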

Figure 13.2 PNG File Organization

    Color Representation in PNG

The PNG format supports five different color types, or methods for representing the color of pixels within an image. The method for representing color is specified in the file's IHDR chunk.


RGB Triple

Like BMP, PNG can represent colors as an RGB triple. Each pixel is represented by three component values of either 8 or 16 bits. The components are stored in red, green, blue order (the opposite of BMP). RGB triples may only be used when the bit depth is 8 or 16 bits.

Palette

PNG images can also use a color palette in the same way BMP and GIF do. The size of the palette depends upon the sample precision. Images that use a palette must contain a PLTE chunk that defines the palette. Palettes may only be used when the bit depth is 1, 2, 4, or 8 bits.

Grayscale

In the grayscale color type there is one component per pixel, and it represents the relative intensity of the pixel. The grayscale color type can be used for all PNG bit depths. On most systems a decoder will need to create a palette to display grayscale images using a process like the one shown in Algorithm 13.1.

Algorithm 13.1 Grayscale Palette Creation

MAXPIXELVALUE = 2^DISPLAYBITDEPTH - 1
For II = 0 To MAXPIXELVALUE Do
    Begin
    PALETTE (II).RED = II
    PALETTE (II).GREEN = II
    PALETTE (II).BLUE = II
    End

RGB with Alpha Channel

PNG images support the use of an Alpha channel to control the transparency of the image. The Alpha channel allows an image to be combined with its background. Each pixel value has an additional Alpha value whose size in bits is the same as the image bit depth. The RGB with Alpha color type can only be used with bit depths of 8 and 16.

An Alpha value of zero means that the pixel is fully transparent, in which case the background shows completely through. A value of 2^Image Bit Depth - 1 is fully opaque, which means that the background is completely covered by the image. When the Alpha channel has an intermediate value, the pixel is merged with the background using the process in Algorithm 13.2.

Algorithm 13.2 Alpha Channel Merging

MAXPIXELVALUE = (1 LeftShift BITDEPTH) - 1
OUTPUT.RED = (ALPHA * IMAGEVALUE.RED
    + (MAXPIXELVALUE - ALPHA) * BACKGROUND.RED) / MAXPIXELVALUE
OUTPUT.GREEN = (ALPHA * IMAGEVALUE.GREEN
    + (MAXPIXELVALUE - ALPHA) * BACKGROUND.GREEN) / MAXPIXELVALUE
OUTPUT.BLUE = (ALPHA * IMAGEVALUE.BLUE
    + (MAXPIXELVALUE - ALPHA) * BACKGROUND.BLUE) / MAXPIXELVALUE

Grayscale with Alpha Channel

An Alpha channel can also be used with grayscale images, but the image bit depth is restricted to 8 or 16. Each pixel using this color type is represented using two values containing the same number of bits, with the Alpha value following the pixel intensity value. The merging process for this color type is the same as for RGB with Alpha except that there is only one image color component.
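For a single 8-bit component, the merging process of Algorithm 13.2 might look like this in C (the function name is ours):

```c
#include <assert.h>

/* Merge one image component with the background component using the
   Alpha value, following Algorithm 13.2. Works for any bit depth up
   to 16 within the range of unsigned long. */
unsigned long MergeComponent(unsigned long image, unsigned long background,
                             unsigned long alpha, unsigned int bitdepth)
{
    unsigned long maxpixelvalue = (1UL << bitdepth) - 1;
    return (alpha * image + (maxpixelvalue - alpha) * background)
           / maxpixelvalue;
}
```

With an 8-bit depth, an Alpha of 255 returns the image value unchanged, an Alpha of 0 returns the background value, and intermediate values blend proportionally.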

    Device-Independent Color

All of the colorspaces we have dealt with until now have been relative colorspaces where color components have values from 0 to 2^N - 1, where N is the number of bits used to represent a component. Zero represents the minimum component value for the device, and 2^N - 1 is the maximum value. Suppose that you worked for Coca-Cola and needed labels printed for bottles with the background color the same bright red used on all Coke bottles. If you told the printer you wanted the color to be (230,0,0) using the RGB colorspace with a sample precision of 8 bits, the color of the labels would depend upon the printing equipment. What you really need is a method to specify the absolute color.

The CIE 1931 standard (CIE stands for Commission Internationale de l'Eclairage, the International Commission on Illumination) has served exactly that purpose in photography, printing, and film since 1931. It uses three components that are usually designated XYZ. The Y component represents luminance, as it does in the YCbCr colorspace; the X and Z components represent chrominance. These are analogous to the Cb and Cr components, but the implementation is different.

If an application knows the XYZ color values for red, blue, green, and white for the device where the image was originally created, it is possible to convert the RGB values in the image to XYZ values giving absolute color. If



these color values are known for the display image as well, it is possible to convert them to the corresponding RGB values for the display device so that the image will be displayed using the original colors. This assumes that the display device can actually display all of the XYZ colors in the image, something that is not always the case.

To make things even more confusing, the makers of monitors list the colors using a colorspace related to XYZ, known as xyY, which is a projection of the XYZ colorspace into two dimensions. The relationship between the xyY and XYZ colorspaces is

Equation 13.1 xyY/XYZ Colorspace Conversion

x = X / (X + Y + Z)
y = Y / (X + Y + Z)
Y = Y

and, inverting,

X = (x / y) Y
Z = ((1 - x - y) / y) Y

Projecting the XYZ colorspace into two dimensions allows all possible colors to be represented on a piece of paper, although with the loss of luminance or brightness possibilities.

Figure 13.3 illustrates the xyY colorspace. The black triangle in the center represents the gamut, or range, of colors that can be displayed by a typical computer monitor. This triangular range is only a small subset of the visible colors. The gamut available to a particular device can vary substantially. For example, the gamut that can be displayed by the typical desktop color printer is a bit different than that for a computer monitor.

One of the reasons colors are not generally encoded within image files using the XYZ colorspace is that it requires much more precision in the data. Using RGB, the data precision is only such that the possible colors for the device can be represented. Since XYZ covers all possible visible colors rather than the small subset a device can show, it requires more data bits to encode. The solution used by PNG is to encode values using RGB and create chunks that allow a decoder to determine the colors that were actually used at the source.

Figure 13.3 xyY Colorspace

Vendors of monitors for personal computers rarely include the monitors' XYZ values with the documentation. They generally have this information available if you ask them directly.

When I contacted the manufacturer of my monitors, the information they gave me was:

Monitor 1
         x      y
Red    0.612  0.353
Green  0.293  0.595
Blue   0.149  0.068
White Point: 9300K

Monitor 2
         x      y
Red    0.625  0.340
Green  0.280  0.595
Blue   0.155  0.070
White Point: 9300K

Display devices are normally built so that the white point falls on or close to a set of data points known as the black body curve. 9300K is a standard white point value for a computer monitor; it has an xy value of (0.285, 0.293). The Y value is implicitly 1.0. The white point in the XYZ colorspace is, then,

X = 0.973
Y = 1.000
Z = 1.440

for both monitors.
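As a check on the white point arithmetic, the standard xyY-to-XYZ relationship (X = (x/y)Y, Z = ((1-x-y)/y)Y) can be applied with Y fixed at 1.0. The function name here is ours:

```c
#include <assert.h>
#include <math.h>

/* Convert a white point given as xy chromaticity coordinates to XYZ,
   taking the luminance Y to be 1.0 as the text does. */
void WhitePointToXyz(double x, double y, double *X, double *Y, double *Z)
{
    *Y = 1.0;
    *X = x / y;               /* X = (x / y) * Y       */
    *Z = (1.0 - x - y) / y;   /* Z = ((1 - x - y) / y) * Y */
}
```

Feeding in the 9300K chromaticity (0.285, 0.293) reproduces the X = 0.973 and Z = 1.440 values quoted above.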


The conversion from RGB to XYZ is a matrix multiplication of the form

Equation 13.2 RGB to XYZ Conversion

[X]   [xR CR   xG CG   xB CB] [R]
[Y] = [yR CR   yG CG   yB CB] [G]
[Z]   [zR CR   zG CG   zB CB] [B]

where z = 1 - x - y for each primary, CR, CG, and CB are constants, and the RGB values have been scaled to the range 0-1.0. The trick now is to find the values of these three constants.

If we substitute the xy values for the first monitor into Equation 13.2 we get

Equation 13.3

[X]   [0.612 CR   0.293 CG   0.149 CB] [R]
[Y] = [0.353 CR   0.595 CG   0.068 CB] [G]
[Z]   [0.035 CR   0.112 CG   0.783 CB] [B]

We already calculated the white point XYZ coordinate and we know that the RGB value of the white point is (1.0, 1.0, 1.0). If we substitute that into the previous equation we get

Equation 13.4

[0.973]   [0.612 CR   0.293 CG   0.149 CB] [1.0]
[1.000] = [0.353 CR   0.595 CG   0.068 CB] [1.0]
[1.440]   [0.035 CR   0.112 CG   0.783 CB] [1.0]

Equation 13.5

0.612 CR + 0.293 CG + 0.149 CB = 0.973
0.353 CR + 0.595 CG + 0.068 CB = 1.000
0.035 CR + 0.112 CG + 0.783 CB = 1.440

This is a set of three linear equations with three unknown variables. We can now solve for CR, CG, and CB, which gives (0.698, 1.094, 1.512). This makes the transform from RGB to XYZ for this monitor

Equation 13.6

To convert from XYZ back to RGB you have to invert the transform matrix. The easiest method is to use Gaussian elimination. Row reduction and matrix inversions are beyond the scope of this book, but you will find an explanation of these techniques in any book on linear algebra, such as Anton (1981). Equation 13.7 gives the inverse function for the first monitor.


Gamma

The color models used with image files assume that there is a linear relationship between a component value and the color that appears on the screen. In reality, the display devices in use do not tend to respond in a linear manner to the input supplied to them. Gamma approximates the nonlinear behavior of these devices. It is simply a power function:

Gamma(x) = x^gamma    where 0 <= x <= 1 and gamma > 0

Adjusting the Gamma of an image can be used in conjunction with converting to the XYZ colorspace or on its own. Gamma adjustments have a greater effect on the appearance of an image on a computer monitor than does conversion to XYZ and back.

The effect of the Gamma function is to make component values generally darker or lighter. Gamma values greater than 1 make the image darker and those less than 1 make it lighter. Notice that the domain and range of the Gamma function are the same. If the input to the function is between 0 and 1 the output will always be in that range as well.

The Gamma value for a display system is the combined Gamma of the components:

Equation 13.8 Gamma Combination

Display System Gamma = Gamma1 × Gamma2 × ... × GammaN

In other words, the Gamma value for a display system is the product of the Gamma values for all of its components.

The gAMA chunk allows an encoder to store the Gamma value for a system used to create the image. The implementer of a PNG decoder is faced with two issues:

- What Gamma value should be used to view the image?
- What is the Gamma of the system being used to display the image?

There is really no way for an application to determine what the viewing Gamma should be. In a well-lighted room it should probably be around 1.0. In a dark room it should be higher, around 1.5 or so. The problem here is that, unless the decoding software has access to a light sensor, there is no way for it to determine this. The best choice is either to allow the user to input a Gamma or to use


a viewing Gamma of 1.0. Many image display programs allow the user to adjust the viewing Gamma after the image is displayed to get the best results.

Some high-end display systems allow the system's Gamma to be queried or even adjusted through software. If you are writing software for personal computers, you should assume that the Gamma for all of the components other than the monitor is 1. The PNG standard recommends a Gamma value of 2.5 for monitors if the exact value is unknown.[2] Most monitor vendors have this information available even though it is not in the printed documentation. The manufacturer says the Gamma value for the monitors in the previous section is 1.8.

Unless you have a high-end display system, a PNG viewing application cannot change the Gamma. If the PNG file contains a gAMA chunk giving the Gamma value for the image, that value is fixed as well. Since the total Gamma of the display system is

Equation 13.9

Desired Viewing Gamma = Application Gamma × Display Gamma × File Gamma

an application can adjust it by adjusting the pixel values. The Gamma correction the application should use is, then,

Equation 13.10

Application Gamma = Desired Viewing Gamma / (Display Gamma × File Gamma)

Applications should only apply Gamma correction to color components. Gamma correction is not applied to the Alpha channel.

    Interlacing

Just as in GIF, PNG supports interlacing of images. The interlacing method used in the current PNG standard is called Adam 7.[3] Other interlacing methods may be added in the future, but this is the only one supported now. Adam 7 interlaces the image by pixels rather than by rows. It divides the image into 8 × 8 pixel blocks and updates it over seven passes. The Adam 7 interlace pattern is shown in Figure 13.4.

Adam 7 is considerably more difficult to implement than GIF's row interlacing. Fortunately the pattern contains regular intervals that can be exploited by a decoder. Figure 13.5 shows how a decoder would display an 8×8 block of pixels on the fly using the Adam 7 pattern. This illustration makes the regular pattern of the sequence more clear.
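Because the pattern repeats every 8 pixels in each direction, the pass that updates a given pixel can be computed from the row and column modulo 8; a sketch (the function name is ours):

```c
#include <assert.h>

/* Return which Adam 7 pass (1-7) updates the pixel at the given row
   and column, following the 8x8 pattern in Figure 13.4. */
int Adam7Pass(unsigned int row, unsigned int col)
{
    if (row % 2 == 1) return 7;   /* every odd row          */
    if (col % 2 == 1) return 6;   /* odd columns, even rows */
    if (row % 4 == 2) return 5;
    if (col % 4 == 2) return 4;
    if (row % 8 == 4) return 3;
    if (col % 8 == 4) return 2;
    return 1;                     /* row and col both 0 mod 8 */
}
```

Running this over rows and columns 0 through 7 reproduces the grid of pass numbers shown in Figure 13.4.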

[2] The next version of the PNG standard is expected to recommend a value of 2.2 in order to be compatible with the sRGB standard.
[3] After the pattern's creator, Adam M. Costello.


Figure 13.4 Adam 7 Interlace Pattern

1 6 4 6 2 6 4 6
7 7 7 7 7 7 7 7
5 6 5 6 5 6 5 6
7 7 7 7 7 7 7 7
3 6 4 6 3 6 4 6
7 7 7 7 7 7 7 7
5 6 5 6 5 6 5 6
7 7 7 7 7 7 7 7

Figure 13.5 Adam 7 Interlace Display

    Critical Chunks

The PNG standard defines four critical chunks (IHDR, PLTE, IDAT, and IEND). Most PNG files need not contain any chunks other than these. The IHDR, IDAT, and IEND chunks must appear in every PNG file. For the critical chunks we have listed the general steps a decoder should take to process the chunk.

IHDR

Every PNG file contains one IHDR chunk, which must immediately follow the PNG signature. The IHDR block specifies the image dimensions, bit depth, and color type. The structure of the data block within an IHDR chunk is shown in Table 13.2. The length of the IHDR chunk data is 13 bytes. Decoders should consider any other length invalid.

A decoder needs to ensure that the bit depth and color type combination is valid. As discussed earlier, not every combination of bit depth and color type is legal. RGB and color types with an Alpha channel are only valid with bit depths of 8 and 16; palette color types are invalid when the bit depth is 16.

The Compression Method and Filter Method fields are for future extensions of the PNG standard. Currently the only compression method supported is Deflate with a 32K-byte or smaller sliding window, and only one filtering method is defined. A decoder should ensure that these values are zero.

    To process this chunk a decoder should

- Ensure that no other chunks have been processed.
- Validate the following:
  - Chunk data length should be 13.
  - Compression method and filter method should be 0.
  - Interlacing method should be 0 or 1.
  - Color type must be 0, 2, 3, 4, or 6.
  - Sample precision must be 1, 2, 4, 8, or 16.
  - Sample precision and color type must be consistent.
- Allocate storage for the image buffer and the palette.
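The bit depth/color type consistency check might be sketched like this (the function name is ours):

```c
#include <assert.h>

/* Is the given bit depth legal for the given IHDR color type?
   Follows the combinations described in the text. */
int IsValidColorTypeAndDepth(unsigned int colortype, unsigned int bitdepth)
{
    int depthok = (bitdepth == 1 || bitdepth == 2 || bitdepth == 4
                || bitdepth == 8 || bitdepth == 16);
    if (! depthok)
        return 0;
    switch (colortype) {
    case 0:                     /* grayscale: any legal depth */
        return 1;
    case 2:                     /* RGB triple */
    case 4:                     /* grayscale with Alpha */
    case 6:                     /* RGB with Alpha */
        return bitdepth == 8 || bitdepth == 16;
    case 3:                     /* palette: 1, 2, 4, or 8 only */
        return bitdepth != 16;
    default:
        return 0;
    }
}
```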

Table 13.2 IHDR Data Format

Field Name          Field Size  Description
Width               4 bytes     Image width in pixels.
Height              4 bytes     Image height in pixels.
Bit Depth           1 byte      Sample precision (1, 2, 4, 8, or 16).
Color Type          1 byte      Method for interpreting image data:
                                0 - Grayscale image.
                                2 - RGB triple.
                                3 - Palette.
                                4 - Grayscale with Alpha channel.
                                6 - RGB with Alpha channel.
Compression Method  1 byte      Must be zero.
Filter Method       1 byte      Must be zero.
Interlace Method    1 byte      0 - The image is not interlaced.
                                1 - Adam 7 interlacing.

PLTE

The PLTE chunk defines a color palette for the image. There may only be one PLTE chunk per image and it must occur before the first IDAT chunk. When the image's color type is palette the file must contain a PLTE chunk. The data within the PLTE chunk is an array of palette entry structures shown in Table 13.3. The number of palette entries is the number of data bytes in the PLTE chunk divided by 3, and is limited to a maximum of 2^Bit Depth. The palette can contain fewer entries than the maximum allowed by the bit depth. The color values in the PLTE chunk are in the range 0-255 no matter what the image bit depth or color type is.

When the color type is RGB or RGB with Alpha channel, the PLTE chunk is optional. An encoder may include a palette for images with these color types in order to provide a recommended palette to use if the image needs to be quantized to 256 colors. A PLTE chunk is legal in this situation even when the bit depth is 16 bits. However, its component values are 8 bits no matter what the bit depth is. Grayscale images may not contain a PLTE chunk.

    To process this chunk a decoder should

Make sure that no other PLTE chunk has been processed. Ensure that the image color type is not grayscale or grayscale with Alpha. Validate chunk data:

– The number of data bytes in a PLTE is a multiple of 3.
– The number of palette entries is not greater than 2^Bit Depth.
– The number of palette entries is not greater than 256.

    Store the chunk's RGB values in the palette used to decode the chunk.
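The three length checks can be combined into one validation routine. A hypothetical sketch, not taken from the book's source:

```c
#include <stdbool.h>

/* Validate PLTE chunk data length against the rules above: the length
   is a nonzero multiple of 3, and the entry count fits both the bit
   depth (2^bit_depth) and the absolute limit of 256. */
bool valid_plte_length(unsigned long data_length, unsigned bit_depth)
{
    unsigned long entries;
    if (data_length == 0 || data_length % 3 != 0)
        return false;
    entries = data_length / 3;
    if (entries > (1UL << bit_depth))
        return false;
    return entries <= 256;
}
```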

IDAT

IDAT chunks contain the compressed image data. All of the IDAT chunks within a PNG file must be consecutive with no intervening chunks. The IDAT blocks may occur anywhere after the IHDR block and before the IEND block. If the PNG file contains a PLTE block, the IDAT blocks must come after it. The size of the block data in an IDAT block can be in the range 0 to 2^31 - 1. The usual number of data bytes within an IDAT block is between 4K and 16K bytes. Of course,

Table 13.3  Palette Entry Structure Format

Field Name  Size    Description
Red         1 byte  Red component intensity value (0-255)
Green       1 byte  Green component intensity value (0-255)
Blue        1 byte  Blue component intensity value (0-255)


the last IDAT block in a chain may have substantially fewer bytes. A PNG file that does not contain an IDAT block is invalid. The organization of the compressed data within IDAT blocks is covered in Chapter 14.

Unlike all other chunk types, most decoders will probably treat all the IDAT blocks as a group rather than process them independently. This makes the decompression process simpler and helps to ensure that all the IDAT blocks in the file are consecutive. When the first block in an IDAT block chain is encountered, the decoder should

Ensure that no other IDAT block chains have been encountered. If the color type is Palette, then make sure that a PLTE chunk has been processed. Decompress the image data.

IEND

The IEND chunk marks the end of the PNG file, so it obviously must be the last chunk in a PNG file.

    To process this chunk a decoder should

    Ensure that at least one IDAT block has been processed. Make sure that the chunk data length is zero. Conclude the decoding process.

    Noncritical Chunks

The PNG standard defines several noncritical or ancillary chunks. These are chunks that are not absolutely essential within a PNG file. An encoder does not have to create any of these chunks and a PNG decoder can simply ignore them. However, if you were writing a PNG decoder it would be desirable to implement as many of these standard chunks as possible. Likewise, an encoder should use them when applicable, rather than create application private chunks, in order to ensure the greatest portability. Many of the noncritical chunks are really only appropriate for specialized applications or when used for intermediate storage. For files that are going to be transmitted over the Internet or embedded in Web pages, a tEXt chunk or two and possibly a gAMA chunk are all that is appropriate in most cases.

bKGD

The bKGD chunk suggests a background color for the image. If the image is being displayed in a window that is larger than it is, a decoder can use this color


to display the areas outside the image. If the image contains an Alpha channel the decoder can merge the background color with the image. The bKGD chunk must appear before the IDAT chunks in the image. If the image contains a PLTE chunk, then the PLTE chunk must precede the bKGD chunk.

The format of the data within a bKGD chunk depends upon the image's color type.

Palette Color Type. The chunk data consists of a single byte that is an index into the color palette.

Grayscale or Grayscale with Alpha Channel. The chunk data contains a single 2-byte integer that specifies the intensity of the background.

RGB or RGB with Alpha Channel. The chunk data contains three 2-byte integers that specify the values for the red, green, and blue components of the background.

cHRM

An encoder can create a cHRM chunk to store the device-independent (1931 CIE) specification of the colors used to view the image on the source display. These values can be used to convert the RGB pixel values to absolute colors using the process described earlier in the chapter. The format of the cHRM chunk is shown in Table 13.4.

gAMA

If a PNG encoder knows the correct Gamma value used to view the original image, it can store this information in a gAMA chunk so the decoder can recreate the image as it was seen on the device that created it. The gAMA chunk data contains a 4-byte integer that holds the product of the Gamma value and 100,000. Thus, a Gamma value of 2.1 would be stored in the gAMA chunk as

Table 13.4  cHRM Chunk Data Format

Field Name     Size     Description
White Point X  4 bytes  White point x value x 100,000
White Point Y  4 bytes  White point y value x 100,000
Red X          4 bytes  Red point x value x 100,000
Red Y          4 bytes  Red point y value x 100,000
Green X        4 bytes  Green point x value x 100,000
Green Y        4 bytes  Green point y value x 100,000
Blue X         4 bytes  Blue point x value x 100,000
Blue Y         4 bytes  Blue point y value x 100,000


210,000. A gAMA chunk must precede any PLTE and IDAT chunks in the file. The format of the gAMA chunk is shown in Table 13.5.
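The fixed-point encoding of the Gamma value can be sketched in a pair of conversion helpers (hypothetical names, not from the book's source):

```c
/* The stored gAMA value is the Gamma value multiplied by 100,000. */
unsigned long gamma_to_gama_field(double gamma)
{
    /* Round to the nearest integer before storing. */
    return (unsigned long)(gamma * 100000.0 + 0.5);
}

double gama_field_to_gamma(unsigned long field)
{
    return field / 100000.0;
}
```

For example, a Gamma of 2.1 is stored as 210,000, matching the text above.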

hIST

An encoder can place a hIST chunk in any PNG file that contains a PLTE chunk in order to supply decoders with the approximate usage frequencies for each color in the palette. The hIST chunk can assist a decoder in selecting the colors to use if it is unable to display all the colors in the palette. If an image contains a hIST chunk, it must follow the PLTE chunk and precede the IDAT chunks.

The hIST chunk data is an array of 2-byte unsigned integers. The number of array elements in the hIST chunk must be the same as the number of color entries in the PLTE chunk. Each entry in the hIST array reflects the approximate relative usage of the corresponding color in the PLTE chunk.

If the encoder knows the absolute usage frequency of the colors within the palette, it can scale the values to fit into 16 bits. However, a zero frequency value should only be used when a color is not used at all. In the case of an RGB image, the frequency values will always be approximate and none should be zero.

pHYs

The pHYs chunk is used to store the absolute or relative pixel size of the device used to view the image when it was created. If a PNG file does not contain a pHYs chunk, the decoder should assume that the pixels are square and that the original physical size is unknown. A pHYs chunk must precede the IDAT chunks in the file. The format of the data for the pHYs chunk is shown in Table 13.6.

When the Unit Specifier field is 0, the X and Y pixel dimensions in the pHYs chunk give the relative sizes of the pixels on the source display. The decoder can use this information to scale the image on the output display. When the Unit Specifier field is 1, the X and Y dimensions give the number of pixels per meter on the source display. The decoder can use this information to output the image in the same size it was on the source display.

Table 13.5  gAMA Chunk Data Format

Field Name   Size     Description
Gamma Value  4 bytes  File Gamma x 100,000

Table 13.6  pHYs Chunk Data Format

Field Name         Size     Description
Pixels Per Unit X  4 bytes  Pixels per unit, X axis
Pixels Per Unit Y  4 bytes  Pixels per unit, Y axis
Unit Specifier     1 byte   0 - The X and Y values give a ratio.
                            1 - Unit is meters.
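When the unit specifier is 1, the pixels-per-meter values map directly to the more familiar dots per inch, since one inch is 0.0254 meters. A small sketch of the conversion (the helper name is illustrative):

```c
/* Convert a pHYs pixels-per-meter value to dots per inch.
   1 inch = 0.0254 meters, so dpi = pixels_per_meter * 0.0254. */
double pixels_per_meter_to_dpi(unsigned long pixels_per_meter)
{
    return pixels_per_meter * 0.0254;
}
```

For example, a value of 3937 pixels per meter corresponds to roughly 100 DPI.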

sBIT

An encoder can use an sBIT chunk to store the number of significant bits in the original sample data. If the original data uses a bit depth that is not supported by PNG (for example, 12), a decoder can use the information in an sBIT chunk to recreate the original sample values.

The format of the data within the sBIT chunk depends upon the color type of the image.

Grayscale. The chunk data contains 1 byte giving the number of significant bits.

RGB and Palette. The chunk data contains 3 bytes giving the number of significant bits for the red, green, and blue components.

Grayscale with Alpha Channel. The chunk data contains 2 bytes giving the number of significant bits for the grayscale data and Alpha channel.

RGB with Alpha Channel. The chunk data contains 4 bytes that specify the number of significant bits in the source for the red, green, and blue components and Alpha channel, respectively.

All data values within the sBIT chunk must be greater than zero and less than or equal to the bit depth.

A decoder can use a procedure like the following to convert a sample value from a PNG file to the value at the original bit depth (the variable names here are illustrative):

unsigned int sourcemax = 1 << significantbits ;
unsigned int filemax = 1 << bitdepth ;
unsigned int sourcevalue = filevalue * (sourcemax - 1) / (filemax - 1) ;

tEXt

An encoder can use a tEXt chunk to store text information that does not affect the decoding of an image. The tEXt chunk can appear anywhere between the IHDR and IEND chunks (except among the IDAT chunks) and there can be any number of them in a file.


The chunk data contains a keyword and a keyword value. The chunk data format is shown in Table 13.7. The length of the Keyword field is determined by locating the NULL (0) terminator. This length may not be zero. The length of the Text field is the length of the chunk data minus the length of the Keyword and the Terminator. This length may be zero. Line breaks within the text should be represented with a single linefeed character.
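Splitting the chunk data at the NULL terminator can be sketched as follows (a hypothetical helper; the book's own decoder may structure this differently):

```c
#include <string.h>

/* Locate the NULL that separates keyword from text in a tEXt chunk.
   Returns the keyword length, or -1 when the terminator is missing
   or the keyword length is illegal (it must be 1-79 bytes).  The
   text is whatever follows the terminator and may be empty. */
long text_keyword_length(const unsigned char *data, size_t length)
{
    const unsigned char *null_pos = memchr(data, 0, length);
    long keyword_length;
    if (null_pos == NULL)
        return -1;
    keyword_length = (long)(null_pos - data);
    if (keyword_length == 0 || keyword_length > 79)
        return -1;
    return keyword_length;
}
```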

The PNG standard defines the keywords shown in Table 13.8. An encoder can use these or create new keywords; however, a decoder should use the predefined keywords when they are applicable to maximize portability. Keywords are case sensitive.

tIME

The tIME chunk is used to store the last time the image was modified. The creation time is stored using a tEXt chunk with the Creation Time keyword. The format of the tIME chunk data is shown in Table 13.9. Zulu (or Greenwich) time should be used rather than local time. Applications that do not modify the image data should not create a new tIME chunk. The tIME chunk may appear anywhere after the IHDR chunk and before the IEND chunk, except within the IDAT chunks.

Table 13.7  tEXt Chunk Format

Field Name  Size                  Description
Keyword     Variable, 1-79 bytes  ASCII string
Terminator  1 byte                A zero terminator for the keyword
Text        Variable              ASCII string

Table 13.8  tEXt Predefined Keywords

Keyword        Description
Author         Name of the image's creator
Comment        Generic comment; conversion from GIF comment
Copyright      Copyright notice
Creation Time  Time the image was originally created
Description    Extended image description
Disclaimer     Legal disclaimer
Software       Application that created the image
Source         Device used to create the image
Title          Brief image description or title
Warning        Content warning


Table 13.9  tIME Chunk Format

Field Name  Size     Description
Year        2 bytes  Gregorian year (2020, not 20)
Month       1 byte   1-12
Day         1 byte   1-31
Hour        1 byte   0-23
Minute      1 byte   0-59
Second      1 byte   0-60

tRNS

The tRNS chunk is used to implement transparency without using an Alpha channel. Using this mechanism, the Alpha values are associated with colors rather than per pixel. A tRNS chunk must appear after the PLTE chunk, if present, and before the IDAT chunks. The format of the data within the tRNS chunk depends upon the color type used to store the image.

Palette. The chunk data contains an array of bytes that specify the Alpha value to be associated with the color entries in the PLTE chunk. Each pixel with a color index of N has the Nth entry in the tRNS data as its Alpha value. The number of entries in the tRNS chunk must be less than or equal to the number of color entries in the PLTE chunk. Palette entries without an entry in the tRNS chunk have an Alpha value of 255.

Grayscale. The chunk data contains a 2-byte integer that defines a transparent color. All pixels with this color value are fully transparent; pixels with any other value are fully opaque.

RGB. The chunk data contains three 2-byte integers that specify an RGB color value. All pixels of this color are fully transparent; pixels of any other color are fully opaque.

The tRNS chunk may not be used when the color type has an Alpha channel.
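The palette rule above reduces to a one-line lookup. A hypothetical helper illustrating it:

```c
/* Return the Alpha value for a palette index: indices covered by the
   tRNS data use the stored value; indices beyond it are fully
   opaque (255), as the text above specifies. */
unsigned char palette_alpha(unsigned index,
                            const unsigned char *trns_data,
                            unsigned trns_length)
{
    if (index < trns_length)
        return trns_data[index];
    return 255;
}
```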

zTXt

The zTXt chunk performs the same function as the tEXt chunk except that the text data is compressed using the Deflate compression method (the same method used to compress the image data). Just like the tEXt chunk, there can be any number of zTXt chunks and they can occur anywhere after the IHDR chunk and before the IEND chunk, except among the image's IDAT chunks. The format of the zTXt chunk is shown in Table 13.10.



Table 13.10  zTXt Chunk Format

Field Name          Size        Description
Keyword             1-79 bytes  Uncompressed text string
Separator           1 byte      Zero value keyword terminator
Compression Method  1 byte      Must be zero.
Compressed Text     Variable    Deflate compressed text.

    Conclusion

In this chapter we have introduced the PNG format and described its file and chunk structure. The PNG format contains support for device-independent color through Gamma correction and the XYZ color model. It is superior to GIF in all respects except one: animations. Unlike GIF, PNG files can only contain a single image. As this is being written, a new multiple image standard called MNG is under development that will remove the final barrier to PNG completely replacing GIF.

The PNG file and block format is defined in the PNG standard, which is included on the accompanying CD-ROM. Foley et al. (1996) contains more information on Gamma correction, Alpha channel, and the XYZ colorspace. Blinn (1998) and Porter and Duff (1984) contain useful information on Alpha channel and compositing. Campbell (1987) and Ramabadran and Gaitonde (1988) give introductory descriptions of CRC calculations.

The source code example for this chapter is an application called PNGDUMP that displays the name, chunk data length, and CRC value for each chunk in a PNG file. The organization of this application is very similar to that of a functioning PNG decoder. For the critical chunks defined by the PNG standard, PNGDUMP performs appropriate validations. The major piece that a decoder would add is the decompression of the image data stored in the IDAT blocks (covered in the next chapter).

    To run this application type

    > PNGDUMP somefile.png

    at the command line. Sample output from this program is shown in Figure 13.6.


IHDR
  Data Length: 13
  Data CRC: 9cc69707
  File CRC: 9cc69707
  Image Size: 383 x 262
  Bit Depth: 8
  Color Type: Palette Index
  Compression Method: deflate/inflate - 32k Sliding Window
  Filter Method: adaptive
  Interlace Method: none

PLTE
  Data Length: 768
  Data CRC: 9fe76824
  File CRC: 9fe76824
  Palette Color Count: 100

IDAT
  Data Length: 2000
  Data CRC: 710a2c5b
  File CRC: 710a2c5b

IDAT
  Data Length: 2000
  Data CRC: d857c86a
  File CRC: d857c86a

IDAT
  Data Length: 2000
  Data CRC: 119cab52
  File CRC: 119cab52

IDAT
  Data Length: 2000
  Data CRC: 1ab5b934
  File CRC: 1ab5b934

IDAT
  Data Length: 2000
  Data CRC: 610914db
  File CRC: 610914db

IDAT
  Data Length: 5b7
  Data CRC: cee96fbe
  File CRC: cee96fbe

IEND
  Data Length: 0
  Data CRC: ae426082
  File CRC: ae426082

Figure 13.6  Sample PNGDUMP Output


Chapter 14

Decompressing PNG Image Data

The previous chapter explained how to decode a PNG file up to the point where the image data within the IDAT blocks is interpreted. This chapter covers the remaining topics needed to decode a PNG file. The main focus is the Deflate compression process that PNG uses to store pixel data in IDAT chunks.

    Decompressing the Image Data

The first step in processing the image data is to decompress it. During the decompression process we treat the chunk data from the IDAT chunks in the image file as a continuous stream of bytes; then we pass them to a decompressor for the Deflate/Inflate processes. The segmentation of the compressed stream into IDAT chunks is irrelevant to the decompression processes. The sequence of compressed data bytes would be the same if one IDAT were used to hold the entire compressed block, if each compressed byte were placed in a separate IDAT chunk, or any combination between these two extremes.

ZLIB, Deflate, and PNG

Before Unisys started to demand licenses for its use in software, LZW had not been confined to GIF. It had also been used in many types of compression applications including the Unix compress program. When LZW could no longer be used in free software, there was an immediate need for a freely usable compression method to replace it. The solution came in the form of a general-purpose compression library known as ZLIB.

ZLIB employs an LZ77-based compression process known as Deflate, which had its origins in the ZIP and PKZIP programs. The compression source code within ZIP was too tightly bound to the application for general use, so Jean-Loup Gailly and Mark Adler created ZLIB to implement Deflate compression in a manner that can be used by other applications. ZLIB has been used not only in PNG but in the GZIP archiving program as well. Currently, Deflate is the only compression method supported by ZLIB, but the ZLIB format has provisions for other methods to be added in the future.

For this discussion we are going to describe ZLIB and Deflate only as they apply to PNG. Not all settings that are valid in ZLIB/Deflate are legal when used in PNG files. The source code examples in this and the following chapters contain implementations of the PNG subset of Deflate.

LZ77 Compression

The LZ77 process uses a sliding window to maintain a dictionary of recently processed text. The compressed stream is a sequence of codes that are either literal values or commands for the decompressor to copy text from the window to the output stream.

An LZ77 decompressor reads each code in the compressed stream in sequence. Codes that represent literal values are copied directly to the output stream. Command codes are replaced in the output stream with text copied from the LZ window. In either case, the LZ window is advanced so that the last character copied to the output stream is included in the window. The big advantage of dictionary compression over Huffman coding is that compression can be done on the fly without having to process the entire stream, making it suitable for applications such as compression of data streams in a computer network.

Figure 14.1 contains a simplified example of LZ77 decompression using a 16-byte window. The data consists of 7-bit ASCII text, so by using 1 bit to differentiate a literal value from a command, each code in the compressed stream can be encoded using 8 bits. In this example copy commands are represented as <offset, length>, where the offset is the number of bytes from the start of the LZ77 window and the length is the number of bytes to copy.

In this example the first six codes are literal values that are copied to the output stream. The seventh code copies two characters from the tenth position in the LZ window ("A") to the output stream. As new codes are read, the window fills up and text starts to become lost to the compression processes. Notice that the final "MA" in the text could have been compressed into a code had that string not slid out of the window.


Figure 14.1  LZ77 Compression Example (window contents and output stream at each step as the text "A MAN A PLAN A CANAL PANAMA" is decompressed)

Deflate Compression

Data in the Deflate format is stored in blocks that are either uncompressed or compressed using a variation of the LZ77 process. For the moment we will concentrate on the format of compressed Deflate blocks. Deflate uses a 32K-byte (or smaller power of 2) sliding window into the most recently decoded text. During the decompression process this window holds the most recently decompressed data bytes.

The compressed stream contains codes that represent literal values or distance/length codes. When the decompressor encounters a literal value, it copies it to the output stream and then advances the sliding window by one position. When a distance/length code is read from the compressed stream, the decompressor uses the distance to locate the start of a previously decompressed string to copy to the output stream.

When copying a string from the sliding window you need to maintain two pointers or indices into the sliding window: one to the source text being copied and the other to the destination. Since the size of the window is a power of 2, it is easy to implement the window as a circular buffer rather than one that physically moves. Wrapping of indices in a 32K buffer is accomplished by performing a bitwise AND operation with an index and the value 7FFF (hexadecimal), which is 2^15 - 1.
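The wrapped source-index calculation can be shown in isolation (a small illustrative sketch; the book's own code appears in Algorithm 14.1):

```c
/* Index arithmetic for a 32K circular window.  Adding WINDOWSIZE
   before subtracting the distance keeps the intermediate value
   positive; the AND discards any whole trips around the buffer. */
#define WINDOWSIZE (1u << 15)
#define WINDOWMASK (WINDOWSIZE - 1)   /* 0x7FFF */

unsigned window_copy_position(unsigned output_position, unsigned distance)
{
    return (WINDOWSIZE + output_position - distance) & WINDOWMASK;
}
```

For example, a distance of 10 from output position 3 wraps around to index 32761 near the end of the buffer.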

Algorithm 14.1 shows how a copy operation is implemented. It assumes that there is a function called OutputByte that processes each byte as it is decompressed. The arguments to the CopyData function are the number of bytes to copy and the distance from the current output position in the window.

Deflate uses codes in the range 0-285 in the compressed stream. The codes in the range 0-255 naturally represent literal bytes in the compressed stream. The code 256 is a special value that is used to mark the end of a compressed block, and codes 257-285 are used to start a command to copy from the sliding window.

The format for a copy command is shown in Figure 14.2. The codes 257-285 are part of the specifier for the length component of the copy command. Each length code has a base length value and a number of extra bits associated with it. After reading the code, the decoder reads the specified number of extra bits. The value of the extra bits is added to the base value to give the final copy length. The base values and number of extra bits for each length code are shown in Table 14.1.

Algorithm 14.1  LZ77 Data Copying

Global OUTPUTPOSITION
Constant WINDOWSIZE = 1 LeftShift 15
Constant WINDOWMASK = WINDOWSIZE - 1  // 7FFF

Procedure CopyData (LENGTH, DISTANCE)
Begin
   // We add the window size to ensure the index is always positive.
   // The AND operation will get rid of any extra distance from
   // this addition.
   COPYPOSITION = (WINDOWSIZE + OUTPUTPOSITION - DISTANCE)
   COPYPOSITION = COPYPOSITION AND WINDOWMASK
   For II = 1 To LENGTH Do
   Begin
      WINDOW (OUTPUTPOSITION) = WINDOW (COPYPOSITION)
      OutputByte (WINDOW (OUTPUTPOSITION))
      // Advance to the next output position
      OUTPUTPOSITION = OUTPUTPOSITION + 1
      OUTPUTPOSITION = OUTPUTPOSITION AND WINDOWMASK
      // Advance to the next byte to copy
      COPYPOSITION = COPYPOSITION + 1
      COPYPOSITION = COPYPOSITION AND WINDOWMASK
   End
End


Figure 14.2  Copy Command Format

Length Code | Length Extra Bits | Distance Code | Distance Extra Bits

The length code and its extra bits are followed by a distance code in the range 0-29. This value specifies the base value for the distance from the current location in the buffer and the number of additional bits. The extra bits and base values for distance codes are shown in Table 14.2.

Table 14.1  Length Code Base Values and Extra Bits

Length Code  Base Value  Extra Bits  Possible Length Values
257          3           0           3
258          4           0           4
259          5           0           5
260          6           0           6
261          7           0           7
262          8           0           8
263          9           0           9
264          10          0           10
265          11          1           11-12
266          13          1           13-14
267          15          1           15-16
268          17          1           17-18
269          19          2           19-22
270          23          2           23-26
271          27          2           27-30
272          31          2           31-34
273          35          3           35-42
274          43          3           43-50
275          51          3           51-58
276          59          3           59-66
277          67          4           67-82
278          83          4           83-98
279          99          4           99-114
280          115         4           115-130
281          131         5           131-162
282          163         5           163-194
283          195         5           195-226
284          227         5           227-257
285          258         0           258
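The base values and extra-bit counts in Table 14.1 translate directly into two lookup arrays. A sketch of the length decode step (array and function names are invented for illustration):

```c
/* Base lengths and extra-bit counts from Table 14.1, indexed by
   length code minus 257. */
static const unsigned length_base[29] = {
      3,   4,   5,   6,   7,   8,   9,  10,  11,  13,
     15,  17,  19,  23,  27,  31,  35,  43,  51,  59,
     67,  83,  99, 115, 131, 163, 195, 227, 258
};
static const unsigned length_extra[29] = {
    0, 0, 0, 0, 0, 0, 0, 0, 1, 1,
    1, 1, 2, 2, 2, 2, 3, 3, 3, 3,
    4, 4, 4, 4, 5, 5, 5, 5, 0
};

/* Combine a length code (257-285) with the value of its extra bits. */
unsigned decode_length(unsigned code, unsigned extra_bits_value)
{
    return length_base[code - 257] + extra_bits_value;
}
```

For example, code 275 (base 51, 3 extra bits) with extra-bit value 5 yields a copy length of 56, matching the worked example below.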

  • 220 Decompressing PNG Image Data

Example

1. Read length code 275. From Table 14.1 the base value is 51 and there are 3 extra bits.
2. Read 3 bits, giving binary 101 (5). The length value is 51 + 5 = 56.
3. Read distance code 14. From Table 14.2 the base value is 129 and there are 6 extra bits.
4. Read 6 bits, giving binary 001100 (12). The distance value is 129 + 12 = 141.

This command copies the 56 characters located 141 positions before the current buffer position.

Table 14.2  Distance Code Base Values and Extra Bits

Distance Code  Base Value  Extra Bits  Possible Distance Values
0              1           0           1
1              2           0           2
2              3           0           3
3              4           0           4
4              5           1           5-6
5              7           1           7-8
6              9           2           9-12
7              13          2           13-16
8              17          3           17-24
9              25          3           25-32
10             33          4           33-48
11             49          4           49-64
12             65          5           65-96
13             97          5           97-128
14             129         6           129-192
15             193         6           193-256
16             257         7           257-384
17             385         7           385-512
18             513         8           513-768
19             769         8           769-1,024
20             1,025       9           1,025-1,536
21             1,537       9           1,537-2,048
22             2,049       10          2,049-3,072
23             3,073       10          3,073-4,096
24             4,097       11          4,097-6,144
25             6,145       11          6,145-8,192
26             8,193       12          8,193-12,288
27             12,289      12          12,289-16,384
28             16,385      13          16,385-24,576
29             24,577      13          24,577-32,768

One would normally expect value ranges that are limited by powers of 2. However, in Table 14.1 and Table 14.2 we have the values 0-285 and 0-29. So how are the lengths and distances efficiently coded?

The answer is Huffman coding. Two Huffman tables are used during most of the decompression process. One is used to decode length and literal values and the other is used to decode distance values.

    Huffman Coding in Deflate

The Huffman coding process used in PNG with Deflate is almost identical to the one used in JPEG (Chapter 6). In fact, for the PNG decoder on the CD-ROM, we will only have to make small modifications to the JPEG Huffman decoder class. These are the differences you need to be aware of between Huffman coding in JPEG and in PNG:

In JPEG, the Huffman codes containing all 1 bits are invalid. In PNG they are legal.

In JPEG, the maximum Huffman code length is 16 bits. In PNG, length and distance codes are a maximum of 15 bits while the tables are encoded using a maximum of 4 bits.

In PNG, if values of X and Y have Huffman codes of the same length and X is greater than Y, the Huffman code for X is greater than the Huffman code for Y. In JPEG, the ordering of Huffman codes matches the ordering of the values in the file.

In PNG, the Huffman codes are stored with their bits reversed. The Huffman code 110 (6) is stored as 011 (3) in the compressed data.

In PNG, Huffman table definitions contain the Huffman code length for every possible value. Unused values are given a code length of zero. In JPEG, code lengths are only given for values that are actually used.

As with JPEG, the input to the Huffman table generation process is an array of values and an array of Huffman code lengths. In PNG, the values are sorted by value rather than code length and the array contains zero length codes. We have to add a step to sort these arrays by code length, and during the Huffman code generation we have to take into account the values with zero length codes.
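The ordering rule (equal-length codes assigned in increasing value order) makes the code assignment fully determined by the lengths alone. A sketch of the canonical assignment, following the bl_count/next_code procedure from the Deflate specification (the function name is invented; this is not the book's decoder class):

```c
/* Assign canonical Huffman codes from an array of code lengths.
   Symbols with length 0 are unused and receive no code.  Codes of
   the same length are assigned in increasing symbol order, per the
   PNG/Deflate rule described in the text. */
#define MAX_CODE_LENGTH 15

void assign_huffman_codes(const unsigned *lengths, unsigned count,
                          unsigned *codes)
{
    unsigned bl_count[MAX_CODE_LENGTH + 1] = { 0 };
    unsigned next_code[MAX_CODE_LENGTH + 1] = { 0 };
    unsigned code = 0;
    unsigned i;

    for (i = 0; i < count; i++)       /* Count codes of each length */
        bl_count[lengths[i]]++;
    bl_count[0] = 0;                  /* Zero-length entries get no code */
    for (i = 1; i <= MAX_CODE_LENGTH; i++)
    {
        code = (code + bl_count[i - 1]) << 1;
        next_code[i] = code;          /* First code of each length */
    }
    for (i = 0; i < count; i++)
        codes[i] = (lengths[i] != 0) ? next_code[lengths[i]]++ : 0;
}
```

Note that the resulting codes would still need their bits reversed before being matched against the PNG compressed data, per the difference listed above.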

Algorithm 14.2 illustrates the process for decoding a compressed block. The procedures DecodeUsingLengthTable and DecodeUsingDistanceTable are assumed to Huffman decode the next value in the input stream using the literal/length and distance Huffman tables, respectively. ReadLiteralBits (n) is a function that returns the next n bits from the input stream and CopyData is the function defined in Algorithm 14.1.


Algorithm 14.2  Deflate Process

Procedure DecodeBlock
Begin
   While True Do
   Begin
      CODE = DecodeUsingLengthTable ()
      If CODE = 256 Then
         Return
      Else If CODE < 256 Then
         OutputByte (CODE)
      Else
      Begin
         EXTRA = LENGTHEXTRABITS (CODE)
         BASE = LENGTHBASES (CODE)
         LENGTH = BASE + ReadLiteralBits (EXTRA)
         CODE = DecodeUsingDistanceTable ()
         EXTRA = DISTANCEEXTRABITS (CODE)
         BASE = DISTANCEBASES (CODE)
         DISTANCE = BASE + ReadLiteralBits (EXTRA)
         CopyData (LENGTH, DISTANCE)
      End
   End
End

    Compressed Data Format

Until now we have dealt with the PNG compression from the top down. Now we are going to back up and examine the ZLIB/Deflate compressed data format.

The structure of the compressed data is shown in Table 14.3. Notice that most of the fields are not complete bytes and that, when used with PNG, most have mandatory values. The Compression Level field is an advisory field. It gives a clue as to whether there may be any benefit in recompressing the data. The value in the Check Bits field is used to ensure that the header value is a multiple of 31. A 2-byte header that is not evenly divisible by 31 is invalid.
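The header checks can be sketched as a single routine (a hypothetical helper; the common zlib header bytes 78 9C are used as the test case):

```c
#include <stdbool.h>

/* Validate the two ZLIB header bytes for PNG use: the compression
   method must be 8 (Deflate), the window size field must be 7 or
   less, and the header as a big-endian 16-bit value must be evenly
   divisible by 31. */
bool valid_zlib_header(unsigned char byte1, unsigned char byte2)
{
    unsigned value = ((unsigned) byte1 << 8) | byte2;
    if ((byte1 & 0x0F) != 8)        /* Compression method must be 8 */
        return false;
    if ((byte1 >> 4) > 7)           /* Window size field must be 7 or less */
        return false;
    return value % 31 == 0;
}
```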

The Adler-32 checksum serves the same function the CRC-32 does for PNG blocks. The major difference is in how it is used. The CRC-32 value for a PNG block is calculated using the bytes stored in the file. The Adler-32 value for a compressed stream is calculated on the uncompressed bytes. As each byte is decompressed, the decoder should update the Adler-32 value. After all the data has been decompressed, a decoder should compare the Adler-32 calculated from the decompressed data with the value stored in the field. If the two values are not the same, the decoder should assume that the data has been corrupted.

The following source code illustrates how to implement the Adler-32 checksum. The UpdateAdler function updates the value in the AdlerRegister variable for each byte as it is decompressed. The Adler register is initialized to 1 before any data is processed.


Table 14.3  Compressed Data Format

Field Name            Size      Description
Header                2 bytes
  Compression Method  4 bits    Must be 8.
  Window Size         4 bits    Must be 7 or less.
  Check Bits          5 bits    Makes the first 2 bytes a multiple of 31.
  Preset Dictionary   1 bit     Must be zero.
  Compression Level   2 bits    0 - Fastest compression used.
                                1 - Fast compression used.
                                2 - Default compression used.
                                3 - Maximum compression used.
Compressed Blocks     Variable
Adler Checksum        4 bytes   Adler-32 checksum calculated from the
                                uncompressed data.

unsigned long AdlerRegister = 1 ;
const unsigned long PRIME = 65521L ;

void UpdateAdler (unsigned char value)
{
   unsigned long low = AdlerRegister & 0X0000FFFFL ;
   unsigned long high = (AdlerRegister >> 16) & 0X0000FFFFL ;
   low = (low + value) % PRIME ;
   high = (low + high) % PRIME ;
   AdlerRegister = (high << 16) | low ;
}


Table 14.4  Compressed Block Header Format

Field Name  Size    Description
Final       1 bit   1 - This is the last compressed block.
                    0 - There are additional compressed blocks after this one.
Type        2 bits  0 - The data is uncompressed.
                    1 - Compressed with fixed Huffman codes.
                    2 - Compressed with dynamic Huffman codes.
                    3 - Invalid.

Uncompressed Block Format

If the Type field in the block header specifies that the data is uncompressed, the remaining data in the block is byte aligned. Any unused bits following the header are discarded. The format of an uncompressed data block is shown in Table 14.5. A decompressor simply copies the uncompressed bytes to the output stream.
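Since NLength is the one's complement of Length, the validation is a single comparison. A hypothetical sketch:

```c
#include <stdbool.h>

/* Validate an uncompressed block header: the NLength field must be
   the one's complement of the Length field, both taken as 16-bit
   values. */
bool valid_uncompressed_header(unsigned length, unsigned nlength)
{
    return (length & 0xFFFF) == (~nlength & 0xFFFF);
}
```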

Dynamic Huffman Codes

Dynamic Huffman codes is the most useful compression method. The bit fields shown in Table 14.6 immediately follow the compressed block header. These values give the number of values that are actually used in the compressed data. While two Huffman tables are used to decompress the data, there are three lengths defined here. The reason for the extra field is that the code lengths for the length/literal and distance Huffman tables are themselves Huffman encoded.

The structure in Table 14.6 is followed by a sequence of up to 19 3-bit fields. The actual number of bit fields is the value of the Lengths field plus 4. These bit fields contain the Huffman code lengths for the values 0-18. The lengths are stored in the order

16 17 18 0 8 7 9 6 10 5 11 4 12 3 13 2 14 1 15

Table 14.5  Uncompressed Block Format

Field Name  Length        Description
Length      2 bytes       The number of data bytes in the block.
NLength     2 bytes       The 1's-complement of Length. Used for validation.
Block Data  Length bytes  The uncompressed data bytes.

Table 14.6  Dynamic Huffman Code Fields

Field Name  Length  Description
Literals    5 bits  Number of length/literal codes - 257 (257-286).
Distances   5 bits  Number of distance codes - 1 (1-32).
Lengths     4 bits  Number of code length codes - 4 (4-19).
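The permuted storage order can be sketched as a lookup table. The function below assumes the 3-bit values have already been read from the stream in file order; the names are illustrative, not from the book's source.

```cpp
#include <cassert>

// The order in which code-length-code lengths are stored in the file.
static const int kLengthOrder[19] =
    { 16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15 };

// rawBits holds the 3-bit fields in the order they were read;
// lengthsField is the 4-bit Lengths value from Table 14.6.
void StoreCodeLengthLengths(const unsigned int *rawBits, int lengthsField,
                            unsigned int lengths[19])
{
    for (int ii = 0; ii < 19; ++ ii)
        lengths[ii] = 0;               // entries not present default to zero
    int count = lengthsField + 4;      // the Lengths field stores count - 4
    for (int ii = 0; ii < count; ++ ii)
        lengths[kLengthOrder[ii]] = rawBits[ii];
}
```

Because the most common values (16, 17, 18, 0) come first in the order, a stream that defines only a few code lengths still covers them.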


Entries at the end of the list are less likely to have a corresponding length value stored in the file. Values without an entry are assigned a code length of zero. Using these length codes, a decompressor creates the Huffman table used to decode the code lengths for the other literal/length and distance Huffman tables.

The Huffman-encoded values 0-18 are used to encode the literal/length and distance Huffman tables. Table 14.7 shows the meanings of these codes. The codes 16, 17, and 18 have a number of extra bits associated with them. When a decoder encounters one of these codes, it is followed by the specified number of literal bits.

Example

A decoder reads the Huffman-encoded value 17 from the input stream. Following Table 14.7, the decoder reads the 3 extra bits 101 (5). This value is added to the base value, giving 8 (= 3 + 5). This command sets the next 8 code lengths to zero.

The code lengths for the literal/length Huffman table come next in the input stream. The number of code lengths is the value of the Literals field in Table 14.6 plus 257. A decompressor reads Huffman-encoded command values (0-18) and processes them according to Table 14.7 until the specified number of code lengths has been read.

The distance Huffman table code lengths immediately follow. The number of distance codes is the value of the Distances field in Table 14.6 plus 1. Distance table length codes are encoded in the same manner as they are in the literal/length table.

Algorithm 14.3 illustrates how to read the code lengths for a Huffman table from the input stream. The parameters to the ReadLengths function are the number of code lengths to read (from Table 14.6) and an output array of code lengths where LENGTHS (n) is the Huffman code length for the value n.

After the code lengths have been read, they are used to generate the literal/length and distance Huffman tables. The compressed data format is identical to that used with fixed Huffman codes.

Table 14.7  Length Encodings

Code  Description                          Base Value  Extra Bits
0-15  Literal value.                       N/A         0
16    Repeat the previous code 3-6 times.  3           2
17    Repeat length 0 3-10 times.          3           3
18    Repeat length 0 11-138 times.        11          7


Procedure ReadLengths (LENGTHCOUNT, LENGTHS ())
Begin
   INDEX = 0
   While INDEX < LENGTHCOUNT Do
      Begin
      CODE = HuffmanDecodeInputStream ()
      If CODE < 16 Then
         Begin
         LENGTHS (INDEX) = CODE
         INDEX = INDEX + 1
         End
      Else If CODE = 16 Then
         Begin
         COUNT = 3 + ReadRawBitsFromInputStream (2)
         For I = 1 To COUNT Do
            Begin
            LENGTHS (INDEX) = LENGTHS (INDEX - 1)
            INDEX = INDEX + 1
            End
         End
      Else If CODE = 17 Then
         Begin
         COUNT = 3 + ReadRawBitsFromInputStream (3)
         For I = 1 To COUNT Do
            Begin
            LENGTHS (INDEX) = 0
            INDEX = INDEX + 1
            End
         End
      Else If CODE = 18 Then
         Begin
         COUNT = 11 + ReadRawBitsFromInputStream (7)
         For I = 1 To COUNT Do
            Begin
            LENGTHS (INDEX) = 0
            INDEX = INDEX + 1
            End
         End
      End
End

Fixed Huffman Codes

When fixed Huffman codes are used, the compressed data in the block immediately follows the block header. The compressed data may or may not be byte aligned. This block uses a predefined set of Huffman codes rather than codes generated from usage frequencies.

A block that uses fixed Huffman codes is decompressed in the same manner as is one with dynamic Huffman codes. The only difference in processing


Algorithm 14.3  Reading Compressed Huffman Tables


Table 14.8  Literal/Length Huffman Code Lengths for Fixed Huffman Codes

is that the Huffman table lengths are not stored in the input stream. The Huffman table for length/literal codes is generated using the code lengths shown in Table 14.8. The Huffman table for distance codes uses a length of 5 for all possible values (0-29).

    Writing the Decompressed Data to the Image

The process described in the previous sections in this chapter shows how to convert a stream of compressed bytes contained in a sequence of IDAT blocks into a stream of uncompressed data bytes. We need to perform interlacing, filtering, and color conversion on this stream before writing it to the image.

Interlacing

When an image is interlaced, it is divided into a series of seven passes. We can exploit the regularities in the Adam 7 interlace pattern to create a structure like the one shown in Table 14.9. This structure gives the location of the first pixel within an 8 × 8 block that is updated by each pass and the row and column intervals between the next pixels. All of the row and column intervals are maintained across adjacent 8 × 8 blocks. This example shows how the information in Table 14.9 would be used to determine the sequence of pixels to process in a pass of an interlaced image.

Algorithm 14.4  Interlace Processing Using a Table

Procedure ProcessPass
Begin
   ROW = FirstRow
   While ROW < IMAGEHEIGHT Do
      Begin
      COL = FirstColumn
      While COL < IMAGEWIDTH Do
         Begin
         ProcessDataForPixel (ROW, COL)
         COL = COL + ColumnInterval
         End
      ROW = ROW + RowInterval
      End
End

Value    Code Length
0-143    8
144-255  9
256-279  7
280-287  8
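The fixed-code lengths of Table 14.8 can be filled in programmatically rather than stored; this is a sketch (the function name is my own). The distance table is simpler still: every value 0-29 gets length 5.

```cpp
#include <cassert>

// Fills the literal/length code lengths for a fixed-Huffman-code
// block, following Table 14.8.
void FixedLiteralLengths(unsigned int lengths[288])
{
    for (int ii = 0;   ii <= 143; ++ ii) lengths[ii] = 8;
    for (int ii = 144; ii <= 255; ++ ii) lengths[ii] = 9;
    for (int ii = 256; ii <= 279; ++ ii) lengths[ii] = 7;
    for (int ii = 280; ii <= 287; ++ ii) lengths[ii] = 8;
}
```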


Using Table 14.9, the number of pixels per row in a given pass in an interlaced image is

Pixels per Row = (Image Width - First Column) ÷ Column Interval + 1

(using integer division, with zero pixels when the image width is less than the pass's first column). If the image is not interlaced, the number of pixels per row is simply the image width and there is only one pass. The number of bits required to represent a pixel for each color type is shown in Table 14.10. The number of bits to represent each row is, then,

Bits per Row = Bits per Pixel × Pixels per Row

and the number of bytes per row is

Bytes per Row = ⌈Bits per Row ÷ 8⌉
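The row-size calculations can be sketched directly; the function names are mine, and columns are numbered from 1 to match Table 14.9.

```cpp
#include <cassert>

// Pixels updated per row in one interlace pass (0 when the pass
// touches no column of a narrow image).
unsigned int PixelsPerRow(unsigned int imageWidth,
                          unsigned int firstColumn,
                          unsigned int columnInterval)
{
    if (imageWidth < firstColumn)
        return 0;
    return (imageWidth - firstColumn) / columnInterval + 1;
}

// Bits per row rounded up to whole bytes.
unsigned int BytesPerRow(unsigned int bitsPerPixel, unsigned int pixelsPerRow)
{
    return (bitsPerPixel * pixelsPerRow + 7) / 8;
}
```

For a 17-pixel-wide image, pass 1 (first column 1, interval 8) touches columns 1, 9, and 17 — three pixels, as the formula gives.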

Filtering

A filter is a function that transforms the data in a pixel row into a format that is more compressible. The pixel data for each row is preceded by a single byte that specifies the filtering method applied to the row. Table 14.11 lists the possible values for the filter byte and the corresponding filter type. Values outside the range 0-4 are invalid. If the filter type is zero, the row was not filtered, so the row data contains the actual pixel values.

Table 14.9  Adam 7 Pixel Update Intervals

Pass  First Row  First Column  Row Interval  Column Interval
1     1          1             8             8
2     1          5             8             8
3     5          1             8             4
4     1          3             4             4
5     3          1             4             2
6     1          2             2             2
7     2          1             2             1

Table 14.10  Bits per Pixel for PNG Color Types

Color Type            Bits per Pixel
RGB                   3 × Bit depth
RGB with Alpha        4 × Bit depth
Grayscale             Bit depth
Grayscale with Alpha  2 × Bit depth
Palette               Bit depth

Some of the filters are calculated from the unfiltered data generated for the previous row. A PNG decoder needs to maintain two buffers large enough to hold the pixel data for an entire row. One buffer contains the data for the current row and the other contains the data for the previous row.

The filtering process involves calculations based upon values of adjacent pixels. Filtering is performed on a per-byte basis rather than per pixel, and filtering is performed relative to corresponding bytes within pixels. For example, if you are processing an image using the RGB color type with a bit depth of 16, the high-order byte for the red component of one pixel is always used with the high-order byte for the red component of another pixel. If the bit depth is less than 8, filtering is performed on adjacent bytes. Table 14.12 gives the intervals between corresponding bytes for the possible bit depth and color type combinations.

The following sections describe how the filtering process is reversed for the various filter types. In these descriptions buffer (previous) contains the unfiltered bytes from the previous row and buffer (current) contains the filtered bytes for the current row. The variable interval is obtained from Table 14.12.

Table 14.12  Interval between Corresponding Bytes When Filtering

Color Type            Bit Depth   Interval
Grayscale             1, 2, 4, 8  1
Grayscale             16          2
Grayscale with Alpha  8           2
Grayscale with Alpha  16          4
Palette               1, 2, 4, 8  1
RGB                   8           3
RGB                   16          6
RGB with Alpha        8           4
RGB with Alpha        16          8

Table 14.11  Row Filter Codes

Code  Filter Type
0     Unfiltered
1     Sub filter
2     Up filter
3     Average filter
4     Paeth filter
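Table 14.12 reduces to a small function. The sketch below uses the color type numbers from the PNG specification (0 grayscale, 2 RGB, 3 palette, 4 grayscale with alpha, 6 RGB with alpha); the function name is my own.

```cpp
#include <cassert>

// Interval between corresponding bytes when filtering (Table 14.12).
unsigned int FilterInterval(unsigned int colorType, unsigned int bitDepth)
{
    unsigned int bytesPerSample = (bitDepth == 16) ? 2 : 1;
    switch (colorType)
    {
    case 0: return bytesPerSample;        // Grayscale
    case 2: return 3 * bytesPerSample;    // RGB
    case 3: return 1;                     // Palette
    case 4: return 2 * bytesPerSample;    // Grayscale with Alpha
    case 6: return 4 * bytesPerSample;    // RGB with Alpha
    }
    return 1;   // bit depths below 8 filter on adjacent bytes
}
```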



If X is the first byte in a row, the value of buffer (N)(X-1) is zero. Likewise, if the current row is the first row for the pass, all of the values of buffer (previous) are implicitly zero.

All filters are performed using integer arithmetic and the data bytes are treated as signed (Algorithms 14.5-14.8). If the result from reversing a filter is greater than 255, only the least significant byte in the result is used.

Algorithm 14.5  Reverse Sub Filter

Function ReverseSub (X)
Begin
   Return buffer (current)(X) + buffer (current)(X-Interval)
End

Algorithm 14.6  Reverse Up Filter

Function ReverseUp (X)
Begin
   Return buffer (current)(X) + buffer (previous)(X)
End

Algorithm 14.7  Reverse Average Filter

Function ReverseAverage (X)
Begin
   Return buffer (current)(X)
          + (buffer (current)(X-Interval) + buffer (previous)(X)) / 2
End

Algorithm 14.8  Reverse Paeth Filter

Function PaethPredictor (left, above, upperleft)
Begin
   pa = abs (above - upperleft)
   pb = abs (left - upperleft)
   pc = abs (left - upperleft + above - upperleft)
   If pa <= pb And pa <= pc Then
      Return left
   Else If pb <= pc Then
      Return above
   Else
      Return upperleft
End

Function ReversePaeth (X)
Begin
   Return buffer (current)(X)
          + PaethPredictor (buffer (current)(X-Interval),
                            buffer (previous)(X),
                            buffer (previous)(X-Interval))
End
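The Paeth predictor picks whichever neighbor is closest to left + above - upperleft. A C++ sketch (names mine; the arithmetic is done on ints so the differences can go negative, and only the least significant byte of the reconstructed sum is kept):

```cpp
#include <cassert>
#include <cstdlib>

int PaethPredictor(int left, int above, int upperLeft)
{
    int pa = abs(above - upperLeft);
    int pb = abs(left - upperLeft);
    int pc = abs(left + above - 2 * upperLeft);
    if (pa <= pb && pa <= pc)
        return left;
    else if (pb <= pc)
        return above;
    else
        return upperLeft;
}

unsigned char ReversePaeth(unsigned char current,
                           int left, int above, int upperLeft)
{
    // Only the least significant byte of the sum is used.
    return (unsigned char)(current + PaethPredictor(left, above, upperLeft));
}
```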


Color Correction

In many cases, after the reverse filtering process is complete the data is ready to display. If an application has the required information, the decoder can color correct the pixel data. If the PNG file contains a cHRM chunk, the decoder can convert the pixel data to CIE 1931 format to get the exact colors shown on the source display, then correct the data for the destination display. If the file contains a gAMA chunk, the data can be Gamma corrected for the output display.

Equation 14.1  Bit Depth Conversion

16- to 8-bit Conversion

Most current computer systems only support bit depths of up to 8. Unless you are writing a decoder for a specialized system that supports greater bit depths, you are going to have to convert 16-bit data values to 8 bits. The technically correct method to convert pixel values from one bit depth to another is

New Value = Old Value × (2^New Bit Depth - 1) ÷ (2^Old Bit Depth - 1)

The easiest method to convert from 16 to 8 bits is to discard the low-order byte of each 16-bit color value after applying color correction (if applicable). The results are virtually indistinguishable from that of Equation 14.1.
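Both conversions can be sketched in a few lines (function names mine); the scaled version rounds to the nearest value, the cheap version simply keeps the high byte.

```cpp
#include <cassert>

// Equation 14.1 for old depth 16, new depth 8, rounded to nearest.
unsigned char ScaleTo8(unsigned int value16)
{
    return (unsigned char)((value16 * 255UL + 32767) / 65535);
}

// The shortcut: discard the low-order byte.
unsigned char TruncateTo8(unsigned int value16)
{
    return (unsigned char)(value16 >> 8);
}
```

For mid-range values the two agree; they differ by at most one level.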

Either of these two methods could create large, solid blocks of colors that could look odd, especially in photographs. In some situations you may wish to apply dithering during the color conversion.

Transparency

If the color type is RGB with Alpha or grayscale with Alpha, or if the PNG file contains a tRNS chunk, transparency can be applied if desired. If the image is being drawn on a background and the decoder has access to the background's color data, the image in the PNG file can be combined with the background pixels using the process described in the previous chapter. Another possibility is to combine the image with a solid background that is specified by the application or from a bKGD chunk.

    Conclusion

In this chapter we have covered the remaining aspects of PNG that are required to implement a PNG decoder. Besides explaining the Deflate compression process, we have covered the format of the pixel data, including the filtering process.

The compressed data format for PNG is defined in Deutsch and Gailly (1996a) and Deutsch (1996b). Both of these documents are on the accompanying CD-ROM. Blinn (1998) contains a description of a dithering process suitable for 16-bit to 8-bit conversion.

The source code example for this chapter on the accompanying CD-ROM is a complete PNG decoder class, PngDecoder. This class uses the same process all of the other decoders covered in this book use to read a PNG file and convert it to a BitmapImage object.

There is also a sample PNG decoding application that converts a PNG file to the Windows BMP format.

The command format for this application is

DECODER input.png output.bmp


  • Creating PNG Files

This is the last chapter on the PNG format. It covers the process for creating files in the PNG format, which is essentially the reverse of the one used in the previous chapter to read PNG files.

Overview

    The basic process for creating a PNG file is fairly simple.

1. Write the PNG signature.
2. Write the PNG IHDR chunk.
3. Create a PLTE chunk if the image requires a palette.
4. Compress the image data into a series of IDAT blocks.
5. Write an IEND chunk.

An encoder can be designed so that it adds optional PNG chunks if needed. The optional chunks can be either predefined public chunks or application specific. However, in most situations the steps listed above are all that is needed.

With the exception of creating the IDAT blocks, all of the steps listed above are trivial. This chapter will deal almost exclusively with storing data in the IDAT chain. For information on the other chunks refer to Chapter 13.
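Each chunk written in the steps above ends with a CRC-32 computed over its type and data bytes. A sketch of that checksum, using the standard PNG polynomial in slow bitwise form (a table-driven version is the usual optimization):

```cpp
#include <cassert>
#include <cstddef>

// CRC-32 as used by PNG chunks: reflected polynomial 0xEDB88320,
// initial value all ones, final complement.
unsigned long Crc32(const unsigned char *data, size_t length)
{
    unsigned long crc = 0xFFFFFFFFUL;
    for (size_t ii = 0; ii < length; ++ ii)
    {
        crc ^= data[ii];
        for (int bit = 0; bit < 8; ++ bit)
        {
            if (crc & 1)
                crc = (crc >> 1) ^ 0xEDB88320UL;
            else
                crc >>= 1;
        }
    }
    return crc ^ 0xFFFFFFFFUL;
}
```

The CRC of the 4 type bytes of an empty IEND chunk is the familiar AE 42 60 82 sequence that ends every PNG file.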


    Chapter 15


    Deflate Compression Process

The previous chapter covered the format of the Deflate compressed data within a chain of IDAT blocks. While a compressor clearly uses the same structures for the data that a decompressor does, compression is not simply a reversal of decompression.

The Deflate specification gives an outline of a compression process. It recommends that this process be followed because of the patent minefield that surrounds any LZ compression process.

To implement Deflate compression we need to maintain a 32K or smaller power-of-2 window into the most recently processed uncompressed data bytes, just like the one used with decompression. The compression process requires an additional lookahead window into the data yet to be compressed. Starting from the beginning of the lookahead buffer we try to find the longest substring that has a match in the LZ77 sliding window. Since the longest match allowed by Deflate is 258 bytes, the lookahead window needs to be at least this long to get the longest possible matches. Rounding the lookahead window up to the next power of 2 (512) makes wrapping in the window simpler.

Algorithm 15.1 illustrates the general compression process for PNG image data. This is roughly the inverse of the DecodeBlock function shown in the previous chapter. The length and distance values are converted to codes and literal bits using the code also shown in the previous chapter.

There are two significant omissions in Algorithm 15.1. In a PNG file the Huffman tables precede the image data, so the encoder needs to generate them first. The other missing piece is the method the encoder uses to locate matching strings in the LZ77 window.

Finding Matching Strings in the LZ77 Window

Finding the best match for the start of the lookahead buffer is the most time-consuming part of compressing PNG files. A simple linear search would require 32K searches per string being compressed, which could easily amount to billions of search operations to compress an image file. Instead of brute force, the approach recommended by the Deflate specification is to use a hash table where hash values are calculated using 3-byte sequences.

A hash table is a structure used to store objects that are accessed using a key, when the number of possible key values greatly exceeds the number of table entries at any given time. Hash tables are most commonly used with string keys. Many compiler implementations use hash tables to store variables defined by a module. A typical source module for a compiler may have a few hundred variable names out of the billions upon billions of possibilities. During PNG compression we have 32,768 entries with a maximum of 16 million possible values.

    234


Algorithm 15.1  Deflate Compression Process

While MOREIMAGEDATA Do
Begin
   FindLongestMatchInLZ77Window (LENGTH, DISTANCE)
   If LENGTH >= 3 Then
      Begin
      ConvertLengthToCode (LENGTH, CODE, EXTRABITS, BITCOUNT)
      HuffmanEncodeLength (CODE)
      OutputLiteralBits (EXTRABITS, BITCOUNT)
      ConvertDistanceToCode (DISTANCE, CODE, EXTRABITS, BITCOUNT)
      HuffmanEncodeDistance (CODE)
      OutputLiteralBits (EXTRABITS, BITCOUNT)
      CopyFromLookaheadBuffer (LENGTH)
      End
   Else
      Begin
      HuffmanEncodeLength (FirstLookahead ())
      CopyFromLookaheadBuffer (1)
      End
End

Entries in a hash table are referenced using a hash value. Figure 15.1 illustrates the structure of a hash table. The hash value is generated from a key by using a hash function. A hash function takes a key as input and returns an integer value within some fixed range. The hash value is used as an index into an array that contains a list of objects with the same hash value. A good hash function should distribute the possible key values evenly across the range of index values.

Since we are dealing with pixel values that have an equal probability of occurring, we can use a simple hash function which returns values in the range 0..2^(3N) - 1, combining N bits from each of the 3 bytes:

const int mask = (1 << (3 * N)) - 1 ;
unsigned int Hash (unsigned char v1, unsigned char v2, unsigned char v3)
{
   return ((v1 << (2 * N)) | (v2 << N) | v3) & mask ;
}


Figure 15.1  Hash Table Structure


We need to know when matching sequences appear within the LZ77 window to be able to replace strings with length/distance codes. Identical sequences will have identical hash values. Imagine an image with a solid background. It is entirely possible that the entire LZ77 window will contain entries with the same hash value.

To resolve collisions we chain entries with identical hash values. For storing 3-byte sequences we can define the hash table as something like this.

Structure HashEntry
Begin
   INDEX : Unsigned Integer
   NEXT : Pointer To HashEntry
End

Global HashTable (0..(1 LeftShift (3 * N)) - 1) Of Pointer To HashEntry

The hash function returns an index to the first entry in the hash table. The other entries with the same hash value are located by following the pointer to the next entry. Algorithm 15.2 illustrates the basic procedure for finding the best match within the LZ77 window.

A compressor can use additional criteria for determining the best match. For example, it may take the distance into consideration as well as the code length. As the distance value becomes larger, so does the number of additional bits required to encode it. If the distance value for a 3-byte match is large enough to require 13 additional bits, it is most likely that the compressor can encode the string with fewer bits using three literal values rather than a length and distance code.

Think about what happens when the procedure just described is used with an image containing relatively few colors. The hash chains could become quite large, which would make searching them end to end very slow. A good solution to this problem is to put a limit on the number of entries in a hash chain that the compressor will search for the best match. Limiting the number of entries searched in each hash chain does not have a significant negative impact on compression, but it can produce a major reduction in compression time. The search limit can be made a configurable parameter so that the user can trade off compression time against the amount of compression.
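A minimal sketch of a limited chain search, assuming illustrative names (the comparison against the lookahead buffer is elided as a comment):

```cpp
#include <cassert>

struct HashEntry
{
    int index;          // position of the 3-byte sequence in the window
    HashEntry *next;    // next entry with the same hash value
};

// Insert at the head of the chain so the most recently processed
// (smallest-distance) strings are found first.
void AddToChain(HashEntry *&head, HashEntry *entry, int index)
{
    entry->index = index;
    entry->next = head;
    head = entry;
}

// Walk at most searchLimit entries; returns how many were examined.
int CountSearched(HashEntry *head, int searchLimit)
{
    int searched = 0;
    while (head != 0 && searched < searchLimit)
    {
        // ... compare window data at head->index with the lookahead ...
        ++ searched;
        head = head->next;
    }
    return searched;
}
```

With a limit of 3, a five-entry chain costs three comparisons instead of five; a larger limit trades time for compression.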

Procedure BestMatch (BESTLENGTH, BESTOFFSET)
Begin
   BESTLENGTH = 0
   BESTOFFSET = 0

   HASHVALUE = Hash (LOOKAHEAD (0), LOOKAHEAD (1), LOOKAHEAD (2))
   HASHENTRY = HashTable (HASHVALUE)
   If HASHENTRY = NULL Then
      Return // No possible match

   While HASHENTRY <> NULL Do
      Begin
      II = 0
      While LZWINDOW (HASHENTRY.INDEX + II) = LOOKAHEAD (II) Do
         II = II + 1
      If II > BESTLENGTH Then
         Begin
         BESTLENGTH = II
         BESTOFFSET = HASHENTRY.INDEX
         End
      HASHENTRY = HASHENTRY.NEXT
      End
End


Algorithm 15.2  Matching Entries in the Hash Table


Each time we move a character from the lookahead buffer to the LZ77 window, we need to create a hash table entry that references the character's position when added to the buffer. The hash value for the new hash entry needs to be calculated using values in the lookahead buffer because the hash function requires the value of the next two characters, which will not yet have been added to the LZ77 window.

A PNG encoder should maintain a fixed pool of hash table entries rather than constantly allocating and freeing them. Since there are 2^15 characters in the LZ77 window, that is the size of the hash entry pool as well.

If the compressor adds a hash entry to a hash chain that is not empty, it should be at the start of the chain rather than the end. This causes the most recently processed string to be matched first when searching the LZ77 window for strings. Strings with smaller distance values can be encoded using fewer bits.

    Huffman Table Generation

A PNG encoder can either use the fixed Huffman codes shown in Table 14.8 or generate Huffman codes based on usage frequencies. It is simpler to implement fixed Huffman codes but there is obviously a penalty when it comes to compression. Unless you are working with an application where compression speed is critical, there is really no reason to use fixed Huffman codes.

Chapter 6 covered Huffman coding as it applies to JPEG. The same process, with a few modifications, will work for a PNG Huffman encoder. The differences between Huffman table generation in JPEG and PNG were listed in the previous chapter.

When we used Huffman coding in JPEG, we generated the Huffman table by making two nearly identical passes over the image data. The first pass gathered usage frequencies. After generating the Huffman tables from the usage frequencies, the second pass repeated the steps of the first pass except that the data was Huffman encoded.

Such a scheme can be used to encode PNG image data but there are a couple of significant drawbacks. The main problem with having two nearly identical passes is the time required to compress an image. The process of searching the LZ77 window for matching strings is significantly more processing intensive than is JPEG entropy encoding. Performing the PNG compression process twice lengthens the compression time noticeably.

A good solution to this problem is to store the literal/length and distance codes in a buffer. A simple method for implementing such a buffer would be to use an array of 2-byte integers. A length/distance code would be stored in the buffer using 2 bytes while a literal value would use only 1 byte. The first pass through the data gathers usage statistics and writes to this buffer. After generating the Huffman tables, the second pass simply encodes the values stored in the buffer. Algorithm 15.3 illustrates how the first pass would be implemented.

How large does a buffer need to be to encode an entire image? The answer is that we do not need to hold the entire image in the buffer. The Deflate process allows the compressed data to be stored in multiple compressed blocks. The compressor can allocate a buffer at the start of image compression. When the buffer is full, the compressor ends the first pass. After the data is encoded in the second pass, the encoder starts a new Deflate block and resumes the first pass where it left off. In other words, instead of having two passes that process the entire image, we have multiple alternating passes.

Naturally the size of the buffer affects the size of the resulting image file. The smaller the buffer, the greater the number of compressed blocks, which results in more overhead from additional Huffman tables in the compressed stream. However, making the buffer too large can actually make the compressed image larger. When too much image data is written to a single compressed block, so many Huffman codes get defined that the overhead from the Huffman code lengths becomes greater than the overhead from additional Huffman tables. The optimal buffer size varies from image to image. A compressor could conceivably determine when it should create a new block from the Huffman code usage. This, in conjunction with a large buffer, would produce the best compression.

Once the Huffman codes have been generated for the length/literal and distance tables, the tables have to be written to the compressed output stream. The

Algorithm 15.3  Gathering Huffman Usage Statistics

Procedure GatherData
Begin
   While MOREIMAGEDATA And COUNT + 1 < BUFFERSIZE Do
      If LENGTH >= 3 Then
         Begin
         IncrementLengthFrequency (ConvertLengthToCode (LENGTH))
         IncrementDistanceFrequency (ConvertDistanceToCode (DISTANCE))
         BUFFER (COUNT) = LENGTH + 256
         COUNT = COUNT + 1
         BUFFER (COUNT) = DISTANCE
         COUNT = COUNT + 1
         End
      Else
         Begin
         BUFFER (COUNT) = CopyFromLookaheadBuffer (1)
         COUNT = COUNT + 1
         End
End


Huffman tables stored are Huffman encoded using the codes shown in the previous chapter. The easiest method for generating the Huffman codes for encoding the code lengths is a function that takes pointers to functions as parameters, like the ones we used for JPEG. This function is called twice each time lengths are written to the output file: the first time it is called with a function for gathering Huffman statistics; the second time it is called with a function that outputs the Huffman-encoded lengths.

Algorithm 15.4 illustrates the process for compressing Deflate blocks using a buffer.

One oddity of PNG compression is that Huffman codes within the compressed data are stored with the bits in reverse order. The Huffman decoding

Algorithm 15.4  Deflate Compression

Procedure OutputDataBlock
Begin
   II = 0
   While II < COUNT Do
      Begin
      If BUFFER (II) > 255 Then
         Begin
         ConvertLength (BUFFER (II), CODE, EXTRABITS, BITCOUNT)
         HuffmanEncodeUsingLengthTable (CODE)
         OutputLiteralBits (EXTRABITS, BITCOUNT)
         II = II + 1
         ConvertDistance (BUFFER (II), CODE, EXTRABITS, BITCOUNT)
         HuffmanEncodeUsingDistanceTable (CODE)
         OutputLiteralBits (EXTRABITS, BITCOUNT)
         II = II + 1
         End
      Else
         Begin
         HuffmanEncodeUsingLengthTable (BUFFER (II))
         II = II + 1
         End
      End
   HuffmanEncodeUsingLengthTable (ENDCODE)
End

Procedure CompressImage
Begin
   While MOREIMAGEDATA Do
      Begin
      GatherData
      GenerateHuffmanTables
      WriteDeflateBlockHeader
      OutputDataBlock
      End
End

process reads the most significant bit of the code first and then adds the least significant bits. However, within PNG compressed data, bits are read from the least significant bit to the most significant. If the Huffman codes were stored in the normal order, the decompressor would read their least significant bits first, which would defeat the entire Huffman coding process. This example illustrates the reversal of bits in a Huffman code.

unsigned short ReverseHuffmanCode (unsigned short input)
{
   unsigned short value = 0 ;
   for (unsigned int ii = 0 ; ii < 16 ; ++ ii)
   {
      if ((input & (1 << ii)) != 0)
         value |= (1 << (15 - ii)) ;
   }
   return value ;
}

Filtering

Each data row in a PNG file is preceded by a byte that specifies the filter method applied to the row data before compression. The possible values for this byte and the corresponding filter method were given in the previous chapter. The filter functions are similar to their inverses shown previously.


Why Use Filtering?

The purpose of applying row filters is to make the image data more compressible. Suppose you have an image containing a gradient background where the color of each pixel varies by a fixed amount from the pixel next to it. This is familiar to Windows users from its use in software installation screens. For a blue to black gradient, the color data of a typical row would look something like

    0 0 1 0 0 2 0 0 3 0 0 4 0 0 5 0 0 6 0 0 7 0 0 8 …

While the data has a very orderly pattern, there are no repeating strings of 3 bytes or greater, thus reducing its compressibility through the Deflate process. If this same data row were to be run through the sub filter defined above, the data would become

    0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 …
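The sub filter transformation just shown can be sketched directly (interval 3 for 8-bit RGB; bytes before the start of the row are treated as zero; the function name is mine):

```cpp
#include <cassert>

// filtered(x) = raw(x) - raw(x - interval), modulo 256.
void SubFilterRow(const unsigned char *raw, unsigned char *filtered,
                  int length, int interval)
{
    for (int x = 0; x < length; ++ x)
    {
        int left = (x >= interval) ? raw[x - interval] : 0;
        filtered[x] = (unsigned char)(raw[x] - left);
    }
}
```

Running it over the gradient row turns the incrementing blue bytes into a constant `0 0 1` pattern that Deflate can replace with length/distance codes.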

which compresses much better in a PNG file. It turns out that filtering is generally the best method for improving the compression of an image file. Using filters can generally reduce the image size by about 30-40%.

On the other hand, filtering can increase the size of a PNG file. For images that use a color palette, no filtering should be used. Filtering takes advantage of similar adjacent color values, but palette indices have no relation to the actual color value. Since filtering operates on bytes rather than bits, using filters with images with bit depths of fewer than 8 does not produce the same sort of results as it does with larger bit depths. Consequently, filtering would be effective with these images only in the rarest of cases.

What Filter to Use?

Filters are applied on a row-by-row basis and can be changed from row to row. This raises the question of which filter is best for a given row, the answer to which is that we really do not know. This remains an area for experimentation and discovery.

The PNG specification suggests performing all filters on each row. Each filtered value is treated as a signed byte (-128..127) and all are then summed together. The filter that produces the smallest sum is selected. Another possibility is to find the filter that produces the longest repetitions of the same values.
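One common reading of this heuristic, sketched below, sums the absolute values of the signed bytes and keeps the filter with the smallest total. SelectFilter is my own name; it takes the row after each of the five candidate filters has already been applied.

```cpp
#include <cassert>
#include <cstdlib>

// candidates[f] is the row after filter f (0-4) has been applied.
// Returns the filter code whose bytes, treated as signed values,
// have the smallest sum of absolute values.
int SelectFilter(const unsigned char *candidates[5], int rowLength)
{
    int best = 0;
    long bestSum = -1;
    for (int filter = 0; filter < 5; ++ filter)
    {
        long sum = 0;
        for (int x = 0; x < rowLength; ++ x)
            sum += labs((signed char) candidates[filter][x]);
        if (bestSum < 0 || sum < bestSum)
        {
            bestSum = sum;
            best = filter;
        }
    }
    return best;
}
```

A row of bytes near zero (or near 256, which reads as small negative values) wins over a row of mid-range values, matching the intuition that well-filtered rows are mostly small differences.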

The simplest method for an encoder to automatically select a filter is to not use filtering for images that use a palette or for images with a bit depth of fewer than 8 bits. For other image types, the sub filter should be used for the first row and the Paeth filter for the remaining rows. In most situations, this method does not produce results that are significantly worse than either of the methods described earlier.



    Conclusion

In this chapter we have covered the process for creating PNG files, which is essentially the reverse of the one used to read them. As in JPEG, the implementer of a compressor has to make arbitrary choices about how to do the compression process, such as how big to make the IDAT chunks, when to create a new Deflate block, how far to search in the hash chains, and which filters to use. Methods for selecting the best filter are still an area of exploration. The PNG format should become more common in the near future.

The source code for this chapter on the accompanying CD-ROM is a PNG encoding class, PngEncoder, which uses a PNG Huffman encoding class that is nearly identical to the equivalent JPEG class shown in Chapter 6. The only significant differences are in the BuildTable function. The PNG version does not have a special value to ensure that no Huffman code consists of all 1-bits, and it ensures that the ordering of Huffman codes matches the ordering of the values.

The encoder class's SetUseFilters function specifies whether or not filters are used in the compression process. The SetCompressionLevel function controls the maximum depth to which the hash chains are searched and the SetBlocksize function controls the size of the compression buffer.

There is also a sample encoder that converts Windows BMP files to PNG format. The command sequence for this application is

    ENCODER (-f -F -M) input.bmp output.png

-f Use Filters
-F Use Fastest Compression
-M Use Maximum Compression

This brings to an end our discussion of the PNG format and with it an end to the book. We hope that you have learned how to read and write images using the most common formats.


  • Glossary

AC Coefficient In JPEG, all of the DCT coefficients except for the single lowest-order coefficient. These coefficients represent the addition of cosine functions of increasing frequency.

Additive Color Model A color model where component values add color to black. Higher component values produce colors closer to white.

Alpha Channel A pixel value (in addition to the color components) that represents the transparency of the pixel.

Baseline JPEG A subset mode of sequential JPEG where the number of tables is restricted and the sample precision must be 8 bits.

Big-Endian The representation of integers with the individual bytes ordered from most to least significant. See Little-Endian.

Bitmap Format An image format where the data consists of a set of values that represents the color at discrete points or pixels.

Coefficient See DCT Coefficient.

Chrominance A component in a color model that represents the color of a pixel as opposed to its intensity. In the YCbCr color model Cb and Cr are chrominance components.

Chunk The basic division of a PNG file.

Color Model A method for specifying how colors are represented. Most color models represent colors using three dimensions or components. RGB, YCbCr, and CMYK are examples of color models.

Color Quantization The process of reducing the number of colors in an image. Usually quantization is used to allow an image to be displayed on a device with a limited number of colors or to store the image in a file format that does not support as many colors as the original image.


Colorspace The set of all colors that can be represented using a particular color model.

Component One of a set of values used to represent a pixel in a particular color model. Most color models represent a color value using three component values.

CRC Cyclical Redundancy Check. A polynomial-based method for detecting corrupted data.
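As an illustration of this entry, here is a bit-at-a-time version of the CRC-32 used by PNG. The book's Crc/CrcByte routines are table-driven for speed; this sketch computes the same value more slowly.

```cpp
#include <cstddef>
#include <cstdint>

// CRC-32 as used by PNG: reflected polynomial 0xEDB88320, with the
// register initialized to all ones and the final value inverted.
uint32_t Crc32(const unsigned char* data, std::size_t length)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < length; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit)
            crc = (crc >> 1) ^ ((crc & 1) ? 0xEDB88320u : 0u);
    }
    return crc ^ 0xFFFFFFFFu;
}
```

The standard check value for this CRC variant is 0xCBF43926 over the ASCII string "123456789".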

Data Unit In JPEG, an 8 × 8 block of sample values for a single component.

    DC Coefficient The lowest-order DCT coefficient. It represents a constant value.

DCT Discrete Cosine Transform. A mathematical process that converts a set of values into an equivalent representation as the sum of cosine functions.
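For the eight-point case used in JPEG, the forward transform of this entry can be written as:

```latex
F(u) = \frac{C(u)}{2} \sum_{x=0}^{7} f(x)
       \cos\frac{(2x+1)u\pi}{16},
\qquad
C(u) = \begin{cases} \dfrac{1}{\sqrt{2}} & u = 0 \\[4pt] 1 & u > 0 \end{cases}
```

The u = 0 term is the DC coefficient; the remaining terms are the AC coefficients of increasing frequency.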

Deflate The compression process used in PNG. It is a variant of the LZ77 process that incorporates Huffman coding.

Down-Sampling The process of reducing the resolution of a component in an image.

Frame In JPEG, a group of one or more scans. For the JPEG modes in common use a frame is indistinguishable from an image.

    Gamut The range of colors that can be displayed on a particular output device.

Gamma A model commonly used to correct colors in an image based upon the properties of the system and the viewing environment.

Hierarchical JPEG A little-used JPEG mode where the image is broken into a number of frames that refine the image.

Huffman Coding A compression technique that uses variable-length codes to represent data.

Inflate The decompression process used in PNG. It is the reverse of the Deflate process.
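The core of the Inflate process is resolving LZ77 length/distance pairs against output that has already been produced. A minimal sketch (the function name is illustrative, not from the book's decoder):

```cpp
#include <cstddef>
#include <vector>

// Resolve one LZ77 back reference: copy `length` bytes starting
// `distance` bytes behind the current end of the output. Copying one
// byte at a time is essential, because length may exceed distance;
// that is how a short pattern repeats itself.
void CopyBackReference(std::vector<unsigned char>& output,
                       int distance, int length)
{
    std::size_t start = output.size() - static_cast<std::size_t>(distance);
    for (int i = 0; i < length; ++i)
        output.push_back(output[start + static_cast<std::size_t>(i)]);
}
```

For example, with output "ab", the pair (distance 2, length 4) extends it to "ababab".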

Interleaved Scan In JPEG, a scan that consists of more than one component.

Interlaced Image An image that is not displayed sequentially, but rather by using a pattern of lines or pixels.

JFIF JPEG File Interchange Format. The format used for JPEG files.

JPEG-LS A new JPEG lossless compression technique.

Little-Endian A format for representing integers where the individual bytes are ordered from least to most significant.
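When a file format's byte order differs from the processor's, integers must be byte-reversed on input and output. A sketch of the idea for 16- and 32-bit values (not the book's exact SwapBytes code):

```cpp
#include <cstdint>

// Reverse the byte order of a 16-bit integer, converting between
// little-endian and big-endian representations.
uint16_t Swap16(uint16_t v)
{
    return static_cast<uint16_t>((v >> 8) | (v << 8));
}

// Reverse the byte order of a 32-bit integer.
uint32_t Swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u)
         | ((v << 8) & 0x00FF0000u) | (v << 24);
}
```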

Logical Screen In GIF, a logical display area for the images stored in the file. The individual images specify their size and location within the logical screen.

Lossy Compression A compression method that creates a close approximation of the original data. Lossy compression methods discard information that is considered less important in order to increase compression.


  • Lossless Compression A compression method that allows an exact copy of the original image to be retrieved.

Lossless JPEG A rarely used JPEG mode that implements a lossless compression technique. Lossless JPEG is now considered obsolete.

    Luminance A color component that represents brightness.

LZ A family of compression algorithms named after their creators, Abraham Lempel and Jacob Ziv.

LZW (Lempel-Ziv-Welch) The LZ variant used in GIF.

MCU (Minimum Coded Unit) In JPEG, the number of data units that are encoded as a group.

    Median Cut Algorithm Heckbert's algorithm for color quantization.

Network Order Identical to "big-endian." It refers to the fact that in Internet Protocol integers are transmitted with the most significant byte first.

    Noninterleaved Scan In JPEG, a scan that contains only one component.

Pixel A discrete location on a display device or an individual point within a bitmap image format.

Point Transform The process used to reduce the precision of data in progressive JPEG when successive approximation is used. For DC coefficients, the point transform is a bit shift. For AC coefficients, the point transform is integer division.
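The distinction in this entry matters for negative coefficients: an arithmetic right shift rounds toward negative infinity, while integer division truncates toward zero. A small illustration, where Al is the successive-approximation bit count:

```cpp
// DC point transform: arithmetic right shift by Al bits
// (on two's-complement machines this rounds toward negative infinity).
int PointTransformDC(int coefficient, int al)
{
    return coefficient >> al;
}

// AC point transform: integer division by 2^Al, which truncates
// toward zero and so treats negative coefficients symmetrically.
int PointTransformAC(int coefficient, int al)
{
    return coefficient / (1 << al);
}
```

With Al = 1, a coefficient of -3 becomes -2 under the shift but -1 under the division.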

Progressive JPEG A JPEG mode where the image is divided into multiple scans. The initial scans are a coarse representation of the image. Subsequent scans refine the image.

Quantization In JPEG, the process for reducing the number of DCT coefficients used to represent a data unit. See also Color Quantization.

Raster Format Identical to "bitmap format."

RGB Colorspace A colorspace where the components represent the relative amounts of red, green, and blue light to be added to black.

RLE (Run Length Encoding) A compression method where consecutive runs of the same value are encoded using run-length/value pairs.
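A minimal sketch of this idea; the RLE8 scheme in Windows BMP layers escape codes on top of it, and the function name here is illustrative.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Encode consecutive identical bytes as (count, value) pairs.
std::vector<std::pair<int, unsigned char>> RunLengthEncode(
    const std::vector<unsigned char>& data)
{
    std::vector<std::pair<int, unsigned char>> runs;
    std::size_t i = 0;
    while (i < data.size()) {
        std::size_t j = i;
        while (j < data.size() && data[j] == data[i])
            ++j;                       // extend the current run
        runs.push_back({static_cast<int>(j - i), data[i]});
        i = j;
    }
    return runs;
}
```

The input 5 5 5 2 encodes as the pairs (3, 5) and (1, 2).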

Sampling Frequency In JPEG, the relative frequency at which a component is sampled with respect to the other components in the image.

    Sample Precision The number of bits used to represent a component value.

Scan In JPEG, a set of compressed data that represents a single pass through the image for one or more components.

Sequential JPEG A JPEG mode where the image is stored from top to bottom, left to right.


Spectral Selection In Progressive JPEG, the process of dividing components into a range of spectral bands or DCT coefficients, used by all progressive JPEG images. Spectral selection can optionally be used in conjunction with successive approximation.

SPIFF (Still Picture Interchange File Format) The official JPEG file format. It is intended to replace JFIF.

Subtractive Color Model A color model where components subtract color from white. Higher component values create colors closer to black.

Successive Approximation In Progressive JPEG, the process of dividing components into multiple scans by reducing the precision of the data in the initial scan and using subsequent scans to refine the data. Successive approximation is not required in progressive JPEG.

Truecolor Any system where 2^24 or more colors can be represented simultaneously. The name reflects the fact that this is approximately the limit of colors humans can distinguish.

    Up-Sampling The process of increasing the resolution of a color component.

Vector Format A graphics format where images consist of a sequence of drawing commands.

XYZ Colorspace The three-component color model defined by the Commission Internationale de l'Eclairage (CIE) in 1931. It defines absolute, rather than relative, colors.

YCbCr Colorspace The color model used in JPEG. YCbCr uses three components that represent luminance (Y) and chrominance (Cb and Cr).


  • Bibliography

    Anton, Howard, Elementary Linear Algebra, John Wiley & Sons, New York, NY,1981.

Blinn, Jim, Jim Blinn's Corner: Dirty Pixels, Morgan Kaufmann, San Francisco, CA, 1998.

Boutell, Thomas et al., "PNG Specification," Version 1.0, PNG Development Group, October 1996.

    Brown, C. Wayne and Shepherd, Barry J., Graphics File Formats, Manning,Greenwich, CT, 1995.

    Burden, Richard L., Faires, J. Douglas, Reynolds, Albert C., Numerical Analysis,Prindle, Weber & Schmidt, Boston, MA, 1981.

Campbell, Joe, C Programmer's Guide to Serial Communications, Howard W. Sams & Company, Carmel, IN, 1987.

Deutsch, L. Peter and Gailly, Jean-Loup, "ZLIB Compressed Data Format Specification," Version 3.3, RFC 1950, 1996.

    Deutsch, L. Peter, "DEFLATE Compressed Data Format Specification," Version1.3, RFC 1951, 1996.

Foley, James D., van Dam, Andries, Feiner, Steven K., and Hughes, John F., Computer Graphics: Principles and Practice, Addison-Wesley, Reading, MA, 1996.

CompuServe, Inc., "Graphics Interchange Format (GIF) Specification," CompuServe, Columbus, OH, 1987.

CompuServe, Inc., "Graphics Interchange Format (GIF) Specification," Version 89a, CompuServe, Columbus, OH, 1989.


    Graham, Ian S., The HTML Source Book, John Wiley & Sons, New York, NY,1997.

Hamilton, Eric, "JPEG File Interchange Format," Version 1.02, C-Cube Microsystems, September 1, 1992.

Heckbert, Paul, "Color Image Quantization for Frame Buffer Display," ACM Computer Graphics Journal, Volume 16, Number 3, July 1982.

Huffman, D. A., "A Method for the Construction of Minimum Redundancy Codes," Proceedings of the IRE, Volume 40, Number 9, September 1952, pages 1098-1101.

JPEG (1994), Digital Compression and Coding of Continuous-tone Still Images, Part I: Requirements and Guidelines, ISO/IEC IS 10918-1, American National Standards Institute, New York, NY, 1994.

    Lindley, Craig A., Photographic Imaging Techniques in C++, John Wiley & Sons,New York, NY, 1995.

Microsoft Corporation, Win32 Programmer's Reference Volume 5: Messages, Structures, and Macros, Microsoft Press, Redmond, WA, 1993.

    Murray, James D. and vanRyper, William, Encyclopedia of Graphics File Formats,O'Reilly & Associates, Sebastopol, CA, 1994.

Nelson, Mark, The Data Compression Book, M&T Books, San Mateo, CA, 1992.

Nye, Adrian, Xlib Programming Manual, O'Reilly & Associates, Sebastopol, CA, 1988.

Pennebaker, William B. and Mitchell, Joan L., JPEG Still Image Data Compression Standard, Van Nostrand Reinhold, New York, NY, 1993.

Porter, Thomas and Duff, Tom, "Compositing Digital Images," Computer Graphics, Volume 18, Number 3, July 1984.

Ramabadran, Tenkasi V. and Gaitonde, Sunil S., "A Tutorial on CRC Computations," IEEE Micro, August 1988, pages 62-75.

Rao, K. R. and Yip, P., Discrete Cosine Transform, Academic Press, New York, NY, 1990.

Rimmer, Steve, Windows Bitmapped Graphics, Windcrest Books, Blue Ridge Summit, PA, 1993.

Scheifler, Robert W. and Gettys, James, X Window System, Digital Press, Bedford, MA, 1990.

Stroustrup, Bjarne, The C++ Programming Language, Third Edition, Addison-Wesley, Reading, MA, 1997.

Swan, Tom, Inside Windows File Formats, SAMS, Indianapolis, IN, 1993.

Welch, Terry, "A Technique for High-Performance Data Compression," IEEE Computer, Volume 17, Number 6, June 1984, pages 8-19.


  • Ziv, J. and Lempel, A., "A Universal Algorithm for Sequential Data Compression,"IEEE Transactions on Information Theory, Volume 23, Number 3, May 1977,pages 337-343.

    Ziv, J. and Lempel, A., "Compression of Individual Sequences via Variable-RateCoding," IEEE Transactions on Information Theory, Volume 24, Number 5,September 1978, pages 530-536.

Internet Sites

Since Web sites have a tendency to move or disappear, rather than creating an exhaustive list, we are only listing those sites that we consider the most useful.

JPEG
www.jpeg.org - Home of the JPEG committee
www.ijg.org - Independent JPEG Group

PNG
www.cdrom.com/pub/png/ - The PNG Home Page
www.cdrom.com/pub/infozip/zlib/ - ZLIB Home Page
www.cdrom.com/pub/mng - Multiple Image Network Graphics

GIF
www.geocities.co.jp/SiliconValley/3453/gif_info/index_en.html - GIF Info Page
members.aol.com/royalef - GIF Animation on the WWW

General
www.wotsit.org - Wotsit's Format Page
www.dcs.ed.ac.uk/~mxr/gfx/ - The Graphics File Formats Page


  • Index

    AC class, 50AC coefficients

    for DCT values, 81-83in progressive JPEG

    in decoding, 155-160in spectral selection, 149-150

    in sequential-mode JPEGin decoding, 95-98in encoding, 115, 117

    zigzag ordering for, 89AC point transforms, 150AC scans, 163-168ACFirstDataUnit procedure, 157ACRefineDataUnit procedure, 159-160Adam 7 interlacing, 202-203, 227-228Addition, efficiency of, 133Additive color models, 8Adler, Mark, 216Adler Checksum field, 222-223Alpha channels, 196-198Ancillary chunks, 206-212Animated GIF format, 186-187APP markers, 48-50, 111Application extension blocks, 177Application ID field, 178, 186Application-specific data, markers for, 49-50, 111Arithmetic in DCT optimization, 137-138

    Associativity of matrix multiplication, 86Authentication Code field, 178, 186Author field, 210

    Background colorin GIF format, 173in PNG format, 206-207

    Background Color field, 173Baseline process, 36Baseline SOF frames, 106bcBitCount field, 25-27bcHeight field, 26-27bcPlanes field, 26bcSize field, 26bcWidth field, 26-27BestMatch procedure, 237bfOffbits field, 24, 27bfReserved1 field, 24bfReserved2 field, 24bfSize field, 24biBitCount field, 25biClrImportant field, 25biClrUsed field, 25biCompression field, 25-28Big-endian ordering, 14-15biHeight field, 25, 27biPlanes field, 25


    biSize field, 25biSizeImage field, 25Bit Depth conversion, 231Bit Depth field, 204Bit Field field, 175Bit Fields field, 173, 177Bit ordering, 13-15bitimage.h file, 19Bitmap images, 3-4, 23

    color models for, 10compression in, 11, 28-29data ordering in, 23file structure for, 24-28

    BITMAPCOREHEADER structure, 25BITMAPFILEHEADER structure, 24BitmapImage class, 19-21BITMAPINFOHEADER structure, 24-25Bits per Pixel field, 173biWidth field, 25, 27biXPelsPerMeter field, 25biYPelsPerMeter field, 25bKGD chunks, 206-207Black body curve, 199Block Data field, 224Block Size field, 176-178, 186Block types in GIF format, 174Blue field

    in GIF format, 174in PNG format, 205

Blue X field, 207
Blue Y field, 207
BlueOffset value, 20
BmpDecoder program, 29-30
BmpEncoder program, 29-30
Boutell, Thomas, 190
BuildScaledTables function, 148
BuildTable function, 243
BYTE datatypes, 19
Byte ordering, 13-15

    in GIF format, 172in JPEG format, 41

    in PNG format, 190in Windows BMP format, 23in XBM format, 32

    C source code for XBM format, 31-32Cb component

    in JPEG sampling frequency, 41, 43-44quantization table for, 88

    Character Cell Height field, 176Character Cell Width field, 176Check Bits field, 222-223CHECK_RANGE preprocessor symbol, 20-21cHRM chunks, 207Chrominance in YCbCr color model, 6Chunks in PNG format, 190-192, 194-195

    critical, 203-206noncritical, 206-212

    CIE 1931 standard, 197Classes for common image formats, 19-21Clear code, 183CMYK (cyan, magenta, yellow, black) color

    model, 8-9Code lengths, generating, 65-73Code sizes in GIF format compression, 182-183Coding, Huffman. See Huffman codingCollisions in hash tables, 235-237Color

    color models for, 5-10in GIF format, 171-174in JPEG format, 110in PNG format

    correction of, 231device-independent, 197-200Gamma value in, 201-202representation of, 195-197

    in Windows BMP format, 25-26Color maps in BitmapImage, 20Color quantization, 16-18Color Table Sort Flag field, 173Color tables in GIF format, 172-174Color Type field, 204


    ColorMap function, 20Colorspace, 5, 55ColorUsage structure, 21Columns in matrices, 85-86COM (comment) markers, 48, 50Comment extension blocks, 177Comment field, 210Comments

    in GIF format, 177JPEG markers for, 48-50in PNG format, 210

    Common image formats, 18-21Commutativity of matrix multiplication, 86Compatibility, matrix, 86Component division in JPEG format, 149-151Compress procedure, 180Compressed Blocks field, 223Compressed Text field, 212CompressImage procedure, 240Compression and compressed data, 10-12

    in GIF format, 178-179code sizes in, 182-183compression methods, 179-181decompression methods, 181-182dictionary structure in, 183-185

    in JPEG format, 36-39, 49, 105-111lossless vs. lossy, 12-13in PNG format, 222-227in Windows BMP format, 28-29

    Compression Level field, 222-223Compression Method field, 204, 212, 223Converting

    byte order, 14-15for common image formats, 18-19

    CopyData function, 218-219Copyright field, 210Cosine function

    in matrix factoring, 122-126in quantization, 139

    CountBits function, 163CountsToLengths procedure, 71

    Cr componentin JPEG sampling frequency, 41, 43-44quantization table for, 88

    CRC (Cyclic Redundancy Check), 192-194CRC field, 191Crc function, 193CrcByte function, 193Creation Time field, 210Critical chunks, 191, 203-206Cyan, magenta, yellow, black (CMYK) color

    model, 8-9Cyclic Redundancy Check (CRC), 192-194

    DAC markers, 48Data blocks, 175Data field, 191data_units array, 101Data units in JPEG format

    progressivedecoding, 154-160encoding, 162-165

    sequential-modedecoding, 94-96encoding, 115-117

    Datatype definitions, 19datatype.h file, 19Day field, 211DC class, 50DC coefficients

    for DCT values, 81-83in progressive JPEG

    in decoding, 154in spectral selection, 149-150

    in sequential-mode JPEGin decoding, 94-95, 97-98in encoding, 115-116

    DC point transforms, 150DC scans, 162-163DCT (discrete cosine transform), 11

    in JPEG format, 44, 77, 98matrix operations in, 85-87


    DCT (discrete cosine transform) (continued)in one dimension, 78-84quantization in, 88-89in two dimensions, 84-85, 87zigzag ordering in, 89

    optimizing, 121matrix factoring in, 121-137quantization in, 138-148scaled integer arithmetic in, 137-138

    DCT function, 78DecodeBlock procedure, 222DecodeDC function, 95DECODER application, 103DecodeSequential function, 101Decoding

    Huffman coding, 73-75progressive JPEG

    AC scans in, 155-160DC scans in, 154

    sequential-mode JPEG, 91data units, 94-96DCT coefficient processing in, 98examples, 97-98, 100-103MCU dimensions in, 91-93overview, 100restart marker processing in, 99-100up sampling in, 99

    DecompressingGIF format, 181-182PNG format

    compressed data blocks in, 223-227compressed data format in, 222-223filtering in, 228-230Huffman coding in, 221-222, 224-227image data, 215-221writing decompressed data to images in,

    227-231Define Huffman Table (DHT) markers, 48, 50-51,

    111Define Quantization Table (DQT) markers, 48,

    51-52, 111

    Define Restart Interval (DRI) markers, 48, 51Deflate compression process, 216-222, 234-238Delay Time field, 177Description field, 210Device-independent color, 197-200DHP markers, 48DHT (Define Huffman Table) markers, 48, 50-51,

    111Dictionary-based compression schemes, 178-179Dictionary structure in GIF format, 183-185Disclaimer field, 210Discrete Cosine Transform. See DCT (discrete

    cosine transform)Disposal Method field, 177Distances field, 224DivideBlueInHalf function, 17DNL markers, 48-49Dot product operations, 85Down sampling, 43-44, 112-113DQT (Define Quantization Table) markers, 48,

    51-52, 111DRI (Define Restart Interval) markers, 48, 51Dynamic Huffman codes, 224-226

    EightBitQuantization function, 21EncodeACFirst procedure, 164-165EncodeACRefine procedure, 164-165, 167-168EncodeDataUnit procedure, 116, 119EncodeDCFirst procedure, 163EncodeDCRefine procedure, 164EncodeInterleaved procedure, 114EncodeNonInterleaved procedure, 113EncodeSequential function, 120Encoding in JPEG format

    progressive, 162-165sequential-mode, 111, 115-117

    End code in GIF format compression, 183End-Of-Band (EOB) runs, 155-156End of Image (EOI) markers, 47-48, 52Escape codes in RLE8 compression, 28-29EXP markers, 48


    Expand procedure, 182, 185Extend function, 95, 97Extended SOF frames, 106Extension blocks, 175-178extension_code field, 55-56Extension code in GIF format, 174extension_data field, 55-57Extension header format, 57Extension Type field, 186

    Factored DCT (FDCT), 139-148File headers in Windows BMP format, 24File processing in JPEG format, 151-152File structure

    in GIF format, 172-178in JPEG format, 47-54, 111in PNG format, 190-195in Windows BMP format, 24-28in XBM format, 31-33

    Filter Method field, 204Filtering in PNG format, 228-230, 241-242Final field, 224FindColorUsage function, 21FindLongestMatchInLZ77Window function, 235First scans in progressive JPEG

    AC, 155-156, 163-164DC, 154, 162

    FirstDCDataunit procedure, 154Fixed Huffman codes, 226-227Frame buffers, 2Frame markers in JPEG format, 53Frames in JPEG hierarchical compression mode,

    37

Gailly, Jean-Loup, 216
gAMA chunks, 201-202, 207-208
Games, 3
Gamma Value field, 208
Gamma value in PNG format, 201-202, 208
GatherC procedure, 118
GatherData procedure, 239

    GatherDC procedure, 118GatherFrequencies procedure, 16Gaussian elimination, 126GenerateHuffmanCodes procedure, 70GetRGB function, 21GIF format, 171

    animated, 186-187byte ordering in, 172color models for, 10compressed data format in, 178-179

    code sizes in, 182-183compression methods, 11, 179-181decompression methods, 181-182dictionary structure in, 183-185

    file structure in, 172-178interlacing in, 178legal problems in, 187-188uncompressed, 188

    Global Color Table Flag field, 173Global Color Table Size field, 173-174Global color tables in GIF format, 172-174Global screen description in GIF format, 172-173Graphic control extension blocks, 176-177Grayscale

    in JPEG format, 110in PNG format, 196-197

    Grayscale devices, 6GrayscaleConvert function, 101Green field

    in GIF format, 174in PNG format, 205

    Green X field, 207Green Y field, 207GreenOffset value, 20

    Hamilton, Eric, 40Hash tables, 234-238Header field, 223Headers

    in GIF format, 172in JFIF format, 56


    Headers (continued)in PNG format, 223

    Height field, 204Hierarchical JPEG compression mode, 37, 39hIST chunks, 208Horizontal sampling frequency in JPEG format,

    43Hot spots in XBM format, 32Hour field, 211HSB (Hue-Saturation-Brightness) color model, 6HUFFCOMP program, 75Huffman coding, 11

    in JPEG format, 44code lengths in, 65-73decoding, 73-75example, 63-65progressive, 162restrictions in, 71-73sequential-mode, 36usage frequencies in, 61-63

    in PNG format, 221-222, 224-227Huffman tables

    in JPEG formatmarkers for, 50-51progressive, 153-154sequential-mode, 106, 117-119

    in PNG format, 238-241HuffmanDecode function, 74

    IDAT chunks, 191, 195, 205-206IDCT (Inverse Discrete Cosine Transform), 77-78Identifier field, 56Identity matrices, 87IEND chunks, 191, 195, 206IHDR chunks, 191, 195, 203-204Image blocks in GIF format, 174-175Image data in PNG format, decompressing,

    215-221Image headers in Windows BMP format, 24-25Image Height field, 175Image Width field, 175

Images, representation of, 1-2
Initialize procedure, 180
InitializeDictionary procedure, 184
Integer arithmetic in DCT optimization, 137-138
Interlace Flag field, 175
Interlace Method field, 204
Interlacing

    in GIF format, 178in PNG format, 202-203, 227-228

    InterleavedPass function, 120Interleaving in JPEG format, 45-46

    MCU dimensions in, 92-93sequential-mode, 113-114

    Inverse Discrete Cosine Transform (IDCT), 77-78InverseDCT function, 134-137IRENE.BMP file, 11-12IRENE.JPG file, 80-83, 87

    JFIF (JPEG File Interchange Format), 40, 55-57JPEG format, 35

    byte ordering in, 41color models for, 10compression methods for, 11, 36-39DCT in, 44, 77, 98

matrix operations in, 85-87
in one dimension, 78-84
optimizing. See Optimizing DCT
quantization in, 88-89
in two dimensions, 84-85, 87
zigzag ordering in, 89

    file format in, 47-54Huffman coding in, 44

    code lengths in, 65-73decoding, 73-75example, 63-65restrictions in, 71-73usage frequencies in, 61-63

    interleaved and noninterleaved scans in, 45-46JFIF file format for, 40, 55-57operation of, 44progressive, 149


    JPEG format (continued)component division in, 149-151data unit decoding in, 154-160data unit encoding in, 162-165Huffman coding in, 162Huffman tables in, 153-154MCUs in, 153preparing for, 160-161processing files in, 151-152processing scans in, 152-153

    sampling frequency in, 41-44sequential-mode, 91

    color and grayscale in, 110compression parameters for, 105-111data unit decoding in, 94-96data unit encoding in, 115-117DCT coefficient processing in, 98decoding examples in, 97-98, 100-103decoding overview, 100down sampling in, 112-113encoding in, 111example for, 120Huffman tables for, 106, 117-119interleaving in, 113-114MCU dimensions in, 91-93output file structure in, 111quantization tables for, 106-107restart markers in, 99-100, 109-110sampling frequencies in, 108-109scans in, 107-108up sampling in, 99

    SPIFF file format for, 40-41JpegDecoder class, 101-103JpegDecoderComponent class, 101JpegDecoderDataUnit class, 148JpegDecoderQuantizationTable class, 102, 148JPEGDUMP program, 57-59JpegEncoder class, 120JpegEncoderComponent class, 120JpegEncoderDataUnit class, 148JpegEncoderQuantizationTable class, 148

    JpegHuffmanDecoder class, 75, 102JpegHuffmanEncoder class, 75JPG markers, 48

    Keyword field, 210, 212

    Left Position field, 175Legal problems in GIF format, 187-188Lempel, Abraham, 179Length field, 191, 224Lengths field, 224LengthsToCounts procedure, 70Limit function, 134LimitTo16Bits procedure, 73Literals field, 224Little-endian ordering, 14-15Local Color Table Flag field, 174-175Local Color Table Size field, 175Logical screen descriptors, 172-173, 187Logical Screen Height field, 173Logical Screen Width field, 173Loop application extension in GIF format,

    186-187Lossless compression, 12-13, 39Lossy compression, 12-13Luminance in YCbCr color model, 6LZ encoding, 11LZ77 compression, 216-217LZW compression, 178-185

    MakeCrcTable function, 194Markers in JPEG format, 47-49

    APP, 49-50COM, 50DHT, 50-51DQT, 51-52DRI, 51EOI, 52RST, 52SOF, 53SOI, 52


    Markers in JPEG format (continued)SOS, 53-54

    Matrix operations and factoringin DCT, 85-87, 121-137in quantization, 138-148

    Matrix transpose operation, 86MCUs (minimum coded units) in JPEG format,

    45-46markers for, 51progressive, 153sequential-mode, 91-93

    Median cut quantization, 16-18Memory

    for bitmap graphics, 3-4for video, 1-2

    Miller, Victor, 187Minimum coded units (MCUs) in JPEG format,

    45-46markers for, 51progressive, 153sequential-mode, 91-93

Minute field, 211
Month field, 211
Morse Code, 62
Multiplication

    efficiency of, 133in matrices, 85-86

N × M matrices, 85
Names for PNG chunks, 191
Network order, 14
NLength field, 224
Noncritical chunks, 206-212
Noninterleaved scans, 45-46
NoninterleavedPass function, 120

    One dimension, DCT in, 78-84Optimizing DCT, 121

    matrix factoring in, 121-137quantization in, 138-148scaled integer arithmetic in, 137-138

    Ordering bytes and bits, 13-15in GIF format, 172in JPEG format, 41in PNG format, 190in Windows BMP format, 23in XBM format, 32

Orthogonal matrices, 87
Output file structure in JPEG format, 111
OutputCode procedure, 184-185
OutputDataBlock procedure, 240
Overflow in scaled integer arithmetic, 138

PaethPredictor function, 230
Palette field, 56
Palettes

    in JFIF format, 56in PNG format, 196, 205vs. true color, 9-10

    Patents for GIF format, 187-188PNGDUMP program, 212-213pHYs chunks, 208-209Pixel Aspect Ratio field, 173Pixel density, 55Pixels and pixel data, 1-2

    in BitmapImage, 20in JFIF format, 55-56in Windows BMP format, 27-28in XBM format, 32

    Pixels field, 56Pixels per Unit X field, 208Pixels per Unit Y field, 208Plain text extension blocks, 176PLTE chunks, 191, 195, 205PNG Developmental Group, 190PNG format, 189-190

    byte ordering in, 190chunks in, 190-192, 194-195

    critical, 203-206noncritical, 206-212

    color in, 10correction of, 231


    PNG format (continued)device-independent, 197-200Gamma value in, 201-202representation of, 195-197

    compression methods for, 11creating files in, 233

    Deflate process for, 234-238filtering in, 241-242Huffman table generation in, 238-241

    decompressingcompressed data blocks in, 223-227compressed data format in, 222-223filtering in, 228-230Huffman coding in, 221-222, 224-227image data, 215-221writing decompressed data to images in,

    227-231file format of, 190-195history of, 190interlacing in, 202-203, 227-228

    PngDecoder program, 232PngEncoder class, 243Point transforms, 150Precision

    sampling, 5-6in scaled integer arithmetic, 138

    Preset Dictionary field, 223PrintAC procedure, 118PrintDC procedure, 118PrintEOBRun procedure, 164, 167Printers, images displayed by, 2Private chunks, 190-192Processors, byte and bit ordering in, 14-15ProcessPass procedure, 227Progressive JPEG format, 36-38, 149

    component division in, 149-151data unit decoding in, 154-160data unit encoding in, 162-165Huffman coding in, 162Huffman tables in, 153-154MCUs in, 153

    preparing for, 160-161processing files in, 151-152processing scans in, 152-153

    Public chunks, 190-191

    Quantization and quantization tables, 16-18in DCT, 88-89, 139-148in JPEG format, 44

    creating, 106-107markers for, 51-52

    QuantizeSourceImage function, 21

    Raster graphics. See Bitmap imagesReadHuffmanTable function, 102ReadImage function, 101-102Reading XBM format files, 33-34ReadLengths procedure, 225-226ReadMarker function, 101-102ReadQuantization function, 102ReadSequentialInterleavedScan function, 102ReadSequentialNonInterleaved-Scan function,

    102ReadStartOfFrame function, 102ReadStartOfScan function, 102Red field

    in GIF format, 174in PNG format, 205

    Red-Green-Blue (RGB) color model, 5-6, 110Red X field, 207Red Y field, 207RedOffset value, 20RefineAC procedure, 158RefineBand procedure, 166-167RefineDCDataunit procedure, 154RefineEOBRun procedure, 166Refining scans in progressive JPEG

    AC, 156-160, 166-168DC, 154, 163

    Repeat Count field, 186Representation of images, 1-2RES markers, 48


    Restart (RST) markers, 48, 52, 99-100, 109-110ReverseAverage function, 230ReverseExtend function, 115ReverseHuffmanCode function, 241ReversePaeth function, 230ReverseSub function, 230ReverseUp function, 230RGB (Red-Green-Blue) color model, 5-6, 110RGB field, 56RGB triples, 196rgbBlue field, 26RGBConvert function, 101rgbGreen field, 26RGBQUAD structure, 26rgbRed field, 26RGBTRIPLE structure, 26Right Position field, 175RLE (run length encoding) compression, 10-11RLE4 compression, 29RLE8 compression, 28-29Rotations in matrix factoring, 133Rows in matrices, 85-86RST (Restart) markers, 48, 52, 99-100, 109-110Run length encoding (RLE) compression, 10-11

    Safe-to-copy chunks, 192Sample precision, 5-6Sampling

    in JFIF format, 55in JPEG format, 41-44, 99, 108-109, 112-113

sBIT chunks, 209
Scale factors for quantization values, 107
Scaled integer arithmetic in DCT optimization, 137-138
Scans and scan processing in JPEG format

    progressive, 152-153sequential-mode, 36, 107-108

    Screen descriptors, 172-173, 187SearchDictionary procedure, 180Second field, 211

    Separator field, 212Sequential-mode JPEG format, 36

    creatingcolor and grayscale in, 110compression parameters for, 105-111data unit encoding in, 115-117down sampling in, 112-113encoding in, 111Huffman tables for, 106, 117-119interleaving in, 113-114output file structure in, 111quantization tables for, 106-107restart markers in, 109-110sampling frequencies in, 108-109scans in, 107-108

    decoding, 91data units, 94-96DCT coefficient processing in, 98examples, 97-98, 100-103MCU dimensions in, 91-93overview, 100restart marker processing in, 99-100up sampling in, 99

    SetBlocksize function, 243
    SetCompressionLevel function, 243
    SetQuality function, 120
    SetScanAttributes function, 120
    SetSize function, 20
    SetUseFilters function, 243
    Signature field, 172
    Signatures
        in GIF format, 172
        in JFIF format, 55
        in PNG format, 195

    Significant bits in PNG format, 209
    Size of bitmap graphics, 3-4
    SOF (Start of Frame) markers, 48, 53, 106
    Software field, 210
    SOI (Start of Image) markers, 47-48, 52, 111
    Sort field, 175


    SOS (Start of Scan) markers, 48, 53-54
        in progressive JPEG format, 152-153
        in sequential-mode JPEG format, 111

    Source field, 210
    Sparse matrices, 126
    Spectral selection, 149-150, 161
    SPIFF (Still Picture Interchange File Format), 40-41
    SplitAreaInHalf function, 21
    Stand-alone markers, 47-48
    STARS.BMP file, 12-13
    Start of Frame (SOF) markers, 48, 53, 106
    Start of Image (SOI) markers, 47-48, 52, 111
    Start of Scan (SOS) markers, 48, 53-54
        in progressive JPEG format, 152-153
        in sequential-mode JPEG format, 111

    Still Picture Interchange File Format (SPIFF), 40-41

    Subscript operator, 20-21
    Subtractive color models, 8
    Successive approximation, 150-151, 161
    SwapBytes function, 14-15

    TEM markers, 48
    Terminator blocks, 174
    Terminator field
        in GIF format, 186
        in PNG format, 210

    Text Background Color field, 176
    tEXt chunks, 209-210
    Text field, 210
    Text Foreground Color field, 176
    Text Grid Height field, 176
    Text Grid Left field, 176
    Text Grid Right field, 176
    Text Grid Width field, 176
    Thumbnail field, 56
    Thumbnails in JFIF format, 55-56
    tIME chunks, 210-211
    Title field, 210

    Transparency
        in GIF format, 177
        in PNG format, 196-197, 211, 231

    Transparent Color Flag field, 177
    Transparent Color Index field, 177
    Transpose operation, 86
    tRNS chunks, 211
    True color vs. palettes, 9-10
    Two dimensions, DCT in, 84-85, 87
    Type field, 191, 224

    UBYTE datatypes, 19
    Uncompressed GIF format, 188
    Unit Specifier field, 208-209
    Units field, 56
    Up-sampling, 43, 99
    UpdateAdler function, 222-223
    Upsample function, 101
    Usage frequencies in Huffman coding, 61-63
    User Input Flag field, 177

    Validation in JPEG encoding, 111
    Vector graphics, 3-4
    Vectors, 85
    Version field, 172
    Version major ID field, 56
    Version minor ID field, 56
    Vertical sampling frequency in JPEG format, 43
    Video controllers, 2
    Video memory, 1-2
    viewer.cpp program, 30

    Warning field, 210
    Wegman, Mark, 187
    Welch, Terry, 179
    White Point X field, 207
    White Point Y field, 207
    Width field, 204
    Window Size field, 223
    Windows BMP format, 23

    Windows BMP format (continued)
        compression in, 28-29
        data ordering in, 23
        file structure for, 24-28
    Windows metafiles, 3
    Writing
        PNG decompressed data to images, 227-231
        XBM format files, 33-34
    www.jpeg.org site, 46
    XBM format, 31-34
    Xdensity field, 56
    Xthumbnail field, 56-57
    XYZ color values, 197-200
    Y component
        in JPEG sampling frequency, 41, 43-44
        quantization table for, 88
    YCbCr color model, 6-8, 110
    Ydensity field, 56
    Year field, 211
    Ythumbnail field, 56-57
    Zigzag ordering in DCT, 89
    Ziv, Jacob, 179
    ZLIB compression library, 216
    zTXt chunks, 211-212
