/se3-unattended/var/se3/unattended/install/linuxaux/opt/perl/lib/5.10.0/encoding/ -> warnings.pm (source)

package encoding::warnings;
$encoding::warnings::VERSION = '0.11';

use strict;
use 5.007;

=head1 NAME

encoding::warnings - Warn on implicit encoding conversions

=head1 VERSION

This document describes version 0.11 of encoding::warnings, released
June 5, 2007.

=head1 SYNOPSIS

    use encoding::warnings; # or 'FATAL' to raise fatal exceptions

    utf8::encode($a = chr(20000));  # a byte-string (raw bytes)
    $b = chr(20000);                # a unicode-string (wide characters)

    # "Bytes implicitly upgraded into wide characters as iso-8859-1"
    $c = $a . $b;

=head1 DESCRIPTION

=head2 Overview of the problem

By default, there is a fundamental asymmetry in Perl's unicode model:
implicit upgrading from byte-strings to unicode-strings assumes that
they were encoded in I<ISO 8859-1 (Latin-1)>, but unicode-strings are
downgraded with UTF-8 encoding.  This happens because the first 256
codepoints in Unicode happen to agree with Latin-1.

However, this silent upgrading can easily cause problems if you happen
to mix unicode-strings with non-Latin-1 data, i.e. byte-strings encoded
in UTF-8 or other encodings.  The error will not manifest until the
combined string is written to output, at which point it is impossible
to see where the silent upgrading occurred.

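As a minimal sketch of the failure described above (not part of this
module), the following script mixes a UTF-8 byte-string with a
unicode-string and measures the damage.  It relies only on the core
C<Encode> module; the variable names are illustrative.

```perl
use strict;
use warnings;
use Encode ();

# A byte-string: the three UTF-8 bytes E4 B8 A0 that encode U+4E20.
my $bytes = Encode::encode('UTF-8', chr(20000));

# A unicode-string: a single wide character.
my $chars = chr(20000);

# Concatenation upgrades $bytes as if it were Latin-1, so each raw
# byte becomes its own character (U+00E4, U+00B8, U+00A0).
my $mixed = $bytes . $chars;

# Encoding for output now spends two bytes on each of those three
# characters, plus three bytes for the genuine wide character.
my $output = Encode::encode('UTF-8', $mixed);

# prints: bytes=3 chars=1 mixed=4 output=9
printf "bytes=%d chars=%d mixed=%d output=%d\n",
    length($bytes), length($chars), length($mixed), length($output);
```

Nothing fails at the point of concatenation; the corruption only shows
once C<$output> is written somewhere and read back.
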
=head2 Detecting the problem

This module simplifies the process of diagnosing such problems.  Just put
this line at the top of your main program:

    use encoding::warnings;

Afterwards, implicit upgrading of high-bit bytes will raise a warning,
for example: C<Bytes implicitly upgraded into wide characters as
iso-8859-1 at - line 7>.

However, strings composed purely of ASCII code points (C<0x00>..C<0x7F>)
will I<not> trigger this warning.

You can also make the warnings fatal by importing this module as:

    use encoding::warnings 'FATAL';

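To illustrate which strings fall under the rule above, here is a rough
sketch of a stand-alone check.  The helper C<has_high_bit_bytes> is
hypothetical and not part of this module; it inspects Perl's internal
UTF-8 flag, so it only approximates the condition under which the
warning would be raised.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $str is a byte-string containing at
# least one byte above 0x7F, i.e. the case that triggers the warning
# when the string is implicitly upgraded.
sub has_high_bit_bytes {
    my ($str) = @_;
    return 0 if utf8::is_utf8($str);          # already a unicode-string
    return $str =~ /[^\x00-\x7F]/ ? 1 : 0;    # any byte above 0x7F?
}

my $ascii_bytes = "plain ASCII";      # only bytes 0x00..0x7F
my $utf8_bytes  = "caf\xC3\xA9";      # raw UTF-8 bytes, high bit set
my $wide        = chr(0x263A);        # a unicode-string

printf "%d %d %d\n",
    has_high_bit_bytes($ascii_bytes),
    has_high_bit_bytes($utf8_bytes),
    has_high_bit_bytes($wide);        # prints: 0 1 0
```
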
=head2 Solving the problem

Most of the time, this warning occurs when a byte-string is concatenated
with a unicode-string.  There are a number of ways to solve it:

=over 4

=item * Upgrade both sides to unicode-strings

If your program does not need compatibility with Perl 5.6 and earlier,
the recommended approach is to apply appropriate IO disciplines, so all
data in your program become unicode-strings.  See L<encoding>, L<open> and
L<perlfunc/binmode> for how.

=item * Downgrade both sides to byte-strings

The other way works too, especially if you are sure that all your data
are under the same encoding, or if compatibility with older versions
of Perl is desired.

You may downgrade strings with C<Encode::encode> and C<utf8::encode>.
See L<Encode> and L<utf8> for details.

=item * Specify the encoding for implicit byte-string upgrading

If you are confident that all byte-strings will be in a specific
encoding like UTF-8, I<and> need not support older versions of Perl,
use the C<encoding> pragma:

    use encoding 'utf8';

Similarly, this will silence warnings from this module, and preserve the
default behaviour:

    use encoding 'iso-8859-1';

However, note that C<use encoding> actually has three distinct effects:

=over 4

=item * PerlIO layers for B<STDIN> and B<STDOUT>

This is similar to what the L<open> pragma does.

=item * Literal conversions

This turns I<all> literal strings in your program into unicode-strings
(equivalent to a C<use utf8>), by decoding them using the specified
encoding.

=item * Implicit upgrading for byte-strings

This will silence warnings from this module, as shown above.

=back

Because literal conversions also apply to empty strings, the result may
surprise some people:

    use encoding 'big5';

    my $byte_string = pack("C*", 0xA4, 0x40);
    print length $byte_string;    # 2 here.
    $byte_string .= "";           # concatenating with a unicode string...
    print length $byte_string;    # 1 here!

In other words, do not C<use encoding> unless you are certain that the
program will not deal with any raw, 8-bit binary data at all.

However, the C<Filter =E<gt> 1> flavor of C<use encoding> will I<not>
affect implicit upgrading for byte-strings, and is thus incapable of
silencing warnings from this module.  See L<encoding> for more details.

=back

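The first two remedies in the list above can be sketched as follows.
This is only an illustration: it uses explicit C<Encode::decode> and
C<Encode::encode> calls in place of IO disciplines to stay
self-contained, and it assumes the byte-string is known to be UTF-8.

```perl
use strict;
use warnings;
use Encode ();

my $bytes = Encode::encode('UTF-8', chr(20000));  # byte-string input
my $chars = chr(20000);                           # unicode-string

# Remedy 1: decode the bytes first, so both sides are unicode-strings.
my $as_chars = Encode::decode('UTF-8', $bytes) . $chars;

# Remedy 2: encode the characters first, so both sides are byte-strings.
my $as_bytes = $bytes . Encode::encode('UTF-8', $chars);

# prints: 2 characters, 6 bytes
printf "%d characters, %d bytes\n", length($as_chars), length($as_bytes);
```

Either way, no implicit upgrade takes place at the concatenation, so
there is nothing for this module to warn about.
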
=head1 CAVEATS

For Perl 5.9.4 or later, this module's effect is lexical.

For Perl versions prior to 5.9.4, this module affects the whole script,
rather than only the lexical block in which it is imported.

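As a sketch of how a lexical effect of this kind can be obtained, the
example below uses the C<%^H> hints hash together with C<caller>, which
is the general mechanism available in Perl 5.10 and later.  The package
C<My::LexicalFlag> is hypothetical and exists only for illustration.

```perl
use strict;
use warnings;

package My::LexicalFlag;    # hypothetical pragma, for illustration only

# %^H is the compile-time hints hash; entries set here are scoped to
# the lexical block that is being compiled when import() runs.
sub import   { $^H{'My::LexicalFlag'} = 1 }
sub unimport { $^H{'My::LexicalFlag'} = 0 }

# At run time, element 10 of caller() is the hints hash that was in
# effect where the calling statement was compiled.
sub in_effect {
    my $hints = (caller(0))[10];
    return ($hints && $hints->{'My::LexicalFlag'}) ? 1 : 0;
}

package main;

my ($inside, $outside);
{
    BEGIN { My::LexicalFlag->import }    # what "use My::LexicalFlag" would do
    $inside = My::LexicalFlag::in_effect();
}
$outside = My::LexicalFlag::in_effect();

print "inside=$inside outside=$outside\n";    # prints: inside=1 outside=0
```
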
=cut

# Constants: indices into the blessed array that holds the handler state.
sub ASCII  () { 0 }
sub LATIN1 () { 1 }
sub FATAL  () { 2 }

# Install a ${^ENCODING} handler if no other one is already in place.
sub import {
    my $class = shift;
    my $fatal = shift || '';

    local $@;
    # Leave a handler installed by another class alone.
    return if ${^ENCODING} and ref(${^ENCODING}) ne $class;
    return unless eval { require Encode; 1 };

    my $ascii  = Encode::find_encoding('us-ascii') or return;
    my $latin1 = Encode::find_encoding('iso-8859-1') or return;

    # Have to undef explicitly here
    undef ${^ENCODING};

    # Install a warning handler for decode()
    my $decoder = bless(
        [
            $ascii,
            $latin1,
            (($fatal eq 'FATAL') ? 'Carp::croak' : 'Carp::carp'),
        ],
        $class,
    );

    ${^ENCODING} = $decoder;
    $^H{$class} = 1;    # mark the pragma as active in this lexical scope
}

sub unimport {
    my $class = shift;
    $^H{$class} = undef;
    undef ${^ENCODING};
}

# Don't worry about source code literals.
sub cat_decode {
    my $self = shift;
    return $self->[LATIN1]->cat_decode(@_);
}

# Warn if the data is not purely US-ASCII.
sub decode {
    my $self = shift;

    DO_WARN: {
        # On Perl 5.9.4 and later, warn only if the pragma is active
        # in the caller's lexical scope.
        if ($] >= 5.009004) {
            my $hints = (caller(0))[10];
            $hints->{ref($self)} or last DO_WARN;
        }

        # A strict US-ASCII decode succeeds only without high-bit bytes.
        local $@;
        my $rv = eval { $self->[ASCII]->decode($_[0], Encode::FB_CROAK()) };
        return $rv unless $@;

        # $self->[FATAL] holds the name 'Carp::carp' or 'Carp::croak',
        # called here as a symbolic reference.
        require Carp;
        no strict 'refs';
        $self->[FATAL]->(
            "Bytes implicitly upgraded into wide characters as iso-8859-1"
        );

    }

    # Fall back to the default Latin-1 interpretation.
    return $self->[LATIN1]->decode(@_);
}

sub name { 'iso-8859-1' }

1;

__END__

=head1 SEE ALSO

L<perlunicode>, L<perluniintro>

L<open>, L<utf8>, L<encoding>, L<Encode>

=head1 AUTHORS

Audrey Tang

=head1 COPYRIGHT

Copyright 2004, 2005, 2006, 2007 by Audrey Tang E<lt>cpan@audreyt.orgE<gt>.

This program is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.

See L<http://www.perl.com/perl/misc/Artistic.html>

=cut
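
The check at the heart of C<decode> in the listing above, a strict
US-ASCII decode used as a probe for high-bit bytes, can be sketched in
isolation.  The helper name C<is_pure_ascii> is hypothetical; the
C<Encode> calls are the same ones the module uses.

```perl
use strict;
use warnings;
use Encode ();

my $ascii = Encode::find_encoding('us-ascii');

# Hypothetical helper: true if $bytes contains only US-ASCII bytes.
sub is_pure_ascii {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() with FB_CROAK may modify its argument
    # FB_CROAK makes decode() die on the first byte it cannot map,
    # so a surviving eval means every byte was in 0x00..0x7F.
    return eval { $ascii->decode($copy, Encode::FB_CROAK()); 1 } ? 1 : 0;
}

printf "%d %d\n", is_pure_ascii("hello"), is_pure_ascii("caf\xE9");
# prints: 1 0
```
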

