
UnicodeBom.d File Reference


Functions

class UnicodeBomTemplate (T)

Variables

module mango.convert.UnicodeBom
import mango.convert.Type
import mango.sys.ByteSwap
import mango.convert.Unicode


Detailed Description

Copyright (c) 2004 Kris Bell

This software is provided 'as-is', without any express or implied warranty. In no event will the authors be held liable for damages of any kind arising from the use of this software.

Permission is hereby granted to anyone to use this software for any purpose, including commercial applications, and to alter it and/or redistribute it freely, subject to the following restrictions:

1. The origin of this software must not be misrepresented; you must not claim that you wrote the original software. If you use this software in a product, an acknowledgment within documentation of said product would be appreciated but is not required.

2. Altered source versions must be plainly marked as such, and must not be misrepresented as being the original software.

3. This notice may not be removed or altered from any distribution of the source.

4. Derivative works are permitted, but they must carry this notice in full and credit the original source.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Version:
Initial version; December 2005
Author:
Kris

Definition in file UnicodeBom.d.


Function Documentation

class UnicodeBomTemplate (T)
 

Convert unicode content

Unicode is an encoding of textual material. The purpose of this module is to interface an external encoding with a programmer-defined internal encoding. The internal encoding is declared via the template argument T, whilst the external encoding is either specified or derived.

Three internal encodings are supported: char, wchar, and dchar. The methods herein operate upon arrays of the chosen type: decode() returns an array of that type, while encode() expects one.
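As a rough illustration of what the three internal types hold: D's char, wchar and dchar are UTF-8, UTF-16 and UTF-32 code units respectively, so the same text occupies a different number of array elements under each. The sketch below is Python rather than D and does not use this module; `code_units` is a hypothetical helper that only counts code units.

```python
def code_units(text: str) -> dict:
    """Count the code units `text` needs in each internal encoding."""
    return {
        "char": len(text.encode("utf-8")),            # 8-bit units
        "wchar": len(text.encode("utf-16-le")) // 2,  # 16-bit units
        "dchar": len(text.encode("utf-32-le")) // 4,  # 32-bit units
    }


# 'a', U+00E9 (e with acute accent) and U+1D11E (a character outside the BMP)
sample = "a\u00e9\U0001d11e"
print(code_units(sample))
```

A character outside the Basic Multilingual Plane needs four char elements or two wchar elements, but always exactly one dchar element.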

Supported external encodings are as follows (from Unicode.d):

Unicode.Unknown, Unicode.UTF_8, Unicode.UTF_8N, Unicode.UTF_16, Unicode.UTF_16BE, Unicode.UTF_16LE, Unicode.UTF_32, Unicode.UTF_32BE, Unicode.UTF_32LE

These can be divided into non-explicit and explicit encodings:

Non-explicit: Unicode.Unknown, Unicode.UTF_8, Unicode.UTF_16, Unicode.UTF_32

Explicit: Unicode.UTF_8N, Unicode.UTF_16BE, Unicode.UTF_16LE, Unicode.UTF_32BE, Unicode.UTF_32LE

The former group of non-explicit encodings may be used to 'discover' an unknown encoding by examining the first few bytes of the content for a signature. This signature is optional, but is often written so that the content is self-describing. When an encoding is unknown, using one of the non-explicit encodings will cause the decode() method to look for a signature and adjust itself accordingly. It is possible that a ZWNBSP character (U+FEFF) at the start of the content might be confused with the signature; today's Unicode content is supposed to use the WORD JOINER character (U+2060) instead.
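The ZWNBSP ambiguity can be seen in a small sketch. It uses Python's standard codecs rather than this module, so it shows only the byte-level problem and not Mango's behaviour: Python's "utf-16" codec stands in for a signature-sniffing decoder and "utf-16-be" for an explicit one.

```python
import codecs

# U+FEFF (ZWNBSP) followed by "hi", written as big-endian UTF-16 with no BOM.
raw = "\ufeffhi".encode("utf-16-be")

# The leading bytes are indistinguishable from the UTF-16BE signature.
looks_like_bom = raw.startswith(codecs.BOM_UTF16_BE)

# A signature-sniffing decoder consumes them as a BOM ...
sniffed = raw.decode("utf-16")
# ... while Python's explicit decoder keeps them as a character.
explicit = raw.decode("utf-16-be")

print(looks_like_bom, repr(sniffed), repr(explicit))
```

Python's explicit codec silently keeps the character, whereas this module reports a signature found under an explicit encoding as a conflict.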

The group of explicit encodings is for use when the content encoding is known. These *must* be used when converting back to an external encoding, since written content must be in a known format. It should be noted that, during a decode() operation, the existence of a signature is in conflict with these explicit varieties.

See:
http://www.utf-8.com/
http://www.hackcraft.net/xmlUnicode/
http://www.unicode.org/faq/utf_bom.html/
http://www.azillionmonkeys.com/qed/unicode.html/
http://icu.sourceforge.net/docs/papers/forms_of_unicode/

Construct an instance using the given external encoding, which is one of the Unicode.xx types

Return the current encoding. This is either the originally specified encoding, or a derived one obtained by inspecting the content for a BOM. The latter is performed as part of the decode() method

Return the signature (BOM) of the current encoding

Convert the provided content. The content is inspected for a BOM signature, which is stripped. An exception is thrown if a signature is present when, according to the encoding type, it should not be. Conversely, an exception is thrown if there is no known signature where the current encoding expects one to be present
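These checks might be sketched as follows. This is Python, not D, and is not the module's implementation: the `decode` function, its `explicit_codec` parameter and the `SIGNATURES` table are hypothetical, and the non-explicit encodings are collapsed into a single "discover" mode (explicit_codec=None) for brevity.

```python
import codecs

# (signature, explicit codec name); longer signatures come before their prefixes.
SIGNATURES = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode(content: bytes, explicit_codec=None) -> str:
    """Decode `content`; explicit_codec=None means 'discover from the BOM'."""
    found = next(((bom, name) for bom, name in SIGNATURES
                  if content.startswith(bom)), None)
    if explicit_codec is None:
        if found is None:
            # A signature was expected but none is recognised.
            raise ValueError("unknown or missing signature")
        bom, name = found
        return content[len(bom):].decode(name)  # strip the signature
    if found is not None:
        # An explicit encoding was requested, so a signature is a conflict.
        raise ValueError("explicit encoding does not permit a signature")
    return content.decode(explicit_codec)
```

For instance, `decode(codecs.BOM_UTF8 + b"abc")` yields "abc", while passing the same bytes with `explicit_codec="utf-8"` raises.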

Perform encoding of content. Note that the encoding must be of the explicit variety by the time this method is called
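The explicit-only rule could look like the sketch below. It is Python rather than D; the `EXPLICIT` mapping and the `encode` function are hypothetical names used for illustration, and the sketch assumes no signature is written to the output, which the text here does not specify.

```python
# Hypothetical mapping from the explicit Unicode.xx names to Python codecs.
EXPLICIT = {
    "UTF_8N": "utf-8",
    "UTF_16BE": "utf-16-be",
    "UTF_16LE": "utf-16-le",
    "UTF_32BE": "utf-32-be",
    "UTF_32LE": "utf-32-le",
}


def encode(text: str, encoding: str) -> bytes:
    """Encode `text`, refusing encodings whose byte layout is not fixed."""
    if encoding not in EXPLICIT:
        raise ValueError(f"{encoding} is not an explicit encoding")
    return text.encode(EXPLICIT[encoding])
```

Rejecting a name such as "UTF_16" up front keeps the byte order of the written content from depending on a default chosen elsewhere.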

Scan the BOM signatures looking for a match. We scan in reverse order to get the longest match first
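The order matters because the UTF-32LE signature (FF FE 00 00) begins with the UTF-16LE signature (FF FE). The Python sketch below illustrates this; the `BOMS` table, its ordering and the `match` helper are assumptions made for the example, not the module's own table.

```python
import codecs

# Shortest signatures first, so that a reverse scan meets the longest first.
BOMS = [
    ("UTF_16BE", codecs.BOM_UTF16_BE),   # FE FF
    ("UTF_16LE", codecs.BOM_UTF16_LE),   # FF FE
    ("UTF_8", codecs.BOM_UTF8),          # EF BB BF
    ("UTF_32BE", codecs.BOM_UTF32_BE),   # 00 00 FE FF
    ("UTF_32LE", codecs.BOM_UTF32_LE),   # FF FE 00 00
]


def match(content: bytes, table):
    """Return the name of the first signature in `table` that prefixes `content`."""
    for name, bom in table:
        if content.startswith(bom):
            return name
    return None


data = codecs.BOM_UTF32_LE + "A".encode("utf-32-le")
forward = match(data, BOMS)                    # the shorter prefix wins: wrong
backward = match(data, list(reversed(BOMS)))   # the longest match wins
print(forward, backward)
```

A forward scan reports UTF_16LE for UTF-32LE content; the reverse scan reports UTF_32LE.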

Swap bytes around, as required by the encoding
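A byte swap of this kind reverses the bytes inside each 2-byte or 4-byte code unit when the external byte order differs from the one wanted. The sketch below is Python and independent of mango.sys.ByteSwap; `swap_bytes` and its `unit_size` parameter are hypothetical.

```python
def swap_bytes(content: bytes, unit_size: int) -> bytes:
    """Reverse the byte order inside each `unit_size`-byte code unit."""
    if len(content) % unit_size:
        raise ValueError("content length is not a multiple of the code-unit size")
    out = bytearray(content)
    for offset in range(unit_size // 2):
        mirror = unit_size - 1 - offset
        # Exchange byte `offset` of every unit with byte `mirror` of that unit.
        out[offset::unit_size], out[mirror::unit_size] = (
            content[mirror::unit_size], content[offset::unit_size])
    return bytes(out)


little = "hi".encode("utf-16-le")
big = swap_bytes(little, 2)
print(big == "hi".encode("utf-16-be"))
```

With a unit size of 1 (UTF-8) the loop body never runs and the content is returned unchanged.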

Configure this instance with unicode converters

Definition at line 112 of file UnicodeBom.d.

References assert(), from(), into(), pragma(), type(), UnicodeBomTemplate(), and version.

Referenced by UnicodeBomTemplate(), and UnicodeFileTemplate().


Variable Documentation

module mango.convert.UnicodeBom
 

Definition at line 39 of file UnicodeBom.d.

import mango.convert.Type
 

Definition at line 41 of file UnicodeBom.d.

import mango.sys.ByteSwap
 

Definition at line 43 of file UnicodeBom.d.

import mango.convert.Unicode
 

Definition at line 45 of file UnicodeBom.d.


Generated on Sat Dec 24 17:28:36 2005 for Mango by  doxygen 1.4.0