Functions | |
class | UnicodeBomTemplate (T) |
Variables | |
module mango convert | UnicodeBom |
import mango convert | Type |
import mango sys | ByteSwap |
import mango convert | Unicode |
This software is provided 'as-is', without any express or implied warranty. In no event will the authors be held liable for damages of any kind arising from the use of this software.
Permission is hereby granted to anyone to use this software for any purpose, including commercial applications, and to alter it and/or redistribute it freely, subject to the following restrictions:
1. The origin of this software must not be misrepresented; you must not claim that you wrote the original software. If you use this software in a product, an acknowledgment within documentation of said product would be appreciated but is not required.
2. Altered source versions must be plainly marked as such, and must not be misrepresented as being the original software.
3. This notice may not be removed or altered from any distribution of the source.
4. Derivative works are permitted, but they must carry this notice in full and credit the original source.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Definition in file UnicodeBom.d.
|
Convert unicode content

Unicode is an encoding of textual material. The purpose of this module is to interface an external encoding with a programmer-defined internal encoding. The internal encoding is declared via the template argument T, whilst the external encoding is either specified or derived.

Three internal encodings are supported: char, wchar, and dchar. The methods herein operate upon arrays of this type. That is, decode() returns an array of the type, while encode() expects an array of said type.

Supported external encodings are as follows (from Unicode.d):

    Unicode.Unknown
    Unicode.UTF_8
    Unicode.UTF_8N
    Unicode.UTF_16
    Unicode.UTF_16BE
    Unicode.UTF_16LE
    Unicode.UTF_32
    Unicode.UTF_32BE
    Unicode.UTF_32LE

These can be divided into non-explicit and explicit encodings.

Non-explicit:

    Unicode.Unknown
    Unicode.UTF_8
    Unicode.UTF_16
    Unicode.UTF_32

Explicit:

    Unicode.UTF_8N
    Unicode.UTF_16BE
    Unicode.UTF_16LE
    Unicode.UTF_32BE
    Unicode.UTF_32LE

The former group of non-explicit encodings may be used to 'discover' an unknown encoding, by examining the first few bytes of the content for a signature. This signature is optional, but is often written so that the content is self-describing. When an encoding is unknown, using one of the non-explicit encodings will cause the decode() method to look for a signature and adjust itself accordingly. It is possible that a ZWNBSP character might be confused with the signature; today's unicode content is supposed to use the WORD-JOINER character instead.

The group of explicit encodings is for use when the content encoding is known. These *must* be used when converting back to external encoding, since written content must be in a known format. It should be noted that, during a decode() operation, the existence of a signature is in conflict with these explicit varieties.
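The discovery step described above can be sketched independently of this module. The following is a minimal illustration in Python, not the module's D implementation: `discover` and `SIGNATURES` are hypothetical names, the string labels merely borrow encoding names from the list above, and pairing the UTF-8 signature with the label `UTF_8` is an assumption of the sketch. It shows why longer signatures have to be tested first: the UTF-32LE signature begins with the same two bytes as the UTF-16LE one.

```python
import codecs

# Hypothetical signature table, longest signatures first.
# UTF-32LE (FF FE 00 00) must precede UTF-16LE (FF FE), otherwise
# UTF-32LE content would be reported as UTF-16LE.
SIGNATURES = [
    (codecs.BOM_UTF32_BE, "UTF_32BE"),
    (codecs.BOM_UTF32_LE, "UTF_32LE"),
    (codecs.BOM_UTF8, "UTF_8"),
    (codecs.BOM_UTF16_BE, "UTF_16BE"),
    (codecs.BOM_UTF16_LE, "UTF_16LE"),
]


def discover(content: bytes):
    """Return (label, signature_length), or (None, 0) when no signature is found."""
    for bom, label in SIGNATURES:
        if content.startswith(bom):
            return label, len(bom)
    return None, 0
```

A caller would strip the reported number of bytes before converting the remainder. Note that UTF-16LE content whose first character after the signature is U+0000 is indistinguishable from UTF-32LE by this test alone.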
See

    $(LINK http://www.utf-8.com/)
    $(LINK http://www.hackcraft.net/xmlUnicode/)
    $(LINK http://www.unicode.org/faq/utf_bom.html/)
    $(LINK http://www.azillionmonkeys.com/qed/unicode.html/)
    $(LINK http://icu.sourceforge.net/docs/papers/forms_of_unicode/)

Construct an instance using the given external encoding ~ one of the Unicode.xx types.

Return the current encoding. This is either the originally specified encoding, or a derived one obtained by inspecting the content for a BOM. The latter is performed as part of the decode() method.

Return the signature (BOM) of the current encoding.

Convert the provided content. The content is inspected for a BOM signature, which is stripped. An exception is thrown if a signature is present when, according to the encoding type, it should not be. Conversely, an exception is thrown if there is no known signature where the current encoding expects one to be present.

Perform encoding of content. Note that the encoding must be of the explicit variety by the time we get here.

Scan the BOM signatures looking for a match. We scan in reverse order to get the longest match first.

Swap bytes around, as required by the encoding.

Configure this instance with unicode converters.

Definition at line 112 of file UnicodeBom.d. References assert(), from(), into(), pragma(), type(), UnicodeBomTemplate(), and version. Referenced by UnicodeBomTemplate(), and UnicodeFileTemplate(). |
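The signature rules stated for decode() and encode() above can be illustrated with a short sketch. It is written in Python rather than D and is not the module's implementation: the function names, the exception type, the label strings, and the mapping from labels to Python codec names are all assumptions made for illustration. The derived encoding is reported as a Python codec name because the source does not say which Unicode.xx value the module derives for each signature.

```python
import codecs

# Hypothetical labels for the explicit group, mapped to Python codec names.
# These codecs neither expect nor strip a signature.
EXPLICIT = {
    "UTF_8N": "utf-8",
    "UTF_16BE": "utf-16-be",
    "UTF_16LE": "utf-16-le",
    "UTF_32BE": "utf-32-be",
    "UTF_32LE": "utf-32-le",
}

# Signatures, longest first, each paired with the codec it implies.
SIGNATURES = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def find_signature(content: bytes):
    """Return (signature, codec) for the first match, or (b"", None)."""
    for bom, codec in SIGNATURES:
        if content.startswith(bom):
            return bom, codec
    return b"", None


def decode(content: bytes, encoding: str):
    """Return (text, codec_used), enforcing the signature rules."""
    bom, derived = find_signature(content)
    if encoding in EXPLICIT:
        # Explicit encodings conflict with a signature in the content.
        if bom:
            raise ValueError("explicit encoding does not permit a signature")
        return content.decode(EXPLICIT[encoding]), EXPLICIT[encoding]
    # Non-explicit encodings rely on a signature to derive the encoding.
    if not bom:
        raise ValueError("missing or unknown signature")
    return content[len(bom):].decode(derived), derived


def encode(text: str, encoding: str) -> bytes:
    """Encode text; only explicit encodings are accepted."""
    if encoding not in EXPLICIT:
        raise ValueError("encode requires an explicit encoding")
    return text.encode(EXPLICIT[encoding])
```

For example, `decode(b"\xff\xfeh\x00i\x00", "Unknown")` strips the two-byte signature and decodes the rest as little-endian UTF-16, whereas passing the same bytes with `"UTF_16LE"` raises an error. Python's codecs handle byte order internally, so the separate byte-swapping step mentioned above has no counterpart in this sketch.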
|
Definition at line 39 of file UnicodeBom.d. |
|
Definition at line 41 of file UnicodeBom.d. |
|
Definition at line 43 of file UnicodeBom.d. |
|
Definition at line 45 of file UnicodeBom.d. |