Functions | |
class | UnicodeFile (T) |
typedef | UnicodeFile (char) UnicodeFile8 |
typedef | UnicodeFile (wchar) UnicodeFile16 |
typedef | UnicodeFile (dchar) UnicodeFile32 |
Variables | |
module mango io | UnicodeFile |
import mango io | FilePath |
import mango io | FileStyle |
import mango io | FileProxy
import mango io | Exception
import mango io | FileConduit
import mango sys | ByteSwap |
import mango convert | Type |
import mango convert | Unicode
Read and write unicode files

For our purposes, unicode files are an encoding of textual content. The goal of this module is to interface that external encoding with a programmer-defined internal encoding. The internal encoding is declared via the template argument T, whilst the external encoding is either specified or derived via the methods herein.

Three internal encodings are supported: char, wchar, and dchar. The methods within operate upon arrays of this type. For example, read() returns an array of the type, whilst write() and append() expect an array of said type.

Supported external encodings are as follows (from Unicode.d):

    Unicode.Unknown
    Unicode.UTF_8
    Unicode.UTF_8N
    Unicode.UTF_16
    Unicode.UTF_16BE
    Unicode.UTF_16LE
    Unicode.UTF_32
    Unicode.UTF_32BE
    Unicode.UTF_32LE

These can be divided into non-explicit and explicit encodings:

    Non-explicit:
        Unicode.Unknown
        Unicode.UTF_8
        Unicode.UTF_16
        Unicode.UTF_32

    Explicit:
        Unicode.UTF_8N
        Unicode.UTF_16BE
        Unicode.UTF_16LE
        Unicode.UTF_32BE
        Unicode.UTF_32LE

The former group of non-explicit encodings may be used to 'discover' an unknown encoding, by examining the first few bytes of the file content for a signature. This signature is optional for all files, but is often written so that the content is self-describing. When the encoding is unknown, using one of the non-explicit encodings will cause the read() method to look for a signature and adjust itself accordingly. It is possible that a ZWNBSP character might be confused with the signature; today's files are supposed to use the WORD-JOINER character instead.

The group of explicit encodings is for use when the file encoding is known. These *must* be used when writing or appending, since written content must be in a known format. It should be noted that, during a read operation, the presence of a signature is in conflict with these explicit varieties.

Method read() returns the current content of the file, whilst write() sets the file content, and file length, to the provided array.
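The signature discovery described above can be illustrated with a small, language-neutral sketch. It is written in Python rather than D and does not reproduce the Mango API: the names SIGNATURES, EXPLICIT, detect_signature and strip_signature are hypothetical, and the mapping from each byte signature to an encoding label is an assumption based on the standard Unicode byte-order marks. Only the rule stated above is modelled (a signature conflicts with an explicit encoding during a read).

```python
# Hypothetical labels mirroring the encoding names listed above (not the Mango API).
EXPLICIT = {"UTF_8N", "UTF_16BE", "UTF_16LE", "UTF_32BE", "UTF_32LE"}

# Standard Unicode byte-order marks, ordered shortest first so that a
# reverse scan tries the longest signatures first. This matters because
# the UTF-32LE mark (FF FE 00 00) begins with the UTF-16LE mark (FF FE).
SIGNATURES = [
    ("UTF_16BE", b"\xfe\xff"),
    ("UTF_16LE", b"\xff\xfe"),
    ("UTF_8", b"\xef\xbb\xbf"),
    ("UTF_32BE", b"\x00\x00\xfe\xff"),
    ("UTF_32LE", b"\xff\xfe\x00\x00"),
]


def detect_signature(content: bytes):
    """Return (label, signature_length), or (None, 0) if no signature matches."""
    for label, mark in reversed(SIGNATURES):
        if content.startswith(mark):
            return label, len(mark)
    return None, 0


def strip_signature(content: bytes, encoding: str = "Unknown"):
    """Return (encoding, content without signature).

    Raises ValueError when a signature is found although an explicit
    encoding was requested. How a missing signature is treated is not
    modelled here; the requested encoding is simply returned unchanged.
    """
    label, size = detect_signature(content)
    if label is None:
        return encoding, content
    if encoding in EXPLICIT:
        raise ValueError("signature conflicts with explicit encoding " + encoding)
    return label, content[size:]
```

For instance, calling strip_signature() on content that starts with the bytes EF BB BF and the encoding "Unknown" yields the label "UTF_8" and the content with those three bytes removed.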
Method append() adds content to the tail of the file. When appending, it is your responsibility to ensure the existing and current encodings are correctly matched.

Methods to inspect the file system, check the status of a file or directory, and other facilities are made available via the FileProxy superclass.

Note that the convert() method can be used to convert an arbitrary array of content; said content can come from somewhere other than a file (a socket, for example).

See:

    http://www.utf-8.com/
    http://www.hackcraft.net/xmlUnicode/
    http://www.unicode.org/faq/utf_bom.html/
    http://www.azillionmonkeys.com/qed/unicode.html/
    http://icu.sourceforge.net/docs/papers/forms_of_unicode/

Construct a UnicodeFile from a text string. The provided encoding represents the external file encoding, and should be one of the Unicode.xx types.

Construct a UnicodeFile from the provided FilePath. The given encoding represents the external file encoding, and should be one of the Unicode.xx types.

Return the current encoding. This is either the originally specified encoding, or a derived one obtained by inspecting the file content for a BOM. The latter is performed as part of the read() method.

Return the content of the file. The content is inspected for a BOM signature, which is stripped. An exception is thrown if a signature is present when, according to the encoding type, it should not be. Conversely, an exception is thrown if there is no known signature where the current encoding expects one to be present.

Set the file content and length to reflect the given array. The content will be encoded accordingly.

Append content to the file; the content will be encoded accordingly. Note that it is your responsibility to ensure the existing and current encodings are correctly matched.

Convert the provided content. The content is inspected for a BOM signature, which is stripped. An exception is thrown if a signature is present when, according to the encoding type, it should not be. Conversely, an exception is thrown if there is no known signature where the current encoding expects one to be present.

Internal method to perform writing of content. Note that the encoding must be of the explicit variety by the time we get here.

Scan the BOM signatures looking for a match. We scan in reverse order to get the longest match first.

Swap bytes around, as required by the encoding.

Configure this instance with unicode converters.

Definition at line 134 of file Copy of UnicodeFile.d.

References assert(), convert(), FileConduit, from(), into(), FileConduit::length(), type(), UnicodeFile, version, and FileProxy::write().
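The byte-swapping step mentioned above ("Swap bytes around, as required by the encoding") can be illustrated as follows. This is a minimal Python sketch rather than the module's D code, which relies on mango.sys.ByteSwap. The function names swap_units and to_native_order are hypothetical, and the sketch assumes that a swap is needed exactly when the byte order of the external encoding differs from that of the host.

```python
import sys


def swap_units(data: bytes, width: int) -> bytes:
    """Reverse the byte order inside every width-byte code unit of data."""
    if width not in (2, 4):
        raise ValueError("width must be 2 (UTF-16) or 4 (UTF-32)")
    if len(data) % width:
        raise ValueError("data length is not a multiple of the unit width")
    out = bytearray(len(data))
    # Byte i of each unit receives byte (width - 1 - i) of the same unit.
    for offset in range(width):
        out[offset::width] = data[width - 1 - offset::width]
    return bytes(out)


def to_native_order(data: bytes, width: int, big_endian: bool) -> bytes:
    """Swap only when the external byte order differs from the host's."""
    native_is_big = sys.byteorder == "big"
    if big_endian == native_is_big:
        return data
    return swap_units(data, width)
```

UTF-8 content has single-byte code units and never needs this step, which is why only the widths 2 and 4 are accepted.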
Further member definitions are at lines 39, 41, 43, 48, and 50 of file Copy of UnicodeFile.d.