
UConverter Class Reference

Inherits ICU.

Public Member Functions

 this (char[] name)
 ~this ()
void reset ()
void resetDecoder ()
void resetEncoder ()
ubyte getMaxCharSize ()
ubyte getMinCharSize ()
char[] getName ()
bool isAmbiguous ()
void encode (wchar **src, wchar *srcLimit, char **dst, char *dstLimit, int *offsets, bool flush)
uint encode (wchar[] src, char[] dst)
void decode (char **src, char *srcLimit, wchar **dst, wchar *dstLimit, int *offsets, bool flush)
uint decode (char[] src, wchar[] dst)

Static Public Member Functions

int compareNames (char[] a, char[] b)
char[] detectSignature (void[] input)

Private Types

typedef void * Handle
enum  Error { OK, BufferOverflow = 15 }

Static Private Member Functions

bool isError (Error e)
void testError (Error e, char[] msg)
char * toString (char[] string)
wchar * toString (wchar[] string)
uint length (char *s)
uint length (wchar *s)
char[] toArray (char *s)
wchar[] toArray (wchar *s)

Private Attributes

Handle converter

Detailed Description

This API is used to convert codepage or character encoded data to and from UTF-16. You can open a converter with ucnv_open(). With that converter, you can get its properties, set options, convert your data and close the converter.

Since many software programs recognize different converter names for different types of converters, there are other functions in this API to iterate over the converter aliases.

See this page for full details.

Definition at line 102 of file UConverter.d.


Member Typedef Documentation

typedef void* Handle [protected, inherited]
 

Use this for the primary argument type to most ICU functions.

Definition at line 109 of file ICU.d.


Member Enumeration Documentation

enum Error [protected, inherited]
 

ICU error codes (the ones which are referenced)

Enumeration values:
OK 
BufferOverflow 

Definition at line 117 of file ICU.d.


Constructor & Destructor Documentation

~this ()  [inline]
 

Deletes the unicode converter and releases resources associated with just this instance. Does not free up shared converter tables.

Definition at line 150 of file UConverter.d.

References converter.


Member Function Documentation

this (char[] name)  [inline]
 

Creates a UConverter object with the name specified as a string.

The actual name will be resolved with the alias file using a case-insensitive string comparison that ignores delimiters '-', '_', and ' ' (dash, underscore, and space). E.g., the names "UTF8", "utf-8", and "Utf 8" are all equivalent. If null is passed for the converter name, it will create one with the getDefaultName() return value.

A converter name may contain options, such as a locale specification, to control the specific behavior of the converter instantiated. The meaning of the options depends on the particular converter: if an option is not defined for or recognized by a given converter, it is ignored.

Options are appended to the converter name string, with an OptionSepChar between the name and the first option and also between adjacent options.

The conversion behavior and names can vary between platforms, and ICU may convert some characters differently from other platforms. Details on this topic are in the User's Guide.

Definition at line 133 of file UConverter.d.

References converter, and ICU::isError().

int compareNames (char[] a, char[] b)  [inline, static]
 

Do a fuzzy compare of two converter/alias names. The comparison is case-insensitive. It also ignores the characters '-', '_', and ' ' (dash, underscore, and space). Thus the strings "UTF-8", "utf_8", and "Utf 8" are exactly equivalent.
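
The matching rule described above can be illustrated without ICU. The following is a minimal sketch written for a current D compiler, not the ICU implementation: normalizedName and fuzzyEqual are hypothetical helpers that apply only the two rules stated here (ASCII case folding and dropping the three delimiters), and the real comparison may apply further rules.

```d
import std.ascii : toLower;

/// Hypothetical helper: lower-case ASCII letters and drop '-', '_' and ' '.
string normalizedName(const(char)[] name)
{
    char[] result;
    foreach (char c; name)
    {
        if (c == '-' || c == '_' || c == ' ')
            continue;           // delimiters are ignored
        result ~= toLower(c);   // comparison is case-insensitive
    }
    return result.idup;
}

/// Hypothetical helper: true when both names match under the rule above.
bool fuzzyEqual(const(char)[] a, const(char)[] b)
{
    return normalizedName(a) == normalizedName(b);
}

void main()
{
    // The three spellings from the description compare as equal.
    assert(fuzzyEqual("UTF-8", "utf_8"));
    assert(fuzzyEqual("utf_8", "Utf 8"));
    // Different converters still compare as different.
    assert(!fuzzyEqual("UTF-8", "UTF-16"));
}
```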

Definition at line 167 of file UConverter.d.

References ICU::toString().

void reset ()  [inline]
 

Resets the state of this converter to the default state.

This is used in the case of an error, to restart a conversion from a known default state. It will also empty the internal output buffers.

Definition at line 182 of file UConverter.d.

References converter.

void resetDecoder ()  [inline]
 

Resets the to-Unicode part of this converter state to the default state.

This is used in the case of an error to restart a conversion to Unicode from a known default state. It will also empty the internal output buffers used for the conversion to Unicode codepoints.

Definition at line 199 of file UConverter.d.

References converter.

void resetEncoder ()  [inline]
 

Resets the from-Unicode part of this converter state to the default state.

This is used in the case of an error to restart a conversion from Unicode to a known default state. It will also empty the internal output buffers used for the conversion from Unicode codepoints.

Definition at line 216 of file UConverter.d.

References converter.

ubyte getMaxCharSize ()  [inline]
 

Returns the maximum number of bytes that are output per UChar in conversion from Unicode using this converter.

The returned number can be used to calculate the size of a target buffer for conversion from Unicode.

This number may not be the same as the maximum number of bytes per "conversion unit". In other words, it may not be the intuitively expected number of bytes per character that would be published for a charset, and may not fulfill any other purpose than the allocation of an output buffer of guaranteed sufficient size for a given input length and converter.

Examples for special cases that are taken into account:

Supplementary code points may convert to more bytes than BMP code points. This function returns bytes per UChar (UTF-16 code unit), not per Unicode code point, for efficient buffer allocation.
State-shifting output (SI/SO, escapes, etc.) from stateful converters.
When m input UChars are converted to n output bytes, then the maximum m/n is taken into account.

The number returned here does not take into account:

callbacks which output more than one charset character sequence per call, like escape callbacks
initial and final non-character bytes that are output by some converters (automatic BOMs, initial escape sequence, final SI, etc.)

Examples for returned values:

SBCS charsets: 1
Shift-JIS: 2
UTF-16: 2 (2 per BMP, 4 per surrogate pair, BOM not counted)
UTF-8: 3 (3 per BMP, 4 per surrogate pair)
EBCDIC_STATEFUL (EBCDIC mixed SBCS/DBCS): 3 (SO + DBCS)
ISO-2022: 3 (always outputs UTF-8)
ISO-2022-JP: 6 (4-byte escape sequences + DBCS)
ISO-2022-CN: 8 (4-byte designator sequences + 2-byte SS2/SS3 + DBCS)
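
The buffer arithmetic implied above can be sketched as follows. This is an illustration only, written for a current D compiler: worstCaseBytes is a hypothetical helper, and the value 3 is simply the UTF-8 figure from the list above rather than a value read from a live converter. As noted, output from callbacks and initial or final bytes such as BOMs would need extra room.

```d
/// Hypothetical helper: worst-case byte count for `unitCount` UTF-16 code
/// units, given a per-unit maximum such as the value from getMaxCharSize().
size_t worstCaseBytes(size_t unitCount, ubyte maxCharSize)
{
    return unitCount * maxCharSize;
}

void main()
{
    // "h\u00E9llo" is five UTF-16 code units.
    wstring text = "h\u00E9llo"w;
    assert(text.length == 5);

    // With the documented UTF-8 value of 3, reserve 15 bytes.
    auto buffer = new char[worstCaseBytes(text.length, 3)];
    assert(buffer.length == 15);

    // A supplementary code point occupies two code units (a surrogate pair),
    // so 2 * 3 = 6 bytes are reserved although UTF-8 needs only 4.
    wstring clef = "\U0001D11E"w;
    assert(clef.length == 2);
    assert(worstCaseBytes(clef.length, 3) == 6);
}
```

The estimate is deliberately generous: it trades some unused space for a single multiplication per allocation.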

Definition at line 270 of file UConverter.d.

References converter.

ubyte getMinCharSize ()  [inline]
 

Returns the minimum byte length for characters in this codepage. This is usually either 1 or 2.

Definition at line 282 of file UConverter.d.

References converter.

char[] getName ()  [inline]
 

Gets the internal, canonical name of the converter (zero-terminated).

Definition at line 294 of file UConverter.d.

References converter, ICU::testError(), and ICU::toArray().

bool isAmbiguous ()  [inline]
 

Determines whether the converter contains ambiguous mappings of the same character.

Definition at line 310 of file UConverter.d.

References converter.

char[] detectSignature (void[] input)  [inline, static]
 

Detects Unicode signature byte sequences at the start of the byte stream and returns the charset name of the indicated Unicode charset. An exception is thrown when no Unicode signature is recognized.

A caller can create a UConverter using the charset name. The first code unit (UChar) from the start of the stream will be U+FEFF (the Unicode BOM/signature character) and can usually be ignored.
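
To make the idea of a signature concrete, here is a minimal sketch of byte-order-mark sniffing, written for a current D compiler. It is not the ICU routine: sniffSignature and hasPrefix are hypothetical helpers, only the UTF-8, UTF-16 and UTF-32 marks are covered, and the sketch returns null where this method is documented to throw.

```d
/// Hypothetical helper: true when `data` begins with `prefix`.
bool hasPrefix(const(ubyte)[] data, const(ubyte)[] prefix)
{
    return data.length >= prefix.length && data[0 .. prefix.length] == prefix;
}

/// Hypothetical helper: charset name for a leading BOM, or null if none.
string sniffSignature(const(ubyte)[] input)
{
    static immutable ubyte[] utf32be = [0x00, 0x00, 0xFE, 0xFF];
    static immutable ubyte[] utf32le = [0xFF, 0xFE, 0x00, 0x00];
    static immutable ubyte[] utf8    = [0xEF, 0xBB, 0xBF];
    static immutable ubyte[] utf16be = [0xFE, 0xFF];
    static immutable ubyte[] utf16le = [0xFF, 0xFE];

    // The UTF-32LE mark begins with the UTF-16LE mark, so the longer
    // signatures must be tested first.
    if (hasPrefix(input, utf32be)) return "UTF-32BE";
    if (hasPrefix(input, utf32le)) return "UTF-32LE";
    if (hasPrefix(input, utf8))    return "UTF-8";
    if (hasPrefix(input, utf16be)) return "UTF-16BE";
    if (hasPrefix(input, utf16le)) return "UTF-16LE";
    return null;
}

void main()
{
    ubyte[] a = [0xEF, 0xBB, 0xBF, 0x68, 0x69];   // UTF-8 BOM + "hi"
    assert(sniffSignature(a) == "UTF-8");

    ubyte[] b = [0xFF, 0xFE, 0x00, 0x00];         // UTF-32LE BOM
    assert(sniffSignature(b) == "UTF-32LE");

    ubyte[] c = [0xFF, 0xFE, 0x68, 0x00];         // UTF-16LE BOM + 'h'
    assert(sniffSignature(c) == "UTF-16LE");

    ubyte[] d = [0x68, 0x69];                     // no signature
    assert(sniffSignature(d) is null);
}
```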

Definition at line 329 of file UConverter.d.

References ICU::isError(), and ICU::toArray().

void encode (wchar **src, wchar *srcLimit, char **dst, char *dstLimit, int *offsets, bool flush)  [inline]
 

Converts an array of Unicode characters to an array of codepage characters.

This function is optimized for converting a continuous stream of data in buffer-sized chunks, where the entire source and target do not fit in the available buffers.

The source pointer is an in/out parameter. It starts out pointing where the conversion is to begin, and ends up pointing after the last UChar consumed.

The target pointer similarly starts out pointing at the first available byte in the output buffer, and ends up pointing after the last byte written to the output.

The converter always attempts to consume the entire source buffer, unless (1) the target buffer is full, or (2) a failing error is returned from the current callback function. When a success status is returned, all of the source buffer has been consumed. At that point, the caller should reset the source and source-limit pointers to point to the next chunk.

At the end of the stream (flush==true), the input is completely consumed when *src==srcLimit and no error code is set. The converter object is then automatically reset by this function. (This means that a converter need not be reset explicitly between data streams if it finishes the previous stream without errors.)

This is a stateful conversion. Even when all source data has been consumed, some data may remain in the converter's internal state. Call this function repeatedly, updating the target pointers with the next empty chunk of target in case of a BufferOverflow error (ICU's U_BUFFER_OVERFLOW_ERROR), and updating the source pointers with the next chunk of source when a success status is returned, until there are no more chunks of source data.

Parameters:

src  I/O parameter, pointer to pointer to the source Unicode character buffer.
srcLimit  the pointer just after the last of the source buffer.
dst  I/O parameter. Input: points to the beginning of the buffer to copy codepage characters to. Output: points to after the last codepage character copied to the target.
dstLimit  the pointer just after the last of the target buffer.
offsets  if null is passed, nothing will happen to it; otherwise it needs to have the same number of allocated cells as the target. It is filled with offsets from target to source, e.g. if offsets[3] is equal to 6, the target[3] was a result of transcoding source[6]. For output data carried across calls, and other data without a specific source character (such as from escape sequences or callbacks), -1 will be placed for offsets.
flush  set to true if the current source buffer is the last available chunk of the source, false otherwise. Note that if a failing status is returned, this function may have to be called multiple times with flush set to true until the source buffer is consumed.
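
The calling pattern described above (in/out pointers, a small target buffer that is drained whenever it fills up) can be sketched with a toy stand-in. Everything here is hypothetical and written for a current D compiler: toyEncode is not the ICU converter, it merely copies code units below 0x100 as single bytes, its boolean result plays the role of the BufferOverflow status, and it has no offsets or flush handling. The target is typed as ubyte rather than char because the bytes written are not UTF-8.

```d
/// Toy stand-in for a streaming encoder (NOT the ICU converter).
/// Returns false when the target fills up before the source is consumed.
bool toyEncode(const(wchar)** src, const(wchar)* srcLimit,
               ubyte** dst, ubyte* dstLimit)
{
    while (*src < srcLimit)
    {
        if (*dst == dstLimit)
            return false;               // target full: caller must drain it
        wchar unit = **src;
        **dst = unit < 0x100 ? cast(ubyte) unit : cast(ubyte) '?';
        ++*src;                         // advance the caller's source pointer
        ++*dst;                         // advance the caller's target pointer
    }
    return true;                        // whole source chunk consumed
}

void main()
{
    wstring text = "streaming conversion"w;
    ubyte[8] chunk;                     // deliberately small target buffer
    ubyte[] output;

    const(wchar)* src = text.ptr;
    const(wchar)* srcLimit = text.ptr + text.length;

    bool done = false;
    while (!done)
    {
        ubyte* dst = chunk.ptr;         // restart at the empty chunk
        done = toyEncode(&src, srcLimit, &dst, chunk.ptr + chunk.length);
        // Drain whatever was written before the next call.
        output ~= chunk[0 .. cast(size_t)(dst - chunk.ptr)];
    }

    assert(src == srcLimit);            // source fully consumed
    assert(cast(string) output == "streaming conversion");
}
```

The source pointer is never rewound between calls; only the target pointer is reset, which is the essence of the loop the description asks callers to write.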

Definition at line 417 of file UConverter.d.

References converter, and ICU::testError().

uint encode (wchar[] src, char[] dst)  [inline]
 

Encode the Unicode string into a codepage string.

This function is a more convenient but less powerful version of encode(). It is only useful for whole strings, not for streaming conversion. The maximum output buffer capacity required (barring output from callbacks) should be calculated using getMaxCharSize().

Definition at line 437 of file UConverter.d.

References converter, and ICU::testError().

void decode (char **src, char *srcLimit, wchar **dst, wchar *dstLimit, int *offsets, bool flush)  [inline]
 

Converts a buffer of codepage bytes into an array of Unicode UChars.

This function is optimized for converting a continuous stream of data in buffer-sized chunks, where the entire source and target do not fit in the available buffers.

The source pointer is an in/out parameter. It starts out pointing where the conversion is to begin, and ends up pointing after the last byte of source consumed.

The target pointer similarly starts out pointing at the first available UChar in the output buffer, and ends up pointing after the last UChar written to the output. It does NOT necessarily keep UChar sequences together.

The converter always attempts to consume the entire source buffer, unless (1) the target buffer is full, or (2) a failing error is returned from the current callback function. When a success status is returned, all of the source buffer has been consumed. At that point, the caller should reset the source and source-limit pointers to point to the next chunk.

At the end of the stream (flush==true), the input is completely consumed when *src==srcLimit and no error code is set. The converter object is then automatically reset by this function. (This means that a converter need not be reset explicitly between data streams if it finishes the previous stream without errors.)

This is a stateful conversion. Even when all source data has been consumed, some data may remain in the converter's internal state. Call this function repeatedly, updating the target pointers with the next empty chunk of target in case of a BufferOverflow error, and updating the source pointers with the next chunk of source when a success status is returned, until there are no more chunks of source data.

Parameters:

src  I/O parameter, pointer to pointer to the source codepage buffer.
srcLimit  the pointer to the byte after the end of the source buffer.
dst  I/O parameter. Input: points to the beginning of the buffer to copy UChars into. Output: points to after the last UChar copied.
dstLimit  the pointer just after the end of the target buffer.
offsets  if null is passed, nothing will happen to it; otherwise it needs to have the same number of allocated cells as the target. It is filled with offsets from target to source, e.g. if offsets[3] is equal to 6, the target[3] was a result of transcoding source[6]. For output data carried across calls, and other data without a specific source character (such as from escape sequences or callbacks), -1 will be placed for offsets.
flush  set to true if the current source buffer is the last available chunk of the source, false otherwise. Note that if a failing status is returned, this function may have to be called multiple times with flush set to true until the source buffer is consumed.

Definition at line 517 of file UConverter.d.

References converter, and ICU::testError().

uint decode (char[] src, wchar[] dst)  [inline]
 

Decode the codepage string into a Unicode string.

This function is a more convenient but less powerful version of decode(). It is only useful for whole strings, not for streaming conversion. The maximum output buffer capacity required (barring output from callbacks) will be 2*src.length (each char may be converted into a surrogate pair).
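
As a rough illustration of the 2*src.length bound, the sketch below sizes a target buffer that way and checks that a decoded string fits. It assumes UTF-8 input and uses Phobos' std.utf.toUTF16 as a stand-in for the converter, so it demonstrates the sizing rule only, not UConverter.decode() itself.

```d
import std.utf : toUTF16;

void main()
{
    // 'a' is one UTF-8 byte; U+1D11E takes four bytes.
    string src = "a\U0001D11E";
    assert(src.length == 5);

    // Documented worst case: two UTF-16 code units per source byte.
    auto dst = new wchar[2 * src.length];

    // Stand-in decoder: 'a' plus one surrogate pair is three code units.
    wstring decoded = toUTF16(src);
    assert(decoded.length == 3);
    assert(decoded.length <= dst.length);

    // Copy into the preallocated buffer, as a caller of decode() would
    // expect the converter to do.
    dst[0 .. decoded.length] = decoded[];
    assert(dst[0] == 'a');
}
```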

Definition at line 537 of file UConverter.d.

References converter, and ICU::testError().

bool isError (Error e)  [inline, static, protected, inherited]
 

Definition at line 127 of file ICU.d.

Referenced by detectSignature(), UString::format(), and this().

void testError (Error e, char[] msg)  [inline, static, protected, inherited]
 

Definition at line 145 of file ICU.d.

Referenced by UCalendar::add(), UText::compareFolded(), decode(), encode(), UResourceBundle::get(), UCalendar::get(), UResourceBundle::getBinary(), UResourceBundle::getInt(), UResourceBundle::getIntVector(), UCalendar::getLimit(), UResourceBundle::getLocale(), UCalendar::getMillis(), getName(), UResourceBundle::getNextString(), UResourceBundle::getString(), UDateFormat::getTwoDigitYearStart(), UCalendar::inDaylightTime(), UDateFormat::parse(), UCalendar::roll(), UCalendar::setDate(), UCalendar::setDateTime(), UCalendar::setMillis(), UDecimalFormat::setPattern(), UMessageFormat::setPattern(), UCalendar::setTimeZone(), UDateFormat::setTwoDigitYearStart(), UResourceBundle::this(), UNumberFormat::this(), UMessageFormat::this(), UDateFormat::this(), and UCalendar::this().

char* toString (char[] string)  [inline, static, protected, inherited]
 

Definition at line 155 of file ICU.d.

Referenced by compareNames(), UResourceBundle::getResource(), UResourceBundle::getString(), UCalendar::getTimeZoneName(), UMessageFormat::setLocale(), UResourceBundle::this(), UMessageFormat::this(), UDateFormat::this(), UText::toLower(), ICU::toString(), and UText::toUpper().

wchar* toString (wchar[] string)  [inline, static, protected, inherited]
 

Definition at line 175 of file ICU.d.

References ICU::toString().

uint length (char *s)  [inline, static, protected, inherited]
 

Definition at line 184 of file ICU.d.

References strlen().

uint length (wchar *s)  [inline, static, protected, inherited]
 

Definition at line 193 of file ICU.d.

References wcslen().

char[] toArray (char *s)  [inline, static, protected, inherited]
 

Definition at line 202 of file ICU.d.

References strlen().

Referenced by detectSignature(), UResourceBundle::getKey(), UResourceBundle::getLocale(), UMessageFormat::getLocale(), and getName().

wchar[] toArray (wchar *s)  [inline, static, protected, inherited]
 

Definition at line 213 of file ICU.d.

References wcslen().


Member Data Documentation

Handle converter [private]
 

Definition at line 104 of file UConverter.d.

Referenced by decode(), encode(), getMaxCharSize(), getMinCharSize(), getName(), isAmbiguous(), reset(), resetDecoder(), resetEncoder(), this(), and ~this().


The documentation for this class was generated from the following file:
Generated on Sun Nov 7 19:07:12 2004 for Mango by doxygen 1.3.6