USearch Class Reference

Inherits ICU.

Public Types

enum  Attribute { Overlap, CanonicalMatch, Count }
enum  AttributeValue { Default = -1, Off, On, Count }

Public Member Functions

 this (UText pattern, UText text, inout ULocale locale, UBreakIterator iterator=null)
 this (UText pattern, UText text, UCollator col, UBreakIterator iterator=null)
 ~this ()
void setOffset (uint position)
uint getOffset ()
uint getMatchedStart ()
uint getMatchedLength ()
void getMatchedText (UString s)
void setText (UText t)
UText getText ()
void setPattern (UText t)
UText getPattern ()
void setIterator (UBreakIterator iterator)
UBreakIterator getIterator ()
uint first ()
uint last ()
uint next (uint pos=uint.max)
uint previous (uint pos=uint.max)
void reset ()
UCollator getCollator ()
void setCollator (UCollator col)

Static Public Member Functions

 this ()
 ~this ()

Public Attributes

const uint Done = uint.max

Static Public Attributes

FunctionLoader Bind[] targets

Private Types

typedef void * Handle
enum  Error { OK, BufferOverflow = 15 }

Static Private Member Functions

bool isError (Error e)
void testError (Error e, char[] msg)
char * toString (char[] string)
wchar * toString (wchar[] string)
uint length (char *s)
uint length (wchar *s)
char[] toArray (char *s)
wchar[] toArray (wchar *s)

Private Attributes

Handle handle
UBreakIterator iterator

Static Private Attributes

void * library

Detailed Description

APIs for an engine that provides language-sensitive text searching based on the comparison rules defined in a UCollator data struct. This ensures that language-specific peculiarities are handled; e.g. with the German collator, the characters ß and SS will match if case is chosen to be ignored. See the "ICU Collation Design Document" for more information.
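
As a rough illustration of the ß / SS example above, the following minimal Python sketch compares strings under Unicode full case folding. It is only a stand-in for the idea of language-sensitive comparison: it does not use a UCollator or any ICU call, and the helper name loosely_equal is hypothetical.

```python
def loosely_equal(a, b):
    """Compare two strings while ignoring case, using Unicode full case folding.

    Assumption: case folding is used here as a simple substitute for a
    collator whose strength is set to ignore case differences.
    """
    return a.casefold() == b.casefold()


# An exact comparison treats the two spellings as different strings.
print("ß" == "SS")                          # False
# Under case folding, ß expands to "ss", so the spellings compare equal.
print(loosely_equal("ß", "SS"))             # True
print(loosely_equal("Straße", "STRASSE"))   # True
```

A real collator also applies locale tailoring and strength levels, which this sketch leaves out.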

The algorithm implemented is a modified form of the Boyer-Moore search. For more information on the algorithm, see "Efficient Text Searching in Java", published in Java Report in February 1999.
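
The general shape of a Boyer-Moore style search can be sketched with the simpler Horspool variant, shown here in Python on plain code points. This is not the modified algorithm that ICU implements, which compares collation elements rather than characters; the function name horspool_find_all is made up for illustration.

```python
def horspool_find_all(text, pattern):
    """Return every start offset of pattern in text (Boyer-Moore-Horspool).

    Assumption: exact code point comparison, no collation rules.
    """
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []

    # Shift table: for each character of the pattern except the last one,
    # the distance from its rightmost occurrence to the end of the pattern.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}

    matches = []
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            matches.append(pos)
        # Slide the window according to the text character aligned with the
        # last pattern position; unknown characters allow a full-length jump.
        pos += shift.get(text[pos + m - 1], m)
    return matches


print(horspool_find_all("here is a simple example", "example"))  # [17]
```

The shift table is what lets the search skip over stretches of text that cannot contain a match, which is the property the setCollator() description refers to when it mentions recalculating shift tables.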

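There are 2 match options for selection. Let S' be the sub-string of a text string S between the offsets start and end <start, end>. A pattern string P matches a text string S at the offsets <start, end> if

option 1. Some canonical equivalent of P matches some canonical equivalent of S'
option 2. P matches S' and, if P starts or ends with a combining mark, there exists no non-ignorable combining mark before or after S' in S respectively.

Option 2 will be the default.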
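
The notion of canonical equivalence behind the CanonicalMatch attribute can be illustrated with a small Python sketch. It assumes that comparing NFD-normalized forms is an adequate model of "some canonical equivalent of P matches some canonical equivalent of S'"; the helper name canonically_equal is hypothetical and no ICU function is involved.

```python
import unicodedata


def canonically_equal(a, b):
    """True if a and b have the same canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


precomposed = "\u00e9"   # é as a single code point
decomposed = "e\u0301"   # e followed by COMBINING ACUTE ACCENT

# The code point sequences differ, so an exact comparison fails.
print(precomposed == decomposed)                    # False
# Both sequences decompose to the same form, so they are canonically equal.
print(canonically_equal(precomposed, decomposed))   # True
```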

This search has APIs similar to those of other text iteration mechanisms such as the break iterators in ubrk.h. Using these APIs, it is easy to scan through text looking for all occurrences of a given pattern. This search iterator allows changing of direction by calling reset() followed by next() or previous(). Though a direction change can occur without calling reset() first, this operation comes with some speed penalty. Generally, the matches found in the forward direction are the same as those found in the backward direction, in reverse order.
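
The iteration pattern described above can be modelled in a few lines of Python. The class below is a hypothetical stand-in, not the USearch wrapper: it precomputes exact matches instead of searching lazily with a collator, ignores the overlap option, and uses -1 where the D class uses the constant Done (uint.max).

```python
DONE = -1  # stand-in for USearch.Done


class SearchModel:
    """Toy search iterator with first/last/next/previous/reset."""

    def __init__(self, pattern, text):
        # Assumption: exact comparison; every start offset is recorded.
        self.starts = [i for i in range(len(text) - len(pattern) + 1)
                       if text.startswith(pattern, i)]
        self.cursor = None  # None means "freshly reset"

    def _at(self, index):
        if 0 <= index < len(self.starts):
            self.cursor = index
            return self.starts[index]
        return DONE

    def first(self):
        return self._at(0)

    def last(self):
        return self._at(len(self.starts) - 1)

    def next(self):
        return self._at(0 if self.cursor is None else self.cursor + 1)

    def previous(self):
        last = len(self.starts) - 1
        return self._at(last if self.cursor is None else self.cursor - 1)

    def reset(self):
        self.cursor = None


def scan(search, step):
    """Collect matches by calling step() until DONE is returned."""
    search.reset()
    found = []
    pos = step()
    while pos != DONE:
        found.append(pos)
        pos = step()
    return found


s = SearchModel("fish", "one fish two fish red fish")
print(scan(s, s.next))      # [4, 13, 22]
print(scan(s, s.previous))  # [22, 13, 4]
```

After reset(), the first call decides the direction: next() starts from the beginning of the text and previous() from the end, which is why the two scans return the same offsets in opposite order.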

USearch provides APIs to specify the starting position within the text string to be searched, e.g. setOffset(), previous(x) and next(x). Since the starting position will be set exactly as specified, please take note that there are some dangerous positions from which the search may return incorrect results, for example a position in the midst of a substring that requires normalization, or a position within a contracting sequence.

A break iterator can be used if only matches at logical breaks are desired. Using a break iterator will only give you results that exactly match the boundaries given by the break iterator. For instance, the pattern "e" will not be found in the string "\u00e9" if a character break iterator is used.
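
The effect of restricting matches to break boundaries can be sketched in Python. This is a simplified model under stated assumptions: the text is decomposed first so that é becomes e plus a combining accent, a "character boundary" is taken to be any offset that does not fall before a combining mark, and both helper names are hypothetical. A real character break iterator follows the Unicode text segmentation rules.

```python
import unicodedata


def character_boundaries(text):
    """Offsets at which a base character starts, plus the end of the text."""
    bounds = {i for i, ch in enumerate(text) if unicodedata.combining(ch) == 0}
    bounds.add(len(text))
    return bounds


def find_on_boundaries(pattern, text, bounds=None):
    """Exact matches of pattern; if bounds is given, keep only matches whose
    start and end offsets both lie on a boundary."""
    hits = []
    for start in range(len(text) - len(pattern) + 1):
        end = start + len(pattern)
        if text[start:end] != pattern:
            continue
        if bounds is None or (start in bounds and end in bounds):
            hits.append(start)
    return hits


text = unicodedata.normalize("NFD", "caf\u00e9")   # "cafe" + U+0301
bounds = character_boundaries(text)

print(find_on_boundaries("e", text))            # [3]  (no boundary check)
print(find_on_boundaries("e", text, bounds))    # []   (match ends mid-character)
print(find_on_boundaries("caf", text, bounds))  # [0]
```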

Options are provided to handle overlapping matches. E.g. in English, overlapping matches produce the results 0 and 2 for the pattern "abab" in the text "ababab", whereas mutually exclusive matches only produce the result 0.
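
The "abab" example can be reproduced with a short Python sketch. It models only the difference between overlapping and mutually exclusive matching, using exact string comparison; the function name find_matches and its overlap parameter are hypothetical and merely mirror the Overlap attribute.

```python
def find_matches(pattern, text, overlap):
    """Return match start offsets, with or without overlapping matches."""
    if not pattern:
        return []
    hits = []
    pos = text.find(pattern)
    while pos != -1:
        hits.append(pos)
        # Overlapping: resume one position after the match start.
        # Mutually exclusive: resume after the end of the match.
        step = 1 if overlap else len(pattern)
        pos = text.find(pattern, pos + step)
    return hits


print(find_matches("abab", "ababab", overlap=True))   # [0, 2]
print(find_matches("abab", "ababab", overlap=False))  # [0]
```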

Though collator attributes will be taken into consideration while performing matches, there are no APIs here for setting and getting the attributes. These attributes can be set by getting the collator from getCollator() and using the APIs in UCollator. Lastly, to update the string search to the new collator attributes, reset() has to be called.

See http://oss.software.ibm.com/icu/apiref/usearch_8h.html for full details.

Definition at line 175 of file USearch.d.


Member Typedef Documentation

typedef void* Handle [protected, inherited]
 

Use this for the primary argument-type to most ICU functions

Definition at line 114 of file ICU.d.


Member Enumeration Documentation

enum Attribute
 

Enumeration values:
Overlap 
CanonicalMatch 
Count 

Definition at line 186 of file USearch.d.

enum AttributeValue
 

Enumeration values:
Default 
Off 
On 
Count 

Definition at line 193 of file USearch.d.

enum Error [protected, inherited]
 

ICU error codes (the ones which are referenced)

Enumeration values:
OK 
BufferOverflow 

Definition at line 148 of file ICU.d.


Constructor & Destructor Documentation

~this () [inline]
 

Close this USearch

Definition at line 239 of file USearch.d.

~this () [inline, static]
 

Definition at line 603 of file USearch.d.


Member Function Documentation

this (UText pattern, UText text, inout ULocale locale, UBreakIterator iterator = null) [inline]
 

Creates a search iterator data struct using the language rule set of the argument locale.

Definition at line 208 of file USearch.d.

References iterator, and ICU::testError().

this (UText pattern, UText text, UCollator col, UBreakIterator iterator = null) [inline]
 

Creates a search iterator data struct using the language rule set of the argument collator.

Definition at line 224 of file USearch.d.

References iterator, and ICU::testError().

void setOffset (uint position) [inline]
 

Sets the current position in the text string which the next search will start from.

Definition at line 251 of file USearch.d.

References ICU::testError().

uint getOffset () [inline]
 

Return the current index in the string text being searched

Definition at line 265 of file USearch.d.

uint getMatchedStart () [inline]
 

Returns the index to the match in the text string that was searched

Definition at line 277 of file USearch.d.

uint getMatchedLength () [inline]
 

Returns the length of text in the string which matches the search pattern

Definition at line 289 of file USearch.d.

void getMatchedText (UString s) [inline]
 

Returns the text that was matched by the most recent call to first(), next(), previous(), or last().

Definition at line 301 of file USearch.d.

References UString::format().

void setText (UText t) [inline]
 

Set the string text to be searched.

Definition at line 317 of file USearch.d.

References UText::get(), UText::length(), and ICU::testError().

UText getText () [inline]
 

Return the string text to be searched. Note that this returns a read-only reference to the search text.

Definition at line 332 of file USearch.d.

References len.

void setPattern (UText t) [inline]
 

Sets the pattern used for matching

Definition at line 346 of file USearch.d.

References UText::get(), UText::length(), and ICU::testError().

UText getPattern () [inline]
 

Gets the search pattern. Note that this returns a read-only reference to the pattern.

Definition at line 361 of file USearch.d.

References len.

void setIterator (UBreakIterator iterator) [inline]
 

Set the BreakIterator that will be used to restrict the points at which matches are detected.

Definition at line 376 of file USearch.d.

References UBreakIterator::handle, and ICU::testError().

UBreakIterator getIterator () [inline]
 

Get the BreakIterator that will be used to restrict the points at which matches are detected.

Definition at line 392 of file USearch.d.

References iterator.

uint first () [inline]
 

Returns the first index at which the string text matches the search pattern

Definition at line 404 of file USearch.d.

References ICU::testError().

uint last () [inline]
 

Returns the last index in the target text at which it matches the search pattern

Definition at line 420 of file USearch.d.

References ICU::testError().

uint next (uint pos = uint.max) [inline]
 

Returns the index of the next point at which the string text matches the search pattern, starting from the current position.

If pos is specified, returns the first index greater than pos at which the string text matches the search pattern

Definition at line 440 of file USearch.d.

References ICU::testError().

uint previous (uint pos = uint.max) [inline]
 

Returns the index of the previous point at which the string text matches the search pattern, starting at the current position.

If pos is specified, returns the first index less than pos at which the string text matches the search pattern.

Definition at line 464 of file USearch.d.

References ICU::testError().

void reset () [inline]
 

Resets the iteration. The search will begin at the start of the text string if a forward iteration is initiated before a backwards iteration. Otherwise, if a backwards iteration is initiated before a forwards iteration, the search will begin at the end of the text string.

Definition at line 486 of file USearch.d.

UCollator getCollator () [inline]
 

Gets the collator used for the language rules.

Definition at line 497 of file USearch.d.

References UCollator.

void setCollator (UCollator col) [inline]
 

Sets the collator used for the language rules. This method causes internal data such as Boyer-Moore shift tables to be recalculated, but the iterator's position is unchanged

Definition at line 511 of file USearch.d.

References UCollator::handle, and ICU::testError().

this () [inline, static]
 

Definition at line 594 of file USearch.d.

bool isError (Error e) [inline, static, protected, inherited]
 

Definition at line 158 of file ICU.d.

Referenced by UConverter::detectSignature(), UString::format(), UCollator::getLocale(), and UConverter::this().

void testError (Error e, char[] msg) [inline, static, protected, inherited]
 

Definition at line 176 of file ICU.d.

Referenced by UCalendar::add(), USet::applyPattern(), UChar::charFromName(), UNormalize::check(), URegex::clone(), UNormalize::compare(), UDomainName::compare(), UConverter::UTranscoder::convert(), UEnumeration::count(), UConverter::decode(), UConverter::encode(), URegex::end(), UTransform::execute(), first(), UResourceBundle::get(), UCalendar::get(), UCollator::getAttribute(), UResourceBundle::getBinary(), UCollator::getBound(), UChar::getCharName(), UChar::getComment(), UCollator::getContractions(), URegex::getFlags(), UResourceBundle::getInt(), UResourceBundle::getIntVector(), UCalendar::getLimit(), UResourceBundle::getLocale(), UCalendar::getMillis(), UConverter::getName(), UResourceBundle::getNextString(), URegex::getPattern(), UCollator::getShortDefinitionString(), UResourceBundle::getString(), UCollator::getTailoredSet(), UDateFormat::getTwoDigitYearStart(), UCollator::getVariableTop(), URegex::groupCount(), UCalendar::inDaylightTime(), UNormalize::isNormalized(), last(), URegex::match(), next(), URegex::next(), UEnumeration::next(), UCollator::normalizeShortDefinitionString(), UDateFormat::parse(), previous(), URegex::probe(), URegex::replaceAll(), URegex::replaceFirst(), URegex::reset(), UEnumeration::reset(), UCalendar::roll(), UCollator::setAttribute(), setCollator(), UCalendar::setDate(), UCalendar::setDateTime(), UTransform::setFilter(), setIterator(), UCalendar::setMillis(), setOffset(), setPattern(), UDecimalFormat::setPattern(), UMessageFormat::setPattern(), setText(), URegex::setText(), UBreakIterator::setText(), UCalendar::setTimeZone(), UDateFormat::setTwoDigitYearStart(), UCollator::setVariableTop(), URegex::split(), URegex::start(), UTransform::this(), UStringPrep::this(), USet::this(), this(), UResourceBundle::this(), URegex::this(), UNumberFormat::this(), UMessageFormat::this(), UDateFormat::this(), UCollator::this(), UCalendar::this(), UBreakIterator::this(), URuleIterator::this(), and UText::toUtf8().

char * toString (char[] string) [inline, static, protected, inherited]
 

Definition at line 186 of file ICU.d.

References string.

Referenced by UChar::charFromName(), UConverter::compareNames(), UCollator::getDisplayName(), UResourceBundle::getResource(), UCollator::getShortDefinitionString(), UResourceBundle::getString(), UCalendar::getTimeZoneName(), UCollator::normalizeShortDefinitionString(), UMessageFormat::setLocale(), UStringPrep::this(), UResourceBundle::this(), UDateFormat::this(), UCollator::this(), UBreakIterator::this(), UText::toLower(), and UText::toUpper().

wchar * toString (wchar[] string) [inline, static, protected, inherited]
 

Definition at line 208 of file ICU.d.

References string.

uint length (char *s) [inline, static, protected, inherited]
 

Definition at line 230 of file ICU.d.

References strlen().

uint length (wchar *s) [inline, static, protected, inherited]
 

Definition at line 239 of file ICU.d.

References wcslen().

char[] toArray (char *s) [inline, static, protected, inherited]
 

Definition at line 248 of file ICU.d.

References strlen().

Referenced by UConverter::detectSignature(), UResourceBundle::getKey(), UResourceBundle::getLocale(), UMessageFormat::getLocale(), UCollator::getLocale(), UConverter::getName(), UChar::getPropertyName(), UChar::getPropertyValueName(), and UConverter::opApply().

wchar[] toArray (wchar *s) [inline, static, protected, inherited]
 

Definition at line 259 of file ICU.d.

References wcslen().


Member Data Documentation

Handle handle [private]
 

Definition at line 177 of file USearch.d.

UBreakIterator iterator [private]
 

Definition at line 178 of file USearch.d.

Referenced by getIterator(), and this().

const uint Done = uint.max
 

Definition at line 183 of file USearch.d.

void* library [static, private]
 

Bind the ICU functions from a shared library. This is complicated by the issues regarding D and DLLs on the Windows platform

Definition at line 528 of file USearch.d.

FunctionLoader Bind[] targets [static]
 

Initial value:

 
                [
                {cast(void**) &usearch_open,             "usearch_open"}

Definition at line 564 of file USearch.d.


The documentation for this class was generated from the following file:
Generated on Sun Mar 6 00:31:18 2005 for Mango by doxygen 1.3.6