
RegexTokenizer Class Reference

Inheritance diagram for RegexTokenizer (base types: Scanner, ITokenizer).

Public Member Functions

 this (RegExp exp)
bool next (IBuffer buffer, Token token)

Public Attributes

import std.regexp

Private Member Functions

bool next (IBuffer buffer, int(*scan)(char[]))
int notFound (Token token, char[] content)

Private Attributes

RegExp exp

Detailed Description

Wrap a tokenizer around the RegExp class from std.regexp. This is useful when you cannot load the entire source into memory at once. In other words, this adapts RegExp into an incremental scanner.

Note that the associated buffer must be large enough to contain an entire RegExp match. For example, if a regex pattern matches an entire file, the buffer must be at least the size of the file. In such cases, a different approach is likely to be more effective.
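The following is a minimal sketch of the idea, not Mango's implementation. It is written against today's Phobos `std.regex`, because the `std.regexp` module used by the original has since been removed from Phobos. `ChunkSource` and `IncrementalRegexScanner` are hypothetical names. The source hands out only a few characters per read. The scanner keeps a fixed-capacity buffer, compacts and refills it, and accepts a match only once more content follows it or the input is exhausted. When the buffer is full and still holds no complete match, the sketch throws, which is the capacity limit described above.

```d
import std.algorithm.comparison : min;
import std.regex : Regex, matchFirst, regex;

// Hypothetical chunked input standing in for a conduit; it delivers at most
// `chunk` characters per read so the scanner has to work incrementally.
struct ChunkSource
{
    string data;
    size_t pos;
    size_t chunk = 3;

    size_t read(char[] dst)
    {
        immutable n = min(dst.length, chunk, data.length - pos);
        dst[0 .. n] = data[pos .. pos + n];
        pos += n;
        return n; // 0 signals end of input
    }
}

// Hypothetical fixed-capacity regex scanner (not Mango's API). Readable
// content is buf[start .. end]. Returned tokens are slices into buf and stay
// valid only until the next call.
struct IncrementalRegexScanner
{
    Regex!char re;
    char[] buf;
    size_t start, end;
    ChunkSource* src;
    bool eof;

    this(string pattern, size_t capacity, ChunkSource* src)
    {
        re = regex(pattern);
        buf = new char[capacity];
        this.src = src;
    }

    // Assumes the pattern never matches the empty string.
    const(char)[] next()
    {
        for (;;)
        {
            auto m = matchFirst(buf[start .. end], re);
            immutable full = start == 0 && end == buf.length;
            // A match ending at the window edge might grow with more input,
            // so accept it only if content follows it or the input is done.
            if (!m.empty && (m.post.length > 0 || eof))
            {
                start = end - m.post.length;
                return m.hit;
            }
            if (eof)
            {
                start = end;
                return null;
            }
            if (full)
                throw new Exception("buffer too small to hold a complete match");
            // Compact: shift unread content to the front, then refill.
            immutable n = end - start;
            foreach (i; 0 .. n)
                buf[i] = buf[start + i];
            start = 0;
            end = n;
            immutable got = src.read(buf[end .. $]);
            if (got == 0)
                eof = true;
            end += got;
        }
    }
}
```

Because tokens are slices of the scanner's buffer, a caller that keeps them across calls should copy them, for example with `.idup`.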

Definition at line 396 of file Tokenizer.d.


Member Function Documentation

this ( RegExp exp ) [inline]

Construct a RegexTokenizer with the provided RegExp.

Definition at line 408 of file Tokenizer.d.

References exp.

bool next ( IBuffer buffer, Token token ) [inline]

Locate the next token in the provided buffer and map a reference to that buffer content into the token. Returns true if a token was located, false otherwise.

Note that the buffer content is not duplicated. Instead, the token references a slice of the buffer. Use Token.clone() or Token.toString().dup() to copy the content when your application needs to retain it.

Note also that one final token may remain in a buffer that was not terminated correctly (as at end-of-file). In such cases, the token is mapped onto the remaining content, and the buffer will have no more readable content.

Reimplemented from ITokenizer.

Definition at line 431 of file Tokenizer.d.

References IBuffer::compress(), exp, IBuffer::getPosition(), Scanner::notFound(), Token::set(), and IBuffer::skip().
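To illustrate the note about slices, here is a small sketch that uses a hypothetical `SliceToken` type rather than Mango's `Token`. The token stores a slice of the buffer, so overwriting the buffer changes what the token sees, while a copy taken beforehand keeps the original text.

```d
// Hypothetical stand-in for Mango's Token: it records a slice of the
// buffer rather than copying the characters.
struct SliceToken
{
    const(char)[] content;

    void set(const(char)[] s) { content = s; }

    // Owned copy, similar in spirit to Token.clone() or toString().dup.
    string copy() const { return content.idup; }
}

// Shows why a token must be copied before its buffer is reused.
string[2] aliasingDemo()
{
    char[] buf = "alpha beta".dup;
    SliceToken tok;
    tok.set(buf[0 .. 5]);      // maps "alpha" without copying
    string saved = tok.copy(); // independent copy taken now
    buf[0 .. 5] = "gamma";     // buffer later compacted or refilled
    return [tok.content.idup, saved];
}
```

After the buffer is overwritten, the sliced token reads "gamma" while the saved copy still reads "alpha".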

bool next ( IBuffer buffer, int(*scan)(char[]) ) [inline, inherited]

Scan the given IBuffer for another token, and place the results in the provided token. This is intended to be completely thread-safe, so singleton tokenizers can be instantiated without issue.

Each token is expected to be stripped of its delimiter. An end-of-file condition causes trailing content to be placed into the token. Requests made beyond end-of-file result in empty tokens (length == zero).

Returns true if a token was isolated, false otherwise.

Definition at line 72 of file Tokenizer.d.

References IBuffer::compress(), IBuffer::getConduit(), IBuffer::getPosition(), IConduit::read(), IBuffer::read(), IBuffer::readable(), IBuffer::skip(), and IBuffer::writable().
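The contract described above can be sketched as a free function. The names `findSpace`, `Cursor`, and `nextToken` are hypothetical, and the callback convention (index of the first delimiter, or -1 when none is found) is an assumption about what `scan` returns, not something documented here. End of input is modeled as the end of a string rather than as a conduit read.

```d
// Hypothetical delimiter finder with the shape of Scanner's scan callback:
// returns the index of the first delimiter, or -1 if none is present (assumed).
int findSpace(const(char)[] content)
{
    foreach (i, c; content)
        if (c == ' ')
            return cast(int) i;
    return -1;
}

// Hypothetical read cursor. All mutable state lives here, not in the scanner.
struct Cursor
{
    const(char)[] input;
    size_t pos;
}

// Places the next token into `token`; returns true if a token was isolated.
bool nextToken(ref Cursor cur, int function(const(char)[]) scan,
               out const(char)[] token)
{
    auto rest = cur.input[cur.pos .. $];
    if (rest.length == 0)
        return false;              // beyond end of input: token stays empty
    immutable i = scan(rest);
    if (i < 0)
    {
        token = rest;              // no delimiter: trailing content is the token
        cur.pos = cur.input.length;
        return true;
    }
    token = rest[0 .. i];          // delimiter stripped from the token
    cur.pos += cast(size_t) i + 1; // skip the single-character delimiter
    return true;
}
```

Keeping all mutable state in the cursor rather than in the scanner is one way to obtain the thread safety mentioned above: a single scanner can be shared by threads that each use their own cursor.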

int notFound ( Token token, char[] content ) [inline, inherited]

Clean up after a failure to find a token. Trailing content is placed into the token, and the scanner is told to try to load more content (where available).

Definition at line 114 of file Tokenizer.d.

References Token::set().

Referenced by next(), LineTokenizer::next(), PunctTokenizer::next(), SpaceTokenizer::next(), and SimpleTokenizer::next().


Member Data Documentation

import std.regexp

Definition at line 398 of file Tokenizer.d.

RegExp exp [private]

Definition at line 400 of file Tokenizer.d.

Referenced by next(), and this().


The documentation for this class was generated from the following file: Tokenizer.d
Generated on Sun Nov 7 19:07:09 2004 for Mango by doxygen 1.3.6