Sharkysoft home
java.lang.Object
  |
  +--lava.text.html.HtmlParser
Parses HTML source.
Details: This class parses HTML source by separating the source components into tags, text, and comments. HtmlParser reads text from a PushbackReader and returns a stream of objects representing parsed entities. Each of these objects is an instance of HtmlComponent, which has many subclasses (refer to the see-also section).
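To make the separation into tags, text, and comments concrete, here is a minimal toy tokenizer. It is a hypothetical sketch, not Sharkysoft's implementation: the names ToyHtmlTokenizer, Token, and Kind are invented, and it distinguishes only three token kinds rather than the full HtmlComponent hierarchy. It does suggest why a PushbackReader is a natural input type: when a text run ends at a `<`, that character is pushed back so the next call starts on the tag.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy tokenizer; an illustration, NOT the lava.text.html implementation. */
public class ToyHtmlTokenizer {
    enum Kind { TEXT, TAG, COMMENT }

    static final class Token {
        final Kind kind;
        final String source;
        Token(Kind kind, String source) { this.kind = kind; this.source = source; }
        @Override public String toString() { return kind + ":" + source; }
    }

    private final PushbackReader in;

    ToyHtmlTokenizer(PushbackReader in) { this.in = in; }

    /** Returns the next token, or null at end of input. */
    Token next() throws IOException {
        int c = in.read();
        if (c == -1) return null;
        StringBuilder sb = new StringBuilder();
        if (c != '<') {
            // Text runs up to the next '<', which is pushed back so the next call sees it.
            while (c != -1 && c != '<') {
                sb.append((char) c);
                c = in.read();
            }
            if (c == '<') in.unread(c);
            return new Token(Kind.TEXT, sb.toString());
        }
        // Markup: read through '>' (a comment must end with "-->").
        // Toy simplification: '>' inside quoted attribute values is not handled.
        sb.append('<');
        while ((c = in.read()) != -1) {
            sb.append((char) c);
            String s = sb.toString();
            if (c == '>' && (!s.startsWith("<!--") || s.endsWith("-->"))) break;
        }
        String s = sb.toString();
        return new Token(s.startsWith("<!--") ? Kind.COMMENT : Kind.TAG, s);
    }

    /** Tokenizes a string; StringReader does not fail, so IOException is rethrown unchecked. */
    static List<Token> tokenize(String html) {
        ToyHtmlTokenizer t = new ToyHtmlTokenizer(new PushbackReader(new StringReader(html)));
        List<Token> out = new ArrayList<>();
        try {
            for (Token tok = t.next(); tok != null; tok = t.next()) out.add(tok);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        for (Token tok : tokenize("Hi <b>there</b><!-- a > b -->!")) System.out.println(tok);
    }
}
```

Running main prints one token per line: TEXT:Hi , TAG:<b>, TEXT:there, TAG:</b>, COMMENT:<!-- a > b -->, and TEXT:!. Note that the `>` inside the comment does not end it, because comments are only closed by `-->`.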
To see how HtmlParser parses and tokenizes HTML source, try the following sample program on your favorite URL.
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import lava.io.
Changes: peek().
See Also:
HtmlComponent, HtmlText, HtmlRegularTag, HtmlOpenTag, HtmlCloseTag, HtmlSpecialTag, HtmlComment, HtmlError
Constructor Summary
HtmlParser(java.io.PushbackReader in)
    Sets HTML source.
Method Summary
void close()
    Closes source input stream.
static boolean isCloseTag(HtmlComponent c, java.lang.String type)
static boolean isOpenTag(HtmlComponent c, java.lang.String type)
HtmlComponent parse()
    Parses one HTML element.
HtmlComponent peek()
    Peeks at next component without consuming.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Constructor Detail
public HtmlParser(java.io.PushbackReader in)
Details: This constructor sets the PushbackReader from which this HtmlParser reads.
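The constructor expects a PushbackReader, so a byte stream such as the one returned by URL.openStream() must first be decoded and wrapped. The sketch below assumes the caller knows the charset (UTF-8 in the example) and picks an arbitrary pushback buffer size of 16; this page does not document how much pushback HtmlParser needs, and PushbackReader.unread throws an IOException when its buffer overflows. The helper name HtmlSourceReaders.open is hypothetical.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class HtmlSourceReaders {
    /**
     * Hypothetical helper: wraps a byte stream (for example, one returned by
     * URL.openStream()) in a PushbackReader suitable for the HtmlParser constructor.
     * Both the charset and the pushback buffer size are caller assumptions.
     */
    static PushbackReader open(InputStream bytes, Charset charset, int pushbackSize) {
        return new PushbackReader(
                new BufferedReader(new InputStreamReader(bytes, charset)), pushbackSize);
    }

    public static void main(String[] args) throws IOException {
        byte[] page = "<p>caf\u00e9</p>".getBytes(StandardCharsets.UTF_8);
        try (PushbackReader in = open(new ByteArrayInputStream(page), StandardCharsets.UTF_8, 16)) {
            // With the lava library on the classpath, this is where one would write:
            //   HtmlParser parser = new HtmlParser(in);
            int first = in.read();   // '<'
            in.unread(first);        // lookahead: push the character back
            System.out.println((char) in.read()); // prints <
        }
    }
}
```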
Parameters:
in - the PushbackReader from which HTML source is read

Method Detail
public HtmlComponent parse() throws java.io.IOException
Details: This method parses one element from the HTML source stream and returns it. Use the instanceof operator to determine the type of element that was parsed. parse returns null if no more elements can be parsed.
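The following sketch shows the consumption loop that parse's contract suggests: call parse until it returns null, and branch on the component type with instanceof. Because the lava library is not assumed to be on the classpath, Component, Text, OpenTag, Comment, and ComponentSource are hypothetical stand-ins for HtmlComponent, some of its subclasses, and an HtmlParser.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

public class ParseLoopSketch {
    // Hypothetical stand-ins for HtmlComponent and a few of its subclasses.
    static abstract class Component {}
    static final class Text extends Component { final String text; Text(String t) { text = t; } }
    static final class OpenTag extends Component { final String name; OpenTag(String n) { name = n; } }
    static final class Comment extends Component {}

    /** Hypothetical stand-in for the parse() contract: next component, or null when exhausted. */
    interface ComponentSource { Component parse() throws IOException; }

    /** Drains the source, using instanceof to tell component types apart. */
    static String summarize(ComponentSource src) throws IOException {
        StringBuilder out = new StringBuilder();
        Component c;
        while ((c = src.parse()) != null) {   // null signals that no more elements remain
            if (c instanceof Text) {
                out.append("text[").append(((Text) c).text).append("] ");
            } else if (c instanceof OpenTag) {
                out.append("open[").append(((OpenTag) c).name).append("] ");
            } else if (c instanceof Comment) {
                out.append("comment ");
            } else {
                out.append("other ");
            }
        }
        return out.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        Deque<Component> queue = new ArrayDeque<>(Arrays.asList(
                new OpenTag("p"), new Text("hello"), new Comment()));
        System.out.println(summarize(queue::poll)); // open[p] text[hello] comment
    }
}
```

If, as their names suggest, HtmlOpenTag and HtmlCloseTag extend HtmlRegularTag, test the more specific classes first in a real instanceof chain; otherwise the superclass branch would catch them.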
Throws:
java.io.IOException - if the source stream cannot be read

public HtmlComponent peek() throws java.io.IOException
Details: This method returns the next component without consuming it. The object returned by this method is the same physical object that parse will return the next time it is called.
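The peek contract amounts to a one-element lookahead buffer. Here is a minimal sketch of that pattern; the class Lookahead is hypothetical and wraps an Iterator rather than an HTML source. peek fetches and caches the next item, and parse hands back the cached object, so identity (==) is preserved between the two calls.

```java
import java.util.Arrays;
import java.util.Iterator;

/** Hypothetical one-item lookahead wrapper illustrating the peek()/parse() contract. */
public class Lookahead<T> {
    private final Iterator<T> source;
    private T buffered;           // item fetched by peek() but not yet consumed
    private boolean hasBuffered;  // true while 'buffered' holds a pending item (or end-of-input null)

    public Lookahead(Iterator<T> source) { this.source = source; }

    /** Returns the next item without consuming it, or null if none remain. */
    public T peek() {
        if (!hasBuffered) {
            buffered = source.hasNext() ? source.next() : null;
            hasBuffered = true;
        }
        return buffered;
    }

    /** Consumes and returns the next item: the same object a preceding peek() returned. */
    public T parse() {
        T result = peek();
        buffered = null;
        hasBuffered = false;
        return result;
    }

    public static void main(String[] args) {
        Lookahead<StringBuilder> la = new Lookahead<>(
                Arrays.asList(new StringBuilder("a"), new StringBuilder("b")).iterator());
        StringBuilder peeked = la.peek();
        System.out.println(peeked == la.parse()); // true: same physical object
        System.out.println(la.parse());           // b
        System.out.println(la.parse());           // null
    }
}
```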
Throws:
java.io.IOException - if an I/O error occurs

public void close() throws java.io.IOException
Details: This method closes the HTML source input stream. Of course, no more HTML tokens can be parsed after this method is called.
Throws:
java.io.IOException - if an I/O error occurs

public static boolean isOpenTag(HtmlComponent c, java.lang.String type)
public static boolean isCloseTag(HtmlComponent c, java.lang.String type)