Monday, November 08, 2010 at 12:03 AM.
system.verbs.builtins.html.parseLinks
on parseLinks (htmltext, adrlinks) {
	<<Changes
		<<6/3/02; 7:59:19 AM by DW
			<<Created.
	<<A very simple parser with a single goal: pull all the <link>s out of an HTML document and return a table with the extracted information.
	<<Each sub-table of the table pointed to by adrlinks contains the attributes of one of the <link>s. We use a trick to build it: each link element is turned into an empty XML element (ending in "/>") and passed through xml.compile. The compiled element's attributes sub-table ("/atts") is then copied into the returned links table. Works like a charm.
	<<We don't check that the document is valid HTML. For example, it accepts link elements in the body, not just in the head.
	<<Note that the whole document is lowercased before it's scanned, so attribute values, including hrefs, come back in lowercase.
	<<If there's an error, we return false, and adrlinks contains the links we found before the error.
	local (s = string.lower (htmltext), ct = 1);
	new (tabletype, adrlinks);
	try {
		loop {
			local (ix = string.patternmatch ("<link", s));
			if ix == 0 {
				break};
			s = string.delete (s, 1, ix - 1);
			ix = string.patternmatch (">", s);
			local (linktext = string.mid (s, 1, ix));
			s = string.delete (s, 1, ix);
			local (linkstruct);
			if not (linktext endswith "/>") {
				linktext = string.replace (linktext, ">", "/>")};
			xml.compile (linktext, @linkstruct);
			local (adrsubtable = @adrlinks^.[string.padwithzeros (ct++, 3)]);
			adrsubtable^ = linkstruct [1].["/atts"]};
		return (true)}
	else {
		return (false)}}
<<bundle //test code
	<<if not defined (scratchpad.dhrbhome)
		<<scratchpad.dhrbhome = tcp.httpreadurl ("http://radio.weblogs.com/0001015/")
	<<parselinks (scratchpad.dhrbhome, @scratchpad.links)
This listing is for code that runs in the OPML Editor environment. I created these listings because I wanted the search engines to index them, so that when I want to look up something in my codebase I don't have to use the much slower search functionality in my object database. Dave Winer.
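For readers without the OPML Editor, here is a minimal Python sketch of the same trick, assuming Python 3 and the standard library's xml.etree.ElementTree standing in for xml.compile. The function name parse_links, the (ok, links) return value, and treating an unterminated tag as an error are hypothetical choices made for illustration; this is a port of the idea, not the handler itself.

```python
import xml.etree.ElementTree as ET


def parse_links(html_text):
    """Hypothetical Python port of the parseLinks trick.

    Returns (ok, links), where links maps "001", "002", ... to a dict of
    one <link>'s attributes. On a parse error, ok is False and links holds
    whatever was found before the error, as in the UserTalk version.
    """
    s = html_text.lower()  # like the original, lowercases attribute values too
    links = {}
    ct = 1
    try:
        while True:
            ix = s.find("<link")
            if ix == -1:
                break
            s = s[ix:]  # drop everything before the tag
            end = s.find(">")
            if end == -1:
                # Assumption: an unterminated tag is treated as an error.
                raise ValueError("unterminated <link> tag")
            link_text = s[: end + 1]  # "<link ...>" including the ">"
            s = s[end + 1:]
            if not link_text.endswith("/>"):
                # Make it an empty element so it is well-formed XML.
                link_text = link_text[:-1] + "/>"
            elem = ET.fromstring(link_text)
            links[f"{ct:03d}"] = dict(elem.attrib)
            ct += 1
    except (ET.ParseError, ValueError):
        return False, links
    return True, links
```

Calling it on `<LINK REL="alternate" HREF="http://example.com/RSS.xml">` returns `(True, {"001": {"rel": "alternate", "href": "http://example.com/rss.xml"}})`, which also shows the lowercasing of the href value. A link with an unquoted attribute, such as `<link rel=b>`, isn't well-formed XML, so parsing stops there and ok comes back False.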