Monday, November 08, 2010 at 12:03 AM.
system.verbs.builtins.html.parseLinks
on parseLinks (htmltext, adrlinks) {
	<<Changes
		<<6/3/02; 7:59:19 AM by DW
			<<Created.
			<<A very simple parser with a single goal: pull all the <link> elements out of an HTML document, and return a table with the extracted information.
			<<Each sub-table of the table pointed to by adrlinks will contain the attributes of one of the <link>s. We use a trick to build this: for each link element, we convert it to XML and pass it through xml.compile. Its attributes sub-table is then copied into the returned links table. Works like a charm.
			<<We don't make sure that it's valid HTML. In other words, it will accept link elements that are in the body, not just in the head.
			<<If there's an error, adrlinks contains the links we found before the error, and we return false.
	local (s = string.lower (htmltext), ct = 1);
	new (tabletype, adrlinks);
	try {
		loop {
			local (ix = string.patternmatch ("<link", s));
			if ix == 0 { //no more link elements
				break};
			s = string.delete (s, 1, ix - 1); //drop everything before the element
			ix = string.patternmatch (">", s);
			local (linktext = string.mid (s, 1, ix)); //the element, through its closing ">"
			s = string.delete (s, 1, ix);
			local (linkstruct);
			if not (linktext endswith "/>") { //make it self-closing so it compiles as XML
				linktext = string.replace (linktext, ">", "/>")};
			xml.compile (linktext, @linkstruct);
			local (adrsubtable = @adrlinks^.[string.padwithzeros (ct++, 3)]);
			adrsubtable^ = linkstruct [1].["/atts"]};
		return (true)}
	else {
		return (false)}}
<<bundle //test code
	<<if not defined (scratchpad.dhrbhome)
		<<scratchpad.dhrbhome = tcp.httpreadurl ("http://radio.weblogs.com/0001015/")
	<<parselinks (scratchpad.dhrbhome, @scratchpad.links)
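For readers without the OPML Editor at hand, here is a rough Python sketch of the same approach. It is not the UserTalk code and not a drop-in replacement. The function name `parse_links` and the `(ok, links)` return shape are assumptions made for this sketch, and Python's `xml.etree.ElementTree` stands in for `xml.compile`. It scans for each `<link`, cuts the element out through its closing `>`, makes it self-closing, parses it as XML, and stores the attributes under zero-padded keys.

```python
import xml.etree.ElementTree as ET


def parse_links(html_text):
    """Return (ok, links); links maps "001", "002", ... to attribute dicts."""
    # Like the listing, lowercase the whole text first. Note that this also
    # lowercases attribute values such as URLs.
    s = html_text.lower()
    links = {}
    count = 1
    try:
        while True:
            start = s.find("<link")
            if start == -1:
                break  # no more link elements
            s = s[start:]  # drop everything before the element
            end = s.find(">")
            if end == -1:
                break  # unterminated element: stop scanning (a choice made for this sketch)
            link_text = s[: end + 1]
            s = s[end + 1:]
            if not link_text.endswith("/>"):
                link_text = link_text[:-1] + "/>"  # self-close so it parses as XML
            element = ET.fromstring(link_text)
            links[f"{count:03d}"] = dict(element.attrib)
            count += 1
        return True, links
    except ET.ParseError:
        # On a parse error, links still holds whatever was found before it.
        return False, links


if __name__ == "__main__":
    sample = '<head><link rel="stylesheet" href="/style.css"></head>'
    print(parse_links(sample))
```

As in the listing, a malformed element (for example an unquoted attribute value) stops the scan, and the links collected up to that point are still returned alongside the false result.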
This listing is for code that runs in the OPML Editor environment. I created these listings because I wanted the search engines to index them, so that when I want to look something up in my codebase I don't have to use the much slower search functionality in my object database. Dave Winer.