Java开发网 - Question (about the htmlparser open-source package)

Topic: Question (about the htmlparser open-source package)

Posted by: fishman
Posted on: 2005-04-20 03:16

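I wrote a program that parses an HTML page to do the following:
1. get the <a> tags
2. pick out the <a> tags whose text is the key (汽车, "car")
3. read the value of the href attribute of those <a> tags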
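The three steps above can be tried out without htmlparser at all. As a minimal sketch, the code below swaps in the JDK's built-in Swing HTML parser (`javax.swing.text.html.parser.ParserDelegator` with an `HTMLEditorKit.ParserCallback`) instead of htmlparser, and parses a hard-coded HTML string instead of fetching a live site. The class name `KeyLinkFinder`, the method `findHrefs`, and the sample URLs are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper built on the JDK's Swing HTML parser, NOT on htmlparser.
public class KeyLinkFinder {

    // Returns the href of every <a> tag whose (trimmed) text equals key.
    public static List<String> findHrefs(String html, String key) throws IOException {
        List<String> hrefs = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private String currentHref;                        // href of the open <a>, or null
            private final StringBuilder text = new StringBuilder(); // text seen inside that <a>

            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                if (t == HTML.Tag.A) {                          // step 1: an <a> tag
                    Object href = a.getAttribute(HTML.Attribute.HREF);
                    currentHref = (href == null) ? null : href.toString();
                    text.setLength(0);
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (currentHref != null) {
                    text.append(data);                          // collect the link text
                }
            }

            @Override
            public void handleEndTag(HTML.Tag t, int pos) {
                if (t == HTML.Tag.A && currentHref != null) {
                    if (text.toString().trim().equals(key)) {   // step 2: text matches key
                        hrefs.add(currentHref);                 // step 3: keep its href
                    }
                    currentHref = null;
                }
            }
        };
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return hrefs;
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body>"
                + "<a href=\"http://auto.example.com/\">汽车</a>"
                + "<a href=\"http://news.example.com/\">新闻</a>"
                + "</body></html>";
        System.out.println(findHrefs(html, "汽车"));
    }
}
```

Working on a fixed string keeps the sketch independent of whatever markup a live portal happens to serve; to use a real page you would read its content into a `Reader` with the correct character set first.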

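The program works on the 163 site, but it fails on Sina and Sohu. When I debug it, I get an error saying it cannot find IteratorImpl.class in org.htmlparser.util. I'm quite puzzled: if that class is missing, how can it parse 163 at all? Could someone who knows htmlparser well give me some pointers?
The program is roughly as follows: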

import org.htmlparser.*;
import org.htmlparser.util.NodeIterator;
import org.htmlparser.util.NodeList;
import org.htmlparser.lexer.nodes.TagNode;

/**
 * @author zhangyu
 */
public class ParserHref {
    public static void main(String[] args) throws Exception {
        // Get the Nodes for all top-level tags on the page
        Parser parser = new Parser("http://www.163.com");
        NodeIterator iterator = parser.elements();
        // Visit every top-level node, not only the first one: a page may start
        // with a DOCTYPE, comment or text node before <html>
        while (iterator.hasMoreNodes()) {
            // Call the recursive traversal method
            getLinks(iterator.nextNode());
        }
    }

    // Recursively walk all child nodes and report the ones that match
    public static void getLinks(Node pNode) {
        NodeList children = pNode.getChildren();
        if (children == null) {
            return; // leaf node (text, comment, DOCTYPE, ...) has no children
        }
        for (int i = 0; i < children.size(); i++) {
            Node node = children.elementAt(i);
            if (node instanceof TagNode) {
                TagNode tag = (TagNode) node;
                // Match an <A> tag whose text is "汽车" (car)
                if (tag.getTagName().equals("A")
                        && tag.toPlainTextString().trim().equals("汽车")) {
                    // Print the "href" attribute of the matching tag
                    System.out.println("汽车:" + tag.getAttribute("href"));
                }
                // Recurse into this tag's children
                getLinks(tag);
            }
        }
    }
}
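The recursive walk done by getLinks can be illustrated without htmlparser. This sketch uses a hypothetical `SimpleNode` class (not part of htmlparser) whose `children` list is null for leaf nodes, mirroring how `getChildren()` can return null for nodes without children. It shows why the walk should check for null children before calling `size()`, and why every top-level node should be visited rather than only the first one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TreeWalkSketch {

    // Hypothetical stand-in for an HTML node; NOT an htmlparser class.
    static final class SimpleNode {
        final String tagName;            // e.g. "A"; null for a text node
        final String text;               // content of a text node; null for tags
        final String href;               // href attribute, or null
        final List<SimpleNode> children; // null for leaf nodes

        SimpleNode(String tagName, String text, String href, List<SimpleNode> children) {
            this.tagName = tagName;
            this.text = text;
            this.href = href;
            this.children = children;
        }

        static SimpleNode text(String s) {
            return new SimpleNode(null, s, null, null);
        }

        static SimpleNode tag(String name, String href, SimpleNode... kids) {
            return new SimpleNode(name, null, href, Arrays.asList(kids));
        }

        // Concatenated text of this node and all its descendants
        String plainText() {
            if (children == null) {
                return text == null ? "" : text;
            }
            StringBuilder sb = new StringBuilder();
            for (SimpleNode c : children) {
                sb.append(c.plainText());
            }
            return sb.toString();
        }
    }

    // Depth-first walk collecting hrefs of <A> tags whose text equals key
    static void collect(SimpleNode node, String key, List<String> out) {
        if (node.children == null) {
            return; // leaf: calling size() here would throw NullPointerException
        }
        for (SimpleNode child : node.children) {
            if (child.tagName != null) {
                if (child.tagName.equals("A") && child.plainText().trim().equals(key)) {
                    out.add(child.href);
                }
                collect(child, key, out);
            }
        }
    }

    // Visit every top-level node, since the first one may be a leaf
    static List<String> findAll(List<SimpleNode> topLevel, String key) {
        List<String> out = new ArrayList<>();
        for (SimpleNode n : topLevel) {
            collect(n, key, out);
        }
        return out;
    }

    public static void main(String[] args) {
        List<SimpleNode> page = Arrays.asList(
                SimpleNode.text("\n"), // leading text node before <html>
                SimpleNode.tag("HTML", null,
                        SimpleNode.tag("BODY", null,
                                SimpleNode.tag("A", "http://auto.example.com/", SimpleNode.text("汽车")),
                                SimpleNode.tag("A", "http://news.example.com/", SimpleNode.text("新闻")))));
        System.out.println(findAll(page, "汽车"));
    }
}
```

If `findAll` only passed the first top-level node to `collect`, this sample page would yield nothing, because that node is a leaf text node.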