Case 1:
<a onclick="javascript:window.open('images/Chanel ClutchBags/Chanel-Clutch-Bags-021_01.jpg')" rel="nofollow"><img src="bmz_cache/2/27cce32f767cd40252259bfbd2375c7b.image.100x66.jpg" alt="" width="100" height="66" /><br /></a>
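Here the img tag holds only the thumbnail, while the path to the large image sits in the onclick attribute of the a tag. To collect the large image, use a regular expression to match the large-image path inside the a tag, then assemble that path into a new img tag. An example:
Regex match:
Combined result: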
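The match-and-combine step can be sketched in Python as follows. This is only an illustration, not the collector's own rule syntax: the pattern, the `LARGE_IMAGE` name, and the `large_image_tags` helper are assumptions made for this example, and it assumes the large-image path is always the single-quoted argument of `window.open(...)`.

```python
import re

# Sample markup from Case 1: the thumbnail is in <img>, the large image in onclick.
html = '''<a onclick="javascript:window.open('images/Chanel ClutchBags/Chanel-Clutch-Bags-021_01.jpg')" rel="nofollow"><img src="bmz_cache/2/27cce32f767cd40252259bfbd2375c7b.image.100x66.jpg" alt="" width="100" height="66" /><br /></a>'''

# Capture the path passed to window.open('...') inside the onclick attribute.
LARGE_IMAGE = re.compile(r"window\.open\('([^']+)'\)")


def large_image_tags(markup):
    """Return one <img> tag per large-image path found in the markup."""
    return ['<img src="{}" />'.format(path) for path in LARGE_IMAGE.findall(markup)]


for tag in large_image_tags(html):
    print(tag)
```

Capturing only the quoted argument keeps the pattern independent of the other attributes on the a tag, so it still matches if `rel` or the attribute order changes.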
Case 2:
<div>First large image: <img src="images/Chanel ClutchBags/Chanel-Clutch-Bags-021.jpg" bigimg="images/Chanel ClutchBags/Chanel-Clutch-Bags-021.jpg" width="400px" heith="400px" > </div>
<div> <p><a href="#" rel="nofollow"><img align="absmiddle" src="static/images/jing.jpg" alt="" />LARGERIMAGE</a></p> </div>
<div id="addimages"> <br />
<div>Second large image: <a onclick="javascript:window.open('images/Chanel ClutchBags/Chanel-Clutch-Bags-021_01.jpg')" rel="nofollow"><img src="bmz_cache/2/27cce32f767cd40252259bfbd2375c7b.image.100x66.jpg" alt="" width="100" height="66" /><br/></a></div>
<div>Third large image: <a onclick="javascript:window.open('images/Chanel ClutchBags/Chanel-Clutch-Bags-021_02.jpg')" rel="nofollow"><img src="bmz_cache/c/c9456e3732020407d18c51da0e4ef7c9.image.100x66.jpg" alt="" width="100" height="66" /><br/></a></div>
<div>Fourth large image: <a onclick="javascript:window.open('images/Chanel ClutchBags/Chanel-Clutch-Bags-021_03.jpg')" rel="nofollow"><img src="bmz_cache/e/e46ad62b2c82b019346fbec17d1403b4.image.100x66.jpg" alt="" width="100" height="66" /><br/></a></div>
<br /> </div>
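Here the first large image follows a different pattern from the others: its path is in the src attribute of an ordinary img tag, while the remaining large images are referenced from the onclick attribute of a tags. A regex written for the onclick pattern, as in Case 1, would therefore miss the first large image.
Instead, cut out that whole block of HTML by specifying the text that comes immediately before and after it, then strip the unwanted tags and apply content replacements so that the extracted HTML is reduced to img tags only.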
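A rough Python sketch of the cut, replace, and strip sequence is shown below. It is an illustration under several assumptions rather than the collector's actual configuration: the boundary strings `<div class="product">` and `<div class="footer">` are hypothetical page markers, the sample page is shortened to two large images, and `cut_between` and `to_img_tags` are helper names invented for this example.

```python
import re

# Shortened sample page; the "product" and "footer" wrappers are hypothetical.
page = """<html><body><div class="product">
<div>First large image: <img src="images/Chanel ClutchBags/Chanel-Clutch-Bags-021.jpg" bigimg="images/Chanel ClutchBags/Chanel-Clutch-Bags-021.jpg" width="400px" heith="400px" > </div>
<div> <p><a href="#" rel="nofollow"><img align="absmiddle" src="static/images/jing.jpg" alt="" />LARGERIMAGE</a></p> </div>
<div id="addimages"> <br />
<div>Second large image: <a onclick="javascript:window.open('images/Chanel ClutchBags/Chanel-Clutch-Bags-021_01.jpg')" rel="nofollow"><img src="bmz_cache/2/27cce32f767cd40252259bfbd2375c7b.image.100x66.jpg" alt="" width="100" height="66" /><br/></a></div>
<br /> </div>
</div><div class="footer">...</div></body></html>"""


def cut_between(text, start, end):
    """Return the text between the first `start` and the next `end`, or ''."""
    begin = text.find(start)
    if begin == -1:
        return ""
    begin += len(start)
    stop = text.find(end, begin)
    return text[begin:stop] if stop != -1 else ""


def to_img_tags(block):
    """Reduce an HTML block to bare <img> tags for the large images."""
    # Content replacement: a thumbnail link becomes an <img> for its large image.
    block = re.sub(
        r"<a\b[^>]*window\.open\('([^']+)'\)[^>]*>.*?</a>",
        r'<img src="\1" />',
        block,
        flags=re.S,
    )
    # Content replacement: drop any remaining link, such as the LARGERIMAGE icon.
    block = re.sub(r"<a\b[^>]*>.*?</a>", "", block, flags=re.S)
    # Tag removal: keep only the src of each surviving <img> and rebuild it.
    sources = re.findall(r'<img\b[^>]*?\bsrc="([^"]+)"', block)
    return "\n".join('<img src="{}" />'.format(src) for src in sources)


block = cut_between(page, '<div class="product">', '<div class="footer">')
print(to_img_tags(block))
```

Because the block is cut out first, the replacements only have to cope with the markup inside it, which is why the first large image and the onclick-based ones can be handled in the same pass.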