How to Set Up Rules to Block Spider Crawling (Apache and IIS)

Website Building  2022-07-27 09:12  www.1681989.com

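Most of the time we want our site to be crawled by as many search engines as possible, because that brings in traffic and value. Many small sites, however, end up being crawled heavily by large numbers of search engine spiders for reasons that are hard to predict, and that inevitably consumes a lot of bandwidth.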

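Spider crawl records normally show up in the site's access log. If spiders crawl too aggressively, they can overload or even crash the web server and degrade the experience for real users. In that case we need to block the spiders that bring no benefit and forbid them from crawling the site. 余斗 generally advises against blocking the mainstream domestic search engine spiders. Some common search engine spiders are listed below: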

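Google spider: googlebot

Baidu spider: baiduspider

Yahoo spider: slurp

Alexa spider: ia_archiver

MSN spider: msnbot

Bing spider: bingbot

AltaVista spider: scooter

Lycos spider: lycos_spider_(t-rex)

AllTheWeb spider: fast-webcrawler

Inktomi spider: slurp

Youdao spider: YodaoBot and OutfoxBot

热土 spider: Adminrtspider

Sogou spider: sogou spider

SOSO spider: sosospider

360 Search spider: 360spider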
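
Before deciding what to block, it helps to see which spiders actually dominate the access log. The following is a minimal sketch, not a definitive tool: it assumes the log is in the Apache "combined" format (User-Agent as the last quoted field), and the function name `count_spider_hits` and the sample log lines are made up for illustration. The tokens come from the list above, except that "sogou spider" is shortened to "sogou" on the assumption that real user-agent strings may put other words between the two.

```python
import re
from collections import Counter

# User-agent tokens from the list above, matched case-insensitively.
# Assumption: "sogou spider" is shortened to "sogou" so that variants
# with extra words between "sogou" and "spider" are still counted.
SPIDER_TOKENS = [
    "googlebot", "baiduspider", "slurp", "ia_archiver", "msnbot", "bingbot",
    "scooter", "lycos_spider_(t-rex)", "fast-webcrawler", "YodaoBot",
    "OutfoxBot", "Adminrtspider", "sogou", "sosospider", "360spider",
]

# Assumption: Apache "combined" log format, where the User-Agent is the
# last double-quoted field on each line.
UA_FIELD = re.compile(r'"([^"]*)"\s*$')


def count_spider_hits(lines):
    """Return a Counter mapping spider token -> number of matching requests."""
    hits = Counter()
    for line in lines:
        match = UA_FIELD.search(line)
        if not match:
            continue  # not a combined-format line; skip it
        user_agent = match.group(1).lower()
        for token in SPIDER_TOKENS:
            if token.lower() in user_agent:
                hits[token] += 1
                break  # count each request at most once
    return hits


if __name__ == "__main__":
    # Hypothetical sample lines; in practice, pass an open log file instead.
    sample = [
        '1.2.3.4 - - [27/Jul/2022:09:12:00 +0800] "GET / HTTP/1.1" 200 512 '
        '"-" "Mozilla/5.0 (compatible; Baiduspider/2.0)"',
        '5.6.7.8 - - [27/Jul/2022:09:12:01 +0800] "GET /a HTTP/1.1" 200 128 '
        '"-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    ]
    for token, count in count_spider_hits(sample).most_common():
        print(token, count)
```

Since a file object iterates line by line, `count_spider_hits(open(path))` would work on a real log, with `path` pointing at wherever the server writes its access log.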

On Linux: rule file .htaccess (manually create the .htaccess file in the site root directory)


<IfModule mod_rewrite.c>
RewriteEngine On
# Block spider: match the User-Agent against the blocklist, case-insensitively ([NC])
RewriteCond %{HTTP_USER_AGENT} "Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|mail.RU|curl|perl|Python|Wget|Xenu|ZmEu" [NC]
# Return 403 Forbidden for every URL except robots.txt
RewriteRule !(^robots\.txt$) - [F]
</IfModule>
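
To check which visitors the rule above would turn away before deploying it, the same logic can be approximated outside the web server. This is only a sketch under stated assumptions: `is_forbidden` is a hypothetical helper, Python's `re.IGNORECASE` stands in for the `[NC]` flag, and the path handling assumes the .htaccess file sits in the site root.

```python
import re

# The same alternation as in the RewriteCond; re.IGNORECASE plays the role of [NC].
BLOCKED_AGENTS = re.compile(
    "Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|"
    "Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|"
    "YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|mail.RU|curl|perl|"
    "Python|Wget|Xenu|ZmEu",
    re.IGNORECASE,
)


def is_forbidden(user_agent: str, path: str) -> bool:
    """Approximate the rule: 403 for blocked agents, except for robots.txt."""
    # robots.txt in the site root stays reachable for every client.
    if path.lstrip("/") == "robots.txt":
        return False
    # The pattern is unanchored, so a match anywhere in the User-Agent counts.
    return BLOCKED_AGENTS.search(user_agent) is not None


if __name__ == "__main__":
    print(is_forbidden("Mozilla/5.0 (compatible; AhrefsBot/7.0)", "/index.html"))
    print(is_forbidden("Mozilla/5.0 (compatible; Baiduspider/2.0)", "/index.html"))
```

Because the match is an unanchored substring match, generic tokens such as curl, Python and Wget also block command-line tools and scripts that send those default user agents, which is worth keeping in mind if your own monitoring relies on them.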
 

On Windows 2003: edit the rule file httpd.conf (in the virtual host control panel, use "ISAPI Filter Custom Settings" to enable custom URL rewriting with ISAPI_Rewrite 3.1)


#Block spider
RewriteCond %{HTTP_USER_AGENT} (Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|mail.RU|curl|perl|Python|Wget|Xenu|ZmEu) [NC]
# Return 403 Forbidden for every URL except /robots.txt
RewriteRule !(^/robots\.txt$) - [F]
 

On Windows 2008: edit the web.config configuration file in the site root directory


<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <system.webServer>
        <rewrite>
            <rules>
                <!-- Return 403 to blocklisted user agents for every URL except robots.txt -->
                <rule name="Block spider">
                    <match url="(^robots\.txt$)" ignoreCase="false" negate="true"/>
                    <conditions>
                        <add input="{HTTP_USER_AGENT}" pattern="Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|curl|perl|Python|Wget|Xenu|ZmEu" ignoreCase="true"/>
                    </conditions>
                    <action type="CustomResponse" statusCode="403" statusReason="Forbidden" statusDescription="Forbidden"/>
                </rule>
            </rules>
        </rewrite>
    </system.webServer>
</configuration>
 

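Note: by default the rules block a number of unidentified spiders. To block other spiders, add them following the same pattern: edit the Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|mail.RU|curl|perl|Python|Wget|Xenu|ZmEu portion of the code, adding or removing names to match the spiders you want to block.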
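
Editing a long pipe-separated pattern by hand is error-prone, so one option is to keep the names in a list and generate the pattern from it. The sketch below is one possible approach and not part of the original rules: `build_pattern` and the name ExampleBot are hypothetical. It also escapes dots (as in mail.RU) so they match literally; the original pattern leaves the dot unescaped, which still works but matches any character in that position.

```python
import re

# Default names from the note above, kept as a list for easier editing.
DEFAULT_BLOCKLIST = [
    "Webdup", "AcoonBot", "AhrefsBot", "Ezooms", "EdisterBot", "EC2LinkFinder",
    "jikespider", "Purebot", "MJ12bot", "WangIDSpider", "WBSearchBot", "Wotbox",
    "xbfMozilla", "Yottaa", "YandexBot", "Jorgee", "SWEBot", "spbot",
    "TurnitinBot-Agent", "mail.RU", "curl", "perl", "Python", "Wget", "Xenu",
    "ZmEu",
]


def build_pattern(blocklist, add=(), remove=()):
    """Return the pipe-separated pattern after adding and removing names.

    Names are compared case-insensitively, because the rules themselves
    match the User-Agent case-insensitively.
    """
    removed = {name.lower() for name in remove}
    seen = set()
    names = []
    for name in list(blocklist) + list(add):
        key = name.lower()
        if key in removed or key in seen:
            continue  # dropped on request, or already present
        seen.add(key)
        names.append(name)
    # Escape dots so that e.g. "mail.RU" only matches a literal dot.
    return "|".join(name.replace(".", r"\.") for name in names)


if __name__ == "__main__":
    # ExampleBot is a made-up name used only to show how a spider is added.
    print(build_pattern(DEFAULT_BLOCKLIST, add=["ExampleBot"], remove=["YandexBot"]))
```

The printed string can be pasted into the user-agent pattern of whichever of the three rule files applies to your server.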

