Apache :: Blocking Bad Bots With HTAccess - What Is The Right Syntax
Apr 23, 2015
I am having a problem with blocking bots using .htaccess. I think I have tried every possible syntax variant, yet all the bots that I am blocking get an HTTP 200 response instead of 403 (I can verify this in the access log).
I am using Apache 2.4 running on Ubuntu 14.04.2 with Plesk 12.0.18.
My AllowOverride is set to allow the use of .htaccess files, so the .htaccess file does get loaded: when I make an error in the .htaccess syntax I can see the error in the error log and the webpages don't load. Besides, I have some "Deny from [IP address]" directives in the .htaccess, and I can see that those IPs get an HTTP 403 response when they access my site.
I spent hours trying different variants of the .htaccess syntax (see below) and none of them seems to work...
variant 0:
SetEnvIfNoCase User-Agent LivelapBot bad_bot
SetEnvIfNoCase User-Agent TurnitinBot bad_bot
Order allow,deny
Allow from all
Deny from env=bad_bot
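Note: on Apache 2.4 the Order/Allow/Deny directives are deprecated and only work when mod_access_compat is loaded, which may explain inconsistent behavior under a Plesk setup. A minimal sketch of the same idea in native 2.4 syntax (untested against this exact configuration):
SetEnvIfNoCase User-Agent LivelapBot bad_bot
SetEnvIfNoCase User-Agent TurnitinBot bad_bot
<RequireAll>
Require all granted
Require not env bad_bot
</RequireAll>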
If I know the IP range that I want to block, the best option is to block it with iptables. This works well when you want to block entire countries. But what happens when you want to block specific IPs rather than ranges? Is iptables still more effective than "Deny from [IP]" in .htaccess? I have read that you don't want the iptables rule set to grow too big because it hurts performance, but I guess it is still more efficient than a big .htaccess?
When it comes to blocking spam bots or referrers, robots.txt is just a suggestion; when I looked at my traffic logs I noticed that most bots don't even request the robots.txt file. As far as I understand, the only option here is to use .htaccess.
1. I am currently using this in my .htaccess:
SetEnvIfNoCase User-Agent *ahrefsbot* bad_bot=yes
SetEnvIfNoCase Referer fbdownloader.com spammer=yes
...
SetEnvIfNoCase Referer social-buttons.com spammer=yes
Order allow,deny
Allow from all
Deny from env=spammer
Deny from env=bad_bot
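One thing worth double-checking in variant #1: SetEnvIfNoCase takes a regular expression, not a shell wildcard, so *ahrefsbot* begins with a quantifier and is not a valid pattern. A plain substring already works as a regex match, e.g.:
SetEnvIfNoCase User-Agent ahrefsbot bad_bot=yes
SetEnvIfNoCase Referer fbdownloader\.com spammer=yes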
2. Apparently, there is another approach, as per below:
# Deny domain access to spammers
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} queryseeker [OR]
RewriteCond %{HTTP_REFERER} ^(www.)?.*(-|.)?adult(-|.).*$ [OR]
...
RewriteCond %{HTTP_REFERER} ^(www.)?.*(-|.)?sex(-|.).*$
RewriteRule .* - [F,L]
Which approach is better, #1 or #2? Is there a better alternative?
Finally, somebody suggested that you need to have both (as per the example below). Is that true?
I'm tired of people in India hitting our website (because it is a top hit on Google and the other search engines) and then calling the next day to bug me about using them for outsourcing.
I am going to block some IP blocks in my .htaccess file to prevent this.
I can see from my StatCounter logs that the hits from India so far have come from 59.*, 102.* and 203.* (as in 59.###.###.###).
Is there a place I can look this up, to find out whether blocking those will also block some North American IPs (since I'm using such a broad wildcard)?
All our paying business comes from North America.
my htaccess file will look like this:
Code:
# prevents a directory listing when typing in the directory path in the browser
Options -Indexes
#
# My effort to keep India sites from seeing our website
order allow,deny
deny from 203.
deny from 59.
allow from all
Looking through my logs I found something that bothers me: there are bots that keep making requests to my website for pages like /admin or /secure to find vulnerabilities. Each makes about 5-6 requests per second for nonexistent pages until it reaches the end of its dictionary (the pages are even sorted in alphabetical order).
Is there some way to make my Apache server block these bots when they make X attempts to reach a page that does not exist within a short amount of time? A bit like iptables rejecting the connection when someone tries to log in but fails too many times.
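There is no stock Apache counter for repeated 404s, but mod_evasive comes close by rate-limiting requests per IP, which catches a dictionary scanner making 5-6 requests per second regardless of status code; fail2ban watching the error log is the other common answer. A rough sketch, assuming the module is installed (its identifier is mod_evasive20.c on 2.0/2.2 builds, mod_evasive24.c on newer ones; the thresholds are illustrative):
<IfModule mod_evasive20.c>
DOSHashTableSize 3097
DOSPageCount 5
DOSPageInterval 1
DOSBlockingPeriod 60
</IfModule>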
I've been trying to figure out some IP blocking with no success. The environment is UNIX and Apache version is 2.2.22-14.
The site is on a hosted solution and doesn't have a firewall due to the virtualization software limitations. I've tried setting something similar to the following:
Code:
<Directory /home/username/mysite.com>
#IP Blocks
Deny from 1.2.3.1/24
Deny from 1.2.4.5
etc..
but with no success. I've also tried it in the <Location> tag with no success.
The way I'm testing this is editing the conf and then bouncing the apache server.
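A common gotcha in 2.2: whether Deny lines take effect depends on the Order directive in force and on any Allow lines merged from elsewhere in the configuration, so it pays to be explicit. A sketch with the placeholder addresses from above and the container closed (the /24 network is conventionally written with a .0 host part):
<Directory /home/username/mysite.com>
Order allow,deny
Allow from all
Deny from 1.2.3.0/24
Deny from 1.2.4.5
</Directory>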
I have a website on a Linux server working fine with PHP/Apache. The page loads a lot of CSS/JS/image stuff (84 requests, 220 KB in total), and it takes about 4 seconds to load over the internet.
Now I'm testing the same page locally on a Win7-64 system (Apache 2.2, PHP 5.4). The system is not particularly slow (8 GB RAM, SSD, i7 CPU), but loading the same page takes about 50 seconds.
The Q is: what might be the problem?
- I turned off firewall and anti-virus.
- I used mod_status: 150 threads, of which at most 11 seem to be in use while the page loads.
- I tried php5apache2_2.dll with TS-PHP 5.4 and mod_fcgid.so with NTS-PHP 5.4, but the loading time stayed almost the same.
Looking at the "network-tab" in FF or Chrome, I found that a lot of subqueries get a timing like this:
Blocking: 11.96 s Sending: 0 Waiting: 1 ms Receiving: 6ms
So the loss of time seem to be in the "blocking"-section. I first thought of something like "limited number of TCP-Connections", but as said above, on the same system the page is remotly loaded fast enough almost without these "blocking"-parts.
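One server-side setting that produces exactly this symptom is keep-alive: if it is off (or its timeout is 0), each of the 84 requests pays for its own TCP connection and the browser queues behind its per-host connection cap, which the network tab reports as "Blocking" time. This is only a guess at the cause, but the relevant httpd.conf knobs look like:
KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 5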
I have been trying to solve a big problem for the last 2 weeks with one of our servers.
The client using our system (web-based, with Apache and PHP) is a contact center firm. They have about 120 operators, all connecting to our webserver from the same IP.
We have been suffering DoS attacks from some of these operators. These are simple browser attacks: 5 or 10 operators will just hold down the F5 key and bombard the server with requests when they shouldn't.
We did manage to produce a PHP protection that recognizes the multiple requests and blacklists the user, but it kicks in "too late", because the requests have already been received and processed by the webserver.
We use the user ID in the system to control who should be blacklisted, so this is all dependent on our own authentication.
Ideally, we need something EXACTLY like mod_evasive, but rejecting single requests instead of blocking the IP. For example: if a user calls the same URL 5 times within a 3-second span, we reject every subsequent request from that user for 30 seconds, but only the requests from that user.
If the webserver can make any use of it, the user id is stored in a cookie.
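Stock Apache has nothing that throttles per cookie, but ModSecurity v2 can key its persistent USER collection off a cookie and count requests against it. A rough sketch of that recipe for the server config, assuming mod_security2 is available and the id lives in a cookie named userid (a stand-in for the real cookie name); the thresholds mirror the 5-requests-in-3-seconds / 30-second-penalty spec above, and the exact action syntax should be checked against the ModSecurity reference:
<IfModule mod_security2.c>
SecRuleEngine On
# bind the persistent USER collection to the id cookie
SecRule REQUEST_COOKIES:userid "!^$" "phase:1,id:1001,nolog,pass,setuid:%{REQUEST_COOKIES.userid}"
# count this request; the counter expires after 3 seconds
SecAction "phase:1,id:1002,nolog,pass,setvar:user.requests=+1,expirevar:user.requests=3"
# over 5 requests in the window: refuse, and keep refusing for 30 seconds
SecRule USER:REQUESTS "@gt 5" "phase:1,id:1003,deny,status:429,log,msg:'per-user flood control',expirevar:user.requests=30"
</IfModule>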
This redirect works fine on Apache 2.2.8, but doesn't work on Apache 1.3.41.
The following is the entry from error_log: RewriteRule: cannot compile regular expression '^sap-latest-news/([0-9]*)/([A-Za-z0-9_-.]*).htm$'
A simple rewrite works fine in Apache 1.3, but the above regular expression does not seem to work there. Does anyone know whether Apache 1.3 simply doesn't support it?
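The likely culprit is the character class: in [A-Za-z0-9_-.] the hyphen sits between _ and . and is parsed as a (descending, hence invalid) range, which Apache 1.3's POSIX regex engine refuses to compile, while the PCRE engine in Apache 2.x is evidently more forgiving here. Moving the hyphen to the end (and escaping the literal dot) is the portable fix; the substitution target below is a placeholder, since the original rule's right-hand side isn't shown:
RewriteRule ^sap-latest-news/([0-9]*)/([A-Za-z0-9_.-]*)\.htm$ /target.php [L]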
I took the first one this morning and the second one a few hours later. It was filling up my VPS's numtcpsock counter, which slowed down my VPS dramatically. Any tips or suggestions? Is there a way to lower the numtcpsock count in the early morning?
I am running into a bit of a problem. Previously, I could add an Apache handler through cPanel easily... but now I have moved to mediatemplate.net and that feature isn't available in their control panel. Since they also run Apache, I figured I could set the Apache handler manually through an .htaccess file - is that possible? If so, what is the syntax that will make .html files be handled like .php files?
There is one setback to this process - this would be a manually inserted file for every directory where I want the above file handler. Is there an easier way to do this via SSH? I don't have root access, only a normal user account.
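For what it's worth, an .htaccess file applies to its own directory and every subdirectory beneath it, so one file at the document root should be enough rather than one per directory. The usual directive is AddHandler (or AddType on some hosts); the exact handler name varies with how the host wires up PHP, so this is a sketch to adapt:
AddHandler application/x-httpd-php .html .htm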
I am having trouble using .htaccess and .htpasswd to password-protect a directory on my web server. How do I use .htaccess and .htpasswd to protect a directory?
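The standard recipe is an AuthUserFile pointing at a password file created with the htpasswd utility (htpasswd -c /home/username/.htpasswd someuser), plus these directives in the protected directory's .htaccess; the path and realm name here are placeholders:
AuthType Basic
AuthName "Restricted Area"
AuthUserFile /home/username/.htpasswd
Require valid-user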
I have a directory of files from /flash/1.swf all the way to /flash/4000.swf on my hosting. I am trying, with terrible luck, to load the flash movies into the container on zomgflash dot com (avoiding possible spam) -- it's all set up... it should be loading the movies... I think it's a mod_rewrite problem, but I could be wrong. It's basically a clone of iwantmoar dot com (avoiding spam again) -- you'll notice that when you type a number after the domain, like /238 or /2898, it loads /flash/238.swf or /flash/2898.swf into the flash container on the iwantmoar website. That's what I'm trying to get my website to do.
I need them shown like [URL] .... in the address bar... and that would have index.php load up 1.swf in the container, if that makes more sense. This is the contents of my .htaccess from my relentless googling:
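(The actual .htaccess contents didn't survive in the post. For reference, a minimal rule set for that URL scheme might look like the following, where the movie query parameter is a hypothetical name that index.php would read to embed /flash/<n>.swf:)
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^([0-9]+)/?$ index.php?movie=$1 [L,QSA]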
Implemented WordPress a little while ago via cPanel's Fantastico widget -- vanilla implementation.
Just about every day, I get spam comments in the blog's Inbox for moderation.
Was wondering if folks had general tips on how to prevent or minimize this sort of nuisance and make the blog less bot-accessible, and/or where I might read up on ways to do so.
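One low-tech .htaccess measure often suggested for this is refusing direct POSTs to WordPress's comment endpoint when the referer is missing or foreign, since many spam bots post blind; yourblog.example.com below is a placeholder for the real domain:
RewriteEngine On
RewriteCond %{REQUEST_METHOD} POST
RewriteCond %{REQUEST_URI} /wp-comments-post\.php [NC]
RewriteCond %{HTTP_REFERER} !yourblog\.example\.com [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^$
RewriteRule .* - [F]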
to: /some-product-name (e.g. /ichiban-fairly-offensive-sweat-in-grey)
We have 1,090 products and 60+ categories, so some form of .htaccess trick would be amazing to know. What could I put in the .htaccess to accomplish this?
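(The URL pattern being rewritten from isn't shown in the post, so this can only be sketched. Assuming the products currently resolve through a script, a generic slug rule would look like the following, where product.php and the slug parameter are hypothetical names:)
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([a-z0-9-]+)/?$ product.php?slug=$1 [L,QSA]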
What I want to do is this: the URL is like somedomain.sub.com/somepage/s1/s2. The index.php is accessible from somedomain.sub.com/somepage/. I want to send s1/s2 as $_GET['page'].
Also, I don't want the URL in the address bar to change; only the URL sent to the server should change. This worked well on my localhost, but on the webserver (0fees.net) it doesn't work...
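An internal rewrite (no R flag) keeps the address bar unchanged while rewriting the URL the server processes, which matches this spec; a sketch for an .htaccess in the somepage directory (note that some free hosts restrict mod_rewrite, which would explain localhost working while the host doesn't):
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.+)$ index.php?page=$1 [L,QSA]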
I'd like to change /comp.php to /comp, but I have only found articles on how to remove .php site-wide, and I don't want to do that; I only want to do it for this one file.
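Scoping the rule to the one literal filename does that; a sketch for the site root, with an optional second rule that 301s old /comp.php links to the new address:
RewriteEngine On
# serve /comp from comp.php without touching other URLs
RewriteRule ^comp/?$ comp.php [L]
# optional: send direct requests for /comp.php to /comp
RewriteCond %{THE_REQUEST} \s/comp\.php[\s?]
RewriteRule ^comp\.php$ /comp [R=301,L]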
One site just linked to my website with an incorrect URL as URL.. and I want to correct this by redirecting that URL to URL.... Therefore, I added the following line to my .htaccess: Redirect 301 /aor/%e2%80%9d URL...
However, this does not work. When I enter URL... in Firefox or IE, the browser still shows a page-not-found (404) error.
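mod_alias's Redirect compares against the already-decoded URL path, so the %e2%80%9d in the directive is taken as those literal percent characters and never matches the decoded character (a right double quote, bytes E2 80 9D) that actually arrives. Matching the raw bytes with mod_rewrite sidesteps the encoding question; the /aor/ target is a placeholder since the real destination URL is elided in the post:
RewriteEngine On
RewriteRule ^aor/\xE2\x80\x9D$ /aor/ [R=301,L]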