
Bad WebBots

If you run a website, you need a robots.txt file at the top of your hierarchy to tell bots which pages they may scan and which they may not. Good bots like Googlebot obey your robots.txt instructions and do not index things you forbid. Bad bots ignore your robots.txt file and look at everything they can find.
But it gets worse. Bad bots can cost you money. They come back to your site many times an hour, using up your bandwidth. Or they scan all of your pages many times a minute, which amounts to a denial-of-service attack: they monopolise your server and block others from visiting your site. Some companies running these bots are not nice: they steal your content and ignore copyright. If you have a terms of use policy on your site, they ignore it. If you have a copyright notice, they ignore it. And so on.
Sadly, the number of bad bots on the net has been increasing rapidly. Most come from three countries - the United States, Russia, and China. But there are others - Brazil and Korea (North and South) are home to lots of these. There are automated scripts which will recognise the activity of most bad bots and automatically add the IPs they use to a deny list. Unfortunately, they only work up to a point. Sometimes you just have to get your hands dirty and block the bots manually.
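For the IP route, a hand-kept deny list in Nginx (the server used in the rest of this page) can be as small as a few lines. This is only a sketch: the file name is hypothetical, and the addresses are reserved documentation ranges standing in for real offenders.

```nginx
# Hypothetical file, e.g. blockips.txt, included inside an http, server or location block
# The addresses below are reserved documentation ranges, used here as placeholders
deny 192.0.2.15;        # a single address
deny 198.51.100.0/24;   # a whole range in CIDR notation
allow all;              # everyone else is let through
```

Rules are checked in order and the first match wins, so the deny lines need to come before the final allow. A denied client gets a 403 response.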
Here’s how to do it with the world’s best web server software, Nginx. This little bit of code checks the User-Agent header of each request (Nginx exposes it as $http_user_agent) as it arrives at your site and before a page is served. If any of the following bot names is found in that field, the request is blocked by returning an error instead of the page.
To use this script, just put it in a file somewhere above your Nginx root - a good place is the ’CONF’ directory. Call the file ’badbotblock.txt’. Then, in the config file for your virtual host (in your sites-available directory, or wherever you have placed it), add the line ’include ../CONF/badbotblock.txt;’ inside the server block. Easy.
# Note that ’~*’ is a case-insensitive regular-expression match: the test succeeds if any of the listed sub-strings is found
# Just add the names you want to the list below, with the ’|’ between them - my own list contains a few hundred bad bots
# Keep the whole pattern on one line, with no spaces inside it
if ($http_user_agent ~* (BlackWidow|ChinaClaw|Custo|DISCo|eCatch|EirGrabber|EmailSiphon|EmailWolf|Zeus|WebStripper|WebWhacker|WebReaper|WebLeacher|WebFetch|VoidEYE|tAkeOut|SuperHTTP|ApptusBot)) {
    return 403; # or return 444 if you wish (444 closes the connection without sending any response)
}
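To show where the include line goes, here is a minimal sketch of a virtual host that pulls in badbotblock.txt. The server_name, root and listen values are placeholders, not taken from any real setup, and the include path is simply the one given above.

```nginx
# Hypothetical virtual host; server_name and root are placeholders
server {
    listen 80;
    server_name example.com;
    root /var/www/example;

    # The user-agent check. An "if" block is allowed at server level,
    # so the rule applies to every request for this host.
    # Path as given in the text; adjust it to where the file actually lives.
    include ../CONF/badbotblock.txt;

    location / {
        try_files $uri $uri/ =404;
    }
}
```

Nginx resolves a relative include path against its own configuration directory, not against the file doing the including, so an absolute path may be the safer choice. After editing, running nginx -t checks the configuration before you reload.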


Copyright © 2012 by peter at peter.ca. All rights reserved.