Agents of Change: The new wave of crawlers, spiders and bots

One of the hallmarks of the Internet is change. The rapid rate of change surrounding the Internet, both culturally and technologically, has forced online marketers to constantly re-evaluate what they do and how they do it.

No area has escaped scrutiny. From ad design to ROI measurement, every aspect of online marketing has undergone significant changes from the early days of online advertising in the late '90s to today. Changes in technology, broadband access, and consumer mindsets continually alter the tactics used for achieving success in online marketing.

One of the greatest areas of change on the Internet has been the rise of search engines as a way to index the Web and allow users to find the content they want, when they want it. Search engine technology is one of the key factors needed to unlock the full potential of the Internet for users. To perform this Herculean feat, search engines rely upon an invisible army of powerful, automated software agents that go by many names, including crawlers, spiders, and bots (short for "robots").

These software agents quickly and systematically scour a Web site, indexing its content and then following every link on every page until they've covered an entire site. Once finished, they move on to the next site in an endless quest to make all of the online content available on the Web searchable -- and therein lies the heart of the problem. How often does a search engine need to "crawl" a site in order to keep its search indexes up to date?

With online publishers, like AuntMinnie.com, where content is being added continuously, the search engines are always one or two updates behind. This means that search engine results are not as accurate as they could be -- and search accuracy is critical to the success of a search engine's business.

However, if a Web site is indexed or "crawled" too often, or too quickly, it can slow down the site's Web servers and deny access to legitimate human users, or it can make the site so slow that users leave, causing a loss of traffic and revenue. Another unexpected consequence of crawling a site too often is that it skews traffic statistics and page views -- and in turn ad views as well. This can create a number of problems for marketers.

This problem has led to some basic rules regarding conduct between search engine agents and the Web sites they crawl. Some of these rules include how often a bot or agent will be allowed to crawl a site and how fast it will attempt to do it. Crawlers and bots that observe these rules are considered "social" and are usually allowed to crawl a site. Those that do not follow these rules are considered "antisocial," and Web site administrators may attempt to block the agent from crawling the site.
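One common place where a site publishes such rules is a robots.txt file; the article does not name a mechanism, so that choice is an assumption here. The minimal sketch below uses Python's standard urllib.robotparser to show how a "social" crawler might check which pages it may fetch and how long to wait between requests. The robots.txt contents, the bot name ExampleBot, and the example.com URLs are all hypothetical, and Crawl-delay is a non-standard directive that not every crawler honors.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a site might publish (illustrative only).
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /members/
"""

def load_rules(robots_txt):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

def polite_plan(parser, user_agent, urls, default_delay=1.0):
    """Return (allowed_urls, delay_seconds) for a crawler that honors the rules."""
    delay = parser.crawl_delay(user_agent)
    if delay is None:
        # No Crawl-delay published: fall back to an assumed default pause.
        delay = default_delay
    allowed = [url for url in urls if parser.can_fetch(user_agent, url)]
    return allowed, float(delay)

rules = load_rules(ROBOTS_TXT)
allowed, delay = polite_plan(rules, "ExampleBot", [
    "https://example.com/news/1",
    "https://example.com/members/profile",
])
print(allowed, delay)
```

A polite crawler would then fetch only the URLs in allowed and sleep for delay seconds between requests. An "antisocial" bot simply skips this check, which is why administrators fall back on blocking.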

To compound the problem, the number of these software agents in use has increased over the past two years. Private companies and busy individuals are increasingly turning to these software agents to help them catalog Web content for their own use. A market research firm monitoring the Web for a client and an individual who wants to read Web content offline are two examples of who might use crawlers and bots.

Sometimes these companies or individuals either ignore the rules for "social" or "polite" bots, or they don't understand the impact that they are having on their target sites. They will unleash their digital minions with instructions to crawl a site too often or too fast. This causes some Web site administrators to respond by blocking these bots in order to ensure Web access for their primary audience.

Now new generations of antisocial bots have appeared on the net. These smart agents act more human as they index a site. They vary the speed of their page requests; they spawn multiple instances of themselves and work in parallel in an attempt to circumvent administrative blocking. Even worse, in some cases they will now attempt to sign up as a member to access restricted areas, and they can click on ads and follow ad links as well.

These activities create a new wrinkle for marketers trying to understand how their online campaigns are performing, because they can distort page views and ad click-through statistics and make it harder to get a clear picture of a campaign's results.
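To make the distortion concrete, here is a small worked example of how bot activity can inflate a reported click-through rate. Every number is invented for illustration, and the function names are hypothetical; the point is only the arithmetic.

```python
def click_through_rate(clicks, views):
    """Clicks divided by ad views; 0.0 when there are no views."""
    return clicks / views if views else 0.0

def compare_ctr(human_views, human_clicks, bot_views, bot_clicks):
    """Return (reported_ctr, human_only_ctr) for one campaign."""
    reported = click_through_rate(human_clicks + bot_clicks,
                                  human_views + bot_views)
    human_only = click_through_rate(human_clicks, human_views)
    return reported, human_only

# Hypothetical campaign: 10,000 human ad views with 50 clicks,
# plus 2,000 bot ad views with 150 bot clicks.
reported, human_only = compare_ctr(10_000, 50, 2_000, 150)
print(f"reported: {reported:.2%}, human only: {human_only:.2%}")
```

In this made-up case the report would show a click-through rate more than three times the rate produced by human visitors alone, even though bots account for only a sixth of the ad views.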

Every Web site will experience this new breed of Internet wildlife, and AuntMinnie.com is no exception. The question is, what can we do about it? Currently, there is no single technical magic bullet that will solve the problem. However, there are some things that can be done to identify it and manage its impact on your online campaigns.

Here are four basic steps that you can take to make sure that you're getting the most from your online marketing dollars.

  1. Look for unusual spikes in your statistic reports. If you have a time period in your report(s) that has an abnormally high number of page views or ad clicks, then ask for more details. Your AuntMinnie representative will help you get the answers you need in these situations.


  2. Go beyond using "page views" as the only way to measure your online efforts. Page views are good, but when you use them in conjunction with other statistics like "unique users," "unique members," and number of pages viewed, you can gain a deeper understanding of what is happening and why. Taken together, these statistics will give you a better idea of the total number of different people seeing your ads and will help you gauge how much interest there is in a particular ad, offer or product.


  3. Optimize your marketing follow-through. When evaluating a media channel, it's easy to get solely focused on page views, ad clicks, etc. Those things do demand attention, but don't neglect the rest of your online conversion process. Where your ad links take people, the attractiveness of your call to action, how long it takes your company to follow up, and the incentives used can all have a big impact on your customer conversion ratios, and in turn, your return on investment.


  4. Be prepared to manage and adapt to this new problem. Just as viruses and spam have become permanent parts of our online experience, bot-skewed statistics will become a fact of our online marketing lives as well. In the future, we can expect an array of software and technologies designed to help us manage and contain the problem, but we should not expect it to go away.
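The first two steps above can be sketched in a few lines of code. This is a minimal illustration, not a description of how AuntMinnie.com produces its reports: the "three times the median" threshold, the function names, and all of the sample data are assumptions chosen for the example.

```python
from statistics import median

def find_spikes(daily_counts, factor=3.0):
    """Return the days whose count exceeds factor times the median day."""
    baseline = median(daily_counts.values())
    return [day for day, count in daily_counts.items()
            if count > factor * baseline]

def summarize(log):
    """Summarize a log of (visitor_id, page) pairs.

    Returns (page_views, unique_users, pages_per_user).
    """
    page_views = len(log)
    unique_users = len({visitor for visitor, _page in log})
    pages_per_user = page_views / unique_users if unique_users else 0.0
    return page_views, unique_users, pages_per_user

# Hypothetical daily page views for one ad placement.
daily = {"Mon": 1000, "Tue": 1100, "Wed": 950, "Thu": 5200, "Fri": 1050}
spikes = find_spikes(daily)

# Hypothetical log: two ordinary visitors and one very busy one.
log = [("a", "/x"), ("a", "/y"), ("b", "/x")] + [("c", f"/p{i}") for i in range(9)]
summary = summarize(log)
print(spikes, summary)
```

A day flagged by find_spikes is a prompt to ask for more detail, not proof of bot activity. Likewise, a page-view total that is high while the unique-user count stays low suggests that a few visitors, possibly automated ones, account for much of the traffic.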


Here at AuntMinnie.com, our development team and technical staff will continue to monitor this problem closely on your behalf. We'll also be looking at new ways to identify, track and nullify the impact that these crawlers have on our reports for customers.

In the meantime, if you have questions or would like to discuss some ways to help your company be prepared to deal with this issue, please send me an e-mail at [email protected] or call me directly at 520-751-6847.
