Bypassing CAPTCHAs

CAPTCHA [kap-chuh]
noun, Digital Technology

1. an online test designed so that humans but not computers are able to pass it, used as a security measure and usually involving a visual-perception task:

Site visitors must solve the “distorted text” CAPTCHA before posting comments.

2. a computer program that generates such tests.

Many sites use a CAPTCHA to prevent bots or automation from using functions like email for… nefarious purposes.

However, there is no single standard. Nothing stops people from using home-brewed solutions, or ones that have vulnerabilities or can be worked around.

Image recognition and OCR are not the only ways to get past a CAPTCHA. If the software you test uses a home-brewed or otherwise weak CAPTCHA, this is a fun way to find a bug that really ought to be fixed.

Today I’ll demonstrate a method using DOM manipulation.

Here We Go!

Step 1: Find a wild CAPTCHA:

[Screenshot: the CAPTCHA image]

This code is mAYr3QK. We’ll use it later on.

Step 2: Right-click it and select “Inspect Element” in the menu (Chrome and Firefox):

[Screenshot: the “Inspect Element” menu item for the CAPTCHA image]

Then take a look at the code driving the CAPTCHA:

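<input type="hidden" name="captcha_sid" value="11345264">
<input type="hidden" name="captcha_token" value="50d2e34477aebd2c402be8f980554ae4">
<img src="/en/image_captcha?sid=11345264&ts=1433615246" width="252" height="60" alt="Image CAPTCHA" title="Image CAPTCHA">

What code is in the image? *

Enter the characters shown in the image.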

Try this a few times and notice what changes and what doesn’t. In this case, there are four pieces of info that look like they’re being dynamically generated:

  • two hidden input fields, called captcha_sid and captcha_token.
  • a sid and a ts parameter in the img

[Image caption: Be sure to drink your mAYr3QK]
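
To make the sid and ts parameters easier to compare between page loads, a short Ruby sketch like the one below can pull them out of the img tag's src. The src value is the one recorded for the first CAPTCHA in this walkthrough. Treating ts as a Unix timestamp is only an assumption based on how the number looks; the site may use it for something else.

```ruby
require "uri"

# src attribute copied from the first CAPTCHA's img tag.
src = "/en/image_captcha?sid=11345264&ts=1433615246"

# Split the query string into a hash of parameter names and values.
params = URI.decode_www_form(URI.parse(src).query).to_h

# Assumption: ts is a Unix timestamp (seconds since 1970).
# Read that way, this value falls in June 2015.
generated_at = Time.at(params["ts"].to_i).utc

puts "sid: #{params['sid']}"
puts "ts:  #{params['ts']} (#{generated_at} if it is a Unix timestamp)"
```

Running this against a few freshly generated CAPTCHAs shows quickly which values change on every load.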

Pieces of data like this are generated as a byproduct of creating the CAPTCHA. An algorithm on the server reads them back in, rebuilds the CAPTCHA string, and compares it to what you entered. The data is sent out in the HTML so that it can be picked back up when the form is submitted. It’s kind of like a secret decoder ring. It also saves the server from having to remember what it sent to which user.
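
To see why this design is replayable, here is a minimal Ruby sketch of a stateless check. It is not the code of any real CAPTCHA module: SECRET, expected_answer, token_for, and accept? are all hypothetical names, and deriving the answer from the sid with an HMAC is just one way such a scheme could work.

```ruby
require "openssl"

SECRET = "replace-with-a-real-secret" # hypothetical server-side key

# Derives the expected answer from the sid alone, so nothing is stored.
def expected_answer(sid)
  OpenSSL::HMAC.hexdigest("SHA256", SECRET, "answer:#{sid}")[0, 7]
end

# Derives the token that goes out in the hidden captcha_token field.
def token_for(sid)
  OpenSSL::HMAC.hexdigest("SHA256", SECRET, "token:#{sid}")
end

# Stateless check: recompute everything from what the browser sent back.
# (A production check would also use a constant-time comparison.)
def accept?(sid, token, answer)
  token == token_for(sid) && answer == expected_answer(sid)
end

sid = "11345264"
token = token_for(sid)
answer = expected_answer(sid)

puts accept?(sid, token, answer) # first submission
puts accept?(sid, token, answer) # replayed submission, still accepted
```

Because the server keeps no record of which challenges were already answered, the same sid, token, and answer pass every time they are sent.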

But since we have that data in our hands, nothing stops us from changing it in the browser. So let’s try it.

Get another CAPTCHA generated (either by refreshing the page, or by clicking to get a new one, if available), and inspect its code again as described above:

[HTML for the second CAPTCHA: the same markup as the first one, with new captcha_sid, captcha_token, sid, and ts values]

What code is in the image? *

Enter the characters shown in the image.

At this point, we’ll change the captcha_sid value, the captcha_token value, and the img tag’s src attribute to match the ones that we got the first time.

To change an attribute, double-click it in the browser’s inspector and edit it. If you changed everything to match the earlier mAYr3QK CAPTCHA, your code should look like this:

<input type="hidden" name="captcha_sid" value="11345264">
<input type="hidden" name="captcha_token" value="50d2e34477aebd2c402be8f980554ae4">
<img src="/en/image_captcha?sid=11345264&ts=1433615246" width="252" height="60" alt="Image CAPTCHA" title="Image CAPTCHA">

What code is in the image? *

Enter the characters shown in the image.

Since this is the exact same setup you had when you got the first CAPTCHA, the image on the screen should change to the code that you already know about.

From there, try entering the known code and see if you can fool the site into doing whatever the CAPTCHA was meant to protect.

Automating It

The cool part is that all of this can be automated. If you’re familiar with UI automation tools like Watir or Selenium, you may know that you can run JavaScript directly from a script.

JavaScript can identify an element in the DOM and then modify its attributes on the fly.

This snippet gives you an idea of how to do it. It’s in Ruby, meant to be used with Watir (note: this is cannibalized from code I had a while back, so it might not work), and demonstrates how you can use JavaScript to modify attributes for specific elements:

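# Assumes $b is an existing Watir::Browser instance.
# The heredoc avoids the clash between Ruby's double quotes and the
# double quotes inside the XPath expression.
$b.execute_script(<<~JS)
  var element = document.evaluate('//input[@name="captcha_sid"]', document, null,
    XPathResult.ANY_UNORDERED_NODE_TYPE, null).singleNodeValue;

  if (element != null) {
    // Use the sid recorded from the earlier CAPTCHA here.
    element.setAttribute('value', '11278389');
  }
JS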
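
Since three attributes need changing, it may help to build the JavaScript from a small helper instead of repeating the string. The sketch below is only one way to do it: set_attribute_js is a hypothetical helper name, and it only builds the script text, so it can be checked without a browser. The values are JSON-encoded so that quotes inside the XPath do not break the generated JavaScript.

```ruby
require "json"

# Hypothetical helper: returns JavaScript that sets one attribute on the
# first element matching the given XPath expression.
def set_attribute_js(xpath, attribute, value)
  <<~JS
    var element = document.evaluate(#{xpath.to_json}, document, null,
      XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue;
    if (element !== null) {
      element.setAttribute(#{attribute.to_json}, #{value.to_json});
    }
  JS
end

# Build the script for the captcha_sid field from the walkthrough.
script = set_attribute_js('//input[@name="captcha_sid"]', "value", "11345264")
puts script

# With Watir, the script would then be run as: $b.execute_script(script)
```

The same helper can be called again for captcha_token and for the img tag’s src attribute.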

Combating It

If this works against your software, the CAPTCHA can be bypassed automatically. Probably not good. Here are some possible ways to keep this from happening to you:

  • If you’re using a custom CAPTCHA that you built, consider including the time the CAPTCHA was generated in the hidden values. Sign or encrypt that value so it cannot simply be edited in the browser. If you get back values that indicate the CAPTCHA was created longer ago than seems reasonable for a person sitting there filling it in, reject the answer.
  • If you have the horsepower, you could feasibly remember all the CAPTCHA challenge strings sent to an IP address. If a new one is requested, drop the previous one and store the new one. That approach would prevent any data getting sent out that someone could use to bypass your challenge.
  • Try a stronger type. reCAPTCHA is pretty good (also helps OCR software for eBooks know what certain words are that it has trouble with!).
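
The first two ideas above can be combined in a rough Ruby sketch: keep each challenge on the server, allow one attempt per challenge, and reject answers that arrive too late. Everything here is an assumption for illustration, including the CaptchaStore class name, the 120-second limit, and the in-memory hash (a real application would use a shared cache or database).

```ruby
require "securerandom"

# Hypothetical in-memory store of outstanding CAPTCHA challenges.
class CaptchaStore
  MAX_AGE_SECONDS = 120 # assumed limit; tune for your users

  # The clock is injectable so that expiry can be tested without waiting.
  def initialize(clock: -> { Time.now.to_f })
    @clock = clock
    @challenges = {}
  end

  # Issues a challenge for a client key (for example a session ID or an
  # IP address), replacing any earlier challenge for that client.
  def issue(client_key)
    answer = SecureRandom.alphanumeric(7)
    @challenges[client_key] = { answer: answer, issued_at: @clock.call }
    answer
  end

  # Checks a submitted answer. The stored challenge is deleted on every
  # attempt, so a known answer cannot be replayed.
  def verify(client_key, submitted)
    challenge = @challenges.delete(client_key)
    return false if challenge.nil?
    return false if @clock.call - challenge[:issued_at] > MAX_AGE_SECONDS

    submitted == challenge[:answer]
  end
end

store = CaptchaStore.new
answer = store.issue("client-1")
puts store.verify("client-1", answer) # first attempt
puts store.verify("client-1", answer) # replay of the same answer
```

Because the answer never leaves the server except as the rendered image, there are no hidden values for a tester to swap back in.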

And if you have a nice streak and find this tactic working on a site you like, consider taking the time to explain to its owners how you got past it. You can even link to this post.

Have fun, and happy testing!

– Fritz
