New CAPTCHA Solver Developed For Dark Web Research
A new machine-learning-based CAPTCHA solver has been developed by researchers at the Universities of Georgia, Arizona and South Florida. The researchers report that it solves 94.4% of CAPTCHA challenges on dark websites.
The purpose of the study was to streamline the gathering of cyber threat intelligence, which normally requires a human to solve CAPTCHAs manually. At a time when cybercrime is on the rise, and its costs are rising equally fast, the study could be crucial in developing methods of targeted preventative action.
Dark Web CAPTCHAs
The dark web, the part of the deep web that is reachable only through anonymizing networks such as Tor, is a central hub for cybercrime because of the anonymity it offers. To access most sites on the dark web, a user must solve a CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) to prove that they are not a bot. These challenges are common on such sites as protection against DDoS attacks, which are usually carried out by botnets; a strong CAPTCHA helps keep that automated traffic out.
Each website has its own custom CAPTCHA challenge, which makes developing a single tool that can solve all of them almost impossible.
The Machine Learning Approach
To overcome this issue, the research team developed a system that works by interpreting rasterized images, an approach that differs markedly from earlier attempts to build a CAPTCHA solver.
The solver distinguishes letters and numbers by handling each one individually: it denoises the image by removing background static, identifies the borders between characters, and then breaks the CAPTCHA up into individual characters.
Process of denoising and separation of characters
Because the solver works character by character, the complexity and size of the CAPTCHA do not affect how effective it is.
Solving rates for different sizes
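The denoise-then-split steps described above can be illustrated with a small Python sketch. It is not the researchers' method: it assumes an already binarized image (1 = ink, 0 = background), removes only isolated specks, and cuts the image wherever a column contains no ink. The names `denoise` and `split_characters` are hypothetical and chosen for this example only.

```python
from typing import List

Image = List[List[int]]  # assumption: 1 = ink, 0 = background


def denoise(img: Image) -> Image:
    """Remove isolated ink pixels (no ink among their 8 neighbours)."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            if not img[y][x]:
                continue
            neighbours = sum(
                img[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
                if (ny, nx) != (y, x)
            )
            if neighbours == 0:
                out[y][x] = 0
    return out


def split_characters(img: Image) -> List[Image]:
    """Cut the image at every run of columns that contains no ink."""
    w = len(img[0])
    has_ink = [any(row[x] for row in img) for x in range(w)]
    chars: List[Image] = []
    start = None
    for x, ink in enumerate(has_ink + [False]):  # sentinel closes the last run
        if ink and start is None:
            start = x
        elif not ink and start is not None:
            chars.append([row[start:x] for row in img])
            start = None
    return chars


# Two "characters" (columns 0-1 and column 5) plus one noise speck at column 3.
captcha = [
    [1, 1, 0, 0, 0, 1, 0],
    [1, 0, 0, 1, 0, 1, 0],
    [1, 1, 0, 0, 0, 1, 0],
]
pieces_raw = split_characters(captcha)
pieces_clean = split_characters(denoise(captcha))
print(len(pieces_raw), len(pieces_clean))  # 3 2
```

Without denoising, the speck is mistaken for a third character, which is why cleaning the background comes before segmentation. Real CAPTCHAs with touching or overlapping characters would need a more robust border-finding step than this column test.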
When it comes to character recognition, the solver samples features from multiple local regions of each character to pick up fine-grained details such as edges and lines. This makes it easier to identify characters when rotation, varying font sizes and color changes are involved.
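To make "local regions" and "edges" concrete, here is a minimal Python sketch that slides a 3x3 window over an image and measures how sharply the intensity changes inside each window. The Sobel kernels are a hand-picked stand-in for illustration; a trained recognizer would learn its own filters, and nothing here is taken from the paper. The function names are hypothetical.

```python
from typing import List

Image = List[List[int]]

# Classic Sobel kernels: respond to horizontal and vertical intensity changes.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def local_response(img: Image, kernel: List[List[int]], y: int, x: int) -> int:
    """Weighted sum of the 3x3 region centred on pixel (y, x)."""
    return sum(
        kernel[dy][dx] * img[y + dy - 1][x + dx - 1]
        for dy in range(3)
        for dx in range(3)
    )


def edge_strength(img: Image) -> List[List[int]]:
    """Edge response for every interior pixel (border pixels are skipped)."""
    h, w = len(img), len(img[0])
    return [
        [
            abs(local_response(img, SOBEL_X, y, x))
            + abs(local_response(img, SOBEL_Y, y, x))
            for x in range(1, w - 1)
        ]
        for y in range(1, h - 1)
    ]


# A vertical stroke boundary: background on the left, ink on the right.
stroke = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
flat = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]

print(edge_strength(stroke))  # [[4, 4], [4, 4]]
print(edge_strength(flat))    # [[0]]
```

Each window only sees its own small neighbourhood, so the response marks the stroke boundary wherever it appears in the image, while the uniform region produces no response at all.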
The paper released by the researchers stated that:
“Using a crawler enhanced by our DW-GAN, we were able to collect 1,831 illegal products from Yellow Brick. Among these products, there were 286 cybersecurity-related items, including 102 stolen credit cards, 131 stolen accounts, 9 forged document scans, 44 hacking tools, and 1,223 drug-related products.
Overall, collecting “Yellow Brick” market intelligence with DW-GAN took about 5 hours without human involvement. In particular, each HTTP request took 8.8 seconds for loading a new webpage; therefore crawling 1,831 pages took 268.5 minutes. Solving the recurring CAPTCHA challenges (per 15 HTTP requests) took our DW-GAN crawler 18.6 seconds.
Overall, the proposed framework could automatically break CAPTCHA with no more than three attempts. Breaking all CAPTCHA images take about 76 minuets [sic] in total for all 1,831 product pages, a process that is fully automated.”
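The page-loading time and item counts in the quote can be checked with a few lines of Python. This is only an arithmetic check on the quoted figures; the variable names are made up for the example, and the number of CAPTCHA rounds assumes exactly one challenge per 15 requests, which the quote implies but does not spell out.

```python
import math

pages = 1831                 # product pages crawled (from the quote)
seconds_per_request = 8.8    # page load time per HTTP request (from the quote)
requests_per_captcha = 15    # assumption: one challenge every 15 requests

# Total time spent loading pages, in minutes.
load_minutes = pages * seconds_per_request / 60

# Sum of the four cybersecurity-related categories listed in the quote.
cyber_items = 102 + 131 + 9 + 44

# Number of CAPTCHA challenges met during the crawl, under the assumption above.
captcha_rounds = math.ceil(pages / requests_per_captcha)

print(round(load_minutes, 1))  # 268.5
print(cyber_items)             # 286
print(captcha_rounds)          # 123
```

Both the 268.5 minutes of loading time and the 286 cybersecurity-related items match the figures given in the paper.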
Real World Implications
A tool like this could disrupt activity on the dark web with the aim of tackling cybercrime; however, it also has the potential to affect those who use the dark web for anonymity.