Python web scraping: using a CAPTCHA-solving platform to recognize various CAPTCHAs

Author: 猴君

This tutorial uses the 超级鹰 (Chaojiying) CAPTCHA-solving platform. If you don't have an account yet, register one first.

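超级鹰验证码识别 (Chaojiying CAPTCHA recognition): the site bills itself as a professional cloud CAPTCHA recognition service that makes recognition faster, more accurate, and more powerful.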

Getting past CAPTCHAs with a solving platform is simple, but it costs money.

Example code and test resources:

git clone https://github.com/Python3WebSpider/CaptchaPlatform.git

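Clone the repository with git. You will get a new folder containing a file named chaojiying.py, which holds code adapted from the official SDK. Its constructor takes three arguments:

username: the username of your registered Chaojiying account.

password: the password of that account.

soft_id: the software ID.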

import requests
from hashlib import md5


class Chaojiying(object):

    def __init__(self, username, password, soft_id):
        self.username = username
        # Send the MD5 hex digest of the password (as 'pass2'), not the plain text.
        self.password = md5(password.encode('utf-8')).hexdigest()
        self.soft_id = soft_id
        # Credentials that accompany every request.
        self.base_params = {
            'user': self.username,
            'pass2': self.password,
            'softid': self.soft_id,
        }
        self.headers = {
            'User-Agent': 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0)',
        }

    def post_pic(self, im, codetype):
        """
        im: image bytes
        codetype: CAPTCHA type code, see http://www.chaojiying.com/price.html
        """
        params = {
            'codetype': codetype,
        }
        params.update(self.base_params)
        # Upload the image as a multipart file field named 'userfile'.
        files = {'userfile': ('ccc.jpg', im)}
        r = requests.post('http://upload.chaojiying.net/Upload/Processing.php', data=params, files=files,
                          headers=self.headers)
        return r.json()

    def report_error(self, im_id):
        """
        im_id: picture ID of the CAPTCHA whose answer was wrong
        """
        params = {
            'id': im_id,
        }
        params.update(self.base_params)
        r = requests.post('http://upload.chaojiying.net/Upload/ReportError.php', data=params, headers=self.headers)
        return r.json()
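To make the credential handling in the constructor concrete, here is a minimal standalone sketch of how the shared request parameters are built. The helper name `build_base_params` and the demo values are hypothetical and exist only for illustration; the field names `user`, `pass2`, and `softid` are the ones used in chaojiying.py.

```python
from hashlib import md5


def build_base_params(username, password, soft_id):
    """Build the credential fields the same way Chaojiying.__init__ does.

    Hypothetical helper for illustration; it is not part of chaojiying.py.
    """
    return {
        'user': username,
        # The password is replaced by its 32-character MD5 hex digest.
        'pass2': md5(password.encode('utf-8')).hexdigest(),
        'softid': soft_id,
    }


# Demo values only; substitute your own account details.
params = build_base_params('demo_user', 'demo_password', '000000')
print(params)
```

The digest is computed once, so the plain-text password is not stored on the client object or placed in the request body.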

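Image CAPTCHAs:

CAPTCHA_KIND is the CAPTCHA type code. The available codes are listed in Chaojiying's CAPTCHA type and price table (验证码类型与价格表, http://www.chaojiying.com/price.html).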

from chaojiying import Chaojiying

USERNAME = '136xxxx108'
PASSWORD = 'xxxxxx'
SOFT_ID = 'xxxxxxxx'
CAPTCHA_KIND = '1006'
FILE_NAME = 'captcha1.png'

client = Chaojiying(USERNAME, PASSWORD, SOFT_ID)
# Read the image as raw bytes and submit it for recognition.
with open(FILE_NAME, 'rb') as f:
    result = client.post_pic(f.read(), CAPTCHA_KIND)
print(result)
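`post_pic` returns the parsed JSON response as a dict. In the successful responses shown in this article, `err_no` is 0 and `err_str` is 'OK'. A minimal sketch of a check before using the answer follows; it assumes that any non-zero `err_no` means failure, which this article does not confirm, and the helper name `extract_answer` is hypothetical.

```python
def extract_answer(result):
    """Return the recognized text or coordinates from a post_pic() response.

    Assumption: err_no == 0 means success and anything else is a failure.
    """
    if result.get('err_no') != 0:
        raise RuntimeError(
            f"recognition failed: err_no={result.get('err_no')} "
            f"err_str={result.get('err_str')}"
        )
    return result['pic_str']


# A response of the same shape as the ones printed in this article.
sample = {
    'err_no': 0,
    'err_str': 'OK',
    'pic_id': '2256514491185230017',
    'pic_str': '118,177|249,173',
    'md5': 'e89f632e91cc6b8a85dad2fbbc13c803',
}
answer = extract_answer(sample)
print(answer)
```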

Click CAPTCHAs:

from chaojiying import Chaojiying

USERNAME = 'xxx'
PASSWORD = ''
SOFT_ID = 'xxxxxx'
CAPTCHA_KIND = '9004'
FILE_NAME = 'captcha2.png'

client = Chaojiying(USERNAME, PASSWORD, SOFT_ID)
# Read the image as raw bytes and submit it for recognition.
with open(FILE_NAME, 'rb') as f:
    result = client.post_pic(f.read(), CAPTCHA_KIND)
print(result)

The response is:

{'err_no': 0, 'err_str': 'OK', 'pic_id': '2256514491185230017', 'pic_str': '118,177|249,173', 'md5': 'e89f632e91cc6b8a85dad2fbbc13c803'}

The coordinates returned for the image are '118,177|249,173'. Let's mark these points with OpenCV to check them:

import cv2

image = cv2.imread('captcha2.png')
# Draw a filled red circle at each point from pic_str: '118,177|249,173'.
image = cv2.circle(image, (118, 177), radius=10, color=(0, 0, 255), thickness=-1)
image = cv2.circle(image, (249, 173), radius=10, color=(0, 0, 255), thickness=-1)
cv2.imwrite('captcha2_label.png', image)
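Rather than copying the coordinates by hand, the `pic_str` value can be parsed into integer points. The sketch below assumes the format seen in this article's responses: points separated by `|`, each written as `x,y`. The function name `parse_points` is hypothetical.

```python
def parse_points(pic_str):
    """Turn 'x1,y1|x2,y2' into [(x1, y1), (x2, y2)]."""
    points = []
    for pair in pic_str.split('|'):
        x, y = pair.split(',')
        points.append((int(x), int(y)))
    return points


click_points = parse_points('118,177|249,173')
single_point = parse_points('218,96')
print(click_points)  # [(118, 177), (249, 173)]
print(single_point)  # [(218, 96)]
```

Each tuple can be passed directly as the center argument of `cv2.circle`.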

Slider CAPTCHAs:

from chaojiying import Chaojiying

USERNAME = '136xxxx08'
PASSWORD = 'hxxxxx.'
SOFT_ID = '9xxxx'
CAPTCHA_KIND = '9101'
FILE_NAME = 'captcha5.png'

client = Chaojiying(USERNAME, PASSWORD, SOFT_ID)
# Read the image as raw bytes and submit it for recognition.
with open(FILE_NAME, 'rb') as f:
    result = client.post_pic(f.read(), CAPTCHA_KIND)
print(result)

{'err_no': 0, 'err_str': 'OK', 'pic_id': '1256519431185230022', 'pic_str': '218,96', 'md5': '627d620bccd9a6dd1366329b951f1511'}

Verify the result with OpenCV:

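import cv2

# Load the slider image that was submitted above, not the click-CAPTCHA image.
image = cv2.imread('captcha5.png')
# Draw a filled red circle at the point from pic_str: '218,96'.
image = cv2.circle(image, (218, 96), radius=10, color=(0, 0, 255), thickness=-1)
cv2.imwrite('captcha3_label.png', image)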

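As you can see, the mark is not very accurate. We can pass a hint to the platform's human workers so that they label the point as precisely as possible: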

from chaojiying import Chaojiying
import cv2
from PIL import ImageFont, ImageDraw, Image
import numpy as np


def cv2_add_text(image, text, left, top, textColor=(255, 0, 0), text_size=20):
    # Convert the OpenCV BGR array to a PIL RGB image so the text can be drawn
    # with a TrueType font.
    image = Image.fromarray(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    draw = ImageDraw.Draw(image)
    # simsun.ttc must be available on the system; it provides the Chinese glyphs.
    font = ImageFont.truetype('simsun.ttc', text_size, encoding="utf-8")
    draw.text((left, top), text, textColor, font=font)
    # Convert back to an OpenCV BGR array.
    return cv2.cvtColor(np.asarray(image), cv2.COLOR_RGB2BGR)


USERNAME = '136XXXX08'
PASSWORD = 'hXXXXXXXXXXX..'
SOFT_ID = '9XXXXXXX2'
CAPTCHA_KIND = '9101'
FILE_NAME = 'captcha3.png'

image = cv2.imread(FILE_NAME)
# The hint is written in Chinese for the platform's workers. It means:
# "Please click the top-left corner of the target slider."
image = cv2_add_text(image, '请点击目标滑块左上角', int(image.shape[1] / 10), int(image.shape[0] / 2), (255, 0, 0), 40)
client = Chaojiying(USERNAME, PASSWORD, SOFT_ID)
# Encode the annotated image as PNG in memory and submit the raw bytes.
result = client.post_pic(cv2.imencode('.png', image)[1].tobytes(), CAPTCHA_KIND)
print(result)
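The `report_error` method defined in chaojiying.py is not exercised by the examples above. The sketch below shows one way it might be combined with `post_pic`: submit the image, let the caller decide whether the answer worked, and report the `pic_id` when it did not. The names `solve_with_retry`, `is_correct`, and `FakeClient` are hypothetical, and `FakeClient` exists only so the sketch runs offline. When reporting is appropriate under the platform's rules is not covered in this article, and treating a non-zero `err_no` as a failed attempt is an assumption.

```python
def solve_with_retry(client, image_bytes, codetype, is_correct, max_attempts=3):
    """Call post_pic up to max_attempts times, reporting rejected answers.

    is_correct is a caller-supplied function that receives pic_str and
    returns True if the target site accepted the answer.
    """
    for _ in range(max_attempts):
        result = client.post_pic(image_bytes, codetype)
        if result.get('err_no') != 0:
            # Assumption: a non-zero err_no means the request itself failed.
            continue
        if is_correct(result['pic_str']):
            return result
        # The answer was rejected, so report this picture ID.
        client.report_error(result['pic_id'])
    return None


class FakeClient:
    """Offline stand-in with the same two methods as Chaojiying."""

    def __init__(self, answers):
        self.answers = list(answers)
        self.reported = []

    def post_pic(self, im, codetype):
        pic_id = str(len(self.reported) + 1)
        return {'err_no': 0, 'err_str': 'OK',
                'pic_id': pic_id, 'pic_str': self.answers.pop(0)}

    def report_error(self, im_id):
        self.reported.append(im_id)
        return {'err_no': 0, 'err_str': 'OK'}


fake = FakeClient(['wrong', '218,96'])
final = solve_with_retry(fake, b'', '9101', lambda s: s == '218,96')
print(final['pic_str'], fake.reported)
```

With a real client, `is_correct` would typically submit the answer to the target site and return whether the CAPTCHA was accepted.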
