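Before reading on and running the experiments below, it helps to have a proxy available; a free one is enough.
You can also set up a proxy on your own machine. The procedure is somewhat sensitive, so please search for it yourself.
Once the proxy is configured, test it against http://www.httpbin.org/get. Visiting that URL returns information about the request, and the origin field in the response is the client's IP address, so we can use it to tell whether the proxy is in effect.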
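The origin check described above can be automated. The following is a minimal sketch, assuming the response body is the JSON that httpbin returns; the helper name proxy_in_effect and the sample body are made up for illustration and are not part of any library.

```python
import json


def proxy_in_effect(response_body: str, proxy_ip: str) -> bool:
    """Return True if the `origin` field of an httpbin response lists proxy_ip."""
    origin = json.loads(response_body).get("origin", "")
    # Assumption: origin may hold more than one comma-separated address
    # when intermediaries are involved, so compare against each entry.
    addresses = [part.strip() for part in origin.split(",")]
    return proxy_ip in addresses


# Hypothetical response body, shaped like what httpbin's /get endpoint returns.
sample = '{"args": {}, "origin": "218.87.205.240", "url": "http://www.httpbin.org/get"}'
print(proxy_in_effect(sample, "218.87.205.240"))  # True
print(proxy_in_effect(sample, "203.0.113.7"))     # False
```

In the examples that follow, the same check could be applied to response.text or response.read().decode('utf-8') instead of reading the printed output by eye.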
Proxy settings for urllib

from urllib.error import URLError
from urllib.request import ProxyHandler, build_opener

proxy = '218.87.205.240:22927'
# Keys are the scheme of the requested URL; values are the proxy addresses.
proxy_handler = ProxyHandler({
    'http': 'http://' + proxy,
    'https': 'https://' + proxy
})
opener = build_opener(proxy_handler)
try:
    response = opener.open('https://www.httpbin.org/get')
    print(response.read().decode('utf-8'))
except URLError as e:
    print(e.reason)
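Here the proxy is configured through a ProxyHandler object. Its argument is a dictionary whose keys are protocol types and whose values are proxy addresses (note that each proxy address must be prefixed with a protocol, that is, http:// or https://). When the requested URL uses HTTP, the address under the http key is used; when the requested URL uses HTTPS, the address under the https key is used.
After creating the ProxyHandler object, pass it to build_opener to create an Opener object and assign it to the opener variable. This object now has the proxy configured, so calling its open method sends the request through the proxy.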
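The key lookup described above can be pictured with a small sketch. This is a simplified model for illustration, not urllib's actual implementation; pick_proxy is a hypothetical helper.

```python
from urllib.parse import urlsplit


def pick_proxy(url, proxies):
    """Return the proxy address registered for the URL's scheme, or None."""
    scheme = urlsplit(url).scheme.lower()
    return proxies.get(scheme)


proxy = '218.87.205.240:22927'
proxies = {
    'http': 'http://' + proxy,
    'https': 'https://' + proxy,
}
print(pick_proxy('http://www.httpbin.org/get', proxies))   # http://218.87.205.240:22927
print(pick_proxy('https://www.httpbin.org/get', proxies))  # https://218.87.205.240:22927
print(pick_proxy('ftp://example.com/file', proxies))       # None
```

A URL whose scheme has no entry in the dictionary gets no proxy in this model, which is why both the http and https keys are usually filled in.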
If the proxy requires authentication, only the proxy variable needs to change:
proxy = 'username:password@218.87.205.240:22927'
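Credentials that contain characters such as @ or : would make a string in this form ambiguous. One common approach is to percent-encode them first, on the assumption that the client decodes them again before sending the proxy credentials. The sketch below uses a hypothetical helper, proxy_with_auth, and a made-up password.

```python
from urllib.parse import quote


def proxy_with_auth(username: str, password: str, host: str, port: int) -> str:
    """Build 'user:password@host:port' with percent-encoded credentials."""
    # safe='' makes quote() escape ':', '@' and '/' as well.
    user = quote(username, safe='')
    secret = quote(password, safe='')
    return f"{user}:{secret}@{host}:{port}"


# 'p@ss:word' is an illustrative password, not taken from the original text.
proxy = proxy_with_auth('username', 'p@ss:word', '218.87.205.240', 22927)
print(proxy)  # username:p%40ss%3Aword@218.87.205.240:22927
```

The resulting string can be used wherever the examples prepend a scheme to proxy.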
For a SOCKS proxy, it can be set up like this:

import socket
import socks  # provided by the PySocks package
from urllib import request
from urllib.error import URLError

# Route every new socket in this process through the SOCKS5 proxy.
socks.set_default_proxy(socks.SOCKS5, '218.87.205.240', 22927)
socket.socket = socks.socksocket
try:
    response = request.urlopen('https://www.httpbin.org/get')
    print(response.read().decode('utf-8'))
except URLError as e:
    print(e.reason)

This requires the socks module, which is installed with: pip install PySocks
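Proxy settings for requests

import requests

proxy = '218.87.205.240:22927'
# The proxy is assumed to speak plain HTTP, so both entries use http://.
# With recent requests/urllib3 versions, an https:// prefix would attempt
# a TLS connection to the proxy itself.
proxies = {
    'http': 'http://' + proxy,
    'https': 'http://' + proxy
}
try:
    response = requests.get('https://www.httpbin.org/get', proxies=proxies)
    print(response.text)
except requests.exceptions.ConnectionError as e:
    print(e.args)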
If authentication is required:
proxy = 'username:password@218.87.205.240:22927'
If the proxy is a SOCKS proxy:

import requests

# SOCKS support in requests needs the optional extra:
#   pip install "requests[socks]"
proxy = '218.87.205.240:22927'
proxies = {
    'http': 'socks5://' + proxy,
    'https': 'socks5://' + proxy
}
try:
    response = requests.get('https://www.httpbin.org/get', proxies=proxies)
    print(response.text)
except requests.exceptions.ConnectionError as e:
    print(e.args)
Another way to set a SOCKS proxy is to use the socks module, which requires the PySocks package:

import socket
import requests
import socks

socks.set_default_proxy(socks.SOCKS5, '218.87.205.240', 22927)
socket.socket = socks.socksocket
try:
    # No proxies argument here: the patched socket class already routes traffic.
    response = requests.get('https://www.httpbin.org/get')
    print(response.text)
except requests.exceptions.ConnectionError as e:
    print(e.args)

The effect is the same as above.
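One difference is worth keeping in mind: assigning to socket.socket changes it for the whole process, not for a single request. The sketch below shows one possible way to limit such a patch to a block of code. The helper patched_socket and the stand-in class TracingSocket are hypothetical; the stand-in exists only so the sketch runs without PySocks installed, and the approach is not thread-safe.

```python
import contextlib
import socket


@contextlib.contextmanager
def patched_socket(socket_class):
    """Temporarily replace socket.socket and restore the original on exit."""
    original = socket.socket
    socket.socket = socket_class
    try:
        yield
    finally:
        socket.socket = original


class TracingSocket(socket.socket):
    """Stand-in for socks.socksocket so the sketch has no dependencies."""


original_class = socket.socket
with patched_socket(TracingSocket):
    inside_class = socket.socket  # the patched class is active only in here

print(inside_class is TracingSocket)    # True
print(socket.socket is original_class)  # True
```

With PySocks installed, the same helper could presumably be used as `with patched_socket(socks.socksocket): ...` after calling socks.set_default_proxy, so that code outside the block keeps using direct connections.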
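Proxy settings for httpx

import httpx

proxy = '218.87.205.240:22927'
# httpx keys include the '://' suffix. The proxy is assumed to be an HTTP proxy.
# Newer httpx releases replace the `proxies` argument with `proxy=` / `mounts=`,
# so check the version you have installed.
proxies = {
    'http://': 'http://' + proxy,
    'https://': 'http://' + proxy
}
try:
    with httpx.Client(proxies=proxies) as client:
        response = client.get('https://www.httpbin.org/get')
        print(response.text)
except httpx.HTTPError as e:
    print('Error:', e)

If an error reports a missing library, install it manually as the message suggests.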
If authentication is required:
proxy = 'username:password@218.87.205.240:22927'
For SOCKS proxies, install the httpx-socks[asyncio] library: pip install "httpx-socks[asyncio]"
Synchronous mode

import httpx
from httpx_socks import SyncProxyTransport

transport = SyncProxyTransport.from_url('socks5://218.87.205.240:22927')
with httpx.Client(transport=transport) as client:
    response = client.get('https://www.httpbin.org/get')
    print(response.text)
Asynchronous mode

import asyncio
import httpx
from httpx_socks import AsyncProxyTransport

transport = AsyncProxyTransport.from_url('socks5://218.87.205.240:22927')

async def main():
    async with httpx.AsyncClient(transport=transport) as client:
        response = await client.get('https://www.httpbin.org/get')
        print(response.text)

if __name__ == '__main__':
    asyncio.run(main())
Proxy settings for Selenium

from selenium import webdriver

proxy = '218.87.205.240:22927'
options = webdriver.ChromeOptions()
# Chrome takes the proxy as a command-line switch.
options.add_argument('--proxy-server=http://' + proxy)
browser = webdriver.Chrome(options=options)
browser.get('https://www.httpbin.org/get')
print(browser.page_source)
browser.close()
If authentication is required:

import zipfile
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

ip = '127.0.0.1'
port = 7890
username = 'foo'
password = 'bar'

# Extension manifest: requests the permissions needed to set the proxy
# and to answer authentication challenges.
manifest_json = """{
    "version": "1.0.0",
    "manifest_version": 2,
    "name": "Chrome Proxy",
    "permissions": ["proxy", "tabs", "unlimitedStorage", "storage",
                    "<all_urls>", "webRequest", "webRequestBlocking"],
    "background": {"scripts": ["background.js"]}
}
"""

# Background script: sets a fixed proxy and supplies the credentials.
background_js = """
var config = {
    mode: "fixed_servers",
    rules: {
        singleProxy: {
            scheme: "http",
            host: "%(ip)s",
            port: %(port)s
        }
    }
}
chrome.proxy.settings.set({value: config, scope: "regular"}, function() {});
function callbackFn(details) {
    return {
        authCredentials: {
            username: "%(username)s",
            password: "%(password)s"
        }
    }
}
chrome.webRequest.onAuthRequired.addListener(
    callbackFn,
    {urls: ["<all_urls>"]},
    ['blocking']
)
""" % {'ip': ip, 'port': port, 'username': username, 'password': password}

# Package both files into a zip that Chrome can load as an extension.
plugin_file = 'proxy_auth_plugin.zip'
with zipfile.ZipFile(plugin_file, 'w') as zp:
    zp.writestr("manifest.json", manifest_json)
    zp.writestr("background.js", background_js)

options = Options()
options.add_argument("--start-maximized")
options.add_extension(plugin_file)
browser = webdriver.Chrome(options=options)
browser.get('https://httpbin.org/get')
print(browser.page_source)
browser.close()

This code builds a manifest.json configuration file and a background.js script that together configure the authenticated proxy. Running it writes both into a local proxy_auth_plugin.zip file, which stores the configuration and is loaded into Chrome as an extension.
SOCKS proxy settings

from selenium import webdriver

proxy = '127.0.0.1:7891'
options = webdriver.ChromeOptions()
options.add_argument('--proxy-server=socks5://' + proxy)
browser = webdriver.Chrome(options=options)
browser.get('https://httpbin.org/get')
print(browser.page_source)
browser.close()
Proxy settings for aiohttp

import asyncio
import aiohttp

proxy = 'http://127.0.0.1:7890'

async def main():
    async with aiohttp.ClientSession() as session:
        # The proxy is passed per request through the proxy argument.
        async with session.get('https://httpbin.org/get', proxy=proxy) as response:
            print(await response.text())

if __name__ == '__main__':
    asyncio.run(main())
If authentication is required:
proxy = 'http://username:password@127.0.0.1:7890'
SOCKS proxy settings require a supporting library: pip install aiohttp-socks

import asyncio
import aiohttp
from aiohttp_socks import ProxyConnector, ProxyType

async def main():
    # Build the connector inside the coroutine so it binds to the running loop.
    # Shorthand: connector = ProxyConnector.from_url('socks5://127.0.0.1:7891')
    connector = ProxyConnector(
        proxy_type=ProxyType.SOCKS5,
        host='127.0.0.1',
        port=7891,
        # username='user',
        # password='password',
        # rdns=True
    )
    async with aiohttp.ClientSession(connector=connector) as session:
        async with session.get('https://httpbin.org/get') as response:
            print(await response.text())

if __name__ == '__main__':
    asyncio.run(main())
Proxy settings for Pyppeteer
Pyppeteer drives the Chromium browser by default, so the proxy is passed as a Chromium launch argument:

import asyncio
from pyppeteer import launch

proxy = '127.0.0.1:7890'

async def main():
    browser = await launch({
        'args': ['--proxy-server=http://' + proxy],
        'headless': False
    })
    page = await browser.newPage()
    await page.goto('https://httpbin.org/get')
    print(await page.content())
    await browser.close()

if __name__ == '__main__':
    asyncio.run(main())
SOCKS proxy settings

import asyncio
from pyppeteer import launch

proxy = '127.0.0.1:7891'

async def main():
    browser = await launch({
        'args': ['--proxy-server=socks5://' + proxy],
        'headless': False
    })
    page = await browser.newPage()
    await page.goto('https://httpbin.org/get')
    print(await page.content())
    await browser.close()

if __name__ == '__main__':
    asyncio.run(main())
Proxy settings for Playwright

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False, proxy={
        'server': 'http://127.0.0.1:7890'
    })
    page = browser.new_page()
    page.goto('https://httpbin.org/get')
    print(page.content())
    browser.close()

When calling the launch method, you can pass a proxy argument. It is a dictionary with one required field, server, where we simply fill in the address of the HTTP proxy.
When authentication is required:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(proxy={
        'server': 'http://127.0.0.1:7890',
        'username': 'foo',
        'password': 'bar'
    })
    page = browser.new_page()
    page.goto('https://httpbin.org/get')
    print(page.content())
    browser.close()
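Other libraries in these notes take the credentials inside the proxy URL, while Playwright takes them as separate server, username, and password fields. The sketch below converts one form into the other. The helper playwright_proxy is hypothetical and assumes the URL always carries an explicit port.

```python
from urllib.parse import unquote, urlsplit


def playwright_proxy(proxy_url: str) -> dict:
    """Split a proxy URL into the dictionary shape passed to launch(proxy=...)."""
    parts = urlsplit(proxy_url)
    if parts.hostname is None or parts.port is None:
        raise ValueError("proxy URL must include a host and a port")
    settings = {"server": f"{parts.scheme}://{parts.hostname}:{parts.port}"}
    # Credentials are optional; decode any percent-encoding before passing them on.
    if parts.username:
        settings["username"] = unquote(parts.username)
    if parts.password:
        settings["password"] = unquote(parts.password)
    return settings


print(playwright_proxy("http://foo:bar@127.0.0.1:7890"))
# {'server': 'http://127.0.0.1:7890', 'username': 'foo', 'password': 'bar'}
```

The returned dictionary can be passed directly as the proxy argument of launch.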
SOCKS proxy

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(proxy={
        'server': 'socks5://127.0.0.1:7891'
    })
    page = browser.new_page()
    page.goto('https://httpbin.org/get')
    print(page.content())
    browser.close()