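Requirements
Capture a PDF whose download is triggered by a click on a website, rename the PDF, and upload it to Tencent Cloud COS.
Tech stack
"puppeteer": "^19.7.2",
"egg": "^3.15.0", // the server is built with egg
File storage: Tencent Cloud COS
Core idea
Intercept the browser's download event and save the file to local disk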
const session = await substitutePage.target().createCDPSession();
await session.send('Page.setDownloadBehavior', {
  behavior: 'allow',
  downloadPath, // directory where downloaded files are saved
});
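Because `fs.watch` throws on a directory that does not exist, and the handler further down picks the newest file in the folder, it may help to give each task its own fresh download directory. A minimal sketch, assuming a hypothetical helper `createDownloadDir` (not part of the original code) built on Node's `fs.mkdtempSync`:

```typescript
import * as fs from 'fs';
import * as os from 'os';
import * as path from 'path';

// Hypothetical helper: creates a unique, empty directory for one download task
// and returns its absolute path, suitable for passing as downloadPath.
function createDownloadDir(baseDir: string = os.tmpdir()): string {
  // mkdtempSync appends random characters to the prefix and creates the directory
  return fs.mkdtempSync(path.join(path.resolve(baseDir), 'pdf-download-'));
}
```

With one directory per task, concurrent downloads cannot be confused with each other, and the directory can be removed once the upload has finished.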
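Start watching this folder before the download is saved; as soon as a file shows up, pick it up and upload it
The timer acts as a debounce: the watch callback fires several times while the file is being written and again when it is renamed, and reacting too early can produce a 0 KB dirty copy of the source file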
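// Excerpt: `resolve` comes from the enclosing Promise (see the full code below)
let timer: any = null;
fs.watch(downloadPath, (_eventType, filename) => {
  if (timer !== null) {
    clearTimeout(timer);
  }
  timer = setTimeout(() => {
    // Ignore the temporary file the browser writes while downloading;
    // filename can be null on some platforms, so guard against that too
    if (filename && filename.endsWith('.pdf')) {
      resolve({ filename });
    }
  }, 500);
});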
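The filename check inside the timer can be pulled out into a small predicate, which makes it easy to test in isolation. The sketch below is illustrative only: `isFinishedPdf` is a hypothetical name, and the case-insensitive match is an assumption that the original code does not make.

```typescript
// Hypothetical helper: decides whether a name reported by fs.watch
// refers to a finished PDF rather than an in-progress temporary file.
function isFinishedPdf(filename: string | null | undefined): boolean {
  // fs.watch may not report a filename on some platforms
  if (!filename) return false;
  // Temporary download files (for example "*.crdownload") do not end in ".pdf"
  return filename.toLowerCase().endsWith('.pdf');
}
```

Inside the watcher, the condition would then read `if (isFinishedPdf(filename)) { ... }`.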
Full code
const session = await substitutePage.target().createCDPSession();
await session.send('Page.setDownloadBehavior', {
  behavior: 'allow',
  downloadPath, // directory where downloaded files are saved
});
// res holds the file information; downloadPdfHandler resolves to an object,
// so it must not be destructured as an array
const res = await this.downloadPdfHandler(substitutePage, downloadPath);
// filePath is the absolute path of the file on the local disk
const filePath = `${downloadPath}/${res.fileName}`;
// uploadFile is the COS upload implementation; it is omitted here because it contains private keys
const pdfUriCode = await this.uploadFile(filePath, filePath);
const pdfUri = decodeURIComponent(pdfUriCode);
this.domainList = {
  pdfSize: res.pdfSize,
  pdfUri: pdfUri.substring(pdfUri.indexOf('root')),
};
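The last step decodes the returned URI and keeps everything from `root` onward. If `root` is missing, `indexOf` returns -1 and `substring(-1)` silently returns the whole string. A hedged sketch of a stricter variant follows; `extractObjectKey` is a hypothetical name, and throwing on a missing marker is a design choice made here, not something the original code does.

```typescript
// Hypothetical helper mirroring decodeURIComponent + substring(indexOf('root')),
// but failing loudly when the marker is absent.
function extractObjectKey(encodedUri: string, marker: string = 'root'): string {
  const decoded = decodeURIComponent(encodedUri);
  const start = decoded.indexOf(marker);
  if (start === -1) {
    throw new Error(`marker "${marker}" not found in "${decoded}"`);
  }
  return decoded.substring(start);
}
```

For example, `extractObjectKey('https%3A%2F%2Fexample.com%2Froot%2Fpdf%2Fa.pdf')` yields `root/pdf/a.pdf` (the URL here is made up for illustration).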
Implementation of downloadPdfHandler
// Requires: import fs from 'fs'; import path from 'path'; import { v4 as uuidv4 } from 'uuid';
downloadPdfHandler(page, downloadPath): Promise<any> {
  const uuidName = uuidv4();
  const fsWatchApi = () => {
    // Debounce so the file is not renamed before it is fully written,
    // which would leave a dirty file behind
    let timer: any = null;
    return new Promise<{ filename: string }>(resolve => {
      const watcher = fs.watch(downloadPath, (_eventType, filename) => {
        if (timer !== null) {
          clearTimeout(timer);
        }
        timer = setTimeout(() => {
          // Ignore the temporary file the browser writes while downloading
          if (filename && filename.endsWith('.pdf')) {
            watcher.close(); // stop watching once the PDF has arrived
            resolve({ filename });
          }
        }, 500);
      });
    });
  };
  function responseWatchApi() {
    return new Promise<void>(resolve => {
      page.on('response', response => {
        // Check whether the response is application/octet-stream and may carry
        // a PDF (or whatever file type you expect); the header can be missing
        const contentType = response.headers()['content-type'] || '';
        if (contentType.startsWith('application/octet-stream')) {
          resolve();
        }
      });
    });
  }
  return new Promise(async (resolve, reject) => {
    try {
      const [ , { filename }] = await Promise.all([ responseWatchApi(), fsWatchApi() ]);
      const oldFilePath = path.join(downloadPath, filename);
      const newFilePath = path.join(downloadPath, `${uuidName}.pdf`);
      try {
        fs.renameSync(oldFilePath, newFilePath);
        this.logger.info(`File renamed: ${uuidName}`);
      } catch (error) {
        // Log the failure instead of repeating the success message
        this.logger.error(`Failed to rename file to ${uuidName}: %o`, error);
      }
      await this.sleep(5 * 1000);
      const files = fs.readdirSync(downloadPath);
      // Build an array that stores each file name together with its mtime (last modified time)
      const filesWithMtime = files.map(file => {
        const filePath = path.join(downloadPath, file);
        const stats = fs.statSync(filePath);
        return { fileName: file, mtime: stats.mtime, size: stats.size };
      });
      const newestFile = filesWithMtime.sort((a, b) => b.mtime.getTime() - a.mtime.getTime())[0];
      this.logger.info('newestFile: %o', { newestFile });
      resolve({
        pdfSize: newestFile.size,
        fileName: newestFile.fileName,
      });
    } catch (e) {
      reject(e);
    }
  });
}
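The "newest file by mtime" step at the end of the handler can be isolated into a pure function. The sketch below is one possible shape, not the original implementation: `pickNewestFile` and `FileInfo` are hypothetical names, and the extension filter and the skipping of 0-byte files are additions assumed here to guard against the dirty files mentioned earlier.

```typescript
import * as fs from 'fs';
import * as path from 'path';

interface FileInfo {
  fileName: string;
  size: number;
  mtimeMs: number;
}

// Hypothetical helper: returns the most recently modified non-empty file
// with the given extension, or undefined if the directory has none.
function pickNewestFile(dir: string, ext: string = '.pdf'): FileInfo | undefined {
  return fs.readdirSync(dir)
    .filter(name => name.toLowerCase().endsWith(ext))
    .map(name => {
      const stats = fs.statSync(path.join(dir, name));
      return { fileName: name, size: stats.size, mtimeMs: stats.mtimeMs };
    })
    .filter(info => info.size > 0) // skip 0-byte leftovers
    .sort((a, b) => b.mtimeMs - a.mtimeMs)[0];
}
```

Returning `undefined` for an empty directory lets the caller decide whether to retry or fail, instead of crashing on `newestFile.size`.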