Exporting E-books from O'Reilly

Reading e-books online is somewhat inconvenient, so I looked for a way to export them.

  1. Use the GitHub project safaribooks to export the book.

  2. Because my account logs in through SSO, the steps are as follows:

    $ git clone https://github.com/lorenzodifuccia/safaribooks.git
    Cloning into 'safaribooks'...
    $ cd safaribooks/
    $ pip3 install -r requirements.txt

We need the cookies from the SSO login session. The steps for obtaining them, quoted as found, are:

I think I've found the problem.
Using document.cookie from the console does not include the HttpOnly cookies and they are definitely required.
I can't work out how to access these via the console but I was able to find a way to get them that isn't too painful.

Login as usual to https://learning.oreilly.com/
Open the developer tools with F12
Go to Network tab in the developer tools
Access the profile page in the browser: https://learning.oreilly.com/profile/
In the Network tab, click on the request to /profile/ (it should be the first one)
Click on the Cookies tab in the request information
Right-click on the Request cookies text and choose Copy All
Paste this into the cookies.json file and then remove the outer section of the JSON document
Run the script without passing credentials: python3 safaribooks.py 9780135262047
p.s. sudo is not necessary.
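
The "remove the outer section of the JSON document" step can also be done with a few lines of Python instead of by hand. This is only a sketch: it assumes the pasted text is a JSON object with a single outer key wrapping the real cookies (for example {"Request cookies": {...}}), and the function names unwrap_cookies and flatten_cookie_file are made up for illustration; they are not part of safaribooks.

```python
import json


def unwrap_cookies(data):
    """Return the flat name -> value mapping from a pasted cookie dump.

    Assumption: the browser's "Copy All" wraps the cookies in a single
    outer key, e.g. {"Request cookies": {"name": "value"}}.
    """
    # Keep unwrapping while there is exactly one key whose value is an object.
    while isinstance(data, dict) and len(data) == 1:
        (inner,) = data.values()
        if not isinstance(inner, dict):
            break  # a flat file that happens to hold a single cookie
        data = inner
    return data


def flatten_cookie_file(path="cookies.json"):
    """Rewrite the file at `path` in place as a flat JSON object."""
    with open(path, encoding="utf-8") as fh:
        cookies = unwrap_cookies(json.load(fh))
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(cookies, fh, indent=2)
    return cookies
```

Calling flatten_cookie_file() from the safaribooks folder after pasting would leave cookies.json as a plain name-to-value object. A file that is already flat passes through unchanged.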

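With Chrome, copying the cookies this way may not be possible. In that case, open the Chrome console on the profile page and enter:

    console.log(JSON.stringify(document.cookie.split(';').map(c => c.split('=')).map(i => [i[0].trim(), i[1].trim()]).reduce((r, i) => {r[i[0]] = i[1]; return r;}, {})))

This prints the cookies as a JSON object. Paste the output into cookies.json and put that file in the same folder as the program.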
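
An alternative to the console one-liner is to copy the value of the Cookie request header for /profile/ from the Network tab (the browser sends HttpOnly cookies in that header, while document.cookie omits them) and convert it with Python. This is a rough sketch, assuming cookies.json should be a flat JSON object of cookie names to values, which is the shape the console snippet produces. The helper names parse_cookie_header and write_cookies are hypothetical.

```python
import json


def parse_cookie_header(header):
    """Turn 'a=1; b=2' into {'a': '1', 'b': '2'}."""
    # Tolerate a pasted "Cookie:" prefix in front of the value.
    if header.lower().startswith("cookie:"):
        header = header[len("cookie:"):]
    cookies = {}
    for pair in header.split(";"):
        # partition splits on the first "=" only, so values that
        # themselves contain "=" (base64 padding, for example) survive.
        name, sep, value = pair.strip().partition("=")
        name = name.strip()
        if name and sep:
            cookies[name] = value.strip()
    return cookies


def write_cookies(header, path="cookies.json"):
    """Write the parsed cookies to `path` as a flat JSON object."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(parse_cookie_header(header), fh)
```

For example, write_cookies("name1=value1; name2=value2") run inside the safaribooks folder would create cookies.json there; the cookie names shown are placeholders.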

Then run python sso_cookies.py <BOOK ID>, and the book is downloaded as an EPUB.

Permalink:

https://ma.ge/archives/701.html