Scrapy 1.2.1 has been released. Scrapy is a web crawling framework written in pure Python on top of the Twisted asynchronous networking framework; by customizing just a few modules, users can easily build a spider to scrape web pages and images.

Changes in this release:

New features

- New FEED_EXPORT_ENCODING setting to customize the encoding used when writing items to a file. This can be used to turn off \uXXXX escapes in JSON output. It is also useful for those wanting something other than UTF-8 for XML or CSV output (#2034). A minimal settings sketch appears after the download links.
- The startproject command now supports an optional destination directory to override the default one based on the project name (#2005).
- New SCHEDULER_DEBUG setting to log request serialization failures (#1610). Also shown in the settings sketch below.
- The JSON encoder now supports serialization of set instances (#2058); see the second sketch at the end of this post.
- Interpret application/json-amazonui-streaming as TextResponse (#1503).
- scrapy is imported by default when using shell tools (shell, inspect_response) (#2248).

Bug fixes

- The DefaultRequestHeaders middleware now runs before the UserAgent middleware (#2088). Warning: this is technically backwards incompatible, though we consider it a bug fix.
- The HTTP cache extension and plugins that use the .scrapy data directory now work outside projects (#1581). Warning: this is technically backwards incompatible, though we consider it a bug fix.
- Selector no longer allows passing both response and text (#2153).
- Fixed logging of the wrong callback name with scrapy parse (#2169).
- Fix for an odd gzip decompression bug (#1606).
- Fix for selected callbacks when using CrawlSpider with scrapy parse (#2225).
- Fix for invalid JSON and XML files when a spider yields no items (#872).
- Implemented flush() for StreamLogger, avoiding a warning in the logs (#2125).

Refactoring

- canonicalize_url has been moved to w3lib.url (#2168).

Downloads:

- Source code (zip)
- Source code (tar.gz)
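For readers who want to try the two new settings mentioned above, here is a minimal settings.py sketch. The project name and the exact values are placeholders chosen for illustration; only FEED_EXPORT_ENCODING and SCHEDULER_DEBUG relate to this release.

```python
# settings.py -- minimal sketch; "myproject" is a placeholder project name.
BOT_NAME = "myproject"

# New FEED_EXPORT_ENCODING setting: controls the encoding used when
# writing items to a feed file. Setting it to UTF-8 turns off the
# \uXXXX escapes in JSON output.
FEED_EXPORT_ENCODING = "utf-8"

# New SCHEDULER_DEBUG setting: log requests that the scheduler fails
# to serialize (useful when debugging disk-based request queues).
SCHEDULER_DEBUG = True
```

The optional destination directory for startproject is passed on the command line, e.g. `scrapy startproject myproject /path/to/dir`, where both the project name and path here are placeholders.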
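And a short sketch of the set-serialization change, assuming the encoder referred to is scrapy.utils.serialize.ScrapyJSONEncoder (the JSON encoder used by Scrapy's feed exporters); the item dict below is made up for illustration.

```python
import json

from scrapy.utils.serialize import ScrapyJSONEncoder

# A scraped item containing a set; the field names are illustrative.
item = {"name": "example", "tags": {"python", "scrapy"}}

# Sets are now serialized as JSON arrays instead of raising TypeError
# (element order within the array is not guaranteed).
print(json.dumps(item, cls=ScrapyJSONEncoder))
# e.g. {"name": "example", "tags": ["python", "scrapy"]}
```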