Scrapy 1.2.1 has been released. Scrapy is a web crawling framework written in pure Python on top of the Twisted asynchronous networking framework; by customizing just a few modules, users can easily build a spider to scrape web pages and images.

Changes in this release:

New features

- New FEED_EXPORT_ENCODING setting to customize the encoding used when writing items to a file. This can be used to turn off \uXXXX escapes in JSON output. It is also useful for those wanting something other than UTF-8 for XML or CSV output (#2034). A minimal settings sketch appears after the download links.
- The startproject command now supports an optional destination directory to override the default one based on the project name (#2005).
- New SCHEDULER_DEBUG setting to log request serialization failures (#1610). Also shown in the settings sketch below.
- The JSON encoder now supports serialization of set instances (#2058); see the second sketch at the end of this post.
- Interpret application/json-amazonui-streaming as TextResponse (#1503).
- scrapy is imported by default when using shell tools (shell, inspect_response) (#2248).

Bug fixes

- The DefaultRequestHeaders middleware now runs before the UserAgent middleware (#2088). Warning: this is technically backwards incompatible, though we consider it a bug fix.
- The HTTP cache extension and plugins that use the .scrapy data directory now work outside projects (#1581). Warning: this is technically backwards incompatible, though we consider it a bug fix.
- Selector no longer allows passing both response and text (#2153).
- Fixed logging of the wrong callback name with scrapy parse (#2169).
- Fix for an odd gzip decompression bug (#1606).
- Fix for selected callbacks when using CrawlSpider with scrapy parse (#2225).
- Fix for invalid JSON and XML files when a spider yields no items (#872).
- Implemented flush() for StreamLogger, avoiding a warning in the logs (#2125).

Refactoring

- canonicalize_url has been moved to w3lib.url (#2168).

Downloads:

- Source code (zip)
- Source code (tar.gz)
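For readers who want to try the two new settings mentioned above, here is a minimal settings.py sketch. The project name and the exact values are placeholders chosen for illustration; only FEED_EXPORT_ENCODING and SCHEDULER_DEBUG relate to this release.

```python
# settings.py -- minimal sketch; "myproject" is a placeholder project name.
BOT_NAME = "myproject"

# New FEED_EXPORT_ENCODING setting: controls the encoding used when
# writing items to a feed file. Setting it to UTF-8 turns off the
# \uXXXX escapes in JSON output.
FEED_EXPORT_ENCODING = "utf-8"

# New SCHEDULER_DEBUG setting: log requests that the scheduler fails
# to serialize (useful when debugging disk-based request queues).
SCHEDULER_DEBUG = True
```

The optional destination directory for startproject is passed on the command line, e.g. `scrapy startproject myproject /path/to/dir`, where both the project name and path here are placeholders.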
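And a short sketch of the set-serialization change, assuming the encoder referred to is scrapy.utils.serialize.ScrapyJSONEncoder (the JSON encoder used by Scrapy's feed exporters); the item dict below is made up for illustration.

```python
import json

from scrapy.utils.serialize import ScrapyJSONEncoder

# A scraped item containing a set; the field names are illustrative.
item = {"name": "example", "tags": {"python", "scrapy"}}

# Sets are now serialized as JSON arrays instead of raising TypeError
# (element order within the array is not guaranteed).
print(json.dumps(item, cls=ScrapyJSONEncoder))
# e.g. {"name": "example", "tags": ["python", "scrapy"]}
```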