
News: Scrapy 1.2.1 released, a web crawling framework (download)

Posted by 漂亮的石头 on 2016-10-22. Forum: Software News

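    Scrapy 1.2.1 has been released.

    Scrapy is a crawling framework written in pure Python on top of Twisted, an asynchronous networking framework. Users only need to customize a few modules to build a crawler that fetches web page content and images.

    Changes:

    New features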


    • New FEED_EXPORT_ENCODING setting to customize the encoding used when writing items to a file. This can be used to turn off \uXXXX escapes in JSON output. This is also useful for those wanting something other than UTF-8 for XML or CSV output (#2034).


    • startproject command now supports an optional destination directory to override the default one based on the project name (#2005).


    • New SCHEDULER_DEBUG setting to log request serialization failures (#1610).


    • JSON encoder now supports serialization of set instances (#2058).


    • Interpret application/json-amazonui-streaming as TextResponse (#1503).


    • scrapy is imported by default when using shell tools (shell, inspect_response) (#2248).
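To make two of the items above more concrete (the \uXXXX escapes that FEED_EXPORT_ENCODING can turn off, and JSON serialization of set instances), here is a small sketch using only the standard library json module. It is not Scrapy's own code: the item and the SetFriendlyEncoder class are hypothetical names made up for illustration, and sorting the set is an assumption made here only to get a stable output order.

```python
import json

# Hypothetical scraped item, for illustration only.
item = {"title": "爬虫", "tags": {"python"}}


class SetFriendlyEncoder(json.JSONEncoder):
    """Illustrative encoder (not Scrapy's): writes set instances as JSON arrays."""

    def default(self, o):
        if isinstance(o, set):
            # Assumption: elements are mutually comparable, so sorting works.
            return sorted(o)
        return super().default(o)


# Default behaviour: non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(item, cls=SetFriendlyEncoder)

# With ensure_ascii=False the characters are written as-is.
readable = json.dumps(item, cls=SetFriendlyEncoder, ensure_ascii=False)

print(escaped)
print(readable)
```

Both strings decode to the same data; only the on-disk representation differs. In a Scrapy project, setting FEED_EXPORT_ENCODING = "utf-8" in the project settings is the documented way to get the readable form in JSON feeds.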

    Bug fixes


    • DefaultRequestHeaders middleware now runs before UserAgent middleware (#2088). Warning: this is technically backwards incompatible, though we consider this a bug fix.


    • HTTP cache extension and plugins that use the .scrapy data directory now work outside projects (#1581). Warning: this is technically backwards incompatible, though we consider this a bug fix.


    • Selector does not allow passing both response and text anymore (#2153).


    • Fixed logging of wrong callback name with scrapy parse (#2169).


    • Fix for an odd gzip decompression bug (#1606).


    • Fix for selected callbacks when using CrawlSpider with scrapy parse (#2225).


    • Fix for invalid JSON and XML files when spider yields no items (#872).


    • Implement flush() for StreamLogger avoiding a warning in logs (#2125).
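The fix for invalid JSON files when a spider yields no items (#872) can be illustrated with a toy exporter. This is a sketch, not Scrapy's implementation: TinyJsonListExporter and export_all are hypothetical names, and the assumed failure mode (the opening bracket being written only together with the first item) is a guess at how such a file can end up invalid, not something the release notes state. The sketch writes both brackets unconditionally, so an empty run still produces a valid JSON array.

```python
import io
import json


class TinyJsonListExporter:
    """Toy exporter (hypothetical, not Scrapy's class): writes items as one JSON array."""

    def __init__(self, stream):
        self.stream = stream
        self.first_item = True

    def start_exporting(self):
        # Always open the array, even if no item ever arrives.
        self.stream.write("[")

    def export_item(self, item):
        # Separate items with a comma, but not before the first one.
        if not self.first_item:
            self.stream.write(",\n")
        self.first_item = False
        self.stream.write(json.dumps(item))

    def finish_exporting(self):
        # Always close the array.
        self.stream.write("]")


def export_all(items):
    """Run the full start/export/finish cycle and return the written text."""
    buffer = io.StringIO()
    exporter = TinyJsonListExporter(buffer)
    exporter.start_exporting()
    for item in items:
        exporter.export_item(item)
    exporter.finish_exporting()
    return buffer.getvalue()


print(export_all([]))
print(export_all([{"a": 1}, {"b": 2}]))
```

The point of the design is that the file's validity does not depend on how many items were exported.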

    Refactoring


    Download:

    Scrapy 1.2.1 released, web crawling framework: download link