This addon keeps the content of the .scrapy directory in a persistent store, which is loaded when the spider starts and saved when the spider finishes. It allows spiders to share data between different runs, keeping state or any kind of data that needs to be persisted.
The .scrapy directory is well known in Scrapy, and a few extensions use it to keep state between runs. The canonical way to work with the .scrapy directory is by calling the scrapy.utils.project.data_path function, as illustrated in the following example:
```python
import ast
import os

import scrapy
from scrapy.utils.project import data_path

filename = 'data.txt'
mydata_path = data_path(filename)
# in a local project mydata_path will be /<SCRAPY_PROJECT>/.scrapy/data.txt
# on Scrapy Cloud mydata_path will be /Zyte/.scrapy/data.txt

# use mydata_path to store or read data which will be persisted among runs,
# for instance:
if os.path.exists(mydata_path) and os.path.getsize(mydata_path) > 0:
    with open(mydata_path, 'r') as f:
        canned_cookie_jar = f.read()
    cookies_to_send = ast.literal_eval(canned_cookie_jar)
    yield scrapy.Request(
        url='<SOME_URL>',
        callback=self.parse,
        cookies=cookies_to_send,
    )
```
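The reading side above parses the stored cookies with ast.literal_eval, which implies the writing side (for example, a spider's closed() callback) saved them as a Python literal. A minimal sketch of that round trip, using a temporary file as a stand-in for the real data_path result (the file name and cookie values here are assumptions for illustration only):

```python
import ast
import os
import tempfile

# Hypothetical stand-in for data_path('cookies.txt'); on Scrapy Cloud this
# would live under the persisted .scrapy directory instead.
mydata_path = os.path.join(tempfile.mkdtemp(), 'cookies.txt')

# Writing side (e.g. when the spider finishes): store the cookie dict as a
# Python literal so ast.literal_eval can parse it safely on the next run.
cookies = {'sessionid': 'abc123', 'csrftoken': 'xyz789'}
with open(mydata_path, 'w') as f:
    f.write(repr(cookies))

# Reading side (next run): recover the dict without resorting to eval().
with open(mydata_path, 'r') as f:
    cookies_to_send = ast.literal_eval(f.read())

print(cookies_to_send == cookies)  # True
```

ast.literal_eval is the safe choice here because it only accepts Python literals (dicts, lists, strings, numbers), so a corrupted or tampered file cannot execute arbitrary code the way eval() would.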
Supported settings:
DOTSCRAPY_ENABLED -- enables or disables the DotScrapy addon (either project-wide or per spider)
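Assuming the standard Scrapy settings mechanism, the project-wide form of this setting would typically go in the project's settings.py (a sketch; per-spider overrides would use the spider's custom_settings dict in the usual way):

```python
# settings.py -- enable the addon for the whole project.
# Only the setting name comes from the docs above; placing it in
# settings.py is the standard Scrapy convention, not addon-specific.
DOTSCRAPY_ENABLED = True
```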