Home
WARCEX is an extensible command-line tool for extracting structured data from WARC and WACZ files. It is developed by the Digital Observatory as part of the Australian Internet Observatory (AIO).
AIO received co-investment (doi.org/10.3565/hjrp-b141) from the Australian Research Data Commons (ARDC) through the HASS and Indigenous Research Data Commons. The ARDC is enabled by the National Collaborative Research Infrastructure Strategy (NCRIS).
Installation
Install from GitHub using pip:
Usage
To get an overview of available commands, run:
You can see what plugins are available by running:
You can get more information about a plugin, including instructions on web archiving activity, by running:
Extracting data:
You can specify more than one.