Marshalling the world's knowledge

rndlabs/wiki-extractor

Wikipedia ZIM Extractor

A Rust-based ZIM extractor, parser, and enhancer for the Ethereum Swarm Wikipedia Gitcoin bounty.

This extractor:

  1. Extracts a ZIM file obtained from https://dumps.wikimedia.org/other/kiwix/zim/wikipedia/.

  2. Processes extracted files to:

    a. Remove the HTML head to minimise space.
    b. Rewrite src attributes in img and media tags.
    c. Gzip the data to minimise storage space (~6x to 8x compression factor achieved on wiki articles).
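
The first two processing steps can be sketched with the standard library alone. This is a minimal illustration, not the extractor's actual code: the function names `strip_head` and `rewrite_src` are hypothetical, and prepending a fixed prefix to each `src` value is an assumed rewriting scheme, since this README does not say how the attributes are rewritten. A real implementation would more likely use an HTML parser than string search.

```rust
/// Find the byte offset of an opening `<head>` tag, skipping look-alikes
/// such as `<header>`. Assumes lowercase tag names.
fn find_head_open(html: &str) -> Option<usize> {
    let mut from = 0;
    while let Some(rel) = html[from..].find("<head") {
        let i = from + rel;
        // "<head" is 5 ASCII bytes, so i + 5 is always a valid char boundary.
        match html[i + 5..].chars().next() {
            Some('>') | Some(' ') => return Some(i),
            _ => from = i + 5,
        }
    }
    None
}

/// Remove the first `<head>...</head>` element. Returns the input
/// unchanged when no complete head element is found.
fn strip_head(html: &str) -> String {
    let start = match find_head_open(html) {
        Some(i) => i,
        None => return html.to_string(),
    };
    let close = "</head>";
    match html[start..].find(close) {
        Some(rel) => {
            let end = start + rel + close.len();
            format!("{}{}", &html[..start], &html[end..])
        }
        None => html.to_string(),
    }
}

/// Prepend `prefix` to every double-quoted `src` attribute value.
/// Assumption: this touches every tag carrying `src`, not only img and
/// media tags, and the prefix scheme itself is illustrative.
fn rewrite_src(html: &str, prefix: &str) -> String {
    let marker = " src=\"";
    let mut out = String::with_capacity(html.len());
    let mut rest = html;
    while let Some(i) = rest.find(marker) {
        let value_start = i + marker.len();
        out.push_str(&rest[..value_start]); // copy up to and including ` src="`
        out.push_str(prefix);               // insert the new prefix
        rest = &rest[value_start..];        // continue after the marker
    }
    out.push_str(rest);
    out
}
```

Calling `rewrite_src(&strip_head(page), "/media/")` applies both steps to one page. The gzip step is left out because the standard library has no gzip encoder; a compression crate such as `flate2` would typically handle it.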

Contributions

The source code was forked from @dignifiedquire's original project, available at https://github.com/dignifiedquire/zim.
