Alternative parsers

This page compiles links, descriptions, and status reports for the various alternative MediaWiki parsers: programs and projects, other than MediaWiki itself, that are able or intended to translate MediaWiki's text markup syntax into something else. Some have quite narrow purposes; others are possible contenders for replacing the somewhat labyrinthine code that currently drives MediaWiki itself.

Many of the things linked here are likely to be out of date and under-maintained, even abandoned. But in the interest of not duplicating the same work over and over, it seemed sensible to collect together what was "out there".

Known implementations

Name and link	Principal author(s)	Language	Input	Output	Comments / other info	License
WikiPops.com	Max Freedom	.NET	Wiki title	HTML	A website that converts Wiki markup to HTML. Allows user to browse for a Wiki title and return the full HTML or an abstract.
Wiky.php	Toni Lähdekorpi	PHP, Regular Expressions	Markup	HTML	A tiny PHP library that uses only regular expressions to convert Wiki markup to HTML.	Apache License/GPL/LGPL/MPL/CC
sanskritnlp	Vishvas Vasuki	Scala	MediaWiki text	MediaWiki text and section tree	Parses MediaWiki sections only. It can parse a wiki page with multiple sections into a section tree, and add, access, and delete sections.	Creative Commons
Wiky	Tanin Na Nakorn	Ruby	Markup	HTML	A simple Ruby library to convert Wiki markup to HTML.	Apache License
Wiky.js	Tanin Na Nakorn	JavaScript	Markup	HTML	A simple JavaScript library to convert Wiki markup to HTML (limited subset).	Apache License
txtwiki.js	Joao Sa	JavaScript	Markup	Text	A JavaScript library to convert MediaWiki markup to plain text.	MIT License
wikipedia-js	kenshiro_o	Node.js	Markup	HTML	A simple client that lets you query Wikipedia articles in English. The results are formatted in basic HTML. You can retrieve either a summary of an article (i.e. the part before the table of contents) or a full article.	MIT
WikiExtractor	Giuseppe Attardi, Antonio Fuschetto	Python	XML dumps	text	Simple and fast tool for extracting plain text from Wikipedia dumps. It performs template expansion and handles parser functions (core and extended).	GPL
mw2html	Connelly Barnes	Python	Wiki url	HTML	Minimal setup - gets the basic job of creating a static copy of the wiki done.	Public Domain
mwlib	PediaPress.com	Python with C library	Markup and other	parse tree, HTML, PDF, XML, OpenDocument	Part of cooperation between Wikimedia Foundation and PediaPress.	BSD
Mediawiki2HTML Machine	Johannes Buchner	PHP	Markup	HTML	Project for parsing without the Mediawiki engine.	AGPL3 + any later version
PHP5 WP	Dan Goldsmith	PHP	Markup	HTML	Parser With Plugin Framework To Add Additional Syntax. Configurable for alternative markup i.e. PMWIKI.	MPL 2.0
Mylyn WikiText	David Green	Java	Local files	HTML, DocBook, Eclipse Help, DITA, extensible	Integration with Ant and Eclipse runtime.
Java API (Bliki engine)	axelclk	Java	Markup fragment	HTML, PDF	Java Wikipedia API - (supports ParserFunctions, Lua/Scribunto...).
FlexBisonParse	Timwi	flex, bison and C	Markup fragment	Custom XML	Intended as an eventual replacement to the parsing code inside MediaWiki itself.
JAMWiki	Ryan	Java	JAMWiki front-end	HTML	Java Wiki engine that supports MediaWiki syntax. The roadmap also calls for XML import and export that will be compatible with Mediawiki.
InstaView	Pilaf	JavaScript	Markup fragment	HTML	Provides instant preview while editing a page (without reloading).
InstaView	C. Scott Ananian	JavaScript	Markup fragment	HTML	Port of Pilaf's code to node.js, volo, and the browser.
Perl Wikipedia Toolkit	Michal Jurosz	Perl	XML dump, SQL dump	Own parse tree, MediaWiki markup	Developed for computer-assisted Wikipedia translation. (Limited functionality.)
Text_Wiki_Mediawiki	Multiple	PHP	Markup	HTML, Latex, Plain text	Part of the Text_Wiki library.
TomeRaider export	Erik Zachte	Perl	XML dump	TomeRaider database	See en:Wikipedia:TomeRaider database for more details.
Waikiki	Magnus Manske	C++	SQL dump (via SQLite)	HTML	Abandoned in favour of "flexbisonparse", but has been used inside some experimental "front ends".
Wikiwyg	Jim Higson	JavaScript	A live installation of MediaWiki	HTML (via XML)	More than just a parser; attempts to create a fully functional client-side interface.
wik2dict	Guaka	Python	SQL dump	DICT
wiki2pdf	Stephan Walter	Python (and PHP)	Markup fragment or set of online articles	LaTeX, PDF	Project is incomplete and dormant.
wb2pdf	Dirk Hünniger	Haskell	online article	LaTeX, PDF, Parse Tree	Recursive Descent based on Monadic Parser Combinators. Allows for non context-free input, especially non well formatted HTML as often found on Wikipedia.	GPL
WikiPDF	Felipe Sanches	Python (and PHP)	One selected article	LaTeX based on templates, PDF	Mediawiki extension that uses Stephan Walter's wiki2pdf as backend.
Wiki2XML	Magnus Manske	C++	Markup fragment (?)	Custom XML	Another aborted project on the way to 'flexbisonparse'.
HTML2FPDF	Renato A. C.	PHP	HTML	PDF (HTML -> HTML2FPDF -> FPDF -> PDF)	A PHP class that transforms HTML into a feed for FPDF, resulting in a PDF file. Not specifically for MediaWiki, but easy to install using an updated version of this tool: updated html2fpdf.php. See HTML2FPDF and Mediawiki for more instructions.
WikiOnCD	Andrew Rodland	Perl	SQL Dump or markup	HTML, Parse tree (eventually?)	Started out as an offline wiki browser, but grew a parser when Wiki2static turned out to be too limiting. No web presence yet; code is in the SVN.	GPL
WikiTaxi	Ralf Junker	Delphi / Pascal	MediaWiki markup, page or fragment	Node-tree, HTML, potentially others	Hand-crafted parser with template expansion, parser functions (core and extended), tag extensions (<ref>, <source>), wiki text parsing. Used for the WikiTaxi offline reader.	No sources available
Wikifilter	?	C++ (VS)	XML dumps	HTML	A Windows program that uses Apache/IIS to serve the pages. Abandoned in 2006, before ParserFunctions were available.
Wikipedia Dump Reader	Benjamin Thyreau	Python	XML dumps	On screen	Cross platform viewer.	GPLv2/~BSD license
Marker	Ryan Blue	Ruby	Markup (subset)	HTML or formatted text	Marker is a Ruby implementation of a subset of the MediaWiki markup language, intended to bring MediaWiki's markup language to non-wiki applications with multiple output formats.	GPL
WikiCloth	nricciar	Ruby	Markup	HTML	Ruby implementation of the MediaWiki markup language, including a fair number of the parser functions.	MIT
XWiki	XWiki dev team	Java	Various WikiMarkups	Well formed sequence of events, HTML/XHTML, other WikiMarkups	XWiki can be used as a full-fledged wiki supporting several WikiMarkups (including MediaWiki's markup). It also offers a standalone Rendering Engine that can be used as a Java library for parsing/rendering WikiMarkups. As of 2016/03 it cannot output MediaWiki format, though.	LGPL
Kiwi	Thomas Luce, Karl Matthias, AboutUs.org	C, Ruby, PEG	Markup	HTML	Kiwi is a PEG-based C implementation with Ruby bindings and a command line parser. It is very fast and supports most of the MediaWiki syntax. Actively developed.	BSD
YaCy	YaCy dev team	Java	XML Dump	XML with Dublin Core Metadata	YaCy is a search engine and a MediaWiki parser is included as one of the import modules. MediaWiki xml dumps are first converted to Dublin Core XML as intermediate format and then inserted into the search index using the built-in Dublin Core importer.	GPL
MessageParser	Neil Kandalgaonkar	JavaScript	Markup	Abstract syntax tree, jQuery object, HTML	Designed for use with message strings, to allow enhanced interface in the browser, like pluralizing internationalized messages or attaching jQuery behavior to links within a message.	GPL
Sweble Wikitext Parser	Hannes Dohrn	Java	Markup	Abstract syntax tree, XML, HTML	Claims to be very thorough.	Apache License 2.0
JWPL api	Torsten Zesch, Richard Eckart de Castilho, Oliver Ferschke, Elisabeth Niemann	Java	XML Dump	API to access pages, outlinks, inlinks and more	"JWPL (Java Wikipedia Library) is a free, Java-based application programming interface that allows to access all information contained in Wikipedia." "JWPL is for you: If you need structured access to Wikipedia in Java." Older parser not maintained any more - JWPL uses Sweble now.	LGPL
libmwparser	Saitmoh	C	XML dumps, Markup	XML, XHTML, Expanded WikiText	Primarily an offline reader for Wikimedia wikis, with interwiki support. Libmwparser is a source-independent library that supports most MediaWiki syntax and some extensions such as math or gallery.	GPL
mediawiki-parser	Peter Potrowl, Erik Rose	Python	Markup	XHTML, raw text, AST	GSoC 2011 project; the use of a PEG parser makes it easy to improve. Parser functions are not supported yet.	GPL
Parsoid	Gabriel Wicke and the Parsoid / Visual editor team	PEG / JavaScript / Node.js	Markup, XML dumps, test cases	Tokens, HTML5 DOM with RDFa and round-trip data	Fully-featured round-tripping parser/runtime that powers the Visual editor on Wikipedia. Work ongoing to provide a HTML-only read / edit interface, and later to become the default parser for MediaWiki. See roadmap. Used to make this edit.	GPL
mwparserfromhell	The Earwig	Python	Markup	AST	A Python library to convert Wiki markup to a navigable string, which can be used to examine and manipulate templates. Written in pure Python, compatible with Python 2.7 and 3, and no dependencies.	MIT License
Saya.Parser.Wiki	Nana Sakisaka	C++	Markup	Abstract syntax tree	Pure C++11 parser implemented with Boost.Spirit.Qi.	Boost Software License 1.0
smc.mw	Marcus Brinkmann	Python	Markup	AST, HTML	Stateful PEG parser based on Grako, with a very clean separation of parsing stages, grammars and semantic transformations.	BSD
Pandoc	John MacFarlane	Haskell	Markup	many	Can convert subset of mediawiki markup to ~35 different formats (5 of which are flavors of markdown).	GPLv2
Wikiforia	Marcus Klang	Java	XML Dumps, Markup	Text	Uses the AST output from Sweble Wikitext Parser internally to produce raw text. Can parallel decompress and parse compressed multistreamed xml dumps.	GPLv2
wtf_wikipedia	Spencer Kelly	JavaScript	Markup	JSON	Supports recursive links and templates, parses infoboxes and links, resolves special templates, parses images and categories. Runs server-side and in the browser.
Wiki-infobox-parser	Zhipeng Jiang	JavaScript	Markup	JSON	A light Wikipedia Infobox Parser written in JavaScript.	MIT
wikitextparser	5j9	Python	Markup	AST	Provides several accessor methods in an object tree to navigate to structural elements like sections, tables, links etc. Supports extracting table data as list of lists. Available via pip, supports Python 3.	GPL
PHP-Wikipedia-Syntax-Parser	Don Wilson	PHP	Markup	Associative array	Given raw contents and title of a Wikipedia article, this will output highly useful information in an organized fashion.
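
Several entries in the table (Wiky.php, for example) are described as converting wiki markup to HTML using regular expressions only. The following is a minimal sketch of that general approach in Python, not the code of any project listed here. It assumes a tiny subset of the markup (headings, bold, italic, and internal links), and the function names and the "/wiki/" link prefix are hypothetical choices made for illustration. Templates, tables, lists, and nested markup are not handled, which is part of why regex-only converters tend to support a limited subset.

```python
import html
import re


def _link(match: re.Match) -> str:
    """Render [[Target]] or [[Target|label]] as an anchor tag."""
    target = match.group(1).strip()
    label = (match.group(2) or target).strip()
    # Assumption: pages live under /wiki/ and spaces become underscores.
    return f'<a href="/wiki/{target.replace(" ", "_")}">{label}</a>'


def _inline(text: str) -> str:
    """Convert inline markup; bold must be handled before italic."""
    text = re.sub(r"'''(.+?)'''", r"<b>\1</b>", text)
    text = re.sub(r"''(.+?)''", r"<i>\1</i>", text)
    return re.sub(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]", _link, text)


def wikitext_to_html(text: str) -> str:
    """Convert a very small subset of MediaWiki markup to HTML."""
    out = []
    # Escape &, < and > first; apostrophes are kept so ''...'' still matches.
    for line in html.escape(text, quote=False).splitlines():
        heading = re.fullmatch(r"(={1,6})\s*(.+?)\s*\1", line)
        if heading:
            level = len(heading.group(1))
            out.append(f"<h{level}>{_inline(heading.group(2))}</h{level}>")
        elif line.strip():
            # Every other non-blank line becomes its own paragraph.
            out.append(f"<p>{_inline(line)}</p>")
    return "\n".join(out)


if __name__ == "__main__":
    sample = "== Intro ==\n'''Bold''' and ''italic'' with [[Main Page|home]]."
    print(wikitext_to_html(sample))
```

Processing the input line by line keeps the sketch short, but it also means constructs that span several lines cannot be recognised. Parsers in the table that build a parse tree or AST avoid that limitation.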

A non-parser dumper

One of the common uses of alternative parsers is to dump wiki content into static form, such as HTML or PDF. Tim Starling has written a script that is not a parser but uses MediaWiki's internal code to dump an entire wiki to HTML from the command line. See Extension:DumpHTML. This was used (years ago) to create the static dumps at https://dumps.wikimedia.org.


[Source] https://www.mediawiki.org/wiki/Alternative_parsers
