Mediawiki
This page summarizes the code structure of the bin/mediawiki directory, part of the bin module, and its immediate subdirectories.
Overview
The mediawiki directory contains utilities for crawling, indexing, and analyzing MediaWiki/Wikipedia content. These tools extract semantic data from wiki pages and convert it into RDF triples or Elasticsearch indices for knowledge graph integration.
Modules
AAPG..js — Crawls Wikipedia-style MediaWiki pages to extract hyperlinks and converts them to RDF triples representing relationships between wiki concepts.
mediawikiStats..js — Generates concept category matrices from SPARQL queries, organizing Wikipedia article categories and their associated subjects for statistical analysis.
mediawikiTagger..js — Indexes Wikipedia page content into Elasticsearch by extracting text, categories, and metadata from MediaWiki HTML pages for full-text search capabilities.
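The link-to-triple conversion described for the crawler can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the base URI, the `linksTo` predicate, and the function names `extractWikiLinks` and `toNTriples` are all hypothetical, and a regular expression stands in for whatever HTML parser the real script uses.

```javascript
// Hypothetical URIs chosen for illustration; the real script may use others.
const BASE_URI = "http://example.org/wiki/";
const LINKS_TO = "http://example.org/vocab#linksTo";

// Collect the distinct internal /wiki/ targets found in a page's HTML.
// Namespaced pages (File:, Category:, ...) and links with query strings
// are skipped; a "#fragment" suffix is dropped from the target.
function extractWikiLinks(html) {
  const targets = new Set();
  const hrefPattern = /href="\/wiki\/([^"#?]+)(?:#[^"]*)?"/g;
  for (const match of html.matchAll(hrefPattern)) {
    const title = match[1];
    if (!title.includes(":")) targets.add(title);
  }
  return [...targets];
}

// Serialize one "source page links to target page" statement per target,
// in N-Triples syntax.
function toNTriples(sourceTitle, targetTitles) {
  return targetTitles
    .map((t) => `<${BASE_URI}${sourceTitle}> <${LINKS_TO}> <${BASE_URI}${t}> .`)
    .join("\n");
}
```

Calling `toNTriples("Trap", extractWikiLinks(pageHtml))` would yield one triple per distinct outgoing link, ready to be loaded into a triple store.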
Features
Wiki page crawling with hyperlink extraction and RDF triple generation.
Category-based statistical analysis via SPARQL queries on indexed wiki data.
Elasticsearch indexing of wiki page content for full-text search and concept matching.
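As a rough sketch of the indexing feature, the snippet below turns a page's HTML into a document and then into a request body in the newline-delimited JSON format that the Elasticsearch Bulk API expects. The field names (`title`, `categories`, `text`), the index name, and both function names are assumptions made for illustration; the tag stripping is deliberately naive and is not how the actual script necessarily extracts text.

```javascript
// Build a plain document from a page title and its rendered HTML.
// Field names are hypothetical; the real index mapping may differ.
function pageToDocument(title, html) {
  // Category names come from links to /wiki/Category:<name>.
  const categories = [
    ...new Set(
      [...html.matchAll(/href="\/wiki\/Category:([^"#?]+)"/g)].map((m) =>
        m[1].replace(/_/g, " ")
      )
    ),
  ];
  // Naive text extraction: drop scripts and styles, then all tags,
  // then collapse whitespace.
  const text = html
    .replace(/<script[\s\S]*?<\/script>|<style[\s\S]*?<\/style>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
  return { title, categories, text };
}

// Serialize documents as a Bulk API body: one action line followed by one
// source line per document, with a trailing newline.
function toBulkBody(indexName, docs) {
  return (
    docs
      .map(
        (doc) =>
          JSON.stringify({ index: { _index: indexName, _id: doc.title } }) +
          "\n" +
          JSON.stringify(doc)
      )
      .join("\n") + "\n"
  );
}
```

The resulting string would be sent to the cluster's `_bulk` endpoint with the `application/x-ndjson` content type. Using the page title as `_id` is one possible choice that makes re-indexing a page overwrite its earlier version.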
Usage
These are standalone scripts designed for batch processing of MediaWiki content.
AAPG..js crawls pages and produces RDF triples for import into the triple store.
mediawikiTagger..js indexes page content into Elasticsearch for search integration.
mediawikiStats..js generates statistical reports from already-indexed wiki data.
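One plausible reading of the category matrix produced by the statistics step is a binary table with categories as rows and subjects as columns. The sketch below builds such a table from a response in the standard SPARQL JSON results format. The query variable names `category` and `subject` and the function name `buildCategoryMatrix` are assumptions; the actual script's query and output shape may differ.

```javascript
// Build a binary category-by-subject matrix from SPARQL JSON results.
// Assumes each binding has "category" and "subject" variables (hypothetical).
function buildCategoryMatrix(sparqlJson) {
  const categories = [];
  const subjects = [];
  const pairs = new Set();
  for (const binding of sparqlJson.results.bindings) {
    const category = binding.category.value;
    const subject = binding.subject.value;
    if (!categories.includes(category)) categories.push(category);
    if (!subjects.includes(subject)) subjects.push(subject);
    pairs.add(JSON.stringify([category, subject]));
  }
  // rows[i][j] is 1 when subjects[j] belongs to categories[i], else 0.
  const rows = categories.map((c) =>
    subjects.map((s) => (pairs.has(JSON.stringify([c, s])) ? 1 : 0))
  );
  return { categories, subjects, rows };
}
```

Row sums of the matrix give the number of subjects per category, and column sums give the number of categories each subject belongs to.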
Files in this directory
AAPG..js
mediawikiStats..js
mediawikiTagger..js