XPath-based Parsing Framework (XPaF) is a simple, fast, open-source parsing framework that makes it easy to extract relations (subject-predicate-object triples) from HTML and XML documents.
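As a rough illustration of the idea, the Python sketch below pulls subject-predicate-object triples out of a small XHTML document. It evaluates one XPath expression per predicate relative to each subject node. This is not XPaF's own configuration format or API. It uses the standard library's xml.etree.ElementTree, which supports only a subset of XPath. The input document, the class names, and the predicate names release_year and directed_by are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed XHTML input (real HTML usually needs an
# HTML-tolerant parser before XPath can be applied).
DOC = """
<html>
  <body>
    <div class="movie">
      <h1>Casablanca</h1>
      <span class="year">1942</span>
      <span class="director">Michael Curtiz</span>
    </div>
  </body>
</html>
"""

# Hypothetical extraction rules, written in ElementTree's XPath subset.
SUBJECT_XPATH = ".//div[@class='movie']"   # one match per subject
SUBJECT_NAME_XPATH = "h1"                  # relative to the subject node
PREDICATE_XPATHS = {                       # predicate -> relative XPath
    "release_year": "span[@class='year']",
    "directed_by": "span[@class='director']",
}


def extract_triples(xml_text):
    """Return a list of (subject, predicate, object) tuples."""
    root = ET.fromstring(xml_text.strip())
    triples = []
    for node in root.findall(SUBJECT_XPATH):
        name_el = node.find(SUBJECT_NAME_XPATH)
        if name_el is None or not (name_el.text or "").strip():
            continue  # skip subject nodes without a usable name
        subject = name_el.text.strip()
        for predicate, xpath in PREDICATE_XPATHS.items():
            for obj_el in node.findall(xpath):
                value = (obj_el.text or "").strip()
                if value:
                    triples.append((subject, predicate, value))
    return triples


if __name__ == "__main__":
    for triple in extract_triples(DOC):
        print(triple)
```

Running it prints one release_year triple and one directed_by triple for Casablanca. Keeping the rules as data separate from the traversal code mirrors the general idea of a configurable, XPath-driven parser.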
Documentation is available here.
Copyright 2011 Google Inc. All Rights Reserved.
- Install libraries (see the platform-specific instructions below).
- Build XPaF, run the tests, and install:
    ./autogen.sh
    ./configure
    make
    make check
    make install
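Before running ./autogen.sh, it can help to confirm that the command-line tools from the platform-specific steps are on PATH. The minimal Python sketch below does this with shutil.which. The tool list is an assumption inferred from the packages named in this README (autoconf, automake, libtool, and protoc from protobuf); it is not read from XPaF's build files. It accepts glibtoolize as an alternative because Homebrew installs libtool's libtoolize under that name.

```python
import shutil

# Assumed tool list, inferred from the packages this README installs.
# Each entry lists acceptable alternative names for one tool.
REQUIRED_TOOLS = [
    ("autoconf",),
    ("automake",),
    ("libtoolize", "glibtoolize"),  # Homebrew uses the "g" prefix
    ("protoc",),
    ("make",),
]


def missing_tools(requirements, path=None):
    """Return the primary name of each requirement not found on PATH.

    `path` overrides the PATH search string (None means use $PATH).
    """
    missing = []
    for alternatives in requirements:
        if not any(shutil.which(name, path=path) for name in alternatives):
            missing.append(alternatives[0])
    return missing


if __name__ == "__main__":
    absent = missing_tools(REQUIRED_TOOLS)
    if absent:
        print("missing build tools: " + ", ".join(absent))
    else:
        print("all build tools found")
```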
- Clean up:
    make clean
    make maintainer-clean
Debian/Ubuntu (apt-get):

- Install autotools:
    apt-get install autoconf automake libtool
- Install libraries:
    apt-get install libgflags-dev libgtest-dev libprotobuf-dev libxml2-dev protobuf-compiler
- Install the re2 library:
    hg clone https://re2.googlecode.com/hg re2
    cd re2
    make install
- Run ldconfig to set up symlinks for the libraries:
    ldconfig -n /usr/lib /usr/local/lib
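After the libraries are installed, you can check whether each one can be located by using Python's ctypes.util.find_library. On Linux, this function consults the ldconfig cache and then falls back to asking gcc and ld. The short library names below follow the usual lib<name> convention (libre2, libprotobuf, libgflags, libxml2). They are an assumption, not taken from XPaF's build files. ldconfig -n only updates links in the listed directories and does not rebuild /etc/ld.so.cache, so a miss here may mean that a plain ldconfig run is still needed.

```python
from ctypes.util import find_library

# Assumed short names for the shared libraries installed above
# (find_library("re2") looks for libre2, and so on).
REQUIRED_LIBS = ["re2", "protobuf", "gflags", "xml2"]


def missing_libraries(names):
    """Return the names that find_library cannot resolve."""
    return [name for name in names if find_library(name) is None]


if __name__ == "__main__":
    absent = missing_libraries(REQUIRED_LIBS)
    if absent:
        print("not found: " + ", ".join("lib" + n for n in absent))
    else:
        print("all libraries found")
```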
Mac OS X (Homebrew):

- Install Homebrew (http://brew.sh/).
- Install libraries:
    brew install automake libtool gflags libxml2 protobuf re2
- Install the gtest library:
    curl -O https://googletest.googlecode.com/files/gtest-1.7.0.zip
    unzip gtest-1.7.0.zip
    cd gtest-1.7.0
    ./configure
    make
    cp -a include/gtest /usr/local/include
    cp -a lib/.libs/*.{a,dylib} /usr/local/lib/