
Tutorial: BBC world news map


This tutorial will guide you through creating a pipeline that displays sentiments from BBC world news articles on a world map. It assumes that you have either registered at the cloud version of Exynize or deployed your own copy of the platform, and that you have finished the "Hello world" tutorial and the Twitter product comparison tutorial.

Step 1: RSS source

First, we'll create a source component that connects to an RSS feed and delivers the latest published articles.
To simplify the creation, we'll rely on the feedparser npm package.
Here's how the code will look:

import FeedParser from 'feedparser';
import request from 'request';

export default (url, obs) => {
    // construct request and feedparser
    const req = request(url);
    const feedparser = new FeedParser();

    // handle errors
    req.on('error', err => obs.onError(err));
    feedparser.on('error', err => obs.onError(err));

    // pipe request into feedparser
    req.on('response', function(res) {
        const stream = this;
        if (res.statusCode !== 200) {
            return this.emit('error', new Error('Bad status code'));
        }
        stream.pipe(feedparser);
    });

    // process articles
    feedparser.on('readable', function() {
        const stream = this;
        let item;
        while (item = stream.read()) {
            obs.onNext(item);
        }
    });
    // trigger end once done
    feedparser.on('end', () => obs.onCompleted());
};

This component will dispatch the latest articles from the feed and then automatically complete, so we'll only see the latest ~20 articles every time we run this source.
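Each item emitted by feedparser is a plain article object. The fields we'll use in the rest of this tutorial look roughly like this (an illustrative, trimmed example - real items contain more fields):

// illustrative shape of a single emitted article (trimmed)
const exampleItem = {
    title: 'Example article title',
    description: 'Short summary provided by the feed',
    link: 'http://example.com/full-article', // used by the full text processor below
    pubDate: new Date('2016-02-18T00:00:00Z'),
};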

Step 2: Full text fetching processor

Next, we'll create a processor component that fetches the full text of the articles for us, since RSS feeds usually do not provide it.
To simplify the creation, we'll rely on the superagent npm package for HTTP requests and on cheerio for HTML parsing.
Here's how the code will look:

import request from 'superagent';
import cheerio from 'cheerio';

const cleanText = text => text
    .replace(/[\n\r\t]+/g, ' ')
    .replace(/\s+/g, ' ')
    .replace(/(\w)\.([A-Z0-9_])/g, '$1. $2');

const cleanHtml = html => html
    .replace(/[\n\r\t]+/g, ' ')
    .replace(/<!\[CDATA\[.+?\]\]>/g, ' ')
    .replace(/<!--.+?-->/g, ' ')
    .replace(/\s+/g, ' ');

export default (data) => {
    return Rx.Observable.create(obs => {
        const {link} = data;
        request
        .get(link)
        .end((err, res) => {
            if (err) {
                return obs.onError(err);
            }

            const $ = cheerio.load(res.text);
            $('script').remove();
            $('object').remove();
            // try to extract only the article text (BBC news article selector)
            let obj = $('.story-body__inner');
            if (!obj || !obj.length) {
                obj = $('body');
            }
            // cleanup
            $('figure', obj).remove();
            // get html and text
            const resHtml = cleanHtml(obj.html());
            const resText = cleanText(obj.text());

            // assign to data
            data.text = resText;
            data.html = resHtml;

            // send
            obs.onNext(data);
            obs.onCompleted();
        });
    });
};

This processor will first fetch the full HTML using the link field of the incoming data object, then try to extract only the meaningful text from it, append both the text and the HTML to the data and return this new data object. You can test this by entering {"link": "http://some.link.with/text"} into the data field in the Exynize editor and hitting the "Test" button.
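A successful test against a BBC article link should yield a data object that keeps the original link field and gains text and html fields, roughly like this (values abbreviated for illustration):

{
    "link": "http://some.link.with/text",
    "text": "Plain article text with scripts and figures removed and whitespace normalized...",
    "html": "<p>Article body HTML without script, object and figure tags...</p>"
}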

After the test succeeds, hit the "Save" button to save your new processor component.

Step 3: Sentiment processor

We'll reuse the sentiment component that we created during the Twitter product comparison tutorial.
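If you have not built that component yet, here is a minimal sketch of what it could look like. It assumes the sentiment npm package (whose result object contains a score field) and attaches the result as data.sentiment, which is what the map renderer in Step 6 expects:

import sentiment from 'sentiment';

export default (data) => {
    // analyze the plain article text; score > 0 is positive, < 0 negative, 0 neutral
    data.sentiment = sentiment(data.text);
    // return the augmented data wrapped in an observable, like the other processors
    return Rx.Observable.return(data);
};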

Step 4: FOX annotation processor

Next, we'll create a processor component that annotates the full text of the incoming articles with named entities.
We'll rely on the FOX NLP tool API for this. And to simplify the creation, we'll rely on the request npm package for HTTP requests.
Here's how the code will look:

import _ from 'lodash';
import request from 'request';

// FOX NLP tool API url
const foxUrl = 'http://fox-demo.aksw.org/call/ner/entities';

export default (data) => Rx.Observable.create(obs => {
    // construct request
    const json = {
        input: data.text,
        type: 'text',
        task: 'ner',
        output: 'JSON-LD',
    };
    // send request
    request({
        method: 'POST',
        url: foxUrl,
        headers: {
            'Content-Type': 'application/json',
        },
        body: JSON.stringify(json),
    }, (err, res, body) => {
        // handle error
        if (err) {
            obs.onError(err);
            return;
        }
        // check if the status code is OK
        if (res && res.statusCode !== 200) {
            obs.onError(`Error code: ${res.statusCode}, ${res.statusMessage}`);
            return;
        }
        // parse results
        const result = JSON.parse(body);
        const entries = result['@graph'] ? result['@graph'] : [];
        const annotations = entries.map(it => ({
            types: it['@type'] ? it['@type']
                .map(t => t.indexOf(':') !== -1 ? t.split(':')[1] : t)
                .map(t => t.toLowerCase())
                .map(_.capitalize)
                .filter(t => t !== 'Annotation') : [],
            name: it['ann:body'],
            beginIndex: typeof it.beginIndex === 'string' ? [it.beginIndex] : it.beginIndex,
            endIndex: typeof it.endIndex === 'string' ? [it.endIndex] : it.endIndex,
        }));
        data.annotations = annotations;
        // return and complete
        obs.onNext(data);
        obs.onCompleted();
    });
});

This processor will annotate the text from the text field of the incoming data object, append the resulting annotations to the data and return this new data object. You can test this by entering the following data into the data field in the Exynize editor and hitting the "Test" button: {"text": "The philosopher and mathematician Leibniz was born in Leipzig in 1646 and attended the University of Leipzig from 1661-1666. The current chancellor of Germany, Angela Merkel, also attended this university. "}
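The exact entities FOX returns may vary, but every entry this processor appends to data.annotations has the following shape (values are illustrative):

// illustrative shape of a single entry in data.annotations
const exampleAnnotation = {
    types: ['Location'],   // entity types such as Location, Person or Organization
    name: 'Leipzig',       // the annotated entity
    beginIndex: ['54'],    // character offsets, normalized to arrays of strings
    endIndex: ['61'],
};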

After the test succeeds, hit the "Save" button to save your new processor component.

Step 5: Nominatim processor

Next, we'll create a processor component that finds coordinates for all annotations that have the type Location.
We'll rely on the Nominatim API for this. And to simplify the creation, we'll rely on the nominatim npm package.
Here's how the code will look:

import _ from 'lodash';
import nominatim from 'nominatim';
// convert the search function into an observable
const observableSearch = Rx.Observable.fromNodeCallback(nominatim.search);

export default (inputData) => Rx.Observable.return(inputData)
.flatMap(data => {
    // if no annotations - just return original data
    if (!data.annotations) {
        return Rx.Observable.return(data);
    }
    // init places array
    if (!data.places) {
        data.places = [];
    }
    // resolve all annotations and merge the results
    return Rx.Observable.merge(data.annotations.map(annotation => {
        if (_.includes(annotation.types, 'Location')) {
            return observableSearch({q: annotation.name})
            .map(([opt, results]) => {
                if (results && results[0]) {
                    return {
                        name: opt.q,
                        lat: results[0].lat,
                        lon: results[0].lon,
                    };
                }

                return undefined;
            });
        }

        return Rx.Observable.return(undefined);
    }))
    .filter(loc => loc !== undefined)
    .reduce((acc, place) => [place, ...acc], [])
    .map(places => {
        data.places = places;
        return data;
    });
});

This processor will use all annotations with the type Location to fetch geo coordinates for them, then it'll append the resulting coordinates to the data and return this new data object. You can test this by entering the following data into the data field in the Exynize editor and hitting the "Test" button: {"annotations": [{"types": ["Location"], "name": "Leipzig"}]}
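A successful test for the input above should yield a data object containing a places array roughly like this (coordinates abbreviated; Nominatim returns them as strings):

{
    "annotations": [{"types": ["Location"], "name": "Leipzig"}],
    "places": [{"name": "Leipzig", "lat": "51.34...", "lon": "12.37..."}]
}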

After the test succeeds, hit the "Save" button to save your new processor component.

Step 6: Map renderer

Finally, we need to create a render component that will display the results. We'll create a component that displays the incoming data as color-coded points on a world map. We'll use leaflet.js to simplify creation of the map. Here's how the code will look:

import L from 'leaflet';
import 'leaflet/dist/leaflet.css';

const styleGray = '#cccccc';
const styleGreen = '#5cb85c';
const styleRed = '#d9534f';

const mapConfig = {
    minZoom: 2,
    maxZoom: 20,
    layers: [
        L.tileLayer(
            'http://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png',
            {
                attribution: '&copy; <a href="http://openstreetmap.org">OpenStreetMap</a>' +
                    ' contributors, <a href="http://creativecommons.org/licenses/by-sa/2.0/">CC-BY-SA</a>',
            }
        )
    ],
    attributionControl: false,
};

// popup rendering
const popup = (it) => `
<a href="${it.link}" target="_blank">${it.title}</a>
<div>${it.description}</div>
`;

// main render generator
export default () => React.createClass({
    componentDidMount() {
        // init map
        this.map = L.map(this.refs.map, mapConfig);
        // set view to show full world map
        this.map.setView([-10, 10], 2);
    },
    componentWillReceiveProps(props) {
        // render items
        props.data.forEach(this.renderItem);
    },
    renderItem(it) {
        if (!it.places) {
            return;
        }
        // go over location
        it.places.forEach((loc) => {
            // do not render location with -1 -1 as lat or lon
            if (loc.lat === -1 || loc.lon === -1) {
                return;
            }

            const color = it.sentiment.score === 0 ? styleGray :
                    it.sentiment.score > 0 ? styleGreen : styleRed;
            const marker = L.circle([loc.lat, loc.lon], 100000, {
                stroke: false,
                fillColor: color,
                fillOpacity: 0.8,
                className: 'leaflet-marker-animated',
            }).addTo(this.map);
            marker.bindPopup(popup(it));
        });
    },

    render() {
        return (
            <div id="map" ref="map" style={{width: '100%', height: '100%', position: 'absolute'}}></div>
        );
    },
});

This component will render a map with circles representing the places mentioned in the incoming articles, colored green for positive sentiment, red for negative and gray for neutral.

Step 7: Pipeline assembly

Now that all the components have been created, we need to assemble them into a pipeline.

When adding RSS source, you'll need to provide BBC world news RSS URL: http://feeds.bbci.co.uk/news/world/rss.xml

Processors do not require any configuration - just adding them is sufficient. But make sure to add the processors in the same order we created them here (full text fetching, sentiment, FOX annotation, Nominatim) - order is important.

Finally, no configuration is required when adding the render component either.

Make sure to test the pipeline by pressing the "Test" button before saving it using the "Save" button.

Step 8: Running and viewing results

Now that you've assembled, tested and saved your new pipeline, you can start it and view the rendered result by clicking the "Web" button next to the pipeline name.