Feature/wayback machine #286

Open
wants to merge 28 commits into
base: master
Changes from 14 commits
Commits
9d61a6f
Added wayback-machine action
adaveinthelife Jun 4, 2018
35704a0
Merge branch 'master' of https://github.com/internetarchive/internet-…
adaveinthelife Jun 10, 2018
d10b1ea
Further modifications to Wayback code. Fixed slots
adaveinthelife Jun 13, 2018
92c7c8e
Run eslint and local testing on wayback code
adaveinthelife Jun 14, 2018
c2b2cae
Rewrote wayback machine to return promise
adaveinthelife Jun 16, 2018
9bc0f69
Created fallback for speech if country rank missing from alexa rankin…
adaveinthelife Jun 19, 2018
efc09fe
Reverted strings.js back to master to pass tests
adaveinthelife Jun 19, 2018
5514439
Added Wayback dialog to strings.js
adaveinthelife Jun 19, 2018
d4184a2
Wayback machine revisions for code review + changes to config.js file
adaveinthelife Jun 25, 2018
565491b
Changes to string.js with additional dialog
adaveinthelife Jun 26, 2018
734ec81
Changes to wayback spec file
adaveinthelife Jun 27, 2018
4ace6fc
Break traverse off into separate util function and created unit test …
adaveinthelife Jul 4, 2018
3cbd152
Reworked XML parser promise to include resolve and reject
adaveinthelife Jul 6, 2018
22c6015
Reworked all functions to return promises and then created promises p…
adaveinthelife Jul 10, 2018
0c7c183
Added unit tests for axios requests in wayback feature
adaveinthelife Jul 10, 2018
bcb2da7
Added more unit tests for axios requests in wayback feature
adaveinthelife Jul 10, 2018
ced199e
Took some debug messages out of WB code and traverse util
adaveinthelife Jul 10, 2018
d7b796c
Added unit tests for archiveEngine and alexaEngine
adaveinthelife Jul 11, 2018
167d287
Added unit test for xml converter in wb machine
adaveinthelife Jul 11, 2018
abb91d7
Took out axios calls for wayback.spec and added them into fixtures
adaveinthelife Jul 13, 2018
fa153ac
Changed promise pipeline to avoid callback
adaveinthelife Jul 14, 2018
698830e
Took out missing axios variable in wayback spec
adaveinthelife Jul 14, 2018
74f93e5
Rewrote archive and alexa engines to be pure functions
adaveinthelife Jul 14, 2018
a3364d7
Rewrote pipeline responses to use obj assign
adaveinthelife Jul 16, 2018
d916d1e
Took unneeded promises out of pipeline, used unpacking to make code c…
adaveinthelife Jul 16, 2018
f53d4f3
Added catch statement for WB handler
adaveinthelife Jul 16, 2018
d76ca18
Simplified pipeline for WB machine
adaveinthelife Jul 16, 2018
8e3ec2f
Fixed typo in catch statement
adaveinthelife Jul 16, 2018
2 changes: 1 addition & 1 deletion functions/package.json
@@ -37,7 +37,7 @@
"object.entries": "^1.0.4",
"raven": "^2.6.0",
"replaceall": "^0.1.6",
-"supports-color": "^5.4.0"
+"xml2js": "^0.4.19"
},
"devDependencies": {
"axios-mock-adapter": "^1.15.0",
122 changes: 122 additions & 0 deletions functions/src/actions/wayback-machine.js
@@ -0,0 +1,122 @@
// Third party imports
const axios = require('axios');
const mustache = require('mustache');
const xml2js = require('xml2js');
Member

All new imports should be installed with:

npm install --save xml2js

If you need a new library only for testing, use this instead:

npm install --save-dev xml2js

Member

You should install xml2js with:

npm install --save xml2js

so that it is added to package.json.


// Local imports
const config = require('../config');
const {debug} = require('../utils/logger')('ia:actions:wayback-machine');
const dialog = require('../dialog');
const endpointProcessor = require('../network/endpoint-processor');
const traverse = require('../utils/traverse');
const waybackStrings = require('../strings').intents.wayback;

/**
 * Handle wayback query action
 * - fill slots of wayback query
 * - perform data requests to archive and alexa rankings
 * - construct response speech for action
 *
 * @param app
 */
function handler (app) {
  // Create wayback object
  const waybackObject = {
    url: '',
    earliestYear: 0,
    latestYear: 0,
    totalUniqueURLs: 0,
    alexaWorldRank: 0,
    alexaUSRank: 0,
    speech: waybackStrings.default,
  };

  // Check to see that both parameters have content
  if (!app.params.getByName('wayback') && !app.params.getByName('url')) {
    debug('wayback action called by mistake');
    dialog.ask(app, waybackObject);
  }

  // Get url parameter and make url queries
  waybackObject.url = app.params.getByName('url');
  const archiveQueryURL = endpointProcessor.preprocess(
    config.wayback.ARCHIVE, app, waybackObject
  );
  const alexaQueryURL = endpointProcessor.preprocess(
    config.wayback.ALEXA, app, waybackObject
  );

  return Promise.all([axios.get(archiveQueryURL), axios.get(alexaQueryURL)])
Member

Please don't nest callbacks here. One of the main features of Promises is avoiding callback hell. It should look something like:

action.then(a => {
  return Promise.all(/*....*/);
})
.then(b => {
  return Promise.all(/*....*/);
})
.then(c => {
});

    .then(function (allData) {
      // All data available here in the order it was called.

      // Parse data from archive request
      let archiveJSON = allData[0].data;
      archiveEngine(archiveJSON, waybackObject);

      // Parse data from alexa request
      let XMLparser = new xml2js.Parser();
      let convertXML = new Promise((resolve, reject) => {
        XMLparser.parseString(allData[1].data, function (err, result) {
          if (err) {
            let error = new Error('The XML parser didn\'t work. Error message: ' + err);
            reject(error);
          } else {
            resolve(result);
          }
        });
      });
Member

I don't think you need to store the convertXML promise. You are already inside a Promise.then pipeline, so you can return a new Promise. You also need to pass archiveJSON down the promise chain, so you could use something like:

return Promise.all([archiveJSON, new Promise((resolve, reject) => {
  //...
})]);

This is needed because, as written, the code under // Construct response dialog for action runs before alexaEngine(JSON.parse(JSON.stringify(fulfilled)), waybackObject); is called.

I'd also recommend writing a unit test for this pipeline, so you can see how the promises fire one after another.

In addition, I'd recommend moving:

XMLparser.parseString(allData[1].data, function (err, result) {

into a separate function to improve the readability of this pipeline.
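
A minimal sketch of the two suggestions above: wrapping the callback-style parser in its own function, and passing archiveJSON down the chain through Promise.all. The names parseXml, parseStep and fakeParser are hypothetical and not part of this PR; fakeParser only imitates the (err, result) callback shape of parseString so the sketch runs without xml2js.

```javascript
// Hypothetical stand-in for xml2js.Parser: same (err, result) callback
// shape as parseString, but it does no real XML parsing.
const fakeParser = {
  parseString (xml, callback) {
    if (typeof xml !== 'string' || xml.length === 0) {
      callback(new Error('empty XML'));
    } else {
      callback(null, {parsed: xml});
    }
  },
};

// Wrap the callback-style parser in a promise so that it can be
// returned from a .then() handler.
function parseXml (parser, xml) {
  return new Promise((resolve, reject) => {
    parser.parseString(xml, (err, result) => {
      if (err) {
        reject(err);
      } else {
        resolve(result);
      }
    });
  });
}

// Promise.all accepts plain values next to promises, so archiveJSON
// travels down the chain together with the parsed XML.
function parseStep (archiveJSON, alexaXml, parser) {
  return Promise.all([archiveJSON, parseXml(parser, alexaXml)]);
}
```

Inside the handler, the first .then() could return parseStep(allData[0].data, allData[1].data, XMLparser), and the next .then() would receive both values as an array.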

      convertXML
        .then(function (fulfilled) {
          debug('XML parse successful!');
          alexaEngine(JSON.parse(JSON.stringify(fulfilled)), waybackObject);
        })
        .catch(function (error) {
          debug(error.message);
          waybackObject.speech = waybackStrings.error;
          dialog.ask(app, waybackObject);
Member

If we want to change the dialog flow here, we should embed XMLparser.parseString in the Promise pipeline, for example by wrapping it in:

new Promise((resolve, reject) => {
  XMLparser.parseString(allData[1].data, function (err, result) {
    //...
  });
});

Right now we reach dialog.close(app, waybackObject); regardless of the result of XMLparser.parseString.

I'd recommend creating unit tests for that part of the code; it would help a lot. Async scenarios are very tricky.
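
The ordering problem described above can be reproduced in isolation. This is a sketch only: unreturnedPromise and returnedPromise are hypothetical names, and the strings 'parse XML' and 'close dialog' stand in for the alexaEngine call and dialog.close.

```javascript
// Inner promise is started but NOT returned, as in the reviewed code:
// the outer handler finishes before the inner callback runs.
function unreturnedPromise () {
  const order = [];
  return Promise.resolve()
    .then(() => {
      Promise.resolve().then(() => order.push('parse XML'));
      order.push('close dialog');
    })
    .then(() => order);
}

// Returning the inner promise makes the chain wait for it.
function returnedPromise () {
  const order = [];
  return Promise.resolve()
    .then(() => {
      return Promise.resolve().then(() => order.push('parse XML'));
    })
    .then(() => {
      order.push('close dialog');
      return order;
    });
}
```

unreturnedPromise() resolves with ['close dialog', 'parse XML'], while returnedPromise() resolves with ['parse XML', 'close dialog'].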

        });

      // Construct response dialog for action
      if (waybackObject.alexaUSRank !== 0) {
        waybackObject.speech = mustache.render(waybackStrings.speech, waybackObject);
        waybackObject.speech += mustache.render(waybackStrings.additionalSpeech, waybackObject);
      } else {
        waybackObject.speech = mustache.render(waybackStrings.speech, waybackObject);
        waybackObject.speech += '.';
      }

      dialog.close(app, waybackObject);
    });
Member

You may want to handle the .catch case gracefully here, because you reject with an error inside the pipeline and a request to a server can also throw an exception.
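
A sketch of what a single terminal .catch could look like. Everything here is an assumption for illustration: runPipeline is a hypothetical function, request and parse are injected so the sketch runs without a network, and respond stands in for dialog.ask / dialog.close.

```javascript
// Hypothetical pipeline with one terminal .catch.
function runPipeline ({request, parse, respond}) {
  return request()
    .then((body) => parse(body))
    .then((data) => respond({speech: 'rank ' + data.rank}))
    .catch((error) => {
      // One handler covers a failed request, a parser rejection,
      // and any exception thrown in the steps above.
      return respond({speech: 'error', reason: error.message});
    });
}
```

Because the .catch sits at the end of the chain, a rejection from any earlier step skips the remaining .then() handlers and lands in it.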

} // End of handler

function archiveEngine (archiveJSON, waybackObject) {
Member

Just a general comment: I'd recommend writing pure functions wherever possible.

Motivation:

  1. they are much easier to test
  2. they are easy to reuse

Member

This function should also be a pure function, and it doesn't need to be async.
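
One way a pure archiveEngine might look, as a sketch under assumptions: it returns a new object instead of mutating waybackObject, and the caller merges the result. sumLeaves is a hypothetical plain-JS stand-in for the traverse util that only sums numeric leaves, and the shape of the sample data (urls, new_urls) is assumed for illustration.

```javascript
// Sum every numeric leaf of a nested object (hypothetical stand-in
// for the traverse util; non-numeric leaves count as 0).
function sumLeaves (value) {
  if (typeof value === 'number') {
    return value;
  }
  if (value !== null && typeof value === 'object') {
    return Object.values(value).reduce((sum, v) => sum + sumLeaves(v), 0);
  }
  return 0;
}

// Pure version: reads archiveJSON, returns a new object, mutates nothing.
function archiveEngine (archiveJSON) {
  const years = Object.keys(archiveJSON.captures);
  const earliestYear = years[0];
  return {
    earliestYear,
    latestYear: years[years.length - 1],
    totalUniqueURLs:
      sumLeaves(archiveJSON.urls[earliestYear]) + sumLeaves(archiveJSON.new_urls),
  };
}

// Assumed sample data; the caller merges the result into its own state.
const sample = {
  captures: {1999: {}, 2000: {}},
  urls: {1999: {'text/html': 10}},
  new_urls: {2000: {'text/html': 5, 'image/png': 2}},
};
const waybackObject = Object.assign({url: 'example.org'}, archiveEngine(sample));
```

A function of this shape can be unit-tested with a plain input object and a deep-equality assertion, with no mocks for app or dialog.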

  // Create array of capture years and then find earliest year
  // and most recent year.
  let yearsArray = Object.keys(archiveJSON.captures);
  waybackObject.earliestYear = yearsArray[0];
  waybackObject.latestYear = yearsArray[yearsArray.length - 1];

  // Traverse URL category

  // Find baseline of URL count
  waybackObject.totalUniqueURLs += traverse(archiveJSON.urls[waybackObject.earliestYear]);
  // debug('Baseline url count: ' + waybackObject.totalUniqueURLs);

  waybackObject.totalUniqueURLs += traverse(archiveJSON.new_urls);
  // debug('Final url count: ' + waybackObject.totalUniqueURLs);
}

function alexaEngine (alexaJSON, waybackObject) {
  waybackObject.alexaWorldRank = alexaJSON['ALEXA']['SD'][0]['POPULARITY'][0]['$']['TEXT'];
  try {
    waybackObject.alexaUSRank = alexaJSON['ALEXA']['SD'][0]['COUNTRY'][0]['$']['RANK'];
  } catch (e) {
    debug('Country not found');
    debug(e);
  }
}

module.exports = {
  handler,
};
5 changes: 5 additions & 0 deletions functions/src/config.js
@@ -31,6 +31,11 @@ module.exports = {
DEFAULT_SONG_IMAGE: 'http://archive.org/images/notfound.png',
},

wayback: {
ARCHIVE: 'http://web.archive.org/__wb/search/metadata?q={{url}}',
ALEXA: 'http://data.alexa.com/data?cli=10&url={{url}}',
},

/**
* settings specific for supported platforms
*/
8 changes: 8 additions & 0 deletions functions/src/strings.js
@@ -444,6 +444,14 @@ module.exports = {
speech: 'Version is {{version}}.',
},

wayback: {
speech: '{{url}} was first captured by the Internet Archive in {{earliestYear}} and most recently in {{latestYear}}. The archive has {{totalUniqueURLs}} unique url\'s for this website. <break></break>{{url}} is ranked <say-as interpret-as="ordinal">{{alexaWorldRank}}</say-as> in the world in popularity',
additionalSpeech: ' and <say-as interpret-as="ordinal">{{alexaUSRank}}</say-as> in the United States.',
default: 'Would you like to use the wayback machine to hear the history of a website? Simply say Wayback Machine and the name of the website you\'d like to hear.',
error: "I'm sorry I'm having trouble here. Maybe we should try this again later.",
suggestions: ['wayback machine google.com', 'wayback machine archive.org']
},

welcome: {
acknowledges: [
'Welcome to music at the Internet Archive.'
31 changes: 31 additions & 0 deletions functions/src/utils/traverse.js
@@ -0,0 +1,31 @@
const _ = require('lodash');
const {debug} = require('../utils/logger')('ia:actions:utils:traverse');

/**
 * Traverse a given object
 *
 * @param {Object} obj
 */
module.exports = function (obj) {
  let results = [];
  function traverse (obj) {
    _.forOwn(obj, (val, key) => {
      if (_.isArray(val)) {
        val.forEach(el => {
          traverse(el);
        });
      } else if (_.isObject(val)) {
        traverse(val);
      } else {
        results.push(val);
      }
    });
  }
  traverse(obj);
  let count = 0;
  while (results.length !== 0) {
    count += results.pop();
  }
  debug('final count inside traverse = ' + count);
  return count;
};
25 changes: 25 additions & 0 deletions functions/tests/actions/wayback-machine.spec.js
@@ -0,0 +1,25 @@
const {expect} = require('chai');
const rewire = require('rewire');

const action = rewire('../../src/actions/wayback-machine');

const mockApp = require('../_utils/mocking/platforms/app');
const mockDialog = require('../_utils/mocking/dialog');

describe('actions', () => {
  describe('wayback machine', () => {
    let app;
    let dialog;

    beforeEach(() => {
      app = mockApp();
      dialog = mockDialog();
      action.__set__('dialog', dialog);
    });

    it('check to see that a promise is returned with network requests', () => {
Member

Please give some space here ;)

      action.handler(app);
      expect(Promise.resolve()).to.be.a('promise');
    });
  });
});
35 changes: 35 additions & 0 deletions functions/tests/utils/traverse.spec.js
@@ -0,0 +1,35 @@
const {expect} = require('chai');
const traverse = require('../../src/utils/traverse');

let testJSON = {'captures': {
  '1999': {
    'text/html': 18360
  },
  '2000': {
    'application/x-director': 19,
    'video/quicktime': 1584,
    'application/x-troff-man': 1,
    'x-world/x-vrml': 1,
    'audio/x-pn-realaudio': 176,
    'audio/mpeg': 195,
    'audio/x-wav': 3098,
    'image/png': 97,
    'text/html': 901401,
    'video/x-ms-asf': 142,
    'image/gif': 17388,
    'text/plain': 394428,
    'image/jpeg': 82903,
    'application/x-shockwave-flash': 39,
    'application/zip': 108,
    'audio/x-aiff': 2767,
    'text/css': 55,
    'application/pdf': 291
  }}};

describe('utils', () => {
  describe('traverse', () => {
    it('should traverse a given object to return the sum of it\'s leaf nodes', () => {
      expect(traverse(testJSON)).to.be.equal(1423053);
    });
  });
});