Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/wayback machine #286

Open
wants to merge 28 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 26 commits
Commits
Show all changes
28 commits
Select commit Hold shift + click to select a range
9d61a6f
Added wayback-machine action
adaveinthelife Jun 4, 2018
35704a0
Merge branch 'master' of https://github.com/internetarchive/internet-…
adaveinthelife Jun 10, 2018
d10b1ea
Further modifications to Wayback code. Fixed slots
adaveinthelife Jun 13, 2018
92c7c8e
Run eslint and local testing on wayback code
adaveinthelife Jun 14, 2018
c2b2cae
Rewrote wayback machine to return promise
adaveinthelife Jun 16, 2018
9bc0f69
Created fallback for speech if country rank missing from alexa rankin…
adaveinthelife Jun 19, 2018
efc09fe
Reverted strings.js back to master to pass tests
adaveinthelife Jun 19, 2018
5514439
Added Wayback dialog to strings.js
adaveinthelife Jun 19, 2018
d4184a2
Wayback machine revisions for code review + changes to config.js file
adaveinthelife Jun 25, 2018
565491b
Changes to string.js with additional dialog
adaveinthelife Jun 26, 2018
734ec81
Changes to wayback spec file
adaveinthelife Jun 27, 2018
4ace6fc
Break traverse off into separate util function and created unit test …
adaveinthelife Jul 4, 2018
3cbd152
Reworked XML parser promise to include resolve and reject
adaveinthelife Jul 6, 2018
22c6015
Reworked all functions to return promises and then created promises p…
adaveinthelife Jul 10, 2018
0c7c183
Added unit tests for axios requests in wayback feature
adaveinthelife Jul 10, 2018
bcb2da7
Added more unit tests for axios requests in wayback feature
adaveinthelife Jul 10, 2018
ced199e
Took some debug messages out of WB code and traverse util
adaveinthelife Jul 10, 2018
d7b796c
Added unit tests for archiveEngine and alexaEngine
adaveinthelife Jul 11, 2018
167d287
Added unit test for xml converter in wb machine
adaveinthelife Jul 11, 2018
abb91d7
Took out axios calls for wayback.spec and added them into fixtures
adaveinthelife Jul 13, 2018
fa153ac
Changed promise pipeline to avoid callback
adaveinthelife Jul 14, 2018
698830e
Took out missing axios variable in wayback spec
adaveinthelife Jul 14, 2018
74f93e5
Rewrote archive and alexa engines to be pure functions
adaveinthelife Jul 14, 2018
a3364d7
Rewrote pipeline responses to use obj assign
adaveinthelife Jul 16, 2018
d916d1e
Took unneeded promises out of pipeline, used unpacking to make code c…
adaveinthelife Jul 16, 2018
f53d4f3
Added catch statement for WB handler
adaveinthelife Jul 16, 2018
d76ca18
Simplified pipeline for WB machine
adaveinthelife Jul 16, 2018
8e3ec2f
Fixed typo in catch statement
adaveinthelife Jul 16, 2018
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion functions/package.json
Original file line number Diff line number Diff line change
Expand Up @@ -37,7 +37,7 @@
"object.entries": "^1.0.4",
"raven": "^2.6.0",
"replaceall": "^0.1.6",
"supports-color": "^5.4.0"
"xml2js": "^0.4.19"
},
"devDependencies": {
"axios-mock-adapter": "^1.15.0",
Expand Down
136 changes: 136 additions & 0 deletions functions/src/actions/wayback-machine.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,136 @@
// Third party imports
const axios = require('axios');
const mustache = require('mustache');
const xml2js = require('xml2js');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all new imports should be installed by:

npm install --save xml2js

when you would need new lib for testing you should use instead:

npm install --save-dev xml2js

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should install xml2js with:

npm install --save xml2js

to add it to package.json


// Local imports
const config = require('../config');
const {debug} = require('../utils/logger')('ia:actions:wayback-machine');
const dialog = require('../dialog');
const endpointProcessor = require('../network/endpoint-processor');
const traverse = require('../utils/traverse');
const waybackStrings = require('../strings').intents.wayback;

/**
* Handle wayback query action
* - fill slots of wayback query
* - perform data requests to archive and alexa rankings
* - construct response speech for action
*
* @param app
*/
function handler (app) {
// Create wayback object
let waybackObject = {
url: '',
earliestYear: 0,
latestYear: 0,
totalUniqueURLs: 0,
alexaWorldRank: 0,
alexaUSRank: 0,
speech: waybackStrings.default,
};

// Check to see that both parameters have content
if (!app.params.getByName('wayback') && !app.params.getByName('url')) {
debug('wayback action called by mistake');
dialog.ask(app, waybackObject);
}

// Get url parameter and make url queries
waybackObject.url = app.params.getByName('url');
const archiveQueryURL = endpointProcessor.preprocess(
config.wayback.ARCHIVE, app, waybackObject
);
const alexaQueryURL = endpointProcessor.preprocess(
config.wayback.ALEXA, app, waybackObject
);

return Promise.all([axios.get(archiveQueryURL), axios.get(alexaQueryURL)])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't use callback nesting here. One of the main feature of Promises is avoiding callback hell. It should be something like:

action.then(a => {
  return Promise.all(/*....*/);
})
.then(b => {
  return Promise.all(/*....*/);
})
.then(c => {
});

.then(([archiveRes, alexaRes]) => {
return ([archiveEngine(archiveRes.data), xmlConverter(alexaRes.data)]);
})
.then(([archiveEngineRes, xmlRes]) => {
waybackObject = Object.assign(waybackObject, archiveEngineRes);
return alexaEngine(xmlRes);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alexaEngine is sync function it doesn't return Promise. So you can just merge its response right to waybackObject object without waiting then.

})
.then(response => {
waybackObject = Object.assign(waybackObject, response);

// Construct response dialog for action
if (waybackObject.alexaUSRank !== 0) {
waybackObject.speech = mustache.render(waybackStrings.speech, waybackObject);
waybackObject.speech += mustache.render(waybackStrings.additionalSpeech, waybackObject);
} else {
waybackObject.speech = mustache.render(waybackStrings.speech, waybackObject);
waybackObject.speech += '.';
}
dialog.close(app, waybackObject);
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may want gracefully cover .catch case here because you throw reject/error in the pipeline, and request to a server could get an exception

} // End of handler

function archiveEngine (archiveJSON) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the inner logic of archiveEngine and alexaEngine are not asynchronous, so you don't need Promises here and there.

debug('Inside archiveEngine...');
const res = { earliestYear: 0, latestYear: 0, totalUniqueURLs: 0 };
// Create array of capture years and then find earliest year
// and most recent year.
let yearsArray = Object.keys(archiveJSON.captures);
res.earliestYear = yearsArray[0];
res.latestYear = yearsArray[yearsArray.length - 1];

// Traverse URL category

// Find baseline of URL count
res.totalUniqueURLs += traverse(archiveJSON.urls[res.earliestYear]);
// debug('Baseline url count: ' + obj.totalUniqueURLs);

res.totalUniqueURLs += traverse(archiveJSON.new_urls);
// debug('Final url count: ' + obj.totalUniqueURLs);

if (res.totalUniqueURLs > 0) {
debug('archiveEngine successful!');
return res;
} else {
let error = new Error('archiveEngine didn\'t work.');
return error;
}
}

function alexaEngine (alexaJSON) {
debug('Inside alexaEngine...');
const res = { alexaWorldRank: 0, alexaUSRank: 0 };
res.alexaWorldRank = alexaJSON['ALEXA']['SD'][0]['POPULARITY'][0]['$']['TEXT'];
try {
res.alexaUSRank = alexaJSON['ALEXA']['SD'][0]['COUNTRY'][0]['$']['RANK'];
} catch (e) {
debug('Country not found');
debug(e);
}
if (res.alexaWorldRank > 0) {
debug('alexaEngine successful!');
return res;
} else {
let error = new Error('alexaEngine didn\'t work.');
return error;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems we should throw an error here. return passes it as a successful response.

}
}

function xmlConverter (data) {
return new Promise(function (resolve, reject) {
debug('Inside xmlConverter promise...');
let XMLparser = new xml2js.Parser();
XMLparser.parseString(data, function (err, result) {
if (err) {
let error = new Error('The XML parser didn\'t work.');
reject(error);
} else {
debug('xmlConverter promise successful!');
resolve(JSON.parse(JSON.stringify(result)));
}
});
});
}

module.exports = {
handler,
};
5 changes: 5 additions & 0 deletions functions/src/config.js
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,11 @@ module.exports = {
DEFAULT_SONG_IMAGE: 'http://archive.org/images/notfound.png',
},

wayback: {
ARCHIVE: 'http://web.archive.org/__wb/search/metadata?q={{url}}',
ALEXA: 'http://data.alexa.com/data?cli=10&url={{url}}',
},

/**
* settings specific for supported platforms
*/
Expand Down
8 changes: 8 additions & 0 deletions functions/src/strings.js
Original file line number Diff line number Diff line change
Expand Up @@ -444,6 +444,14 @@ module.exports = {
speech: 'Version is {{version}}.',
},

wayback: {
speech: '{{url}} was first captured by the Internet Archive in {{earliestYear}} and most recently in {{latestYear}}. The archive has {{totalUniqueURLs}} unique url\'s for this website. <break></break>{{url}} is ranked <say-as interpret-as="ordinal">{{alexaWorldRank}}</say-as> in the world in popularity',
additionalSpeech: ' and <say-as interpret-as="ordinal">{{alexaUSRank}}</say-as> in the United States.',
default: 'Would you like to use the wayback machine to hear the history of a website? Simply say Wayback Machine and the name of the website you\'d like to hear.',
error: "I'm sorry I'm having trouble here. Maybe we should try this again later.",
suggestions: ['wayback machine google.com', 'wayback machine archive.org']
},

welcome: {
acknowledges: [
'Welcome to music at the Internet Archive.'
Expand Down
31 changes: 31 additions & 0 deletions functions/src/utils/traverse.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
const _ = require('lodash');
// const {debug} = require('../utils/logger')('ia:actions:utils:traverse');

/**
* Traverse a given object
*
* @param {Object} obj
*/
module.exports = function (obj) {
let results = [];
function traverse (obj) {
_.forOwn(obj, (val, key) => {
if (_.isArray(val)) {
val.forEach(el => {
traverse(el);
});
} else if (_.isObject(val)) {
traverse(val);
} else {
results.push(val);
}
});
}
traverse(obj);
let count = 0;
while (results.length !== 0) {
count += results.pop();
}
// debug('final count inside traverse = ' + count);
return count;
};
58 changes: 58 additions & 0 deletions functions/tests/actions/wayback-machine.spec.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,58 @@
const {expect} = require('chai');
const rewire = require('rewire');

const action = rewire('../../src/actions/wayback-machine');

const archiveRequest = require('../fixtures/wayback-archive.json');
const alexaRequest = require('../fixtures/wayback-alexa.xml');
const mockApp = require('../_utils/mocking/platforms/app');
const mockDialog = require('../_utils/mocking/dialog');

describe('actions', () => {
describe('wayback machine', () => {
let app;
let dialog;

beforeEach(() => {
app = mockApp();
dialog = mockDialog();
action.__set__('dialog', dialog);
});

it('check to see that overall action eventually returns a promise', () => {
action.handler(app);
expect(Promise.resolve()).to.be.a('promise');
});

it('check to see that archiveEngine is working', () => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adaveinthelife before merging this PR please fix those tests or note that they are not working and should be fixed

let obj = { earliestYear: 0, latestYear: 0, totalUniqueURLs: 0 };
let result;

action.__set__('archiveEngine', function (a, b) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems this test has never come inside of this mock. You never call handler in this test on others. Actually, it is not clear what: check to see that archiveEngine is working means here, could you be more specific?

result = action.archiveEngine(archiveRequest, obj);
expect(result).to.change(obj, 'earliestYear');
expect(result).to.change(obj, 'latestYear');
expect(result).to.change(obj, 'totalUniqueURLs');
});
});

it('check to see that alexaEngine is working', () => {
let obj = { alexaWorldRank: 0 };
let result;

action.__set__('alexaEngine', function (a, b) {
result = action.alexaEngine(alexaRequest, obj);
expect(result).to.change(obj, 'alexaWorldRank');
});
});

it('check to see that xmlConverter is working', () => {
let result;

action.__set__('xmlConverter', function (a) {
result = action.xmlConverter(alexaRequest);
expect(result).to.not.throw('The XML parser didn\'t work.');
});
});
});
});
1 change: 1 addition & 0 deletions functions/tests/fixtures/wayback-alexa.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
<!-- Need more Alexa data? Find our APIs here: https://aws.amazon.com/alexa/ --><ALEXA VER="0.9" URL="cnn.com/" HOME="0" AID="=" IDN="cnn.com/"><SD><POPULARITY URL="cnn.com/" TEXT="104" SOURCE="panel"/><REACH RANK="88"/><RANK DELTA="-4"/><COUNTRY CODE="US" NAME="United States" RANK="27"/></SD></ALEXA>
1 change: 1 addition & 0 deletions functions/tests/fixtures/wayback-archive.json

Large diffs are not rendered by default.

35 changes: 35 additions & 0 deletions functions/tests/utils/traverse.spec.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
const {expect} = require('chai');
const traverse = require('../../src/utils/traverse');

let testJSON = {'captures': {
'1999': {
'text/html': 18360
},
'2000': {
'application/x-director': 19,
'video/quicktime': 1584,
'application/x-troff-man': 1,
'x-world/x-vrml': 1,
'audio/x-pn-realaudio': 176,
'audio/mpeg': 195,
'audio/x-wav': 3098,
'image/png': 97,
'text/html': 901401,
'video/x-ms-asf': 142,
'image/gif': 17388,
'text/plain': 394428,
'image/jpeg': 82903,
'application/x-shockwave-flash': 39,
'application/zip': 108,
'audio/x-aiff': 2767,
'text/css': 55,
'application/pdf': 291
}}};

describe('utils', () => {
describe('traverse', () => {
it('should traverse a given object to return the sum of it\'s leaf nodes', () => {
expect(traverse(testJSON)).to.be.equal(1423053);
});
});
});