dbuscaglia/triplets_map_reduce


Input: a text string containing English words, whitespace (spaces and newlines) and punctuation like commas, periods, question marks and semicolons.

For example:

"""Hello, I like nuts. Do you like nuts? No? Are you sure? Why don't you like nuts? Are you nuts? I like you"""

Output:

Print a list of triplets. Each triplet is a pair of words and a count.

For example the output for the sample input:

    Are you: 2
    like nuts: 3
    you like: 3
    I like: 2

A pair of words should show up in the output if one of the words follows the other in the input and the two are separated only by whitespace. Every pair that shows up more than once should have an entry in the output with the correct number of occurrences. Note that the order of the words in the pair doesn't matter: 'green bee' and 'bee green' are 2 occurrences of the same pair. Ignore case: 'BlUe sKY' is the same pair as 'SKy bLUE'.
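The rules above can be sketched in plain Python. This is only a minimal illustration, not the code in jobs/triplets_job.py: the function name count_pairs and the word pattern (letters and apostrophes, so that "don't" stays one word) are assumptions made for the example.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of ASCII letters and apostrophes.
WORD = re.compile(r"[A-Za-z']+")

def count_pairs(text):
    """Return {(word_a, word_b): count} for adjacent pairs seen more than once."""
    counts = Counter()
    prev = None
    for match in WORD.finditer(text):
        if prev is not None:
            # Text between the two words; punctuation here disqualifies the pair.
            gap = text[prev.end():match.start()]
            if gap.isspace():
                # Lowercase and sort so order and case do not matter.
                pair = tuple(sorted((prev.group().lower(), match.group().lower())))
                counts[pair] += 1
        prev = match
    # Only pairs that occur more than once are reported.
    return {pair: n for pair, n in counts.items() if n > 1}

if __name__ == "__main__":
    sample = ("Hello, I like nuts. Do you like nuts? No? Are you sure? "
              "Why don't you like nuts? Are you nuts? I like you")
    for (a, b), n in count_pairs(sample).items():
        print(f"{a} {b}: {n}")
```

Because each pair is stored in sorted, lowercased form, 'you like' and 'like you' land on the same key, which is why that pair reaches a count of 3 for the sample input.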

Your mission, if you choose to accept it:

Write a function that accepts the input and produces the output.

This is my solution to the coding challenge.

I have designed this to be easily deployable as a production-caliber solution.

The dependencies are listed in requirements.txt. To install them:

    pip install -r requirements.txt

To run the unit tests:

    python -m unittest discover

To run the job with the sample input (any text file works):

    python jobs/triplets_job.py sample_input.txt

We can discuss how this would run on EMR or Hadoop, and how it would be scheduled.
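As a starting point for that discussion, here is a framework-free sketch of how the problem might split into a map step and a reduce step. It is an illustration under stated assumptions, not the job in this repository: map_pairs, reduce_counts and run_local are hypothetical names, and run_local only simulates the shuffle phase in memory.

```python
import re
from collections import defaultdict

# Assumption: a "word" is a run of ASCII letters and apostrophes.
WORD = re.compile(r"[A-Za-z']+")

def map_pairs(chunk):
    """Map step: emit (pair, 1) for each adjacent pair separated only by whitespace."""
    prev = None
    for match in WORD.finditer(chunk):
        if prev is not None and chunk[prev.end():match.start()].isspace():
            yield tuple(sorted((prev.group().lower(), match.group().lower()))), 1
        prev = match

def reduce_counts(pair, ones):
    """Reduce step: sum the counts for one pair, keeping only repeated pairs."""
    total = sum(ones)
    if total > 1:
        yield pair, total

def run_local(chunks):
    """Simulate map, shuffle and reduce in memory (hypothetical local driver)."""
    grouped = defaultdict(list)
    for chunk in chunks:
        for pair, one in map_pairs(chunk):
            grouped[pair].append(one)  # shuffle: group values by key
    result = {}
    for pair, ones in grouped.items():
        for key, total in reduce_counts(pair, ones):
            result[key] = total
    return result
```

One design point to settle before running this on a cluster: if the input is split into chunks by line, a pair whose words sit on either side of a newline is never seen by any single mapper, even though a newline counts as whitespace. Either the whole text is treated as one record, or the chunking has to carry the last word of each chunk over to the next.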
