diff --git a/analyzers/URLs/README.md b/analyzers/URLs/README.md
index 518eed1..27dfbaa 100644
--- a/analyzers/URLs/README.md
+++ b/analyzers/URLs/README.md
@@ -4,4 +4,22 @@ An analyzer for URL links
 
 ## Analyzer Notes
 
-Notes about the analyzer go here.
\ No newline at end of file
+This is an NLP++ analyzer that parses text and identifies any hyperlinks it contains. A wide variety of link formats are recognized.
+
+Some sample links can be found in input/text.txt.
+
+From each hyperlink, the following information is extracted and made available in JSON format:
+
+- Scheme of the link (e.g., https (Hypertext Transfer Protocol Secure), ftp (File Transfer Protocol))
+
+- Domain name (e.g., wikipedia)
+
+- Subdomain (e.g., en for English, ja for Japanese)
+
+- Page path
+
+- Top-level domain (e.g., org, edu)
+
+A sample output for the sample text can be found in input/text.text_log/output.json.
+
+To run the analyzer, create a text file in the input folder containing the text to be parsed.
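As an illustration of how a link breaks down into the fields the README lists (scheme, subdomain, domain name, top-level domain, page path), here is a minimal Python sketch using the standard library's `urllib.parse.urlsplit`. It is not the NLP++ analyzer and does not reproduce its output.json schema. The function name `url_parts` and the JSON key names are hypothetical, and the sketch naively assumes a single-label TLD, so hosts such as `example.co.uk` would be split incorrectly.

```python
import json
from urllib.parse import urlsplit


def url_parts(url: str) -> dict:
    """Split a URL into scheme, subdomain, domain, TLD and path.

    Hypothetical helper for illustration only. Naive assumption: the TLD is
    the last dot-separated label of the host and the domain name is the one
    before it; everything further left is treated as the subdomain.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname is lowercased; port/credentials dropped
    labels = host.split(".")
    if len(labels) >= 2:
        tld = labels[-1]
        domain = labels[-2]
    else:
        # No dot in the host (e.g. "localhost"): no TLD can be identified.
        tld = ""
        domain = host
    subdomain = ".".join(labels[:-2])  # empty string when there is no subdomain
    return {
        "scheme": parts.scheme,
        "subdomain": subdomain,
        "domain": domain,
        "tld": tld,
        "path": parts.path,
    }


if __name__ == "__main__":
    print(json.dumps(url_parts("https://en.wikipedia.org/wiki/Hyperlink")))
```

For `https://en.wikipedia.org/wiki/Hyperlink` this yields scheme `https`, subdomain `en`, domain `wikipedia`, TLD `org` and path `/wiki/Hyperlink`. Handling multi-label public suffixes correctly would require a public suffix list rather than the last-label rule used here.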