Update README.md
KruthikaP-21 authored Jun 6, 2024
1 parent c6a96fb commit 0099755
Showing 1 changed file with 19 additions and 1 deletion.
20 changes: 19 additions & 1 deletion analyzers/URLs/README.md
@@ -4,4 +4,22 @@ An analyzer for URL links

## Analyzer Notes

Notes about the analyzer go here.
This is an NLP++ analyzer that parses text and identifies any hyperlinks in it. A wide variety of link formats are recognized.

Some sample links can be found in input/text.txt.
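To illustrate the idea of locating links in free text, here is a minimal Python sketch. It is not the analyzer's NLP++ implementation: the regular expression, the `find_links` name, and the choice to handle only `http`, `https`, and `ftp` schemes are all assumptions made for illustration, and the real analyzer may recognize many more formats.

```python
import re
from typing import List

# Hypothetical pattern: a scheme, "://", then everything up to the next
# whitespace or quote/angle-bracket character. This is an illustrative
# assumption, not the rule set used by the NLP++ analyzer.
URL_PATTERN = re.compile(r"\b(?:https?|ftp)://[^\s<>\"']+", re.IGNORECASE)


def find_links(text: str) -> List[str]:
    """Return candidate links found in text, trimming trailing punctuation."""
    links = []
    for match in URL_PATTERN.finditer(text):
        # Sentence punctuation directly after a link is usually not part of it.
        links.append(match.group(0).rstrip(".,;:!?)"))
    return links


if __name__ == "__main__":
    sample = "See https://en.wikipedia.org/wiki/URL, or ftp://example.org/file.txt."
    for link in find_links(sample):
        print(link)
```

A regex like this is easy to read but misses links without a scheme (for example `www.example.org`); a rule-based analyzer can cover such cases with additional passes.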

The following information is extracted from each hyperlink and made available in JSON format:

- Scheme of the link (e.g. https (Hypertext Transfer Protocol Secure), ftp (File Transfer Protocol))

- Domain name (e.g. wikipedia)

- Subdomain (e.g. en (English), jp (Japanese))

- Page path

- Top-level domain (e.g. org, edu)
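The fields listed above can be sketched in Python as follows. This is only an approximation under stated assumptions: the `split_link` function and the JSON key names are hypothetical and need not match the analyzer's actual output.json, and the host is split naively (last label as top-level domain, the one before it as domain name, the rest as subdomain).

```python
import json
from urllib.parse import urlsplit


def split_link(link: str) -> dict:
    """Break a link into scheme, subdomain, domain, TLD, and path.

    Key names are hypothetical; the NLP++ analyzer's JSON may differ.
    """
    parts = urlsplit(link)
    host = parts.hostname or ""
    labels = host.split(".")

    # Naive assumption: the last label is the top-level domain and the
    # label before it is the domain name. Multi-part suffixes such as
    # "co.uk" are not handled by this sketch.
    if len(labels) >= 2:
        tld = labels[-1]
        domain = labels[-2]
        subdomain = ".".join(labels[:-2])
    else:
        tld = ""
        domain = host
        subdomain = ""

    return {
        "scheme": parts.scheme,
        "subdomain": subdomain,
        "domain": domain,
        "tld": tld,
        "path": parts.path,
    }


if __name__ == "__main__":
    print(json.dumps(split_link("https://en.wikipedia.org/wiki/URL"), indent=2))
```

For `https://en.wikipedia.org/wiki/URL`, this yields the scheme `https`, subdomain `en`, domain `wikipedia`, top-level domain `org`, and path `/wiki/URL`.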

A sample output for the sample text can be found in input/text.text_log/output.json.

To run the analyzer, create a text file in the input folder containing the text to be parsed.
