Commit

Merge pull request #114 from KruthikaP-21/patch-1
Readme for Analyzers
dehilsterlexis authored Jun 6, 2024
2 parents 1cc1ced + 0099755 commit dad5a76
Showing 4 changed files with 51 additions and 4 deletions.
10 changes: 9 additions & 1 deletion analyzers/Address Parser/README.md
@@ -4,4 +4,12 @@ A parser for American postal addresses

## Analyzer Notes

Notes about the analyzer go here.
This is an NLP++ analyzer that parses text and identifies addresses (in USPS formats) listed anywhere in the text.

The following information is extracted from each address (all or only some of it, depending on the type of address): house number, street number, street name, street suffix (like ST for Street), street type (like Lane or Road), city, state, ZIP code, and type of address.

Additionally, Rural Route and Highway Contract addresses are parsed, from which the Rural Route or Highway Contract number and the box number are extracted.
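
As a rough illustration of the Rural Route and Highway Contract case described above, here is a minimal Python sketch. It is not the analyzer's implementation (the analyzer is written in NLP++). It assumes the common USPS abbreviated forms such as `RR 2 BOX 152` and `HC 68 BOX 23A`, and the function name `parse_routes` and the output field names are hypothetical.

```python
import re

# Assumption: addresses use the abbreviated USPS forms "RR <n> BOX <id>"
# and "HC <n> BOX <id>". Spelled-out variants are not handled here.
ROUTE_RE = re.compile(
    r"\b(RR|HC)\s+(\d+)\s+BOX\s+([0-9A-Za-z]+)\b",
    re.IGNORECASE,
)

ROUTE_TYPES = {"RR": "Rural Route", "HC": "Highway Contract"}


def parse_routes(text):
    """Return one dict per Rural Route / Highway Contract address found."""
    results = []
    for match in ROUTE_RE.finditer(text):
        results.append({
            "text": match.group(0),
            "route_type": ROUTE_TYPES[match.group(1).upper()],
            "route_number": match.group(2),
            "box_number": match.group(3),
        })
    return results
```

Calling `parse_routes("Mail to RR 2 BOX 152, Anytown")` yields a single entry with route type `Rural Route`, route number `2`, and box number `152`.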

All of this information is made available in JSON format. A sample output (for the sample text) can be found in input/text.text_log/output.json.

To run the analyzer, create a text file in the input folder containing the text to be parsed.
4 changes: 3 additions & 1 deletion analyzers/Email Addresses/README.md
@@ -4,4 +4,6 @@ An analyzer for email addresses

## Analyzer Notes

Notes about the analyzer go here.
This is an NLP++ analyzer that parses text and identifies the email addresses listed in it. A variety of email address formats are recognized. Some sample addresses can be found in input/text.txt. From each email address, the local name, the domain name, and the top-level domain (TLD) are extracted and made available in JSON format. A sample output (for the sample text) can be found in input/text.text_log/output.json.
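
To make the three extracted parts concrete, here is a minimal Python sketch of the same split. It is an illustration only, not the analyzer's NLP++ rules: the regular expression is a deliberately simplified assumption that covers common addresses, and the function name `split_email` and the field names are hypothetical.

```python
import re

# Simplified pattern (an assumption): local part, "@", one or more
# dot-terminated labels, then an alphabetic top-level domain.
EMAIL_RE = re.compile(
    r"\b([A-Za-z0-9._%+-]+)@((?:[A-Za-z0-9-]+\.)+)([A-Za-z]{2,})\b"
)


def split_email(text):
    """Return local name, domain name, and TLD for each address in text."""
    results = []
    for match in EMAIL_RE.finditer(text):
        results.append({
            "text": match.group(0),
            "local_name": match.group(1),
            # Drop the trailing dot that separates the domain from the TLD.
            "domain_name": match.group(2).rstrip("."),
            "tld": match.group(3),
        })
    return results
```

For the input `"Contact jane.doe@example.org today"`, the sketch reports local name `jane.doe`, domain name `example`, and TLD `org`.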

To run the analyzer, create a text file in the input folder containing the text to be parsed.
21 changes: 20 additions & 1 deletion analyzers/Telephone Numbers/README.md
@@ -4,4 +4,23 @@ An analyzer for telephone numbers

## Analyzer Notes

Notes about the analyzer go here.
This is an NLP++ analyzer that parses text and identifies the telephone numbers listed in it. Some sample telephone number formats can be found in input/text.txt.

Here are some examples:
- 212.456.7890
- 212 456 7890
- +12124567890
- +1 212.456.7890
- +212-456-7890
- 1-212-456-7890

For each telephone number identified in the text, the following information is extracted:

- the matched text
- the area code
- the prefix of the number
- the station value
- the type of telephone (mobile or landline)
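
As a rough illustration of how the example formats above map onto area, prefix, and station, here is a minimal Python sketch. It is not the analyzer's NLP++ logic: the pattern is an assumption that covers only ten-digit numbers with an optional `+` and leading `1`, the names `PHONE_RE` and `parse_phones` are hypothetical, and the mobile/landline type is left out because it cannot be derived from the digits alone.

```python
import re

# Assumption: optional "+", optional country code "1", then 3-3-4 digits
# separated by an optional space, dot, or hyphen.
PHONE_RE = re.compile(
    r"(?<![\d+])\+?(?:1[ .-]?)?(\d{3})[ .-]?(\d{3})[ .-]?(\d{4})(?!\d)"
)


def parse_phones(text):
    """Return the matched text, area, prefix, and station for each number."""
    results = []
    for match in PHONE_RE.finditer(text):
        results.append({
            "text": match.group(0),
            "area": match.group(1),
            "prefix": match.group(2),
            "station": match.group(3),
        })
    return results
```

Each of the sample formats listed earlier resolves to area `212`, prefix `456`, and station `7890` under this pattern.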

To run the analyzer, create a text file in the input folder containing the text to be parsed.
20 changes: 19 additions & 1 deletion analyzers/URLs/README.md
@@ -4,4 +4,22 @@ An analyzer for URL links

## Analyzer Notes

Notes about the analyzer go here.
This is an NLP++ analyzer that parses text and identifies any hyperlinks present in it. A wide variety of link formats are recognized.

Some sample links can be found in input/text.txt.

From each hyperlink, the following information is extracted and made available in JSON format:

- Scheme of the link (like https (Hypertext Transfer Protocol Secure) or ftp (File Transfer Protocol))

- Domain name (e.g. wikipedia)

- Subdomain (like en (English) or jp (Japanese))

- Page path

- Top-level domain (like org or edu)
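
To show how these fields relate to a concrete link, here is a minimal Python sketch built on the standard library's `urllib.parse.urlsplit`. It is an illustration, not the analyzer's NLP++ implementation: the function name `split_url` and the field names are hypothetical, and the host is split naively (last label is the top-level domain, the one before it is the domain name), so multi-part suffixes such as `co.uk` are not handled.

```python
from urllib.parse import urlsplit


def split_url(url):
    """Split a URL into scheme, subdomain, domain name, TLD, and path."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    labels = host.split(".") if host else []
    return {
        "scheme": parts.scheme,
        # Naive assumption: everything before the last two labels.
        "subdomain": ".".join(labels[:-2]),
        "domain_name": labels[-2] if len(labels) >= 2 else "",
        "tld": labels[-1] if labels else "",
        "path": parts.path,
    }
```

For `https://en.wikipedia.org/wiki/Parsing`, the sketch gives scheme `https`, subdomain `en`, domain name `wikipedia`, TLD `org`, and path `/wiki/Parsing`.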

A sample output (for the sample text) can be found in input/text.text_log/output.json.

To run the analyzer, create a text file in the input folder containing the text to be parsed.
