.NET 8 C# URL normalizer.
URL normalization, also known as URL canonicalization, is the process of standardizing the text representation of a URL so that differently formatted URLs can be checked for equivalence.
-
Duplicate slashes are removed
file://example.com/foo//bar.html
→file://example.com/foo/bar.html
-
Default port is removed
ftp://example.com:21/
→ftp://example.com/
-
Dot-segments are removed
file://example.com/foo/./bar/baz/../qux
→file://example.com/foo/bar/qux
-
Empty path is converted to "/"
ftp://example.com
→ftp://example.com/
-
Percent-encoded triplets are uppercased
ftp://example.com/foo%2a
→ftp://example.com/foo%2A
-
Percent-encoded triplets of unreserved characters are decoded
ftp://example.com/%7Efoo
→ftp://example.com/~foo
-
Scheme and host are lowercased
FTP://User@Example.COM/Foo
→ftp://User@example.com/Foo
-
Directory index can be removed (optional, via removableDirectoryIndexNames)
http://example.com/default.asp
→http://example.com/
http://example.com/a/index.html
→http://example.com/a/
-
Fragment can be removed (optional, via isFragmentIgnored)
http://example.com/bar.html#section1
→http://example.com/bar.html
-
Scheme can be changed (optional, via preferredScheme)
https://example.com/
→http://example.com/
-
Query parameters are sorted
http://example.com/display?lang=en&article=fred
→http://example.com/display?article=fred&lang=en
-
User-info can be removed (optional, via isUserInfoIgnored)
http://user:password@example.com
→http://example.com/
-
Empty query is removed
http://example.com/display?
→http://example.com/display
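To make a few of the rules above concrete, here is a minimal sketch built only on the .NET base class library. It is not how Toimik.UrlNormalization is implemented. `NormalizeSketch` is a hypothetical helper that relies on `System.Uri` for lowercasing the scheme and host and for detecting the default port, and it always drops user-info and the fragment for brevity.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Prints: http://example.com/a/b?x=1&y=2
Console.WriteLine(NormalizeSketch("HTTP://Example.COM:80/a//b?y=2&x=1"));

// Hypothetical helper for illustration; NOT the library's implementation.
static string NormalizeSketch(string url)
{
    var uri = new Uri(url);

    // System.Uri exposes the scheme and host in lowercase.
    // Omit the port when it is the default for the scheme.
    var authority = uri.IsDefaultPort ? uri.Host : $"{uri.Host}:{uri.Port}";

    // Collapse runs of adjacent slashes in the path.
    // An empty path is already reported as "/" by System.Uri.
    var path = Regex.Replace(uri.AbsolutePath, "/{2,}", "/");

    // Sort query parameters; an empty query yields no parameters at all.
    var parameters = uri.Query.TrimStart('?')
        .Split('&', StringSplitOptions.RemoveEmptyEntries)
        .OrderBy(p => p, StringComparer.Ordinal);
    var query = string.Join("&", parameters);

    // User-info and fragment are intentionally left out of the result.
    return $"{uri.Scheme}://{authority}{path}" + (query.Length > 0 ? "?" + query : "");
}
```

The sketch sorts parameters by their raw `name=value` text using ordinal comparison, which is an assumption made here; the library may order them differently.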
PM> Install-Package Toimik.UrlNormalization
> dotnet add package Toimik.UrlNormalization
// Use default arguments
// var normalizer = new UrlNormalizer();
// Use custom arguments
var normalizer = new UrlNormalizer(isAdjacentSlashesCollapsed: false);
var url = ...
var normalizedUrl = normalizer.Normalize(url);
// Use default arguments
// var normalizer = new HttpUrlNormalizer();
// Use custom arguments
var normalizer = new HttpUrlNormalizer(
preferredScheme: "https",
isUserInfoIgnored: false,
removableDirectoryIndexNames: new HashSet<string>(0), // override the default
isFragmentIgnored: false);
var url = ...
var normalizedUrl = normalizer.Normalize(url);
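For readers curious what the HTTP-specific options amount to, the following is a rough sketch using only the base class library. It does not reflect the internals of `HttpUrlNormalizer`. `ApplyHttpOptionsSketch` and its parameters are hypothetical names chosen to mirror the options listed earlier, and the sketch always drops user-info and the fragment.

```csharp
using System;
using System.Collections.Generic;

var indexNames = new HashSet<string> { "index.html", "default.asp" };

// Prints: http://example.com/a/
Console.WriteLine(ApplyHttpOptionsSketch("https://example.com/a/index.html#top", "http", indexNames));

// Hypothetical helper for illustration; NOT the library's implementation.
static string ApplyHttpOptionsSketch(string url, string preferredScheme, ISet<string> indexNames)
{
    var uri = new Uri(url);

    // Keep the port only when it is not the default for the original scheme.
    var authority = uri.IsDefaultPort ? uri.Host : $"{uri.Host}:{uri.Port}";

    // Strip the last path segment when it is a known directory index name.
    var path = uri.AbsolutePath;
    var lastSlash = path.LastIndexOf('/');
    var lastSegment = path.Substring(lastSlash + 1);
    if (indexNames.Contains(lastSegment))
    {
        path = path.Substring(0, lastSlash + 1);
    }

    // User-info and fragment are omitted; the query is passed through unchanged.
    return $"{preferredScheme}://{authority}{path}{uri.Query}";
}
```

Matching index names case-sensitively is an assumption of this sketch; pass a `HashSet<string>` created with `StringComparer.OrdinalIgnoreCase` if case-insensitive matching is wanted.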