Package com.gmt2001
Class PatternDetector
java.lang.Object
com.gmt2001.PatternDetector
Provides pattern matchers to JS, where Java RegEx is required
- Author: gmt2001
Method Summary
Modifier and Type, Method, Description
- static String getLink(String str) - Returns the link contained in the input string which matches the links regex
- getLinks - Returns all links contained in the input string which match the links regex
- static boolean hasAnyLinks(String str) - Indicates if the input string matches the links regex
- static boolean hasIpLinks(String str) - Indicates if the input string matches the links regex for the ip capture group
- static boolean hasProtoLinks(String str) - Indicates if the input string matches the links regex for the protouri capture group
- static boolean hasWebLinks(String str) - Indicates if the input string matches the links regex for the weburi capture group
- static Matcher linksMatcher(String str) - Provides a Matcher for the links pattern against the input string
-
Method Details
-
linksMatcher
Provides a Matcher for the links pattern against the input string
Use the top-level capture groups to determine which type of match was made
All matches are made using Unicode Case-Insensitive rules
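As an illustration of what Unicode case-insensitive matching means in java.util.regex, the sketch below compares Pattern.CASE_INSENSITIVE on its own with Pattern.CASE_INSENSITIVE combined with Pattern.UNICODE_CASE. The class UnicodeCaseSketch and its helper methods are hypothetical, and which flags PatternDetector actually sets is an assumption. Only the flag behavior itself is standard Java.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of com.gmt2001
final class UnicodeCaseSketch {
    // CASE_INSENSITIVE alone folds case for US-ASCII letters only
    static final Pattern ASCII_ONLY = Pattern.compile(
            "verm\u00f6gensberater", Pattern.CASE_INSENSITIVE);

    // Adding UNICODE_CASE folds case for non-ASCII letters such as o-umlaut
    static final Pattern UNICODE = Pattern.compile(
            "verm\u00f6gensberater", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);

    static boolean matchesAsciiOnly(String s) {
        return ASCII_ONLY.matcher(s).matches();
    }

    static boolean matchesUnicode(String s) {
        return UNICODE.matcher(s).matches();
    }

    public static void main(String[] args) {
        String upper = "VERM\u00d6GENSBERATER";
        System.out.println(matchesAsciiOnly(upper));
        System.out.println(matchesUnicode(upper));
    }
}
```

With ASCII-only folding the upper-case umlaut does not match its lower-case form, so a links pattern that must catch Unicode domains in any letter case needs the Unicode-aware flag.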
The links pattern checks for any of the following:
- Any text followed by a full stop . followed by a valid TLD, which looks like an HTTP, FTP, RTSP, or WS link
  - Unicode text is supported
  - Non-ASCII TLDs match against both the punycode and Unicode representations, such as matching either xn--vermgensberater-ctb or vermögensberater
  - The scheme and port are matched optionally for the purposes of including them in the capture groups
  - The path, query, and fragment are not matched or output in the capture groups
  - Capture Groups:
    - weburi - the whole match of scheme and authority, such as https://hello.example.com:25000
      - webscheme - the scheme, if present, such as https
      - webauthority - the entire authority, such as hello.example.com:25000
        - webdomain - the domain component, including subdomain, such as hello.example
        - the TLD, such as com
          - webtld - if spaces were not used around the dot and no known workarounds were detected, such as example.com
          - webworkaroundtld - if spaces were used around the dot, or another potential workaround to detection was used, such as example. com
          - Only some TLDs are covered by workaround detection
        - webport - the port, if present, such as 25000
- IP addresses
  - Capture Groups:
    - ip - any type of IP address
      - ipv4 - any sequence of base-10 numbers [0-9] and full stops . that looks like a valid IPv4 address
      - ipv6 - any sequence of base-16 numbers [0-9a-fA-F] and colons : that looks like an IPv6 address
        - Supports rules for removing leading 0 in each group
        - Supports ::
        - Will false-trigger on incorrect usage of colons, such as 2001:0db8::ff00::8329
- Any text that looks like a selected list of URIs used for other protocols, such as skype:, magnet:, and mailto:
  - Capture Groups:
    - protouri - the whole match of protocol URI, such as magnet:?xt=urn:btih:c12fe1c06bba254a9dc9f519b335aa7c1367a88a
      - protoscheme - the scheme, such as magnet
      - protourn - the urn or other data of the URI, such as ?xt=urn:btih:c12fe1c06bba254a9dc9f519b335aa7c1367a88a
- All capture types will also include the following capture groups:
  - path - the path, query, and fragment components, if present
- Parameters:
str - the string being tested
- Returns:
a Matcher that can be used to test if the string contains links
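The following is a minimal sketch of how named top-level capture groups can tell the match types apart. It uses a deliberately simplified stand-in pattern, not the actual links regex. The class LinkGroupsSketch, its methods, and the three-TLD list are assumptions for illustration. Only the group names are taken from the list above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the pattern is a simplified stand-in
final class LinkGroupsSketch {
    static final Pattern LINKS = Pattern.compile(
            // web links: optional scheme, domain, TLD, optional port
            "(?<weburi>(?:(?<webscheme>https?|ftp|rtsp|wss?)://)?"
          + "(?<webauthority>(?<webdomain>[\\p{L}\\p{N}-]+(?:\\.[\\p{L}\\p{N}-]+)*)"
          + "\\.(?<webtld>com|org|net)\\b(?::(?<webport>[0-9]{1,5}))?))"
            // IPv4 only, for brevity
          + "|(?<ip>(?<ipv4>(?:[0-9]{1,3}\\.){3}[0-9]{1,3}))"
            // a few other URI schemes
          + "|(?<protouri>(?<protoscheme>magnet|mailto|skype):(?<protourn>\\S+))",
            Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);

    // Reports which top-level group took part in the first match
    static String classify(String str) {
        Matcher m = LINKS.matcher(str);
        if (!m.find()) {
            return "none";
        }
        if (m.group("weburi") != null) {
            return "web";
        }
        if (m.group("ip") != null) {
            return "ip";
        }
        return "proto";
    }

    // Returns the named group from the first match, or null if absent
    static String groupOfFirstMatch(String str, String name) {
        Matcher m = LINKS.matcher(str);
        return m.find() ? m.group(name) : null;
    }
}
```

A group that did not take part in the match returns null from Matcher.group(String), which is what makes the top-level groups usable as a type indicator.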
-
hasAnyLinks
Indicates if the input string matches the links regex
- Parameters:
str - the string being tested
- Returns:
true if a link is detected
-
hasWebLinks
Indicates if the input string matches the links regex for the weburi capture group
- Parameters:
str - the string being tested
- Returns:
true if a link is detected in the weburi capture group
-
hasIpLinks
Indicates if the input string matches the links regex for the ip capture group
- Parameters:
str - the string being tested
- Returns:
true if a link is detected in the ip capture group
-
hasProtoLinks
Indicates if the input string matches the links regex for the protouri capture group
- Parameters:
str - the string being tested
- Returns:
true if a link is detected in the protouri capture group
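One way a per-group check such as hasWebLinks, hasIpLinks, or hasProtoLinks could work is to walk every match and test whether the named group took part in it. The sketch below assumes that approach and uses a simplified stand-in pattern. HasLinksSketch, hasGroup, and hasAny are hypothetical names, and the actual implementation may differ.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the pattern is a simplified stand-in
final class HasLinksSketch {
    static final Pattern LINKS = Pattern.compile(
            "(?<weburi>\\b[a-z0-9-]+\\.(?:com|org|net)\\b)"
          + "|(?<ip>\\b(?:[0-9]{1,3}\\.){3}[0-9]{1,3}\\b)"
          + "|(?<protouri>\\b(?:magnet|mailto|skype):\\S+)",
            Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);

    // True if any match in the string used the given top-level group
    static boolean hasGroup(String str, String group) {
        Matcher m = LINKS.matcher(str);
        while (m.find()) {
            if (m.group(group) != null) {
                return true;
            }
        }
        return false;
    }

    // True if the pattern matches anywhere, regardless of link type
    static boolean hasAny(String str) {
        return LINKS.matcher(str).find();
    }
}
```

Looping over all matches matters when a string holds several link types: in "see example.com or 10.0.0.1" the first match is a web link, so a check of only the first match would miss the IP address.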
-
getLink
Returns the link contained in the input string that matches the links regex
If multiple links are present, only the first one returned by the Matcher is returned
Matches against all link types
- Parameters:
str - the string being tested
- Returns:
null if no links were detected; otherwise, the first link returned by the Matcher
-
getLinks
Returns all links contained in the input string that match the links regex
Matches against all link types
-
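The difference between returning the first link and returning all links comes down to calling Matcher.find() once or in a loop. The sketch below shows both under stated assumptions: the pattern is a simplified stand-in, FirstAndAllSketch and its methods are hypothetical, and the List<String> return type is chosen for illustration because the return type of getLinks is not stated above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the pattern is a simplified stand-in
final class FirstAndAllSketch {
    static final Pattern LINKS = Pattern.compile(
            "\\b[a-z0-9-]+(?:\\.[a-z0-9-]+)*\\.(?:com|org|net)\\b",
            Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);

    // First match only, or null when nothing matches
    static String firstLink(String str) {
        Matcher m = LINKS.matcher(str);
        return m.find() ? m.group() : null;
    }

    // Every match, in the order the Matcher finds them
    static List<String> allLinks(String str) {
        List<String> out = new ArrayList<>();
        Matcher m = LINKS.matcher(str);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }
}
```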