Package com.gmt2001

Class PatternDetector

java.lang.Object
com.gmt2001.PatternDetector

public final class PatternDetector extends Object
Provides pattern matchers to JS, where Java RegEx is required
Author:
gmt2001
  • Method Details

    • linksMatcher

      public static Matcher linksMatcher(String str)
      Provides a Matcher for the links pattern against the input string

      Use the top-level capture groups to determine which type of match was made

      All matches are made using Unicode Case-Insensitive rules

      The links pattern checks for any of the following:

      • Any text followed by a full stop . followed by a valid TLD, which looks like an HTTP, FTP, RTSP, or WS link
        • Unicode text is supported
        • Non-ASCII TLDs match against both the punycode and Unicode representations, such as matching either xn--vermgensberater-ctb or vermögensberater
        • The scheme and port are matched optionally for the purposes of including them in the capture groups
        • The path, query, and fragment are not matched or output in the capture groups
        • Capture Groups:
          • weburi - the whole match of scheme and authority, such as https://hello.example.com:25000
            • webscheme - the scheme, if present, such as https
            • webauthority - the entire authority, such as hello.example.com:25000
              • webdomain - the domain component, including subdomain, such as hello.example
              • the TLD, such as com
                • webtld - if spaces were not used around the dot and no known workarounds were detected, such as example.com
                • webworkaroundtld - if spaces were used around the dot, or another potential workaround to detection was used, such as example. com
                • Only some TLDs are covered by workaround detection
              • webport - the port, if present, such as 25000
      • IP addresses
        • Capture Groups:
          • ip - any type of IP address
            • ipv4 - any sequence of base-10 numbers [0-9] and full stops . that looks like a valid IPv4 address
            • ipv6 - any sequence of base-16 numbers [0-9a-fA-F] and colons : that looks like an IPv6 address
              • Supports the rules for omitting leading zeros in each group
              • Supports ::
              • Produces false positives on addresses with incorrect colon usage, such as 2001:0db8::ff00::8329
      • Any text that looks like a URI from a selected list of other protocols, such as skype:, magnet:, and mailto:
        • Capture Groups:
          • protouri - the whole match of protocol URI, such as magnet:?xt=urn:btih:c12fe1c06bba254a9dc9f519b335aa7c1367a88a
            • protoscheme - the scheme, such as magnet
            • protourn - the urn or other data of the URI, such as ?xt=urn:btih:c12fe1c06bba254a9dc9f519b335aa7c1367a88a
      • All capture types will also include the following capture groups:
        • path - the path, query, and fragment components, if present
      Parameters:
      str - the string being tested
      Returns:
      a Matcher that can be used to test if the string contains links
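The capture groups described above can be read from the returned Matcher by name. The following is a minimal sketch of that idea, not the real links pattern: SKETCH is a hypothetical, heavily simplified stand-in that only reuses the documented group names, covers three TLDs and IPv4, and omits workaround detection, IPv6, and the path group. The class name LinkGroupsSketch and the classify helper are also assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkGroupsSketch {
    // Hypothetical stand-in for the links pattern. Only the group names are
    // taken from the documentation; the expression itself is an assumption.
    static final Pattern SKETCH = Pattern.compile(
        "(?<weburi>(?:(?<webscheme>https?|ftp|rtsp|wss?)://)?"
        + "(?<webauthority>(?<webdomain>[\\p{L}\\p{N}-]+(?:\\.[\\p{L}\\p{N}-]+)*)"
        + "\\.(?<webtld>com|net|org)(?::(?<webport>\\d{1,5}))?))"
        + "|(?<ip>(?<ipv4>(?:\\d{1,3}\\.){3}\\d{1,3}))"
        + "|(?<protouri>(?<protoscheme>magnet|mailto|skype):(?<protourn>\\S+))",
        Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);

    // Reports which top-level group produced the first match, if any.
    static String classify(String str) {
        Matcher m = SKETCH.matcher(str);
        if (!m.find()) {
            return "none";
        }
        if (m.group("weburi") != null) {
            return "web";
        }
        if (m.group("ip") != null) {
            return "ip";
        }
        return "proto";
    }

    public static void main(String[] args) {
        String input = "see https://hello.example.com:25000 now";
        Matcher m = SKETCH.matcher(input);
        if (m.find()) {
            // A group that did not take part in the match returns null.
            System.out.println("type         = " + classify(input));
            System.out.println("weburi       = " + m.group("weburi"));
            System.out.println("webscheme    = " + m.group("webscheme"));
            System.out.println("webauthority = " + m.group("webauthority"));
            System.out.println("webdomain    = " + m.group("webdomain"));
            System.out.println("webtld       = " + m.group("webtld"));
            System.out.println("webport      = " + m.group("webport"));
            System.out.println("ip           = " + m.group("ip"));
        }
    }
}
```

With the real class, the same group lookups would presumably be applied to the Matcher returned by PatternDetector.linksMatcher(str) after a successful find().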
    • hasAnyLinks

      public static boolean hasAnyLinks(String str)
      Indicates if the input string matches the links regex
      Parameters:
      str - the string being tested
      Returns:
      true if a link is detected
    • hasWebLinks

      public static boolean hasWebLinks(String str)
      Indicates if the input string matches the links regex for the weburi capture group
      Parameters:
      str - the string being tested
      Returns:
      true if a link is detected in the weburi capture group
    • hasIpLinks

      public static boolean hasIpLinks(String str)
      Indicates if the input string matches the links regex for the ip capture group
      Parameters:
      str - the string being tested
      Returns:
      true if a link is detected in the ip capture group
    • hasProtoLinks

      public static boolean hasProtoLinks(String str)
      Indicates if the input string matches the links regex for the protouri capture group
      Parameters:
      str - the string being tested
      Returns:
      true if a link is detected in the protouri capture group
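The group-specific checks (hasWebLinks, hasIpLinks, hasProtoLinks) can be pictured as scanning every match and testing whether a given top-level group took part in it. The sketch below illustrates that reading; it is an assumption about the behaviour, not the actual implementation. SKETCH is a hypothetical two-branch pattern that only borrows the documented group names ip and protouri, and hasGroup is an illustrative helper that does not exist in PatternDetector.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupCheckSketch {
    // Hypothetical two-branch pattern; the real links pattern is far larger.
    static final Pattern SKETCH = Pattern.compile(
        "(?<ip>(?:\\d{1,3}\\.){3}\\d{1,3})"
        + "|(?<protouri>(?:magnet|mailto|skype):\\S+)",
        Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);

    // Returns true if any match in str was produced by the named group.
    // Looping matters: the first match may belong to a different group.
    static boolean hasGroup(String str, String groupName) {
        Matcher m = SKETCH.matcher(str);
        while (m.find()) {
            if (m.group(groupName) != null) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String input = "ping 10.0.0.1 then mailto:a@b.example";
        System.out.println("ip:       " + hasGroup(input, "ip"));
        System.out.println("protouri: " + hasGroup(input, "protouri"));
    }
}
```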
    • getLink

      public static String getLink(String str)
      Returns the link contained in the input string which matches the links regex

      If multiple links are present, only the first one found by the Matcher is returned

      Matches against all link types

      Parameters:
      str - the string being tested
      Returns:
      null if no links were detected; otherwise, the first link returned by the Matcher
    • getLinks

      public static List<String> getLinks(String str)
      Returns all links contained in the input string which match the links regex

      Matches against all link types

      Parameters:
      str - the string being tested
      Returns:
      a List of links returned by the Matcher
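The relationship between getLink and getLinks can be sketched with plain java.util.regex: collect the text of every match, and treat the first element (or null) as the single-link result. This is an illustration only. SKETCH is a hypothetical simplified web-link pattern, the names allLinks and firstLink are invented, and the sketch assumes each returned link is the whole matched text, which the documentation above does not state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CollectLinksSketch {
    // Hypothetical simplified pattern: optional http(s) scheme, a domain,
    // and one of three TLDs. Not the real links pattern.
    static final Pattern SKETCH = Pattern.compile(
        "(?:https?://)?[\\p{L}\\p{N}-]+(?:\\.[\\p{L}\\p{N}-]+)*\\.(?:com|net|org)",
        Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);

    // Collects the matched text of every link, in order of appearance.
    static List<String> allLinks(String str) {
        List<String> links = new ArrayList<>();
        Matcher m = SKETCH.matcher(str);
        while (m.find()) {
            links.add(m.group());
        }
        return links;
    }

    // Returns the first matched link, or null when nothing matches.
    static String firstLink(String str) {
        Matcher m = SKETCH.matcher(str);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "visit example.com or https://docs.example.org today";
        System.out.println(allLinks(input));
        System.out.println(firstLink(input));
        System.out.println(firstLink("no links here"));
    }
}
```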