Regular expression for matching URLs

For a piece of software I wrote, I needed a security check which validates URLs.

It needed to match any URL starting with http:// or https://. The host could be an IP address or a hostname (like localhost), and it should also match port numbers (like :8080) and trailing directories (/directory/anotherdirectory).

Basically, it should validate the following URLs:

  • http://localhost/directory/directory
  • http://localhost:8080/directory/directory
  • https://123.123.123.123:9090/directory/directory
  • https://123.123.123.123:9090/
  • http://srv01.domain.tld:10005/directory/directory
  • https://srv01.domain.tld/

I found a partial solution on Stack Overflow that appears to follow RFC 1123. It offers separate patterns for IP addresses and for hostnames. Neither is quite what I want on its own, but combined they match either kind of host:

(((([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5]))|((([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9\-]*[a-zA-Z0-9])\.)*([A-Za-z0-9]|[A-Za-z0-9][A-Za-z0-9\-]*[A-Za-z0-9])))
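To see what the host pattern accepts on its own, here is a minimal sketch in Java. The class name HostPatternDemo and the method isHost are made up for illustration; only the pattern string comes from this post. It relies on Matcher.matches(), which requires the whole input to match, so no anchors are added.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; only the pattern string is taken from the post.
final class HostPatternDemo {

    // IPv4 address or RFC 1123 style hostname, split over lines for readability.
    static final String HOST_PATTERN =
        "(((([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\\.){3}"
        + "([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5]))"
        + "|((([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9\\-]*[a-zA-Z0-9])\\.)*"
        + "([A-Za-z0-9]|[A-Za-z0-9][A-Za-z0-9\\-]*[A-Za-z0-9])))";

    private static final Pattern HOST = Pattern.compile(HOST_PATTERN);

    // matches() only succeeds if the entire candidate string fits the pattern.
    static boolean isHost(String candidate) {
        return candidate != null && HOST.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        String[] candidates = {
            "localhost", "123.123.123.123", "srv01.domain.tld", "-bad-.tld", "has space"
        };
        for (String candidate : candidates) {
            System.out.println(candidate + " -> " + isHost(candidate));
        }
    }
}
```

One thing worth knowing: the hostname branch allows labels made only of digits, so a string such as 999.1.1.1 is still accepted (as a hostname, not as an IP address). If strict IP range checking matters for your security check, it would need an extra step.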

I also added some extras to match a full URL, including the http/https scheme, port numbers and trailing directories:

^(https?)://(((([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5]))|((([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9\-]*[a-zA-Z0-9])\.)*([A-Za-z0-9]|[A-Za-z0-9][A-Za-z0-9\-]*[A-Za-z0-9])))(:[1-9][0-9]+)?(/)?([/?].+)?$

I tested this one on RegexPlanet and it seems to work perfectly.
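If you would rather repeat that check locally, a small sketch like the following runs the full pattern over the example URLs from the list above plus a few inputs that should be rejected. The class name UrlRegexCheck and the method matches are hypothetical; the regex itself is unchanged.

```java
import java.util.regex.Pattern;

// Hypothetical test harness; the regex is the one from the post, the names are made up.
final class UrlRegexCheck {

    static final Pattern URL = Pattern.compile(
        "^(https?)://"
        + "(((([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\\.){3}"
        + "([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5]))"
        + "|((([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9\\-]*[a-zA-Z0-9])\\.)*"
        + "([A-Za-z0-9]|[A-Za-z0-9][A-Za-z0-9\\-]*[A-Za-z0-9])))"
        + "(:[1-9][0-9]+)?(/)?([/?].+)?$");

    // True only if the whole string is a URL the pattern accepts.
    static boolean matches(String url) {
        return URL.matcher(url).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "http://localhost/directory/directory",
            "http://localhost:8080/directory/directory",
            "https://123.123.123.123:9090/directory/directory",
            "https://123.123.123.123:9090/",
            "http://srv01.domain.tld:10005/directory/directory",
            "https://srv01.domain.tld/",
            "ftp://localhost/",       // wrong scheme
            "http://local host/",     // space in the host
            "http://localhost:8/"     // single-digit port
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + matches(sample));
        }
    }
}
```

Note that the port part, (:[1-9][0-9]+)?, requires at least two digits, so a single-digit port such as :8 is rejected. If you need those, changing the + to a * would be one possible tweak.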

This is how it looks in Java code. I split the regex into parts for better readability and maintenance (mind the extra backslashes, which are needed to escape the regex inside Java string literals):

// Regular expression is based on the RFC1123 standard: http://tools.ietf.org/html/rfc1123
// Solution found at: http://stackoverflow.com/questions/106179/regular-expression-to-match-hostname-or-ip-address
private final static String URL_PATTERN_FRONT = "^(https?)://";
private final static String URL_PATTERN_IP = "(((([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5]))";
private final static String URL_PATTERN_HOSTNAME = "|((([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9\\-]*[a-zA-Z0-9])\\.)*([A-Za-z0-9]|[A-Za-z0-9][A-Za-z0-9\\-]*[A-Za-z0-9])))";
private final static String URL_PATTERN_REAR = "(:[1-9][0-9]+)?(/)?([/?].+)?$";

public final static String URL_PATTERN = URL_PATTERN_FRONT + URL_PATTERN_IP + URL_PATTERN_HOSTNAME + URL_PATTERN_REAR;
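The constants above only build the pattern string. As a rough sketch of how they might be used, the class below compiles the pattern once and wraps it in a few helper methods. The class name UrlValidator and the methods isValidUrl, hostOf and portOf are assumptions for illustration and not part of the original software. The group numbers (2 for the host, 11 for the port) come from counting the opening parentheses in the pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper class; the four pattern constants are copied from the post.
final class UrlValidator {

    private static final String URL_PATTERN_FRONT = "^(https?)://";
    private static final String URL_PATTERN_IP = "(((([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5]))";
    private static final String URL_PATTERN_HOSTNAME = "|((([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9\\-]*[a-zA-Z0-9])\\.)*([A-Za-z0-9]|[A-Za-z0-9][A-Za-z0-9\\-]*[A-Za-z0-9])))";
    private static final String URL_PATTERN_REAR = "(:[1-9][0-9]+)?(/)?([/?].+)?$";

    static final String URL_PATTERN =
        URL_PATTERN_FRONT + URL_PATTERN_IP + URL_PATTERN_HOSTNAME + URL_PATTERN_REAR;

    // Compile once and reuse; Pattern instances are immutable and thread-safe.
    private static final Pattern COMPILED = Pattern.compile(URL_PATTERN);

    static boolean isValidUrl(String url) {
        return url != null && COMPILED.matcher(url).matches();
    }

    // Host part (capture group 2), or null if the URL does not match.
    static String hostOf(String url) {
        if (url == null) {
            return null;
        }
        Matcher m = COMPILED.matcher(url);
        return m.matches() ? m.group(2) : null;
    }

    // Port digits without the colon (from capture group 11), or null if absent.
    static String portOf(String url) {
        if (url == null) {
            return null;
        }
        Matcher m = COMPILED.matcher(url);
        if (!m.matches() || m.group(11) == null) {
            return null;
        }
        return m.group(11).substring(1);
    }

    public static void main(String[] args) {
        String url = "http://srv01.domain.tld:10005/directory/directory";
        System.out.println(isValidUrl(url) + " " + hostOf(url) + " " + portOf(url));
    }
}
```

A Matcher is created per call because, unlike Pattern, Matcher instances are not safe to share between threads.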

Hopefully, this will work for you, too. 🙂

3 comments

    • himmat on 2013/02/05 at 12:43

    Hi, this tutorial really worked for me, thank you.

    • himmat on 2013/02/05 at 13:34

    Really useful, this tutorial worked for me, thank you.

    1. Glad it worked for you too. Thanks for your comment! 🙂
