Class URLUtils

java.lang.Object
uk.ac.starlink.util.URLUtils

public class URLUtils extends Object
Provides convenience methods for resolving URLs. This class provides some static methods for turning strings into URLs. This tends to be a bit of a pain in Java, since you have to watch out for MalformedURLExceptions all over and work out what the context is. The methods provided here assume that if a string looks like a URL it is one, if it doesn't it's a file name, and if it's not absolute or resolved against a given context it is relative to the current directory. From the point of view of a user providing text to an application, or an XML document providing an href, this is nearly always what is wanted.

The strategy can lead to surprising situations in the case that wacky URL protocols are used; for instance if makeURL is called on the string "gftp://host/file" and no gftp handler is installed, it will be interpreted as a file-protocol URL referring to the (presumably non-existent) file "gftp://host/file". In this case the user will presumably end up seeing a file-not-found type error rather than a MalformedURLException type error. Users of this class should be of the opinion that this is not a particularly bad thing.

The systemId strings used by Sources have similar semantics to the strings which this class converts to URLs or contexts.

This class assumes that the "file:" protocol is legal for URLs, and will throw AssertionErrors if this turns out not to be the case.

Author:
Mark Taylor (Starlink), Norman Gray (Starlink)
  • Method Details

    • newURL

      public static URL newURL(String spec) throws MalformedURLException
      Drop-in replacement for the deprecated URL(String) constructor.

      All URL constructors are deprecated since Java 20 because of issues with parsing and validation. This utility method provides a way for code to avoid deprecation warnings. It may not do much to solve the underlying problems, and might introduce some new ones, but code that is having problems here can be adapted to handle URL creation more carefully; such approaches, according to the JDK documentation, should generally be URI-based. Other utility methods may be added here in future as required.

      As far as I can tell, most of the difficulties arising with URL parsing that have led to the deprecation relate to relatively strange URLs, so that "normal" http/https/file-protocol URL strings passed to this method should behave the same as if passed to the deprecated constructor. However, there may be changes of behaviour when it comes to constructions like embedded spaces in paths or special characters in query parts etc.

      Note that passing a string to this method which is not a valid URI, for instance because it contains unescaped illegal characters like "[", will fail, unlike the call to new URL(). In such cases a MalformedURLException will be thrown (which is really the result of a URISyntaxException).

      Parameters:
      spec - textual representation of URL
      Returns:
      URL, not null
      Throws:
      MalformedURLException - in case of syntax error
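
      The URI-based approach described above can be sketched as follows. This is a minimal illustration using only the JDK, not the actual implementation of newURL; the class name NewUrlSketch and the helper isParsable are hypothetical names introduced for the example.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

/** Hypothetical sketch; not the library's actual code. */
final class NewUrlSketch {

    /** Creates a URL by way of URI parsing, avoiding the deprecated URL constructors. */
    static URL newUrl(String spec) throws MalformedURLException {
        try {
            return new URI(spec).toURL();
        } catch (URISyntaxException | IllegalArgumentException e) {
            // URISyntaxException: spec is not a legal URI (for instance an unescaped "[").
            // IllegalArgumentException: the URI has no scheme, so it is not absolute.
            MalformedURLException mue = new MalformedURLException(e.getMessage());
            mue.initCause(e);
            throw mue;
        }
    }

    /** Hypothetical helper for callers that only want a yes/no answer. */
    static boolean isParsable(String spec) {
        try {
            newUrl(spec);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isParsable("http://example.com/data.fits"));
        System.out.println(isParsable("http://example.com/a[1].fits"));
    }
}
```

      In this sketch a string with an unescaped "[" in the path is rejected with a MalformedURLException, which is the kind of behaviour difference from new URL() noted above.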
    • makeURL

      public static URL makeURL(String location)
      Obtains a URL from a string. If the string has the form of a URL, it is turned directly into a URL. If it does not, it is treated as a filename, and turned into a file-protocol URL. In the latter case a relative or absolute filename may be used. If it is null or a blank string (or something else equally un-filename-like?) then null is returned.
      Parameters:
      location - a string representing the location of a resource
      Returns:
      a URL representing the location of the resource
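
      The "URL if it looks like one, otherwise a file name" strategy might look roughly like this. It is a sketch under assumptions, written against the plain JDK; the class MakeUrlSketch is hypothetical and the real makeURL may differ in detail.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

/** Hypothetical sketch; not the library's actual code. */
final class MakeUrlSketch {

    static URL makeUrl(String location) {
        if (location == null || location.trim().isEmpty()) {
            return null;
        }
        try {
            // Succeeds only for an absolute URI whose protocol has an installed handler.
            return new URI(location).toURL();
        } catch (URISyntaxException | IllegalArgumentException | MalformedURLException e) {
            // Not usable as a URL: fall through and treat the string as a file name.
        }
        try {
            // File.toURI() resolves relative names against the current directory.
            return new File(location).toURI().toURL();
        } catch (MalformedURLException e) {
            throw new AssertionError("file: protocol should always be legal", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(makeUrl("http://example.com/x"));
        System.out.println(makeUrl("data.txt"));
        System.out.println(makeUrl("gftp://host/file"));
    }
}
```

      With no gftp handler installed, the last call in main falls back to a file-protocol URL, as described in the class comment.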
    • makeURL

      public static URL makeURL(String context, String location)
      Obtains a URL from a string in a given context. The string context is turned into a URL as per the makeURL(String) method, unless it is null or the empty string, in which case it is treated as a reference to the current directory. The string location is then turned into a URL in the same way as using makeURL(String), except that if it represents a relative path it is resolved in the context of context, taking its protocol and/or relative position from it.
      Parameters:
      context - a string representing the context within which location is to be resolved
      location - a string representing the location of a resource
      Returns:
      a URL representing the location of the resource
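
      Resolution against a context can be illustrated with java.net.URI. The sketch below is a simplification: it assumes the context, when given, is already a legal URI string (the real method also accepts file names), and it returns a URI rather than a URL. The class ContextResolveSketch is a hypothetical name.

```java
import java.net.URI;
import java.nio.file.Paths;

/** Hypothetical sketch; not the library's actual code. */
final class ContextResolveSketch {

    static URI resolve(String context, String location) {
        // A null or empty context stands for the current directory.
        URI base = (context == null || context.isEmpty())
                 ? Paths.get("").toAbsolutePath().toUri()
                 : URI.create(context);
        // A relative location takes its protocol and position from the base;
        // an absolute location is returned unchanged.
        return base.resolve(location);
    }

    public static void main(String[] args) {
        System.out.println(resolve("http://host/dir/index.html", "data/file.fits"));
        System.out.println(resolve("http://host/dir/index.html", "https://other/x"));
        System.out.println(resolve("", "x.txt"));
    }
}
```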
    • urlToUri

      public static URI urlToUri(URL url) throws MalformedURLException
      Turns a URL into a URI.

      Since URIs are syntactically and semantically a superset of URLs, this conversion should not cause any errors. If, however, the input URL is malformed in rather extreme ways, then the URI construction will fail. These ways include (but are not necessarily limited to) the features discussed in URI(String,String,String,String,String), namely that a scheme is present, but with a relative path, or that it has a registry-based authority part.

      Because of the way the class does the conversion, the method will itself resolve some malformations of URLs. You should not rely on this, however, firstly because the method might in principle change, but mostly because you should avoid creating such malformed URLs in the first place.

      The most common source of malformed URLs is that of file URLs which have inadequately escaped (Windows) drive letters or spaces in the name: such URLs should be constructed using the File.toURI() or File.toURL() methods. Such URLs will be escaped by this method.

      Parameters:
      url - a URL to be converted. If this is null, then the method returns null
      Returns:
      the input URL as a URI, or null if the input was null
      Throws:
      MalformedURLException - if the URI cannot be constructed because the input URL turns out to be malformed
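
      One way such a conversion could escape illegal characters is to rebuild the URI from its components using the five-argument URI constructor mentioned above. The following is a sketch under that assumption, not the library's code; UrlToUriSketch and its demo helper are hypothetical. Note that this constructor always quotes "%", so an already escaped URL would be escaped a second time.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

/** Hypothetical sketch; not the library's actual code. */
final class UrlToUriSketch {

    static URI urlToUri(URL url) throws MalformedURLException {
        if (url == null) {
            return null;
        }
        String authority = url.getAuthority();
        if (authority != null && authority.isEmpty()) {
            authority = null;
        }
        try {
            // The multi-argument constructor quotes illegal characters such as spaces.
            // It rejects a scheme combined with a non-empty path that lacks a leading "/".
            return new URI(url.getProtocol(), authority, url.getPath(),
                           url.getQuery(), url.getRef());
        } catch (URISyntaxException e) {
            MalformedURLException mue = new MalformedURLException(e.getMessage());
            mue.initCause(e);
            throw mue;
        }
    }

    /** Hypothetical helper: converts a URL string and reports the result as text. */
    @SuppressWarnings("deprecation")
    static String demo(String spec) {
        try {
            // The deprecated constructor is used here only to obtain an unescaped URL.
            return String.valueOf(urlToUri(new URL(spec)));
        } catch (MalformedURLException e) {
            return "malformed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo("file:/tmp/my file.txt"));
    }
}
```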
    • makeFileURL

      public static URL makeFileURL(File file)
      Constructs a legal URL for a given File. Unlike Java, this gives you a URL which conforms to RFC1738 and looks like "file://localhost/abs-path" rather than "file:abs-or-rel-path".
      Parameters:
      file - file
      Returns:
      URL
      See Also:
      • "RFC 1738"
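
      A URL of the "file://localhost/abs-path" form can be built from a File roughly as follows. This is an illustrative sketch only; FileUrlSketch is a hypothetical name, and whether the real method escapes the path in the same way is an assumption.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

/** Hypothetical sketch; not the library's actual code. */
final class FileUrlSketch {

    static URL makeFileUrl(File file) {
        // File.toURI() on an absolute file yields an absolute, escaped path.
        String rawPath = file.getAbsoluteFile().toURI().getRawPath();
        try {
            return URI.create("file://localhost" + rawPath).toURL();
        } catch (MalformedURLException e) {
            // Mirrors the class-level assumption that "file:" is always a legal protocol.
            throw new AssertionError("file: protocol should always be legal", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(makeFileUrl(new File("data.txt")));
    }
}
```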
    • fixURL

      public static URL fixURL(URL url)
      Fixes file: URLs which don't have enough slashes in them. Java generates invalid URLs of the form "file:abs-or-rel-path" when it should generate "file://localhost/abs-path".
      Parameters:
      url - input URL
      Returns:
      fixed URL
      See Also:
      • "RFC 1738"
    • sameResource

      public static boolean sameResource(URL url1, URL url2)
      Attempts to determine whether two URLs refer to the same resource. Not likely to be foolproof, but slightly smarter than using equals.
      Parameters:
      url1 - first URL
      url2 - second URL
      Returns:
      true if url1 and url2 appear to refer to the same resource
    • urlToFile

      public static File urlToFile(String url)
      Locates the local file, if any, represented by a URL. If the URL string uses the "file:" protocol, and has no query or anchor parts, the filename will be extracted and the corresponding file returned. Otherwise, null is returned.
      Parameters:
      url - URL string
      Returns:
      local file referenced by url, or null
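
      The rule described here (file protocol, no query, no anchor) can be sketched with java.net.URI. The class UrlToFileSketch is hypothetical, and the handling of edge cases such as unparsable strings is an assumption.

```java
import java.io.File;
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical sketch; not the library's actual code. */
final class UrlToFileSketch {

    static File urlToFile(String url) {
        if (url == null) {
            return null;
        }
        try {
            URI uri = new URI(url);
            boolean isPlainFile = "file".equalsIgnoreCase(uri.getScheme())
                               && uri.getRawQuery() == null
                               && uri.getRawFragment() == null
                               && uri.getPath() != null;
            // getPath() returns the decoded path, so "%20" becomes a space.
            return isPlainFile ? new File(uri.getPath()) : null;
        } catch (URISyntaxException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(urlToFile("file:///tmp/x.txt"));
        System.out.println(urlToFile("http://example.com/x.txt"));
    }
}
```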
    • urlEquals

      public static boolean urlEquals(URL url1, URL url2)
      Compares two URLs. This does approximately the same job as the URL.equals() method, but it avoids the possible network accesses associated with that implementation, and copes with null values.
      Parameters:
      url1 - first URL
      url2 - second URL
      Returns:
      true iff both are the same, or both are null
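
      A null-safe comparison that never touches the network can be sketched by comparing the textual forms of the URLs. This is an assumption about one possible approach, not the library's code; it is stricter than URL.equals() in some respects (for example, it distinguishes host names that differ only in case). UrlEqualsSketch and parse are hypothetical names.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

/** Hypothetical sketch; not the library's actual code. */
final class UrlEqualsSketch {

    static boolean urlEquals(URL url1, URL url2) {
        if (url1 == null || url2 == null) {
            return url1 == url2;   // true only if both are null
        }
        // Purely textual comparison: no host name lookup is performed.
        return url1.toExternalForm().equals(url2.toExternalForm());
    }

    /** Hypothetical helper that builds a URL without a checked exception. */
    static URL parse(String spec) {
        try {
            return URI.create(spec).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(urlEquals(parse("http://example.com/a"), parse("http://example.com/a")));
        System.out.println(urlEquals(parse("http://example.com/a"), null));
    }
}
```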
    • urlEncode

      public static String urlEncode(String txt)
      URL-encodes a string using UTF-8, without pesky exceptions.
      Parameters:
      txt - string to encode
      Returns:
      encoded string
    • urlDecode

      public static String urlDecode(String txt)
      URL-decodes a string using UTF-8, without pesky exceptions.
      Parameters:
      txt - string to decode
      Returns:
      decoded string
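
      The "pesky exceptions" avoided by urlEncode and urlDecode are presumably the UnsupportedEncodingExceptions declared by the JDK's string-named overloads. A minimal sketch of such wrappers, with the hypothetical class name UrlCodecSketch, might be:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

/** Hypothetical sketch; not the library's actual code. */
final class UrlCodecSketch {

    static String urlEncode(String txt) {
        try {
            return URLEncoder.encode(txt, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new AssertionError("UTF-8 must be supported", e);
        }
    }

    static String urlDecode(String txt) {
        try {
            return URLDecoder.decode(txt, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new AssertionError("UTF-8 must be supported", e);
        }
    }

    public static void main(String[] args) {
        String encoded = urlEncode("a b&c");
        System.out.println(encoded);
        System.out.println(urlDecode(encoded));
    }
}
```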
    • followRedirects

      public static URLConnection followRedirects(URLConnection conn, int[] redirCodes) throws IOException
      Takes a URLConnection and repeatedly follows 3xx redirects until a non-redirect status is achieved. Infinite loops are defended against. The Accept-Encoding header, if present, is propagated to redirect targets.

      Note that the HttpURLConnection.setInstanceFollowRedirects(boolean) method does something like this, but it refuses to redirect between different URL protocols, for security reasons (see http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4620571). Considering similar arguments, this method will redirect HTTP->HTTPS, but not vice versa.

      Parameters:
      conn - initial URL connection
      redirCodes - list of HTTP codes for which redirects should be followed; if null all suitable 3xx redirections will be followed (301, 302, 303, 307, 308)
      Returns:
      target URL connection (if no redirects, the same as conn)
      Throws:
      IOException
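
      The decisions made at each redirect step can be sketched as two small predicates. This covers only the decision logic, not the connection loop or the loop detection, and it is not the library's code. RedirectRulesSketch and its method names are hypothetical, and rejecting every cross-protocol redirect other than HTTP->HTTPS is an assumption.

```java
import java.util.Locale;

/** Hypothetical sketch; not the library's actual code. */
final class RedirectRulesSketch {

    /** Codes followed when the caller passes null, as listed for redirCodes. */
    private static final int[] DEFAULT_CODES = {301, 302, 303, 307, 308};

    static boolean isRedirectCode(int code, int[] redirCodes) {
        int[] codes = (redirCodes == null) ? DEFAULT_CODES : redirCodes;
        for (int c : codes) {
            if (c == code) {
                return true;
            }
        }
        return false;
    }

    /** Allows same-protocol redirects and HTTP->HTTPS upgrades, but not downgrades. */
    static boolean isProtocolChangeAllowed(String from, String to) {
        String f = from.toLowerCase(Locale.ROOT);
        String t = to.toLowerCase(Locale.ROOT);
        return f.equals(t) || (f.equals("http") && t.equals("https"));
    }

    public static void main(String[] args) {
        System.out.println(isRedirectCode(302, null));
        System.out.println(isProtocolChangeAllowed("https", "http"));
    }
}
```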
    • resolveLocation

      public static URL resolveLocation(URL url0, String location) throws IOException
      Returns the URL corresponding to a supplied location string, resolved in the context of a base URL.
      Parameters:
      url0 - context URL
      location - location, may be relative or absolute URI
      Returns:
      resolved location URL
      Throws:
      IOException - if anything goes wrong
    • installCustomHandlers

      public static void installCustomHandlers()
      Attempts to install additional URL protocol handlers suitable for astronomy applications. Currently installs handlers which can supply MySpace connections using either "ivo:" or "myspace:" protocols.