uritools: URI parsing, classification and composition
This module provides RFC 3986 compliant functions for parsing,
classifying and composing URIs and URI references, largely replacing
the Python Standard Library’s urllib.parse module.
>>> from uritools import uricompose, urijoin, urisplit, uriunsplit
>>> uricompose(scheme='foo', host='example.com', port=8042,
... path='/over/there', query={'name': 'ferret'},
... fragment='nose')
'foo://example.com:8042/over/there?name=ferret#nose'
>>> parts = urisplit(_)
>>> parts.scheme
'foo'
>>> parts.authority
'example.com:8042'
>>> parts.getport(default=80)
8042
>>> parts.getquerydict().get('name')
['ferret']
>>> parts.isuri()
True
>>> parts.isabsuri()
False
>>> urijoin(uriunsplit(parts), '/right/here?name=swallow#beak')
'foo://example.com:8042/right/here?name=swallow#beak'
For various reasons, urllib.parse and its Python 2 predecessor
urlparse are not compliant with current Internet standards. As
stated in Lib/urllib/parse.py:
RFC 3986 is considered the current standard and any future changes to urlparse module should conform with it. The urlparse module is currently not entirely compliant with this RFC due to defacto scenarios for parsing, and for backward compatibility purposes, some parsing quirks from older RFCs are retained.
This module aims to provide fully RFC 3986 compliant replacements for
the most commonly used functions found in urllib.parse. It
also includes functions for distinguishing between the different forms
of URIs and URI references, and for conveniently creating URIs from
their individual components.
URI Classification
According to RFC 3986, a URI reference is either a URI or a relative reference. If the URI reference’s prefix does not match the syntax of a scheme followed by its colon separator, then the URI reference is a relative reference.
A relative reference that begins with two slash characters is termed a network-path reference. A relative reference that begins with a single slash character is termed an absolute-path reference. A relative reference that does not begin with a slash character is termed a relative-path reference.
When a URI reference refers to a URI that is, aside from its fragment component, identical to the base URI, that reference is called a same-document reference. Examples of same-document references are relative references that are empty or include only the number sign (“#”) separator followed by a fragment identifier.
A URI without a fragment identifier is termed an absolute URI. A base URI, for example, must be an absolute URI. If the base URI is obtained from a URI reference, then that reference must be stripped of any fragment component prior to its use as a base URI.
uritools.isuri(uristring)
    Return True if uristring is a URI.

uritools.isabsuri(uristring)
    Return True if uristring is an absolute URI.

uritools.isnetpath(uristring)
    Return True if uristring is a network-path reference.

uritools.isabspath(uristring)
    Return True if uristring is an absolute-path reference.

uritools.isrelpath(uristring)
    Return True if uristring is a relative-path reference.

uritools.issamedoc(uristring)
    Return True if uristring is a same-document reference.
URI Composition
uritools.uricompose(scheme=None, authority=None, path='', query=None, fragment=None, userinfo=None, host=None, port=None, querysep='&', encoding='utf-8')
    Compose a URI reference string from its individual components.

    All components may be specified as either Unicode strings, which will be encoded according to encoding, or bytes objects.

    authority may also be passed a three-item iterable specifying userinfo, host and port subcomponents. If both authority and any of the userinfo, host or port keyword arguments are given, the keyword argument will override the corresponding authority subcomponent.

    query may also be passed a mapping object or a sequence of two-element tuples, which will be converted to a string of name=value pairs separated by querysep.

    The returned URI reference is of type str.
uritools.urijoin(base, ref, strict=False)
    Convert a URI reference relative to a base URI to its target URI string.

    If strict is False, a scheme in the reference is ignored if it is identical to the base URI’s scheme.
uritools.uriunsplit(parts)
    Combine the elements of a five-item iterable into a URI reference’s string representation.
URI Decomposition
uritools.uridefrag(uristring)
    Remove an existing fragment component from a URI reference string.

    The return value is an instance of a subclass of collections.namedtuple with the following read-only attributes:

    =========  =====  ==========
    Attribute  Index  Value
    =========  =====  ==========
    uri        0      Absolute URI, or relative reference without a fragment identifier
    fragment   1      Fragment identifier, or None if no fragment was present
    =========  =====  ==========
uritools.urisplit(uristring)
    Split a well-formed URI reference string into a tuple with five components corresponding to a URI’s general structure:

        <scheme>://<authority>/<path>?<query>#<fragment>

    The return value is an instance of a subclass of collections.namedtuple with the following read-only attributes:

    =========  =====  ==========
    Attribute  Index  Value
    =========  =====  ==========
    scheme     0      URI scheme, or None if not present
    authority  1      Authority component, or None if not present
    path       2      Path component, always present but may be empty
    query      3      Query component, or None if not present
    fragment   4      Fragment identifier, or None if not present
    userinfo          Userinfo subcomponent of authority, or None if not present
    host              Host subcomponent of authority, or None if not present
    port              Port subcomponent of authority as a (possibly empty) string, or None if not present
    =========  =====  ==========
URI Encoding
uritools.uridecode(uristring, encoding='utf-8', errors='strict')
    Decode a URI string or string component.

    If encoding is set to None, return the percent-decoded uristring as a bytes object. Otherwise, replace any percent-encodings and decode uristring using the codec registered for encoding, returning a Unicode string.

uritools.uriencode(uristring, safe='', encoding='utf-8', errors='strict')
    Encode a URI string or string component.

    If uristring is a bytes object, replace any characters not in UNRESERVED or safe with their corresponding percent-encodings and return the result as a bytes object. Otherwise, encode uristring using the codec registered for encoding before percent-encoding the result.
Structured Parse Results
The result objects from the uridefrag() and urisplit()
functions are instances of subclasses of
collections.namedtuple. These objects contain the attributes
described in the function documentation, as well as some additional
convenience methods.
class uritools.DefragResult
    Class to hold uridefrag() results.

    getfragment(default=None, encoding='utf-8', errors='strict')
        Return the decoded fragment identifier, or default if the original URI did not contain a fragment component.

    geturi()
        Return the recombined version of the original URI as a string.
class uritools.SplitResult
    Base class to hold urisplit() results.

    getauthority(default=None, encoding='utf-8', errors='strict')
        Return the decoded userinfo, host and port subcomponents of the URI authority as a three-item tuple.
    getfragment(default=None, encoding='utf-8', errors='strict')
        Return the decoded fragment identifier, or default if the original URI reference did not contain a fragment component.

    gethost(default=None, errors='strict')
        Return the decoded host subcomponent of the URI authority as a string or an ipaddress address object, or default if the original URI reference did not contain a host.

    getpath(encoding='utf-8', errors='strict')
        Return the normalized decoded URI path.

    getport(default=None)
        Return the port subcomponent of the URI authority as an int, or default if the original URI reference did not contain a port or if the port was empty.

    getquery(default=None, encoding='utf-8', errors='strict')
        Return the decoded query string, or default if the original URI reference did not contain a query component.

    getquerydict(sep='&', encoding='utf-8', errors='strict')
        Split the query component into individual name=value pairs separated by sep and return a dictionary of query variables. The dictionary keys are the unique query variable names and the values are lists of values for each name.

    getquerylist(sep='&', encoding='utf-8', errors='strict')
        Split the query component into individual name=value pairs separated by sep, and return a list of (name, value) tuples.

    getscheme(default=None)
        Return the URI scheme in canonical (lowercase) form, or default if the original URI reference did not contain a scheme component.

    geturi()
        Return the recombined version of the original URI reference as a string.

    getuserinfo(default=None, encoding='utf-8', errors='strict')
        Return the decoded userinfo subcomponent of the URI authority, or default if the original URI reference did not contain a userinfo field.

    isabspath()
        Return True if this is an absolute-path reference.

    isabsuri()
        Return True if this is an absolute URI.

    isnetpath()
        Return True if this is a network-path reference.

    isrelpath()
        Return True if this is a relative-path reference.

    issamedoc()
        Return True if this is a same-document reference.

    isuri()
        Return True if this is a URI.

    transform(ref, strict=False)
        Transform a URI reference relative to self into a SplitResult representing its target URI.
Character Constants
uritools.GEN_DELIMS
    A string containing all general delimiting characters specified in RFC 3986.

uritools.RESERVED
    A string containing all reserved characters specified in RFC 3986.

uritools.SUB_DELIMS
    A string containing all subcomponent delimiting characters specified in RFC 3986.

uritools.UNRESERVED
    A string containing all unreserved characters specified in RFC 3986.