Not all characters are valid in a URI. RFC 3986 (Uniform Resource Identifier (URI): Generic Syntax), published by the Internet Engineering Task Force, lets us define three groups: valid characters that can be used "as is"; reserved characters that can also be used "as is" but have a special meaning; and all other characters, which must be "URL Encoded" (percent-encoded) into a numerical format so that they are valid and do not conflict with the characters that do have a special meaning.

Valid characters

The following characters can be used directly within a URL and do not require any special encoding:

Characters Explanation
A to Z Uppercase alphabetical
a to z Lowercase alphabetical
0 to 9 Numerical
- Hyphen
_ Underscore
. Dot / full stop
~ Tilde


Reserved characters

These characters can also be used "as is", but they have a special meaning in a URI. If they are used outside of their reserved meaning, they must be URL Encoded. Failure to do so may have unforeseen consequences when the URL is processed. Notably, problems can occur when you send a URI as a parameter for a redirection inside another URL: if these characters are not URL Encoded, the server may be unable to determine where a URL ends and where a parameter starts, or how many parameters a received URL really contains.

Characters Meaning
: Scheme (protocol) separator, username/password separator when credentials are specified in the URL, and host/port separator
@ Credential and host separator
/ Directory separator for resource or folder paths
? Query string separator
& Separator for key-value pairs if more than one key-value pair is present in the query string
= Assigns a value to a key in the query string
# Start of the URL anchor (fragment), telling a browser to jump to that anchor in an HTML page, if present in the source code
% Indicates a "percent-encoded" (URL encoded) character; it is followed by two hexadecimal digits giving the code of a character that could otherwise corrupt the meaning of a URI
+ Space, in query strings that use HTML form encoding
[ ] Enclose an IPv6 address literal in the host part
! $ ' ( ) * , ; Sub-delimiters with no fixed generic meaning; a URI scheme or an application may assign one


Any character that is not in the valid character list, or that is not being used in its reserved meaning as described in the reserved character list, should be URL encoded.
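The rule above can be sketched in Python with the standard library's urllib.parse.quote. The helper name encode_component is hypothetical and only used for illustration; passing safe="" asks quote to leave only the valid (unreserved) characters untouched and to percent-encode everything else, using UTF-8 for non-ASCII characters.

```python
from urllib.parse import quote


def encode_component(value: str) -> str:
    """Percent-encode every character except A-Z, a-z, 0-9, '-', '_', '.', '~'.

    Hypothetical helper: it treats the value as a single URI component,
    so reserved characters are always encoded.
    """
    return quote(value, safe="")


# Valid characters pass through unchanged.
print(encode_component("AZaz09-_.~"))

# Reserved characters, spaces and non-ASCII characters are percent-encoded.
print(encode_component("a b&c=d/é"))
```

This is only appropriate for a single component such as one query string value. Applying it to a whole URL would also encode the separators that are being used in their reserved meaning.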

Encoding reserved and invalid characters

To ensure that you are not using invalid or reserved characters in your links or variables, you can check URL encoding with an external tool such as https://www.urlencoder.org/, or you can use the URL encoding features of your programming language.

Reserved characters have a specific meaning and, if misused, can break a URI. If you have to use one of these reserved characters for a purpose other than the one intended by the RFC document, you must URL Encode it.

For example, suppose you have a variable called email in your query string, equal to user@example.com. The @ symbol is a reserved character, so you need to encode it. Otherwise, the web server or your browser could interpret it as the separator between a username:password combination and the host to which it is to be sent.

Encoding an email address in a URI

If we use the site noted above, you will note that the @ character is encoded to %40. The % reserved character indicates to the server that the two hexadecimal digits that follow are the numerical code of the character that you actually want to use (40 is the hexadecimal code of @). For example, this URL is totally valid:

http://username:password@server.example.com?email=user%40example.com

As you can see in the example above, the first @ symbol is correctly used as the separator between the username:password section and the host server.example.com. In the query string, the at sign in the email variable has been encoded to %40, which removes any ambiguity about where the username:password@site part stops and where the actual query string starts. Without the encoding, the server might not be able to tell which part of the URI is the password and which part is the URI proper.
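As a minimal sketch of the email example, assuming Python's standard urllib.parse module stands in for "the URL encoding features of your programming language", the value can be encoded before it is appended to the query string and then recovered on the receiving side. The variable names are illustrative only.

```python
from urllib.parse import parse_qs, quote, urlsplit

email = "user@example.com"

# Encode only the value, not the whole URL, so the credential separator stays intact.
url = "http://username:password@server.example.com?email=" + quote(email, safe="")
print(url)

# Splitting the URL shows that only the first @ is treated as a separator.
parts = urlsplit(url)
print(parts.username, parts.password, parts.hostname)

# parse_qs percent-decodes the value, giving back the original address.
decoded_email = parse_qs(parts.query)["email"][0]
print(decoded_email)
```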

Encoding the % character itself

You may need to use the % character for something other than introducing the numerical value of a character, for example in the following URL:

http://example.com/crypto_currency_falls_40%_in_one_day

Here the % sign is part of the URI text and is not used as a control character, so it must itself be encoded. Its code is %25, so the URL needs to look like this:

http://example.com/crypto_currency_falls_40%25_in_one_day

The server will understand that %25 will need to be replaced by "%" and that it is not being used in its reserved meaning, but just as a plain character.
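The same round trip can be sketched in Python with urllib.parse (an assumption; any language's URL encoding functions should behave similarly). It also shows a common pitfall: encoding a value that is already encoded turns %25 into %2525.

```python
from urllib.parse import quote, unquote

segment = "crypto_currency_falls_40%_in_one_day"

# The literal % becomes %25; the underscores are valid characters and are kept.
encoded = quote(segment, safe="")
print("http://example.com/" + encoded)

# Decoding restores the plain % character.
decoded = unquote(encoded)
print(decoded)

# Encoding twice encodes the % of %25 again, which is usually a bug.
double_encoded = quote(encoded, safe="")
print(double_encoded)
```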

Invalid URI with variables

URL encoding is also important when setting redirect links in a URL, where you may need to pass a complete URI, with its own variables, as the value of a variable. Encoding prevents the initial page from interpreting variables that are not meant for it.

http://www.example.com/login?firstlogin=true&redirect=http://www.example.com/error?secondlogin=false&date=2022-01-01

This does not work as intended. There are two query string delimiters (?) present in clear text where only one is expected, and the & before date is read as a separator in the outer query string. As a result, the login page could attempt to process the variables secondlogin and date, which should only be seen by the error page set in the redirect variable, and not by the login page.

To ensure that the variables of the inner URL do not collide with those of the outer URL, the value of the redirect parameter should be URL encoded.
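The collision can be observed with a query string parser. The sketch below assumes Python's urllib.parse.parse_qs as a stand-in for whatever parser the login page actually uses; other parsers may differ in detail, but they typically split on & in the same way.

```python
from urllib.parse import parse_qs, urlsplit

url = (
    "http://www.example.com/login?firstlogin=true"
    "&redirect=http://www.example.com/error?secondlogin=false&date=2022-01-01"
)

# Everything after the first ? is the query string of the login page.
params = parse_qs(urlsplit(url).query)

# The redirect value is cut short at the unencoded &.
print(params["redirect"])

# date is now a parameter of the login page, not of the error page.
print(params["date"])
```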

Valid URL - encode the value of the redirect variable

If we URL Encode the value of the redirect parameter of the previous example, we end up with a URI that looks like this:

http://www.example.com/login?firstlogin=true&redirect=http%3A%2F%2Fwww.example.com%2Ferror%3Fsecondlogin%3Dfalse%26date%3D2022-01-01

After encoding, the reserved characters : / ? = and & that are present in the value are replaced by their corresponding percent-encoded values, and can no longer be mistaken for part of the initial URI.

The server that processes this key will of course need to URL decode the value in order to handle it properly and obtain the expected value.
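Both sides of this exchange can be sketched in Python, assuming urllib.parse.urlencode on the sending side and urllib.parse.parse_qs on the receiving side. The variable names are illustrative and not part of any particular framework.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

redirect_target = "http://www.example.com/error?secondlogin=false&date=2022-01-01"

# Sending side: urlencode percent-encodes each value before joining the pairs with &.
query = urlencode({"firstlogin": "true", "redirect": redirect_target})
url = "http://www.example.com/login?" + query
print(url)

# Receiving side: parse_qs splits on the remaining & and decodes each value.
received = parse_qs(urlsplit(url).query)
print(received["redirect"][0])
```

Building the query string from key-value pairs, rather than concatenating strings by hand, means each value is encoded exactly once.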

Knowledge Base Reference ID: 202202271326