Resolved: Doubt regarding the usage of brackets in re
Why are we using (\w+)\ here? And the brackets in jpg|png|gif too?
Thank you for your question and congratulations on advancing in the Python course!
Let's break the problem down into several steps. The first part of the raw string is (\w+). It matches one or more word characters (letters, digits, and the underscore) that come directly before a dot, which is matched by \. in the pattern. The parentheses make this part a capturing group, so the file name can be retrieved separately from the extension. We also have (jpg|png|gif), which matches files with the extensions .jpg, .png, or .gif. The | sign is read as 'or'.
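As a minimal sketch of the pattern described above: the input string below is hypothetical (the string from the course exercise is not shown in this thread), and re.findall is only one of several ways to run the search.

```python
import re

# Hypothetical input; the string used in the course exercise may differ.
files = "image01.jpg, image02.png, notes.txt, photo.gif"

# (\w+)         captures the file name (one or more word characters)
# \.            matches a literal dot
# (jpg|png|gif) captures one of the three extensions
pattern = r"(\w+)\.(jpg|png|gif)"

# With two capturing groups, findall returns (name, extension) tuples.
matches = re.findall(pattern, files)
print(matches)
# [('image01', 'jpg'), ('image02', 'png'), ('photo', 'gif')]
```

Note that notes.txt is skipped, because txt is not one of the three alternatives.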
The reason we need the parentheses around jpg|png|gif is that without them the meaning of the pattern changes entirely, because | then applies to everything on either side of it. Examine the code and the output below, where I have removed the parentheses. What we ask the computer to find are strings that match any one of these three conditions: (\w+)\.jpg, png, or gif.
That is why, in the output below, we do not get a dot and the name of the file before the extensions png and gif.
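The effect of dropping the parentheses can be sketched like this. The input string is an assumption, and re.finditer with group(0) is used here only so that the full text of each match is visible.

```python
import re

# Hypothetical input; the string used in the course exercise may differ.
files = "image01.jpg, image02.png, image03.gif"

# Without parentheses, | splits the WHOLE pattern into three alternatives:
#   (\w+)\.jpg   or   png   or   gif
pattern = r"(\w+)\.jpg|png|gif"

# group(0) is the full text of each match.
full_matches = [m.group(0) for m in re.finditer(pattern, files)]
print(full_matches)
# ['image01.jpg', 'png', 'gif']
```

Only the .jpg file keeps its name and dot; for the other two files just the bare extension is matched.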
One more modification you could make to understand the code better is to remove the + sign in (\w+). Doing so would result in the following output:
Notice how, this time, only the last alphanumeric character of image02 is displayed. The + sign is what captured all the characters in the name.
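A small sketch of the same experiment, again with an assumed input string: without the +, \w matches exactly one character, so only the character directly before the dot is captured.

```python
import re

# Hypothetical input; the string used in the course exercise may differ.
files = "image01.jpg, image02.png, image03.gif"

# (\w) without + captures a single word character before the dot.
pattern = r"(\w)\.(jpg|png|gif)"

single_char_matches = re.findall(pattern, files)
print(single_char_matches)
# [('1', 'jpg'), ('2', 'png'), ('3', 'gif')]
```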
I hope this helps you understand the problem. My advice is to play around with the code: modify it, study the output, and try various things. That is a reliable way to gain a deeper understanding of the topic!
Thank you and keep up the good work!
Hey again Preetpal,
Thank you, I am happy that you found the answer helpful!
The reason we put a \ sign is either to create a special sequence or to escape a special character. When we type \w in our raw string, we create a special sequence that matches word characters: letters, digits, and the underscore. (For Python 3 string patterns this also includes non-ASCII letters and digits unless the re.ASCII flag is set.) When we type \. in our raw string, we escape the special character '.', which otherwise has a special meaning. By escaping the dot, we tell Python to search for a literal dot in our string.
Let me demonstrate this with an example. Here is our original code (I have removed the parentheses around \w+ for clarity):
As expected, we obtain the names of all files with the extensions .jpg, .png, or .gif. Note that I have added a file at the end called w+.jpg, and that this file is not among the output files! This is because + is not an alphanumeric character, so \w+ cannot match it. Try removing the + from the name and convince yourself that w.jpg will be among the outputs.
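The experiment can be sketched as follows. The file lists and the helper full_matches are hypothetical, introduced only for illustration; the original code from this thread is not reproduced here.

```python
import re

# Pattern from the discussion, without parentheses around \w+.
pattern = r"\w+\.(jpg|png|gif)"

def full_matches(text):
    """Return the full text of every match (hypothetical helper)."""
    return [m.group(0) for m in re.finditer(pattern, text)]

# 'w+.jpg' is skipped: after 'w' the pattern needs a dot but finds '+'.
with_plus = full_matches("image01.jpg, image02.png, w+.jpg")
print(with_plus)     # ['image01.jpg', 'image02.png']

# Once the '+' is removed from the file name, 'w.jpg' matches.
without_plus = full_matches("image01.jpg, image02.png, w.jpg")
print(without_plus)  # ['image01.jpg', 'image02.png', 'w.jpg']
```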
The question I would ask now is: how should we modify our raw string so that it outputs this last file, w+.jpg? Well, first we need to remove the \ character in front of w, since we want to match the letter w exactly. After that, we need to escape the + sign, which without the \ has a special meaning: it repeats the preceding element one or more times, which is what let us match the entire name of the file. Escaping the + sign means that we want to match a literal + in our string. The new code then becomes:
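One possible version of the modified pattern, as a sketch under the same assumptions as before (hypothetical file list, full matches collected with re.finditer):

```python
import re

# Hypothetical input; the string used in the course exercise may differ.
files = "image01.jpg, image02.png, w+.jpg"

# w   matches the literal letter w (no backslash, so no special sequence)
# \+  matches a literal plus sign
# \.  matches a literal dot
pattern = r"w\+\.(jpg|png|gif)"

literal_matches = [m.group(0) for m in re.finditer(pattern, files)]
print(literal_matches)
# ['w+.jpg']
```

The other files no longer match, because the pattern now requires the exact text w+ before the dot.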
Lastly, what about the dot? According to the Python documentation, in the default mode, the dot matches any character except a newline. Hmm, okay, let's see what happens if we replace our raw string entirely with a single dot:
The output continues as I scroll down, because every character is matched. So we see that we need to escape the dot character in order to match it exactly in our string.
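The difference between the two forms of the dot can be seen on a short, made-up string that contains a newline:

```python
import re

# Short hypothetical input containing a newline.
text = "w+.jpg\nab"

# An unescaped dot matches every character except the newline.
any_char = re.findall(r".", text)
print(any_char)
# ['w', '+', '.', 'j', 'p', 'g', 'a', 'b']

# An escaped dot matches only the literal '.' character.
literal_dot = re.findall(r"\.", text)
print(literal_dot)
# ['.']
```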
I know this can be a little overwhelming, but it is indeed quite an interesting topic. Here is the official Python documentation describing the regular expression syntax and what each element does. Go ahead and explore what the plus and the dot characters do, what the escape symbol does, and what the special sequences are.