Last answered:

17 Feb 2023

Posted on:

30 Oct 2021

1

Resolved: Doubt regarding the usage of brackets in re

Why are we using (\w+)\ here? And why the brackets in jpg|png|gif too?

5 answers (1 marked as helpful)
Instructor
Posted on:

01 Nov 2021

1

Hey Preetpal,

Thank you for your question and congratulations on advancing in the Python course!

Let's break the problem down into several steps. The first part of the raw string is (\w+). It matches one or more word characters (letters, digits, and the underscore). The \. that follows requires a literal dot right after them, so this part captures the file name that comes before the dot. We also have (jpg|png|gif), which matches any one of the extensions jpg, png, or gif. The | sign is read as 'or'.
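Since the course code is not reproduced in this thread, here is a minimal sketch of the pattern at work. The string `files` is a made-up example, not the one from the lecture.

```python
import re

# Hypothetical sample data; the string used in the course is not shown here.
files = "image01.jpg notes.txt image02.png logo.gif"

pattern = r"(\w+)\.(jpg|png|gif)"

# group(0) is the whole match; group(1) and group(2) are the two
# parenthesized parts: the name before the dot and the extension.
matches = [(m.group(0), m.group(1), m.group(2)) for m in re.finditer(pattern, files)]
print(matches)
# [('image01.jpg', 'image01', 'jpg'), ('image02.png', 'image02', 'png'), ('logo.gif', 'logo', 'gif')]
```

Note that notes.txt is skipped because txt is not one of the three listed extensions.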

The reason why we need the parentheses around jpg|png|gif is that without them the condition changes entirely. Examine the code and the output below, where I have removed the parentheses. What we ask the computer to find are strings that match any one of these three conditions:

  1. (\w+)\.jpg
  2. png
  3. gif

That is why, in the output below, we do not get a dot and the name of the file before the extensions png and gif.
[Image: code and output with the parentheses removed]
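For readers who cannot see the screenshot, the effect can be reproduced with a short sketch. The file names are assumptions chosen for illustration.

```python
import re

# Hypothetical sample data.
files = "image01.jpg image02.png image03.gif"

grouped = r"(\w+)\.(jpg|png|gif)"   # the alternation is limited to the extension
ungrouped = r"(\w+)\.jpg|png|gif"   # the alternation now splits the whole pattern

with_parens = [m.group(0) for m in re.finditer(grouped, files)]
without_parens = [m.group(0) for m in re.finditer(ungrouped, files)]

print(with_parens)     # ['image01.jpg', 'image02.png', 'image03.gif']
print(without_parens)  # ['image01.jpg', 'png', 'gif']
```

Without the parentheses, the | operator separates three complete alternatives, so only the jpg file keeps its name and dot.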

One more modification you could make to understand the code better is to remove the + sign in (\w+). Doing so would result in the following output:
[Image: output with the + sign removed]

Notice how, this time, only the last alphanumeric character of image01 and of image02 is displayed. The + sign is what captures all the characters in the name.
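A small sketch of the same experiment, assuming a string that contains the two file names mentioned above (the extensions are my own choice):

```python
import re

# Hypothetical sample data built around the names image01 and image02.
files = "image01.jpg image02.png"

with_plus = [m.group(1) for m in re.finditer(r"(\w+)\.(jpg|png|gif)", files)]
without_plus = [m.group(1) for m in re.finditer(r"(\w)\.(jpg|png|gif)", files)]

print(with_plus)     # ['image01', 'image02']
print(without_plus)  # ['1', '2']
```

Without the +, \w matches exactly one character, and the only character that sits directly before the dot is the last one of each name.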

I hope this helps you understand the problem. My advice is to play around with the code: modify it, study the output, and try various things. That is a reliable way to gain a deeper understanding of the topic!

Thank you and keep up the good work!

Kind regards,
365 Hristina

Posted on:

01 Nov 2021

0

Thanks for such a beautiful explanation

Posted on:

01 Nov 2021

0

One more thing: why did we use \ before w and after )?

Instructor
Posted on:

02 Nov 2021

1

Hey again Preetpal,

Thank you, I am happy that you found the answer helpful!

The reason why we put a \ sign is either to create a special sequence or to escape a special character. When we type \w in our raw string, we create a special sequence with its own meaning: it matches word characters, that is, letters, digits, and the underscore. For ordinary Python 3 strings this covers Unicode word characters by default; with the re.ASCII flag it is limited to [a-zA-Z0-9_]. When we type \. in our raw string, we escape the special character '.', which otherwise has a special meaning. By escaping the dot, we tell the regex engine to match a literal dot in our string.
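As a quick check of what \w and \. match, here is a minimal sketch. The sample strings are made up for illustration.

```python
import re

text = "caf\u00e9_1"  # hypothetical sample: the word "café_1"

# For str patterns, \w is Unicode-aware by default.
default_words = re.findall(r"\w+", text)
# With re.ASCII, \w is restricted to [a-zA-Z0-9_].
ascii_words = re.findall(r"\w+", text, re.ASCII)
# An escaped dot matches only a literal ".".
literal_dots = re.findall(r"\.", "a.b")

print(default_words)  # ['café_1']
print(ascii_words)    # ['caf', '_1']
print(literal_dots)   # ['.']
```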

Let me demonstrate this with an example. Here is our original code (I have removed the parentheses around \w+ for clarity):
[Image: code and output for the pattern without parentheses around \w+]
As expected, we obtain the names of all files with extensions .jpg, .png, or .gif. Note that I have added a file at the end called w+.jpg, and that this file is not among the output files! This is because + is not an alphanumeric character, so \w+ cannot match it. Try removing the + from the name and convince yourself that w.jpg will be among the outputs.
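The screenshot is not available here, so this is a sketch of the same experiment. Apart from w+.jpg, the file names are assumptions.

```python
import re

pattern = r"\w+\.(jpg|png|gif)"

# Hypothetical file lists; only w+.jpg comes from the discussion above.
files_with_plus = "image01.jpg image02.png w+.jpg"
files_without_plus = "image01.jpg image02.png w.jpg"

found_with_plus = [m.group(0) for m in re.finditer(pattern, files_with_plus)]
found_without_plus = [m.group(0) for m in re.finditer(pattern, files_without_plus)]

print(found_with_plus)     # ['image01.jpg', 'image02.png']
print(found_without_plus)  # ['image01.jpg', 'image02.png', 'w.jpg']
```

In w+.jpg the character directly before the dot is +, which \w does not match, so the whole name is skipped.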

The question I would ask now is: how should we modify our raw string so that we output this last file, w+.jpg? Well, first we need to remove the \ character in front of w, since we want to match the letter w itself. Next, we need to escape the + sign. Without the \, the + has a special meaning: it is a quantifier that repeats the preceding element one or more times, which is what let \w+ match the entire name of the file. Escaping the + sign means that we want to match a literal + in our string. The new code then becomes:
[Image: code and output for the modified pattern]
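One way to write such a pattern is sketched below. This is my reading of the change described, not a copy of the screenshot, and the file list is again hypothetical.

```python
import re

# Hypothetical file list; only w+.jpg comes from the discussion above.
files = "image01.jpg image02.png w+.jpg"

# Plain w matches the letter w; \+ matches a literal plus sign.
literal_pattern = r"w\+\.(jpg|png|gif)"

found_literal = [m.group(0) for m in re.finditer(literal_pattern, files)]
print(found_literal)  # ['w+.jpg']
```

The other files no longer match, because the pattern now demands the exact characters w and + before the dot.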

Lastly, what about the dot? According to the Python documentation, in the default mode, the dot matches any character except a newline. Hmm, okay, let's see what happens if we replace the contents of our raw string with a single dot:
[Image: output for the pattern consisting of a single dot]
The output continues as I scroll down: every character in the string is reported as a match. So we see that we need to escape the dot character in order to match a literal dot in our string.
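The difference is easy to see on a very short string. A sketch, with a sample string of my own choosing:

```python
import re

sample = "a.b\nc"  # hypothetical sample containing a dot and a newline

any_char = re.findall(r".", sample)      # unescaped dot: any character except newline
literal_dot = re.findall(r"\.", sample)  # escaped dot: only a literal "."

print(any_char)     # ['a', '.', 'b', 'c']
print(literal_dot)  # ['.']
```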

I know this can be a little overwhelming, but it is quite an interesting topic. Here is the official Python documentation for the re module, which lists the regular expression syntax and what each element does. Go ahead and explore what the plus and the dot characters do, what the escape symbol does, and what the special sequences are.
https://docs.python.org/3/library/re.html

Thank you!

Kind regards,
365 Hristina

Posted on:

17 Feb 2023

1

Thanks, Preetpal for your question  

And what a wonderful explanation you provide, Hristina! 

Creating a special sequence and escaping a special character are new concepts I explored in your answer.

Thanks a million
