The 365 Data Science team is proud to invite you to our community forum: a well-built space where you can ask questions, get support, share your knowledge, and help others on their path to becoming Data Science specialists.
Ask
Anybody can ask a question
Answer
Anybody can answer
Vote
The best answers are voted up and moderated by our team

Scraping JavaScript: Chromium won’t install

0
Votes
6
Answers

Hi there,
I’m currently taking the web scraping course and I’m having a lot of fun. The lectures are easy to follow and everything is explained nicely. I’m almost done with the course. Only scraping HTML with JavaScript is left. However, I’ve run into a little problem. I’d appreciate your help.
Course: Web Scraping and API Fundamentals in Python
Section: The requests-html package
Video / lecture: Scraping JavaScript 
From minute 2:20 you explain how the code ‘await r.html.arender()’ should install the Chromium browser before running the JavaScript on the page we’re working on. However, Chromium won’t install on my machine. This is the error I get:
 
[W:pyppeteer.chromium_downloader] start chromium download.
Download may take a few minutes.
---------------------------------------------------------------------------
Error Traceback (most recent call last)
~\anaconda3\lib\site-packages\urllib3\contrib\pyopenssl.py in wrap_socket(self, sock, server_side, do_handshake_on_connect, suppress_ragged_eofs, server_hostname)
484 try:
--> 485 cnx.do_handshake()
486 except OpenSSL.SSL.WantReadError:

~\anaconda3\lib\site-packages\urllib3\util\retry.py in increment(self, method, url, response, error, _pool, _stacktrace)
434
435 if new_retry.is_exhausted():
--> 436 raise MaxRetryError(_pool, url, error or ResponseError(cause))
437
438 log.debug("Incremented Retry for (url='%s'): %r", url, new_retry)
MaxRetryError: HTTPSConnectionPool(host='storage.googleapis.com', port=443): Max retries exceeded with url: /chromium-browser-snapshots/Win_x64/575458/chrome-win32.zip (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')])")))
 
Any advice on how I should proceed further? 🙁 
Thanks in advance and thank you for the great course!

6 Answers

0
Votes

Hi Joey,
You can try to downgrade the ‘urllib3’ package to 1.25.8:
pip install urllib3==1.25.8
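You can then confirm the downgrade from inside the notebook itself (just a quick sanity check, not something from the course):
import urllib3
print(urllib3.__version__)  # should show 1.25.8 after a kernel restart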
Best,
365 Team

0
Votes

Hi Nikola,
 
Thanks for your suggestion. Unfortunately, it didn’t work. I got the same error.
 
Here’s what I did:
1. I ran ‘pip install urllib3==1.25.8’ in Jupyter.
2. I restarted the kernel and ran all lines anew.
3. I ran ‘pip show urllib3’ to check whether the version was downgraded correctly. I got a positive response, namely:
Name: urllib3
Version: 1.25.8
Summary: HTTP library with thread-safe connection pooling, file post, and more.
Home-page: https://urllib3.readthedocs.io/
Author: Andrey Petrov
Author-email: andrey.petrov@shazow.net
License: MIT
Location: c:\users\dimit\anaconda3\lib\site-packages
Requires:
Required-by: requests, pyppeteer
Note: you may need to restart the kernel to use updated packages.
4. I ran ‘await r.html.arender()’, but got the same error as before, namely:
MaxRetryError: HTTPSConnectionPool(host='storage.googleapis.com', port=443): Max retries exceeded with url: /chromium-browser-snapshots/Win_x64/575458/chrome-win32.zip (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')])")))
 
I’m really frustrated, because I’m almost done with the course and this little thing is preventing me from finishing it entirely. Please help.

0
Votes

Never mind. Found a solution which worked out great. Here’s the link:
https://github.com/miyakogi/pyppeteer/issues/258
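For anyone else who runs into this: the issue above discusses ways around the failing certificate check during the Chromium download. As a rough sketch (not necessarily the exact fix from that thread), you can also run the download step on its own through pyppeteer’s bundled downloader, so it doesn’t have to happen inside ‘await r.html.arender()’:
from pyppeteer import chromium_downloader  # ships with pyppeteer
if not chromium_downloader.check_chromium():      # is Chromium already downloaded?
    chromium_downloader.download_chromium()       # one-off download into pyppeteer's local folder
print(chromium_downloader.chromium_executable())  # the path arender() will launch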

0
Votes

I am glad you managed to fix the issue!
Thank you for sharing the solution as well!
 
Best,
365 Team

0
Votes

Now I have another problem. x-( 
 
 
I’m working on the exercise ‘Scraping YouTube’ from the same ‘Web Scraping and API Fundamentals in Python’ course. Here’s my code so far:
 
 
from requests_html import AsyncHTMLSession  # Load the requests-html package
session = AsyncHTMLSession()  # Start a session
base_site = 'https://www.youtube.com'
r = await session.get(base_site)  # Send a GET request
await r.html.arender()  # Render the JS code on the website
 
 
However, I can’t go on because I get the following error:
 
 

RuntimeError: Cannot use HTMLSession within an existing event loop. Use AsyncHTMLSession instead.

 
 
I tried rerunning the code and the error changed to:
 
 

Future exception was never retrieved
future: <Future finished exception=NetworkError('Protocol error Target.detachFromTarget: Target closed.')>
pyppeteer.errors.NetworkError: Protocol error Target.detachFromTarget: Target closed.
Future exception was never retrieved
future: <Future finished exception=NetworkError('Protocol error (Target.sendMessageToTarget): No session with given id')>
pyppeteer.errors.NetworkError: Protocol error (Target.sendMessageToTarget): No session with given id

 
 
I can’t get the code to run without an error. Please help.

0
Votes

Too bad the guys from the team weren’t able to help. Anyway, I found a solution to my second problem (see the link below). The method described in the link is different from the one in the lecture, but it got the job done. Now I’m done with the exercise and the course. Scraping is a lot of fun. 🙂
 
https://towardsdatascience.com/data-science-skills-web-scraping-javascript-using-python-97a29738353f
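In case someone else hits the same ‘Cannot use HTMLSession within an existing event loop’ error and wants to stay closer to the lecture’s approach: the snippet below is just the standard AsyncHTMLSession pattern from the requests-html documentation, moved into a plain .py script so there is no notebook event loop to clash with (the script name and the final print are my own additions, not from the lecture or the article):
# render_youtube.py - run from the command line, outside Jupyter
from requests_html import AsyncHTMLSession

asession = AsyncHTMLSession()

async def get_youtube():
    r = await asession.get('https://www.youtube.com')  # send a GET request
    await r.html.arender()                             # render the JavaScript with Chromium
    return r

r, = asession.run(get_youtube)  # run() drives the event loop and returns the result
print(r.html.find('title', first=True).text)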
