September 2019 – Sugata Ray

Time:12:15:20 Price:302.10
Time:12:20:8 Price:302.08
Time:12:25:19 Price:302.05
Time:12:30:20 Price:302.07
Time:12:35:9 Price:302.17
Time:12:40:9 Price:302.09
Time:12:45:28 Price:302.22
Time:12:50:28 Price:302.24
Time:12:55:16 Price:302.26
Time:13:0:8 Price:302.18
Time:13:5:9 Price:302.01
Time:13:10:8 Price:301.96
Time:13:15:28 Price:302.01
Time:13:20:29 Price:302.04
Time:13:25:8 Price:301.96
Time:13:30:20 Price:301.96
Time:13:35:19 Price:302.10
Time:13:40:28 Price:302.27
Time:13:45:20 Price:302.24
Time:13:50:8 Price:302.21
Time:13:55:8 Price:302.19
Time:14:0:8 Price:302.16

After an interesting class spent helping students install Jupyter Notebook and get some basic web automation up and running with selenium and chromedriver, I realized there were some common pitfalls with easy (or sometimes not so easy) fixes.

When you run code in Python, you will sometimes (in my case, often) get an error. Since Python is a package-based language, the error message will sometimes be long and complicated. The most important thing to look for is right at the end: the last line names the error, and the lines just above it point to the line of code that generated it.
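To make the "look at the end" advice concrete, here is a minimal sketch that triggers an error on purpose and prints only the last line of the error text. The helper name last_traceback_line and the broken function are made up for this illustration; they are not part of the tutorial code.

```python
import traceback

def last_traceback_line(func):
    """Run func and return the final line of its traceback, or None if it succeeds."""
    try:
        func()
    except Exception:
        # format_exc() returns the same text Python would print on screen;
        # the last non-empty line holds the error type and message.
        return traceback.format_exc().strip().splitlines()[-1]
    return None

def broken():
    # Hypothetical example: deliberately raises ZeroDivisionError.
    return 1 / 0

summary = last_traceback_line(broken)
print(summary)
```

The printed line starts with the error type (here ZeroDivisionError), which is usually the best thing to paste into a search engine.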

So, for example, if you try to copy and run the code in the first Webscraping tutorial, the first error you will receive is:

This is a result of the quotation marks on this website being much fancier than those Python can handle. Essentially, all quotation marks should be non-directional, so ' and " instead of ‘, ’, “, ” and ″. Replace directional quotations with non-directional ones.
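If there are too many quotation marks to fix by hand, a few lines of Python can do the replacement. This is only a sketch: the name straighten_quotes and the sample line are invented for illustration, and the list of characters covers the common directional quotes rather than every possible one.

```python
# Map each directional quote character to its plain (non-directional) equivalent.
QUOTE_MAP = str.maketrans({
    "\u2018": "'",  # left single quote
    "\u2019": "'",  # right single quote
    "\u201c": '"',  # left double quote
    "\u201d": '"',  # right double quote
    "\u2032": "'",  # prime
    "\u2033": '"',  # double prime
})

def straighten_quotes(code_text):
    """Return code_text with directional quotes replaced by plain ones."""
    return code_text.translate(QUOTE_MAP)

# Hypothetical line of code as it might look when copied from a web page.
pasted = "url = \u201chttps://example.com\u201d"
fixed = straighten_quotes(pasted)
print(fixed)  # url = "https://example.com"
```

Paste the copied code into a string, run it through the function, and copy the result back into the notebook cell.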

The next error you will likely receive is:

This simply says you need a package (or module) called selenium installed. On a Windows machine, this is done by opening the Anaconda Prompt (Start->Anaconda3->Anaconda Prompt) and typing in the following: pip install selenium <enter>

This should be followed by an installation taking place and some text indicating success, typically ending in a line that begins with "Successfully installed selenium".

If you use a Mac, you can do the same thing by opening up a terminal window and typing in the same thing.
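To check from inside a notebook whether the installation worked, something like the following sketch can help. The function name is_installed is my own; it relies on the standard library's importlib.util.find_spec, which looks a module up without actually importing it.

```python
import importlib.util

def is_installed(module_name):
    """Return True if Python can locate the named top-level module."""
    return importlib.util.find_spec(module_name) is not None

if is_installed("selenium"):
    print("selenium is installed")
else:
    print("selenium is missing: run 'pip install selenium' and try again")
```

If this still reports selenium as missing right after a successful install, the notebook may be running a different Python installation from the one pip installed into.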

The next error you might receive is one involving chromedriver. It might say chromedriver is not in PATH, or perhaps that chromedriver is not compatible with your version of Chrome. On a PC, the first error is fixed by putting a copy of chromedriver.exe (not the zip file, and not a shortcut) in the same folder as your Python notebook. If you don’t know where your Python notebooks are in your directory structure, you can search your computer for ipynb files. Jupyter notebook files have the extension *.ipynb, so they should be quite easy to find.
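A quick way to find the right folder is to ask Python itself. The sketch below assumes the notebook was opened in the usual way, in which case the working directory is normally the folder the notebook lives in. The helper find_chromedriver is a made-up name for illustration.

```python
from pathlib import Path

def find_chromedriver(folder):
    """Return the path of a chromedriver file in folder, or None if there is none."""
    for name in ("chromedriver.exe", "chromedriver"):
        candidate = Path(folder) / name
        # is_file() is False for missing files; a Windows shortcut is named
        # chromedriver.exe.lnk, so it does not match either name.
        if candidate.is_file():
            return candidate
    return None

notebook_folder = Path.cwd()
print("Notebook folder:", notebook_folder)
print("chromedriver found:", find_chromedriver(notebook_folder))
```

If the second line prints None, copy chromedriver.exe into the folder shown on the first line.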

On a Mac, the first error is fixed by adding the folder with chromedriver to the system PATH (see instructions here and follow the 3rd set of instructions, adding a directory to PATH for all users, forever). For more information on what PATH is, check out the delightful wiki on the subject.
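To see whether the PATH change took effect, the following sketch uses shutil.which from the standard library, which searches the PATH folders for a program name much as the terminal does. Note that a notebook started before PATH was changed may need to be restarted to pick up the new value.

```python
import os
import shutil

# Look for an executable called chromedriver in the folders listed in PATH.
location = shutil.which("chromedriver")

if location is None:
    print("chromedriver is not on PATH. Folders currently searched:")
    for folder in os.environ.get("PATH", "").split(os.pathsep):
        print("  ", folder)
else:
    print("chromedriver found at", location)
```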

Finally, the last error you will likely get will be:

This is a cryptic error; it simply means that the re.search command could not find the snippet of text it was looking for. That’s because Yahoo often changes the source code, and the tag number changes from 35 to something else. As of the time of writing, it is 52. With that final fix, the code should be able to run.
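The reason the error is cryptic is that re.search returns None when it finds nothing, and the complaint only appears later, when the code tries to use that None as if it were a match. Here is a minimal sketch of checking for None first. The sample HTML, the data-reactid attribute name and the function extract_price are assumptions made for illustration; they are not the tutorial's actual pattern or Yahoo's actual page.

```python
import re

def extract_price(page_source, tag_number):
    """Return the number that follows the given tag number, or None if not found."""
    # Assumed pattern: the tag number appears as data-reactid="NN" right before the price.
    pattern = r'data-reactid="' + str(tag_number) + r'">([\d.,]+)<'
    match = re.search(pattern, page_source)
    if match is None:
        # Nothing matched: report it instead of crashing on match.group(1).
        return None
    return match.group(1)

# Hypothetical snippet of page source, for illustration only.
sample = '<span data-reactid="52">299.35</span>'
print(extract_price(sample, 35))  # None (old tag number no longer matches)
print(extract_price(sample, 52))  # 299.35
```

When the function returns None, that is the cue to open the page source and look up the current tag number.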

Notice the last line I added: print(found) – without this line, the code would run, but would not do anything. The final line generates feedback to indicate success! The price of SPY at the time of running was $299.35.

So… what can we do with this? Well, we can write a small loop to get the price of SPY every few minutes. More on that in a bit….
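As a rough preview, such a loop could look like the sketch below. It is not the code from the tutorial: poll_prices is a made-up name, and fake_price stands in for whatever function actually scrapes the price, so that the example runs without a browser. For real use the wait would be a few minutes (for example wait_seconds=300) rather than a fraction of a second.

```python
import time
from datetime import datetime

def poll_prices(get_price, readings, wait_seconds):
    """Call get_price() `readings` times, pausing wait_seconds between calls."""
    lines = []
    for i in range(readings):
        stamp = datetime.now().strftime("%H:%M:%S")
        lines.append("Time:" + stamp + " Price:" + str(get_price()))
        print(lines[-1])
        if i < readings - 1:
            # No need to wait after the final reading.
            time.sleep(wait_seconds)
    return lines

def fake_price():
    # Placeholder for the real scraping code; always returns the same value.
    return "299.35"

log = poll_prices(fake_price, readings=3, wait_seconds=0.1)
```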


Looping and scraping

Webscraping with Python 2