{"id":256,"date":"2019-09-12T19:03:07","date_gmt":"2019-09-12T19:03:07","guid":{"rendered":"http:\/\/www.sugata.in\/?p=256"},"modified":"2019-10-14T12:47:08","modified_gmt":"2019-10-14T12:47:08","slug":"looping-and-scraping","status":"publish","type":"post","link":"http:\/\/www.sugata.in\/index.php\/2019\/09\/12\/looping-and-scraping\/","title":{"rendered":"Looping and scraping"},"content":{"rendered":"<p>In the previous posts, I covered how to scrape some data (like a stock price) from a website. To get a workable dataset, we can write some code that loops continually and collects that same data at a fixed interval.<\/p>\n<p>The code below does this. A few points: (1) Python uses indentation as part of its syntax, so after starting a loop (the while 1==1: statement below) or a conditional (the if statement below), everything you want repeated or conditionally executed has to be indented. (2) The while 1==1 line simply says keep doing this &#8230; forever, since 1 will always be equal to 1. (3) The if statement checks whether the current minute is divisible by 5 and runs the scraping code if it is. 
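To make the divisibility check concrete, here is a minimal sketch run on a fixed timestamp (12:25:19, one of the times from the sample output below, on the date of this post) rather than on datetime.datetime.now():

```python
import datetime

# A fixed timestamp instead of datetime.datetime.now(), so the result is reproducible
now = datetime.datetime(2019, 9, 12, 12, 25, 19)

# The modulo form of the check: True whenever the minute is a multiple of 5
print(now.minute % 5 == 0)

# An equivalent division-based form of the same check
print(now.minute / 5 == int(now.minute / 5))
```

Both forms print True here. In Python 3, now.minute / 5 is float division, so comparing it against int(now.minute / 5) detects a whole-number result; now.minute % 5 == 0 says the same thing more directly.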
You can change the interval by changing 5 to another number, or by using now.second or now.hour instead.<\/p>\n<pre>from selenium import webdriver\r\nimport datetime\r\nimport time\r\nimport re\r\n\r\nwhile 1==1:\r\n    now = datetime.datetime.now()\r\n    if now.minute % 5 == 0:  # fires once every 5 minutes\r\n        driverspy = webdriver.Chrome()\r\n        driverspy.get('https:\/\/finance.yahoo.com\/quote\/SPY?p=SPY')\r\n        sourcespy = driverspy.page_source\r\n        now = datetime.datetime.now()\r\n        # pull the quoted price out of the page source\r\n        found = re.search('\"52\"&gt;(\\\\d+\\\\.\\\\d+)&lt;\/span&gt;', sourcespy)\r\n        if found:  # the page markup can change, so check for a match before using it\r\n            print(\"Time:\"+str(now.hour)+\":\"+str(now.minute)+\":\"+str(now.second)+\" Price:\"+found.group(1))\r\n        driverspy.quit()\r\n        time.sleep(75)  # sleep past the current minute so this runs only once per interval\r\n    time.sleep(1)  # avoid a busy loop between checks<\/pre>\n<p>While the code runs, you&#8217;ll get output that looks like the following. You can then either copy-paste it into a CSV file or use Python to export it, and start building a dataset.<\/p>\n<pre>Time:12:15:20 Price:302.10\r\nTime:12:20:8 Price:302.08\r\nTime:12:25:19 Price:302.05\r\nTime:12:30:20 Price:302.07\r\nTime:12:35:9 Price:302.17\r\nTime:12:40:9 Price:302.09\r\nTime:12:45:28 Price:302.22\r\nTime:12:50:28 Price:302.24\r\nTime:12:55:16 Price:302.26\r\nTime:13:0:8 Price:302.18\r\nTime:13:5:9 Price:302.01\r\nTime:13:10:8 Price:301.96\r\nTime:13:15:28 Price:302.01\r\nTime:13:20:29 Price:302.04\r\nTime:13:25:8 Price:301.96\r\nTime:13:30:20 Price:301.96\r\nTime:13:35:19 Price:302.10\r\nTime:13:40:28 Price:302.27\r\nTime:13:45:20 Price:302.24\r\nTime:13:50:8 Price:302.21\r\nTime:13:55:8 Price:302.19\r\nTime:14:0:8 Price:302.16<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>In the previous posts, I covered how to scrape some data (like a stock price) from a website. 
To get a workable dataset, we can write some code to continually loop, and collect that same data at a fixed interval. The code below does this. A few points. (1) Python uses indentation as part of [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[9,10],"tags":[],"_links":{"self":[{"href":"http:\/\/www.sugata.in\/index.php\/wp-json\/wp\/v2\/posts\/256"}],"collection":[{"href":"http:\/\/www.sugata.in\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.sugata.in\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.sugata.in\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.sugata.in\/index.php\/wp-json\/wp\/v2\/comments?post=256"}],"version-history":[{"count":3,"href":"http:\/\/www.sugata.in\/index.php\/wp-json\/wp\/v2\/posts\/256\/revisions"}],"predecessor-version":[{"id":261,"href":"http:\/\/www.sugata.in\/index.php\/wp-json\/wp\/v2\/posts\/256\/revisions\/261"}],"wp:attachment":[{"href":"http:\/\/www.sugata.in\/index.php\/wp-json\/wp\/v2\/media?parent=256"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.sugata.in\/index.php\/wp-json\/wp\/v2\/categories?post=256"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.sugata.in\/index.php\/wp-json\/wp\/v2\/tags?post=256"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}