Most of us have tried getting data from a website in one way or another. In this article, we will learn how to scrape the web with a bot, a script that extracts content and data from a website.
We will use PHP cURL to scrape a web page. The name looks like a typo from leaving Caps Lock on, but that really is how it is written. cURL is the library PHP uses to make HTTP requests; it lets you request web pages from within your script.
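As a rough sketch of what "calling a web page from within your script" can look like, the helper below wraps a single cURL request. The function name `fetch_page`, the timeout default, and the choice to return `null` on failure are assumptions made for illustration; they are not part of the original article.

```php
<?php
// Hypothetical helper (name and defaults are assumptions, not from the article):
// fetch the body of a page with cURL, or return null if the request fails.
function fetch_page(string $url, int $timeoutSeconds = 10): ?string
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    // Return the response as a string instead of printing it directly
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Follow HTTP redirects
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    // Give up if connecting or the whole transfer takes too long
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeoutSeconds);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);

    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    // The handle is released automatically when $ch goes out of scope.

    // curl_exec() returns false on transport errors; treat HTTP 4xx/5xx as failures too
    if ($body === false || $status >= 400) {
        return null;
    }
    return $body;
}
```

Calling `fetch_page('https://example.com/')` would then give either the page's HTML as a string or `null`, which the calling script should check before parsing.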
“Scrape data, not content” is a rule for everyone who wants to learn web scraping. cURL and web scraping are powerful tools for automating tasks that would otherwise be long, tedious, and repetitive. We should scrape only information, not full articles and content.
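One way to picture "data, not content" is a function that takes only a few specific fields out of a page and leaves the article text alone. The sketch below uses PHP's built-in `DOMDocument` to read the page title and the `<h2>` headings from an HTML string. The function name `extract_title_and_headings` and the choice of fields are assumptions for illustration only.

```php
<?php
// Hypothetical helper (not from the article): extract only the title and
// the <h2> headings from an HTML string, ignoring the body text.
function extract_title_and_headings(string $html): array
{
    $result = ['title' => null, 'headings' => []];
    if (trim($html) === '') {
        return $result; // nothing to parse
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $titleNodes = $doc->getElementsByTagName('title');
    if ($titleNodes->length > 0) {
        $result['title'] = trim($titleNodes->item(0)->textContent);
    }

    foreach ($doc->getElementsByTagName('h2') as $node) {
        $result['headings'][] = trim($node->textContent);
    }

    return $result;
}
```

The function works on a string, so it can be tried on a small hand-written HTML snippet before pointing it at a downloaded page.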
Example: The following example demonstrates the scraping of images from the article https://www.geeksforgeeks.org/matlab-data-types/
PHP
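<?php
// Initialize a cURL session
$ch = curl_init();

// URL to scrape (the article named above)
curl_setopt($ch, CURLOPT_URL, 'https://www.geeksforgeeks.org/matlab-data-types/');

// Return the response as a string instead of printing it
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

$output = curl_exec($ch);

// Close the cURL session
curl_close($ch);

// curl_exec() returns false on failure; fall back to an empty page
if ($output === false) {
    $output = '';
}

// Page markup for displaying the results
echo '<head>';
echo '<meta http-equiv="content-type" content="text/html; charset=utf-8" />';
echo '</head>';
echo '<body>';
echo '<h1>Web Scraping using cURL</h1>';

// Find PNG image URLs; dots are escaped and each match stops at quotes or whitespace
preg_match_all('!https://media\.geeksforgeeks\.org/wp-content/uploads/[^\s"\'<>]+\.png!', $output, $data);

foreach ($data[0] as $list) {
    echo "<img src='" . htmlspecialchars($list, ENT_QUOTES) . "'/>";
}
echo '</body>';
?>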
Output: