Retrieve a URL's HTML using CURL and Drupal's Cache

Need to fetch the HTML from a URL and keep it in Drupal's cache so you don't request it again on every page load? Something like this will do it:

function tf_crawl_url($url) {
  // If this URL is already cached, use the cached HTML. Otherwise fetch the
  // URL with cURL and store the result in the cache.
  $html = '';
  // The URL itself doubles as the cache ID.
  $cache_key = $url;
  $cache = cache_get($cache_key);
  if ($cache) {
    drupal_set_message("Grabbed $url from cache.");
    $html = $cache->data;
  }
  else {
    $curl_handle = curl_init();
    curl_setopt($curl_handle, CURLOPT_URL, $url);
    // Give up if connecting takes longer than 2 seconds.
    curl_setopt($curl_handle, CURLOPT_CONNECTTIMEOUT, 2);
    // Return the response as a string instead of printing it.
    curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, 1);
    $html = curl_exec($curl_handle);
    curl_close($curl_handle);
    // curl_exec() returns FALSE on failure; don't cache a failed request.
    if ($html !== FALSE) {
      cache_set($cache_key, $html, 'cache');
      drupal_set_message("Curled $url and placed in cache.");
    }
    else {
      drupal_set_message("Could not retrieve $url.", 'error');
    }
  }
  return $html;
}

Example usage:

$html = tf_crawl_url("http://www.drupal.org");
print $html;
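
One thing the function above leaves open is how long a cached page should live. As far as I know, cache_set() in Drupal 6 and 7 accepts an expiry as an optional fourth argument, but to show the idea without needing Drupal or a network connection, here is a rough stand-alone sketch of the same cache-first pattern with a time limit. Everything in it is made up for illustration: tf_fetch_cached(), the $store array standing in for Drupal's cache, the $fetch callback standing in for the cURL request, and the $ttl and $now parameters.

```php
<?php
// Hypothetical sketch: $store plays the role of Drupal's cache and $fetch
// plays the role of the cURL request. Neither is a Drupal API.
function tf_fetch_cached($url, callable $fetch, array &$store, $ttl = 300, $now = NULL) {
  $now = ($now === NULL) ? time() : $now;
  // Serve the cached copy while it is still fresh.
  if (isset($store[$url]) && $store[$url]['expire'] > $now) {
    return $store[$url]['data'];
  }
  $html = $fetch($url);
  // Only store real responses; a failed fetch (FALSE) is returned, not cached.
  if ($html !== FALSE) {
    $store[$url] = array('data' => $html, 'expire' => $now + $ttl);
  }
  return $html;
}

// Example usage with a fake fetcher that counts how often it is called.
$calls = 0;
$fetch = function ($url) use (&$calls) {
  $calls++;
  return "<html>$url</html>";
};
$store = array();
$first = tf_fetch_cached('http://example.com', $fetch, $store, 300, 1000);
$second = tf_fetch_cached('http://example.com', $fetch, $store, 300, 1100);
$third = tf_fetch_cached('http://example.com', $fetch, $store, 300, 1400);
```

The second call falls inside the 300-second window and is served from $store. The third call comes after the entry has expired, so the fetcher runs again. Passing $now in explicitly is only there to make the sketch easy to try out; real code would just use time().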
