Extract Links From An HTML File With PHP

Use the following function to extract all of the links from an HTML string.

function linkExtractor($html)
{
    $linkArray = array();
    // Match every anchor tag, capturing the href value and the link text.
    if (preg_match_all('/<a\s+.*?href=[\"\']?([^\"\' >]*)[\"\']?[^>]*>(.*?)<\/a>/i', $html, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $match) {
            // Store the link location and the link text as a pair.
            $linkArray[] = array($match[1], $match[2]);
        }
    }
    return $linkArray;
}
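
As a quick check, the function can be called with an inline HTML snippet. The markup and URL below are made up purely for illustration.

// A minimal usage sketch; the HTML string here is just an example.
$html = '<p>Visit <a href="https://www.example.com/">Example Site</a> today.</p>';
print_r(linkExtractor($html));

This should print an array containing a single element: the link location https://www.example.com/ paired with the link text "Example Site".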

To use it, just read a web page or file into a string and pass that string to the function. The following example reads a web page using the PHP cURL functions and then passes the result into the function to retrieve the links.

$url = 'http://www.hashbangcode.com';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
// Identify the request with a browser-like user agent string.
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.12) Gecko/20080201 Firefox/2.0.0.12');
// Leave the headers out and return the page body as a string rather than printing it.
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 0);
curl_setopt($ch, CURLOPT_TIMEOUT, 120);
$html = curl_exec($ch);
curl_close($ch);
echo '<pre>' . print_r(linkExtractor($html), true) . '</pre>';

The function will return an array in which each element is itself an array containing the link location and the link text.
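
Because each element is just a two item array, the result can be looped over directly. As a rough sketch, the links found by the cURL example above could be printed out one per line like this; strip_tags() is used here in case the link text itself contains markup.

// One way the returned pairs might be consumed.
foreach (linkExtractor($html) as $link) {
    // $link[0] holds the link location, $link[1] holds the link text.
    echo $link[0] . ' => ' . strip_tags($link[1]) . "\n";
}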

Comments

Cool Scripts.
Hi Philip, I have a similar script:

while (!feof($page)) {
    $line = fgets($page, 255);
    while (eregi("HREF=\"[^\"]*\"", $line, $match)) {
        print($match[0]);
        print("\n");
        $replace = ereg_replace("\?", "\?", $match[0]);
        $line = ereg_replace($replace, "", $line);
    }
}
fclose($page);
?>
How do I get only the links with a .zip extension, and without the href=" at the start and the " at the end of each line?
