Laravel Crawler: Building a Web Crawler with cURL

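In modern web development, data scraping (crawling) is a common requirement. Laravel is a very popular PHP framework that makes it easy to integrate various tools and techniques for building a crawler. This article shows how to build a simple web crawler with Laravel and cURL, and presents several approaches with example code.

Solution

We will use a Laravel Artisan command to create a command-line task that fetches data from the target website with the cURL extension. We will show how to send HTTP requests, handle the responses, and save the extracted data to the database. After that, we cover two other commonly used scraping tools: the Guzzle HTTP client and Symfony's DomCrawler component.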

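Implementing the crawler with cURL

Creating the Artisan command

First, create an Artisan command that runs the crawling task. Open a terminal and run:

```bash
php artisan make:command FetchData
```

This generates a FetchData.php file in the app/Console/Commands directory. Open it and edit it as follows. The example assumes that a data table with title, content, created_at, and updated_at columns already exists, and that each record on the target page is a div with class "item" containing an h2 and a p: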

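```php
<?php

namespace App\Console\Commands;

use DOMDocument;
use DOMXPath;
use Illuminate\Console\Command;
use Illuminate\Support\Facades\DB;

class FetchData extends Command
{
    /**
     * The name and signature of the console command.
     *
     * @var string
     */
    protected $signature = 'fetch:data';

    /**
     * The console command description.
     *
     * @var string
     */
    protected $description = 'Fetch data from a website using cURL';

    /**
     * Execute the console command.
     *
     * @return int
     */
    public function handle()
    {
        $url = 'https://example.com';

        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
        curl_setopt($ch, CURLOPT_TIMEOUT, 30);          // give up after 30 seconds
        $response = curl_exec($ch);
        $error = curl_error($ch);                        // read the error before closing the handle
        $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_close($ch);

        // Treat transport failures and non-200 responses (404, 500, ...) as errors
        if ($response === false || $status !== 200) {
            $this->error(trim("Failed to fetch data from {$url} (HTTP {$status}) {$error}"));
            return 1;
        }

        // Parse the response; @ suppresses warnings about imperfect real-world HTML
        $html = new DOMDocument();
        @$html->loadHTML($response);
        $xpath = new DOMXPath($html);

        // Extract the data we need
        $nodes = $xpath->query('//div[@class="item"]');

        foreach ($nodes as $node) {
            $titleNode = $node->getElementsByTagName('h2')->item(0);
            $contentNode = $node->getElementsByTagName('p')->item(0);

            // Skip items without an <h2> or <p> instead of failing on a null node
            if ($titleNode === null || $contentNode === null) {
                continue;
            }

            // Save to the database
            DB::table('data')->insert([
                'title' => trim($titleNode->nodeValue),
                'content' => trim($contentNode->nodeValue),
                'created_at' => now(),
                'updated_at' => now(),
            ]);
        }

        $this->info('Data fetched successfully');
        return 0;
    }
}
```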
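The fetch step inside handle() can also be pulled out into a reusable function. The following is a minimal plain-PHP sketch that assumes only the built-in curl extension. The function name fetch_html, the 10-second default timeout, the user-agent string, and the rule that only 2xx statuses count as success are illustrative choices, not part of the original command. The function reads curl_error() and the HTTP status while the handle is still alive, so the caller gets the body, status, and error message together.

```php
<?php

/**
 * Hypothetical helper: fetch a URL with cURL and report body, status and error together.
 *
 * @return array{ok: bool, status: int, body: ?string, error: string}
 */
function fetch_html(string $url, int $timeoutSeconds = 10): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,            // return the body as a string
        CURLOPT_FOLLOWLOCATION => true,            // follow 3xx redirects
        CURLOPT_MAXREDIRS      => 5,               // but not forever
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds, // limit for establishing the connection
        CURLOPT_TIMEOUT        => $timeoutSeconds, // limit for the whole transfer
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; ExampleCrawler/1.0)', // assumed UA string
    ]);

    $body = curl_exec($ch);
    $error = curl_error($ch);                                // empty string when there was no error
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);   // 0 if no response was received
    // In PHP 8+ the CurlHandle is released when $ch goes out of scope.

    return [
        'ok'     => $body !== false && $status >= 200 && $status < 300,
        'status' => $status,
        'body'   => $body === false ? null : $body,
        'error'  => $error,
    ];
}
```

Inside handle(), a call such as $result = fetch_html($url); followed by a check on $result['ok'] could replace the inline cURL calls. The command then only has to decide how to report $result['status'] and $result['error'].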

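Registering the command

Next, register the command in App\Console\Kernel. Open app/Console/Kernel.php, find the protected $commands array, and add the following:

```php
protected $commands = [
    \App\Console\Commands\FetchData::class,
];
```

Since Laravel 5.5, commands placed in app/Console/Commands are discovered automatically by the default kernel, so this manual registration is usually unnecessary. Laravel 11 and later do not ship app/Console/Kernel.php at all.

Running the command

You can now run the crawler with:

```bash
php artisan fetch:data
```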

Using the Guzzle HTTP client

Guzzle is a powerful PHP HTTP client library that simplifies sending HTTP requests. Install it with Composer:

```bash
composer require guzzlehttp/guzzle
```

Then change the handle method of the FetchData command:

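```php
// Imports at the top of FetchData.php (skip any that are already present)
use DOMDocument;
use DOMXPath;
use GuzzleHttp\Client;
use GuzzleHttp\Exception\GuzzleException;
use Illuminate\Support\Facades\DB;

public function handle()
{
    $url = 'https://example.com';
    $client = new Client(['timeout' => 30]);

    try {
        // By default Guzzle throws on network errors and on 4xx/5xx responses,
        // so failures are handled here rather than by a status-code check
        $response = $client->request('GET', $url);
    } catch (GuzzleException $e) {
        $this->error('Failed to fetch data from ' . $url . ': ' . $e->getMessage());
        return 1;
    }

    // getBody() returns a stream; cast it to a string before parsing
    $html = new DOMDocument();
    @$html->loadHTML((string) $response->getBody());
    $xpath = new DOMXPath($html);

    $nodes = $xpath->query('//div[@class="item"]');

    foreach ($nodes as $node) {
        $titleNode = $node->getElementsByTagName('h2')->item(0);
        $contentNode = $node->getElementsByTagName('p')->item(0);

        // Skip incomplete items instead of failing on a null node
        if ($titleNode === null || $contentNode === null) {
            continue;
        }

        DB::table('data')->insert([
            'title' => trim($titleNode->nodeValue),
            'content' => trim($contentNode->nodeValue),
            'created_at' => now(),
            'updated_at' => now(),
        ]);
    }

    $this->info('Data fetched successfully');
    return 0;
}
```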
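The cURL version and the Guzzle version repeat the same DOMDocument/XPath extraction. As a sketch rather than the article's own code, that step can be isolated in a pure function. The function takes an HTML string and returns plain arrays, so it can be tested without a network connection or a database. The name extract_items is hypothetical. The markup it expects, a div with class "item" containing an h2 and a p, is the example structure used above. It calls libxml_use_internal_errors() instead of using the @ operator to keep parser warnings about imperfect HTML quiet.

```php
<?php

/**
 * Hypothetical helper: pull title/content pairs out of <div class="item">
 * blocks (the example markup used in this article) without touching a database.
 *
 * @return list<array{title: string, content: string}>
 */
function extract_items(string $html): array
{
    if (trim($html) === '') {
        return []; // DOMDocument::loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect warnings instead of printing them
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $items = [];

    foreach ($xpath->query('//div[@class="item"]') as $node) {
        // Relative queries (leading ".") search only inside the current item
        $title = $xpath->query('.//h2', $node)->item(0);
        $content = $xpath->query('.//p', $node)->item(0);

        if ($title === null || $content === null) {
            continue; // skip incomplete items instead of dereferencing null
        }

        $items[] = [
            'title' => trim($title->textContent),
            'content' => trim($content->textContent),
        ];
    }

    return $items;
}
```

With this helper, handle() only needs to loop over extract_items($html) and pass each row, plus timestamps, to DB::table('data')->insert().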

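Using Symfony DomCrawler

Symfony's DomCrawler component makes parsing HTML documents more convenient. Install it with Composer. Also install the CssSelector component, which DomCrawler needs when filter() is given a CSS selector such as div.item:

```bash
composer require symfony/dom-crawler symfony/css-selector
```

Then change the handle method of the FetchData command: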

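```php
// Imports at the top of FetchData.php (skip any that are already present)
use GuzzleHttp\Client;
use GuzzleHttp\Exception\GuzzleException;
use Illuminate\Support\Facades\DB;
use Symfony\Component\DomCrawler\Crawler;

public function handle()
{
    $url = 'https://example.com';
    $client = new Client(['timeout' => 30]);

    try {
        $response = $client->request('GET', $url);
    } catch (GuzzleException $e) {
        $this->error('Failed to fetch data from ' . $url . ': ' . $e->getMessage());
        return 1;
    }

    $crawler = new Crawler((string) $response->getBody());

    // each() passes every match as its own Crawler, so filter() can be called on it;
    // a plain foreach over a Crawler yields DOMElement objects, which have no filter()
    $crawler->filter('div.item')->each(function (Crawler $node) {
        $title = $node->filter('h2');
        $content = $node->filter('p');

        // text() throws on an empty selection, so skip incomplete items
        if ($title->count() === 0 || $content->count() === 0) {
            return;
        }

        DB::table('data')->insert([
            'title' => $title->text(),
            'content' => $content->text(),
            'created_at' => now(),
            'updated_at' => now(),
        ]);
    });

    $this->info('Data fetched successfully');
    return 0;
}
```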
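One behavioural difference between the versions is easy to miss. The XPath expression //div[@class="item"] used with cURL and Guzzle compares the whole class attribute, so it skips an element such as div class="item featured". The CSS selector div.item used with DomCrawler matches any element that has item among its classes. The plain-PHP sketch below contrasts the two using only the built-in DOM extension. class_match_counts is a hypothetical helper, and it assumes the class name contains no quote characters. The padded contains(concat(...)) expression is a common XPath idiom for matching a single class token, roughly what CSS class selectors are translated into.

```php
<?php

/**
 * Hypothetical helper: count <div> elements matched by an exact @class
 * comparison versus a class-token match (like the CSS selector div.item).
 * Assumes $class contains no quote characters.
 *
 * @return array{exact: int, token: int}
 */
function class_match_counts(string $html, string $class): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect parser warnings quietly
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Exact comparison: matches class="item" but not class="item featured"
    $exact = $xpath->query(sprintf('//div[@class="%s"]', $class))->length;

    // Token comparison: pad the attribute with spaces so " item " is found
    // in " item featured " but not in " items "
    $token = $xpath->query(sprintf(
        '//div[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    ))->length;

    return ['exact' => $exact, 'token' => $token];
}
```

Consider markup containing div class="item", div class="item featured", and div class="items". The exact query matches 1 of them and the token query matches 2. If the target site adds extra classes to its items, the token form keeps the XPath versions in line with the DomCrawler version.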

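This article showed how to implement a simple web crawler in Laravel with cURL, Guzzle, and Symfony DomCrawler. With these tools you can send HTTP requests, parse HTML content, and save the results to the database. We hope it helps you implement efficient data scraping in your own projects.

Source: the web. Author: 运维. If you repost this article, please credit the source: https://shuyeidc.com/wp/67838.html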

