How to use RegexIterator in PHP

2022-01-10 00:00:00 regex iterator php spl

I have yet to find a good example of how to use PHP's RegexIterator to recursively traverse a directory.

The end result I want is to specify a directory and find all files in it with certain extensions, for example only html/php. Furthermore, I want to filter out folders such as .Trash-0, .Trash-500, etc.

<?php 
$Directory = new RecursiveDirectoryIterator("/var/www/dev/");
$It = new RecursiveIteratorIterator($Directory);
$Regex = new RegexIterator($It, '/^.+\.php$/i', RecursiveRegexIterator::GET_MATCH);

// In GET_MATCH mode each item is an array of matches; index 0 is the full path
foreach($Regex as $match){
    echo $match[0]."<br/>";
}
?>

That is what I have so far, but it results in: Fatal error: Uncaught exception 'UnexpectedValueException' with message 'RecursiveDirectoryIterator::__construct(/media/hdmovies1/.Trash-0)

Any suggestions?

Recommended answer

There are a couple of different ways of going about something like this. I'll give two quick approaches for you to choose from: quick and dirty, versus longer and less dirty (though it's a Friday night, so we're allowed to go a little bit crazy).

1. Quick (and dirty)

This involves just writing a single regular expression (it could be split into several) and using it to filter the collection of files in one fell swoop.

(Only the two commented lines are really important to the concept.)

$directory = new RecursiveDirectoryIterator(__DIR__);
$flattened = new RecursiveIteratorIterator($directory);

// Make sure the path does not contain "/.Trash*" folders and ends with a .php or .html file
$files = new RegexIterator($flattened, '#^(?:[A-Z]:)?(?:/(?!\.Trash)[^/]+)+/[^/]+\.(?:php|html)$#Di');

foreach($files as $file) {
    echo $file . PHP_EOL;
}

This approach has a number of issues, though it is quick to implement, being just a one-liner (even if the regex might be a pain to decipher).
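To make the regex a little easier to decipher, here is a minimal sketch that applies the same pattern to a few plain strings instead of a real directory tree. The helper name isWantedPath and the sample paths are made up for illustration, and the paths assume forward slashes as the separator.

```php
<?php
// Hypothetical helper: runs the one-liner's pattern against a plain path string.
function isWantedPath(string $path): bool
{
    // Optional drive letter, then one or more folders that do not start
    // with ".Trash", then a file name ending in .php or .html.
    $pattern = '#^(?:[A-Z]:)?(?:/(?!\.Trash)[^/]+)+/[^/]+\.(?:php|html)$#Di';
    return preg_match($pattern, $path) === 1;
}

// Made-up sample paths, purely for illustration.
$samples = [
    '/var/www/dev/index.php',
    '/var/www/dev/docs/page.HTML',
    '/var/www/dev/.Trash-0/old.php',
    '/var/www/dev/notes.txt',
];

foreach ($samples as $path) {
    echo $path, ' => ', isWantedPath($path) ? 'kept' : 'skipped', PHP_EOL;
}
```

Only the first two samples pass: the third sits inside a .Trash folder and the fourth has the wrong extension.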

2. Less quick (and less dirty)

A more reusable approach is to create a couple of bespoke filters (using regex, or whatever you like!) to whittle the list of available items in the initial RecursiveDirectoryIterator down to only those that you want. The following is only one example, written quickly just for you, of extending the RecursiveRegexIterator.

We start with a base class whose main job is to keep hold of the regex that we want to filter with; everything else is deferred back to the RecursiveRegexIterator. Note that the class is abstract since it doesn't actually do anything useful: the actual filtering is done by the two classes that extend this one. Also, it may be called FilesystemRegexFilter, but there is nothing forcing it (at this level) to filter filesystem-related classes (I'd have chosen a better name if I weren't quite so sleepy).

abstract class FilesystemRegexFilter extends RecursiveRegexIterator {
    protected $regex;
    public function __construct(RecursiveIterator $it, $regex) {
        $this->regex = $regex;
        parent::__construct($it, $regex);
    }
}

These two classes are very basic filters, acting on the file name and directory name respectively.

class FilenameFilter extends FilesystemRegexFilter {
    // Filter files against the regex; anything that is not a file passes through
    public function accept(): bool {
        return ( ! $this->isFile() || preg_match($this->regex, $this->getFilename()));
    }
}

class DirnameFilter extends FilesystemRegexFilter {
    // Filter directories against the regex; anything that is not a directory passes through
    public function accept(): bool {
        return ( ! $this->isDir() || preg_match($this->regex, $this->getFilename()));
    }
}

To put those into practice, the following iterates recursively over the contents of the directory in which the script resides (feel free to edit this!), filters out the .Trash folders (by requiring folder names to match a specially crafted regex with a negative lookahead), and accepts only PHP and HTML files.

$directory = new RecursiveDirectoryIterator(__DIR__);
// Filter out ".Trash*" folders
$filter = new DirnameFilter($directory, '/^(?!\.Trash)/');
// Filter PHP/HTML files
$filter = new FilenameFilter($filter, '/\.(?:php|html)$/');

foreach(new RecursiveIteratorIterator($filter) as $file) {
    echo $file . PHP_EOL;
}
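If you want to try the filters without pointing them at a real project, here is a self-contained sketch that builds a small throwaway tree in the system temp directory and runs the same chain over it. The findPages function, the folder layout, and the use of FilesystemIterator::SKIP_DOTS (to keep the "." and ".." entries out of the results) are my own additions for illustration, not part of the snippet above.

```php
<?php
abstract class FilesystemRegexFilter extends RecursiveRegexIterator {
    protected $regex;
    public function __construct(RecursiveIterator $it, $regex) {
        $this->regex = $regex;
        parent::__construct($it, $regex);
    }
}

class FilenameFilter extends FilesystemRegexFilter {
    public function accept(): bool {
        return ! $this->isFile() || preg_match($this->regex, $this->getFilename());
    }
}

class DirnameFilter extends FilesystemRegexFilter {
    public function accept(): bool {
        return ! $this->isDir() || preg_match($this->regex, $this->getFilename());
    }
}

// Hypothetical helper: returns the sorted names of PHP/HTML files under $root,
// skipping anything inside ".Trash*" folders.
function findPages(string $root): array {
    $directory = new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS);
    $filter = new DirnameFilter($directory, '/^(?!\.Trash)/');
    $filter = new FilenameFilter($filter, '/\.(?:php|html)$/');

    $found = [];
    foreach (new RecursiveIteratorIterator($filter) as $file) {
        $found[] = $file->getFilename();
    }
    sort($found);
    return $found;
}

// Build a made-up directory tree to scan.
$root = sys_get_temp_dir() . '/regexiter_demo_' . uniqid();
mkdir($root . '/site/.Trash-0', 0777, true);
mkdir($root . '/site/pages', 0777, true);
$relativePaths = ['site/index.php', 'site/pages/about.html',
                  'site/pages/notes.txt', 'site/.Trash-0/old.php'];
foreach ($relativePaths as $relativePath) {
    touch($root . '/' . $relativePath);
}

$result = findPages($root);
print_r($result);

// Remove the throwaway tree again, children before their parents.
$cleanup = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS),
    RecursiveIteratorIterator::CHILD_FIRST
);
foreach ($cleanup as $item) {
    $item->isDir() ? rmdir($item->getPathname()) : unlink($item->getPathname());
}
rmdir($root);
```

With that layout, only about.html and index.php come back: notes.txt fails the extension filter, and old.php is never reached because its folder is rejected before the iterator descends into it.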

Of particular note is that since our filters are recursive, we can choose to play around with how to iterate over them. For example, we could easily limit ourselves to scanning only up to 2 levels deep (including the starting folder) by doing:

$files = new RecursiveIteratorIterator($filter);
$files->setMaxDepth(1); // Two levels, the parameter is zero-based.
foreach($files as $file) {
    echo $file . PHP_EOL;
}
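To see the zero-based depth in isolation, here is a small sketch that uses a nested array in place of a directory tree. The array contents are made up, and RecursiveArrayIterator is only standing in for the filesystem here.

```php
<?php
// Made-up nested data: 'a' is at depth 0, 'b' at depth 1, 'c' at depth 2.
$tree = new RecursiveArrayIterator(['a', ['b', ['c']]]);

$items = new RecursiveIteratorIterator($tree);
$items->setMaxDepth(1); // Depths 0 and 1 only, so two levels in total.

// Pass false so keys from different levels do not overwrite each other.
$values = iterator_to_array($items, false);
print_r($values);
```

In the default leaves-only mode this should give just a and b: the innermost array lives at depth 1 but is not a leaf, and the iterator is not allowed to descend into it.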

It is also super-easy to add yet more filters (by instantiating more of our filtering classes with different regexes, or by creating new filtering classes) for more specialised filtering needs (e.g. file size, full-path length, etc.).
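As a rough idea of what a non-regex filter might look like, here is a sketch of a file-size filter built on RecursiveFilterIterator. The class name MaxSizeFilter, the 1024-byte limit, and the temp files are all hypothetical; it also assumes the inner iterator yields SplFileInfo objects, which RecursiveDirectoryIterator does by default.

```php
<?php
// Hypothetical filter: keeps directories and any file no larger than $maxBytes.
class MaxSizeFilter extends RecursiveFilterIterator {
    private $maxBytes;

    public function __construct(RecursiveIterator $it, int $maxBytes) {
        parent::__construct($it);
        $this->maxBytes = $maxBytes;
    }

    public function accept(): bool {
        $item = $this->current();
        return ! $item->isFile() || $item->getSize() <= $this->maxBytes;
    }

    // Overridden so that the size limit is passed down to child iterators;
    // the inherited version would call the constructor with one argument only.
    public function getChildren(): ?RecursiveFilterIterator {
        return new self($this->getInnerIterator()->getChildren(), $this->maxBytes);
    }
}

// Made-up files: one small, one too big for the limit.
$root = sys_get_temp_dir() . '/sizefilter_demo_' . uniqid();
mkdir($root . '/sub', 0777, true);
file_put_contents($root . '/small.txt', str_repeat('a', 10));
file_put_contents($root . '/sub/big.txt', str_repeat('b', 2048));

$directory = new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS);
$filter = new MaxSizeFilter($directory, 1024);

$names = [];
foreach (new RecursiveIteratorIterator($filter) as $file) {
    $names[] = $file->getFilename();
}
print_r($names);

// Tidy up the throwaway files.
unlink($root . '/small.txt');
unlink($root . '/sub/big.txt');
rmdir($root . '/sub');
rmdir($root);
```

Because it is just another RecursiveIterator, a filter like this can be stacked with DirnameFilter and FilenameFilter in the same way those two are stacked on each other.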

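P.S. Hmm, this answer babbles a bit; I tried to keep it as concise as possible (even removing vast swathes of super-babble). Apologies if the net result leaves the answer incoherent.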
