Making a large processing job smaller
This is the code I'm using as I work my way to a solution.
public function indexAction()
{
    // ID3 options
    $options = array("version" => 3.0, "encoding" => Zend_Media_Id3_Encoding::ISO88591, "compat" => true);
    // path to collection (currently approx. 2000 files)
    $path = APPLICATION_PATH . '/../public/Media/Music/';
    // inner iterator
    $dir = new RecursiveDirectoryIterator($path, RecursiveDirectoryIterator::SKIP_DOTS);
    // outer iterator
    $iterator = new RecursiveIteratorIterator($dir, RecursiveIteratorIterator::SELF_FIRST);
    foreach ($iterator as $file) {
        if (!$file->isDir() && $file->getExtension() === 'mp3') {
            // real path to mp3 file
            $filePath = $file->getRealPath();
            Zend_Debug::dump($filePath); // current results: accepted path, no errors
            $id3 = new Zend_Media_Id3v2($filePath, $options);
            // reset per file so text frames from a previous file do not carry over
            $data = array();
            foreach ($id3->getFramesByIdentifier("T*") as $frame) {
                $data[$frame->identifier] = $frame->text;
            }
            Zend_Debug::dump($data); // currently can scan the whole collection without timing out, but APIC data is not being processed
        }
    }
}
The problem: Process a file system of mp3 files in multiple directories. Extract id3 tag data to a database (3 tables) and extract the cover image from the tag to a separate file.
I can handle the actual extraction and data handling. My issue is with output.
With the way that Zend Framework 1.x handles output buffering, outputting an indicator that the files are being processed is difficult. In an old style PHP script, without output buffering, you could print out a bit of html with every iteration of the loop and have some indication of progress.
I would like to be able to process each album's directory, output the results, and then continue on to the next album's directory, requiring user intervention only on certain errors.
Any help would be appreciated.
Javascript is not the solution I'm looking for. I feel that this should be possible within the constructs of PHP and a ZF 1 MVC.
I'm doing this mostly for my own enlightenment, and it seems like a good way to learn some important concepts.
OK, how about some ideas on how to break this down into smaller chunks? Process one chunk, commit, process the next chunk, that kind of thing. In or out of ZF.
I'm beginning to see the problem with what I'm trying to accomplish. It seems that output buffering is not just happening in ZF, it's happening everywhere from ZF all the way to the browser. Hmmmmm...
Recommended answer
Introduction
This is a typical example of what you should not do, because:
- You are trying to parse ID3 tags with PHP, which is slow, and parsing many files in a single pass will make it even slower.
- RecursiveDirectoryIterator will walk every file in the folder and its subfolders, and from what I can see there is no limit. It can be 2,000 files today, then 100,000 the next day. Total processing time is unpredictable, and in some cases this could take hours.
- There is a high dependence on a single file system. With your current architecture the files are stored on the local system, so it would be difficult to split the files up and do proper load balancing.
- You are not checking whether a file's information has been extracted before, which results in duplicated looping and extraction.
- There is no locking system. This means the process can be started several times simultaneously, resulting in generally slow performance on the server.
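To illustrate the locking point, here is a minimal sketch of guarding the import with an advisory file lock via PHP's flock(). The function name runExclusively and the lock file path are assumptions made up for this example, not part of ZF or of the original code.

```php
<?php
/**
 * Run $job only if no other process currently holds the lock.
 * Returns true if the job ran, false if another run is in progress.
 * (Hypothetical helper, named for this example only.)
 */
function runExclusively($lockPath, callable $job)
{
    $handle = fopen($lockPath, 'c'); // create if missing, never truncate
    if ($handle === false) {
        throw new RuntimeException("Cannot open lock file: $lockPath");
    }
    // LOCK_NB makes flock() return immediately instead of waiting.
    if (!flock($handle, LOCK_EX | LOCK_NB)) {
        fclose($handle);
        return false;
    }
    try {
        $job();
    } finally {
        // Release the lock even if the job throws.
        flock($handle, LOCK_UN);
        fclose($handle);
    }
    return true;
}

// Assumed lock location; any path writable by the PHP process works.
$lockPath = sys_get_temp_dir() . '/id3-import.lock';

$ran = runExclusively($lockPath, function () {
    // ... scan the collection here ...
});
echo $ran ? "import finished\n" : "another import is already running\n";
```

A second request that arrives while the first is still scanning gets false back straight away, so it can report that an import is already running instead of starting a duplicate scan.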
My advice is not to use a loop or RecursiveDirectoryIterator to process the files in bulk.
Target each file as soon as it is uploaded or transferred to the server. That way you only work with one file at a time, and the processing time is spread out.
Your problem is exactly what job queues are designed for. You are also not limited to implementing the parsing in PHP; you can take advantage of C or C++ for performance.
Advantages
- Transfer jobs to other machines or processes that are better suited to do the work
- It allows you to do work in parallel and to load-balance processing
- Reduce the latency of page views in high-volume web applications by running time-consuming tasks asynchronously
- Multiple languages: client in PHP, server in C
Examples tested
- ZeroMQ
- Gearman
- Beanstalkd
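Expected process: client
- Connect to the job queue, e.g. Gearman
- Connect to a database, e.g. MongoDB or Redis
- Loop through the folder path
- Check the file extension
- If the file is an mp3, generate a file hash, e.g. sha1_file
- Check whether the file has already been sent for processing
- Send the hash and file to the job server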
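The client steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: queueNewMp3s is a made-up name, the $seen array stands in for the database lookup, and $enqueue stands in for the job queue call (with the Gearman extension it could wrap GearmanClient::doBackground()). The JSON payload format is likewise an assumption.

```php
<?php
/**
 * Walk $root and hand every not-yet-seen mp3 to $enqueue.
 *
 * $seen    : hash => path map standing in for the database (MongoDB, Redis, ...)
 * $enqueue : callable(string $payload), standing in for the job queue client
 * Returns the number of files queued.
 */
function queueNewMp3s($root, array &$seen, callable $enqueue)
{
    // Default mode is LEAVES_ONLY, so directories themselves are not yielded.
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    $queued = 0;
    foreach ($files as $file) {
        if (!$file->isFile() || strtolower($file->getExtension()) !== 'mp3') {
            continue;
        }
        $path = $file->getRealPath();
        $hash = sha1_file($path);
        if ($hash === false || isset($seen[$hash])) {
            continue; // unreadable, or already sent for processing
        }
        $seen[$hash] = $path;
        // Assumed payload format: a small JSON document with hash and path.
        $enqueue(json_encode(array('hash' => $hash, 'path' => $path)));
        $queued++;
    }
    return $queued;
}
```

Because the check is keyed on the content hash rather than the path, a file that is renamed or moved is not queued again, while running the client twice over an unchanged collection queues nothing the second time.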
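Expected process: server
- Connect to the job queue, e.g. Gearman
- Connect to a database, e.g. MongoDB or Redis
- Receive the hash / file
- Extract the ID3 tags
- Update the database with the ID3 tag information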
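The server side of those steps might look something like the sketch below. Again, this is hedged: handleId3Job is a hypothetical name, $extractTags is a stand-in for whatever ID3 reader you use (for example a wrapper around Zend_Media_Id3v2), $store stands in for the database, and the JSON payload with hash and path keys is an assumed format. With the Gearman extension, a function like this could be registered through GearmanWorker::addFunction() and fed from GearmanJob::workload().

```php
<?php
/**
 * Handle one job payload produced by the client.
 *
 * $extractTags : callable(string $path): array, stand-in for the ID3 reader
 * $store       : hash => tags map standing in for the database
 * Returns true if the file was processed, false if it was skipped.
 */
function handleId3Job($payload, callable $extractTags, array &$store)
{
    $job = json_decode($payload, true);
    if (!is_array($job) || !isset($job['hash'], $job['path'])) {
        throw new InvalidArgumentException('Malformed job payload');
    }
    if (isset($store[$job['hash']])) {
        return false; // tags for this content were already stored
    }
    if (!is_file($job['path'])) {
        return false; // file was moved or deleted after it was queued
    }
    // Extract the tags and record them under the content hash.
    $store[$job['hash']] = $extractTags($job['path']);
    return true;
}
```

Keeping the handler free of any queue-specific types means the same function can be called from a Gearman worker, a Beanstalkd consumer, or a plain command-line script while you experiment.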
Finally, this processing can be done on multiple servers in parallel.