How do I asynchronously load data from a large file in Qt?
I'm using Qt 5.2.1 to implement a program that reads data from a file (anywhere from a few bytes to a few GB) and visualises that data in a way that depends on every byte. My example here is a hex viewer.
One object does the reading and emits a dataRead() signal each time it has read a new block of data. The signal carries a pointer to a QByteArray, like so:
void FileReader::startReading()
{
    /* Object state code here... */
    {
        QFile inFile(fileName);
        if (!inFile.open(QIODevice::ReadOnly))
        {
            changeState(STARTED, State(ERROR, QString()));
            return;
        }
        while(!inFile.atEnd())
        {
            QByteArray *qa = new QByteArray(inFile.read(DATA_SIZE));
            qDebug() << "emitting dataRead()";
            emit dataRead(qa);
        }
    }
    /* Emit EOF signal */
}
The viewer has its loadData slot connected to this signal, and this is the function that displays the data:
void HexViewer::loadData(QByteArray *data)
{
    QString hexString = data->toHex();
    for (int i = 0; i < hexString.length(); i+=2)
    {
        _ui->hexTextView->insertPlainText(hexString.at(i));
        _ui->hexTextView->insertPlainText(hexString.at(i+1));
        _ui->hexTextView->insertPlainText(" ");
    }
    delete data;
}
The first problem is that if this is run as-is, the GUI thread becomes completely unresponsive. All of the dataRead() signals are emitted before the GUI is ever redrawn.
(The full code can be run, and with a file bigger than about 1 kB you will see this behaviour.)
Going by the response to my forum post "Non-blocking local file IO in Qt5" and the answer to another Stack Overflow question, "How to do async file io in qt?", the answer is: use threads. But neither of these answers goes into any detail on how to shuffle the data itself around, or on how to avoid common errors and pitfalls.
If the data were small (on the order of a hundred bytes) I'd just emit it with the signal. But if the file is gigabytes in size, or (edit) if it sits on a network-based filesystem such as an NFS or Samba share, I don't want the UI to lock up just because reading the file blocks.
The second problem is that the mechanics of using new in the emitter and delete in the receiver seem a bit naive: I'm effectively using the entire heap as a cross-thread queue.
Question 1: Does Qt have a better or more idiomatic way to move data across threads while limiting memory consumption? Does it have a thread-safe queue or other structures that can simplify this whole thing?
Question 2: Do I have to implement the threading etc. myself? I'm not a huge fan of reinventing wheels, especially where memory management and threading are concerned. Are there higher-level constructs that already do this, like there are for network transport?
Recommended answer
First of all, you don't have any multithreading in your app at all. Your FileReader class is a subclass of QThread, but that does not mean that all FileReader methods are executed in another thread. In fact, all your operations are performed in the main (GUI) thread.
FileReader should be a QObject subclass, not a QThread subclass. You then create a plain QThread object and move your worker (the reader) to it using QObject::moveToThread. You can read about this technique here.
Make sure you have registered the FileReader::State type using qRegisterMetaType. Qt needs this to pass values of a custom type through queued signal-slot connections, which is how signals are delivered across different threads.
Example:
HexViewer::HexViewer(QWidget *parent) :
    QMainWindow(parent),
    _ui(new Ui::HexViewer),
    _fileReader(new FileReader())
{
    qRegisterMetaType<FileReader::State>("FileReader::State");
    QThread *readerThread = new QThread(this);
    readerThread->setObjectName("ReaderThread");
    connect(readerThread, SIGNAL(finished()),
            _fileReader, SLOT(deleteLater()));
    _fileReader->moveToThread(readerThread);
    readerThread->start();
    _ui->setupUi(this);
    ...
}
void HexViewer::on_quitButton_clicked()
{
    _fileReader->thread()->quit();
    _fileReader->thread()->wait();
    qApp->quit();
}
Also, it is not necessary to allocate the data on the heap here:
while(!inFile.atEnd())
{
    QByteArray *qa = new QByteArray(inFile.read(DATA_SIZE));
    qDebug() << "emitting dataRead()";
    emit dataRead(qa);
}
QByteArray uses implicit sharing. This means that its contents are not copied again and again when you pass a QByteArray object between functions in a read-only manner.
Change the code above to this and forget about manual memory management:
while(!inFile.atEnd())
{
    QByteArray qa = inFile.read(DATA_SIZE);
    qDebug() << "emitting dataRead()";
    emit dataRead(qa);
}
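To make the idea of implicit sharing concrete, here is a minimal copy-on-write sketch in plain C++. SharedBytes and its members are hypothetical names invented for this illustration; this is not how QByteArray is implemented (Qt uses its own atomic reference counting), and the sketch is not thread-safe. It only mimics the observable behaviour: copies are cheap, and a deep copy happens only when a shared object is written to.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for an implicitly shared byte buffer.
// NOT QByteArray's real implementation; single-threaded illustration only.
class SharedBytes {
public:
    explicit SharedBytes(std::vector<unsigned char> bytes)
        : d_(std::make_shared<std::vector<unsigned char>>(std::move(bytes))) {}

    // The compiler-generated copy constructor copies only the shared_ptr,
    // so passing a SharedBytes by value does not duplicate the payload.

    std::size_t size() const { return d_->size(); }
    unsigned char at(std::size_t i) const { return d_->at(i); }

    // Writing detaches first: the payload is deep-copied only if it is shared.
    void set(std::size_t i, unsigned char value) {
        detach();
        d_->at(i) = value;
    }

    // True while both objects still point at the same payload.
    bool sharesDataWith(const SharedBytes &other) const { return d_ == other.d_; }

private:
    void detach() {
        if (d_.use_count() > 1)
            d_ = std::make_shared<std::vector<unsigned char>>(*d_);
    }

    std::shared_ptr<std::vector<unsigned char>> d_;
};
```

In this sketch, a copy shares its data with the original until one of them calls set(). That is the same reason emitting a QByteArray by value is cheap as long as the receiver only reads it.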
But anyway, the main problem is not with multithreading. The problem is that the QTextEdit::insertPlainText operation is not cheap, especially when you have a huge amount of data. FileReader reads the file data pretty quickly and then floods your widget with new portions of data to display.
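One way to keep a fast reader from flooding a slow consumer, and to cap memory use as Question 1 asks, is a bounded queue: the producer blocks once a fixed number of chunks is waiting. The following is a sketch in standard C++ rather than a Qt API. BoundedQueue, push and tryPop are names made up for this example, and QMutex, QWaitCondition or QSemaphore could presumably play the same roles in a Qt-only version.

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>

// Hypothetical bounded producer/consumer queue (not part of Qt).
template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {}

    // Called by the reader thread: blocks while the queue is full,
    // so at most capacity_ chunks are ever held in memory.
    void push(T item) {
        std::unique_lock<std::mutex> lock(mutex_);
        notFull_.wait(lock, [this] { return items_.size() < capacity_; });
        items_.push_back(std::move(item));
    }

    // Called by the consumer: never blocks, so it is safe to call from
    // a GUI thread (for example from a timer). Returns false when empty.
    bool tryPop(T &out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.empty())
            return false;
        out = std::move(items_.front());
        items_.pop_front();
        notFull_.notify_one(); // wake a producer waiting for free space
        return true;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return items_.size();
    }

private:
    const std::size_t capacity_;
    mutable std::mutex mutex_;
    std::condition_variable notFull_;
    std::deque<T> items_;
};
```

The design choice here is that only the producer ever waits. The consumer polls with tryPop at its own pace, so the display code decides how quickly data arrives instead of being driven by the read loop.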
It must be noted that your implementation of HexViewer::loadData is very inefficient. You insert the text character by character, which makes QTextEdit constantly redraw its contents and freezes the GUI.
You should prepare the resulting hex string first (note that the data parameter is no longer a pointer):
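void HexViewer::loadData(QByteArray data)
{
    // Two hex digits per byte, e.g. "deadbeef".
    QString tmp = data.toHex();
    QString hexString;
    // Each pair of digits gains one trailing space: 3 characters per byte.
    hexString.reserve(tmp.size() / 2 * 3);
    const int hexLen = 2;
    for (int i = 0; i < tmp.size(); i += hexLen)
    {
        hexString.append(tmp.mid(i, hexLen) + " ");
    }
    // One insertion for the whole block instead of three per byte.
    _ui->hexTextView->insertPlainText(hexString);
}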
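The same "build the whole string first" idea can be sketched without Qt types. The helper below is hypothetical (toSpacedHex is not a Qt or standard function); it formats a byte buffer in a single pass, with no intermediate hex string and no per-byte temporaries.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: formats bytes as lowercase hex, each followed by a space.
std::string toSpacedHex(const std::vector<unsigned char> &bytes) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    out.reserve(bytes.size() * 3); // two digits plus one space per byte
    for (unsigned char b : bytes) {
        out.push_back(digits[b >> 4]);   // high nibble
        out.push_back(digits[b & 0x0F]); // low nibble
        out.push_back(' ');
    }
    return out;
}
```

For the bytes 0xde 0xad 0x00 0x0f this produces "de ad 00 0f " (including the trailing space), matching the layout the loop above builds.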
Anyway, the bottleneck of your application is not file reading but updating the QTextEdit. Loading the data in chunks and then appending each chunk to the widget using QTextEdit::insertPlainText will not speed anything up. For files smaller than 1 MB it is faster to read the whole file at once and then set the resulting text on the widget in a single step.
I suppose you can't easily display huge texts larger than several megabytes using the default Qt widgets. This task requires a non-trivial approach that in general has nothing to do with multithreading or asynchronous data loading. It's all about creating a custom widget that doesn't try to display its huge contents all at once.
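The core of such a widget is working out which slice of the file is actually on screen, so that only that slice is read and formatted. Here is a minimal sketch of that calculation in plain C++, assuming a fixed number of bytes per row. ByteRange, visibleRange and all parameter names are hypothetical and not taken from the original code.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical result type: the slice of the file that is currently visible.
struct ByteRange {
    std::int64_t offset; // first byte to load
    std::int64_t length; // number of bytes to load
};

// Maps a scroll position (first visible row) to the byte range to display.
ByteRange visibleRange(std::int64_t fileSize, std::int64_t firstVisibleRow,
                       std::int64_t visibleRows, std::int64_t bytesPerRow) {
    if (fileSize <= 0 || visibleRows <= 0 || bytesPerRow <= 0)
        return {0, 0};

    // Number of rows needed for the whole file, rounding up.
    const std::int64_t totalRows = (fileSize + bytesPerRow - 1) / bytesPerRow;

    // Clamp the requested row into [0, totalRows - 1].
    const std::int64_t row =
        std::max<std::int64_t>(0, std::min(firstVisibleRow, totalRows - 1));

    const std::int64_t offset = row * bytesPerRow;
    // Never read past the end of the file.
    const std::int64_t length =
        std::min(visibleRows * bytesPerRow, fileSize - offset);
    return {offset, length};
}
```

With a 100-byte file, 16 bytes per row and 3 visible rows, scrolling to row 2 gives offset 32 and length 48, while scrolling to the last row (6) gives offset 96 and length 4. A viewer built this way would seek to the offset, read only that many bytes, and format only those, so the work per repaint stays constant whatever the file size.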