Error loading svmlight format data

2022-01-15 00:00:00 python format import load svmlight

Question

When I try to use the svmlight Python package with data I have already converted to svmlight format, I get an error. It should be pretty basic, and I don't understand what's happening. Here's the code:

import svmlight
training_data = open('thedata', "w")
model=svmlight.learn(training_data, type='classification', verbosity=0)

I also tried:

training_data = numpy.load('thedata')

training_data = __import__('thedata')


Solution

One obvious problem is that you are truncating your data file when you open it, because you are specifying write mode "w". The file is emptied as soon as it is opened, so there is no data left to read.
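The truncation is easy to confirm in isolation. The following is a minimal sketch using a temporary file (the file contents are made up for illustration); it shows that opening in read mode leaves the file alone, while opening with "w" empties it before anything is written:

```python
import os
import tempfile

# Create a throwaway file holding one line of svmlight-style data (made-up contents).
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "w") as f:
    f.write("+1 1:0.5 2:1.0\n")

size_before = os.path.getsize(path)

# Opening in read mode leaves the contents intact.
with open(path, "r") as f:
    contents = f.read()

# Opening in write mode truncates the file immediately, even if nothing is written.
with open(path, "w"):
    pass

size_after = os.path.getsize(path)
os.remove(path)

print(size_before, size_after)
```

The second number printed is the file size after the "w" open, which is what the original code would have left behind.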

Anyway, you don't need to read the file like that. If your data file is like the one in this example, it is a Python file, so you need to import it instead. This should work:

import svmlight
from data import train0 as training_data    # assuming your data file is named data.py
# or you could use __import__()
#training_data = __import__('data').train0

model = svmlight.learn(training_data, type='classification', verbosity=0)

You might want to compare your data against that of the example.

Edit after the data file format was clarified

The input file needs to be parsed into a list of tuples like this:

[(target, [(feature_1, value_1), (feature_2, value_2), ... (feature_n, value_n)]),
 (target, [(feature_1, value_1), (feature_2, value_2), ... (feature_n, value_n)]),
 ...
]
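For concreteness, here is what that structure might look like for a tiny two-example dataset. The values are invented for illustration, and `check_training_data` is a hypothetical helper (it is not part of the svmlight package) that only checks the shape described above:

```python
# Two made-up training examples: a target followed by (feature_id, value) pairs.
training_data = [
    (1, [(1, 0.5), (3, 1.0)]),
    (-1, [(2, 0.25), (3, 0.75)]),
]


def check_training_data(data):
    """Hypothetical helper: True if data is a list of (target, [(feature, value), ...])."""
    for example in data:
        # Each example must be a 2-tuple of (target, features).
        if not (isinstance(example, tuple) and len(example) == 2):
            return False
        target, features = example
        if not isinstance(target, (int, float)):
            return False
        # Each feature must be a 2-tuple of (feature_id, value).
        for pair in features:
            if not (isinstance(pair, tuple) and len(pair) == 2):
                return False
    return True


print(check_training_data(training_data))
```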

The svmlight package does not appear to support reading from a file in the SVM file format, and it has no parsing functions, so the parsing has to be implemented in Python. SVM files look like this:

<target> <feature>:<value> <feature>:<value> ... <feature>:<value> # <info>
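To make the layout concrete, the sketch below renders one example in that format. The sample values are made up, and `format_line` is a hypothetical helper written only for this illustration, not something the svmlight package provides:

```python
def format_line(target, features, info=None):
    """Hypothetical helper: render one example as a line in the layout shown above."""
    parts = [str(target)]
    # Each feature becomes "<feature>:<value>".
    parts.extend("{}:{}".format(feature, value) for feature, value in features)
    line = " ".join(parts)
    # The optional "# <info>" part goes at the end of the line.
    if info is not None:
        line += " # " + info
    return line


line = format_line(-1, [(1, 0.43), (3, 0.12), (9284, 0.2)], info="abcdef")
print(line)  # -1 1:0.43 3:0.12 9284:0.2 # abcdef
```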

So here is a parser that converts from the file format to the format required by the svmlight package:

def svm_parse(filename):

    def _convert(t):
        """Convert feature and value to appropriate types"""
        return (int(t[0]), float(t[1]))

    with open(filename) as f:
        for line in f:
            # Drop comments, whether they fill the line or trail the data
            line = line.split('#')[0].strip()
            if not line:
                continue  # skip blank lines and comment-only lines
            data = line.split()
            target = float(data[0])
            features = [_convert(feature.split(':')) for feature in data[1:]]
            yield (target, features)

You can use it like this:

import svmlight

training_data = list(svm_parse('thedata'))
model=svmlight.learn(training_data, type='classification', verbosity=0)
