Support Vector Machine
Overview
SVMlight is an implementation of Support Vector Machines (SVMs) in C. The main features of the program are the following:
- fast optimization algorithm
- working set selection based on steepest feasible descent
- “shrinking” heuristic
- caching of kernel evaluations
- use of folding in the linear case
- solves classification and regression problems. For multivariate and structured outputs use SVMstruct.
- solves ranking problems (e. g. learning retrieval functions in STRIVER search engine).
- computes XiAlpha-estimates of the error rate, the precision, and the recall
- efficiently computes Leave-One-Out estimates of the error rate, the precision, and the recall
- includes algorithm for approximately training large transductive SVMs (TSVMs) (see also Spectral Graph Transducer)
- can train SVMs with cost models and example dependent costs
- allows restarts from specified vector of dual variables
- handles many thousands of support vectors
- handles several hundred-thousands of training examples
- supports standard kernel functions and lets you define your own
- uses sparse vector representation
Description
SVMlight is an implementation of Vapnik’s Support Vector Machine [Vapnik, 1995] for the problem of pattern recognition, for the problem of regression, and for the problem of learning a ranking function. The optimization algorithms used in SVMlight are described in [Joachims, 2002a ]. [Joachims, 1999a]. The algorithm has scalable memory requirements and can handle problems with many thousands of support vectors efficiently.
The software also provides methods for assessing the generalization performance efficiently. It includes two efficient estimation methods for both error rate and precision/recall. XiAlpha-estimates [Joachims, 2002a, Joachims, 2000b] can be computed at essentially no computational expense, but they are conservatively biased. Almost unbiased estimates provides leave-one-out testing. SVMlight exploits that the results of most leave-one-outs (often more than 99%) are predetermined and need not be computed [Joachims, 2002a].
New in this version is an algorithm for learning ranking functions [Joachims, 2002c]. The goal is to learn a function from preference examples, so that it orders a new set of objects as accurately as possible. Such ranking problems naturally occur in applications like search engines and recommender systems.
Futhermore, this version includes an algorithm for training large-scale transductive SVMs. The algorithm proceeds by solving a sequence of optimization problems lower-bounding the solution using a form of local search. A detailed description of the algorithm can be found in [Joachims, 1999c]. A similar transductive learner, which can be thought of as a transductive version of k-Nearest Neighbor is the Spectral Graph Transducer.
SVMlight can also train SVMs with cost models (see [Morik et al., 1999]).
The code has been used on a large range of problems, including text classification [Joachims, 1999c][Joachims, 1998a], image recognition tasks, bioinformatics and medical applications. Many tasks have the property of sparse instance vectors. This implementation makes use of this property which leads to a very compact and efficient representation.
Other SVM Resources
- GMD-First Berlin
- Kernel-Machines Web Site
- Bell Labs
- Microsoft Research
- Royal Holloway College
- ANU Canberra
- MIT
- Bristol CI-Group
source: http://svmlight.joachims.org/