Commit a6526dca authored by Apertis package maintainers
Import Upstream version 0.5.0

sudo: required
dist: trusty
language: cpp
addons:
apt:
packages:
- libxml++2.6-dev
- libarchive-dev
script:
- autoreconf -fi
- ./configure --disable-static --enable-zhfst
- make -j3
- make -j1 check
AUTHORS 0 → 100644
Authors of HFST ospell
----------------------
This lists authors relevant for copyright issues. See also THANKS.
2012-2017, Erik Axelson <erik.axelson@helsinki.fi>
2010-2016, Sam Hardwick <sam.hardwick@gmail.com>
2015-2017, Tino Didriksen <mail@tinodidriksen.com>
2010-2016, Sjur Nørstebø Moshagen <sjur.n.moshagen@uit.no>
2010-2016, Tommi Pirinen <flammie@iki.fi>
2013, Francis M. Tyers <ftyers@prompsit.com>
COPYING 0 → 100644
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
INSTALL 0 → 100644
Installation Instructions
*************************
Copyright (C) 1994, 1995, 1996, 1999, 2000, 2001, 2002, 2004, 2005,
2006 Free Software Foundation, Inc.
This file is free documentation; the Free Software Foundation gives
unlimited permission to copy, distribute and modify it.
Basic Installation
==================
Briefly, the shell commands `./configure; make; make install' should
configure, build, and install this package. The following
more-detailed instructions are generic; see the `README' file for
instructions specific to this package.
The `configure' shell script attempts to guess correct values for
various system-dependent variables used during compilation. It uses
those values to create a `Makefile' in each directory of the package.
It may also create one or more `.h' files containing system-dependent
definitions. Finally, it creates a shell script `config.status' that
you can run in the future to recreate the current configuration, and a
file `config.log' containing compiler output (useful mainly for
debugging `configure').
It can also use an optional file (typically called `config.cache'
and enabled with `--cache-file=config.cache' or simply `-C') that saves
the results of its tests to speed up reconfiguring. Caching is
disabled by default to prevent problems with accidental use of stale
cache files.
If you need to do unusual things to compile the package, please try
to figure out how `configure' could check whether to do them, and mail
diffs or instructions to the address given in the `README' so they can
be considered for the next release. If you are using the cache, and at
some point `config.cache' contains results you don't want to keep, you
may remove or edit it.
The file `configure.ac' (or `configure.in') is used to create
`configure' by a program called `autoconf'. You need `configure.ac' if
you want to change it or regenerate `configure' using a newer version
of `autoconf'.
The simplest way to compile this package is:
1. `cd' to the directory containing the package's source code and type
`./configure' to configure the package for your system.
Running `configure' might take a while. While running, it prints
some messages telling which features it is checking for.
2. Type `make' to compile the package.
3. Optionally, type `make check' to run any self-tests that come with
the package.
4. Type `make install' to install the programs and any data files and
documentation.
5. You can remove the program binaries and object files from the
source code directory by typing `make clean'. To also remove the
files that `configure' created (so you can compile the package for
a different kind of computer), type `make distclean'. There is
also a `make maintainer-clean' target, but that is intended mainly
for the package's developers. If you use it, you may have to get
all sorts of other programs in order to regenerate files that came
with the distribution.
Compilers and Options
=====================
Some systems require unusual options for compilation or linking that the
`configure' script does not know about. Run `./configure --help' for
details on some of the pertinent environment variables.
You can give `configure' initial values for configuration parameters
by setting variables in the command line or in the environment. Here
is an example:
./configure CC=c99 CFLAGS=-g LIBS=-lposix
*Note Defining Variables::, for more details.
Compiling For Multiple Architectures
====================================
You can compile the package for more than one kind of computer at the
same time, by placing the object files for each architecture in their
own directory. To do this, you can use GNU `make'. `cd' to the
directory where you want the object files and executables to go and run
the `configure' script. `configure' automatically checks for the
source code in the directory that `configure' is in and in `..'.
With a non-GNU `make', it is safer to compile the package for one
architecture at a time in the source code directory. After you have
installed the package for one architecture, use `make distclean' before
reconfiguring for another architecture.
Installation Names
==================
By default, `make install' installs the package's commands under
`/usr/local/bin', include files under `/usr/local/include', etc. You
can specify an installation prefix other than `/usr/local' by giving
`configure' the option `--prefix=PREFIX'.
You can specify separate installation prefixes for
architecture-specific files and architecture-independent files. If you
pass the option `--exec-prefix=PREFIX' to `configure', the package uses
PREFIX as the prefix for installing programs and libraries.
Documentation and other data files still use the regular prefix.
In addition, if you use an unusual directory layout you can give
options like `--bindir=DIR' to specify different values for particular
kinds of files. Run `configure --help' for a list of the directories
you can set and what kinds of files go in them.
If the package supports it, you can cause programs to be installed
with an extra prefix or suffix on their names by giving `configure' the
option `--program-prefix=PREFIX' or `--program-suffix=SUFFIX'.
Optional Features
=================
Some packages pay attention to `--enable-FEATURE' options to
`configure', where FEATURE indicates an optional part of the package.
They may also pay attention to `--with-PACKAGE' options, where PACKAGE
is something like `gnu-as' or `x' (for the X Window System). The
`README' should mention any `--enable-' and `--with-' options that the
package recognizes.
For packages that use the X Window System, `configure' can usually
find the X include and library files automatically, but if it doesn't,
you can use the `configure' options `--x-includes=DIR' and
`--x-libraries=DIR' to specify their locations.
Specifying the System Type
==========================
There may be some features `configure' cannot figure out automatically,
but needs to determine by the type of machine the package will run on.
Usually, assuming the package is built to be run on the _same_
architectures, `configure' can figure that out, but if it prints a
message saying it cannot guess the machine type, give it the
`--build=TYPE' option. TYPE can either be a short name for the system
type, such as `sun4', or a canonical name which has the form:
CPU-COMPANY-SYSTEM
where SYSTEM can have one of these forms:
OS KERNEL-OS
See the file `config.sub' for the possible values of each field. If
`config.sub' isn't included in this package, then this package doesn't
need to know the machine type.
If you are _building_ compiler tools for cross-compiling, you should
use the option `--target=TYPE' to select the type of system they will
produce code for.
If you want to _use_ a cross compiler, that generates code for a
platform different from the build platform, you should specify the
"host" platform (i.e., that on which the generated programs will
eventually be run) with `--host=TYPE'.
Sharing Defaults
================
If you want to set default values for `configure' scripts to share, you
can create a site shell script called `config.site' that gives default
values for variables like `CC', `cache_file', and `prefix'.
`configure' looks for `PREFIX/share/config.site' if it exists, then
`PREFIX/etc/config.site' if it exists. Or, you can set the
`CONFIG_SITE' environment variable to the location of the site script.
A warning: not all `configure' scripts look for a site script.
Defining Variables
==================
Variables not defined in a site shell script can be set in the
environment passed to `configure'. However, some packages may run
configure again during the build, and the customized values of these
variables may be lost. In order to avoid this problem, you should set
them in the `configure' command line, using `VAR=value'. For example:
./configure CC=/usr/local2/bin/gcc
causes the specified `gcc' to be used as the C compiler (unless it is
overridden in the site shell script).
Unfortunately, this technique does not work for `CONFIG_SHELL' due to
an Autoconf bug. Until the bug is fixed you can use this workaround:
CONFIG_SHELL=/bin/bash /bin/bash ./configure CONFIG_SHELL=/bin/bash
`configure' Invocation
======================
`configure' recognizes the following options to control how it operates.
`--help'
`-h'
Print a summary of the options to `configure', and exit.
`--version'
`-V'
Print the version of Autoconf used to generate the `configure'
script, and exit.
`--cache-file=FILE'
Enable the cache: use and save the results of the tests in FILE,
traditionally `config.cache'. FILE defaults to `/dev/null' to
disable caching.
`--config-cache'
`-C'
Alias for `--cache-file=config.cache'.
`--quiet'
`--silent'
`-q'
Do not print messages saying which checks are being made. To
suppress all normal output, redirect it to `/dev/null' (any error
messages will still be shown).
`--srcdir=DIR'
Look for the package's source code in directory DIR. Usually
`configure' can determine that directory automatically.
`configure' also accepts some other, not widely useful, options. Run
`configure --help' for more details.
## Process this file with automake to produce Makefile.in
# Copyright 2010 University of Helsinki
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# to silence:
# libtoolize: Consider adding `-I m4' to ACLOCAL_AMFLAGS in Makefile.am.
ACLOCAL_AMFLAGS=-I m4
# targets
if EXTRA_DEMOS
CONFERENCE_DEMOS=hfst-ospell-norvig hfst-ospell-fsmnlp-2012 hfst-ospell-cicling\
hfst-ospell-survey hfst-ospell-lrec2013 hfst-ispell
endif # EXTRA_DEMOS
if HFST_OSPELL_OFFICE
MAYBE_HFST_OSPELL_OFFICE=hfst-ospell-office
endif # HFST_OSPELL_OFFICE
bin_PROGRAMS=hfst-ospell $(MAYBE_HFST_OSPELL_OFFICE) $(CONFERENCE_DEMOS)
lib_LTLIBRARIES=libhfstospell.la
man1_MANS=hfst-ospell.1 hfst-ospell-office.1
PKG_LIBS=
PKG_CXXFLAGS=
if WANT_ARCHIVE
PKG_LIBS+=$(LIBARCHIVE_LIBS)
PKG_CXXFLAGS+=$(LIBARCHIVE_CFLAGS)
endif
if WANT_LIBXMLPP
PKG_LIBS+=$(LIBXMLPP_LIBS)
PKG_CXXFLAGS+=$(LIBXMLPP_CFLAGS)
endif
if WANT_TINYXML2
PKG_LIBS+=$(TINYXML2_LIBS)
PKG_CXXFLAGS+=$(TINYXML2_CFLAGS)
endif
# library parts
libhfstospell_la_SOURCES=hfst-ol.cc ospell.cc \
ZHfstOspeller.cc ZHfstOspellerXmlMetadata.cc
libhfstospell_la_CXXFLAGS=$(AM_CXXFLAGS) $(CXXFLAGS) $(PKG_CXXFLAGS)
libhfstospell_la_LDFLAGS=-no-undefined -version-info 10:0:0 \
$(PKG_LIBS)
# link sample program against library here
hfst_ospell_SOURCES=main.cc
hfst_ospell_LDADD=libhfstospell.la
hfst_ospell_CXXFLAGS=$(AM_CXXFLAGS) $(CXXFLAGS) \
$(PKG_CXXFLAGS)
if HFST_OSPELL_OFFICE
hfst_ospell_office_SOURCES=office.cc
hfst_ospell_office_LDADD=libhfstospell.la
hfst_ospell_office_CXXFLAGS=$(AM_CXXFLAGS) $(CXXFLAGS) $(PKG_CXXFLAGS)
endif # HFST_OSPELL_OFFICE
if EXTRA_DEMOS
hfst_ospell_norvig_SOURCES=main-norvig.cc
hfst_ospell_norvig_LDADD=libhfstospell.la
hfst_ospell_norvig_CXXFLAGS=$(AM_CXXFLAGS) $(CXXFLAGS) \
$(PKG_CXXFLAGS)
hfst_ospell_cicling_SOURCES=main-cicling.cc
hfst_ospell_cicling_LDADD=libhfstospell.la
hfst_ospell_cicling_CXXFLAGS=$(AM_CXXFLAGS) $(CXXFLAGS) \
$(PKG_CXXFLAGS)
hfst_ospell_lrec2013_SOURCES=main-lrec2013.cc
hfst_ospell_lrec2013_LDADD=libhfstospell.la
hfst_ospell_lrec2013_CXXFLAGS=$(AM_CXXFLAGS) $(CXXFLAGS) \
$(PKG_CXXFLAGS)
hfst_ospell_survey_SOURCES=main-survey.cc
hfst_ospell_survey_LDADD=libhfstospell.la
hfst_ospell_survey_CXXFLAGS=$(AM_CXXFLAGS) $(CXXFLAGS) \
$(PKG_CXXFLAGS)
hfst_ospell_fsmnlp_2012_SOURCES=main-fsmnlp-2012.cc
hfst_ospell_fsmnlp_2012_LDADD=libhfstospell.la
hfst_ospell_fsmnlp_2012_CXXFLAGS=$(AM_CXXFLAGS) $(CXXFLAGS) \
$(PKG_CXXFLAGS)
endif # EXTRA_DEMOS
if EXTRA_DEMOS
hfst_ispell_SOURCES=main-ispell.cc
hfst_ispell_LDADD=libhfstospell.la
hfst_ispell_CXXFLAGS=$(AM_CXXFLAGS) $(CXXFLAGS) \
$(PKG_CXXFLAGS)
endif # EXTRA_DEMOS
# install headers for library in hfst's includedir
include_HEADERS=hfst-ol.h ospell.h ol-exceptions.h \
ZHfstOspeller.h ZHfstOspellerXmlMetadata.h \
hfstol-stdafx.h
# pkgconfig
pkgconfigdir=$(libdir)/pkgconfig
pkgconfig_DATA=hfstospell.pc
# tests
TESTS=tests/basic-zhfst.sh tests/basic-edit1.sh \
tests/empty-descriptions.sh tests/empty-titles.sh tests/empty-locale.sh \
tests/trailing-spaces.sh tests/bad-errormodel.sh tests/empty-zhfst.sh \
tests/analyse-spell.sh tests/no-errormodel.sh
if WANT_ARCHIVE
XFAIL_TESTS=tests/empty-descriptions.sh tests/empty-titles.sh tests/empty-locale.sh tests/empty-zhfst.sh
else
XFAIL_TESTS=tests/empty-descriptions.sh tests/empty-titles.sh tests/empty-locale.sh tests/empty-zhfst.sh \
tests/basic-zhfst.sh tests/basic-edit1.sh tests/trailing-spaces.sh tests/bad-errormodel.sh \
tests/analyse-spell.sh tests/no-errormodel.sh
endif
if CAN_DOXYGEN
doxygen:
$(DOXYGEN)
endif
EXTRA_DIST=hfst-ospell.1 hfst-ospell-office.1 tests/basic-zhfst.sh tests/basic-edit1.sh \
tests/empty-descriptions.sh tests/empty-titles.sh tests/empty-locale.sh \
tests/trailing-spaces.sh tests/bad-errormodel.sh tests/empty-zhfst.sh \
tests/analyse-spell.sh tests/no-errormodel.sh \
tests/empty-descriptions.sh tests/empty-titles.sh tests/empty-locale.sh tests/empty-zhfst.sh \
tests/acceptor.basic.txt tests/analyser.default.txt tests/errmodel.basic.txt tests/errmodel.edit1.txt tests/errmodel.extrachars.txt \
tests/test.strings \
tests/bad_errormodel.zhfst tests/empty_descriptions.zhfst tests/empty_locale.zhfst tests/empty_titles.zhfst tests/no_errormodel.zhfst \
tests/speller_analyser.zhfst tests/speller_basic.zhfst tests/speller_edit1.zhfst tests/trailing_spaces.zhfst \
tests/basic_test.xml tests/empty_descriptions.xml tests/empty_locale.xml tests/empty_titles.xml tests/no_errmodel.xml tests/trailing_spaces.xml
NEWS 0 → 100644
NEWS for hfst-ospell
====================
This file contains all noteworthy changes in HFST-ospell development between
releases. For a full listing of changes, see the ChangeLog.
Noteworthy changes in 0.5.0
---------------------------
* rename hfst_ol namespace to hfst_ospell to avoid conflicts
* improve distinguishing between lemmas and tags in analysis
* fix issue #37
* avoid shadowing multicharacter ascii-beginning symbols
* use minimal XML parsing to get locale, title, and description for Voikko and other frontends
Noteworthy changes in 0.4.5
---------------------------
* this is a bugfix release
Noteworthy changes in 0.4.4
---------------------------
* restructure order of files
* remove HFST dependency
* fix issue #26
* check for failures in archive extraction
* fix dll issues on windows
* allow building without XML support
Noteworthy changes in 0.4.3
---------------------------
* fixes for big endian conversions
* use max version for tinyxml
Noteworthy changes in 0.4.2
---------------------------
* small modifications to tests and documentation
Noteworthy changes in 0.4.1
---------------------------
* set time cutoff to 6.0 seconds for ospell-office
* minor bug fixes
Noteworthy changes in 0.4.0
---------------------------
* add option --beam to hfst-ospell for restricting the search to a margin above the optimum
* add option --time-cutoff to hfst-ospell for restricting the time spent on searching for better corrections
* add option --enable-hfst-ospell-office to configure (defaults to yes)
* use libarchive2 if libarchive3 is not available
* determine and use highest supported C++ standard when compiling
* support unknown and identity arcs both in the lexicon and error model
Noteworthy changes in 0.3.0
---------------------------
* New API for analysing and suggesting
* Moved code from headers to implementation files (API change)
* Added Doxygen to mark stable API
* Fixes for bad and malformed metadata handling
* Limiting number of suggestions now works
Noteworthy changes in 0.2.5
---------------------------
* optional support for tinyxml2
* preliminary support for two-tape automata and *analysis* lookup
* conference demos are no longer built by default
* libarchive newer than 3 allowed
Noteworthy changes in 0.2.4
---------------------------
* renamed the package to hfstospell (from hfst-ospell); the previous rename caused
build issues.
Noteworthy changes in 0.2.3
---------------------------
* fixed a bug that caused certain types of paths with flag diacritics not to
be accepted.
Noteworthy changes in 0.2.2
---------------------------
* Memory and speed improvements; data structures for automaton changed
* Tests and bug fixes for building
Noteworthy changes in 0.2.1
---------------------------
* Added support for extracting zipped transducer collections to memory instead
of temporary files
* Changed from libxml to libxml++ for XML parsing
Noteworthy changes in 0.2.0
---------------------------
* Added support for zipped XML based transducer collection format.
* Few new frontends for various experiments
* Lots of metadata everywhere
Noteworthy changes in 0.1.1
---------------------------
* Added autoconfiscation to avoid bugs like missing Makefile in tarball
Noteworthy changes in 0.1
-------------------------
* First release
README 0 → 100644
# Hfst-ospell library and toy commandline tester
This is a minimal spell-checker library based on the HFST optimized-lookup
format, together with a demonstration command-line spell checker. The
library is licensed under the Apache License, version 2.0; other licences can be
obtained from the University of Helsinki.
[![Build Status](https://travis-ci.org/hfst/hfst-ospell.svg?branch=master)](https://travis-ci.org/hfst/hfst-ospell)
## Dependencies
- libxml++2
- libarchive
## Debian packages for dependencies
- libxml++2-dev
- libarchive-dev
## Usage
Usage in external programs:
#include <ospell.h>
and compile your project with:
$(pkg-config --cflags hfstospell)
and link with:
$(pkg-config --libs hfstospell)
## Programming examples
The library lives in a namespace called hfst_ospell. Pass (weighted!) Transducer
pointers to the Speller constructor, e.g.:
FILE * error_source = fopen(error_filename, "r");
FILE * lexicon_file = fopen(lexicon_filename, "r");
hfst_ospell::Transducer * error;
hfst_ospell::Transducer * lexicon;
try {
error = new hfst_ospell::Transducer(error_source);
lexicon = new hfst_ospell::Transducer(lexicon_file);
} catch (hfst_ospell::TransducerParsingException& e) {
/* problem with transducer file, usually completely
different type of file - there's no magic number
in the header to check for this */
}
hfst_ospell::Speller * speller;
try {
speller = new hfst_ospell::Speller(error, lexicon);
} catch (hfst_ospell::AlphabetTranslationException& e) {
/* problem with translating between the two alphabets */
}
And use the functions:
// returns true if line is found in lexicon
bool hfst_ospell::Speller::check(char * line);
// CorrectionQueue is a priority queue, sorted by weight
hfst_ospell::CorrectionQueue hfst_ospell::Speller::correct(char * line);
to communicate with it. See main.cc for a concrete usage example.
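Putting the pieces together, a minimal stand-alone checker in the spirit of
main.cc could look roughly like this (an illustrative sketch only: error
handling is abbreviated and the transducer files are whatever error model and
lexicon you have at hand):

    #include <cstdio>
    #include <iostream>
    #include <ospell.h>

    int main(int argc, char** argv)
    {
        if (argc != 4)
        {
            std::cerr << "Usage: " << argv[0]
                      << " ERRORSOURCE LEXICON WORD" << std::endl;
            return 1;
        }
        FILE* error_source = fopen(argv[1], "r");
        FILE* lexicon_file = fopen(argv[2], "r");
        hfst_ospell::Transducer* error =
            new hfst_ospell::Transducer(error_source);
        hfst_ospell::Transducer* lexicon =
            new hfst_ospell::Transducer(lexicon_file);
        hfst_ospell::Speller* speller =
            new hfst_ospell::Speller(error, lexicon);
        char* word = argv[3];
        if (speller->check(word))
        {
            std::cout << word << " is in the lexicon" << std::endl;
        }
        else
        {
            // CorrectionQueue is ordered by weight, best suggestion first
            hfst_ospell::CorrectionQueue corrections = speller->correct(word);
            while (corrections.size() > 0)
            {
                std::cout << corrections.top().first << "\t"
                          << corrections.top().second << std::endl;
                corrections.pop();
            }
        }
        return 0;
    }

Compile and link it with the pkg-config flags shown under Usage above.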
## Command-line tool
Main.cc provides a demo utility with the following help message:
Usage: hfst-ospell [OPTIONS] ERRORSOURCE LEXICON
Run a composition of ERRORSOURCE and LEXICON on standard input and
print corrected output
-h, --help Print this help message
-V, --version Print version information
-v, --verbose Be verbose
-q, --quiet Don't be verbose (default)
-s, --silent Same as quiet
Report bugs to hfst-bugs@ling.helsinki.fi
## Use in real-world applications
The HFST-based spellers can be used in real applications with the help of
[voikko](http://voikko.sf.net). Voikko in turn can be used with Enchant,
LibreOffice, and Firefox.
/* -*- Mode: C++ -*- */
// Copyright 2010 University of Helsinki
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#if HAVE_CONFIG_H
# include <config.h>
#endif
// C
#if HAVE_LIBARCHIVE
# include <archive.h>
# include <archive_entry.h>
#endif
// C++
#if HAVE_LIBXML
# include <libxml++/libxml++.h>
#endif
#include <string>
#include <map>
using std::string;
using std::map;
// local
#include "ospell.h"
#include "hfst-ol.h"
#include "ZHfstOspeller.h"
namespace hfst_ospell
{
#if HAVE_LIBARCHIVE
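// Read the data of the current archive entry completely into a std::string,
// retrying on ARCHIVE_RETRY and throwing ZHfstZipReadingError on read errors
// or zero-length entries.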
inline std::string extract_to_mem(archive* ar, archive_entry* entry) {
size_t full_length = 0;
const struct stat* st = archive_entry_stat(entry);
size_t buffsize = st->st_size;
if (buffsize == 0) {
std::cerr << archive_error_string(ar) << std::endl;
throw ZHfstZipReadingError("Reading archive resulted in zero length entry");
}
std::string buff(buffsize, 0);
for (;;) {
ssize_t curr = archive_read_data(ar, &buff[0] + full_length, buffsize - full_length);
if (0 == curr) {
break;
}
else if (ARCHIVE_RETRY == curr) {
continue;
}
else if (ARCHIVE_FAILED == curr) {
throw ZHfstZipReadingError("Archive broken (ARCHIVE_FAILED)");
}
else if (curr < 0) {
throw ZHfstZipReadingError("Archive broken...");
}
else {
full_length += curr;
}
}
if (full_length == 0) {
std::cerr << archive_error_string(ar) << std::endl;
throw ZHfstZipReadingError("Reading archive resulted in zero length");
}
return buff;
}
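// Build a Transducer directly from an in-memory copy of the current archive entry.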
inline Transducer* transducer_to_mem(archive* ar, archive_entry* entry) {
std::string buff = extract_to_mem(ar, entry);
Transducer *trans = new Transducer(&buff[0]);
return trans;
}
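// Extract the current archive entry to a temporary file under /tmp (created
// with mkstemp) and return its strdup()-allocated path.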
inline char* extract_to_tmp_dir(archive* ar) {
char* rv = strdup("/tmp/zhfstospellXXXXXXXX");
int temp_fd = mkstemp(rv);
int rr = archive_read_data_into_fd(ar, temp_fd);
if ((rr != ARCHIVE_EOF) && (rr != ARCHIVE_OK)) {
throw ZHfstZipReadingError("Archive not EOF'd or OK'd");
}
close(temp_fd);
return rv;
}
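// Build a Transducer by first extracting the entry to a temporary file and
// then reading it back from disk.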
inline Transducer* transducer_to_tmp_dir(archive* ar) {
char *filename = extract_to_tmp_dir(ar);
FILE* f = fopen(filename, "rb");
if (f == nullptr) {
throw ZHfstTemporaryWritingError("reading acceptor back from temp file");
}
return new Transducer(f);
}
#endif // HAVE_LIBARCHIVE
ZHfstOspeller::ZHfstOspeller() :
suggestions_maximum_(0),
maximum_weight_(-1.0),
beam_(-1.0),
time_cutoff_(0.0),
can_spell_(false),
can_correct_(false),
can_analyse_(true),
current_speller_(0),
current_sugger_(0)
{
}
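// Delete the owned spellers and all loaded transducers; current_speller_ and
// current_sugger_ may point to the same object, which is then deleted only once.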
ZHfstOspeller::~ZHfstOspeller()
{
if ((current_speller_ != NULL) && (current_sugger_ != NULL))
{
if (current_speller_ != current_sugger_)
{
delete current_speller_;
delete current_sugger_;
}
else
{
delete current_speller_;
}
current_sugger_ = 0;
current_speller_ = 0;
}
for (auto& acceptor : acceptors_)
{
delete acceptor.second;
}
for (auto& errmodel : errmodels_)
{
delete errmodel.second;
}
can_spell_ = false;
can_correct_ = false;
}
void
ZHfstOspeller::inject_speller(Speller * s)
{
current_speller_ = s;
current_sugger_ = s;
can_spell_ = true;
can_correct_ = true;
}
void
ZHfstOspeller::set_queue_limit(unsigned long limit)
{
suggestions_maximum_ = limit;
}
void
ZHfstOspeller::set_weight_limit(Weight limit)
{
maximum_weight_ = limit;
}
void
ZHfstOspeller::set_beam(Weight beam)
{
beam_ = beam;
}
void
ZHfstOspeller::set_time_cutoff(float time_cutoff)
{
time_cutoff_ = time_cutoff;
}
bool
ZHfstOspeller::spell(const string& wordform)
{
if (can_spell_ && (current_speller_ != 0))
{
char* wf = strdup(wordform.c_str());
bool rv = current_speller_->check(wf);
free(wf);
return rv;
}
return false;
}
CorrectionQueue
ZHfstOspeller::suggest(const string& wordform)
{
CorrectionQueue rv;
if ((can_correct_) && (current_sugger_ != 0))
{
char* wf = strdup(wordform.c_str());
rv = current_sugger_->correct(wf,
suggestions_maximum_,
maximum_weight_,
beam_,
time_cutoff_);
free(wf);
return rv;
}
return rv;
}
AnalysisQueue
ZHfstOspeller::analyse(const string& wordform, bool ask_sugger)
{
AnalysisQueue rv;
char* wf = strdup(wordform.c_str());
if ((can_analyse_) && (!ask_sugger) && (current_speller_ != 0))
{
rv = current_speller_->analyse(wf);
}
else if ((can_analyse_) && (ask_sugger) && (current_sugger_ != 0))
{
rv = current_sugger_->analyse(wf);
}
free(wf);
return rv;
}
AnalysisSymbolsQueue
ZHfstOspeller::analyseSymbols(const string& wordform, bool ask_sugger)
{
AnalysisSymbolsQueue rv;
char* wf = strdup(wordform.c_str());
if ((can_analyse_) && (!ask_sugger) && (current_speller_ != 0))
{
rv = current_speller_->analyseSymbols(wf);
}
else if ((can_analyse_) && (ask_sugger) && (current_sugger_ != 0))
{
rv = current_sugger_->analyseSymbols(wf);
}
free(wf);
return rv;
}
AnalysisCorrectionQueue
ZHfstOspeller::suggest_analyses(const string& wordform)
{
AnalysisCorrectionQueue rv;
// FIXME: should be atomic
CorrectionQueue cq = suggest(wordform);
while (cq.size() > 0)
{
AnalysisQueue aq = analyse(cq.top().first, true);
while (aq.size() > 0)
{
StringPair sp(cq.top().first, aq.top().first);
StringPairWeightPair spwp(sp, aq.top().second);
rv.push(spwp);
aq.pop();
}
cq.pop();
}
return rv;
}
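// Open a zhfst archive with libarchive, load every acceptor.* and errmodel.*
// transducer and the index.xml metadata found in it, and select the "default"
// (or otherwise the first available) acceptor/error-model pair as the active
// speller.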
void
ZHfstOspeller::read_zhfst(const string& filename)
{
#if HAVE_LIBARCHIVE
struct archive* ar = archive_read_new();
struct archive_entry* entry = 0;
#if USE_LIBARCHIVE_2
archive_read_support_compression_all(ar);
#else
archive_read_support_filter_all(ar);
#endif // USE_LIBARCHIVE_2
archive_read_support_format_all(ar);
int rr = archive_read_open_filename(ar, filename.c_str(), 10240);
if (rr != ARCHIVE_OK)
{
throw ZHfstZipReadingError("Archive not OK");
}
for (int rr = archive_read_next_header(ar, &entry);
rr != ARCHIVE_EOF;
rr = archive_read_next_header(ar, &entry))
{
if (rr != ARCHIVE_OK)
{
throw ZHfstZipReadingError("Archive not OK");
}
char* filename = strdup(archive_entry_pathname(entry));
if (strncmp(filename, "acceptor.", strlen("acceptor.")) == 0) {
Transducer* trans = nullptr;
#if ZHFST_EXTRACT_TO_MEM == 1
// Try extracting to memory first...
try {
trans = transducer_to_mem(ar, entry);
}
catch (...) {
// If that failed, fall back to /tmp
//std::cerr << "Failed to memory - falling back to /tmp" << std::endl;
trans = transducer_to_tmp_dir(ar);
}
#else
// Try extracting to /tmp first...
try {
trans = transducer_to_tmp_dir(ar);
}
catch (...) {
// If that failed, fall back to memory
//std::cerr << "Failed to /tmp - falling back to memory" << std::endl;
trans = transducer_to_mem(ar, entry);
}
#endif
if (trans == nullptr) {
throw ZHfstZipReadingError("Failed to extract acceptor");
}
char* p = filename;
p += strlen("acceptor.");
size_t descr_len = 0;
for (const char* q = p; *q != '\0'; q++)
{
if (*q == '.')
{
break;
}
else
{
descr_len++;
}
}
char* descr = hfst_strndup(p, descr_len);
acceptors_[descr] = trans;
free(descr);
}
else if (strncmp(filename, "errmodel.", strlen("errmodel.")) == 0) {
Transducer* trans = nullptr;
#if ZHFST_EXTRACT_TO_MEM == 1
// Try extracting to memory first...
try {
trans = transducer_to_mem(ar, entry);
}
catch (...) {
// If that failed, fall back to /tmp
//std::cerr << "Failed to memory - falling back to /tmp" << std::endl;
trans = transducer_to_tmp_dir(ar);
}
#else
// Try extracting to /tmp first...
try {
trans = transducer_to_tmp_dir(ar);
}
catch (...) {
// If that failed, fall back to memory
//std::cerr << "Failed to /tmp - falling back to memory" << std::endl;
trans = transducer_to_mem(ar, entry);
}
#endif
if (trans == nullptr) {
throw ZHfstZipReadingError("Failed to extract error model");
}
const char* p = filename;
p += strlen("errmodel.");
size_t descr_len = 0;
for (const char* q = p; *q != '\0'; q++)
{
if (*q == '.')
{
break;
}
else
{
descr_len++;
}
}
char* descr = hfst_strndup(p, descr_len);
errmodels_[descr] = trans;
free(descr);
} // if acceptor or errmodel
else if (strcmp(filename, "index.xml") == 0) {
// Always try extracting to memory first, as index.xml is tiny
try {
std::string full_data = extract_to_mem(ar, entry);
metadata_.read_xml(&full_data[0], full_data.size());
}
catch (...) {
char* temporary = extract_to_tmp_dir(ar);
metadata_.read_xml(temporary);
}
}
else
{
fprintf(stderr, "Unknown file in archive %s\n", filename);
}
free(filename);
} // for rr != ARCHIVE_EOF
archive_read_close(ar);
#if USE_LIBARCHIVE_2
archive_read_finish(ar);
#else
archive_read_free(ar);
#endif // USE_LIBARCHIVE_2
if ((errmodels_.find("default") != errmodels_.end()) &&
(acceptors_.find("default") != acceptors_.end()))
{
current_speller_ = new Speller(
errmodels_["default"],
acceptors_["default"]
);
current_sugger_ = current_speller_;
can_spell_ = true;
can_correct_ = true;
}
else if ((acceptors_.size() > 0) && (errmodels_.size() > 0))
{
fprintf(stderr, "Could not find default speller, using %s %s\n",
acceptors_.begin()->first.c_str(),
errmodels_.begin()->first.c_str());
current_speller_ = new Speller(
errmodels_.begin()->second,
acceptors_.begin()->second
);
current_sugger_ = current_speller_;
can_spell_ = true;
can_correct_ = true;
}
else if ((acceptors_.size() > 0) &&
(acceptors_.find("default") != acceptors_.end()))
{
current_speller_ = new Speller(0, acceptors_["default"]);
current_sugger_ = current_speller_;
can_spell_ = true;
can_correct_ = false;
}
else if (acceptors_.size() > 0)
{
current_speller_ = new Speller(0, acceptors_.begin()->second);
current_sugger_ = current_speller_;
can_spell_ = true;
can_correct_ = false;
}
else
{
throw ZHfstZipReadingError("No automata found in zip");
}
can_analyse_ = can_spell_ || can_correct_;
#else
throw ZHfstZipReadingError("Zip support was disabled");
#endif // HAVE_LIBARCHIVE
}
const ZHfstOspellerXmlMetadata&
ZHfstOspeller::get_metadata() const
{
return metadata_;
}
string
ZHfstOspeller::metadata_dump() const
{
return metadata_.debug_dump();
}
} // namespace hfst_ospell
/* -*- Mode: C++ -*- */
// Copyright 2010 University of Helsinki
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//! @mainpage API to HFST ospell WFST spell-checking
//!
//! The hfst-ospell API has several layers for different end-users. A suggested
//! starting point for new users is the @c ZHfstOspeller object, which reads an
//! automaton set from a zipped HFST file with metadata and provides high-level
//! access to it with generic spell-checking, correction and analysis functions.
//! The second level of access is the Speller object, which can be used to
//! construct a spell-checker from two automata, traverse it and query
//! low-level properties. The Speller is constructed from two Transducer objects,
//! which are the low-level access points to the automata with all the gory
//! details of transition tables, symbol translations, headers and such.
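//!
//! As an illustrative sketch only (the archive name below is a placeholder),
//! the high-level layer can be used roughly like this:
//! @code
//! hfst_ospell::ZHfstOspeller speller;
//! try {
//!     speller.read_zhfst("speller.zhfst");
//! } catch (hfst_ospell::ZHfstException& e) {
//!     // the archive was missing, unreadable or contained no automata
//! }
//! if (!speller.spell("exampel")) {
//!     hfst_ospell::CorrectionQueue corrections = speller.suggest("exampel");
//!     while (corrections.size() > 0) {
//!         // corrections.top().first is the suggestion, .second its weight
//!         corrections.pop();
//!     }
//! }
//! @endcode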
#ifndef HFST_OSPELL_ZHFSTOSPELLER_H_
#define HFST_OSPELL_ZHFSTOSPELLER_H_
#include "hfstol-stdafx.h"
#if HAVE_CONFIG_H
# include <config.h>
#endif
#include <stdexcept>
#include <map>
#include "ospell.h"
#include "hfst-ol.h"
#include "ZHfstOspellerXmlMetadata.h"
namespace hfst_ospell
{
//! @brief ZHfstOspeller class holds one speller contained in one
//! zhfst file.
//! Ospeller can perform all basic writer tool functionality that
//! is supported by the automata in the zhfst archive.
class ZHfstOspeller
{
public:
//! @brief create speller with default values for undefined
//! language.
OSPELL_API ZHfstOspeller();
//! @brief destroy all automata used by the speller.
OSPELL_API ~ZHfstOspeller();
//! @brief assign a speller-suggestor circumventing the ZHFST format
OSPELL_API void inject_speller(Speller * s);
//! @brief set upper limit to priority queue when performing
//! suggestions or analyses.
OSPELL_API void set_queue_limit(unsigned long limit);
//! @brief set upper limit for weights
OSPELL_API void set_weight_limit(Weight limit);
//! @brief set search beam
OSPELL_API void set_beam(Weight beam);
//! @brief set time cutoff for correcting
OSPELL_API void set_time_cutoff(float time_cutoff);
//! @brief construct speller from named file containing valid
//! zhfst archive.
OSPELL_API void read_zhfst(const std::string& filename);
//! @brief check if the given word is spelled correctly
OSPELL_API bool spell(const std::string& wordform);
//! @brief construct an ordered set of corrections for misspelled
//! word form.
OSPELL_API CorrectionQueue suggest(const std::string& wordform);
//! @brief analyse word form morphologically
//! @param wordform the string to analyse
//! @param ask_sugger whether to use the spelling correction model
//! instead of the detection model
AnalysisQueue analyse(const std::string& wordform,
bool ask_sugger = false);
//! @brief analyse word form morphologically, unconcatenated output
//! strings (making it easier to find Multichar_symbols of
//! the FST)
//! @param wordform the string to analyse
//! @param ask_sugger whether to use the spelling correction model
//! instead of the detection model
AnalysisSymbolsQueue analyseSymbols(const std::string& wordform,
bool ask_sugger = false);
//! @brief construct an ordered set of corrections with analyses
AnalysisCorrectionQueue suggest_analyses(const std::string&
wordform);
//! @brief hyphenate word form
HyphenationQueue hyphenate(const std::string& wordform);
//! @brief get access to metadata read from XML.
const ZHfstOspellerXmlMetadata& get_metadata() const;
//! @brief create string representation of the speller for
//! programmer to debug
std::string metadata_dump() const;
private:
//! @brief file or path where the speller came from
std::string filename_;
//! @brief upper bound for suggestions generated and given
unsigned long suggestions_maximum_;
//! @brief upper bound for suggestion weight generated and given
Weight maximum_weight_;
//! @brief upper bound for search beam around best candidate
Weight beam_;
//! @brief upper bound for search time in seconds
float time_cutoff_;
//! @brief whether the automata loaded so far can be used to check
//! spelling
bool can_spell_;
//! @brief whether the automata loaded so far can be used to correct
//! word forms
bool can_correct_;
//! @brief whether the automata loaded so far can be used to analyse
//! word forms
bool can_analyse_;
//! @brief whether the automata loaded so far can be used to hyphenate
//! word forms
bool can_hyphenate_;
//! @brief dictionaries loaded
std::map<std::string, Transducer*> acceptors_;
//! @brief error models loaded
std::map<std::string, Transducer*> errmodels_;
//! @brief pointer to current speller
Speller* current_speller_;
//! @brief pointer to current correction model
Speller* current_sugger_;
//! @brief pointer to current morphological analyser
Speller* current_analyser_;
//! @brief pointer to current hyphenator
Transducer* current_hyphenator_;
//! @brief the metadata of loaded speller
ZHfstOspellerXmlMetadata metadata_;
};
//! @brief Top-level exception for zhfst handling.
//! Contains a human-readable error message that can be displayed to
//! end-user as additional info when either solving exception or exiting.
class ZHfstException : public std::runtime_error
{
public:
ZHfstException() : std::runtime_error("unknown") {}
//! @brief construct error with human readable message.
//!
//! the message will be displayed when recovering or dying from
//! exception
explicit ZHfstException(const std::string& message) : std::runtime_error(message) {}
};
//! @brief Generic error in metadata parsing.
//
//! Gets raised if metadata is erroneous or missing.
class ZHfstMetaDataParsingError : public ZHfstException
{
public:
explicit ZHfstMetaDataParsingError(const std::string& message) : ZHfstException(message) {}
};
//! @brief Exception for XML parser errors.
//
//! Gets raised if underlying XML parser finds an error in XML data.
//! Errors include non-valid XML, missing or erroneous attributes or
//! elements, etc.
class ZHfstXmlParsingError : public ZHfstException
{
public:
explicit ZHfstXmlParsingError(const std::string& message) : ZHfstException(message) {}
};
//! @brief Generic error while reading zip file.
//!
//! Happens when libarchive is unable to proceed reading zip file or
//! zip file is missing required files.
class ZHfstZipReadingError : public ZHfstException
{
public:
explicit ZHfstZipReadingError(const std::string& message) : ZHfstException(message) {}
};
//! @brief Error when writing to temporary location.
//
//! This exception gets thrown, when e.g., zip extraction is unable to
//! find or open temporary file for writing.
class ZHfstTemporaryWritingError : public ZHfstException
{
public:
explicit ZHfstTemporaryWritingError(const std::string& message) : ZHfstException(message) {}
};
} // namespace hfst_ospell
#endif // HFST_OSPELL_ZHFSTOSPELLER_H_
// vim: set ft=cpp.doxygen:
/* -*- Mode: C++ -*- */
// Copyright 2010 University of Helsinki
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#ifndef HFST_OSPELL_ZHFSTOSPELLERXMLMETADATA_H_
#define HFST_OSPELL_ZHFSTOSPELLERXMLMETADATA_H_ 1
#include "hfstol-stdafx.h"
#if HAVE_CONFIG_H
# include <config.h>
#endif
#include <map>
using std::map;
#if HAVE_LIBXML
# include <libxml++/libxml++.h>
#elif HAVE_TINYXML2
# include <tinyxml2.h>
#endif
#include "ospell.h"
#include "hfst-ol.h"
namespace hfst_ospell
{
//! @brief data type for associating set of translations to languages.
typedef std::map<std::string,std::string> LanguageVersions;
//! @brief ZHfstOspellerInfo represents one info block of a zhfst file.
//! @see https://victorio.uit.no/langtech/trunk/plan/proof/doc/lexfile-spec.xml
struct ZHfstOspellerInfoMetadata
{
//! @brief active locale of speller in BCP format
std::string locale_;
//! @brief translation of titles to all languages
LanguageVersions title_;
//! @brief translation of descriptions to all languages
LanguageVersions description_;
//! @brief version definition as a free-form string
std::string version_;
//! @brief vcs revision as string
std::string vcsrev_;
//! @brief date for age of speller as string
std::string date_;
//! @brief producer of the speller
std::string producer_;
//! @brief email address of the speller
std::string email_;
//! @brief web address of the speller
std::string website_;
};
//! @brief Represents one acceptor block in XML metadata
struct ZHfstOspellerAcceptorMetadata
{
//! @brief unique id of acceptor
std::string id_;
//! @brief descr part of acceptor
std::string descr_;
//! @brief type of dictionary
std::string type_;
//! @brief type of transducer
std::string transtype_;
//! @brief titles of dictionary in languages
LanguageVersions title_;
//! @brief descriptions of dictionary in languages
LanguageVersions description_;
};
//! @brief Represents one errmodel block in XML metadata
struct ZHfstOspellerErrModelMetadata
{
//! @brief id of each error model in set
std::string id_;
//! @brief descr part of each id
std::string descr_;
//! @brief title of error models in languages
LanguageVersions title_;
//! @brief description of error models in languages
LanguageVersions description_;
//! @brief types of error models
std::vector<std::string> type_;
//! @brief models
std::vector<std::string> model_;
};
//! @brief holds one index.xml metadata for whole ospeller
class ZHfstOspellerXmlMetadata
{
public:
//! @brief construct metadata for undefined language and other default
//! values
ZHfstOspellerXmlMetadata();
//! @brief read metadata from XML file by @a filename.
void read_xml(const std::string& filename);
//! @brief read XML from in memory @a data pointer with given @a length
//!
//! Depending on the XML library compiled in, the data length may
//! be ignored and the buffer may be overrun.
void read_xml(const char* data, size_t data_length);
//! @brief create a programmer readable dump of XML metadata.
//!
//! shows linear serialisation of all header data in random order.
std::string debug_dump() const;
public:
ZHfstOspellerInfoMetadata info_; //!< The info node data
//! @brief data for acceptor nodes
std::map<std::string,ZHfstOspellerAcceptorMetadata> acceptor_;
//! @brief data for errmodel nodes
std::vector<ZHfstOspellerErrModelMetadata> errmodel_;
#if HAVE_LIBXML
private:
void parse_xml(const xmlpp::Document* doc);
void verify_hfstspeller(xmlpp::Node* hfstspellerNode);
void parse_info(xmlpp::Node* infoNode);
void parse_locale(xmlpp::Node* localeNode);
void parse_title(xmlpp::Node* titleNode);
void parse_description(xmlpp::Node* descriptionNode);
void parse_version(xmlpp::Node* versionNode);
void parse_date(xmlpp::Node* dateNode);
void parse_producer(xmlpp::Node* producerNode);
void parse_contact(xmlpp::Node* contactNode);
void parse_acceptor(xmlpp::Node* acceptorNode);
void parse_title(xmlpp::Node* titleNode, const std::string& accName);
void parse_description(xmlpp::Node* descriptionNode,
const std::string& accName);
void parse_errmodel(xmlpp::Node* errmodelNode);
void parse_title(xmlpp::Node* titleNode, size_t errm_count);
void parse_description(xmlpp::Node* descriptionNode, size_t errm_count);
void parse_type(xmlpp::Node* typeNode, size_t errm_count);
void parse_model(xmlpp::Node* modelNode, size_t errm_count);
#elif HAVE_TINYXML2
private:
void parse_xml(const tinyxml2::XMLDocument& doc);
void verify_hfstspeller(const tinyxml2::XMLElement& hfstspellerNode);
void parse_info(const tinyxml2::XMLElement& infoNode);
void parse_locale(const tinyxml2::XMLElement& localeNode);
void parse_title(const tinyxml2::XMLElement& titleNode);
void parse_description(const tinyxml2::XMLElement& descriptionNode);
void parse_version(const tinyxml2::XMLElement& versionNode);
void parse_date(const tinyxml2::XMLElement& dateNode);
void parse_producer(const tinyxml2::XMLElement& producerNode);
void parse_contact(const tinyxml2::XMLElement& contactNode);
void parse_acceptor(const tinyxml2::XMLElement& acceptorNode);
void parse_title(const tinyxml2::XMLElement& titleNode, const std::string& accName);
void parse_description(const tinyxml2::XMLElement& descriptionNode,
const std::string& accName);
void parse_errmodel(const tinyxml2::XMLElement& errmodelNode);
void parse_title(const tinyxml2::XMLElement& titleNode, size_t errm_count);
void parse_description(const tinyxml2::XMLElement& descriptionNode, size_t errm_count);
void parse_type(const tinyxml2::XMLElement& typeNode, size_t errm_count);
void parse_model(const tinyxml2::XMLElement& modelNode, size_t errm_count);
#endif
};
}
#endif // inclusion GUARD
// vim: set ft=cpp.doxygen:
<?xml version="1.0" encoding="utf-8"?>
<authors>
<author uid="mie">
Tommi A Pirinen &lt;tommi.pirinen@helsinki.fi&gt;
</author>
</authors>
## Process this file with autoconf to produce configure script
## Copyright (C) 2010 University of Helsinki
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# autoconf requirements
AC_PREREQ([2.62])
AC_INIT([hfstospell], [0.5.0], [hfst-bugs@helsinki.fi], [hfstospell], [http://hfst.github.io])
LT_PREREQ([2.2.6])
# init
AC_CONFIG_AUX_DIR([build-aux])
AM_INIT_AUTOMAKE([1.11 -Wall -Werror foreign check-news color-tests silent-rules])
AM_SILENT_RULES([yes])
AC_REVISION([$Revision$])
AC_CONFIG_MACRO_DIR([m4])
AC_CONFIG_SRCDIR([ospell.cc])
AC_CONFIG_HEADERS([config.h])
# Information on package
HFSTOSPELL_NAME=hfstospell
HFSTOSPELL_MAJOR=0
HFSTOSPELL_MINOR=5
HFSTOSPELL_EXTENSION=.0
HFSTOSPELL_VERSION=$HFSTOSPELL_MAJOR.$HFSTOSPELL_MINOR$HFSTOSPELL_EXTENSION
AC_SUBST(HFSTOSPELL_MAJOR)
AC_SUBST(HFSTOSPELL_MINOR)
AC_SUBST(HFSTOSPELL_VERSION)
AC_SUBST(HFSTOSPELL_NAME)
# Check for pkg-config first - the configuration won't work if it isn't available:
AC_PATH_PROG([PKGCONFIG], [pkg-config], [no])
AS_IF([test "x$PKGCONFIG" = xno], [AC_MSG_ERROR([pkg-config is required - please install])])
AC_PATH_PROG([DOXYGEN], [doxygen], [false])
AM_CONDITIONAL([CAN_DOXYGEN], [test "x$DOXYGEN" != xfalse])
# Settings
AC_ARG_ENABLE([extra_demos],
[AS_HELP_STRING([--enable-extra-demos],
[build conference demos for science reproduction @<:@default=no@:>@])],
[enable_extra_demos=$enableval], [enable_extra_demos=no])
AM_CONDITIONAL([EXTRA_DEMOS], [test x$enable_extra_demos != xno])
AC_ARG_ENABLE([hfst_ospell_office],
[AS_HELP_STRING([--enable-hfst-ospell-office],
[build hfst-ospell-office @<:@default=yes@:>@])],
[enable_hfst_ospell_office=$enableval], [enable_hfst_ospell_office=yes])
AM_CONDITIONAL([HFST_OSPELL_OFFICE], [test x$enable_hfst_ospell_office != xno])
AC_ARG_ENABLE([zhfst],
[AS_HELP_STRING([--enable-zhfst],
[support zipped complex automaton sets @<:@default=check@:>@])],
[enable_zhfst=$enableval], [enable_zhfst=check])
AC_ARG_WITH([libxmlpp],
[AS_HELP_STRING([--with-libxmlpp],
[support xml metadata for zipped automaton sets with libxml++-2.6 @<:@default=yes@:>@])],
[with_libxmlpp=$withval], [with_libxmlpp=yes])
AC_ARG_WITH([tinyxml2],
[AS_HELP_STRING([--with-tinyxml2],
[support xml metadata for zipped automaton sets with tinyxml2 @<:@default=no@:>@])],
[with_tinyxml2=$withval], [with_tinyxml2=no])
AC_ARG_WITH([extract],
[AS_HELP_STRING([--with-extract=TARGET],
[extract zhfst archives to tmpdir or mem @<:@default=mem@:>@])],
[with_extract=$withval], [with_extract=mem])
AS_IF([test "x$with_extract" = xmem], [AC_DEFINE([ZHFST_EXTRACT_TO_MEM], [1],
[Define to extract zhfst archives to char buffer first])],
[AS_IF([test "x$with_extract" = xtmpdir],
[AC_DEFINE([ZHFST_EXTRACT_TO_MEM], [0],
[Define to extract zhfst to tmp dir first])],
[AC_MSG_ERROR([Use with-extract to mem or tmpdir])])])
# Checks for programs
m4_ifdef([AM_PROG_AR], [AM_PROG_AR])
AC_PROG_CC
AC_PROG_CXX
AC_LIBTOOL_WIN32_DLL
LT_INIT
AC_PROG_INSTALL
AC_PROG_LN_S
AC_PROG_MAKE_SET
# Checks for libraries
AS_IF([test x$enable_zhfst != xno],
[PKG_CHECK_MODULES([LIBARCHIVE], [libarchive > 3],
[AC_DEFINE([HAVE_LIBARCHIVE], [1], [Use archives])
enable_zhfst=yes],
[PKG_CHECK_MODULES([LIBARCHIVE], [libarchive > 2],
[AC_DEFINE([HAVE_LIBARCHIVE], [1], [Use archives])
AC_DEFINE([USE_LIBARCHIVE_2], [1], [Use libarchive2])
enable_zhfst=yes],
[AS_IF([test x$enable_zhfst != xcheck],
[AC_MSG_ERROR([zhfst support requires either libarchive or libarchive2])
enable_zhfst=no],
[enable_zhfst=no])])])])
AM_CONDITIONAL([WANT_ARCHIVE], [test x$enable_zhfst != xno])
AS_IF([test x$with_libxmlpp != xno],
[PKG_CHECK_MODULES([LIBXMLPP], [libxml++-2.6 >= 2.10.0],
[AC_DEFINE([HAVE_LIBXML], [1], [Use libxml++])
enable_xml=libxmlpp],
[AC_MSG_ERROR([libxml++ failed])
enable_xml=no])])
AM_CONDITIONAL([WANT_LIBXMLPP], [test x$enable_xml = xlibxmlpp])
AS_IF([test x$with_tinyxml2 != xno -a x$with_libxmlpp = xno],
[PKG_CHECK_MODULES([TINYXML2], [tinyxml2 >= 1.0.8 tinyxml2 < 3],
[AC_DEFINE([HAVE_TINYXML2], [1], [Use tinyxml])
enable_xml=tinyxml2],
[AC_MSG_ERROR([tinyxml missing])
enable_xml=no])])
AM_CONDITIONAL([WANT_TINYXML2], [test x$enable_xml = xtinyxml2])
# Find ICU in the new and old way
PKG_CHECK_MODULES(ICU, [icu-uc >= 4], [], [
AC_PATH_PROG([ICU_CONFIG], [icu-config], [false])
AS_IF([test x$ICU_CONFIG != xfalse], [
ICU_LIBS=$($ICU_CONFIG --ldflags)
])
])
LIBS="$LIBS $ICU_LIBS"
# Checks for header files
AC_CHECK_HEADERS([getopt.h error.h])
# Checks for types
AC_TYPE_SIZE_T
# Checks for structures
# Checks for compiler characteristics
AC_C_BIGENDIAN
# Checks for library functions
AC_FUNC_MALLOC
AC_CHECK_FUNCS([strndup error])
# Checks for system services
# Checks for highest supported C++ standard
AC_LANG(C++)
AX_CHECK_COMPILE_FLAG([-std=c++17], [CXXFLAGS="$CXXFLAGS -std=c++17"], [
AX_CHECK_COMPILE_FLAG([-std=c++1z], [CXXFLAGS="$CXXFLAGS -std=c++1z"], [
AX_CHECK_COMPILE_FLAG([-std=c++14], [CXXFLAGS="$CXXFLAGS -std=c++14"], [
AX_CHECK_COMPILE_FLAG([-std=c++1y], [CXXFLAGS="$CXXFLAGS -std=c++1y"], [
AX_CHECK_COMPILE_FLAG([-std=c++11], [CXXFLAGS="$CXXFLAGS -std=c++11"], [
AC_MSG_ERROR([could not enable C++11 or newer])
])
])
])
])
])
# config files
AC_CONFIG_FILES([Makefile hfstospell.pc])
# output
AC_OUTPUT
cat <<EOF
-- Building $PACKAGE_STRING
* zhfst support: $enable_zhfst
* extracting to: $with_extract
* xml support: $enable_xml
* hfst-ospell-office: $enable_hfst_ospell_office
* conference demos: $enable_extra_demos
EOF
AS_IF([test x$with_libxmlpp != xno -a x$with_tinyxml2 != xno],
[AC_MSG_ERROR([You can only have one xml library (e.g., --with-tinyxml2 --without-libxmlpp)])])
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>HFST ospell – Free WFST spell-checker and library</title>
</head>
<body>
<h1><img src="doc/html/edit2-small.png"
alt="[edit distance 2 automaton]"/>
HFST ospell</h1>
<p>
HFST ospell is a free open source spell-checker using weighted
finite-state automata. It is a light-weight library for using
combinations of two automata – a language model and an error model – for
spell-checking and correction.
</p>
<p>
It has optional support for XML-based metadata using libxml++2 or
tinyxml2. Automata compression is supported through libarchive,
currently with zip format.
</p>
<p>
The API of the library is stable to support updating the shared library
while keeping the automata and the plugins for enchant and LibreOffice
in place. The <a href="doc/html/">api documentation</a> is maintained with
doxygen.
</p>
<p>
You can download the library and small demo applications from
<a href="http://hfst.sf.net/">HFST’s main sourceforge site</a>.
</p>
</body>
</html>
edit2-small.png (21.1 KiB)