.\" Copyright 1997-2007 Glyph & Cog, LLC
.TH pdftotext 1 "27 February 2007"
.SH NAME
pdftotext \- Portable Document Format (PDF) to text converter
(version 3.02)
.SH SYNOPSIS
.B pdftotext
[options]
.RI [ PDF-file
.RI [ text-file ]]
.SH DESCRIPTION
.B Pdftotext
converts Portable Document Format (PDF) files to plain text.
.PP
Pdftotext reads the PDF file,
.IR PDF-file ,
and writes a text file,
.IR text-file .
If
.I text-file
is not specified, pdftotext converts
.I file.pdf
to
.IR file.txt .
If
.I text-file
is \'-', the text is sent to stdout.
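.PP
The default output name described above can be illustrated with a
minimal Python sketch. It is not pdftotext's own code: the helper name
default_text_path is hypothetical, and the handling of an upper-case
".PDF" suffix and of names with no ".pdf" suffix is an assumption,
since this page only documents the file.pdf to file.txt case.
.PP
.nf
```python
def default_text_path(pdf_path: str) -> str:
    """Guess the text file name used when no text-file is given."""
    # Documented case: file.pdf becomes file.txt.
    # Assumption: an upper-case ".PDF" suffix is treated the same way.
    if pdf_path.endswith((".pdf", ".PDF")):
        return pdf_path[:-4] + ".txt"
    # Assumption: any other name simply gets ".txt" appended.
    return pdf_path + ".txt"
```
.fi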
.SH CONFIGURATION FILE
Pdftotext reads a configuration file at startup. It first tries to
find the user's private config file, ~/.xpdfrc. If that doesn't
exist, it looks for a system-wide config file, typically
/usr/local/etc/xpdfrc (but this location can be changed when pdftotext
is built). See the
.BR xpdfrc (5)
man page for details.
.SH OPTIONS
Many of the following options can be set with configuration file
commands. These are listed in square brackets with the description of
the corresponding command line option.
.TP
.BI \-f " number"
Specifies the first page to convert.
.TP
.BI \-l " number"
Specifies the last page to convert.
.TP
.B \-layout
Maintain (as best as possible) the original physical layout of the
text. The default is to \'undo' physical layout (columns,
hyphenation, etc.) and output the text in reading order.
.TP
.B \-raw
Keep the text in content stream order. This is a hack which often
"undoes" column formatting, etc. Use of raw mode is no longer
recommended.
.TP
.B \-htmlmeta
Generate a simple HTML file, including the meta information. This
simply wraps the text in <pre> and </pre> and prepends the meta
headers.
.TP
.BI \-enc " encoding-name"
Sets the encoding to use for text output. The
.I encoding\-name
must be defined with the unicodeMap command (see
.BR xpdfrc (5)).
The encoding name is case-sensitive. This defaults to "Latin1" (which
is a built-in encoding).
.RB "[config file: " textEncoding ]
.TP
.BI \-eol " unix | dos | mac"
Sets the end-of-line convention to use for text output.
.RB "[config file: " textEOL ]
.TP
.B \-nopgbrk
Don't insert page breaks (form feed characters) between pages.
.RB "[config file: " textPageBreaks ]
.TP
.BI \-opw " password"
Specify the owner password for the PDF file. Providing this will
bypass all security restrictions.
.TP
.BI \-upw " password"
Specify the user password for the PDF file.
.TP
.B \-q
Don't print any messages or errors.
.RB "[config file: " errQuiet ]
.TP
.BI \-cfg " config-file"
Read
.I config-file
in place of ~/.xpdfrc or the system-wide config file.
.TP
.B \-v
Print copyright and version information.
.TP
.B \-h
Print usage information.
.RB ( \-help
and
.B \-\-help
are equivalent.)
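.PP
For scripted use, the options above can be assembled into an argument
list before invoking the program. The following Python sketch covers
only a subset of the documented options. The function name
build_pdftotext_args and its parameter names are hypothetical, and the
sketch only builds the list; it does not run pdftotext.
.PP
.nf
```python
from typing import List, Optional


def build_pdftotext_args(
    pdf_file: str,
    text_file: Optional[str] = None,
    first_page: Optional[int] = None,
    last_page: Optional[int] = None,
    layout: bool = False,
    encoding: Optional[str] = None,
    eol: Optional[str] = None,
    page_breaks: bool = True,
    quiet: bool = False,
) -> List[str]:
    """Build a pdftotext argument list from the documented options."""
    args = ["pdftotext"]
    if first_page is not None:
        args += ["-f", str(first_page)]   # first page to convert
    if last_page is not None:
        args += ["-l", str(last_page)]    # last page to convert
    if layout:
        args.append("-layout")            # keep the physical layout
    if encoding is not None:
        args += ["-enc", encoding]        # case-sensitive encoding name
    if eol is not None:
        if eol not in ("unix", "dos", "mac"):
            raise ValueError("eol must be unix, dos or mac")
        args += ["-eol", eol]
    if not page_breaks:
        args.append("-nopgbrk")           # no form feeds between pages
    if quiet:
        args.append("-q")                 # suppress messages and errors
    args.append(pdf_file)
    if text_file is not None:
        args.append(text_file)            # "-" sends the text to stdout
    return args
```
.fi
.PP
The resulting list can be handed to a process launcher such as
Python's subprocess.run, which avoids shell quoting problems with file
names that contain spaces.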
.SH BUGS
Some PDF files contain fonts whose encodings have been mangled beyond
recognition. There is no way (short of OCR) to extract text from
these files.
.SH EXIT CODES
The Xpdf tools use the following exit codes:
.TP
0
No error.
.TP
1
Error opening a PDF file.
.TP
2
Error opening an output file.
.TP
3
Error related to PDF permissions.
.TP
99
Other error.
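.PP
A calling script may want to turn these codes into readable messages.
The Python sketch below uses the table above as its only source. The
names EXIT_CODE_MEANINGS and describe_exit_code are hypothetical, and
the fallback text for undocumented codes is an assumption.
.PP
.nf
```python
# Exit codes as listed in the EXIT CODES section of this page.
EXIT_CODE_MEANINGS = {
    0: "No error.",
    1: "Error opening a PDF file.",
    2: "Error opening an output file.",
    3: "Error related to PDF permissions.",
    99: "Other error.",
}


def describe_exit_code(code: int) -> str:
    """Return the documented meaning of an Xpdf tool exit code."""
    # Assumption: codes not listed on this page get a generic message.
    return EXIT_CODE_MEANINGS.get(code, "Unknown exit code %d." % code)
```
.fi
.PP
The code passed in would typically be the return code reported by
whatever launched pdftotext, for example the returncode attribute of
the object returned by Python's subprocess.run.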
.SH AUTHOR
The pdftotext software and documentation are copyright 1996-2007 Glyph
& Cog, LLC.
.SH "SEE ALSO"
.BR xpdf (1),
.BR pdftops (1),
.BR pdfinfo (1),
.BR pdffonts (1),
.BR pdftoppm (1),
.BR pdfimages (1),
.BR xpdfrc (5)
.br
.B http://www.foolabs.com/xpdf/