PDFXML Conversion Library V2.0
Convert large volumes of PDF to XML. Bring new life to old documents. No more wasted hours recreating old content for new formats.
Unlock the Power of XML
Antenna House PDFXML Conversion Library V2.0 allows you to convert PDF to XML and unlock the content from your legacy PDFs. If you want to reuse content from old PDFs, you no longer need to retype or go through the trouble of reconstructing your documents’ content from the PDF binary format.
Antenna House PDFXML is designed for organizations that need to convert large volumes of PDFs into XML, HTML5, XSL-FO, DocBook, or any other file format. The Antenna House PDFXML Conversion Library extracts text, tables, and images from PDFs and converts them to an XML format that we call “AHPDFXML”. The data can then be transformed to any desired output by applying XSLT stylesheets. Benefits and uses for XML include:
- Content Re-usability
- Improved Search-ability
- Good for Accessibility
- Promotes Interoperability and Data Integration
- Platform Independent
- Vendor Independent
- EMF has been added as an output option for image files
- Detection of vertically written pages from vertical-writing symbols has been added for PDFs that use CID fonts with a mixture of vertical and horizontal writing.
- Recognition of horizontally written numbers on a vertically written page as the page number has been improved.
- The analysis of vertically and horizontally merged cells has been improved.
To ensure the product meets user requirements, we encourage all potential customers to first try a free trial of the software by contacting us.
How it works:
- Loads the information for each page from the PDF
- Extracts vertical and horizontal lines from line drawings
- Analyzes the tables
- Creates text in the table cell
- Creates text lines of the body
- Creates paragraphs from lines
- Creates the area information from paragraphs
- Creates sections (columns)
- Outputs each page into XML (Antenna House PDFXML Format)
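As an illustration of one step above, creating paragraphs from lines, the following is a minimal sketch and not the library's actual algorithm. It assumes each text line carries a top position and a height, and it starts a new paragraph whenever the vertical gap to the previous line exceeds a fraction of the line height. The `TextLine` type, the `group_into_paragraphs` function, and the `gap_factor` threshold are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextLine:
    """Hypothetical stand-in for a text line extracted from a PDF page."""
    text: str
    top: float     # distance from the top of the page, in points
    height: float  # line height, in points


def group_into_paragraphs(lines: List[TextLine], gap_factor: float = 0.6) -> List[str]:
    """Join consecutive lines, starting a new paragraph at large vertical gaps.

    gap_factor is an assumed tuning value: a gap wider than
    gap_factor * (previous line height) is treated as a paragraph break.
    """
    paragraphs: List[List[str]] = []
    previous = None
    for line in sorted(lines, key=lambda item: item.top):
        starts_new = True
        if previous is not None:
            gap = line.top - (previous.top + previous.height)
            starts_new = gap > gap_factor * previous.height
        if starts_new:
            paragraphs.append([line.text])
        else:
            paragraphs[-1].append(line.text)
        previous = line
    return [" ".join(parts) for parts in paragraphs]


page_lines = [
    TextLine("Antenna House PDFXML converts", top=100.0, height=12.0),
    TextLine("PDF content to XML.", top=114.0, height=12.0),
    TextLine("The XML can then be transformed.", top=140.0, height=12.0),
]
paragraphs = group_into_paragraphs(page_lines)
for paragraph in paragraphs:
    print(paragraph)
```

A real converter would likely also weigh indentation, font changes, and column boundaries. The sketch uses vertical spacing alone to keep the idea visible.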
Antenna House PDFXML consists of multiple files:
- Catalog File (input file for stylesheets) – manages the AHPDFXML files
- Document File – stores the structure of the main body of a PDF document
- Style File – defines the style applied to the respective elements of a document
- External Files – outputs JPEG, PNG, BMP, SVG, etc.
See Antenna House PDFXML Schema Documentation for more detail.
The resulting XML can then be transformed with XSLT into any format that can represent the document structure, such as XSL-FO, DocBook, HTML5, or plain text. With Antenna House PDFXML, you can put PDF content to work in a wide range of environments. Converting PDF content to XML makes the data much easier to reuse, transform, manipulate, and search. Because the output is produced by an XSLT stylesheet, you can process the same data differently depending on how it will be used.
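To make the transformation step concrete, here is a small sketch that turns structured XML into HTML. It is a stand-in written with Python's standard library rather than an XSLT stylesheet, and the element names (`document`, `page`, `paragraph`, `table`, `row`, `cell`) are invented for illustration. The actual AHPDFXML element names are defined in the Antenna House PDFXML Schema Documentation.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical input: these element names are illustrative only and are
# NOT taken from the real AHPDFXML schema.
SAMPLE = """<document>
  <page number="1">
    <paragraph>Unlock the Power of XML</paragraph>
    <table>
      <row><cell>Item</cell><cell>Price</cell></row>
      <row><cell>License</cell><cell>$10,000</cell></row>
    </table>
  </page>
</document>"""


def to_html(xml_text: str) -> str:
    """Map each page to a <section>, paragraphs to <p>, tables to <table>."""
    root = ET.fromstring(xml_text)
    out = []
    for page in root.iter("page"):
        out.append(f'<section data-page="{escape(page.get("number", ""))}">')
        for node in page:
            if node.tag == "paragraph":
                out.append(f"<p>{escape(node.text or '')}</p>")
            elif node.tag == "table":
                out.append("<table>")
                for row in node.iter("row"):
                    cells = "".join(
                        f"<td>{escape(cell.text or '')}</td>"
                        for cell in row.iter("cell")
                    )
                    out.append(f"<tr>{cells}</tr>")
                out.append("</table>")
        out.append("</section>")
    return "\n".join(out)


html_output = to_html(SAMPLE)
print(html_output)
```

In practice the same mapping would be expressed as XSLT template rules, so that switching the target from HTML5 to DocBook or XSL-FO means swapping the stylesheet rather than changing program code.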
System Requirements
Windows
- Windows Server 2016
- Windows Server 2012 R2
- Windows Server 2008 R2
- Windows Server 2008 (32bit/64bit)
- Windows 10 (32bit/64bit)
- Windows 8.1 (32bit/64bit)
- Windows 7 (32bit/64bit)
Linux 64bit (built with GCC4.8)
- Linux Red Hat Enterprise series
- Linux CentOS series
- Linux Fedora series
- Requires the runtime libraries libc.so.6 (glibc-2.17) and libstdc++.so.6 (libstdc++.so.6.0.19)
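Before deploying on Linux, it can help to confirm the C runtime version on the target machine. The sketch below uses Python's standard `platform.libc_ver()` and assumes that the glibc version listed above is a minimum, which this page does not state explicitly. The helper names `parse_version` and `glibc_is_sufficient` are hypothetical.

```python
import platform
from itertools import takewhile
from typing import Tuple

# Assumption: glibc 2.17 from the list above is treated as a minimum version.
REQUIRED_GLIBC = (2, 17)


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '2.17' into a tuple of ints, (2, 17)."""
    parts = []
    for piece in text.split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def glibc_is_sufficient(version_text: str,
                        required: Tuple[int, ...] = REQUIRED_GLIBC) -> bool:
    """Compare versions numerically, so that 2.9 sorts below 2.17."""
    return parse_version(version_text) >= required


libc_name, libc_version = platform.libc_ver()
if libc_name == "glibc":
    status = "OK" if glibc_is_sufficient(libc_version) else "too old"
    print(f"glibc {libc_version}: {status}")
else:
    print("glibc not detected on this system")
```

Comparing tuples of integers avoids the string-comparison trap in which "2.9" would appear newer than "2.17".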
|Antenna House PDFXML Conversion Library V2.0|Price|
|Production Server + Development License|$10,000|