PDF.Extractor for .NET
PDF.Extractor for .NET is a plug-in that extracts text from PDF documents. It supports three modes of operation: pure, raw and plain. If the developer does not set a mode, the default mode is 'raw'.
Features:
1. Extracts text from the PDF file.
2. Supports three types of mode: pure, raw, plain.
3. The default mode is 'raw'.
4. Supports any combination of file paths and file streams for input and output.
Extract Text from PDF Document via .NET Library
PDF.Extractor for .NET supports three operating modes:
1. Pure extracts the text from the PDF file and applies several formatting procedures, which include taking relative positions into account and adding extra spaces to align the text to the width of the page.
2. Raw extracts text from the PDF file without formatting it.
3. Plain extracts text from the PDF file considering the relative positioning of the text fragments but (unlike the Pure mode) without adding extra spaces.
If the developer does not set the mode, the default mode is 'Raw'.
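To make the difference between the three modes concrete, here is a minimal, self-contained sketch. It does not call PDF.Extractor at all: the `Fragment`, `Mode`, and `ModeSketch` names are hypothetical, and the layout model (a line index plus a starting column per fragment) is a simplifying assumption used only to illustrate the behavior described above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical model of a piece of text on a page (not a library type).
public sealed class Fragment
{
    public int Line { get; }      // index of the text line on the page
    public int Column { get; }    // starting column, in character cells
    public string Text { get; }

    public Fragment(int line, int column, string text)
    {
        Line = line;
        Column = column;
        Text = text;
    }
}

// Hypothetical enum mirroring the three modes named in this document.
public enum Mode { Pure, Raw, Plain }

public static class ModeSketch
{
    // Raw is the default, matching the documented default mode.
    public static string Extract(IReadOnlyList<Fragment> fragments, Mode mode = Mode.Raw)
    {
        if (mode == Mode.Raw)
        {
            // Raw: emit fragments in stored order with no formatting.
            return string.Concat(fragments.Select(f => f.Text));
        }

        // Pure and Plain both respect relative positions:
        // group fragments by line, then order each line left to right.
        var lines = fragments
            .GroupBy(f => f.Line)
            .OrderBy(g => g.Key)
            .Select(g => g.OrderBy(f => f.Column));

        var output = new List<string>();
        foreach (var line in lines)
        {
            if (mode == Mode.Plain)
            {
                // Plain: positional order, but only a single separating space.
                output.Add(string.Join(" ", line.Select(f => f.Text)));
            }
            else
            {
                // Pure: pad with extra spaces so each fragment starts at its column.
                var sb = new StringBuilder();
                foreach (var f in line)
                {
                    if (sb.Length < f.Column)
                    {
                        sb.Append(' ', f.Column - sb.Length);
                    }
                    sb.Append(f.Text);
                }
                output.Add(sb.ToString());
            }
        }
        return string.Join("\n", output);
    }

    public static void Main()
    {
        // Fragments deliberately stored out of reading order.
        var fragments = new List<Fragment>
        {
            new Fragment(0, 10, "Total"),
            new Fragment(0, 0, "Item"),
            new Fragment(1, 0, "Pen"),
            new Fragment(1, 10, "2.50"),
        };

        Console.WriteLine(Extract(fragments));             // default mode: Raw
        Console.WriteLine(Extract(fragments, Mode.Plain));
        Console.WriteLine(Extract(fragments, Mode.Pure));
    }
}
```

In this toy model, Raw returns the fragments exactly as stored, Plain restores reading order with single spaces, and Pure additionally pads each line so that the two columns line up. The real plug-in works on PDF content rather than on pre-built fragments, so treat this only as an illustration of what each mode is meant to preserve.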
Sample code for extracting text with 'default' options:
Sample code to extract text with mode set:
Multiple files (or streams) can be specified as input. The ResultCollection will then contain the corresponding number of results. Example:
How to Extract Text from PDF Document
- Install PDF.Extractor for .NET.
- Create an object of the TextDevice class.
- Use an object of the TextExtractOptions class to specify extraction options.
- Save the text to the output file.
System Requirements
Make sure that you have the following prerequisites:
- Microsoft Windows or a compatible OS with .NET Framework or .NET Core.
- For VBScript, Delphi, or C++: access via COM Interop.
- A development environment such as Microsoft Visual Studio.
- Aspose.Imaging Conversion for .NET DLL referenced in your project.