The "GetText" utility of Kryloff Technologies, Inc. (http://www.kryltech.com)

 
1. General information
"GetText.exe" is a free console file-to-text conversion utility, which extracts textual content from HTML, MS Office®, RTF, PDF®, HLP and other documents, and saves it into text files. To perform text conversion, GetText uses KT Text Filters.

Important: Text Filters are provided as free components for Kryloff's products only. If you wish to use them in your own software applications, you must purchase the corresponding license. In addition to the royalty-free right to use KT Text Filters in your software products and to distribute them with those products worldwide, purchasing a license provides you with:

  • enhanced versions of the filters, along with full documentation disclosing additional capabilities that are not made publicly available (such as Unicode output, which has been excluded from GetText; proper selection of an applicable filter DLL; and some others);
  • sample code in C++ and Borland Delphi demonstrating the use of KT Text Filters for memory-to-memory, memory-to-file, file-to-memory, and file-to-file filtering;
  • enrollment in Kryloff Technologies technical support.

    If you have not yet obtained a license from Kryloff Technologies, you may not distribute any of the filters, nor may you count on our technical support. See also: KT Text Filters End-User License Agreement.

    KT Text Filters are also used in the rest of the Kryloff Subject Search™ family of products, namely:
      Subject Search Spider: your personal information retrieval intelligent Web engine.
      Subject Search Scanner: scans files on local and network drives looking for a given phrase.
      Subject Search Siter: investigates Web sites looking for a phrase and finds information buried in them.
      Subject Search Pad: opens documents in text mode and locates files with similar contents.
      Subject Search Summarizer: creates brief summaries of and translates documents or Web pages you are reading.
      Subject Search Sleuth: searches in huge collections of files on PC and LAN; includes API for developers.
      Subject Search Server: lets visitors search your Web site for the information they are looking for.


    2. Calling GetText
    2.1. GetText.exe Free Edition accepts the following command-line parameters: Source Document, Destination Text File, and optionally, KT Text Filter. To extract textual contents from a single document, use the following command either under the Windows Command Prompt or in a batch file:
    GetText.exe "Full or relative path to Source Document" "Full or relative path to Destination Text File"
    or
    GetText.exe "Source Document" "Destination Text File" "KT Filter DLL File Name"
    Enclose a command-line parameter in quotation marks (") if it contains one or more spaces.
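
The quoting rule above can be sketched in Python for readers who drive GetText from a script. This is only an illustration: the helper name build_gettext_command is hypothetical and not part of GetText, and the paths are the ones used in the examples of section 2.3. The standard library function subprocess.list2cmdline applies the usual Windows quoting rules.

```python
import subprocess

def build_gettext_command(exe, source, destination, filter_dll=None):
    """Return the argument list for one GetText call (hypothetical helper)."""
    args = [exe, source, destination]
    if filter_dll is not None:
        # The optional third parameter names a KT Text Filter DLL.
        args.append(filter_dll)
    return args

args = build_gettext_command(
    r"c:\Kryloff\GetText.exe",
    r"c:\My Documents\My File.htm",
    r"c:\My Documents\My Filtered File.txt",
)

# Only arguments that contain spaces are wrapped in quotation marks.
print(subprocess.list2cmdline(args))
```

When the list is passed to subprocess.run on Windows, Python performs the same quoting itself, so the printed string is mainly useful for pasting into a batch file.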

    If the optional parameter KT Text Filter is not specified, GetText scans the "Filters" subfolder of its root folder and looks for an appropriate Text Filter in the following order: first, DLLs whose file names do not contain "98", then the remaining files. This order is the same on every platform. The required filter is selected solely by the extension of the file being filtered, without reading its contents. For example, to convert "MyFile.doc", GetText selects "DOCDLL.dll".
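
The search order described above can be modelled roughly as follows. This is a sketch under stated assumptions, not GetText's actual implementation: the source does not say how GetText learns which extensions a DLL supports, so the SUPPORTED table here is simply an abridged copy of the filter table in section 3, and the names search_order and pick_filter are hypothetical. The order of DLLs within each of the two groups is also an assumption (input order is kept).

```python
import os

# Abridged from the filter table in section 3; how GetText itself
# obtains this mapping is not documented, so this is an assumption.
SUPPORTED = {
    "HTML98ME.dll": {"HTM", "HTML", "XML"},
    "HTMDLL.dll": {"HTM", "HTML"},
    "RTF2TXT.dll": {"RTF"},
    "DOCDLL.dll": {"DOC", "DOT"},
}

def search_order(dll_names):
    """DLLs without "98" in the file name first, then the rest."""
    # sorted() is stable, so the input order is kept inside each group.
    return sorted(dll_names, key=lambda name: "98" in name)

def pick_filter(document, dll_names):
    """Pick a filter by file extension only, never by file contents."""
    ext = os.path.splitext(document)[1].lstrip(".").upper()
    for name in search_order(dll_names):
        if ext in SUPPORTED.get(name, set()):
            return name
    return None  # no filter in the "Filters" subfolder handles this type

print(pick_filter("MyFile.doc", list(SUPPORTED)))
print(pick_filter("page.htm", ["HTML98ME.dll", "HTMDLL.dll"]))
```

With both HTML filters present, the sketch prefers HTMDLL.dll for "page.htm" because its name does not contain "98".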

    2.2. You may call GetText.exe directly, for example by selecting the Windows menu items "Start", then "Run", and then typing the full path to GetText.exe followed by the corresponding command-line parameters. To extract text from several documents at a time, you may also use it in batch (.BAT) files.

    2.3. Examples:
    a) To obtain the textual contents of "c:\My Documents\My File.htm" and save them into the file "c:\My Documents\My Filtered File.txt", issue the following command (the examples below assume that you have placed "GetText.exe" into the folder "c:\Kryloff"):
    c:\Kryloff\GetText.exe "c:\My Documents\My File.htm" "c:\My Documents\My Filtered File.txt"

    b) If you want GetText to apply a particular KT Text Filter, specify the third parameter. For example:
    c:\Kryloff\GetText.exe "c:\My Documents\My File.htm" "c:\My Documents\My Filtered File.txt" HTMDLL.dll

    c) To filter several documents with one command, first create a batch file (a file with the ".BAT" extension) using any text editor. For example:
    c:\Kryloff\GetText.exe "c:\My Documents\File1.htm" "c:\My Documents\File1.htm.txt" HTML98ME.dll
    c:\Kryloff\GetText.exe "c:\My Documents\File2.doc" "c:\My Documents\File2.doc.txt"
    ...
    c:\Kryloff\GetText.exe "c:\My Documents\File100.xls" "c:\My Documents\File100.xls.txt" XLSDLL.dll
    After composing a batch file, just execute it, for example by double-clicking its icon in Windows Explorer. Once you do, all documents mentioned in the batch file will be processed, and the corresponding text (.txt) files will be created. You may compose more complex batch files as well. For example, you may check the GetText.exe exit code by including "IF ERRORLEVEL" statements after each call to GetText: the utility terminates with a non-zero exit code when it fails to filter a particular file for some reason (which GetText also displays).
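
For long lists of documents, the batch file itself can be generated. The following Python sketch writes one GetText call per document, each followed by an "IF ERRORLEVEL" check. The helper name make_batch and the "Failed:" message are hypothetical. The check "IF ERRORLEVEL 1" is true for exit codes of 1 or higher, so the sketch assumes that GetText's non-zero failure codes are positive, which the text above does not state explicitly.

```python
def make_batch(exe, jobs):
    """Build the text of a .BAT file from (source, destination) pairs."""
    lines = ["@ECHO OFF"]
    for source, destination in jobs:
        # Every path is quoted, so paths with spaces are handled too.
        lines.append(f'"{exe}" "{source}" "{destination}"')
        # Assumption: a failed conversion returns an exit code >= 1.
        lines.append(f'IF ERRORLEVEL 1 ECHO Failed: "{source}"')
    # Batch files conventionally use CR LF line endings.
    return "\r\n".join(lines) + "\r\n"

jobs = [
    (r"c:\My Documents\File1.htm", r"c:\My Documents\File1.htm.txt"),
    (r"c:\My Documents\File2.doc", r"c:\My Documents\File2.doc.txt"),
]
batch_text = make_batch(r"c:\Kryloff\GetText.exe", jobs)
print(batch_text)
```

To save the result, write batch_text to a file opened with newline="" so that Python does not translate the line endings a second time.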

    GetText.exe displays this document only when you call it without command-line parameters; to prevent the appearance of this window, call the utility as specified above in this section.


    3. Filtering components included in the original shipment
    Kryloff Technologies, Inc. supplies GetText with the following filters:
    Filter file name | Description | Minimum platform required | File extensions supported
    HLP2TXT.dll | converts MS Help (.HLP) files into plain text | Windows 95 and later | HLP
    HTML98ME.dll | converts HTM and HTML files into text* | Windows 95 and later | HTM, HTML, HTW, ASCX, ASP, ASPX, HHC, HTX, ODC, STM, XML
    HTMDLL.dll | converts HTM and HTML files into text** | Windows 2000, XP and later | HTM, HTML, HTW, ASCX, ASP, ASPX, HHC, HTX, ODC, STM
    PDF2TXT.dll | converts Adobe PDF® files into text | Windows 95 and later | PDF
    PPTDLL.dll | converts MS PowerPoint® (.PPT) presentations into text | Windows 2000, XP and later | PPT, POT, PPS
    RTF2TXT.dll | extracts text from RTF (Rich Text) files** | Windows 95 and later | RTF
    UNCD2TXT.dll | converts Unicode TXT files into plain text ones** | Windows 95 and later | TXT, LST, INI, LOG, CSS, INF, SCP, SCT, WSC, WTX, ZAP
    WPD2TXT.dll | extracts plain text from WPD (Word Perfect®) files | Windows 95 and later | WPD
    XLSDLL.dll | extracts text from MS Excel® (.XLS) spreadsheets | Windows 2000, XP and later | XLS, XLB, XLC, XLT
    DOCDLL.dll | converts MS Word® (.DOC) documents into text*** | Windows 2000, XP and later | DOC, DOT
    *   Recommended under Windows 95/98/ME/NT 4.0 as a replacement for HTMDLL.dll.
    **  Recommended for use under Windows 2000, XP, 2003 and higher editions.
    *** GetText.exe skips files which have the .DOC extension but are actually RTF files. A more advanced procedure for selecting an appropriate KT Text Filter will be given to you upon purchasing a license to use KT Text Filters.
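
Files that carry the .DOC extension but are really RTF can be spotted before calling GetText, because an RTF file begins with the characters {\rtf. The Python sketch below checks for that signature; the function name looks_like_rtf is hypothetical. Whether naming RTF2TXT.dll as the third parameter makes GetText process such a file is not stated in this document, so treat that as something to try rather than a documented feature.

```python
def looks_like_rtf(path):
    """Return True if the file starts with the RTF signature {\\rtf."""
    with open(path, "rb") as handle:
        return handle.read(5) == b"{\\rtf"
```

A script could call looks_like_rtf on each .DOC file and handle the matching ones separately, for example by passing an explicit filter name.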

    GetText selects the required filter solely by the extension of the file being filtered, without checking its contents. For example, to convert "MyFile.doc", GetText applies "DOCDLL.dll". If the filter which performs the required type of conversion is not found in the "Filters" subfolder, the utility copies the source file into the destination text file without any changes. If you execute GetText under an obsolete platform on which some of the filters do not function (for example, when you run GetText under Windows 98 and instruct it to process an MS Word® file), the utility plays a warning sound to indicate that it must be run under a later version of MS Windows®.
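
Because a missing filter results in a plain copy rather than an error, a script may want to detect that case after the fact. One possible check, sketched in Python with the hypothetical helper name copied_unchanged, is a byte-for-byte comparison of the source and the destination using the standard filecmp module.

```python
import filecmp

def copied_unchanged(source, destination):
    """Return True if the destination is a byte-for-byte copy of the source."""
    # shallow=False forces a comparison of the file contents.
    return filecmp.cmp(source, destination, shallow=False)
```

A True result is only a hint: a source file that is already plain text may legitimately come out identical.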

    Important: As mentioned above, GetText.exe builds plain text files only. The ability to generate text files in Unicode comes with the purchase of a license to use and distribute KT Text Filters.


    4. Using, distributing, and purchasing KT Text Filters
    If you have obtained the GetText utility and/or KT Text Filters from the Kryloff Technologies Web site or other sources, and have not yet purchased a license to use the filters, you may use GetText and KT Text Filters personally on one computer on a royalty-free basis for as long as you need. To redistribute or reproduce any components of the software, either in part or in whole, you must purchase a license. Any reproduction or redistribution of GetText, KT Text Filters, supplementary files and documentation not in accordance with the KT Text Filters End-User License Agreement is expressly prohibited by law, and may result in severe civil and criminal penalties.

    Additional information about your right to use and (re-)distribute GetText and KT Text Filters is provided in the KT Text Filters End-User License Agreement, Basic License. Should you need to obtain the source code of KT Text Filters (for example, to use them under operating systems other than Windows), or have any other questions or concerns regarding GetText or KT Text Filters, contact Kryloff Technologies at http://www.kryltech.com/feedback.htm


    5. System requirements

  • Windows 95/98/ME or Windows NT 4.0/2000/XP/2003 (the entire functionality of the utility is available under Windows 2000, XP, and 2003);
  • 16 MB of memory.

    Copyright © Kryloff Technologies, Inc. http://www.kryltech.com

    KT Text Filters™, Subject Search™, Subject Search Spider (SSSpider™), Subject Search Scanner (SSScanner™), Subject Search Siter (SSSiter™), Subject Search Pad™ (SSPad™), Subject Search Summarizer (SSSummarizer™), Subject Search Sleuth (SSSleuth™), Subject Search Server (SSServer™), and Subject Search Suite™ (SSSuite™) are trademarks of Kryloff Technologies, Inc. Other products or companies mentioned in this document are copyrights and/or trademarks of their respective companies.