Text Mining Tool. Vitaliy sends me notice of his new freeware product, which quickly and easily grabs the text from inside PDF, HTML and other files. I tried it on a 76-page PDF e-book I had lying around and it worked a treat, extracting all the text into one chunk that is easy to copy and paste. I can think of lots of uses for this, such as grabbing web site text or extracting PDF text without Adobe software. Unfortunately, you’ll need the .NET Framework to use it, but it’s still a great tool.
Text Mining Tool is a freeware program for extracting text from files of the following types: PDF, DOC, RTF, CHM and HTML, without needing other programs such as Word or Acrobat installed. The beauty of the program is that it works, and very simply, on almost all common document formats. That includes HTML web pages, the DOC and RTF formats used by Microsoft Word and others like OpenOffice, Windows Help files ending in CHM, and portable documents in PDF format.
Very good program. What’s more, it can work in console batch mode, extracting text from many files at once!