PDF extract text
PDF extract text [message #185508]
Mon, 07 April 2014 04:53
Philipp Kraus
Hello,
how can I extract the text from a PDF file with PHP? Images and other
structures can be ignored.
We have a lot of PDFs generated from LaTeX and PowerPoint and would like to
extract only the text content
in order to run a text analysis on it. For the LaTeX documents, for example,
we would also like to recover the chapter structure.
Is there any solution to do this with build-in PHP functions?
Thanks
Phil
Re: PDF extract text [message #185518 is a reply to message #185517]
Mon, 07 April 2014 20:12
Thomas 'PointedEars'
Christoph Michael Becker wrote:
> Thomas 'PointedEars' Lahn wrote:
>> Philipp Kraus wrote:
>>> Is there any solution to do this with build-in PHP functions?
>>                                            ^t
>> No.
>
> Well, there may not be a solution to do this with built-in PHP functions
> (whatever a built-in PHP function might be; actually (almost) all PHP
> functions are part of an extension), but at least *theoretically* it
> would be possible by processing the PDF file "bytewise". (The PDF
> specification is available online for free.)
*rolls eyes*
*bags collected trolls’ eyes*
--
PointedEars
Twitter: @PointedEars2
Please do not Cc: me. / Bitte keine Kopien per E-Mail.
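
For readers wondering what the "bytewise" approach quoted above might look like, here is a rough sketch in plain PHP. It is only an illustration under strong assumptions, not a general solution: the function name extractPdfText is hypothetical, the zlib extension is assumed to be loaded (for gzuncompress), streams are assumed to be either uncompressed or FlateDecode-compressed, and text is assumed to be stored as literal strings such as (Hello) inside BT ... ET blocks. Many real PDFs, quite possibly including LaTeX and PowerPoint output, use hex strings, subset fonts with custom encodings, or object streams, and would come out empty or garbled with this.

```php
<?php
// Hypothetical helper (not a built-in PHP function): collects literal
// text strings from the content streams of a PDF held in $pdf.
function extractPdfText(string $pdf): string
{
    $lines = [];
    // Stream data sits between the "stream" and "endstream" keywords.
    preg_match_all('/\bstream\r?\n(.*?)\r?\nendstream/s', $pdf, $streams);
    foreach ($streams[1] as $raw) {
        // Assumption: the stream is FlateDecode (zlib) or not compressed at all.
        $data = @gzuncompress($raw);
        if ($data === false) {
            $data = $raw;
        }
        // Text-showing operators appear inside BT ... ET blocks.
        preg_match_all('/\bBT\b(.*?)\bET\b/s', $data, $blocks);
        foreach ($blocks[1] as $block) {
            // Literal strings look like (Hello); a backslash escapes ( ) and \.
            preg_match_all('/\((?:\\\\.|[^\\\\()])*\)/s', $block, $strings);
            $text = '';
            foreach ($strings[0] as $s) {
                // Drop the enclosing parentheses, then resolve the escapes.
                $text .= stripcslashes(substr($s, 1, -1));
            }
            if ($text !== '') {
                $lines[] = $text;
            }
        }
    }
    return implode("\n", $lines);
}
```

Called as extractPdfText(file_get_contents($path)), this returns one line per text block. Kerning offsets inside TJ arrays are ignored, so words may run together, and nothing here recovers chapter structure.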