PHP Classes

What is the best PHP search string in pdf class?: Search string in PDF and return page number


by srizoophari - 2 years ago (2016-05-03)

Search string in PDF and return page number

Votes: +3

I need a library or class that searches for a string in a PDF and returns the page number of each match.

  • 1 Clarification request
  • 1. by Manuel Lemos - 2 years ago (2016-05-06)

    There are classes that extract text from PDF files, but I am not sure whether any of the existing ones can also return the page the text came from.

    2 Recommendations

    PHP PDF to HTML: Convert PDF to HTML using Poppler

    Votes: 0

    by Anton N Nikolaev (package author, reputation 180) - 1 year ago (2016-12-02)

    I like it.


    PHP PDF to Text: Extract text contents from PDF files

    Votes: +2

    by Christian Vigh (package author, reputation 380) - 2 years ago (2016-05-06)

    I have made a class to extract text contents from PDF files; however, it does not keep track of page numbers. Maybe it could be a first step?

    • 7 Comments
    • 1. by Manuel Lemos - 2 years ago (2016-05-09)

      It would be better if you could count pages to also give the page number of each text block. Is that difficult?

    • 2. by Christian Vigh (package author) - 2 years ago (2016-05-16) in reply to comment 1 by Manuel Lemos

      Well, it could range anywhere from tedious to a nightmare... :-) I'm kidding; in fact, I had already put that on my to-do list when I posted my initial answer because, although my original concern was only extracting text, I thought it would be a good idea to be able to locate text within the whole document.

      I will add a "Pages" array property containing the text of each individual page. I will also add a GetPageOf($offset) method that returns the page number for a given byte offset in the Text property, and maybe some methods to simply find the page number(s) of some text.

      I think everything should be ready by the end of this week.
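The offset-to-page lookup described in this comment can be sketched in a few lines of plain PHP. This is only an illustration, not the PdfToText implementation: the function name `pageOfOffset` and the sample page texts are made up, and it assumes the full text is simply the page texts concatenated in order with no separator between them.

```php
<?php
// Hypothetical helper, not part of the PdfToText class: maps a byte offset in
// the concatenated document text back to the page number that contains it.
// Assumes the full text is the page texts joined in order with no separator.
function pageOfOffset(array $pages, int $offset): ?int
{
    if ($offset < 0) {
        return null;
    }
    $start = 0; // byte offset at which the current page begins
    foreach ($pages as $pageNumber => $pageText) {
        $end = $start + strlen($pageText);
        if ($offset < $end) {
            return $pageNumber;
        }
        $start = $end;
    }
    return null; // the offset lies past the end of the document
}

// Made-up sample data: page number => page text.
$pages  = [1 => "Introduction. ", 2 => "Methods and data. ", 3 => "Results."];
$text   = implode('', $pages);
$offset = strpos($text, 'data');

echo pageOfOffset($pages, $offset), "\n"; // prints 2
```

If the real text inserts a separator between pages, the separator length would have to be added to `$start` on each iteration.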

    • 3. by Manuel Lemos - 2 years ago (2016-05-17) in reply to comment 2 by Christian Vigh

      Great. That would make your package innovative. There are already classes that extract text from PDF files, but none of them report the page each text object is on.

    • 4. by Christian Vigh (package author) - 2 years ago (2016-05-20) in reply to comment 3 by Manuel Lemos

      Hi everybody,

      I'm glad to announce that the PdfToText class is now able to retrieve the page number of any text located in a PDF document.

      Seven new methods are available to retrieve this information: GetPageFromOffset, text_strpos/text_stripos, document_strpos/document_stripos, and text_match/document_match (see README.md).

      There is also a Pages array property that holds the text contents of the individual pages in the document.
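For readers who only need "which pages contain this string", a per-page array is enough on its own. The sketch below does not call the PdfToText methods listed above (their exact signatures are documented in the package's README.md); `findPagesContaining` and the sample data are hypothetical, and the array is assumed to map page numbers to page text.

```php
<?php
// Hypothetical helper, not the PdfToText API: returns the page numbers whose
// text contains $needle. $pages is assumed to map page number => page text.
function findPagesContaining(array $pages, string $needle, bool $caseSensitive = true): array
{
    if ($needle === '') {
        return []; // an empty needle would otherwise match every page
    }
    $matches = [];
    foreach ($pages as $pageNumber => $pageText) {
        $pos = $caseSensitive
            ? strpos($pageText, $needle)
            : stripos($pageText, $needle);
        if ($pos !== false) {
            $matches[] = $pageNumber;
        }
    }
    return $matches;
}

// Made-up sample data.
$pages = [1 => 'Annual report 2015', 2 => 'Revenue grew by 4%', 3 => 'REVENUE by region'];

print_r(findPagesContaining($pages, 'Revenue'));        // page 2 only
print_r(findPagesContaining($pages, 'revenue', false)); // pages 2 and 3
```

Because each page is searched separately, a phrase that starts on one page and ends on the next would not be found by this sketch.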

    • 5. by Manuel Lemos - 2 years ago (2016-05-20) in reply to comment 4 by Christian Vigh

      That is great. I have not seen a package, in PHP or any other language, that could do that.

    • 6. by Marcelo - 2 months ago (2018-07-24) in reply to comment 5 by Manuel Lemos

      Hello Christian, I have very large PDFs (200 MB) and I cannot extract all the text from them. Would you have any solution for this? These PDFs also contain images, which is why they are so large. I just need the text. Your function can read the file but cannot process it. I await your suggestion.
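One option that might be worth trying for files like this is the Poppler route from the other recommendation in this thread: Poppler's `pdftotext` command-line tool writes only the text and, by default, ends each page with a form feed character, so page numbers can be recovered by splitting on it. Whether it copes with a particular 200 MB file is not something this thread confirms. The function names below are hypothetical, and the wrapper assumes the `pdftotext` binary is installed and on the PATH.

```php
<?php
// Splits pdftotext-style output into pages keyed by 1-based page number.
// Assumes each page ends with a form feed ("\f"), which is pdftotext's
// default behaviour when the -nopgbrk option is not given.
function splitPagesOnFormFeed(string $text): array
{
    $chunks = explode("\f", $text);
    // The output normally ends with a form feed, leaving one empty last chunk.
    if (end($chunks) === '') {
        array_pop($chunks);
    }
    $pages = [];
    foreach ($chunks as $index => $chunk) {
        $pages[$index + 1] = $chunk;
    }
    return $pages;
}

// Hypothetical wrapper: requires the pdftotext binary to be installed.
function extractPagesWithPdftotext(string $pdfPath): array
{
    // The trailing "-" sends the extracted text to standard output.
    $output = shell_exec('pdftotext ' . escapeshellarg($pdfPath) . ' -');
    return splitPagesOnFormFeed((string) $output);
}

// Demonstration on a literal string, so no PDF or binary is needed here.
$pages = splitPagesOnFormFeed("first page\fsecond page\f");
echo count($pages), "\n"; // prints 2
```

`pdftotext` also accepts `-f` and `-l` to limit extraction to a range of pages, which may help keep memory use down by processing a large document in slices.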

