FUDforum: comp.lang.php » insert PDF table in database

insert PDF table in database [message #181540]

Tue, 21 May 2013 07:04

sarika
Messages: 1
Registered: May 2013

Karma: 0

Junior Member

Hi All

What I want is to read the content of a table in a PDF and convert it into either XML or an associative array, to be inserted into a database on the fly.

I have gone through many libraries on the net that extract text from PDFs and convert it into an array, but that array does not seem to be useful, as it's not an associative array and the array indexing is also not proper.

Thanks in advance for the replies, but I am really stuck on this major issue.
My project manager wants me to implement it as soon as possible.
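Whatever extractor is used, the usual way to turn a plain-text table into associative rows is to treat the header line as the keys and split every following line into cells. A minimal sketch in Python, assuming the extractor already yields one table row per line with columns separated by two or more spaces (a big assumption, given how much PDFs vary); the `statement` table and its column names are hypothetical. In PHP the same steps map onto `preg_split()` and `array_combine()`.

```python
import re
import sqlite3

# Assumption: cells are separated by runs of two or more whitespace characters,
# so single spaces inside a cell (e.g. "Opening balance") are preserved.
COLUMN_SEP = re.compile(r"\s{2,}")


def rows_to_dicts(text):
    """Turn a plain-text table into a list of dicts keyed by the header row."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = COLUMN_SEP.split(lines[0])
    rows = []
    for line in lines[1:]:
        cells = COLUMN_SEP.split(line)
        # Pad short rows so every header key is present; zip() drops extra cells.
        cells += [""] * (len(header) - len(cells))
        rows.append(dict(zip(header, cells)))
    return rows


def insert_rows(conn, rows):
    """Insert the dicts into a hypothetical 'statement' table via named placeholders."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS statement (date TEXT, description TEXT, amount TEXT)"
    )
    conn.executemany(
        "INSERT INTO statement (date, description, amount) "
        "VALUES (:Date, :Description, :Amount)",
        rows,
    )
    conn.commit()


if __name__ == "__main__":
    sample = """
    Date        Description      Amount
    2013-05-01  Opening balance  100.00
    2013-05-03  Coffee           3.50
    """
    rows = rows_to_dicts(sample)
    print(rows[0])  # {'Date': '2013-05-01', 'Description': 'Opening balance', 'Amount': '100.00'}
    conn = sqlite3.connect(":memory:")
    insert_rows(conn, rows)
```

The named placeholders tie each dict key to a column, so the header text has to match them; if the extracted headers differ, the keys need renaming first.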

Re: insert PDF table in database [message #181541 is a reply to message #181540]

Tue, 21 May 2013 08:48

Goran
Messages: 38
Registered: January 2011

Karma: 0

Member

On 21.5.2013 9:04, sarika wrote:
> My project manager wants me to implement it as soon as possible.
>

Tell him "you do it" :)

Re: insert PDF table in database [message #181547 is a reply to message #181540]

Tue, 21 May 2013 10:07

Jerry Stuckle
Messages: 2598
Registered: September 2010

Karma: 0

Senior Member

On 5/21/2013 3:04 AM, sarika wrote:
> Hi All
>
> What I want is to read the content of a table in a PDF and convert it into either XML or an associative array, to be inserted into a database on the fly.
>
> I have gone through many libraries on the net that extract text from PDFs and convert it into an array, but that array does not seem to be useful, as it's not an associative array and the array indexing is also not proper.
>
> Thanks in advance for the replies, but I am really stuck on this major issue.
> My project manager wants me to implement it as soon as possible.
>

It's impossible to help you when you don't show any code and don't tell
us *exactly* what's wrong with your code. "It's not an associative
array" and "array indexing is not proper" don't provide enough information.

If you want help, we need things like:

1) A sample input document
2) The code you're using
3) The expected output
4) The output you got

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex(at)attglobal(dot)net
==================

Re: insert PDF table in database [message #181548 is a reply to message #181540]

Tue, 21 May 2013 10:08

Arno Welzel
Messages: 317
Registered: October 2011

Karma: 0

Senior Member

sarika, 2013-05-21 09:04:

[...]
> My project manager wants me to implement it as soon as possible.

And what is your manager willing to pay for a solution? ;-)

--
Arno Welzel
http://arnowelzel.de
http://de-rec-fahrrad.de

Re: insert PDF table in database [message #181550 is a reply to message #181548]

Tue, 21 May 2013 11:46

The Natural Philosoph
Messages: 993
Registered: September 2010

Karma: 0

Senior Member

On 21/05/13 11:08, Arno Welzel wrote:
> sarika, 2013-05-21 09:04:
>
> [...]
>> My project manager wants me to implement it as soon as possible.
> And what is your manager willing to pay for a solution? ;-)
>
>
The problem is that two apparently identical PDFs can be totally
different internally.

Yea, even down to one being simply a bitmap with no text that, at the
intended resolution, appears identical.

At best you can strip out most of the text and, with a bit of
intelligence, sometimes even get it in the right order.
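The "right order" problem arises because a PDF content stream may draw text fragments in any sequence; only their coordinates say where they sit on the page. A minimal sketch in Python, assuming some extractor has already produced each fragment together with the x/y position of its origin in PDF user space (where y grows upward). The `Fragment` class and the 2-unit line tolerance are hypothetical choices, and real layouts (multi-column pages, wrapped cells) need more than this.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    x: float  # horizontal position of the fragment's origin
    y: float  # vertical position; in PDF user space y increases upward
    text: str


def reading_order(fragments, y_tolerance=2.0):
    """Group fragments into lines by y, then order each line left to right."""
    # Top of the page first: a larger y means higher up in PDF coordinates.
    ordered = sorted(fragments, key=lambda f: -f.y)
    lines = []
    for frag in ordered:
        # Join the current line if within tolerance of its first fragment's y,
        # otherwise start a new line.
        if lines and abs(lines[-1][0].y - frag.y) <= y_tolerance:
            lines[-1].append(frag)
        else:
            lines.append([frag])
    return [[f.text for f in sorted(line, key=lambda f: f.x)] for line in lines]


if __name__ == "__main__":
    # Fragments deliberately listed out of order, as a content stream might draw them.
    frags = [
        Fragment(300, 700, "100.00"),
        Fragment(50, 700.5, "2013-05-01"),
        Fragment(50, 680, "2013-05-03"),
        Fragment(150, 701, "Opening balance"),
        Fragment(300, 679, "3.50"),
        Fragment(150, 680, "Coffee"),
    ]
    for row in reading_order(frags):
        print(row)
```

Each returned row is a list of cell texts, which is the shape the header-keyed conversion to associative rows needs as input.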

--
Ineptocracy

(in-ep-toc’-ra-cy) – a system of government where the least capable to lead are elected by the least capable of producing, and where the members of society least likely to sustain themselves or succeed, are rewarded with goods and services paid for by the confiscated wealth of a diminishing number of producers.

Re: insert PDF table in database [message #181555 is a reply to message #181540]

Tue, 21 May 2013 16:27

Michael Vilain
Messages: 88
Registered: September 2010

Karma: 0

Member

In article <b6c1cfb3-1f8b-48c5-8822-25d10402d896(at)googlegroups(dot)com>,
sarika <sarikasoni12(at)gmail(dot)com> wrote:

> Hi All
>
> What I want is to read the content of a table in a PDF and convert it into
> either XML or an associative array, to be inserted into a database on the fly.
>
> I have gone through many libraries on the net that extract text from PDFs
> and convert it into an array, but that array does not seem to be useful, as
> it's not an associative array and the array indexing is also not proper.
>
> Thanks in advance for the replies, but I am really stuck on this major
> issue.
> My project manager wants me to implement it as soon as possible.

I ran across this problem with various bank statements that I downloaded
via my bank's personal web site. The PDFs were encrypted and set with
permissions that didn't allow extracting the text layer. Unless you are
able to decrypt the PDFs and run OCR on them, you're wasting your time
here. The problem isn't as simple as your manager thinks. At best, you
could offer a partial solution that handles "some" PDF files; without
libraries to decrypt and OCR the text, that's all you can do.

Those libraries are probably available online somewhere for a fee. Buy
the solution if you're in a time crunch. Beating the fastest horse on
your team is poor project management and won't get him the code any
faster.
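Sorting out which files are even worth attempting can be partly automated. A rough sketch in Python: an encrypted PDF carries an /Encrypt entry in its trailer (or cross-reference stream) dictionary, which is stored unencrypted, so a byte search catches most such files, though it can give a false positive if those bytes happen to occur elsewhere. It cannot detect image-only pages; that still needs a real parser, and OCR for the text itself. The function names and return labels are hypothetical.

```python
from pathlib import Path


def triage_pdf(data):
    """Classify raw bytes as 'not-pdf', 'encrypted' or 'try-extract' (heuristic)."""
    # Readers commonly tolerate junk before the header, so look in the first 1 KiB.
    if b"%PDF-" not in data[:1024]:
        return "not-pdf"
    # An encrypted file names its encryption dictionary with /Encrypt in the
    # trailer; the same bytes could also occur by coincidence elsewhere.
    if b"/Encrypt" in data:
        return "encrypted"
    # Still no guarantee of a text layer: pages may be scanned images needing OCR.
    return "try-extract"


def triage_file(path):
    """Read a file from disk and triage it."""
    return triage_pdf(Path(path).read_bytes())
```

Files labelled "encrypted" need a library that can decrypt them (where that is permitted) before any table extraction; "try-extract" files may still turn out to be image-only scans.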

--
DeeDee, don't press that button! DeeDee! NO! Dee...
[I filter all Goggle Groups posts, so any reply may be automatically ignored]

Re: insert PDF table in database [message #181560 is a reply to message #181555]

Tue, 21 May 2013 19:48

J.O. Aho
Messages: 194
Registered: September 2010

Karma: 0

Senior Member

On 21/05/13 18:27, Michael Vilain wrote:
> I ran across this problem with various bank statements that I downloaded
> via my bank's personal web site. The PDFs were encrypted and set with
> permissions that didn't allow extracting the text layer. Unless you are
> able to decrypt the PDFs and run OCR on them, you're wasting your time
> here. The problem isn't as simple as your manager thinks. At best, you
> could offer a partial solution that handles "some" PDF files; without
> libraries to decrypt and OCR the text, that's all you can do.
>
> Those libraries are probably available online somewhere for a fee. Buy
> the solution if you're in a time crunch. Beating the fastest horse on
> your team is poor project management and won't get him the code any
> faster.
>

Most likely the company told their western customer that they can do
this; then a manager gets the task of seeing to it that his team solves
the problem, and the work is then pushed to a "shadow resource" who
looks for solutions online. If that person doesn't manage to solve the
issue, there are always hundreds of others to replace them with. At
least, that is my experience of how things work in India.

--

//Aho
