insert PDF table in database [message #181540]
Tue, 21 May 2013 07:04
sarika
Hi All,
I want to read the contents of a table in a PDF and convert it into either XML or an associative array, so that it can be inserted into a database on the fly.
I have gone through many libraries on the net that extract text from a PDF and convert it into an array, but that array does not seem to be useful: it is not an associative array, and the array indexing is not proper either.
Thanks in advance for any replies; I am really stuck on this major issue.
My project manager wants me to implement this as soon as possible.
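For illustration only, here is a minimal Python sketch of the conversion being asked about. It assumes the extraction library already returns the table as a plain list of rows whose first row holds the column names; the sample data, the table name `pdf_rows`, and the helpers `rows_to_records` and `insert_records` are all hypothetical, and SQLite stands in for whatever database is actually used.

```python
import sqlite3


def rows_to_records(rows):
    """Turn positional rows into dicts keyed by the (normalised) header row."""
    header, *body = rows
    keys = [h.strip().lower().replace(" ", "_") for h in header]
    return [dict(zip(keys, (cell.strip() for cell in row))) for row in body]


def insert_records(conn, table, records):
    """Insert dict records with named placeholders; returns the row count.

    Assumes the table name and keys are trusted, simple identifiers,
    because they are interpolated into the SQL text.
    """
    if not records:
        return 0
    cols = list(records[0])
    col_sql = ", ".join(f'"{c}"' for c in cols)
    placeholders = ", ".join(f":{c}" for c in cols)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_sql})')
    conn.executemany(
        f'INSERT INTO "{table}" ({col_sql}) VALUES ({placeholders})', records
    )
    conn.commit()
    return len(records)


# Hypothetical extractor output: first row holds the column names.
extracted = [
    ["Item", "Unit Price", "Qty"],
    ["Widget", "2.50", "4"],
    ["Gadget", "10.00", "1"],
]
records = rows_to_records(extracted)
conn = sqlite3.connect(":memory:")
inserted = insert_records(conn, "pdf_rows", records)
```

Keying each row by the header is what makes the array "associative"; it only helps if the extractor delivers the cells of each row in a consistent order in the first place.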
Re: insert PDF table in database [message #181547 is a reply to message #181540]
Tue, 21 May 2013 10:07
Jerry Stuckle
On 5/21/2013 3:04 AM, sarika wrote:
> Hi All
>
> What i want is to read content in PDF table and convert it into either XML or associative array to be inserted in database on the fly.
>
> I have gone through many libraries on net providing text extraction from PDF and converting in array but that array does not seem to be useful as its not associative array and array indexing is also not proper.
>
> Thanks in advance for the replies but i am really stuck with this major issue.
> My project manager wants me to implement as soon as possible.
>
It's impossible to help you when you don't show any code and don't tell
us *exactly* what's wrong with your code. "It's not an associative
array" and "array indexing is not proper" don't provide enough information.
If you want help, we need things like:
1). A sample input document
2). The code you're using
3). The expected output
4). The output you got
Re: insert PDF table in database [message #181550 is a reply to message #181548]
Tue, 21 May 2013 11:46
The Natural Philosoph
On 21/05/13 11:08, Arno Welzel wrote:
> sarika, 2013-05-21 09:04:
>
> [...]
>> My project manager wants me to implement as soon as possible.
> And what is your manager willing to pay for a solution? ;-)
>
>
The problem is that two apparently identical PDFs can be totally
different internally.
Yea, even down to being simply bitmaps with no text that, at the intended
resolution, appear identical.
At best you can strip out most of the text and, with a bit of
intelligence, sometimes even get it in the right order.
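As a rough sketch of that "bit of intelligence", the Python below rebuilds table rows from positioned text fragments. It assumes the extraction step yields `(x, y, text)` tuples with `y` growing downward (many PDF tools report `y` growing upward, in which case the sort has to be reversed); the function name `fragments_to_rows`, the tolerance value and the sample coordinates are made up for illustration.

```python
def fragments_to_rows(fragments, y_tolerance=2.0):
    """Group (x, y, text) fragments into rows, top to bottom, left to right.

    A fragment joins the current row when its y is within y_tolerance of
    the y of the first fragment placed in that row; otherwise it starts
    a new row. Assumes y grows downward.
    """
    rows = []
    row_y = None
    for x, y, text in sorted(fragments, key=lambda f: (f[1], f[0])):
        if row_y is None or abs(y - row_y) > y_tolerance:
            rows.append([])
            row_y = y
        rows[-1].append((x, text))
    # Within each row, order the cells by horizontal position.
    return [[text for _, text in sorted(row)] for row in rows]


# Hypothetical fragments in the scrambled order an extractor might emit.
fragments = [
    (200.0, 100.4, "Qty"),
    (50.0, 100.0, "Item"),
    (50.0, 120.1, "Widget"),
    (200.0, 119.8, "4"),
]
table = fragments_to_rows(fragments)
```

This only works when the PDF carries real text with positions; for a bitmap-only PDF there are no fragments to sort.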
Re: insert PDF table in database [message #181555 is a reply to message #181540]
Tue, 21 May 2013 16:27
Michael Vilain
In article <b6c1cfb3-1f8b-48c5-8822-25d10402d896(at)googlegroups(dot)com>,
sarika <sarikasoni12(at)gmail(dot)com> wrote:
> Hi All
>
> What i want is to read content in PDF table and convert it into either XML or
> associative array to be inserted in database on the fly.
>
> I have gone through many libraries on net providing text extraction from PDF
> and converting in array but that array does not seem to be useful as its not
> associative array and array indexing is also not proper.
>
> Thanks in advance for the replies but i am really stuck with this major
> issue.
> My project manager wants me to implement as soon as possible.
I ran across this problem with various bank statements that I downloaded
via my bank's personal web site. The PDFs were encrypted and set with
certain properties that didn't allow scanning of the text layer. Unless
you are able to decrypt and do OCR on the PDFs, you're wasting your time
here. The problem isn't as simple as your manager would think. At
best, you could offer a partial solution that can scan "some" PDF
files, but without libraries to decrypt and OCR the text, that's all
you can do.
Those libraries are probably available online somewhere for a fee. Buy
the solution if you're in a time crunch. Beating the fastest horse on
your team is poor project management and won't get him the code any
faster.
Re: insert PDF table in database [message #181560 is a reply to message #181555]
Tue, 21 May 2013 19:48
J.O. Aho
On 21/05/13 18:27, Michael Vilain wrote:
> In article <b6c1cfb3-1f8b-48c5-8822-25d10402d896(at)googlegroups(dot)com>,
> sarika <sarikasoni12(at)gmail(dot)com> wrote:
>
>> Hi All
>>
>> What i want is to read content in PDF table and convert it into either XML or
>> associative array to be inserted in database on the fly.
>>
>> I have gone through many libraries on net providing text extraction from PDF
>> and converting in array but that array does not seem to be useful as its not
>> associative array and array indexing is also not proper.
>>
>> Thanks in advance for the replies but i am really stuck with this major
>> issue.
>> My project manager wants me to implement as soon as possible.
>
> I ran across this problem with various bank statements that I downloaded
> via my bank's personal web site. The PDFs were encrypted and set with
> certain properties that didn't allow scanning of the text layer. Unless
> you are able to decrypt and do OCR on the PDFs, you're wasting your time
> here. The problem isn't as simple as your manager would think. At
> best, you could offer a partial solution of being able to scan "some"
> PDF files but without libraries to decrypt and OCR the text, that's all
> you can do.
>
> Those libraries are probably on-line somewhere for a fee. Buy the
> solution if you're in a time crunch. Beating the fastest horse on your
> team is poor project management skills and won't get him the code any
> faster.
>
Most likely the company told their western customer that they can do this,
then a manager got the task of seeing to it that his team solves the problem,
and the work was then pushed to a "shadow resource" who looks for solutions
online. If that person does not manage to solve the issue, there are always
hundreds of others to replace them with. At least that is my experience of
how things work in India.