Discussion:
[scribus] script to convert e-mail and web addresses to PDF links
Gary Dale
2018-11-07 23:05:10 UTC
I'm new to Scribus scripting. Because of the versions of Scribus and
Ghostscript I'm running, exporting to PDF is a pain: I have to save the
.sla file, then fire up my laptop, which is running an older version of
Debian, to create the PDF. Debian/Buster has a problem with importing
PostScript and PDFs and with creating PDFs.

However the far larger problem I'm currently having is that I have a
hundred-page document with hundreds of e-mail addresses and a lot of web
addresses that I'd like to automatically locate and create external
links for. Doing this by hand will take forever.

After manually creating a few links and then looking at the .sla file, I note that
the PDF links are, as expected, not really associated with the text.
Instead, as the create sequence implies, they are associated with a bit
of real estate on the page that (hopefully) aligns with some underlying
text or image.

This suggests that a script would need to:

1) seek e-mail addresses (identified by text that looks like an e-mail
address),

2) identify the position and size of the address on the page,

3) create a PDF link element with that address.
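Step 1 on its own can be done with plain Python regular expressions, before touching any Scribus API. The sketch below is hypothetical, not an existing Scribus function: `find_addresses`, both patterns and the `mailto:`/`http://` prefixes are assumptions, and the patterns are deliberately loose rather than RFC-compliant. It returns character offsets into the text, so steps 2 and 3 (where the address lands on the page, and creating the link there) are still open.

```python
import re

# Deliberately loose patterns (an assumption, not RFC-compliant): fine for
# typical addresses in body text, but edge cases will be missed or mangled.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# A URL must not be glued to a preceding word, dot or "@" (e.g. info@www.x.org).
URL_RE = re.compile(r"(?<![\w@.])(?:https?://|www\.)[^\s<>\"']+")
# Punctuation that usually ends the sentence rather than the address.
TRAILING = ".,;:!?)]}"


def find_addresses(text):
    """Return sorted (start, end, uri) tuples for web and e-mail addresses."""
    hits = []
    for m in URL_RE.finditer(text):
        raw = m.group().rstrip(TRAILING)
        uri = raw if raw.lower().startswith("http") else "http://" + raw
        hits.append((m.start(), m.start() + len(raw), uri))
    url_spans = [(s, e) for s, e, _ in hits]
    for m in EMAIL_RE.finditer(text):
        # Skip e-mail-looking text that is really part of a URL found above.
        if any(s <= m.start() < e for s, e in url_spans):
            continue
        hits.append((m.start(), m.end(), "mailto:" + m.group()))
    return sorted(hits)
```

For example, `find_addresses("Write to gary@example.com or see www.scribus.net.")` yields one `mailto:` hit and one `http://www.scribus.net` hit, with the sentence-ending dot left out of the second one.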

Has anyone seen / used such a script or something similar that I can
work from?

Thanks for any assistance.


___
Scribus Mailing List: ***@lists.scribus.net
Edit your options or unsubscribe:
http://lists.scribus.net/mailman/listinfo/scribus
See also:
http://wiki.scribus.net
http://forums.scribus.net
ale rimoldi
2018-11-09 08:23:42 UTC
hi gary
i don't think there is a way to do this through scripting.
and automatically adding the pdf link frames from c++ is imo not
a good idea at all.

one simple solution is to rely on the pdf reader to make links
clickable. many readers do it.

the other way -- a more long term one -- is to come up with a good way
to do that through "formatting".
does anybody have an idea how to edit and display links in scribus?
if yes, please make a proposal in this list or add a ticket to the bug
tracker (https://bugs.scribus.net).
it's probably not trivial to find the optimal way to do that, but we
might have a few very clever heads around here...
just keep in mind that scribus is mainly a tool for creating pdfs that
will be professionally printed, so the solution will probably have to
ensure that printed documents look good! (you probably will not want
underlined urls in a printed document... but you might want them
highlighted in a pdf for online reading...)

sorry that i cannot give you *the* solution... maybe somebody
will come up with a better idea than mine...

ciao
a.l.e

Jonas Bechtel
2018-11-09 08:52:27 UTC
Hi there

Some time ago I did something similar. This involved the following steps:

* Export file as pdf (Stage 1)
* Use an external tool to find keywords with positions
* Re-use these positions with a python script in Scribus
* Export file as pdf (Stage 2 - final)
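To re-use the positions, the word boxes from the stage-1 PDF have to become frame coordinates in Scribus. A minimal sketch of that conversion, assuming the external tool reports PDF points (1/72 inch) with the origin at the top-left corner of the page, and assuming the Scribus document is set up in millimetres. `pt_to_mm`, `bbox_to_frame` and the `pad_pt` margin are hypothetical names; the Scribus call that actually creates the link frame is deliberately left out.

```python
PT_PER_INCH = 72.0
MM_PER_INCH = 25.4


def pt_to_mm(value_pt):
    """Convert PDF points (1/72 inch) to millimetres."""
    return value_pt * MM_PER_INCH / PT_PER_INCH


def bbox_to_frame(x_min, y_min, x_max, y_max, pad_pt=0.0, to_mm=True):
    """Turn a word box into (x, y, width, height) for a link frame.

    Assumes the box is in points with the origin at the page's top-left
    corner. pad_pt (hypothetical) enlarges the clickable area a little
    on every side.
    """
    x, y = x_min - pad_pt, y_min - pad_pt
    w, h = (x_max - x_min) + 2 * pad_pt, (y_max - y_min) + 2 * pad_pt
    if to_mm:
        return tuple(pt_to_mm(v) for v in (x, y, w, h))
    return (x, y, w, h)
```

If the document already uses points, pass `to_mm=False`; a small padding tends to make the link easier to hit than the bare glyph box.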

The key problem is that you cannot link/reference single words of a text frame from the Python scripter engine. That would have to be implemented in C++, and I've got no time. (I'm not even able to update the Python 3 patch to the development branch, which seems to be moving.)

BR
Jonas





JLuc
2018-11-09 12:11:17 UTC
Post by Jonas Bechtel
* Export file as pdf (Stage 1)
* Use an external tool to find keywords with positions
what kind of tool was that?
and what kind of data did it output?

JL
Jonas Bechtel
2018-11-09 17:03:34 UTC
On Fri, 9 Nov 2018 13:11:17 +0100
Post by JLuc
Post by Jonas Bechtel
* Export file as pdf (Stage 1)
* Use an external tool to find keywords with positions
what kind of tool was that ?
and what kind of data did it output ?
I use: pdftotext -bbox infile.pdf outfile.xml

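Given a page which contains the end of the sentence "[Diese
Nachbearbeitung der vorläufigen Zeitpunkte bewirkt, dass eine Vibration
zu einem zukünftigen Zeitpunkt (also 1-9 Messwerte später) nicht zum
aktuellen Zeitpunkt als Bewegung gewertet wird, während] ansonsten die
korrekten Bewegungsmeldungen beibehalten werden."
(In English: "[This post-processing of the provisional time points ensures
that a vibration at a future time point (i.e. 1-9 readings later) is not
counted as movement at the current time point, while] otherwise the
correct movement detections are retained.")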

The tool takes lots of CPU power.


It outputs the following:
<page width="595.280000" height="841.890000">
<word xMin="33.579210" yMin="46.010160" xMax="81.819210" yMax="58.970160">ansonsten</word>
<word xMin="85.719210" yMin="46.010160" xMax="100.155210" yMax="58.970160">die</word>
<word xMin="104.055210" yMin="46.010160" xMax="149.811210" yMax="58.970160">korrekten</word>
<word xMin="153.711210" yMin="46.010160" xMax="260.043210" yMax="58.970160">Bewegungsmeldungen</word>
<word xMin="263.943210" yMin="46.010160" xMax="319.059210" yMax="58.970160">beibehalten</word>
<word xMin="322.959210" yMin="46.010160" xMax="360.807210" yMax="58.970160">werden.</word>


That gets parsed by Python.
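The parsing code itself isn't shown here, so the following is only a hypothetical sketch of the same idea, not Jonas's implementation. It reads `pdftotext -bbox` output with Python's standard `xml.etree.ElementTree` and keeps the words that look like e-mail or web addresses, with their page number and box. Assumptions: the file parses as XML (it is XHTML, so tags may carry the `http://www.w3.org/1999/xhtml` namespace, which the code strips), each address sits in a single `<word>` element (addresses broken across lines are missed), and any trailing punctuation stripped from the text is still inside the reported box. `parse_bbox` and `find_linkable` are made-up names.

```python
import re
import xml.etree.ElementTree as ET

EMAIL = re.compile(r"[\w.%+-]+@[\w-]+(?:\.[\w-]+)+")
WEB = re.compile(r"(?:https?://|www\.)\S+")
PUNCT = "()[]<>,;:.!?\"'"


def local_name(tag):
    """Drop an XML namespace: '{http://www.w3.org/1999/xhtml}word' -> 'word'."""
    return tag.rsplit("}", 1)[-1]


def parse_bbox(xml_text):
    """Yield (page_no, text, (xMin, yMin, xMax, yMax)) for every <word>."""
    root = ET.fromstring(xml_text)
    pages = [el for el in root.iter() if local_name(el.tag) == "page"]
    for page_no, page in enumerate(pages, start=1):
        for el in page.iter():
            if local_name(el.tag) == "word":
                box = tuple(float(el.get(k)) for k in ("xMin", "yMin", "xMax", "yMax"))
                yield page_no, el.text or "", box


def find_linkable(xml_text):
    """Return (page_no, uri, box) for words that look like addresses."""
    hits = []
    for page_no, text, box in parse_bbox(xml_text):
        word = text.strip(PUNCT)  # "info@example.org," -> "info@example.org"
        if EMAIL.fullmatch(word):
            hits.append((page_no, "mailto:" + word, box))
        elif WEB.fullmatch(word):
            uri = word if word.startswith("http") else "http://" + word
            hits.append((page_no, uri, box))
    return hits
```

Usage would be along the lines of `find_linkable(open("outfile.xml", "rb").read())`; reading bytes lets the XML parser deal with the file's encoding.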

I used it for my study thesis, see
jbechtel.de/site/Tools/Zellortung/

I had an in-text scripting language with ##IF(), ##SET(), ##GET(),
etc. functions. The function that actually used word positions was
##PAGEOF(), which could find words in the document once stage 1 had
run. (It always returned "1" before stage 1 had run.)
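The two-stage behaviour of ##PAGEOF() can be illustrated with a small stand-in. This is not Jonas's code: `page_of` and the `(page_no, word, box)` tuples it consumes are hypothetical, and returning 1 when a word is not found after stage 1 is an assumption (the message only says it returned "1" before stage 1 had run).

```python
def page_of(keyword, stage1_words=None):
    """Hypothetical stand-in for ##PAGEOF().

    stage1_words is a list of (page_no, word, box) tuples taken from the
    stage-1 PDF, or None/empty if stage 1 has not run yet.
    """
    if not stage1_words:
        # No stage-1 data yet: fall back to "1", as described for ##PAGEOF().
        return 1
    for page_no, word, _box in stage1_words:
        # Compare without surrounding punctuation ("werden." -> "werden").
        if word.strip(".,;:!?()\"'") == keyword:
            return page_no
    # Assumption: a keyword that is not found also yields page 1.
    return 1
```

One caveat: substituting the real page number may itself change line breaks, so positions found in stage 1 are only reliable for stage 2 if the substitution does not reflow the text.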

The most relevant file is
Zellortung-0.451_Ausschnitt.tar.bz2:/doc/Studienarbeit/Switch_Text.py


BR
Jonas
Gregory Pittman
2018-11-09 14:37:38 UTC
One curious thing is that Adobe Reader used to recognize addresses and turn them into links (once you set this in Preferences), but it doesn't anymore. Perhaps this is related to the fact that Adobe hasn't updated AR since 9.5.5.1... I know it used to work.

I also see that when I hover over an email address, a tooltip pops up with mailto: and the address, but that doesn't do anything either.

Greg


Gary Dale
2018-11-13 16:04:47 UTC
I checked with Acrobat Reader XI running under Wine, and it did identify web
and e-mail addresses and made them clickable. Okular, on the other hand,
didn't. Neither did the Evince Document Viewer.
