Wrong mappings when copying from PDF with pdfgentounicode with newtxTrouble with the newtx packageHow to fix...
What's the rationale behind the objections to these measures against human trafficking?
Talents during the time of Achasverous
Why was it necessary for Jesus to go through Samaria in John 4:4
80’s/90’s fantasy book where men travelled across the land and inhabited creatures such as deer, beavers and pixies
Meth dealer reference in Family Guy
How to define a macro with multiple optional parameters?
Compare four integers, return word based on maximum
What's the purpose of these copper coils with resitors inside them in A Yamaha RX-V396RDS amplifier?
Can I retract my name from an already published manuscript?
Is my plan for fixing my water heater leak bad?
Do my Windows system binaries contain sensitive information?
Soft question- The Bashing Technique and Other powerful techniques for Olympiads
Should the .gitignore include an entry for .vscode when using Git and VSCode
How to get Region text from address id (Magento 2)
Criticizing long fiction. How is it different from short?
How to properly claim credit for peer review?
What are these green text/line displays shown during the livestream of Crew Dragon's approach to dock with the ISS?
Can the Count of Monte Cristo's calculation of poison dosage be explained?
Skis versus snow shoes - when to choose which for travelling the backcountry?
Auto Insert date into Notepad
Could quantum mechanics be necessary to analyze some biology scenarios?
awk unexpectedly removes dot from string
How can I mix up weapons for large groups of similar monsters/characters?
What's the difference between a cart and a wagon?
Wrong mappings when copying from PDF with pdfgentounicode with newtx
Trouble with the newtx packageHow to fix missing or incorrect mappings from glyphtounicode.texThe newtx isn't compatible with pifont?Where should I report wrong Unicode mappings?Problem with new MNRAS style files / newtx on arXivinserting a single unicode character with pdflatexNested fonts fail with the newtx (newtxtext) package (specifically textsc, textup, and textbf)Italicized footnote number within equation (with newtx/newpx fonts)Newtx is blocking creation of a pdfError on rendering mathematical symbol - getting source file of rfxlri-alt from newtx package
It seems that newtx
is nowadays the best option for Times font, and I wanted to test it with respect to Croatian specifics, but also with respect to some common math symbols, and copying the content from generated PDF.
Consider the following MWE:
documentclass{article}
usepackage[T1]{fontenc}
usepackage[utf8]{inputenc}
usepackage[croatian]{babel}
input{glyphtounicode}
pdfgentounicode=1
usepackage{newtxtext}
usepackage{newtxmath}
begin{document}
a b c d e š đ č ć ž Š Đ Č Ć Ž
bfseries
a b c d e š đ č ć ž Š Đ Č Ć Ž
sffamily
a b c d e š đ č ć ž Š Đ Č Ć Ž
ttfamily
a b c d e š đ č ć ž Š Đ Č Ć Ž
[
x ne neq y quad
x le leq leqslant y quad
x ge geq geqslant y
]
[
x coloneq coloneqq y quad
y eqcolon eqqcolon x
]
end{document}
Opening the generated PDF in the latest Adobe Acrobat Reader DC, and copying the content into Unicode-friendly editor, I get this:
a b c d e š đ č ć ž Š Ð Č Ć Ž
a b c d e š đ č ć ž Š Ð Č Ć Ž
a b c d e š đ č ć ž Š Ð Č Ć Ž
a b c d e š đ č ć ž Š Ð Č Ć Ž
x ,, y x ≤≤6 y x ≥≥> y
x BB y y CC x
Found issues:
- I expected both
ne
andneq
to be mapped toU+2260
, but they are mapped toU+002C
(regular comma). - Both
le
(ge
) andleq
(le
) are mapped correctly, but slanted variants are not - they should be mapped toU+2A7D
(U+2A7E
). - Both
coloneq
(coloneqq
) andeqcolon
(eqqcolon
) are not mapped correctly (toU+2254
andU+2255
), but to regularB
andC
. - Although it looks like
đ
andĐ
are copied correctly, they are not.đ
is mapped toU+0111
whileĐ
is mapped toU+00D0
(not an lowercase-uppercase pair). Considering these two pairs:
- U+00D0 Ð c3 90 LATIN CAPITAL LETTER ETH
- U+00F0 ð c3 b0 LATIN SMALL LETTER ETH
- U+0110 Đ c4 90 LATIN CAPITAL LETTER D WITH STROKE
- U+0111 đ c4 91 LATIN SMALL LETTER D WITH STROKE
... I would say that c3*
s are used in Icelandic language, while the latter two c4*
s correspond to Croatian language. I also checked my source code, and hereby I confirm that my đ
and Đ
from the keyboard ended as c4*
s encoding the source file in UTF-8.
Questions:
ne
andneq
: Can someone explain why the mapping is wrong, and can that be improved so they are both correctly mapped?- slanted variants: Can someone explain why the mapping is wrong, and can that be improved so they are correctly mapped?
coloneq
and others: Can someone explain why the mapping is wrong, and can that be improved so they are correctly mapped?
đ
andĐ
: Can someone explain why the mapping is wrong, and can that be improved so they are correctly mapped?
I am very keen to contribute for the improvements for all issues.
Kind regards, Ivan
EDIT: The question about đ
and Đ
is related to T1 font encoding, not the newtx
, as explained in comments below.
unicode times newtx
|
show 1 more comment
It seems that newtx
is nowadays the best option for Times font, and I wanted to test it with respect to Croatian specifics, but also with respect to some common math symbols, and copying the content from generated PDF.
Consider the following MWE:
documentclass{article}
usepackage[T1]{fontenc}
usepackage[utf8]{inputenc}
usepackage[croatian]{babel}
input{glyphtounicode}
pdfgentounicode=1
usepackage{newtxtext}
usepackage{newtxmath}
begin{document}
a b c d e š đ č ć ž Š Đ Č Ć Ž
bfseries
a b c d e š đ č ć ž Š Đ Č Ć Ž
sffamily
a b c d e š đ č ć ž Š Đ Č Ć Ž
ttfamily
a b c d e š đ č ć ž Š Đ Č Ć Ž
[
x ne neq y quad
x le leq leqslant y quad
x ge geq geqslant y
]
[
x coloneq coloneqq y quad
y eqcolon eqqcolon x
]
end{document}
Opening the generated PDF in the latest Adobe Acrobat Reader DC, and copying the content into Unicode-friendly editor, I get this:
a b c d e š đ č ć ž Š Ð Č Ć Ž
a b c d e š đ č ć ž Š Ð Č Ć Ž
a b c d e š đ č ć ž Š Ð Č Ć Ž
a b c d e š đ č ć ž Š Ð Č Ć Ž
x ,, y x ≤≤6 y x ≥≥> y
x BB y y CC x
Found issues:
- I expected both
ne
andneq
to be mapped toU+2260
, but they are mapped toU+002C
(regular comma). - Both
le
(ge
) andleq
(le
) are mapped correctly, but slanted variants are not - they should be mapped toU+2A7D
(U+2A7E
). - Both
coloneq
(coloneqq
) andeqcolon
(eqqcolon
) are not mapped correctly (toU+2254
andU+2255
), but to regularB
andC
. - Although it looks like
đ
andĐ
are copied correctly, they are not.đ
is mapped toU+0111
whileĐ
is mapped toU+00D0
(not an lowercase-uppercase pair). Considering these two pairs:
- U+00D0 Ð c3 90 LATIN CAPITAL LETTER ETH
- U+00F0 ð c3 b0 LATIN SMALL LETTER ETH
- U+0110 Đ c4 90 LATIN CAPITAL LETTER D WITH STROKE
- U+0111 đ c4 91 LATIN SMALL LETTER D WITH STROKE
... I would say that c3*
s are used in Icelandic language, while the latter two c4*
s correspond to Croatian language. I also checked my source code, and hereby I confirm that my đ
and Đ
from the keyboard ended as c4*
s encoding the source file in UTF-8.
Questions:
ne
andneq
: Can someone explain why the mapping is wrong, and can that be improved so they are both correctly mapped?- slanted variants: Can someone explain why the mapping is wrong, and can that be improved so they are correctly mapped?
coloneq
and others: Can someone explain why the mapping is wrong, and can that be improved so they are correctly mapped?
đ
andĐ
: Can someone explain why the mapping is wrong, and can that be improved so they are correctly mapped?
I am very keen to contribute for the improvements for all issues.
Kind regards, Ivan
EDIT: The question about đ
and Đ
is related to T1 font encoding, not the newtx
, as explained in comments below.
unicode times newtx
1
Curiously enough, if I copy from a different PDF viewer (Skim, in my case), I get the correct glyphs, with or withoutglyphtounicode
for the text part. Symbols are only partially recognized. Withglyphtounicode
I get all the text glyphs correct on Adobe Acrobat Reader. The fact that Ð is incorrectly mapped to ETH is kind of expected, as the T1 encoding doesn't have two different glyphs for the D with stroke and the ETH.
– egreg
yesterday
Thanks for the prompt reply. I dug into T1-specific definition files in my MiKTeX and found these:DeclareUnicodeCharacter{00D0}{DH}
,DeclareUnicodeCharacter{00F0}{dh}
,DeclareUnicodeCharacter{0110}{DJ}
,DeclareUnicodeCharacter{0111}{dj}
,DeclareTextSymbol{DH}{T1}{208}
,DeclareTextSymbol{DJ}{T1}{208}
,DeclareTextSymbol{dh}{T1}{240}
,DeclareTextSymbol{dj}{T1}{158}
. As you wrote,DJ
andDH
point to the same slot (208) and that causes experienced behavior. How can I "check" what all slots contain?
– Ivan Kokan
yesterday
1
Dopdflatex nfssfont
from the terminal and hit return at the prompts until you get*
. Then typetablebye
and the PDF file will show the full table for a T1 encoded font.
– egreg
yesterday
OK, I see that table is actually given in encguide. So, the idea of having correctly paired mappings forđ
andĐ
("with strokes") is not feasible for T1, actually for non-Unicode based TeX system.
– Ivan Kokan
yesterday
I have read encguide thoroughly on this matter since yesterday, and things are much clearer to me now. (The issue aboutĐ
is explained very well, I should have read that way before.) I have even tried changing/Eth
to/Dcroat
inlm-ec.enc
and tested new MWE usinglmodern
andð Ð đ Đ
; the outcome of copying the content from PDF was expected - everything was correct except Eth. I suppose all slots must be mapped uniquely to only one glyph (name), and/Eth
was the decision (some history here tug.org/fontname/ec.enc)?
– Ivan Kokan
12 hours ago
|
show 1 more comment
It seems that newtx
is nowadays the best option for Times font, and I wanted to test it with respect to Croatian specifics, but also with respect to some common math symbols, and copying the content from generated PDF.
Consider the following MWE:
documentclass{article}
usepackage[T1]{fontenc}
usepackage[utf8]{inputenc}
usepackage[croatian]{babel}
input{glyphtounicode}
pdfgentounicode=1
usepackage{newtxtext}
usepackage{newtxmath}
begin{document}
a b c d e š đ č ć ž Š Đ Č Ć Ž
bfseries
a b c d e š đ č ć ž Š Đ Č Ć Ž
sffamily
a b c d e š đ č ć ž Š Đ Č Ć Ž
ttfamily
a b c d e š đ č ć ž Š Đ Č Ć Ž
[
x ne neq y quad
x le leq leqslant y quad
x ge geq geqslant y
]
[
x coloneq coloneqq y quad
y eqcolon eqqcolon x
]
end{document}
Opening the generated PDF in the latest Adobe Acrobat Reader DC, and copying the content into Unicode-friendly editor, I get this:
a b c d e š đ č ć ž Š Ð Č Ć Ž
a b c d e š đ č ć ž Š Ð Č Ć Ž
a b c d e š đ č ć ž Š Ð Č Ć Ž
a b c d e š đ č ć ž Š Ð Č Ć Ž
x ,, y x ≤≤6 y x ≥≥> y
x BB y y CC x
Found issues:
- I expected both
ne
andneq
to be mapped toU+2260
, but they are mapped toU+002C
(regular comma). - Both
le
(ge
) andleq
(le
) are mapped correctly, but slanted variants are not - they should be mapped toU+2A7D
(U+2A7E
). - Both
coloneq
(coloneqq
) andeqcolon
(eqqcolon
) are not mapped correctly (toU+2254
andU+2255
), but to regularB
andC
. - Although it looks like
đ
andĐ
are copied correctly, they are not.đ
is mapped toU+0111
whileĐ
is mapped toU+00D0
(not an lowercase-uppercase pair). Considering these two pairs:
- U+00D0 Ð c3 90 LATIN CAPITAL LETTER ETH
- U+00F0 ð c3 b0 LATIN SMALL LETTER ETH
- U+0110 Đ c4 90 LATIN CAPITAL LETTER D WITH STROKE
- U+0111 đ c4 91 LATIN SMALL LETTER D WITH STROKE
... I would say that c3*
s are used in Icelandic language, while the latter two c4*
s correspond to Croatian language. I also checked my source code, and hereby I confirm that my đ
and Đ
from the keyboard ended as c4*
s encoding the source file in UTF-8.
Questions:
ne
andneq
: Can someone explain why the mapping is wrong, and can that be improved so they are both correctly mapped?- slanted variants: Can someone explain why the mapping is wrong, and can that be improved so they are correctly mapped?
coloneq
and others: Can someone explain why the mapping is wrong, and can that be improved so they are correctly mapped?
đ
andĐ
: Can someone explain why the mapping is wrong, and can that be improved so they are correctly mapped?
I am very keen to contribute for the improvements for all issues.
Kind regards, Ivan
EDIT: The question about đ
and Đ
is related to T1 font encoding, not the newtx
, as explained in comments below.
unicode times newtx
It seems that newtx
is nowadays the best option for Times font, and I wanted to test it with respect to Croatian specifics, but also with respect to some common math symbols, and copying the content from generated PDF.
Consider the following MWE:
documentclass{article}
usepackage[T1]{fontenc}
usepackage[utf8]{inputenc}
usepackage[croatian]{babel}
input{glyphtounicode}
pdfgentounicode=1
usepackage{newtxtext}
usepackage{newtxmath}
begin{document}
a b c d e š đ č ć ž Š Đ Č Ć Ž
bfseries
a b c d e š đ č ć ž Š Đ Č Ć Ž
sffamily
a b c d e š đ č ć ž Š Đ Č Ć Ž
ttfamily
a b c d e š đ č ć ž Š Đ Č Ć Ž
[
x ne neq y quad
x le leq leqslant y quad
x ge geq geqslant y
]
[
x coloneq coloneqq y quad
y eqcolon eqqcolon x
]
end{document}
Opening the generated PDF in the latest Adobe Acrobat Reader DC, and copying the content into Unicode-friendly editor, I get this:
a b c d e š đ č ć ž Š Ð Č Ć Ž
a b c d e š đ č ć ž Š Ð Č Ć Ž
a b c d e š đ č ć ž Š Ð Č Ć Ž
a b c d e š đ č ć ž Š Ð Č Ć Ž
x ,, y x ≤≤6 y x ≥≥> y
x BB y y CC x
Found issues:
- I expected both
ne
andneq
to be mapped toU+2260
, but they are mapped toU+002C
(regular comma). - Both
le
(ge
) andleq
(le
) are mapped correctly, but slanted variants are not - they should be mapped toU+2A7D
(U+2A7E
). - Both
coloneq
(coloneqq
) andeqcolon
(eqqcolon
) are not mapped correctly (toU+2254
andU+2255
), but to regularB
andC
. - Although it looks like
đ
andĐ
are copied correctly, they are not.đ
is mapped toU+0111
whileĐ
is mapped toU+00D0
(not an lowercase-uppercase pair). Considering these two pairs:
- U+00D0 Ð c3 90 LATIN CAPITAL LETTER ETH
- U+00F0 ð c3 b0 LATIN SMALL LETTER ETH
- U+0110 Đ c4 90 LATIN CAPITAL LETTER D WITH STROKE
- U+0111 đ c4 91 LATIN SMALL LETTER D WITH STROKE
... I would say that c3*
s are used in Icelandic language, while the latter two c4*
s correspond to Croatian language. I also checked my source code, and hereby I confirm that my đ
and Đ
from the keyboard ended as c4*
s encoding the source file in UTF-8.
Questions:
ne
andneq
: Can someone explain why the mapping is wrong, and can that be improved so they are both correctly mapped?- slanted variants: Can someone explain why the mapping is wrong, and can that be improved so they are correctly mapped?
coloneq
and others: Can someone explain why the mapping is wrong, and can that be improved so they are correctly mapped?
đ
andĐ
: Can someone explain why the mapping is wrong, and can that be improved so they are correctly mapped?
I am very keen to contribute for the improvements for all issues.
Kind regards, Ivan
EDIT: The question about đ
and Đ
is related to T1 font encoding, not the newtx
, as explained in comments below.
unicode times newtx
unicode times newtx
edited 12 hours ago
Ivan Kokan
asked yesterday
Ivan KokanIvan Kokan
5510
5510
1
Curiously enough, if I copy from a different PDF viewer (Skim, in my case), I get the correct glyphs, with or withoutglyphtounicode
for the text part. Symbols are only partially recognized. Withglyphtounicode
I get all the text glyphs correct on Adobe Acrobat Reader. The fact that Ð is incorrectly mapped to ETH is kind of expected, as the T1 encoding doesn't have two different glyphs for the D with stroke and the ETH.
– egreg
yesterday
Thanks for the prompt reply. I dug into T1-specific definition files in my MiKTeX and found these:DeclareUnicodeCharacter{00D0}{DH}
,DeclareUnicodeCharacter{00F0}{dh}
,DeclareUnicodeCharacter{0110}{DJ}
,DeclareUnicodeCharacter{0111}{dj}
,DeclareTextSymbol{DH}{T1}{208}
,DeclareTextSymbol{DJ}{T1}{208}
,DeclareTextSymbol{dh}{T1}{240}
,DeclareTextSymbol{dj}{T1}{158}
. As you wrote,DJ
andDH
point to the same slot (208) and that causes experienced behavior. How can I "check" what all slots contain?
– Ivan Kokan
yesterday
1
Dopdflatex nfssfont
from the terminal and hit return at the prompts until you get*
. Then typetablebye
and the PDF file will show the full table for a T1 encoded font.
– egreg
yesterday
OK, I see that table is actually given in encguide. So, the idea of having correctly paired mappings forđ
andĐ
("with strokes") is not feasible for T1, actually for non-Unicode based TeX system.
– Ivan Kokan
yesterday
I have read encguide thoroughly on this matter since yesterday, and things are much clearer to me now. (The issue aboutĐ
is explained very well, I should have read that way before.) I have even tried changing/Eth
to/Dcroat
inlm-ec.enc
and tested new MWE usinglmodern
andð Ð đ Đ
; the outcome of copying the content from PDF was expected - everything was correct except Eth. I suppose all slots must be mapped uniquely to only one glyph (name), and/Eth
was the decision (some history here tug.org/fontname/ec.enc)?
– Ivan Kokan
12 hours ago
|
show 1 more comment
1
Curiously enough, if I copy from a different PDF viewer (Skim, in my case), I get the correct glyphs, with or withoutglyphtounicode
for the text part. Symbols are only partially recognized. Withglyphtounicode
I get all the text glyphs correct on Adobe Acrobat Reader. The fact that Ð is incorrectly mapped to ETH is kind of expected, as the T1 encoding doesn't have two different glyphs for the D with stroke and the ETH.
– egreg
yesterday
Thanks for the prompt reply. I dug into T1-specific definition files in my MiKTeX and found these:DeclareUnicodeCharacter{00D0}{DH}
,DeclareUnicodeCharacter{00F0}{dh}
,DeclareUnicodeCharacter{0110}{DJ}
,DeclareUnicodeCharacter{0111}{dj}
,DeclareTextSymbol{DH}{T1}{208}
,DeclareTextSymbol{DJ}{T1}{208}
,DeclareTextSymbol{dh}{T1}{240}
,DeclareTextSymbol{dj}{T1}{158}
. As you wrote,DJ
andDH
point to the same slot (208) and that causes experienced behavior. How can I "check" what all slots contain?
– Ivan Kokan
yesterday
1
Dopdflatex nfssfont
from the terminal and hit return at the prompts until you get*
. Then typetablebye
and the PDF file will show the full table for a T1 encoded font.
– egreg
yesterday
OK, I see that table is actually given in encguide. So, the idea of having correctly paired mappings forđ
andĐ
("with strokes") is not feasible for T1, actually for non-Unicode based TeX system.
– Ivan Kokan
yesterday
I have read encguide thoroughly on this matter since yesterday, and things are much clearer to me now. (The issue aboutĐ
is explained very well, I should have read that way before.) I have even tried changing/Eth
to/Dcroat
inlm-ec.enc
and tested new MWE usinglmodern
andð Ð đ Đ
; the outcome of copying the content from PDF was expected - everything was correct except Eth. I suppose all slots must be mapped uniquely to only one glyph (name), and/Eth
was the decision (some history here tug.org/fontname/ec.enc)?
– Ivan Kokan
12 hours ago
1
1
Curiously enough, if I copy from a different PDF viewer (Skim, in my case), I get the correct glyphs, with or without
glyphtounicode
for the text part. Symbols are only partially recognized. With glyphtounicode
I get all the text glyphs correct on Adobe Acrobat Reader. The fact that Ð is incorrectly mapped to ETH is kind of expected, as the T1 encoding doesn't have two different glyphs for the D with stroke and the ETH.– egreg
yesterday
Curiously enough, if I copy from a different PDF viewer (Skim, in my case), I get the correct glyphs, with or without
glyphtounicode
for the text part. Symbols are only partially recognized. With glyphtounicode
I get all the text glyphs correct on Adobe Acrobat Reader. The fact that Ð is incorrectly mapped to ETH is kind of expected, as the T1 encoding doesn't have two different glyphs for the D with stroke and the ETH.– egreg
yesterday
Thanks for the prompt reply. I dug into T1-specific definition files in my MiKTeX and found these:
DeclareUnicodeCharacter{00D0}{DH}
, DeclareUnicodeCharacter{00F0}{dh}
, DeclareUnicodeCharacter{0110}{DJ}
, DeclareUnicodeCharacter{0111}{dj}
, DeclareTextSymbol{DH}{T1}{208}
, DeclareTextSymbol{DJ}{T1}{208}
, DeclareTextSymbol{dh}{T1}{240}
, DeclareTextSymbol{dj}{T1}{158}
. As you wrote, DJ
and DH
point to the same slot (208) and that causes experienced behavior. How can I "check" what all slots contain?– Ivan Kokan
yesterday
Thanks for the prompt reply. I dug into T1-specific definition files in my MiKTeX and found these:
DeclareUnicodeCharacter{00D0}{DH}
, DeclareUnicodeCharacter{00F0}{dh}
, DeclareUnicodeCharacter{0110}{DJ}
, DeclareUnicodeCharacter{0111}{dj}
, DeclareTextSymbol{DH}{T1}{208}
, DeclareTextSymbol{DJ}{T1}{208}
, DeclareTextSymbol{dh}{T1}{240}
, DeclareTextSymbol{dj}{T1}{158}
. As you wrote, DJ
and DH
point to the same slot (208) and that causes experienced behavior. How can I "check" what all slots contain?– Ivan Kokan
yesterday
1
1
Do
pdflatex nfssfont
from the terminal and hit return at the prompts until you get *
. Then type tablebye
and the PDF file will show the full table for a T1 encoded font.– egreg
yesterday
Do
pdflatex nfssfont
from the terminal and hit return at the prompts until you get *
. Then type tablebye
and the PDF file will show the full table for a T1 encoded font.– egreg
yesterday
OK, I see that table is actually given in encguide. So, the idea of having correctly paired mappings for
đ
and Đ
("with strokes") is not feasible for T1, actually for non-Unicode based TeX system.– Ivan Kokan
yesterday
OK, I see that table is actually given in encguide. So, the idea of having correctly paired mappings for
đ
and Đ
("with strokes") is not feasible for T1, actually for non-Unicode based TeX system.– Ivan Kokan
yesterday
I have read encguide thoroughly on this matter since yesterday, and things are much clearer to me now. (The issue about
Đ
is explained very well, I should have read that way before.) I have even tried changing /Eth
to /Dcroat
in lm-ec.enc
and tested new MWE using lmodern
and ð Ð đ Đ
; the outcome of copying the content from PDF was expected - everything was correct except Eth. I suppose all slots must be mapped uniquely to only one glyph (name), and /Eth
was the decision (some history here tug.org/fontname/ec.enc)?– Ivan Kokan
12 hours ago
I have read encguide thoroughly on this matter since yesterday, and things are much clearer to me now. (The issue about
Đ
is explained very well, I should have read that way before.) I have even tried changing /Eth
to /Dcroat
in lm-ec.enc
and tested new MWE using lmodern
and ð Ð đ Đ
; the outcome of copying the content from PDF was expected - everything was correct except Eth. I suppose all slots must be mapped uniquely to only one glyph (name), and /Eth
was the decision (some history here tug.org/fontname/ec.enc)?– Ivan Kokan
12 hours ago
|
show 1 more comment
1 Answer
1
active
oldest
votes
glyphtounicode.tex contains a lot declarations to map glyph names to unicode points. But it is not complete. To get e.g. the ne
to copy as ≠ add a suitable pdfglyphtounicode
:
documentclass{article}
usepackage[T1]{fontenc}
usepackage[utf8]{inputenc}
usepackage[croatian]{babel}
input{glyphtounicode}
pdfgentounicode=1
usepackage{newtxtext}
usepackage{newtxmath}
pdfglyphtounicode{nequal}{2260}
begin{document}
$ne$
end{document}
(I found the nequal
by looking in the txsyc.pfb).
Thanks! I guess that adding such definitions innewtx
itself (or other fonts it is compound of) would be even better than defining it locally, right?
– Ivan Kokan
yesterday
add a comment |
Your Answer
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "85"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2ftex.stackexchange.com%2fquestions%2f477433%2fwrong-mappings-when-copying-from-pdf-with-pdfgentounicode-with-newtx%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
glyphtounicode.tex contains a lot declarations to map glyph names to unicode points. But it is not complete. To get e.g. the ne
to copy as ≠ add a suitable pdfglyphtounicode
:
documentclass{article}
usepackage[T1]{fontenc}
usepackage[utf8]{inputenc}
usepackage[croatian]{babel}
input{glyphtounicode}
pdfgentounicode=1
usepackage{newtxtext}
usepackage{newtxmath}
pdfglyphtounicode{nequal}{2260}
begin{document}
$ne$
end{document}
(I found the nequal
by looking in the txsyc.pfb).
Thanks! I guess that adding such definitions innewtx
itself (or other fonts it is compound of) would be even better than defining it locally, right?
– Ivan Kokan
yesterday
add a comment |
glyphtounicode.tex contains a lot declarations to map glyph names to unicode points. But it is not complete. To get e.g. the ne
to copy as ≠ add a suitable pdfglyphtounicode
:
documentclass{article}
usepackage[T1]{fontenc}
usepackage[utf8]{inputenc}
usepackage[croatian]{babel}
input{glyphtounicode}
pdfgentounicode=1
usepackage{newtxtext}
usepackage{newtxmath}
pdfglyphtounicode{nequal}{2260}
begin{document}
$ne$
end{document}
(I found the nequal
by looking in the txsyc.pfb).
Thanks! I guess that adding such definitions innewtx
itself (or other fonts it is compound of) would be even better than defining it locally, right?
– Ivan Kokan
yesterday
add a comment |
glyphtounicode.tex contains a lot declarations to map glyph names to unicode points. But it is not complete. To get e.g. the ne
to copy as ≠ add a suitable pdfglyphtounicode
:
documentclass{article}
usepackage[T1]{fontenc}
usepackage[utf8]{inputenc}
usepackage[croatian]{babel}
input{glyphtounicode}
pdfgentounicode=1
usepackage{newtxtext}
usepackage{newtxmath}
pdfglyphtounicode{nequal}{2260}
begin{document}
$ne$
end{document}
(I found the nequal
by looking in the txsyc.pfb).
glyphtounicode.tex contains a lot declarations to map glyph names to unicode points. But it is not complete. To get e.g. the ne
to copy as ≠ add a suitable pdfglyphtounicode
:
documentclass{article}
usepackage[T1]{fontenc}
usepackage[utf8]{inputenc}
usepackage[croatian]{babel}
input{glyphtounicode}
pdfgentounicode=1
usepackage{newtxtext}
usepackage{newtxmath}
pdfglyphtounicode{nequal}{2260}
begin{document}
$ne$
end{document}
(I found the nequal
by looking in the txsyc.pfb).
answered yesterday
Ulrike FischerUlrike Fischer
194k8302687
194k8302687
Thanks! I guess that adding such definitions innewtx
itself (or other fonts it is compound of) would be even better than defining it locally, right?
– Ivan Kokan
yesterday
add a comment |
Thanks! I guess that adding such definitions innewtx
itself (or other fonts it is compound of) would be even better than defining it locally, right?
– Ivan Kokan
yesterday
Thanks! I guess that adding such definitions in
newtx
itself (or other fonts it is compound of) would be even better than defining it locally, right?– Ivan Kokan
yesterday
Thanks! I guess that adding such definitions in
newtx
itself (or other fonts it is compound of) would be even better than defining it locally, right?– Ivan Kokan
yesterday
add a comment |
Thanks for contributing an answer to TeX - LaTeX Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2ftex.stackexchange.com%2fquestions%2f477433%2fwrong-mappings-when-copying-from-pdf-with-pdfgentounicode-with-newtx%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
1
Curiously enough, if I copy from a different PDF viewer (Skim, in my case), I get the correct glyphs, with or without
glyphtounicode
for the text part. Symbols are only partially recognized. Withglyphtounicode
I get all the text glyphs correct on Adobe Acrobat Reader. The fact that Ð is incorrectly mapped to ETH is kind of expected, as the T1 encoding doesn't have two different glyphs for the D with stroke and the ETH.– egreg
yesterday
Thanks for the prompt reply. I dug into T1-specific definition files in my MiKTeX and found these:
DeclareUnicodeCharacter{00D0}{DH}
,DeclareUnicodeCharacter{00F0}{dh}
,DeclareUnicodeCharacter{0110}{DJ}
,DeclareUnicodeCharacter{0111}{dj}
,DeclareTextSymbol{DH}{T1}{208}
,DeclareTextSymbol{DJ}{T1}{208}
,DeclareTextSymbol{dh}{T1}{240}
,DeclareTextSymbol{dj}{T1}{158}
. As you wrote,DJ
andDH
point to the same slot (208) and that causes experienced behavior. How can I "check" what all slots contain?– Ivan Kokan
yesterday
1
Do
pdflatex nfssfont
from the terminal and hit return at the prompts until you get*
. Then typetablebye
and the PDF file will show the full table for a T1 encoded font.– egreg
yesterday
OK, I see that table is actually given in encguide. So, the idea of having correctly paired mappings for
đ
andĐ
("with strokes") is not feasible for T1, actually for non-Unicode based TeX system.– Ivan Kokan
yesterday
I have read encguide thoroughly on this matter since yesterday, and things are much clearer to me now. (The issue about
Đ
is explained very well, I should have read that way before.) I have even tried changing/Eth
to/Dcroat
inlm-ec.enc
and tested new MWE usinglmodern
andð Ð đ Đ
; the outcome of copying the content from PDF was expected - everything was correct except Eth. I suppose all slots must be mapped uniquely to only one glyph (name), and/Eth
was the decision (some history here tug.org/fontname/ec.enc)?– Ivan Kokan
12 hours ago