Wrong mappings when copying from PDF with pdfgentounicode with newtxTrouble with the newtx packageHow to fix...
What's the rationale behind the objections to these measures against human trafficking?
                
                    Talents during the time of Achasverous
                
                    Why was it necessary for Jesus to go through Samaria in John 4:4
                
                    80’s/90’s fantasy book where men travelled across the land and inhabited creatures such as deer, beavers and pixies
                
                    Meth dealer reference in Family Guy
                
                    How to define a macro with multiple optional parameters?
                
                    Compare four integers, return word based on maximum
                
                    What's the purpose of these copper coils with resitors inside them in A Yamaha RX-V396RDS amplifier?
                
                    Can I retract my name from an already published manuscript?
                
                    Is my plan for fixing my water heater leak bad?
                
                    Do my Windows system binaries contain sensitive information?
                
                    Soft question- The Bashing Technique and Other powerful techniques for Olympiads
                
                    Should the .gitignore include an entry for .vscode when using Git and VSCode
                
                    How to get Region text from address id (Magento 2)
                
                    Criticizing long fiction. How is it different from short?
                
                    How to properly claim credit for peer review?
                
                    What are these green text/line displays shown during the livestream of Crew Dragon's approach to dock with the ISS?
                
                    Can the Count of Monte Cristo's calculation of poison dosage be explained?
                
                    Skis versus snow shoes - when to choose which for travelling the backcountry?
                
                    Auto Insert date into Notepad
                
                    Could quantum mechanics be necessary to analyze some biology scenarios?
                
                    awk unexpectedly removes dot from string
                
                    How can I mix up weapons for large groups of similar monsters/characters?
                
                    What's the difference between a cart and a wagon?
Wrong mappings when copying from PDF with pdfgentounicode with newtx
Trouble with the newtx packageHow to fix missing or incorrect mappings from glyphtounicode.texThe newtx isn't compatible with pifont?Where should I report wrong Unicode mappings?Problem with new MNRAS style files / newtx on arXivinserting a single unicode character with pdflatexNested fonts fail with the newtx (newtxtext) package (specifically textsc, textup, and textbf)Italicized footnote number within equation (with newtx/newpx fonts)Newtx is blocking creation of a pdfError on rendering mathematical symbol - getting source file of rfxlri-alt from newtx package
It seems that newtx is nowadays the best option for Times font, and I wanted to test it with respect to Croatian specifics, but also with respect to some common math symbols, and copying the content from generated PDF.
Consider the following MWE:
documentclass{article}
usepackage[T1]{fontenc}
usepackage[utf8]{inputenc}
usepackage[croatian]{babel}
input{glyphtounicode}
pdfgentounicode=1
usepackage{newtxtext}
usepackage{newtxmath}
begin{document}
a b c d e š đ č ć ž Š Đ Č Ć Ž
bfseries
a b c d e š đ č ć ž Š Đ Č Ć Ž
sffamily
a b c d e š đ č ć ž Š Đ Č Ć Ž
ttfamily
a b c d e š đ č ć ž Š Đ Č Ć Ž
[
  x ne neq y quad
  x le leq leqslant y quad
  x ge geq geqslant y
]
[
  x coloneq coloneqq y quad
  y eqcolon eqqcolon x
]
end{document}
Opening the generated PDF in the latest Adobe Acrobat Reader DC, and copying the content into Unicode-friendly editor, I get this:
a b c d e š đ č ć ž Š Ð Č Ć Ž
a b c d e š đ č ć ž Š Ð Č Ć Ž
a b c d e š đ č ć ž Š Ð Č Ć Ž
a b c d e š đ č ć ž Š Ð Č Ć Ž
x ,, y x ≤≤6 y x ≥≥> y
x BB y y CC x

Found issues:
- I expected both neandneqto be mapped toU+2260, but they are mapped toU+002C(regular comma).
- Both le(ge) andleq(le) are mapped correctly, but slanted variants are not - they should be mapped toU+2A7D(U+2A7E).
- Both coloneq(coloneqq) andeqcolon(eqqcolon) are not mapped correctly (toU+2254andU+2255), but to regularBandC.
- Although it looks like đandĐare copied correctly, they are not.đis mapped toU+0111whileĐis mapped toU+00D0(not an lowercase-uppercase pair). Considering these two pairs:
 
 - U+00D0 Ð c3 90 LATIN CAPITAL LETTER ETH
- U+00F0 ð c3 b0 LATIN SMALL LETTER ETH
- U+0110 Đ c4 90 LATIN CAPITAL LETTER D WITH STROKE
- U+0111 đ c4 91 LATIN SMALL LETTER D WITH STROKE
 
 
 
 
 
 
 
... I would say that c3*s are used in Icelandic language, while the latter two c4*s correspond to Croatian language. I also checked my source code, and hereby I confirm that my đ and Đ from the keyboard ended as c4*s encoding the source file in UTF-8.
Questions:
 - neand- neq: Can someone explain why the mapping is wrong, and can that be improved so they are both correctly mapped?
- slanted variants: Can someone explain why the mapping is wrong, and can that be improved so they are correctly mapped?
 - coloneqand others: Can someone explain why the mapping is wrong, and can that be improved so they are correctly mapped?
 - đand- Đ: Can someone explain why the mapping is wrong, and can that be improved so they are correctly mapped?
I am very keen to contribute for the improvements for all issues.
Kind regards, Ivan
EDIT: The question about đ and Đ is related to T1 font encoding, not the newtx, as explained in comments below.
unicode times newtx
|
show 1 more comment
It seems that newtx is nowadays the best option for Times font, and I wanted to test it with respect to Croatian specifics, but also with respect to some common math symbols, and copying the content from generated PDF.
Consider the following MWE:
documentclass{article}
usepackage[T1]{fontenc}
usepackage[utf8]{inputenc}
usepackage[croatian]{babel}
input{glyphtounicode}
pdfgentounicode=1
usepackage{newtxtext}
usepackage{newtxmath}
begin{document}
a b c d e š đ č ć ž Š Đ Č Ć Ž
bfseries
a b c d e š đ č ć ž Š Đ Č Ć Ž
sffamily
a b c d e š đ č ć ž Š Đ Č Ć Ž
ttfamily
a b c d e š đ č ć ž Š Đ Č Ć Ž
[
  x ne neq y quad
  x le leq leqslant y quad
  x ge geq geqslant y
]
[
  x coloneq coloneqq y quad
  y eqcolon eqqcolon x
]
end{document}
Opening the generated PDF in the latest Adobe Acrobat Reader DC, and copying the content into Unicode-friendly editor, I get this:
a b c d e š đ č ć ž Š Ð Č Ć Ž
a b c d e š đ č ć ž Š Ð Č Ć Ž
a b c d e š đ č ć ž Š Ð Č Ć Ž
a b c d e š đ č ć ž Š Ð Č Ć Ž
x ,, y x ≤≤6 y x ≥≥> y
x BB y y CC x

Found issues:
- I expected both neandneqto be mapped toU+2260, but they are mapped toU+002C(regular comma).
- Both le(ge) andleq(le) are mapped correctly, but slanted variants are not - they should be mapped toU+2A7D(U+2A7E).
- Both coloneq(coloneqq) andeqcolon(eqqcolon) are not mapped correctly (toU+2254andU+2255), but to regularBandC.
- Although it looks like đandĐare copied correctly, they are not.đis mapped toU+0111whileĐis mapped toU+00D0(not an lowercase-uppercase pair). Considering these two pairs:
 
 - U+00D0 Ð c3 90 LATIN CAPITAL LETTER ETH
- U+00F0 ð c3 b0 LATIN SMALL LETTER ETH
- U+0110 Đ c4 90 LATIN CAPITAL LETTER D WITH STROKE
- U+0111 đ c4 91 LATIN SMALL LETTER D WITH STROKE
 
 
 
 
 
 
 
... I would say that c3*s are used in Icelandic language, while the latter two c4*s correspond to Croatian language. I also checked my source code, and hereby I confirm that my đ and Đ from the keyboard ended as c4*s encoding the source file in UTF-8.
Questions:
 - neand- neq: Can someone explain why the mapping is wrong, and can that be improved so they are both correctly mapped?
- slanted variants: Can someone explain why the mapping is wrong, and can that be improved so they are correctly mapped?
 - coloneqand others: Can someone explain why the mapping is wrong, and can that be improved so they are correctly mapped?
 - đand- Đ: Can someone explain why the mapping is wrong, and can that be improved so they are correctly mapped?
I am very keen to contribute for the improvements for all issues.
Kind regards, Ivan
EDIT: The question about đ and Đ is related to T1 font encoding, not the newtx, as explained in comments below.
unicode times newtx
 
 
 1
 
 
 
 
 
 Curiously enough, if I copy from a different PDF viewer (Skim, in my case), I get the correct glyphs, with or without- glyphtounicodefor the text part. Symbols are only partially recognized. With- glyphtounicodeI get all the text glyphs correct on Adobe Acrobat Reader. The fact that Ð is incorrectly mapped to ETH is kind of expected, as the T1 encoding doesn't have two different glyphs for the D with stroke and the ETH.
 
 – egreg
 yesterday
 
 
 
 
 
 
 
 
 
 
 
 
 Thanks for the prompt reply. I dug into T1-specific definition files in my MiKTeX and found these:- DeclareUnicodeCharacter{00D0}{DH},- DeclareUnicodeCharacter{00F0}{dh},- DeclareUnicodeCharacter{0110}{DJ},- DeclareUnicodeCharacter{0111}{dj},- DeclareTextSymbol{DH}{T1}{208},- DeclareTextSymbol{DJ}{T1}{208},- DeclareTextSymbol{dh}{T1}{240},- DeclareTextSymbol{dj}{T1}{158}. As you wrote,- DJand- DHpoint to the same slot (208) and that causes experienced behavior. How can I "check" what all slots contain?
 
 – Ivan Kokan
 yesterday
 
 
 
 
 
 
 
 1
 
 
 
 
 
 Do- pdflatex nfssfontfrom the terminal and hit return at the prompts until you get- *. Then type- tablebyeand the PDF file will show the full table for a T1 encoded font.
 
 – egreg
 yesterday
 
 
 
 
 
 
 
 
 
 
 OK, I see that table is actually given in encguide. So, the idea of having correctly paired mappings for- đand- Đ("with strokes") is not feasible for T1, actually for non-Unicode based TeX system.
 
 – Ivan Kokan
 yesterday
 
 
 
 
 
 
 
 
 
 
 I have read encguide thoroughly on this matter since yesterday, and things are much clearer to me now. (The issue about- Đis explained very well, I should have read that way before.) I have even tried changing- /Ethto- /Dcroatin- lm-ec.encand tested new MWE using- lmodernand- ð Ð đ Đ; the outcome of copying the content from PDF was expected - everything was correct except Eth. I suppose all slots must be mapped uniquely to only one glyph (name), and- /Ethwas the decision (some history here tug.org/fontname/ec.enc)?
 
 – Ivan Kokan
 12 hours ago
 
 
 
 
 
|
show 1 more comment
It seems that newtx is nowadays the best option for Times font, and I wanted to test it with respect to Croatian specifics, but also with respect to some common math symbols, and copying the content from generated PDF.
Consider the following MWE:
documentclass{article}
usepackage[T1]{fontenc}
usepackage[utf8]{inputenc}
usepackage[croatian]{babel}
input{glyphtounicode}
pdfgentounicode=1
usepackage{newtxtext}
usepackage{newtxmath}
begin{document}
a b c d e š đ č ć ž Š Đ Č Ć Ž
bfseries
a b c d e š đ č ć ž Š Đ Č Ć Ž
sffamily
a b c d e š đ č ć ž Š Đ Č Ć Ž
ttfamily
a b c d e š đ č ć ž Š Đ Č Ć Ž
[
  x ne neq y quad
  x le leq leqslant y quad
  x ge geq geqslant y
]
[
  x coloneq coloneqq y quad
  y eqcolon eqqcolon x
]
end{document}
Opening the generated PDF in the latest Adobe Acrobat Reader DC, and copying the content into Unicode-friendly editor, I get this:
a b c d e š đ č ć ž Š Ð Č Ć Ž
a b c d e š đ č ć ž Š Ð Č Ć Ž
a b c d e š đ č ć ž Š Ð Č Ć Ž
a b c d e š đ č ć ž Š Ð Č Ć Ž
x ,, y x ≤≤6 y x ≥≥> y
x BB y y CC x

Found issues:
- I expected both neandneqto be mapped toU+2260, but they are mapped toU+002C(regular comma).
- Both le(ge) andleq(le) are mapped correctly, but slanted variants are not - they should be mapped toU+2A7D(U+2A7E).
- Both coloneq(coloneqq) andeqcolon(eqqcolon) are not mapped correctly (toU+2254andU+2255), but to regularBandC.
- Although it looks like đandĐare copied correctly, they are not.đis mapped toU+0111whileĐis mapped toU+00D0(not an lowercase-uppercase pair). Considering these two pairs:
 
 - U+00D0 Ð c3 90 LATIN CAPITAL LETTER ETH
- U+00F0 ð c3 b0 LATIN SMALL LETTER ETH
- U+0110 Đ c4 90 LATIN CAPITAL LETTER D WITH STROKE
- U+0111 đ c4 91 LATIN SMALL LETTER D WITH STROKE
 
 
 
 
 
 
 
... I would say that c3*s are used in Icelandic language, while the latter two c4*s correspond to Croatian language. I also checked my source code, and hereby I confirm that my đ and Đ from the keyboard ended as c4*s encoding the source file in UTF-8.
Questions:
 - neand- neq: Can someone explain why the mapping is wrong, and can that be improved so they are both correctly mapped?
- slanted variants: Can someone explain why the mapping is wrong, and can that be improved so they are correctly mapped?
 - coloneqand others: Can someone explain why the mapping is wrong, and can that be improved so they are correctly mapped?
 - đand- Đ: Can someone explain why the mapping is wrong, and can that be improved so they are correctly mapped?
I am very keen to contribute for the improvements for all issues.
Kind regards, Ivan
EDIT: The question about đ and Đ is related to T1 font encoding, not the newtx, as explained in comments below.
unicode times newtx
It seems that newtx is nowadays the best option for Times font, and I wanted to test it with respect to Croatian specifics, but also with respect to some common math symbols, and copying the content from generated PDF.
Consider the following MWE:
documentclass{article}
usepackage[T1]{fontenc}
usepackage[utf8]{inputenc}
usepackage[croatian]{babel}
input{glyphtounicode}
pdfgentounicode=1
usepackage{newtxtext}
usepackage{newtxmath}
begin{document}
a b c d e š đ č ć ž Š Đ Č Ć Ž
bfseries
a b c d e š đ č ć ž Š Đ Č Ć Ž
sffamily
a b c d e š đ č ć ž Š Đ Č Ć Ž
ttfamily
a b c d e š đ č ć ž Š Đ Č Ć Ž
[
  x ne neq y quad
  x le leq leqslant y quad
  x ge geq geqslant y
]
[
  x coloneq coloneqq y quad
  y eqcolon eqqcolon x
]
end{document}
Opening the generated PDF in the latest Adobe Acrobat Reader DC, and copying the content into Unicode-friendly editor, I get this:
a b c d e š đ č ć ž Š Ð Č Ć Ž
a b c d e š đ č ć ž Š Ð Č Ć Ž
a b c d e š đ č ć ž Š Ð Č Ć Ž
a b c d e š đ č ć ž Š Ð Č Ć Ž
x ,, y x ≤≤6 y x ≥≥> y
x BB y y CC x

Found issues:
- I expected both neandneqto be mapped toU+2260, but they are mapped toU+002C(regular comma).
- Both le(ge) andleq(le) are mapped correctly, but slanted variants are not - they should be mapped toU+2A7D(U+2A7E).
- Both coloneq(coloneqq) andeqcolon(eqqcolon) are not mapped correctly (toU+2254andU+2255), but to regularBandC.
- Although it looks like đandĐare copied correctly, they are not.đis mapped toU+0111whileĐis mapped toU+00D0(not an lowercase-uppercase pair). Considering these two pairs:
 
 - U+00D0 Ð c3 90 LATIN CAPITAL LETTER ETH
- U+00F0 ð c3 b0 LATIN SMALL LETTER ETH
- U+0110 Đ c4 90 LATIN CAPITAL LETTER D WITH STROKE
- U+0111 đ c4 91 LATIN SMALL LETTER D WITH STROKE
 
 
 
 
 
 
 
... I would say that c3*s are used in Icelandic language, while the latter two c4*s correspond to Croatian language. I also checked my source code, and hereby I confirm that my đ and Đ from the keyboard ended as c4*s encoding the source file in UTF-8.
Questions:
 - neand- neq: Can someone explain why the mapping is wrong, and can that be improved so they are both correctly mapped?
- slanted variants: Can someone explain why the mapping is wrong, and can that be improved so they are correctly mapped?
 - coloneqand others: Can someone explain why the mapping is wrong, and can that be improved so they are correctly mapped?
 - đand- Đ: Can someone explain why the mapping is wrong, and can that be improved so they are correctly mapped?
I am very keen to contribute for the improvements for all issues.
Kind regards, Ivan
EDIT: The question about đ and Đ is related to T1 font encoding, not the newtx, as explained in comments below.
unicode times newtx
unicode times newtx
edited 12 hours ago
Ivan Kokan
asked yesterday


Ivan KokanIvan Kokan
5510
5510
 
 
 1
 
 
 
 
 
 Curiously enough, if I copy from a different PDF viewer (Skim, in my case), I get the correct glyphs, with or without- glyphtounicodefor the text part. Symbols are only partially recognized. With- glyphtounicodeI get all the text glyphs correct on Adobe Acrobat Reader. The fact that Ð is incorrectly mapped to ETH is kind of expected, as the T1 encoding doesn't have two different glyphs for the D with stroke and the ETH.
 
 – egreg
 yesterday
 
 
 
 
 
 
 
 
 
 
 
 
 Thanks for the prompt reply. I dug into T1-specific definition files in my MiKTeX and found these:- DeclareUnicodeCharacter{00D0}{DH},- DeclareUnicodeCharacter{00F0}{dh},- DeclareUnicodeCharacter{0110}{DJ},- DeclareUnicodeCharacter{0111}{dj},- DeclareTextSymbol{DH}{T1}{208},- DeclareTextSymbol{DJ}{T1}{208},- DeclareTextSymbol{dh}{T1}{240},- DeclareTextSymbol{dj}{T1}{158}. As you wrote,- DJand- DHpoint to the same slot (208) and that causes experienced behavior. How can I "check" what all slots contain?
 
 – Ivan Kokan
 yesterday
 
 
 
 
 
 
 
 1
 
 
 
 
 
 Do- pdflatex nfssfontfrom the terminal and hit return at the prompts until you get- *. Then type- tablebyeand the PDF file will show the full table for a T1 encoded font.
 
 – egreg
 yesterday
 
 
 
 
 
 
 
 
 
 
 OK, I see that table is actually given in encguide. So, the idea of having correctly paired mappings for- đand- Đ("with strokes") is not feasible for T1, actually for non-Unicode based TeX system.
 
 – Ivan Kokan
 yesterday
 
 
 
 
 
 
 
 
 
 
 I have read encguide thoroughly on this matter since yesterday, and things are much clearer to me now. (The issue about- Đis explained very well, I should have read that way before.) I have even tried changing- /Ethto- /Dcroatin- lm-ec.encand tested new MWE using- lmodernand- ð Ð đ Đ; the outcome of copying the content from PDF was expected - everything was correct except Eth. I suppose all slots must be mapped uniquely to only one glyph (name), and- /Ethwas the decision (some history here tug.org/fontname/ec.enc)?
 
 – Ivan Kokan
 12 hours ago
 
 
 
 
 
|
show 1 more comment
 
 
 1
 
 
 
 
 
 Curiously enough, if I copy from a different PDF viewer (Skim, in my case), I get the correct glyphs, with or without- glyphtounicodefor the text part. Symbols are only partially recognized. With- glyphtounicodeI get all the text glyphs correct on Adobe Acrobat Reader. The fact that Ð is incorrectly mapped to ETH is kind of expected, as the T1 encoding doesn't have two different glyphs for the D with stroke and the ETH.
 
 – egreg
 yesterday
 
 
 
 
 
 
 
 
 
 
 
 
 Thanks for the prompt reply. I dug into T1-specific definition files in my MiKTeX and found these:- DeclareUnicodeCharacter{00D0}{DH},- DeclareUnicodeCharacter{00F0}{dh},- DeclareUnicodeCharacter{0110}{DJ},- DeclareUnicodeCharacter{0111}{dj},- DeclareTextSymbol{DH}{T1}{208},- DeclareTextSymbol{DJ}{T1}{208},- DeclareTextSymbol{dh}{T1}{240},- DeclareTextSymbol{dj}{T1}{158}. As you wrote,- DJand- DHpoint to the same slot (208) and that causes experienced behavior. How can I "check" what all slots contain?
 
 – Ivan Kokan
 yesterday
 
 
 
 
 
 
 
 1
 
 
 
 
 
 Do- pdflatex nfssfontfrom the terminal and hit return at the prompts until you get- *. Then type- tablebyeand the PDF file will show the full table for a T1 encoded font.
 
 – egreg
 yesterday
 
 
 
 
 
 
 
 
 
 
 OK, I see that table is actually given in encguide. So, the idea of having correctly paired mappings for- đand- Đ("with strokes") is not feasible for T1, actually for non-Unicode based TeX system.
 
 – Ivan Kokan
 yesterday
 
 
 
 
 
 
 
 
 
 
 I have read encguide thoroughly on this matter since yesterday, and things are much clearer to me now. (The issue about- Đis explained very well, I should have read that way before.) I have even tried changing- /Ethto- /Dcroatin- lm-ec.encand tested new MWE using- lmodernand- ð Ð đ Đ; the outcome of copying the content from PDF was expected - everything was correct except Eth. I suppose all slots must be mapped uniquely to only one glyph (name), and- /Ethwas the decision (some history here tug.org/fontname/ec.enc)?
 
 – Ivan Kokan
 12 hours ago
 
 
 
 
 
1
1
Curiously enough, if I copy from a different PDF viewer (Skim, in my case), I get the correct glyphs, with or without
glyphtounicode for the text part. Symbols are only partially recognized. With glyphtounicode I get all the text glyphs correct on Adobe Acrobat Reader. The fact that Ð is incorrectly mapped to ETH is kind of expected, as the T1 encoding doesn't have two different glyphs for the D with stroke and the ETH.– egreg
yesterday
Curiously enough, if I copy from a different PDF viewer (Skim, in my case), I get the correct glyphs, with or without
glyphtounicode for the text part. Symbols are only partially recognized. With glyphtounicode I get all the text glyphs correct on Adobe Acrobat Reader. The fact that Ð is incorrectly mapped to ETH is kind of expected, as the T1 encoding doesn't have two different glyphs for the D with stroke and the ETH.– egreg
yesterday
Thanks for the prompt reply. I dug into T1-specific definition files in my MiKTeX and found these:
DeclareUnicodeCharacter{00D0}{DH}, DeclareUnicodeCharacter{00F0}{dh}, DeclareUnicodeCharacter{0110}{DJ}, DeclareUnicodeCharacter{0111}{dj}, DeclareTextSymbol{DH}{T1}{208}, DeclareTextSymbol{DJ}{T1}{208}, DeclareTextSymbol{dh}{T1}{240}, DeclareTextSymbol{dj}{T1}{158}. As you wrote, DJ and DH point to the same slot (208) and that causes experienced behavior. How can I "check" what all slots contain?– Ivan Kokan
yesterday
Thanks for the prompt reply. I dug into T1-specific definition files in my MiKTeX and found these:
DeclareUnicodeCharacter{00D0}{DH}, DeclareUnicodeCharacter{00F0}{dh}, DeclareUnicodeCharacter{0110}{DJ}, DeclareUnicodeCharacter{0111}{dj}, DeclareTextSymbol{DH}{T1}{208}, DeclareTextSymbol{DJ}{T1}{208}, DeclareTextSymbol{dh}{T1}{240}, DeclareTextSymbol{dj}{T1}{158}. As you wrote, DJ and DH point to the same slot (208) and that causes experienced behavior. How can I "check" what all slots contain?– Ivan Kokan
yesterday
1
1
Do
pdflatex nfssfont from the terminal and hit return at the prompts until you get *. Then type tablebye and the PDF file will show the full table for a T1 encoded font.– egreg
yesterday
Do
pdflatex nfssfont from the terminal and hit return at the prompts until you get *. Then type tablebye and the PDF file will show the full table for a T1 encoded font.– egreg
yesterday
OK, I see that table is actually given in encguide. So, the idea of having correctly paired mappings for
đ and Đ ("with strokes") is not feasible for T1, actually for non-Unicode based TeX system.– Ivan Kokan
yesterday
OK, I see that table is actually given in encguide. So, the idea of having correctly paired mappings for
đ and Đ ("with strokes") is not feasible for T1, actually for non-Unicode based TeX system.– Ivan Kokan
yesterday
I have read encguide thoroughly on this matter since yesterday, and things are much clearer to me now. (The issue about
Đ is explained very well, I should have read that way before.)  I have even tried changing /Eth to /Dcroat in lm-ec.enc and tested new MWE using lmodern and ð Ð đ Đ; the outcome of copying the content from PDF was expected - everything was correct except Eth. I suppose all slots must be mapped uniquely to only one glyph (name), and /Eth was the decision (some history here tug.org/fontname/ec.enc)?– Ivan Kokan
12 hours ago
I have read encguide thoroughly on this matter since yesterday, and things are much clearer to me now. (The issue about
Đ is explained very well, I should have read that way before.)  I have even tried changing /Eth to /Dcroat in lm-ec.enc and tested new MWE using lmodern and ð Ð đ Đ; the outcome of copying the content from PDF was expected - everything was correct except Eth. I suppose all slots must be mapped uniquely to only one glyph (name), and /Eth was the decision (some history here tug.org/fontname/ec.enc)?– Ivan Kokan
12 hours ago
|
show 1 more comment
                                1 Answer
                            1
                        
active
oldest
votes
glyphtounicode.tex contains a lot declarations to map glyph names to unicode points. But it is not complete. To get e.g. the ne to copy as ≠  add a suitable pdfglyphtounicode:
documentclass{article}
usepackage[T1]{fontenc}
usepackage[utf8]{inputenc}
usepackage[croatian]{babel}
input{glyphtounicode}
pdfgentounicode=1
usepackage{newtxtext}
usepackage{newtxmath}
pdfglyphtounicode{nequal}{2260}
begin{document}
$ne$
end{document}
(I found the nequal by looking in the txsyc.pfb).
 
 
 
 
 
 
 
 Thanks! I guess that adding such definitions in- newtxitself (or other fonts it is compound of) would be even better than defining it locally, right?
 
 – Ivan Kokan
 yesterday
 
 
 
 
 
add a comment |
Your Answer
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "85"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2ftex.stackexchange.com%2fquestions%2f477433%2fwrong-mappings-when-copying-from-pdf-with-pdfgentounicode-with-newtx%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
                                1 Answer
                            1
                        
active
oldest
votes
                                1 Answer
                            1
                        
active
oldest
votes
active
oldest
votes
active
oldest
votes
glyphtounicode.tex contains a lot declarations to map glyph names to unicode points. But it is not complete. To get e.g. the ne to copy as ≠  add a suitable pdfglyphtounicode:
documentclass{article}
usepackage[T1]{fontenc}
usepackage[utf8]{inputenc}
usepackage[croatian]{babel}
input{glyphtounicode}
pdfgentounicode=1
usepackage{newtxtext}
usepackage{newtxmath}
pdfglyphtounicode{nequal}{2260}
begin{document}
$ne$
end{document}
(I found the nequal by looking in the txsyc.pfb).
 
 
 
 
 
 
 
 Thanks! I guess that adding such definitions in- newtxitself (or other fonts it is compound of) would be even better than defining it locally, right?
 
 – Ivan Kokan
 yesterday
 
 
 
 
 
add a comment |
glyphtounicode.tex contains a lot declarations to map glyph names to unicode points. But it is not complete. To get e.g. the ne to copy as ≠  add a suitable pdfglyphtounicode:
documentclass{article}
usepackage[T1]{fontenc}
usepackage[utf8]{inputenc}
usepackage[croatian]{babel}
input{glyphtounicode}
pdfgentounicode=1
usepackage{newtxtext}
usepackage{newtxmath}
pdfglyphtounicode{nequal}{2260}
begin{document}
$ne$
end{document}
(I found the nequal by looking in the txsyc.pfb).
 
 
 
 
 
 
 
 Thanks! I guess that adding such definitions in- newtxitself (or other fonts it is compound of) would be even better than defining it locally, right?
 
 – Ivan Kokan
 yesterday
 
 
 
 
 
add a comment |
glyphtounicode.tex contains a lot declarations to map glyph names to unicode points. But it is not complete. To get e.g. the ne to copy as ≠  add a suitable pdfglyphtounicode:
documentclass{article}
usepackage[T1]{fontenc}
usepackage[utf8]{inputenc}
usepackage[croatian]{babel}
input{glyphtounicode}
pdfgentounicode=1
usepackage{newtxtext}
usepackage{newtxmath}
pdfglyphtounicode{nequal}{2260}
begin{document}
$ne$
end{document}
(I found the nequal by looking in the txsyc.pfb).
glyphtounicode.tex contains a lot declarations to map glyph names to unicode points. But it is not complete. To get e.g. the ne to copy as ≠  add a suitable pdfglyphtounicode:
documentclass{article}
usepackage[T1]{fontenc}
usepackage[utf8]{inputenc}
usepackage[croatian]{babel}
input{glyphtounicode}
pdfgentounicode=1
usepackage{newtxtext}
usepackage{newtxmath}
pdfglyphtounicode{nequal}{2260}
begin{document}
$ne$
end{document}
(I found the nequal by looking in the txsyc.pfb).
answered yesterday


Ulrike FischerUlrike Fischer
194k8302687
194k8302687
 
 
 
 
 
 
 
 Thanks! I guess that adding such definitions in- newtxitself (or other fonts it is compound of) would be even better than defining it locally, right?
 
 – Ivan Kokan
 yesterday
 
 
 
 
 
add a comment |
 
 
 
 
 
 
 
 Thanks! I guess that adding such definitions in- newtxitself (or other fonts it is compound of) would be even better than defining it locally, right?
 
 – Ivan Kokan
 yesterday
 
 
 
 
 
Thanks! I guess that adding such definitions in
newtx itself (or other fonts it is compound of) would be even better than defining it locally, right?– Ivan Kokan
yesterday
Thanks! I guess that adding such definitions in
newtx itself (or other fonts it is compound of) would be even better than defining it locally, right?– Ivan Kokan
yesterday
add a comment |
Thanks for contributing an answer to TeX - LaTeX Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2ftex.stackexchange.com%2fquestions%2f477433%2fwrong-mappings-when-copying-from-pdf-with-pdfgentounicode-with-newtx%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
1
Curiously enough, if I copy from a different PDF viewer (Skim, in my case), I get the correct glyphs, with or without
glyphtounicodefor the text part. Symbols are only partially recognized. WithglyphtounicodeI get all the text glyphs correct on Adobe Acrobat Reader. The fact that Ð is incorrectly mapped to ETH is kind of expected, as the T1 encoding doesn't have two different glyphs for the D with stroke and the ETH.– egreg
yesterday
Thanks for the prompt reply. I dug into T1-specific definition files in my MiKTeX and found these:
DeclareUnicodeCharacter{00D0}{DH},DeclareUnicodeCharacter{00F0}{dh},DeclareUnicodeCharacter{0110}{DJ},DeclareUnicodeCharacter{0111}{dj},DeclareTextSymbol{DH}{T1}{208},DeclareTextSymbol{DJ}{T1}{208},DeclareTextSymbol{dh}{T1}{240},DeclareTextSymbol{dj}{T1}{158}. As you wrote,DJandDHpoint to the same slot (208) and that causes experienced behavior. How can I "check" what all slots contain?– Ivan Kokan
yesterday
1
Do
pdflatex nfssfontfrom the terminal and hit return at the prompts until you get*. Then typetablebyeand the PDF file will show the full table for a T1 encoded font.– egreg
yesterday
OK, I see that table is actually given in encguide. So, the idea of having correctly paired mappings for
đandĐ("with strokes") is not feasible for T1, actually for non-Unicode based TeX system.– Ivan Kokan
yesterday
I have read encguide thoroughly on this matter since yesterday, and things are much clearer to me now. (The issue about
Đis explained very well, I should have read that way before.) I have even tried changing/Ethto/Dcroatinlm-ec.encand tested new MWE usinglmodernandð Ð đ Đ; the outcome of copying the content from PDF was expected - everything was correct except Eth. I suppose all slots must be mapped uniquely to only one glyph (name), and/Ethwas the decision (some history here tug.org/fontname/ec.enc)?– Ivan Kokan
12 hours ago