Wrong mappings when copying from PDF with pdfgentounicode with newtxTrouble with the newtx packageHow to fix...

What's the rationale behind the objections to these measures against human trafficking?

Talents during the time of Achasverous

Why was it necessary for Jesus to go through Samaria in John 4:4

80’s/90’s fantasy book where men travelled across the land and inhabited creatures such as deer, beavers and pixies

Meth dealer reference in Family Guy

How to define a macro with multiple optional parameters?

Compare four integers, return word based on maximum

What's the purpose of these copper coils with resitors inside them in A Yamaha RX-V396RDS amplifier?

Can I retract my name from an already published manuscript?

Is my plan for fixing my water heater leak bad?

Do my Windows system binaries contain sensitive information?

Soft question- The Bashing Technique and Other powerful techniques for Olympiads

Should the .gitignore include an entry for .vscode when using Git and VSCode

How to get Region text from address id (Magento 2)

Criticizing long fiction. How is it different from short?

How to properly claim credit for peer review?

What are these green text/line displays shown during the livestream of Crew Dragon's approach to dock with the ISS?

Can the Count of Monte Cristo's calculation of poison dosage be explained?

Skis versus snow shoes - when to choose which for travelling the backcountry?

Auto Insert date into Notepad

Could quantum mechanics be necessary to analyze some biology scenarios?

awk unexpectedly removes dot from string

How can I mix up weapons for large groups of similar monsters/characters?

What's the difference between a cart and a wagon?



Wrong mappings when copying from PDF with pdfgentounicode with newtx


Trouble with the newtx packageHow to fix missing or incorrect mappings from glyphtounicode.texThe newtx isn't compatible with pifont?Where should I report wrong Unicode mappings?Problem with new MNRAS style files / newtx on arXivinserting a single unicode character with pdflatexNested fonts fail with the newtx (newtxtext) package (specifically textsc, textup, and textbf)Italicized footnote number within equation (with newtx/newpx fonts)Newtx is blocking creation of a pdfError on rendering mathematical symbol - getting source file of rfxlri-alt from newtx package













1















It seems that newtx is nowadays the best option for Times font, and I wanted to test it with respect to Croatian specifics, but also with respect to some common math symbols, and copying the content from generated PDF.



Consider the following MWE:



documentclass{article}

usepackage[T1]{fontenc}
usepackage[utf8]{inputenc}
usepackage[croatian]{babel}

input{glyphtounicode}
pdfgentounicode=1

usepackage{newtxtext}
usepackage{newtxmath}


begin{document}

a b c d e š đ č ć ž Š Đ Č Ć Ž

bfseries
a b c d e š đ č ć ž Š Đ Č Ć Ž

sffamily
a b c d e š đ č ć ž Š Đ Č Ć Ž

ttfamily
a b c d e š đ č ć ž Š Đ Č Ć Ž

[
x ne neq y quad
x le leq leqslant y quad
x ge geq geqslant y
]

[
x coloneq coloneqq y quad
y eqcolon eqqcolon x
]

end{document}


Opening the generated PDF in the latest Adobe Acrobat Reader DC, and copying the content into Unicode-friendly editor, I get this:



a b c d e š đ č ć ž Š Ð Č Ć Ž
a b c d e š đ č ć ž Š Ð Č Ć Ž
a b c d e š đ č ć ž Š Ð Č Ć Ž
a b c d e š đ č ć ž Š Ð Č Ć Ž
x ,, y x ≤≤6 y x ≥≥> y
x BB y y CC x


Hex dump



Found issues:




  • I expected both ne and neq to be mapped to U+2260, but they are mapped to U+002C (regular comma).

  • Both le (ge) and leq (le) are mapped correctly, but slanted variants are not - they should be mapped to U+2A7D (U+2A7E).

  • Both coloneq (coloneqq) and eqcolon (eqqcolon) are not mapped correctly (to U+2254 and U+2255), but to regular B and C.

  • Although it looks like đ and Đ are copied correctly, they are not. đ is mapped to U+0111 while Đ is mapped to U+00D0 (not an lowercase-uppercase pair). Considering these two pairs:


    • U+00D0 Ð c3 90 LATIN CAPITAL LETTER ETH

    • U+00F0 ð c3 b0 LATIN SMALL LETTER ETH

    • U+0110 Đ c4 90 LATIN CAPITAL LETTER D WITH STROKE

    • U+0111 đ c4 91 LATIN SMALL LETTER D WITH STROKE




... I would say that c3*s are used in Icelandic language, while the latter two c4*s correspond to Croatian language. I also checked my source code, and hereby I confirm that my đ and Đ from the keyboard ended as c4*s encoding the source file in UTF-8.



Questions:





  • ne and neq: Can someone explain why the mapping is wrong, and can that be improved so they are both correctly mapped?

  • slanted variants: Can someone explain why the mapping is wrong, and can that be improved so they are correctly mapped?


  • coloneq and others: Can someone explain why the mapping is wrong, and can that be improved so they are correctly mapped?


  • đ and Đ: Can someone explain why the mapping is wrong, and can that be improved so they are correctly mapped?


I am very keen to contribute for the improvements for all issues.



Kind regards, Ivan



EDIT: The question about đ and Đ is related to T1 font encoding, not the newtx, as explained in comments below.










share|improve this question




















  • 1





    Curiously enough, if I copy from a different PDF viewer (Skim, in my case), I get the correct glyphs, with or without glyphtounicode for the text part. Symbols are only partially recognized. With glyphtounicode I get all the text glyphs correct on Adobe Acrobat Reader. The fact that Ð is incorrectly mapped to ETH is kind of expected, as the T1 encoding doesn't have two different glyphs for the D with stroke and the ETH.

    – egreg
    yesterday













  • Thanks for the prompt reply. I dug into T1-specific definition files in my MiKTeX and found these: DeclareUnicodeCharacter{00D0}{DH}, DeclareUnicodeCharacter{00F0}{dh}, DeclareUnicodeCharacter{0110}{DJ}, DeclareUnicodeCharacter{0111}{dj}, DeclareTextSymbol{DH}{T1}{208}, DeclareTextSymbol{DJ}{T1}{208}, DeclareTextSymbol{dh}{T1}{240}, DeclareTextSymbol{dj}{T1}{158}. As you wrote, DJ and DH point to the same slot (208) and that causes experienced behavior. How can I "check" what all slots contain?

    – Ivan Kokan
    yesterday








  • 1





    Do pdflatex nfssfont from the terminal and hit return at the prompts until you get *. Then type tablebye and the PDF file will show the full table for a T1 encoded font.

    – egreg
    yesterday











  • OK, I see that table is actually given in encguide. So, the idea of having correctly paired mappings for đ and Đ ("with strokes") is not feasible for T1, actually for non-Unicode based TeX system.

    – Ivan Kokan
    yesterday











  • I have read encguide thoroughly on this matter since yesterday, and things are much clearer to me now. (The issue about Đ is explained very well, I should have read that way before.) I have even tried changing /Eth to /Dcroat in lm-ec.enc and tested new MWE using lmodern and ð Ð đ Đ; the outcome of copying the content from PDF was expected - everything was correct except Eth. I suppose all slots must be mapped uniquely to only one glyph (name), and /Eth was the decision (some history here tug.org/fontname/ec.enc)?

    – Ivan Kokan
    12 hours ago


















1















It seems that newtx is nowadays the best option for Times font, and I wanted to test it with respect to Croatian specifics, but also with respect to some common math symbols, and copying the content from generated PDF.



Consider the following MWE:



documentclass{article}

usepackage[T1]{fontenc}
usepackage[utf8]{inputenc}
usepackage[croatian]{babel}

input{glyphtounicode}
pdfgentounicode=1

usepackage{newtxtext}
usepackage{newtxmath}


begin{document}

a b c d e š đ č ć ž Š Đ Č Ć Ž

bfseries
a b c d e š đ č ć ž Š Đ Č Ć Ž

sffamily
a b c d e š đ č ć ž Š Đ Č Ć Ž

ttfamily
a b c d e š đ č ć ž Š Đ Č Ć Ž

[
x ne neq y quad
x le leq leqslant y quad
x ge geq geqslant y
]

[
x coloneq coloneqq y quad
y eqcolon eqqcolon x
]

end{document}


Opening the generated PDF in the latest Adobe Acrobat Reader DC, and copying the content into Unicode-friendly editor, I get this:



a b c d e š đ č ć ž Š Ð Č Ć Ž
a b c d e š đ č ć ž Š Ð Č Ć Ž
a b c d e š đ č ć ž Š Ð Č Ć Ž
a b c d e š đ č ć ž Š Ð Č Ć Ž
x ,, y x ≤≤6 y x ≥≥> y
x BB y y CC x


Hex dump



Found issues:




  • I expected both ne and neq to be mapped to U+2260, but they are mapped to U+002C (regular comma).

  • Both le (ge) and leq (le) are mapped correctly, but slanted variants are not - they should be mapped to U+2A7D (U+2A7E).

  • Both coloneq (coloneqq) and eqcolon (eqqcolon) are not mapped correctly (to U+2254 and U+2255), but to regular B and C.

  • Although it looks like đ and Đ are copied correctly, they are not. đ is mapped to U+0111 while Đ is mapped to U+00D0 (not an lowercase-uppercase pair). Considering these two pairs:


    • U+00D0 Ð c3 90 LATIN CAPITAL LETTER ETH

    • U+00F0 ð c3 b0 LATIN SMALL LETTER ETH

    • U+0110 Đ c4 90 LATIN CAPITAL LETTER D WITH STROKE

    • U+0111 đ c4 91 LATIN SMALL LETTER D WITH STROKE




... I would say that c3*s are used in Icelandic language, while the latter two c4*s correspond to Croatian language. I also checked my source code, and hereby I confirm that my đ and Đ from the keyboard ended as c4*s encoding the source file in UTF-8.



Questions:





  • ne and neq: Can someone explain why the mapping is wrong, and can that be improved so they are both correctly mapped?

  • slanted variants: Can someone explain why the mapping is wrong, and can that be improved so they are correctly mapped?


  • coloneq and others: Can someone explain why the mapping is wrong, and can that be improved so they are correctly mapped?


  • đ and Đ: Can someone explain why the mapping is wrong, and can that be improved so they are correctly mapped?


I am very keen to contribute for the improvements for all issues.



Kind regards, Ivan



EDIT: The question about đ and Đ is related to T1 font encoding, not the newtx, as explained in comments below.










share|improve this question




















  • 1





    Curiously enough, if I copy from a different PDF viewer (Skim, in my case), I get the correct glyphs, with or without glyphtounicode for the text part. Symbols are only partially recognized. With glyphtounicode I get all the text glyphs correct on Adobe Acrobat Reader. The fact that Ð is incorrectly mapped to ETH is kind of expected, as the T1 encoding doesn't have two different glyphs for the D with stroke and the ETH.

    – egreg
    yesterday













  • Thanks for the prompt reply. I dug into T1-specific definition files in my MiKTeX and found these: DeclareUnicodeCharacter{00D0}{DH}, DeclareUnicodeCharacter{00F0}{dh}, DeclareUnicodeCharacter{0110}{DJ}, DeclareUnicodeCharacter{0111}{dj}, DeclareTextSymbol{DH}{T1}{208}, DeclareTextSymbol{DJ}{T1}{208}, DeclareTextSymbol{dh}{T1}{240}, DeclareTextSymbol{dj}{T1}{158}. As you wrote, DJ and DH point to the same slot (208) and that causes experienced behavior. How can I "check" what all slots contain?

    – Ivan Kokan
    yesterday








  • 1





    Do pdflatex nfssfont from the terminal and hit return at the prompts until you get *. Then type tablebye and the PDF file will show the full table for a T1 encoded font.

    – egreg
    yesterday











  • OK, I see that table is actually given in encguide. So, the idea of having correctly paired mappings for đ and Đ ("with strokes") is not feasible for T1, actually for non-Unicode based TeX system.

    – Ivan Kokan
    yesterday











  • I have read encguide thoroughly on this matter since yesterday, and things are much clearer to me now. (The issue about Đ is explained very well, I should have read that way before.) I have even tried changing /Eth to /Dcroat in lm-ec.enc and tested new MWE using lmodern and ð Ð đ Đ; the outcome of copying the content from PDF was expected - everything was correct except Eth. I suppose all slots must be mapped uniquely to only one glyph (name), and /Eth was the decision (some history here tug.org/fontname/ec.enc)?

    – Ivan Kokan
    12 hours ago
















1












1








1


2






It seems that newtx is nowadays the best option for Times font, and I wanted to test it with respect to Croatian specifics, but also with respect to some common math symbols, and copying the content from generated PDF.



Consider the following MWE:



documentclass{article}

usepackage[T1]{fontenc}
usepackage[utf8]{inputenc}
usepackage[croatian]{babel}

input{glyphtounicode}
pdfgentounicode=1

usepackage{newtxtext}
usepackage{newtxmath}


begin{document}

a b c d e š đ č ć ž Š Đ Č Ć Ž

bfseries
a b c d e š đ č ć ž Š Đ Č Ć Ž

sffamily
a b c d e š đ č ć ž Š Đ Č Ć Ž

ttfamily
a b c d e š đ č ć ž Š Đ Č Ć Ž

[
x ne neq y quad
x le leq leqslant y quad
x ge geq geqslant y
]

[
x coloneq coloneqq y quad
y eqcolon eqqcolon x
]

end{document}


Opening the generated PDF in the latest Adobe Acrobat Reader DC, and copying the content into Unicode-friendly editor, I get this:



a b c d e š đ č ć ž Š Ð Č Ć Ž
a b c d e š đ č ć ž Š Ð Č Ć Ž
a b c d e š đ č ć ž Š Ð Č Ć Ž
a b c d e š đ č ć ž Š Ð Č Ć Ž
x ,, y x ≤≤6 y x ≥≥> y
x BB y y CC x


Hex dump



Found issues:




  • I expected both ne and neq to be mapped to U+2260, but they are mapped to U+002C (regular comma).

  • Both le (ge) and leq (le) are mapped correctly, but slanted variants are not - they should be mapped to U+2A7D (U+2A7E).

  • Both coloneq (coloneqq) and eqcolon (eqqcolon) are not mapped correctly (to U+2254 and U+2255), but to regular B and C.

  • Although it looks like đ and Đ are copied correctly, they are not. đ is mapped to U+0111 while Đ is mapped to U+00D0 (not an lowercase-uppercase pair). Considering these two pairs:


    • U+00D0 Ð c3 90 LATIN CAPITAL LETTER ETH

    • U+00F0 ð c3 b0 LATIN SMALL LETTER ETH

    • U+0110 Đ c4 90 LATIN CAPITAL LETTER D WITH STROKE

    • U+0111 đ c4 91 LATIN SMALL LETTER D WITH STROKE




... I would say that c3*s are used in Icelandic language, while the latter two c4*s correspond to Croatian language. I also checked my source code, and hereby I confirm that my đ and Đ from the keyboard ended as c4*s encoding the source file in UTF-8.



Questions:





  • ne and neq: Can someone explain why the mapping is wrong, and can that be improved so they are both correctly mapped?

  • slanted variants: Can someone explain why the mapping is wrong, and can that be improved so they are correctly mapped?


  • coloneq and others: Can someone explain why the mapping is wrong, and can that be improved so they are correctly mapped?


  • đ and Đ: Can someone explain why the mapping is wrong, and can that be improved so they are correctly mapped?


I am very keen to contribute for the improvements for all issues.



Kind regards, Ivan



EDIT: The question about đ and Đ is related to T1 font encoding, not the newtx, as explained in comments below.










share|improve this question
















It seems that newtx is nowadays the best option for Times font, and I wanted to test it with respect to Croatian specifics, but also with respect to some common math symbols, and copying the content from generated PDF.



Consider the following MWE:



documentclass{article}

usepackage[T1]{fontenc}
usepackage[utf8]{inputenc}
usepackage[croatian]{babel}

input{glyphtounicode}
pdfgentounicode=1

usepackage{newtxtext}
usepackage{newtxmath}


begin{document}

a b c d e š đ č ć ž Š Đ Č Ć Ž

bfseries
a b c d e š đ č ć ž Š Đ Č Ć Ž

sffamily
a b c d e š đ č ć ž Š Đ Č Ć Ž

ttfamily
a b c d e š đ č ć ž Š Đ Č Ć Ž

[
x ne neq y quad
x le leq leqslant y quad
x ge geq geqslant y
]

[
x coloneq coloneqq y quad
y eqcolon eqqcolon x
]

end{document}


Opening the generated PDF in the latest Adobe Acrobat Reader DC, and copying the content into Unicode-friendly editor, I get this:



a b c d e š đ č ć ž Š Ð Č Ć Ž
a b c d e š đ č ć ž Š Ð Č Ć Ž
a b c d e š đ č ć ž Š Ð Č Ć Ž
a b c d e š đ č ć ž Š Ð Č Ć Ž
x ,, y x ≤≤6 y x ≥≥> y
x BB y y CC x


Hex dump



Found issues:




  • I expected both ne and neq to be mapped to U+2260, but they are mapped to U+002C (regular comma).

  • Both le (ge) and leq (le) are mapped correctly, but slanted variants are not - they should be mapped to U+2A7D (U+2A7E).

  • Both coloneq (coloneqq) and eqcolon (eqqcolon) are not mapped correctly (to U+2254 and U+2255), but to regular B and C.

  • Although it looks like đ and Đ are copied correctly, they are not. đ is mapped to U+0111 while Đ is mapped to U+00D0 (not an lowercase-uppercase pair). Considering these two pairs:


    • U+00D0 Ð c3 90 LATIN CAPITAL LETTER ETH

    • U+00F0 ð c3 b0 LATIN SMALL LETTER ETH

    • U+0110 Đ c4 90 LATIN CAPITAL LETTER D WITH STROKE

    • U+0111 đ c4 91 LATIN SMALL LETTER D WITH STROKE




... I would say that c3*s are used in Icelandic language, while the latter two c4*s correspond to Croatian language. I also checked my source code, and hereby I confirm that my đ and Đ from the keyboard ended as c4*s encoding the source file in UTF-8.



Questions:





  • ne and neq: Can someone explain why the mapping is wrong, and can that be improved so they are both correctly mapped?

  • slanted variants: Can someone explain why the mapping is wrong, and can that be improved so they are correctly mapped?


  • coloneq and others: Can someone explain why the mapping is wrong, and can that be improved so they are correctly mapped?


  • đ and Đ: Can someone explain why the mapping is wrong, and can that be improved so they are correctly mapped?


I am very keen to contribute for the improvements for all issues.



Kind regards, Ivan



EDIT: The question about đ and Đ is related to T1 font encoding, not the newtx, as explained in comments below.







unicode times newtx






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited 12 hours ago







Ivan Kokan

















asked yesterday









Ivan KokanIvan Kokan

5510




5510








  • 1





    Curiously enough, if I copy from a different PDF viewer (Skim, in my case), I get the correct glyphs, with or without glyphtounicode for the text part. Symbols are only partially recognized. With glyphtounicode I get all the text glyphs correct on Adobe Acrobat Reader. The fact that Ð is incorrectly mapped to ETH is kind of expected, as the T1 encoding doesn't have two different glyphs for the D with stroke and the ETH.

    – egreg
    yesterday













  • Thanks for the prompt reply. I dug into T1-specific definition files in my MiKTeX and found these: DeclareUnicodeCharacter{00D0}{DH}, DeclareUnicodeCharacter{00F0}{dh}, DeclareUnicodeCharacter{0110}{DJ}, DeclareUnicodeCharacter{0111}{dj}, DeclareTextSymbol{DH}{T1}{208}, DeclareTextSymbol{DJ}{T1}{208}, DeclareTextSymbol{dh}{T1}{240}, DeclareTextSymbol{dj}{T1}{158}. As you wrote, DJ and DH point to the same slot (208) and that causes experienced behavior. How can I "check" what all slots contain?

    – Ivan Kokan
    yesterday








  • 1





    Do pdflatex nfssfont from the terminal and hit return at the prompts until you get *. Then type tablebye and the PDF file will show the full table for a T1 encoded font.

    – egreg
    yesterday











  • OK, I see that table is actually given in encguide. So, the idea of having correctly paired mappings for đ and Đ ("with strokes") is not feasible for T1, actually for non-Unicode based TeX system.

    – Ivan Kokan
    yesterday











  • I have read encguide thoroughly on this matter since yesterday, and things are much clearer to me now. (The issue about Đ is explained very well, I should have read that way before.) I have even tried changing /Eth to /Dcroat in lm-ec.enc and tested new MWE using lmodern and ð Ð đ Đ; the outcome of copying the content from PDF was expected - everything was correct except Eth. I suppose all slots must be mapped uniquely to only one glyph (name), and /Eth was the decision (some history here tug.org/fontname/ec.enc)?

    – Ivan Kokan
    12 hours ago
















  • 1





    Curiously enough, if I copy from a different PDF viewer (Skim, in my case), I get the correct glyphs, with or without glyphtounicode for the text part. Symbols are only partially recognized. With glyphtounicode I get all the text glyphs correct on Adobe Acrobat Reader. The fact that Ð is incorrectly mapped to ETH is kind of expected, as the T1 encoding doesn't have two different glyphs for the D with stroke and the ETH.

    – egreg
    yesterday













  • Thanks for the prompt reply. I dug into T1-specific definition files in my MiKTeX and found these: DeclareUnicodeCharacter{00D0}{DH}, DeclareUnicodeCharacter{00F0}{dh}, DeclareUnicodeCharacter{0110}{DJ}, DeclareUnicodeCharacter{0111}{dj}, DeclareTextSymbol{DH}{T1}{208}, DeclareTextSymbol{DJ}{T1}{208}, DeclareTextSymbol{dh}{T1}{240}, DeclareTextSymbol{dj}{T1}{158}. As you wrote, DJ and DH point to the same slot (208) and that causes experienced behavior. How can I "check" what all slots contain?

    – Ivan Kokan
    yesterday








  • 1





    Do pdflatex nfssfont from the terminal and hit return at the prompts until you get *. Then type tablebye and the PDF file will show the full table for a T1 encoded font.

    – egreg
    yesterday











  • OK, I see that table is actually given in encguide. So, the idea of having correctly paired mappings for đ and Đ ("with strokes") is not feasible for T1, actually for non-Unicode based TeX system.

    – Ivan Kokan
    yesterday











  • I have read encguide thoroughly on this matter since yesterday, and things are much clearer to me now. (The issue about Đ is explained very well, I should have read that way before.) I have even tried changing /Eth to /Dcroat in lm-ec.enc and tested new MWE using lmodern and ð Ð đ Đ; the outcome of copying the content from PDF was expected - everything was correct except Eth. I suppose all slots must be mapped uniquely to only one glyph (name), and /Eth was the decision (some history here tug.org/fontname/ec.enc)?

    – Ivan Kokan
    12 hours ago










1




1





Curiously enough, if I copy from a different PDF viewer (Skim, in my case), I get the correct glyphs, with or without glyphtounicode for the text part. Symbols are only partially recognized. With glyphtounicode I get all the text glyphs correct on Adobe Acrobat Reader. The fact that Ð is incorrectly mapped to ETH is kind of expected, as the T1 encoding doesn't have two different glyphs for the D with stroke and the ETH.

– egreg
yesterday







Curiously enough, if I copy from a different PDF viewer (Skim, in my case), I get the correct glyphs, with or without glyphtounicode for the text part. Symbols are only partially recognized. With glyphtounicode I get all the text glyphs correct on Adobe Acrobat Reader. The fact that Ð is incorrectly mapped to ETH is kind of expected, as the T1 encoding doesn't have two different glyphs for the D with stroke and the ETH.

– egreg
yesterday















Thanks for the prompt reply. I dug into T1-specific definition files in my MiKTeX and found these: DeclareUnicodeCharacter{00D0}{DH}, DeclareUnicodeCharacter{00F0}{dh}, DeclareUnicodeCharacter{0110}{DJ}, DeclareUnicodeCharacter{0111}{dj}, DeclareTextSymbol{DH}{T1}{208}, DeclareTextSymbol{DJ}{T1}{208}, DeclareTextSymbol{dh}{T1}{240}, DeclareTextSymbol{dj}{T1}{158}. As you wrote, DJ and DH point to the same slot (208) and that causes experienced behavior. How can I "check" what all slots contain?

– Ivan Kokan
yesterday







Thanks for the prompt reply. I dug into T1-specific definition files in my MiKTeX and found these: DeclareUnicodeCharacter{00D0}{DH}, DeclareUnicodeCharacter{00F0}{dh}, DeclareUnicodeCharacter{0110}{DJ}, DeclareUnicodeCharacter{0111}{dj}, DeclareTextSymbol{DH}{T1}{208}, DeclareTextSymbol{DJ}{T1}{208}, DeclareTextSymbol{dh}{T1}{240}, DeclareTextSymbol{dj}{T1}{158}. As you wrote, DJ and DH point to the same slot (208) and that causes experienced behavior. How can I "check" what all slots contain?

– Ivan Kokan
yesterday






1




1





Do pdflatex nfssfont from the terminal and hit return at the prompts until you get *. Then type tablebye and the PDF file will show the full table for a T1 encoded font.

– egreg
yesterday





Do pdflatex nfssfont from the terminal and hit return at the prompts until you get *. Then type tablebye and the PDF file will show the full table for a T1 encoded font.

– egreg
yesterday













OK, I see that table is actually given in encguide. So, the idea of having correctly paired mappings for đ and Đ ("with strokes") is not feasible for T1, actually for non-Unicode based TeX system.

– Ivan Kokan
yesterday





OK, I see that table is actually given in encguide. So, the idea of having correctly paired mappings for đ and Đ ("with strokes") is not feasible for T1, actually for non-Unicode based TeX system.

– Ivan Kokan
yesterday













I have read encguide thoroughly on this matter since yesterday, and things are much clearer to me now. (The issue about Đ is explained very well, I should have read that way before.) I have even tried changing /Eth to /Dcroat in lm-ec.enc and tested new MWE using lmodern and ð Ð đ Đ; the outcome of copying the content from PDF was expected - everything was correct except Eth. I suppose all slots must be mapped uniquely to only one glyph (name), and /Eth was the decision (some history here tug.org/fontname/ec.enc)?

– Ivan Kokan
12 hours ago







I have read encguide thoroughly on this matter since yesterday, and things are much clearer to me now. (The issue about Đ is explained very well, I should have read that way before.) I have even tried changing /Eth to /Dcroat in lm-ec.enc and tested new MWE using lmodern and ð Ð đ Đ; the outcome of copying the content from PDF was expected - everything was correct except Eth. I suppose all slots must be mapped uniquely to only one glyph (name), and /Eth was the decision (some history here tug.org/fontname/ec.enc)?

– Ivan Kokan
12 hours ago












1 Answer
1






active

oldest

votes


















3














glyphtounicode.tex contains a lot declarations to map glyph names to unicode points. But it is not complete. To get e.g. the ne to copy as ≠ add a suitable pdfglyphtounicode:



documentclass{article}

usepackage[T1]{fontenc}
usepackage[utf8]{inputenc}
usepackage[croatian]{babel}

input{glyphtounicode}
pdfgentounicode=1

usepackage{newtxtext}
usepackage{newtxmath}
pdfglyphtounicode{nequal}{2260}

begin{document}
$ne$
end{document}


(I found the nequal by looking in the txsyc.pfb).






share|improve this answer
























  • Thanks! I guess that adding such definitions in newtx itself (or other fonts it is compound of) would be even better than defining it locally, right?

    – Ivan Kokan
    yesterday













Your Answer








StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "85"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2ftex.stackexchange.com%2fquestions%2f477433%2fwrong-mappings-when-copying-from-pdf-with-pdfgentounicode-with-newtx%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes









3














glyphtounicode.tex contains a lot declarations to map glyph names to unicode points. But it is not complete. To get e.g. the ne to copy as ≠ add a suitable pdfglyphtounicode:



documentclass{article}

usepackage[T1]{fontenc}
usepackage[utf8]{inputenc}
usepackage[croatian]{babel}

input{glyphtounicode}
pdfgentounicode=1

usepackage{newtxtext}
usepackage{newtxmath}
pdfglyphtounicode{nequal}{2260}

begin{document}
$ne$
end{document}


(I found the nequal by looking in the txsyc.pfb).






share|improve this answer
























  • Thanks! I guess that adding such definitions in newtx itself (or other fonts it is compound of) would be even better than defining it locally, right?

    – Ivan Kokan
    yesterday


















3














glyphtounicode.tex contains a lot declarations to map glyph names to unicode points. But it is not complete. To get e.g. the ne to copy as ≠ add a suitable pdfglyphtounicode:



documentclass{article}

usepackage[T1]{fontenc}
usepackage[utf8]{inputenc}
usepackage[croatian]{babel}

input{glyphtounicode}
pdfgentounicode=1

usepackage{newtxtext}
usepackage{newtxmath}
pdfglyphtounicode{nequal}{2260}

begin{document}
$ne$
end{document}


(I found the nequal by looking in the txsyc.pfb).






share|improve this answer
























  • Thanks! I guess that adding such definitions in newtx itself (or other fonts it is compound of) would be even better than defining it locally, right?

    – Ivan Kokan
    yesterday
















3












3








3







glyphtounicode.tex contains a lot declarations to map glyph names to unicode points. But it is not complete. To get e.g. the ne to copy as ≠ add a suitable pdfglyphtounicode:



documentclass{article}

usepackage[T1]{fontenc}
usepackage[utf8]{inputenc}
usepackage[croatian]{babel}

input{glyphtounicode}
pdfgentounicode=1

usepackage{newtxtext}
usepackage{newtxmath}
pdfglyphtounicode{nequal}{2260}

begin{document}
$ne$
end{document}


(I found the nequal by looking in the txsyc.pfb).






share|improve this answer













glyphtounicode.tex contains a lot declarations to map glyph names to unicode points. But it is not complete. To get e.g. the ne to copy as ≠ add a suitable pdfglyphtounicode:



documentclass{article}

usepackage[T1]{fontenc}
usepackage[utf8]{inputenc}
usepackage[croatian]{babel}

input{glyphtounicode}
pdfgentounicode=1

usepackage{newtxtext}
usepackage{newtxmath}
pdfglyphtounicode{nequal}{2260}

begin{document}
$ne$
end{document}


(I found the nequal by looking in the txsyc.pfb).







share|improve this answer












share|improve this answer



share|improve this answer










answered yesterday









Ulrike FischerUlrike Fischer

194k8302687




194k8302687













  • Thanks! I guess that adding such definitions in newtx itself (or other fonts it is compound of) would be even better than defining it locally, right?

    – Ivan Kokan
    yesterday





















  • Thanks! I guess that adding such definitions in newtx itself (or other fonts it is compound of) would be even better than defining it locally, right?

    – Ivan Kokan
    yesterday



















Thanks! I guess that adding such definitions in newtx itself (or other fonts it is compound of) would be even better than defining it locally, right?

– Ivan Kokan
yesterday







Thanks! I guess that adding such definitions in newtx itself (or other fonts it is compound of) would be even better than defining it locally, right?

– Ivan Kokan
yesterday




















draft saved

draft discarded




















































Thanks for contributing an answer to TeX - LaTeX Stack Exchange!


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2ftex.stackexchange.com%2fquestions%2f477433%2fwrong-mappings-when-copying-from-pdf-with-pdfgentounicode-with-newtx%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

El tren de la libertad Índice Antecedentes "Porque yo decido" Desarrollo de la...

Castillo d'Acher Características Menú de navegación

Connecting two nodes from the same mother node horizontallyTikZ: What EXACTLY does the the |- notation for...