Wrong mappings when copying from PDF with pdfgentounicode with newtxTrouble with the newtx packageHow to fix...

What's the rationale behind the objections to these measures against human trafficking?

Talents during the time of Achasverous

Why was it necessary for Jesus to go through Samaria in John 4:4

80’s/90’s fantasy book where men travelled across the land and inhabited creatures such as deer, beavers and pixies

Meth dealer reference in Family Guy

How to define a macro with multiple optional parameters?

Compare four integers, return word based on maximum

What's the purpose of these copper coils with resitors inside them in A Yamaha RX-V396RDS amplifier?

Can I retract my name from an already published manuscript?

Is my plan for fixing my water heater leak bad?

Do my Windows system binaries contain sensitive information?

Soft question- The Bashing Technique and Other powerful techniques for Olympiads

Should the .gitignore include an entry for .vscode when using Git and VSCode

How to get Region text from address id (Magento 2)

Criticizing long fiction. How is it different from short?

How to properly claim credit for peer review?

What are these green text/line displays shown during the livestream of Crew Dragon's approach to dock with the ISS?

Can the Count of Monte Cristo's calculation of poison dosage be explained?

Skis versus snow shoes - when to choose which for travelling the backcountry?

Auto Insert date into Notepad

Could quantum mechanics be necessary to analyze some biology scenarios?

awk unexpectedly removes dot from string

How can I mix up weapons for large groups of similar monsters/characters?

What's the difference between a cart and a wagon?

Wrong mappings when copying from PDF with pdfgentounicode with newtx

Trouble with the newtx packageHow to fix missing or incorrect mappings from glyphtounicode.texThe newtx isn't compatible with pifont?Where should I report wrong Unicode mappings?Problem with new MNRAS style files / newtx on arXivinserting a single unicode character with pdflatexNested fonts fail with the newtx (newtxtext) package (specifically textsc, textup, and textbf)Italicized footnote number within equation (with newtx/newpx fonts)Newtx is blocking creation of a pdfError on rendering mathematical symbol - getting source file of rfxlri-alt from newtx package

It seems that newtx is nowadays the best option for Times font, and I wanted to test it with respect to Croatian specifics, but also with respect to some common math symbols, and copying the content from generated PDF.

Consider the following MWE:

documentclass{article}



usepackage[T1]{fontenc}

usepackage[utf8]{inputenc}

usepackage[croatian]{babel}



input{glyphtounicode}

pdfgentounicode=1



usepackage{newtxtext}

usepackage{newtxmath}





begin{document}



a b c d e š đ č ć ž Š Đ Č Ć Ž



bfseries

a b c d e š đ č ć ž Š Đ Č Ć Ž



sffamily

a b c d e š đ č ć ž Š Đ Č Ć Ž



ttfamily

a b c d e š đ č ć ž Š Đ Č Ć Ž



[

  x ne neq y quad

  x le leq leqslant y quad

  x ge geq geqslant y

]



[

  x coloneq coloneqq y quad

  y eqcolon eqqcolon x

]



end{document}

Opening the generated PDF in the latest Adobe Acrobat Reader DC, and copying the content into Unicode-friendly editor, I get this:

a b c d e š đ č ć ž Š Ð Č Ć Ž

a b c d e š đ č ć ž Š Ð Č Ć Ž

a b c d e š đ č ć ž Š Ð Č Ć Ž

a b c d e š đ č ć ž Š Ð Č Ć Ž

x ,, y x ≤≤6 y x ≥≥> y

x BB y y CC x

Hex dump

Found issues:

I expected both ne and neq to be mapped to U+2260, but they are mapped to U+002C (regular comma).

Both le (ge) and leq (le) are mapped correctly, but slanted variants are not - they should be mapped to U+2A7D (U+2A7E).

Both coloneq (coloneqq) and eqcolon (eqqcolon) are not mapped correctly (to U+2254 and U+2255), but to regular B and C.

Although it looks like đ and Đ are copied correctly, they are not. đ is mapped to U+0111 while Đ is mapped to U+00D0 (not an lowercase-uppercase pair). Considering these two pairs:
- U+00D0 Ð c3 90 LATIN CAPITAL LETTER ETH
- U+00F0 ð c3 b0 LATIN SMALL LETTER ETH
- U+0110 Đ c4 90 LATIN CAPITAL LETTER D WITH STROKE
- U+0111 đ c4 91 LATIN SMALL LETTER D WITH STROKE

... I would say that c3*s are used in Icelandic language, while the latter two c4*s correspond to Croatian language. I also checked my source code, and hereby I confirm that my đ and Đ from the keyboard ended as c4*s encoding the source file in UTF-8.

Questions:

ne and neq: Can someone explain why the mapping is wrong, and can that be improved so they are both correctly mapped?

slanted variants: Can someone explain why the mapping is wrong, and can that be improved so they are correctly mapped?

coloneq and others: Can someone explain why the mapping is wrong, and can that be improved so they are correctly mapped?

đ and Đ: Can someone explain why the mapping is wrong, and can that be improved so they are correctly mapped?

I am very keen to contribute for the improvements for all issues.

Kind regards, Ivan

EDIT: The question about đ and Đ is related to T1 font encoding, not the newtx, as explained in comments below.

edited 12 hours ago

asked yesterday

Ivan Kokan

5510

1

Curiously enough, if I copy from a different PDF viewer (Skim, in my case), I get the correct glyphs, with or without glyphtounicode for the text part. Symbols are only partially recognized. With glyphtounicode I get all the text glyphs correct on Adobe Acrobat Reader. The fact that Ð is incorrectly mapped to ETH is kind of expected, as the T1 encoding doesn't have two different glyphs for the D with stroke and the ETH.

– egreg
yesterday

Thanks for the prompt reply. I dug into T1-specific definition files in my MiKTeX and found these: DeclareUnicodeCharacter{00D0}{DH}, DeclareUnicodeCharacter{00F0}{dh}, DeclareUnicodeCharacter{0110}{DJ}, DeclareUnicodeCharacter{0111}{dj}, DeclareTextSymbol{DH}{T1}{208}, DeclareTextSymbol{DJ}{T1}{208}, DeclareTextSymbol{dh}{T1}{240}, DeclareTextSymbol{dj}{T1}{158}. As you wrote, DJ and DH point to the same slot (208) and that causes experienced behavior. How can I "check" what all slots contain?

– Ivan Kokan
yesterday

1

Do pdflatex nfssfont from the terminal and hit return at the prompts until you get *. Then type tablebye and the PDF file will show the full table for a T1 encoded font.

– egreg
yesterday

OK, I see that table is actually given in encguide. So, the idea of having correctly paired mappings for đ and Đ ("with strokes") is not feasible for T1, actually for non-Unicode based TeX system.

– Ivan Kokan
yesterday

I have read encguide thoroughly on this matter since yesterday, and things are much clearer to me now. (The issue about Đ is explained very well, I should have read that way before.) I have even tried changing /Eth to /Dcroat in lm-ec.enc and tested new MWE using lmodern and ð Ð đ Đ; the outcome of copying the content from PDF was expected - everything was correct except Eth. I suppose all slots must be mapped uniquely to only one glyph (name), and /Eth was the decision (some history here tug.org/fontname/ec.enc)?

– Ivan Kokan
12 hours ago

|
show 1 more comment

Consider the following MWE:

documentclass{article}



usepackage[T1]{fontenc}

usepackage[utf8]{inputenc}

usepackage[croatian]{babel}



input{glyphtounicode}

pdfgentounicode=1



usepackage{newtxtext}

usepackage{newtxmath}





begin{document}



a b c d e š đ č ć ž Š Đ Č Ć Ž



bfseries

a b c d e š đ č ć ž Š Đ Č Ć Ž



sffamily

a b c d e š đ č ć ž Š Đ Č Ć Ž



ttfamily

a b c d e š đ č ć ž Š Đ Č Ć Ž



[

  x ne neq y quad

  x le leq leqslant y quad

  x ge geq geqslant y

]



[

  x coloneq coloneqq y quad

  y eqcolon eqqcolon x

]



end{document}

Opening the generated PDF in the latest Adobe Acrobat Reader DC, and copying the content into Unicode-friendly editor, I get this:

a b c d e š đ č ć ž Š Ð Č Ć Ž

a b c d e š đ č ć ž Š Ð Č Ć Ž

a b c d e š đ č ć ž Š Ð Č Ć Ž

a b c d e š đ č ć ž Š Ð Č Ć Ž

x ,, y x ≤≤6 y x ≥≥> y

x BB y y CC x

Hex dump

Found issues:

I expected both ne and neq to be mapped to U+2260, but they are mapped to U+002C (regular comma).

Both le (ge) and leq (le) are mapped correctly, but slanted variants are not - they should be mapped to U+2A7D (U+2A7E).

Both coloneq (coloneqq) and eqcolon (eqqcolon) are not mapped correctly (to U+2254 and U+2255), but to regular B and C.

Although it looks like đ and Đ are copied correctly, they are not. đ is mapped to U+0111 while Đ is mapped to U+00D0 (not an lowercase-uppercase pair). Considering these two pairs:
- U+00D0 Ð c3 90 LATIN CAPITAL LETTER ETH
- U+00F0 ð c3 b0 LATIN SMALL LETTER ETH
- U+0110 Đ c4 90 LATIN CAPITAL LETTER D WITH STROKE
- U+0111 đ c4 91 LATIN SMALL LETTER D WITH STROKE

Questions:

ne and neq: Can someone explain why the mapping is wrong, and can that be improved so they are both correctly mapped?

slanted variants: Can someone explain why the mapping is wrong, and can that be improved so they are correctly mapped?

coloneq and others: Can someone explain why the mapping is wrong, and can that be improved so they are correctly mapped?

đ and Đ: Can someone explain why the mapping is wrong, and can that be improved so they are correctly mapped?

I am very keen to contribute for the improvements for all issues.

Kind regards, Ivan

EDIT: The question about đ and Đ is related to T1 font encoding, not the newtx, as explained in comments below.

edited 12 hours ago

asked yesterday

Ivan Kokan

5510

1

Curiously enough, if I copy from a different PDF viewer (Skim, in my case), I get the correct glyphs, with or without glyphtounicode for the text part. Symbols are only partially recognized. With glyphtounicode I get all the text glyphs correct on Adobe Acrobat Reader. The fact that Ð is incorrectly mapped to ETH is kind of expected, as the T1 encoding doesn't have two different glyphs for the D with stroke and the ETH.

– egreg
yesterday

Thanks for the prompt reply. I dug into T1-specific definition files in my MiKTeX and found these: DeclareUnicodeCharacter{00D0}{DH}, DeclareUnicodeCharacter{00F0}{dh}, DeclareUnicodeCharacter{0110}{DJ}, DeclareUnicodeCharacter{0111}{dj}, DeclareTextSymbol{DH}{T1}{208}, DeclareTextSymbol{DJ}{T1}{208}, DeclareTextSymbol{dh}{T1}{240}, DeclareTextSymbol{dj}{T1}{158}. As you wrote, DJ and DH point to the same slot (208) and that causes experienced behavior. How can I "check" what all slots contain?

– Ivan Kokan
yesterday

1

Do pdflatex nfssfont from the terminal and hit return at the prompts until you get *. Then type tablebye and the PDF file will show the full table for a T1 encoded font.

– egreg
yesterday

OK, I see that table is actually given in encguide. So, the idea of having correctly paired mappings for đ and Đ ("with strokes") is not feasible for T1, actually for non-Unicode based TeX system.

– Ivan Kokan
yesterday

I have read encguide thoroughly on this matter since yesterday, and things are much clearer to me now. (The issue about Đ is explained very well, I should have read that way before.) I have even tried changing /Eth to /Dcroat in lm-ec.enc and tested new MWE using lmodern and ð Ð đ Đ; the outcome of copying the content from PDF was expected - everything was correct except Eth. I suppose all slots must be mapped uniquely to only one glyph (name), and /Eth was the decision (some history here tug.org/fontname/ec.enc)?

– Ivan Kokan
12 hours ago

|
show 1 more comment

Consider the following MWE:

documentclass{article}



usepackage[T1]{fontenc}

usepackage[utf8]{inputenc}

usepackage[croatian]{babel}



input{glyphtounicode}

pdfgentounicode=1



usepackage{newtxtext}

usepackage{newtxmath}





begin{document}



a b c d e š đ č ć ž Š Đ Č Ć Ž



bfseries

a b c d e š đ č ć ž Š Đ Č Ć Ž



sffamily

a b c d e š đ č ć ž Š Đ Č Ć Ž



ttfamily

a b c d e š đ č ć ž Š Đ Č Ć Ž



[

  x ne neq y quad

  x le leq leqslant y quad

  x ge geq geqslant y

]



[

  x coloneq coloneqq y quad

  y eqcolon eqqcolon x

]



end{document}

Opening the generated PDF in the latest Adobe Acrobat Reader DC, and copying the content into Unicode-friendly editor, I get this:

a b c d e š đ č ć ž Š Ð Č Ć Ž

a b c d e š đ č ć ž Š Ð Č Ć Ž

a b c d e š đ č ć ž Š Ð Č Ć Ž

a b c d e š đ č ć ž Š Ð Č Ć Ž

x ,, y x ≤≤6 y x ≥≥> y

x BB y y CC x

Hex dump

Found issues:

I expected both ne and neq to be mapped to U+2260, but they are mapped to U+002C (regular comma).

Both le (ge) and leq (le) are mapped correctly, but slanted variants are not - they should be mapped to U+2A7D (U+2A7E).

Both coloneq (coloneqq) and eqcolon (eqqcolon) are not mapped correctly (to U+2254 and U+2255), but to regular B and C.

Although it looks like đ and Đ are copied correctly, they are not. đ is mapped to U+0111 while Đ is mapped to U+00D0 (not an lowercase-uppercase pair). Considering these two pairs:
- U+00D0 Ð c3 90 LATIN CAPITAL LETTER ETH
- U+00F0 ð c3 b0 LATIN SMALL LETTER ETH
- U+0110 Đ c4 90 LATIN CAPITAL LETTER D WITH STROKE
- U+0111 đ c4 91 LATIN SMALL LETTER D WITH STROKE

Questions:

ne and neq: Can someone explain why the mapping is wrong, and can that be improved so they are both correctly mapped?

slanted variants: Can someone explain why the mapping is wrong, and can that be improved so they are correctly mapped?

coloneq and others: Can someone explain why the mapping is wrong, and can that be improved so they are correctly mapped?

đ and Đ: Can someone explain why the mapping is wrong, and can that be improved so they are correctly mapped?

I am very keen to contribute for the improvements for all issues.

Kind regards, Ivan

EDIT: The question about đ and Đ is related to T1 font encoding, not the newtx, as explained in comments below.

edited 12 hours ago

asked yesterday

Ivan Kokan

5510

Consider the following MWE:

documentclass{article}



usepackage[T1]{fontenc}

usepackage[utf8]{inputenc}

usepackage[croatian]{babel}



input{glyphtounicode}

pdfgentounicode=1



usepackage{newtxtext}

usepackage{newtxmath}





begin{document}



a b c d e š đ č ć ž Š Đ Č Ć Ž



bfseries

a b c d e š đ č ć ž Š Đ Č Ć Ž



sffamily

a b c d e š đ č ć ž Š Đ Č Ć Ž



ttfamily

a b c d e š đ č ć ž Š Đ Č Ć Ž



[

  x ne neq y quad

  x le leq leqslant y quad

  x ge geq geqslant y

]



[

  x coloneq coloneqq y quad

  y eqcolon eqqcolon x

]



end{document}

Opening the generated PDF in the latest Adobe Acrobat Reader DC, and copying the content into Unicode-friendly editor, I get this:

a b c d e š đ č ć ž Š Ð Č Ć Ž

a b c d e š đ č ć ž Š Ð Č Ć Ž

a b c d e š đ č ć ž Š Ð Č Ć Ž

a b c d e š đ č ć ž Š Ð Č Ć Ž

x ,, y x ≤≤6 y x ≥≥> y

x BB y y CC x

Hex dump

Found issues:

I expected both ne and neq to be mapped to U+2260, but they are mapped to U+002C (regular comma).

Both le (ge) and leq (le) are mapped correctly, but slanted variants are not - they should be mapped to U+2A7D (U+2A7E).

Both coloneq (coloneqq) and eqcolon (eqqcolon) are not mapped correctly (to U+2254 and U+2255), but to regular B and C.

Although it looks like đ and Đ are copied correctly, they are not. đ is mapped to U+0111 while Đ is mapped to U+00D0 (not an lowercase-uppercase pair). Considering these two pairs:
- U+00D0 Ð c3 90 LATIN CAPITAL LETTER ETH
- U+00F0 ð c3 b0 LATIN SMALL LETTER ETH
- U+0110 Đ c4 90 LATIN CAPITAL LETTER D WITH STROKE
- U+0111 đ c4 91 LATIN SMALL LETTER D WITH STROKE

Questions:

ne and neq: Can someone explain why the mapping is wrong, and can that be improved so they are both correctly mapped?

slanted variants: Can someone explain why the mapping is wrong, and can that be improved so they are correctly mapped?

coloneq and others: Can someone explain why the mapping is wrong, and can that be improved so they are correctly mapped?

đ and Đ: Can someone explain why the mapping is wrong, and can that be improved so they are correctly mapped?

I am very keen to contribute for the improvements for all issues.

Kind regards, Ivan

EDIT: The question about đ and Đ is related to T1 font encoding, not the newtx, as explained in comments below.

unicode times newtx

edited 12 hours ago

asked yesterday

Ivan Kokan

5510

edited 12 hours ago

asked yesterday

Ivan Kokan

5510

edited 12 hours ago

asked yesterday

Ivan Kokan

5510

asked yesterday

Ivan Kokan

5510

asked yesterday

Ivan Kokan

5510

1

Curiously enough, if I copy from a different PDF viewer (Skim, in my case), I get the correct glyphs, with or without glyphtounicode for the text part. Symbols are only partially recognized. With glyphtounicode I get all the text glyphs correct on Adobe Acrobat Reader. The fact that Ð is incorrectly mapped to ETH is kind of expected, as the T1 encoding doesn't have two different glyphs for the D with stroke and the ETH.

– egreg
yesterday

Thanks for the prompt reply. I dug into T1-specific definition files in my MiKTeX and found these: DeclareUnicodeCharacter{00D0}{DH}, DeclareUnicodeCharacter{00F0}{dh}, DeclareUnicodeCharacter{0110}{DJ}, DeclareUnicodeCharacter{0111}{dj}, DeclareTextSymbol{DH}{T1}{208}, DeclareTextSymbol{DJ}{T1}{208}, DeclareTextSymbol{dh}{T1}{240}, DeclareTextSymbol{dj}{T1}{158}. As you wrote, DJ and DH point to the same slot (208) and that causes experienced behavior. How can I "check" what all slots contain?

– Ivan Kokan
yesterday

1

Do pdflatex nfssfont from the terminal and hit return at the prompts until you get *. Then type tablebye and the PDF file will show the full table for a T1 encoded font.

– egreg
yesterday

OK, I see that table is actually given in encguide. So, the idea of having correctly paired mappings for đ and Đ ("with strokes") is not feasible for T1, actually for non-Unicode based TeX system.

– Ivan Kokan
yesterday

I have read encguide thoroughly on this matter since yesterday, and things are much clearer to me now. (The issue about Đ is explained very well, I should have read that way before.) I have even tried changing /Eth to /Dcroat in lm-ec.enc and tested new MWE using lmodern and ð Ð đ Đ; the outcome of copying the content from PDF was expected - everything was correct except Eth. I suppose all slots must be mapped uniquely to only one glyph (name), and /Eth was the decision (some history here tug.org/fontname/ec.enc)?

– Ivan Kokan
12 hours ago

|
show 1 more comment

1

Curiously enough, if I copy from a different PDF viewer (Skim, in my case), I get the correct glyphs, with or without glyphtounicode for the text part. Symbols are only partially recognized. With glyphtounicode I get all the text glyphs correct on Adobe Acrobat Reader. The fact that Ð is incorrectly mapped to ETH is kind of expected, as the T1 encoding doesn't have two different glyphs for the D with stroke and the ETH.

– egreg
yesterday

Thanks for the prompt reply. I dug into T1-specific definition files in my MiKTeX and found these: DeclareUnicodeCharacter{00D0}{DH}, DeclareUnicodeCharacter{00F0}{dh}, DeclareUnicodeCharacter{0110}{DJ}, DeclareUnicodeCharacter{0111}{dj}, DeclareTextSymbol{DH}{T1}{208}, DeclareTextSymbol{DJ}{T1}{208}, DeclareTextSymbol{dh}{T1}{240}, DeclareTextSymbol{dj}{T1}{158}. As you wrote, DJ and DH point to the same slot (208) and that causes experienced behavior. How can I "check" what all slots contain?

– Ivan Kokan
yesterday

1

Do pdflatex nfssfont from the terminal and hit return at the prompts until you get *. Then type tablebye and the PDF file will show the full table for a T1 encoded font.

– egreg
yesterday

OK, I see that table is actually given in encguide. So, the idea of having correctly paired mappings for đ and Đ ("with strokes") is not feasible for T1, actually for non-Unicode based TeX system.

– Ivan Kokan
yesterday

I have read encguide thoroughly on this matter since yesterday, and things are much clearer to me now. (The issue about Đ is explained very well, I should have read that way before.) I have even tried changing /Eth to /Dcroat in lm-ec.enc and tested new MWE using lmodern and ð Ð đ Đ; the outcome of copying the content from PDF was expected - everything was correct except Eth. I suppose all slots must be mapped uniquely to only one glyph (name), and /Eth was the decision (some history here tug.org/fontname/ec.enc)?

– Ivan Kokan
12 hours ago

Curiously enough, if I copy from a different PDF viewer (Skim, in my case), I get the correct glyphs, with or without glyphtounicode for the text part. Symbols are only partially recognized. With glyphtounicode I get all the text glyphs correct on Adobe Acrobat Reader. The fact that Ð is incorrectly mapped to ETH is kind of expected, as the T1 encoding doesn't have two different glyphs for the D with stroke and the ETH.

– egreg
yesterday

Thanks for the prompt reply. I dug into T1-specific definition files in my MiKTeX and found these: DeclareUnicodeCharacter{00D0}{DH}, DeclareUnicodeCharacter{00F0}{dh}, DeclareUnicodeCharacter{0110}{DJ}, DeclareUnicodeCharacter{0111}{dj}, DeclareTextSymbol{DH}{T1}{208}, DeclareTextSymbol{DJ}{T1}{208}, DeclareTextSymbol{dh}{T1}{240}, DeclareTextSymbol{dj}{T1}{158}. As you wrote, DJ and DH point to the same slot (208) and that causes experienced behavior. How can I "check" what all slots contain?

– Ivan Kokan
yesterday

Do pdflatex nfssfont from the terminal and hit return at the prompts until you get *. Then type tablebye and the PDF file will show the full table for a T1 encoded font.

– egreg
yesterday

OK, I see that table is actually given in encguide. So, the idea of having correctly paired mappings for đ and Đ ("with strokes") is not feasible for T1, actually for non-Unicode based TeX system.

– Ivan Kokan
yesterday

I have read encguide thoroughly on this matter since yesterday, and things are much clearer to me now. (The issue about Đ is explained very well, I should have read that way before.) I have even tried changing /Eth to /Dcroat in lm-ec.enc and tested new MWE using lmodern and ð Ð đ Đ; the outcome of copying the content from PDF was expected - everything was correct except Eth. I suppose all slots must be mapped uniquely to only one glyph (name), and /Eth was the decision (some history here tug.org/fontname/ec.enc)?

– Ivan Kokan
12 hours ago

|
show 1 more comment

1 Answer
1

active

oldest

votes

glyphtounicode.tex contains a lot declarations to map glyph names to unicode points. But it is not complete. To get e.g. the ne to copy as ≠ add a suitable pdfglyphtounicode:

documentclass{article}



usepackage[T1]{fontenc}

usepackage[utf8]{inputenc}

usepackage[croatian]{babel}



input{glyphtounicode}

pdfgentounicode=1



usepackage{newtxtext}

usepackage{newtxmath}

pdfglyphtounicode{nequal}{2260}



begin{document}

$ne$

end{document}

(I found the nequal by looking in the txsyc.pfb).

answered yesterday

Ulrike Fischer

194k8302687

Thanks! I guess that adding such definitions in newtx itself (or other fonts it is compound of) would be even better than defining it locally, right?

– Ivan Kokan
yesterday

add a comment |

Your Answer

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "85"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2ftex.stackexchange.com%2fquestions%2f477433%2fwrong-mappings-when-copying-from-pdf-with-pdfgentounicode-with-newtx%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

glyphtounicode.tex contains a lot declarations to map glyph names to unicode points. But it is not complete. To get e.g. the ne to copy as ≠ add a suitable pdfglyphtounicode:

documentclass{article}



usepackage[T1]{fontenc}

usepackage[utf8]{inputenc}

usepackage[croatian]{babel}



input{glyphtounicode}

pdfgentounicode=1



usepackage{newtxtext}

usepackage{newtxmath}

pdfglyphtounicode{nequal}{2260}



begin{document}

$ne$

end{document}

(I found the nequal by looking in the txsyc.pfb).

answered yesterday

Ulrike Fischer

194k8302687

Thanks! I guess that adding such definitions in newtx itself (or other fonts it is compound of) would be even better than defining it locally, right?

– Ivan Kokan
yesterday

add a comment |

glyphtounicode.tex contains a lot declarations to map glyph names to unicode points. But it is not complete. To get e.g. the ne to copy as ≠ add a suitable pdfglyphtounicode:

documentclass{article}



usepackage[T1]{fontenc}

usepackage[utf8]{inputenc}

usepackage[croatian]{babel}



input{glyphtounicode}

pdfgentounicode=1



usepackage{newtxtext}

usepackage{newtxmath}

pdfglyphtounicode{nequal}{2260}



begin{document}

$ne$

end{document}

(I found the nequal by looking in the txsyc.pfb).

answered yesterday

Ulrike Fischer

194k8302687

Thanks! I guess that adding such definitions in newtx itself (or other fonts it is compound of) would be even better than defining it locally, right?

– Ivan Kokan
yesterday

add a comment |

glyphtounicode.tex contains a lot declarations to map glyph names to unicode points. But it is not complete. To get e.g. the ne to copy as ≠ add a suitable pdfglyphtounicode:

documentclass{article}



usepackage[T1]{fontenc}

usepackage[utf8]{inputenc}

usepackage[croatian]{babel}



input{glyphtounicode}

pdfgentounicode=1



usepackage{newtxtext}

usepackage{newtxmath}

pdfglyphtounicode{nequal}{2260}



begin{document}

$ne$

end{document}

(I found the nequal by looking in the txsyc.pfb).

answered yesterday

Ulrike Fischer

194k8302687

glyphtounicode.tex contains a lot declarations to map glyph names to unicode points. But it is not complete. To get e.g. the ne to copy as ≠ add a suitable pdfglyphtounicode:

documentclass{article}



usepackage[T1]{fontenc}

usepackage[utf8]{inputenc}

usepackage[croatian]{babel}



input{glyphtounicode}

pdfgentounicode=1



usepackage{newtxtext}

usepackage{newtxmath}

pdfglyphtounicode{nequal}{2260}



begin{document}

$ne$

end{document}

(I found the nequal by looking in the txsyc.pfb).

answered yesterday

Ulrike Fischer

194k8302687

answered yesterday

Ulrike Fischer

194k8302687

answered yesterday

Ulrike Fischer

194k8302687

answered yesterday

Ulrike Fischer

194k8302687

Thanks! I guess that adding such definitions in newtx itself (or other fonts it is compound of) would be even better than defining it locally, right?

– Ivan Kokan
yesterday

add a comment |

Thanks! I guess that adding such definitions in newtx itself (or other fonts it is compound of) would be even better than defining it locally, right?

– Ivan Kokan
yesterday

Thanks! I guess that adding such definitions in newtx itself (or other fonts it is compound of) would be even better than defining it locally, right?

– Ivan Kokan
yesterday

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to TeX - LaTeX Stack Exchange!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Kdtyjb