Tologix - Desktop Converter

Special characters: Paragraph symbol not converting correctly from PDF to HTML

Assigned to
Martin Laporte, CTO at Tologix
Notes
Hi Martin,

The converter tool is not recognizing the paragraph symbol (pilcrow, ¶) correctly. Instead of ¶, I see the letter "P".

SC/0056/02

PDF

HTML

Thanks,
Irit

Comments & Events

Irit Weinfeld
Hi Martin,

Here is another example where the converter tool is not recognizing the paragraph symbol (pilcrow, ¶) correctly. Instead of ¶, I see the characters "J§2".


UN/0087/17. I ran OCR Advanced in the desktop converter.

PDF, paragraph 2

HTML
Martin Laporte, CTO at Tologix
Hi Irit,

This is a scanned document, and the OCR engine is unable to properly identify these symbols.

As a test, I opened the PDF in Adobe Acrobat and copied the text. Here is the output when pasted:
h) In footnote 70, "Resp. Memorial IN 347-376" is corrected to "Resp. Memorial IN
404-411";

Interestingly, both OCR engines seem to produce the same "IN" where the paragraph symbol should be.
So the problem is not with the converter itself; it comes from the document being a scanned copy.

Thanks,
-Martin
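
For anyone repeating the copy-and-paste test above on other documents, the check can be sketched in a few lines of Python. This is only an illustration, not part of the converter: the function name `check_text_layer` and the pattern are hypothetical, and the list of suspect substitutes ("IN", "P") is simply taken from the misreads reported in this ticket. It takes text already copied out of the PDF, counts real pilcrows, and flags places where a substitute is followed by a paragraph number.

```python
import re

PILCROW = "\u00b6"  # the paragraph symbol

# Hypothetical heuristic: substitutes seen in this ticket ("IN", "P"),
# followed by a paragraph number or range such as 347-376.
SUSPECT = re.compile(r"\b(?:IN|P)\s+\d+(?:-\d+)?")


def check_text_layer(text: str) -> dict:
    """Report real pilcrows and likely OCR misreads in extracted text."""
    suspects = [" ".join(m.group(0).split()) for m in SUSPECT.finditer(text)]
    return {"pilcrow_count": text.count(PILCROW), "suspects": suspects}


# Text pasted from Acrobat, as quoted in the comment above.
sample = (
    'h) In footnote 70, "Resp. Memorial IN 347-376" is corrected to '
    '"Resp. Memorial IN\n404-411";'
)
report = check_text_layer(sample)
print(report)
```

If the pilcrow count is zero and suspects are listed, the symbol was most likely lost at the OCR stage, before any HTML conversion took place.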