❏ Special characters: Paragraph symbol not converting correctly from PDF to HTML
- Assigned to: Martin L.
- Notes:
Hi Martin,
The converter tool is not recognizing the paragraph symbol (pilcrow, ¶) correctly. Instead of ¶, I see the letter "P".
SC/0056/02
PDF
HTML
Thanks,
Irit
Another example where the converter tool is not recognizing the paragraph symbol (pilcrow, ¶) correctly. Instead of ¶, I see "J§2".
UN/0087/17. I ran OCR Advanced in the desktop converter.
PDF, paragraph 2
HTML
IC/0420/20
The paragraph symbol converts to ",i 8"
PDF
Block Property
SC/0053/04
SC/0053/04 - Tologix - Desktop Converter
The paragraph symbol converts to "IN"
This is a scanned document, and the OCR engine is unable to properly identify these symbols.
As a test, I opened the PDF in Adobe Acrobat and copied the text. Here is the output when pasted:
Interestingly, both OCR engines seem to read the symbol as the same "IN".
So the problem is not with the converter itself; it comes from the document being a scanned copy, which the OCR step misreads.
Thanks,
-Martin
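
Since the misreads come from OCR on scanned pages, one possible stopgap is to flag likely pilcrow misreads in the converted text for manual review. The sketch below is only an illustration, not part of the converter: the function name `find_suspect_pilcrows` is hypothetical, the token list is taken from the examples reported in this thread, and it assumes the misread appears as a standalone token surrounded by whitespace. It is also unclear whether digits in "J§2" or ",i 8" belong to the symbol or to the paragraph number, so the tokens are matched as reported.

```python
import re

PILCROW = "\u00b6"  # the paragraph symbol, ¶

# Misreadings reported in this thread. Treating them as complete tokens
# is an assumption; a short token like "P" will produce false positives.
KNOWN_MISREADS = ["J§2", ",i 8", "IN", "P"]


def find_suspect_pilcrows(text):
    """Return (token, offset) pairs for known misreads that stand alone.

    A match must be bounded by whitespace or the string edges on both
    sides, so "IN" inside "INSIDE" or "P" inside "PDF" is not reported.
    """
    hits = []
    for token in KNOWN_MISREADS:
        pattern = r"(?<!\S)" + re.escape(token) + r"(?!\S)"
        for match in re.finditer(pattern, text):
            hits.append((token, match.start()))
    # Report hits in the order they occur in the text.
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    sample = "See IN 12 of the report and P 4."
    for token, offset in find_suspect_pilcrows(sample):
        print(f"possible misread of {PILCROW}: {token!r} at offset {offset}")
```

The hits are only candidates for a reviewer to check against the PDF; the sketch deliberately does not replace anything automatically.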