Use native Unicode font on Windows & macOS, ignore missing stopword file #219
base: main
It seems like a good idea to default to a Unicode-enabled font. What do you think about shipping a different open source font? That might be more robust than what you did. I'm not familiar enough with Windows to know how many different versions this works on.
Noto Sans seems like a good candidate: https://www.google.com/get/noto/
Hi Andreas,
I love your module. It's very easy to work with. Thank you for sharing it.
I agree in principle that it's a sound idea to supply the font. I
searched for a decent open source font that included Chinese and
couldn't find one. I had high hopes for unifont-9.0.06.ttf from
http://unifoundry.com/unifont.html, but it just looked like crap. If
your google mojo is better than mine, fantastic, but I eventually gave
up. There are just too many Chinese characters, I suspect, for a lot of
font developers.
What I did was have the program look for three different fonts.
* "ARIALUNI.ttf" was installed on one of my Windows computers (on
which I have Word) but was not on another (which doesn't have Word.
This leads me to guess that Word may have been the source of that
font. But maybe I've just internationalized my desktop better than
my laptop.)
* If that fails, it looks for "Arial Unicode.ttf", which is installed on
  my OS X 10.10.5 computer (which does not have Word, so I'm guessing
  this is a standard system font on the Mac).
* If that fails, it searches for "Arial.ttf", which I'm pretty sure is
essentially universal on Windows and OS X / macOS, but which does
not include all Unicode characters. (Most European languages are
supported. Chinese and Arabic are both missing. These are my two
test Unicode languages, as I have user-provided translations for my
program. I didn't check Japanese, Korean, etc.) This leaves us no
worse off than if we use DroidSansMono.
* If that fails, the program uses the DroidSansMono font you supply.
It was unclear to me if that font was freely distributable, and I
haven't had time to check into that yet.
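The lookup order above amounts to a "first existing file wins" chain. Here is a minimal sketch of that idea, not the patch's actual code: `pick_font`, `CANDIDATES` and `BUNDLED_FONT` are hypothetical names, and the font directories (`%WINDIR%\Fonts` on Windows, `/Library/Fonts` on the Mac) are assumptions about where these files usually live.

```python
import os


def pick_font(candidates, fallback):
    """Return the first candidate path that exists as a file, else fallback."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return fallback


# Assumed font directories; adjust for the systems you actually target.
WINDOWS_FONTS = os.path.join(os.environ.get("WINDIR", r"C:\Windows"), "Fonts")
MAC_FONTS = "/Library/Fonts"

# Same order as described above: broad Unicode coverage first, plain Arial after.
CANDIDATES = [
    os.path.join(WINDOWS_FONTS, "ARIALUNI.TTF"),   # possibly installed with Word
    os.path.join(MAC_FONTS, "Arial Unicode.ttf"),  # seen on OS X 10.10.5
    os.path.join(WINDOWS_FONTS, "arial.ttf"),      # no Chinese or Arabic glyphs
    os.path.join(MAC_FONTS, "Arial.ttf"),
]

# Hypothetical location of the bundled DroidSansMono font (last resort).
BUNDLED_FONT = "DroidSansMono.ttf"

FONT_PATH = pick_font(CANDIDATES, BUNDLED_FONT)
```

On a system with none of the candidates (e.g. Linux), `FONT_PATH` simply ends up as the bundled font.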
It's not a perfect methodology, but of course users always have the
option to supply their own font of choice, which we can assume will
include the characters they need.
Still, it's the best I could come up with, and at least for my needs, I
think it will work. Most of my customers are academics using
university-supplied hardware, which is likely to have Word installed.
David
David K. Woods, Ph.D.
President, Researcher, Lead Transana Developer
https://www.transana.com
This does look promising. I'll let you know what I think once I've
tried it.
Thanks,
David
Unfortunately, Noto Sans is really a large collection of language-specific
fonts. As far as I can see, there isn't a single font we could select
that would work across languages.
David
I think there is: https://noto-website-2.storage.googleapis.com/pkgs/NotoSans-hinted.zip
Nope. Here's the output of my sample program using ARIALUNI.ttf,
NotoSans-Regular.ttf, and NotoSansUI-Regular.ttf. Only ArialUni allows
mixing languages. My software is used in the Second Language
Acquisition community, among other fields, so being able to combine
arbitrary languages in the same WordCloud is one of my desired goals.
Noto's model of different TrueType files for different languages doesn't
work for what I'm trying to do with this code change.
David
![arialuni](https://cloud.githubusercontent.com/assets/12468493/21911647/453eadf8-d8e7-11e6-8f4f-1c1888390014.jpg)
![notosans-regular](https://cloud.githubusercontent.com/assets/12468493/21911652/4cd5d276-d8e7-11e6-9fe4-61ce8dcc3f06.jpg)
![notosansui-regular](https://cloud.githubusercontent.com/assets/12468493/21911657/50f56056-d8e7-11e6-9c65-97548fc165b1.jpg)
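Whether one font can render a mixed-language cloud comes down to its character map. A rough check, sketched under the assumption that the third-party fontTools package is installed (its `TTFont.getBestCmap()` maps Unicode code points to glyph names); `missing_chars` and `font_codepoints` are hypothetical helpers, not part of word_cloud:

```python
def missing_chars(text, codepoints):
    """Return the distinct non-whitespace characters of text not in codepoints.

    codepoints is a set of Unicode code points the font covers.
    """
    missing = []
    for ch in text:
        if ch.isspace() or ord(ch) in codepoints or ch in missing:
            continue
        missing.append(ch)
    return missing


def font_codepoints(font_path):
    """Collect the code points covered by a font's best Unicode cmap."""
    from fontTools.ttLib import TTFont  # third-party: pip install fonttools

    cmap = TTFont(font_path).getBestCmap() or {}  # None if no Unicode cmap
    return set(cmap)
```

For example, `missing_chars(sample_text, font_codepoints("NotoSans-Regular.ttf"))` would list whichever characters of `sample_text` that font lacks; an empty list means the font can render the whole text.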
Thanks for trying! I'll come back to this in a bit; I'm crazy swamped right now!
Hi Andreas,
You may recall that I submitted a patch concerning Unicode fonts about 3
weeks ago. The patch also addressed a second issue that we barely
touched upon: I added a try ... except block around loading the
stopwords list. When I built my application on Windows, the default
stopwords list was not included in the distributable build's
site-packages structure (causing a crash), and I didn't want a default
stopwords list anyway, so my patch creates an empty stopwords list if
the file is not found.
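The fallback described above can be sketched as follows. This is an illustration under assumptions, not the code in the patch: `load_stopwords` is a hypothetical name, and the one-word-per-line UTF-8 file format is assumed. `io.open` is used so the same code works on Python 2.7 and 3.

```python
import io


def load_stopwords(path):
    """Read one stopword per line; return an empty set if the file is missing.

    A frozen build (e.g. py2exe) that did not bundle the stopwords file
    then still starts, just without default stopwords.
    """
    try:
        with io.open(path, encoding="utf-8") as f:
            return set(line.strip() for line in f if line.strip())
    except (IOError, OSError):
        return set()
```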
I want to let you know about a new, very strange issue I've found with
word_cloud when it is built into a Python executable using py2exe on Windows.
Things are fine with the 1.2.1 release, but if I build from near-current
source code, when I go to uninstall my program, I get error messages
about DLLs that are "not supported by windows or contain errors". I've
seen two different DLLs mentioned, something like "mpr.dll" and
"api-ms-win-core-apiquery-l1-1-0.dll" at different times.
To be clear, I'm using 32-bit Python 2.7.12 on Windows 10 using py2exe
0.6.9 and InnoSetup 5.5.8. This problem occurs if I use my own
word_cloud build from recent source code, but not if I use the
word_cloud version installed by pip.
I recognize this may not be "actionable" at this time, but I wanted to
at least make you aware of it in the hope that maybe the specifics make
sense to you. Or maybe you can tell me how to fix my build process.
If you'd like, I can send you a minimal example in a week or two, but
I'm trying to get a release of my program out the door and can't
afford the time to develop the example right now. Let me know if it's
something you'd like me to do, and I'll get you a full example as soon
as I can.
David
Hey David. Sorry I didn't follow up on your patch; it's on my to-do list, but my to-do list is long. Andy
d5ef10d to ab85b24
I've modified FONT_PATH so it seeks a font on Windows and macOS that supports Unicode characters, such as Chinese and Arabic. If this fails, or if you are on Linux, it falls back to the DroidSansMono font.
Also, if the default stopwords file is missing, the program creates an empty set. (I didn't want default stopwords, and py2exe on Windows didn't copy the stopwords file.)