I now have both short (~270,000 words) and long (4,694,000+ words) versions of Wiktionary. The main problem with the long version is that only about one in forty entries is recognisable as a word. One of the first recognisable words is %ile, short for percentile.
> zamburak
Not in short Wiktionary, but it is in long Wiktionary.
> Ecky Thump
“ecky” (adj) in short Wiktionary :) :)
> metamagical themas
New one! I don’t have it.
> minkya
New one! I thought it might be in the foreign-language version of Wiktionary, so I checked: nope.
> ethionyl…
No, closest is
“methionylglutaminylarginyltyrosylglutamylserylleucylphenylalanylal…serine” in long Wiktionary
> naaaaah
New one! I don’t have it. I have “naah” in a database of most commonly used words in TV and film scripts.
> mamihlapinatapei
Have you spelled that correctly? I have “mamihlapinatapai” in long Wiktionary ;) ;) Even more surprising, the spellchecker that I use as I type this recognises “mamihlapinatapai”.
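For anyone curious how a near-miss like that can be caught automatically, here is a minimal sketch using Python's standard difflib module. It is not how my lookup actually works; the function name closest_entries, the tiny sample list and the 0.8 cutoff are all made up for illustration.

```python
import difflib

def closest_entries(query, wordlist, n=3, cutoff=0.8):
    """Return up to n entries from wordlist that look most like query.

    cutoff is a similarity threshold between 0 and 1; 0.8 is an
    arbitrary choice for this sketch, not a tuned value.
    """
    return difflib.get_close_matches(query, wordlist, n=n, cutoff=cutoff)

# Hypothetical miniature word list standing in for the real one.
words = ["mamihlapinatapai", "naah", "ecky", "Tierra del Fuego"]

suggestions = closest_entries("mamihlapinatapei", words)
print(suggestions)
```

With a list of millions of entries this would be slow, since every entry is compared against the query, so a real version would probably need to narrow the candidates first (for example by first letter or length).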
> Yaghan
New one!
> Tierra del Fuego
“Tierra del Fuego” (proper) in short Wiktionary
… Just noticed “Tidley Winks” (proper) also makes it into short Wiktionary
All joking aside, I can’t use the long Wiktionary list because it contains so much junk.
> Geographical word lists?
The problem there is that all lists I’ve found so far are either too long or too short. Lists of capital cities and largest cities look useful, but all the lists of rivers I’ve found so far have a lot of rivers I’ve never heard of. I recognise practically none of the geographical features recorded in gazetteers, such as the list of all place names on Google Earth.
> Language word Lists?
They look really promising, but they need a lot of parsing to remove extraneous detail.
> Does it include words from works of fiction?
Some, but not enough. For example, I happened to notice “jolinar”, a character from Stargate, on one of my wordlists. Do you know where to look for more?
> looked at? The Oxford English Dictionary (20 Volume Set) / Edition 2
http://www.barnesandnoble.com/w/oxford-english-dictionary-j-a-simpson/1101392458
> have you ever seen the size of an unabridged dictionary ?
Saw it in hardcopy many years ago, before the WWW existed. The complete OED is now accessible online, but I lost access to it. I wonder – perhaps I can get access again through a local library?
> You should get a list of all Wikipedia articles
Yes. I should. I tried to. Wikipedia has a downloadable backup through DBpedia, but navigating through the DBpedia website always eventually led me back to the FAQ page, which is blank; I never found the data. I’ll try a different route.
The great thing about Wikipedia is that I can count the number of times each word appears, so I can delete words that are rarely used. I note that “Metamagical Themas” has its own page on Wikipedia, but is missing from Wiktionary.
I couldn’t be sure about finding words like naaah or ‘puter on Wikipedia.
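The counting idea could be sketched roughly like this in Python. This is only an illustration under my own assumptions: the function names count_words and drop_rare, the sample text and the min_count threshold are all hypothetical, and real Wikipedia text would need its markup stripped first.

```python
import re
from collections import Counter

def count_words(text):
    """Count lower-cased word tokens; apostrophes inside a word are kept."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return Counter(tokens)

def drop_rare(wordlist, counts, min_count=2):
    """Keep only the words seen at least min_count times in the counted text."""
    return [w for w in wordlist if counts[w.lower()] >= min_count]

# Tiny stand-in for article text; a real run would feed in far more.
sample = "The river ran past the old mill. The mill stood by the river."
counts = count_words(sample)

kept = drop_rare(["river", "mill", "old", "zamburak"], counts, min_count=2)
print(kept)
```

Note that this counts single tokens only, so a multi-word entry such as “Metamagical Themas” would need phrase matching instead, and a leading apostrophe as in ‘puter is dropped by this tokeniser.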
Thanks all, will keep you updated.