TLA FYI

the data

Ahh, Three-Letter Acronyms: Time savers, confusion makers, sources of eternal annoyance. TLAs make managers feel competent, and let imaginative people enter rude words on high-score tables. But what makes a great TLA? What makes one stick? Why does BPM roll off the tongue, while QWJ makes you feel a bit sick? In an effort to find out, I have undertaken an exhaustive, or at least quite tiring, exploration into the TLA.

Acquiring Data

To learn about TLAs I first had to find them. This sounds fairly easy - they are everywhere. But capturing all TLAs for study required a bit of work.

First I tracked down URLs of acronym websites and conducted some preliminary searches. After establishing several potential sources I wrote corresponding regular expressions to screen-scrape the results. The regular expressions were used as input to my TLAinator application - which proceeded to make 17,576 HTTP calls (one per combination, 26 x 26 x 26) to each website, violate several terms of service, parse the results and store the TLA counts in a text file.
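
The enumeration and scraping step might look roughly like the sketch below. This is not the TLAinator itself: the regular expression, the sample page and the function names are all assumptions made up for illustration, each real site would need its own pattern, and the HTTP fetching is left out entirely.

```python
import itertools
import re
import string


def all_tlas():
    """Yield every three-letter combination from AAA to ZZZ."""
    for letters in itertools.product(string.ascii_uppercase, repeat=3):
        yield "".join(letters)


# Hypothetical pattern: assumes the site prints something like
# "12 definitions found" on its results page.
COUNT_PATTERN = re.compile(r"(\d+)\s+definitions?\s+found", re.IGNORECASE)


def count_definitions(html):
    """Return the definition count scraped from a results page, or 0 if absent."""
    match = COUNT_PATTERN.search(html)
    return int(match.group(1)) if match else 0


tlas = list(all_tlas())
sample_page = "<p>12 definitions found for BPM</p>"  # made-up page content
```

Generating the candidates up front keeps the scraping loop trivial: one request per TLA per site, with the regular expression as the only site-specific part.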

Next, using some almost obsolete Google APIs, I retrieved the "Estimated Google Results" for the search queries "AAA" through "ZZZ" - to see who would win a TLA popularity contest.

Finally, the TLAs were programmatically run through Microsoft Word's spell checker to see whether each was recognised as an English word, to ensure our TLA was not just a TLW.

The data gathering lasted about a month, due to a loss of interest in the project. But once complete, the results were collated into Excel for some preliminary data mining. Once playing with Excel got annoying enough, the files were loaded into a database table for some serious analysis...


Data mining

The first interesting thing I found came from Google's Estimated Results. Some searches returned an error when the TLA was a very common word. Here are the 16 forbidden words:
all, and, any, are, can, com, for, new, not, oer, php, the, use, was, www, you.

Seems fair enough. But wait a minute... OER? What the flippin' heck does OER mean? Why can I happily search on every other combination of three letters, except for OER? If I search directly in Google, it's fine - about 4,180,000,000 results. I don't know why this PLA (poor lil' acronym) got the chop - perhaps it's the password to Google?

The top results that actually returned results from the Google API were:
may, top, web, one, has, but, see, our, add, que, out, inc, fax, htm, per, now, rss, die, get.

Any items of interest? Well, SEX was not number 1 - it showed up at the #189 position - just one spot below PAY. Coincidence? XXX is way down the list at #477.

For the scripters amongst us, ASP came in at #18, whereas PHP was banned. We can assume PHP wins that one. For the animal lovers, CAT gets #176 with DOG well behind at #226.

Image formats? JPG wins at #162. 2nd is GIF at #234. PNG a very distant 3rd at #730. Sadly my favourite format, the XBM image, barely charts at #8757.

The top spots from Google for non-Word-words were: que, htm, rss, por, jul, pdf, tel, dvd, faq, gov, der, sie, url, als, una... a mix of tech and assorted European languages, but no real TLA info. It was time to move on...

What about the TLAs?!

It was certainly a tight competition for the most common TLA, and although no individual acronym stood out as a clear winner, the top 15 or so were streets ahead:

ACE, ACS, CPS, CCC, CAP, ABC, CPL, CES, SES, APS, BPS, CAT, MPS, CAS, CMS

So, if you're writing a new computer system, starting a business, or naming your band, you might want to avoid these if you ever want to be found via Google. Though you don't just want to use an uncommon acronym - they're uncommon for a reason:

ZKQ, KZQ, QCZ, OJZ, QJV, QZJ, QWJ, QZM, YKQ, UZJ, FZQ, OXK, VZQ, XJQ, UXK

I don't think you'll hear your manager ask you for the QCJ on the YKQ any time soon. This bunch of tongue-twisters had the fewest acronyms, combined with the fewest Google hits. Perhaps more useful for your business/band/system are the TLAs with the fewest acronyms combined with the most Google hits:

PAY, VEZ, MUY, BIJ, JEU, CZY, JAY, JEJ, WIJ, YOK, GOO, VOZ, JEG, ZIE, QTY
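
One way to combine "fewest acronyms" with "most Google hits" is to rank each TLA on both measures and add the ranks together. The sketch below does that. It is an assumption about how such a list could be built, not the method actually used here, and the numbers in it are invented.

```python
def combined_ranking(acronym_counts, google_hits):
    """Order TLAs by fewest acronym definitions combined with most Google hits.

    Each TLA gets one rank per measure (0 is best); the ranks are summed
    and the lowest total comes first. Ties fall back to alphabetical order.
    """
    tlas = sorted(acronym_counts)
    by_rarity = sorted(tlas, key=lambda t: (acronym_counts[t], t))
    by_hits = sorted(tlas, key=lambda t: (-google_hits[t], t))
    rarity_rank = {t: i for i, t in enumerate(by_rarity)}
    hits_rank = {t: i for i, t in enumerate(by_hits)}
    return sorted(tlas, key=lambda t: (rarity_rank[t] + hits_rank[t], t))


# Made-up numbers purely for illustration, not the collected data.
acronym_counts = {"ACE": 90, "PAY": 2, "QWJ": 0, "CAT": 70}
google_hits = {"ACE": 500, "PAY": 900, "QWJ": 1, "CAT": 800}
ranking = combined_ranking(acronym_counts, google_hits)
```

Summing ranks rather than raw numbers stops the Google counts, which run into the billions, from swamping the much smaller acronym counts.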

Interestingly, this bunch contains a large number of actual acronyms (initialisms which you pronounce as a word) rather than just initialisms (which I have been incorrectly calling acronyms).

Patterns

Next up, I had a look at "patterns" that made popular TLAs - for example, acronyms that contain the same first and third letter.

The most popular First And Third (FAT) TLAs were: SES, SPS, SAS, ASA, PIP, SOS, ACA, STS, SDS, CSC, SIS, PEP, SMS, CRC, SCS.

The top SAT (Second And Third) TLAs: ACC, SCC, MCC, PCC, ICC, CSS, PSS, TCC, NCC, ECC, ESS, ISS, DSS. Of note, the top 17 SAT TLAs all end in CC or SS.

Top FAS (First And Second) TLAs: CCS, SSP, PPS, CCA, SSC, PPL, MMS, CCP, SST, CCL, MMC, AAC, SSA, SSI, AAS, CCR, CCM.

First Second Third (FST) TLAs went: CCC, AAA, SSS, PPP, DDD, TTT, FFF, BBB, MMM, WWW, EEE, RRR.

FAT TLAs are the most popular, followed by SAT, then FAS. FST TLAs are not so popular at all - I suppose they don't feel very poetic. Try WWW for example.
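
Sorting a TLA into one of these patterns takes only a few comparisons. Here is a minimal sketch, reading SAT as "Second And Third" and FAS as "First And Second", which is what the examples above suggest. The function name is made up for illustration.

```python
def classify_pattern(tla):
    """Return the repeated-letter pattern of a three-letter string.

    FST: all three letters match; FAT: first and third match;
    SAT: second and third match; FAS: first and second match;
    None: no repeated letter.
    """
    if len(tla) != 3:
        raise ValueError("expected exactly three letters")
    first, second, third = tla.upper()
    # Check the triple first, otherwise WWW would also count as FAT.
    if first == second == third:
        return "FST"
    if first == third:
        return "FAT"
    if second == third:
        return "SAT"
    if first == second:
        return "FAS"
    return None
```

For example, `classify_pattern("SES")` gives `"FAT"` and `classify_pattern("BPM")` gives `None`.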

What can we take away from the pattern-based study? Well, if you're looking to include a pattern in your TLA - double C's, double S's, double P's rule the day.

Groups

Finally, an analysis of the most popular letters in TLAs. This was done by grouping all TLAs by FLO (First Letter Only), FAT, and FST.

FLO and FAT had surprisingly similar results, and it is clear that if you want a really catchy TLA then it better start with S, C, A, P, M, T or D. I call it Scapmtd's Law.

FLO: scapmtdirenblfwohgvuqjkyzx
FAT: scapmtdrliewfbnogvhukqykxz
FST: caspdtfbmwerlhngiokuvzxqyj
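
Letter orderings like these can be produced by summing the counts per first letter and sorting the letters by their totals. The sketch below assumes the FAT and FST groupings simply restrict the input to TLAs of that pattern before grouping by first letter. That is a guess at the procedure, and the counts are invented.

```python
from collections import Counter


def letter_order(counts, predicate=lambda tla: True):
    """Rank first letters by total count over the TLAs that pass predicate."""
    totals = Counter()
    for tla, count in counts.items():
        if predicate(tla):
            totals[tla[0].lower()] += count
    # Highest total first; ties broken alphabetically.
    ordered = sorted(totals, key=lambda letter: (-totals[letter], letter))
    return "".join(ordered)


# Made-up counts purely for illustration, not the collected data.
sample = {"SES": 40, "SOS": 30, "CAT": 45, "CSC": 20, "AAA": 10}
flo = letter_order(sample)
fat = letter_order(sample, lambda t: t[0] == t[2] and t[0] != t[1])
fst = letter_order(sample, lambda t: t[0] == t[1] == t[2])
```

With this sample data `flo` comes out as `"sca"`, since S totals 70, C totals 65 and A totals 10.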

Actually, I can't remember why I started doing this now. I'm going outside to play some frisbee.