29 January 2013

SEO: Scraping synonyms from Wikipedia

Here's a Python script for scraping synonyms from Wikipedia. You provide the core keywords, and Python (plus the BeautifulSoup module) will get the synonyms.

The last article I wrote about getting SEO keywords from Wikipedia seemed interesting to people. The method was, however, manual, which takes time and effort to complete for more than a couple of keywords.

If you want to hurry things up or automate the process, below is a Python script which scrapes potential keywords from Wikipedia. If you’re going to be doing a lot of this, I’d be tempted to download a snapshot of the Wikipedia database and work from that rather than fetching pages one at a time. This script is really intended for educational purposes or for those who want to retrieve a very small list of synonyms.

The output is a CSV (comma-separated values) file that can be opened in a spreadsheet. I find LibreOffice much better than Excel at handling Unicode content from a CSV file, and it’s a free download. Just don’t ask me how to import Unicode CSV with Excel!

Each series of synonyms requires a lot of cleaning but it’s easier than collecting it all by hand.

"""
Scrape synonyms from Wikipedia (Python 3; needs the beautifulsoup4 package)
"""

import csv
import urllib.request

from bs4 import BeautifulSoup

# phrases go in here as a list of strings, e.g. the UK, US and the Philippines
names = ['United_kingdom', 'United_States', 'Philippines']

URL = "http://en.wikipedia.org/w/index.php?title=Special:WhatLinksHere/%s&hidetrans=1&hidelinks=1&limit=500"

# newline='' lets the csv module control line endings; utf-8 keeps Unicode titles intact
with open('synonyms_test.csv', 'w', newline='', encoding='utf-8') as fout:
    writer = csv.writer(fout)  # quotes any title that itself contains a comma
    for name in names:
        active_URL = URL % name
        print(active_URL)
        req = urllib.request.Request(active_URL, headers={'User-Agent': "Magic Browser"})
        with urllib.request.urlopen(req) as data:
            soup = BeautifulSoup(data.read(), 'html.parser')
        links_body = soup.find("ul", {"id": "mw-whatlinkshere-list"})
        row = [name]
        if links_body is not None:  # None when nothing is returned for this name
            # skip the "links" anchor that follows each entry
            row.extend(a.text for a in links_body.find_all('a') if a.text != "links")
        writer.writerow(row)  # always write the row so each name gets its own line

A couple of things: this code needs BeautifulSoup installed; see its install notes for how to do this. The module parses the Wikipedia page. The script iterates through the names you’ve provided and, for each one, scrapes a page listing the first 500 inbound links that are neither transclusions nor plain links. This reduces the problem space down to mostly synonymous redirects, which is what we want.

To run this script, edit the names list near the top (line 11 of the script) and insert the phrases you want to retrieve synonyms for. Don’t leave spaces: replace them with underscores. The script doesn’t check whether the phrase you’ve used is the canonical page title, so that’s something you need to check yourself.
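The underscore rule can also be applied in code rather than by hand. The sketch below is an illustration only: the helper names `to_title` and `what_links_here_url` are made up for this example and are not part of the script above, and it assumes the same WhatLinksHere URL pattern the script uses. It does not check whether a phrase is the canonical page title.

```python
from urllib.parse import quote

# Same URL pattern as the scraper script: hide transclusions and plain links
URL = ("http://en.wikipedia.org/w/index.php?title=Special:WhatLinksHere/%s"
       "&hidetrans=1&hidelinks=1&limit=500")


def to_title(phrase):
    """Turn 'United  States ' into 'United_States' (hypothetical helper)."""
    # split() with no arguments collapses runs of whitespace and trims the ends
    return "_".join(phrase.split())


def what_links_here_url(phrase):
    """Build the WhatLinksHere URL for a phrase (hypothetical helper)."""
    # quote() percent-encodes non-ASCII and reserved characters in the title
    return URL % quote(to_title(phrase))


print(what_links_here_url("United States"))
```

With helpers like these, the names list could hold ordinary phrases with spaces and be converted just before each request.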

Once the script’s finished, load the CSV file into LibreOffice Calc (or some other form of spreadsheet that can load CSV files with Unicode), and delete anything that clearly isn’t a valid synonym for SEO purposes.

When that’s done, delete all the blanks and shift cells left (not up!), and you should have a spreadsheet full of nice synonyms that can enhance your SEO.
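If you'd rather not tidy the blanks by hand, a first pass can be done in Python before opening the spreadsheet. This is only a sketch under assumptions: it reads the `synonyms_test.csv` file written by the script above, the output name `synonyms_clean.csv` and the functions `clean_row` and `clean_file` are made up for illustration, and dropping case-insensitive duplicates is my own addition. It can't judge whether a term is a valid synonym, so that part stays manual.

```python
import csv
import os


def clean_row(row):
    """Strip whitespace, drop blank cells and case-insensitive duplicates."""
    seen = set()
    cleaned = []
    for cell in row:
        term = cell.strip()
        key = term.lower()
        if term and key not in seen:
            seen.add(key)
            cleaned.append(term)  # keeping the order is the "shift cells left"
    return cleaned


def clean_file(src, dst):
    """Copy src to dst, cleaning every row and skipping empty ones."""
    with open(src, newline="", encoding="utf-8") as fin, \
         open(dst, "w", newline="", encoding="utf-8") as fout:
        writer = csv.writer(fout)
        for row in csv.reader(fin):
            cleaned = clean_row(row)
            if cleaned:
                writer.writerow(cleaned)


# Only run the clean-up if the scraper's output file is actually there
if os.path.exists("synonyms_test.csv"):
    clean_file("synonyms_test.csv", "synonyms_clean.csv")
```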

Happy scraping!

15 January 2013

Some SEO tips: keywords

Here's a quick tip to get keywords to improve your search engine optimisation (SEO) using Wikipedia - for free! Enter your term into Wikipedia. If it's a brand name, enter the product type (e.g., 'handbags').

Click on 'Toolbox' to the left and then 'What links here', and you'll be shown a new page that details all inbound links to that page within Wikipedia.

Then, under 'Filters', 'hide' both transclusions and links so that only re-directs to the page are shown.

And hey presto! There's a nice list of synonymous terms with a variety of spellings.

For example, handbag comes up with:


Clutch (handbag) (redirect page) ‎ (links)
Manbag (redirect page) ‎ (links)
Handbags (redirect page) ‎ (links)
Man bag (redirect page) ‎ (links)
Man-bag (redirect page) ‎ (links)
Manpurse (redirect page) ‎ (links)
Hand bag (redirect page) ‎ (links)
Hand-bag (redirect page) ‎ (links)
Hand-bags (redirect page) ‎ (links)
Man purse (redirect page) ‎ (links)
👜 (redirect page) ‎ (links)
Evening bag (redirect page) ‎ (links)

whereas 'telescope' comes up with:

TeleScope (redirect page) ‎ (links)
Telescopes (redirect page) ‎ (links)
Perspicil (redirect page) ‎ (links)
Telescopy (redirect page) ‎ (links)
Astronomic telescope (redirect page) ‎ (links)
Telescopic observational astronomy (redirect page) ‎ (links)
Telescopically (redirect page) ‎ (links)
Astronomical telescope (redirect page) ‎ (links)
Ground telescope (redirect page) ‎ (links)
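
If you copy a listing like these straight from the page, the trailing "(redirect page) (links)" text has to be stripped before the terms are usable. A rough sketch of that clean-up follows; the function name `extract_terms` is hypothetical, and it assumes every pasted line ends with the same suffix as the examples above.

```python
import re

# Matches the trailing "(redirect page) (links)" text, allowing for an invisible
# left-to-right mark (U+200E) that a copied page may leave between the two parts
SUFFIX = re.compile(r"\s*\(redirect page\)[\s\u200e]*\(links\)\s*$")


def extract_terms(pasted):
    """Return the bare terms from a pasted 'What links here' listing."""
    terms = []
    for line in pasted.splitlines():
        term = SUFFIX.sub("", line).strip()
        if term:  # ignore blank lines in the pasted text
            terms.append(term)
    return terms


listing = """Clutch (handbag) (redirect page) \u200e (links)
Manbag (redirect page) \u200e (links)
Hand-bag (redirect page) (links)
"""
print(extract_terms(listing))
```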

13 January 2013

Prolog and UX Planning

Summary: Prolog is a logic programming language that can help craft sitemaps and workflows by checking that solutions meet all business and technical constraints. Here, I'll chat a little about Prolog and how it might be used, with more detailed information coming in future.

Part of Thought Into Design's work is natural language interfaces. Among the many tools we use is a language called Prolog. This is a logic language, quite strong on declarative style. It works by defining facts and rules and then asking queries. In some ways, it's how I envisaged computer programming to be, back in the early 1980s, before I ever programmed anything.

Examples of facts are:


man(alan).
man(tony).
woman(jell).
woman(ann).

These say (in order) that the atom 'alan' is a 'man', as is 'tony', whereas 'jell' and 'ann' are both classed as 'woman'.

Rules determine how atoms relate to each other. Using the above code, we could define some rules thus:


human(X) :- man(X).
human(X) :- woman(X).

Everything classed as either 'man' or 'woman' is now also classed as 'human'.

With these in place, we can issue queries that tell us if a particular result is logically possible or not.

human(X).

And we get a printout of everyone who is human. This is a very basic example and seems similar to a query language, but Prolog's power is in being able to infer relationships from what it's been told.
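
For readers without a Prolog interpreter to hand, the facts, rules and query above can be mimicked loosely in Python. This is only an analogy under my own assumptions (sets for the facts, a function for the rule); it shows what the query returns, not how Prolog searches for answers.

```python
# Facts: each set lists the atoms for which the predicate holds
man = {"alan", "tony"}
woman = {"jell", "ann"}


def human():
    """Rule: X is human if X is a man, or if X is a woman."""
    # Prolog tries the two human clauses in order; a set union gives the
    # same answers without preserving that order
    return man | woman


# Query: the rough equivalent of asking human(X).
print(sorted(human()))
```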

Prolog could be quite useful when crafting sitemaps and workflows, more so for larger, more complex sites than for simple ones. There are often times when several different business rules need to be accounted for, and the more complex the rule-set, the harder it is for a designer to navigate through them.

Prolog, or other logic languages, might be a way to help determine if particular sitemaps and workflows are valid solutions to problems or not.
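
To make the idea a little more concrete, here is a rough Python sketch of checking a sitemap against rules. Everything in it is hypothetical: the page names, the two example rules (every page reachable from the home page within a click limit, and a contact page linked from every page) and the function names are invented for illustration. A logic language would let rules like these be stated declaratively rather than coded by hand.

```python
from collections import deque

# A hypothetical sitemap: each page maps to the pages it links to
sitemap = {
    "home": ["work", "clients", "contact"],
    "work": ["samples", "contact"],
    "clients": ["contact"],
    "samples": ["contact"],
    "contact": [],
}


def click_depths(site, start="home"):
    """Breadth-first search: fewest clicks from start to each reachable page."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in site.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def violations(site, max_clicks=2, required="contact"):
    """Return a list of readable rule violations (an empty list means valid)."""
    problems = []
    depths = click_depths(site)
    for page in site:
        if page not in depths:
            problems.append("%s is unreachable from home" % page)
        elif depths[page] > max_clicks:
            problems.append("%s is more than %d clicks from home" % (page, max_clicks))
        if page != required and required not in site[page]:
            problems.append("%s does not link to %s" % (page, required))
    return problems


print(violations(sitemap))
```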

My ideas are quite unformed as yet and this is something I hope to return to soon so watch this space!

Twitter Bootstrap for Responsive UX Design

Summary: We redesigned a website to be responsive, using Twitter Bootstrap and jQuery to create design documentation. Bootstrap proved to be an effective tool for conventional interactions but less useful for more complex ones.

One task we've done lately has been to redesign the Thought Into Design site. The current site is quite boring and uncommunicative, and the analytics suggest that engagement can and should be better.

The broad business requirements are:

  • Offer a client list
  • Explain the types of work we do
  • Show work samples
  • Improve the user journey to contact us

After some initial planning, we decided to try out Twitter Bootstrap and frankly, it was a nice experience. There is a short summary of using Bootstrap with real clients which is well worth a read.

What we found was that it is an incredibly quick way to code up some static pages (HTML / CSS / JavaScript) and quite an enjoyable way to code after years of trying to get DIVs to fall into the right place. In some ways, it reminds me a little of table-based coding (and yes, I'm old enough to remember when virtually all sites were done that way!) so I have reservations. Fundamentally, coding is done by defining rows and then how many of the 12 grid columns each element in a row takes up. You can see the redesign in progress. Plus, responsiveness is baked in.
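
To illustrate the rows-and-columns idea, here is a small Python sketch that writes out the markup for one row. It assumes the `row` and `spanN` class names used by the Bootstrap 2.x releases (later versions renamed the column classes); the helper `bootstrap_row` is made up for this example and is not part of Bootstrap.

```python
def bootstrap_row(cells):
    """Build one grid row from (span, html) pairs; spans may total at most 12."""
    total = sum(span for span, _ in cells)
    if total > 12:
        raise ValueError("row spans add up to %d, but the grid has 12 columns" % total)
    lines = ['<div class="row">']
    for span, html in cells:
        # each element declares how many of the 12 columns it takes up
        lines.append('  <div class="span%d">%s</div>' % (span, html))
    lines.append('</div>')
    return "\n".join(lines)


# A sidebar taking 4 of the 12 columns and a main area taking the other 8
print(bootstrap_row([(4, "Sidebar"), (8, "Main content")]))
```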

But after over a decade of doing wireframes, I find that responsive design quickly muddies the water and significantly increases the workload for designers. Instead of a single set of wireframes, we now need to produce them for web, tablets and phones of various shapes and sizes.

Curiously, I remember being pooh-poohed a few years back when I suggested coding alternative CSS sheets for small-screen devices like the Asus EEE netbook, and was told that only web would ever be needed. This was just before smartphones...

As an agency, we're happy to do whatever work is necessary to achieve the client's business goals. But producing all those wireframes is also inefficient. What if we could just skip the wireframing process and move directly onto working prototypes as documentation? This is something I've done before, particularly with complex interactions that needed to be tested dynamically, but coding was always a slow process. Bootstrap and jQuery have made the coding process a doddle now. I can see a future for designer-orientated tools that handle this with less code and more visual creation (I'm wondering if there's a product to be made here).

Major advantages are:

  • Passing off working code to developers that documents the workflows and interactions within itself
  • Reducing time spent documenting interactions statically when a working prototype will do it better
  • Communicating the interactions to the stakeholders better


But it's not all roses. Going direct to code, for me personally, hinders the creative process. I need to have some aim before I code, much like when I'm writing a web application: I need to spend time planning long before I write a single line of code.

In addition, while it can be instructive to be shown capabilities beyond my reckoning, I also need to be able to think / ideate beyond the capabilities of software. One example is how to communicate complex information using dynamic graphs / charts, and Bootstrap won't handle those complex interactions. For the simple, bread-and-butter stuff, Bootstrap is a superb tool. I will still run to my sketchpad as a first option.

Disadvantages are:

  • Might hinder the creative process
  • Doesn't help initial ideation
  • The working code might not be up to par
  • Deals poorly with widgets outside of its reckoning
  • Reinforces the idea that UX is about code

In conclusion, Twitter Bootstrap is a very useful tool, particularly for standard, conventional interactions such as planning forms with normal widgets. The resulting code is superb documentation for stakeholders, users (testing / research) and developers / testers. Less orthodox interactions, however, require a different framework for now.