This article in Nature claims:
A new computer program can tell whether a book was written by a man or a woman. The simple scan of key words and syntax is around 80% accurate on both fiction and non-fiction.
The program's success seems to confirm the stereotypical perception of differences in male and female language use. Crudely put, men talk more about objects, and women more about relationships.
Female writers use more pronouns (I, you, she, their, myself), say the program's developers, Moshe Koppel of Bar-Ilan University in Ramat Gan, Israel, and colleagues. Males prefer words that identify or determine nouns (a, the, that) and words that quantify them (one, two, more).
So this article would already, through sentences such as this, have probably betrayed its author as male: there is a prevalence of plural pronouns (they, them), indicating the male tendency to categorize rather than personalize.
If I were female, the researchers imply, I'd be more likely to write sentences like this, which assume that you and I share common knowledge or engage us in a direct relationship. These differing styles have previously been called 'informational' and 'involved', respectively.
They tried the program out on 566 English-language works to achieve the scores above. A. S. Byatt's (crap) book "Possession" was misclassified by gender, along with Kazuo Ishiguro's "The Remains of the Day".
The research team are now testing texts from further back in history and in other languages to see if the same findings result.
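To make the idea concrete, here is a rough Python sketch of the kind of word counting the article describes. The word lists are only the examples quoted above; the real program's full lists and weights aren't given, so the function name `count_markers` and everything else here is my own illustration, not the researchers' method.

```python
import re

# Example words quoted in the article. The real program's lists and
# weights are not given, so these sets are illustrative only.
PRONOUNS = {"i", "you", "she", "their", "myself"}   # "involved" markers
DETERMINERS = {"a", "the", "that"}                  # "informational" markers
QUANTIFIERS = {"one", "two", "more"}                # "informational" markers


def count_markers(text):
    """Return (involved, informational) marker counts for a text."""
    words = re.findall(r"[a-z']+", text.lower())
    involved = sum(w in PRONOUNS for w in words)
    informational = sum(w in DETERMINERS or w in QUANTIFIERS for w in words)
    return involved, informational


sample = "I think you and she left their keys on the table, more or less."
print(count_markers(sample))
```

A real classifier would presumably weight each word rather than just count it, but the counts give a feel for what the scan is looking at.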
Choosing some online texts by (so far as I know) female writers, I gave the program the first five paragraphs of Mary Shelley’s "Frankenstein" and George Eliot’s "Middlemarch" and it got their gender right.
I gave it the first five paragraphs of Edith Wharton’s "The Age of Innocence", Anna Sewell’s "Black Beauty" and Mrs Gaskell’s "Wives and Daughters" and it thought they were men.
I did the same with D H Lawrence’s "Sons and Lovers", Jack London’s "White Fang", Somerset Maugham’s "Of Human Bondage", Edgar Allan Poe’s "The Pit and the Pendulum", and Oscar Wilde’s "Lord Arthur Savile’s Crime".
The computer said Lawrence and London were clearly very male, Maugham and Wilde were male and Poe was female.
The Maugham and Wilde samples had short paragraphs, so for those two I gave it the first 550 words instead. When I had fed in the initial, shorter samples, it thought both Maugham and Wilde were female.
I tried it with some more recent writing from the Guardian site (where I first heard about the Gender Genie).
I submitted the first five hundred words or more of ten articles, five by male writers and five by female writers.
It said David Aaronovitch, Iain Banks, Hugh Fearnley-Whittingstall, Julie Burchill, Germaine Greer, Christina Odone and Zadie Smith were male.
It said Gareth MacLean, Gary Younge and Sandi Toksvig were female.
I finished off by submitting the first two posts above, and the program decided that both Todd and I are female.
I wonder how they came up with their claim of 80% accuracy. Of the twenty-two samples I’ve just given it to analyse, it got fewer than half right.
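For anyone who wants to check the sums, here is a small Python tally of the results listed above. The genders are as I understand them (so far as I know), the Maugham and Wilde verdicts are the ones from the 550-word samples, and the two blog posts are counted as misses, which is what the under-fifty-percent figure implies.

```python
# (author, actual gender, program's verdict) for the 20 named samples.
samples = [
    ("Mary Shelley", "F", "F"),
    ("George Eliot", "F", "F"),
    ("Edith Wharton", "F", "M"),
    ("Anna Sewell", "F", "M"),
    ("Mrs Gaskell", "F", "M"),
    ("D H Lawrence", "M", "M"),
    ("Jack London", "M", "M"),
    ("Somerset Maugham", "M", "M"),   # verdict from the 550-word sample
    ("Edgar Allan Poe", "M", "F"),
    ("Oscar Wilde", "M", "M"),        # verdict from the 550-word sample
    ("David Aaronovitch", "M", "M"),
    ("Iain Banks", "M", "M"),
    ("Hugh Fearnley-Whittingstall", "M", "M"),
    ("Julie Burchill", "F", "M"),
    ("Germaine Greer", "F", "M"),
    ("Christina Odone", "F", "M"),
    ("Zadie Smith", "F", "M"),
    ("Gareth MacLean", "M", "F"),
    ("Gary Younge", "M", "F"),
    ("Sandi Toksvig", "F", "F"),
]

named_total = len(samples)
named_correct = sum(actual == verdict for _, actual, verdict in samples)

# Assumption: both blog posts were misclassified, so they add two samples
# and no correct answers.
blog_posts = 2
overall_total = named_total + blog_posts
overall_accuracy = named_correct / overall_total

print(f"Named authors: {named_correct}/{named_total} = {named_correct / named_total:.0%}")
print(f"All samples:   {named_correct}/{overall_total} = {overall_accuracy:.0%}")
```

Either way it is a long way short of 80%, though twenty-two short samples is hardly a fair trial.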
Clearly, I am at a loose end, this Sunday night...