In a previous article I used the Python programming language and machine learning algorithms to figure out who wrote the individual chapters of a textbook. This problem is known as authorship attribution, and uses techniques from the field of stylometry (or textometry). The general idea is as follows: use textual analysis and natural language processing to characterize an author's individual writing style, and statistical inference to determine the most likely author of an unlabelled or anonymous piece of work.
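To make the general idea concrete, here is a minimal sketch of one simple stylometric approach: describe each text by how often it uses common function words, then pick the known author whose profile is closest to the unknown text. This is only an illustration under stated assumptions, not the method used by the tool on this page. The word list `FUNCTION_WORDS` and the helper names `style_profile`, `cosine_similarity` and `attribute` are hypothetical, chosen for this example.

```python
import math
import re
from collections import Counter

# Illustrative list of common English function words.
# This is an assumption for the sketch, not the tool's actual feature set.
FUNCTION_WORDS = [
    "the", "of", "and", "to", "a", "in", "that", "is",
    "it", "was", "for", "with", "as", "but", "on", "not",
]


def style_profile(text):
    """Return the relative frequency of each function word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[word] / total for word in FUNCTION_WORDS]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def attribute(author_samples, unknown_text):
    """Return the best-matching author name and the score for every author.

    author_samples maps an author name to that author's known writing.
    """
    unknown = style_profile(unknown_text)
    scores = {
        name: cosine_similarity(style_profile(text), unknown)
        for name, text in author_samples.items()
    }
    return max(scores, key=scores.get), scores
```

Calling `attribute({"Author 1": text_1, "Author 2": text_2}, unknown_text)` returns the closer author along with both similarity scores. With short samples the profiles are noisy, which is one reason more text tends to give a more reliable result.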
This page contains a free online tool for you to conduct your own authorship attribution experiments.
There are many techniques for authorship attribution, and the best approach depends on the nature of the text being analysed. Therefore, several different methods have been implemented below, and the results will require some degree of interpretation. For example, if the different techniques independently come to the same conclusion, that should give you more confidence in the overall result. These are tools for exploratory analysis, and do not provide conclusive or definitive answers. In theory the underlying algorithms are language independent, but they have been developed, tested and tuned using a database of English text.
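One way to think about interpreting several methods together is a simple vote: count how many methods point to each author and note how strong the agreement is. The sketch below assumes each method reports a single author name. The method names in the usage note are hypothetical and are not necessarily the ones implemented on this page.

```python
from collections import Counter


def summarize_votes(method_results):
    """Summarise the verdicts of several attribution methods.

    method_results maps a method name to the author it attributed the
    text to. Returns the most frequently chosen author and the fraction
    of methods that agreed with that choice.
    """
    if not method_results:
        raise ValueError("at least one method result is required")
    votes = Counter(method_results.values())
    top_author, top_count = votes.most_common(1)[0]
    return top_author, top_count / len(method_results)
```

For example, `summarize_votes({"function_words": "Author 1", "char_ngrams": "Author 1", "sentence_length": "Author 2"})` picks "Author 1" with two of three methods in agreement. A unanimous result deserves more confidence than a split one, and neither should be treated as proof.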
- Copy and paste samples of the first author's writing into the "Author 1" box. The general rule of thumb is "more text is better than less text", as this will help build a more robust model.
- Copy and paste samples of the second author's writing into the "Author 2" box.
- Copy and paste text into the "Unknown Author" box. This is the text that will be analysed and attributed to Author 1 or Author 2.
- Click the submit button - that's all there is to it!
If you have any questions, I can be contacted at email@example.com. Please let me know if you find any bugs, and feel free to request additional functionality. A commercial/forensic version of this software is under development; email me for more information. I would also like to hear about your area of interest. For example, are you looking at homework, online comments or historical documents? Are you trying to detect when students aren't submitting their own work, or who is writing a defamatory blog? Please share.
This tool is provided for educational and entertainment purposes only. No guarantee is given as to the accuracy of the results, and the outcomes are not to be used for commercial or legal purposes.