Although Microsoft Word has had basic grammar checking for ages, other applications like browsers and email programs have traditionally left grammar as an exercise for the writer.

That is changing with the advent of online tools like the heavily advertised Grammarly, which promise writing suggestions for spelling, grammar, and even style. However, Grammarly is currently only available in English. It is also reasonable to be suspicious of their data collection policy; while Grammarly claims not to sell user data, they do collect all texts and use them to improve their AI models.

Enter LanguageTool, a free open-source style checker. It supports over 30 languages and offers plugins for browsers and word processors. It also has excellent privacy protections compared to other options.

Keeping content private

By default, LanguageTool offers a cloud service for checking. What differentiates them from competitors is that they do not store any content and do not sell information about users, unlike other services which profit from training their models on your data. It is probably reasonable for most users to install the LanguageTool plugins and use the default cloud service.

For even better privacy protection, you can also run the checker locally. This way no data is transmitted to their servers, with the added benefit that checking works offline. It consists of running a lightweight Java program whenever you want style checking enabled. In the instructions below I describe how to set it up on macOS so that it runs automatically in the background.

Installing LanguageTool locally on a Mac

General instructions are available at LanguageTool embedded HTTP Server and should work on multiple platforms. These instructions focus on the additional steps needed to run LanguageTool in the background. Knowledge of the command line is assumed.

First, download the LanguageTool Desktop version for offline use and move it to a good location. I chose /usr/local/LanguageTool for my installation.

Check that you have Java installed by running java -version. If it’s not at least Java 8 you will need to install it (e.g. with brew install openjdk).
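The version check can also be scripted. The sketch below is my own illustration, not something from the LanguageTool instructions: java_major_version and installed_java_version are hypothetical helper names. It relies on the fact that Java 8 and earlier report themselves as 1.x (for example 1.8.0_281), while later releases put the major version first (for example 17.0.2).

```shell
#!/bin/sh
# Hypothetical helper: print the major Java version for a version string.
# "1.8.0_281" -> 8, "17.0.2" -> 17, "17-ea" -> 17
java_major_version() {
  case "$1" in
    1.*) echo "$1" | cut -d. -f2 ;;
    *)   echo "$1" | cut -d. -f1 | cut -d- -f1 ;;
  esac
}

# Hypothetical helper: extract the quoted version string from the first line
# of `java -version`, which is printed to stderr, e.g.
#   openjdk version "17.0.2" 2022-01-18
installed_java_version() {
  java -version 2>&1 | sed -n '1s/.*version "\([^"]*\)".*/\1/p'
}
```

With both helpers defined, something like [ "$(java_major_version "$(installed_java_version)")" -ge 8 ] should succeed on a machine with a recent enough Java. If no Java is installed, the version string is empty and the comparison fails with an error, which is also a useful signal.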

Create a wrapper script to run the tool and save it as languagetool.sh:

#!/bin/bash
cd "$( dirname "${BASH_SOURCE[0]}" )"
exec java -cp languagetool-server.jar org.languagetool.server.HTTPServer --port 8081 --allow-origin "*"

Make the file executable and test it out:

/usr/local/LanguageTool$ chmod +x languagetool.sh
/usr/local/LanguageTool$ ./languagetool.sh
2021-03-02 22:57:34 +0100 INFO org.languagetool.server.DatabaseAccess Not setting up database access, dbDriver is not configured
2021-03-02 21:57:34 +0000 WARNING: running in HTTP mode, consider running LanguageTool behind a reverse proxy that takes care of encryption (HTTPS)
2021-03-02 21:57:35 +0000 Setting up thread pool with 10 threads
2021-03-02 21:57:35 +0000 Starting LanguageTool 5.2 (build date: 2020-12-29 15:54, eb572bf) server on http://localhost:8081...
2021-03-02 21:57:35 +0000 Server started

In a new terminal, verify that you can connect to the local server.

$ curl --data "language=en-US&text=a simple test" http://localhost:8081/v2/check
{"software":{"name":"LanguageTool","version":"5.2","buildDate":"2020-12-29 15:54","apiVersion":1,"premium":false,"premiumHint":"You might be missing errors only the Premium version can find. Contact us at support<at>languagetoolplus.com.","status":""},"warnings":{"incompleteResults":false},"language":{"name":"English (US)","code":"en-US","detectedLanguage":{"name":"French","code":"fr","confidence":0.815771}},"matches":[{"message":"This sentence does not start with an uppercase letter.","shortMessage":"","replacements":[{"value":"A"}],"offset":0,"length":1,"context":{"text":"a simple test","offset":0,"length":1},"sentence":"a simple test","type":{"typeName":"Other"},"rule":{"id":"UPPERCASE_SENTENCE_START","description":"Checks that a sentence starts with an uppercase letter","issueType":"typographical","category":{"id":"CASING","name":"Capitalization"}},"ignoreForIncompleteSentence":true,"contextForSureMatch":-1}]}
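One caveat with passing the text directly in --data: characters such as & and = are read as form-field separators, so longer test sentences can get cut off. A minimal sketch of a workaround, using curl's --data-urlencode option, is below. The function name lt_check is hypothetical; the port and the /v2/check path are the ones used above.

```shell
#!/bin/sh
# Hypothetical helper: send text to the local LanguageTool server.
# --data-urlencode lets curl escape characters such as "&" and "=" that
# would otherwise be treated as form-field separators.
# Usage: lt_check "text to check" [language-code]
lt_check() {
  curl --silent \
    --data-urlencode "language=${2:-en-US}" \
    --data-urlencode "text=$1" \
    "http://localhost:8081/v2/check"
}
```

For example, lt_check "salt & pepper are on the table" should return the same kind of JSON response as the plain curl call, with the whole sentence checked.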

The next step is to install the LanguageTool browser plugin. Configure it to use your local server by going to the add-on’s options, opening “Experimental settings,” and selecting “Local server”. Text inputs in your browser should now display the LanguageTool checkmark (or error indicator) in the bottom right corner.

The last step is to run the service automatically. You can skip this part if you prefer to start and stop the server by hand. Stop the manual server with ⌃C. Then, create a launchd job by saving the following as ~/Library/LaunchAgents/languagetool.plist:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>languagetool.job</string>
    <key>Program</key>
    <string>/usr/local/LanguageTool/languagetool.sh</string>
    <key>RunAtLoad</key>
    <true/>
    <key>Sockets</key>
    <dict>
        <key>Listeners</key>
        <dict>
            <key>Bonjour</key>
            <array>
                <string>8001</string>
            </array>
            <key>SockNodeName</key>
            <string>0.0.0.0</string>
            <key>SockServiceName</key>
            <string>8001</string>
        </dict>
    </dict>
    <key>StandardErrorPath</key>
    <string>/Library/Logs/languagetool.log</string>
    <key>StandardOutPath</key>
    <string>/Library/Logs/languagetool.log</string>
</dict>
</plist>

Now load the new launchd job:

$ launchctl load -w ~/Library/LaunchAgents/languagetool.plist
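The Java server takes a moment to start, so an immediate request right after loading the job may fail. As a rough sketch (not part of the LanguageTool instructions), the hypothetical helper lt_wait_ready below polls the server once per second until it answers. It assumes the server listens on port 8081 as configured in languagetool.sh, and that the HTTP API's /v2/languages endpoint is available as a cheap request to probe.

```shell
#!/bin/sh
# Hypothetical helper: wait until the LanguageTool server answers.
# Polls once per second, up to $1 attempts (default 30).
# Returns 0 as soon as a request succeeds, 1 if every attempt fails.
lt_wait_ready() {
  attempts=${1:-30}
  while [ "$attempts" -gt 0 ]; do
    # --fail makes curl return non-zero on HTTP errors as well as on
    # connection failures; the response body itself is discarded.
    if curl --silent --fail --output /dev/null \
         "http://localhost:8081/v2/languages"; then
      return 0
    fi
    attempts=$((attempts - 1))
    sleep 1
  done
  return 1
}
```

Running lt_wait_ready && echo "LanguageTool is up" after launchctl load gives a quick confirmation. If it times out, the log file named in the plist is the first place to look.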

I can highly recommend the LaunchControl app, which makes controlling and debugging launchd jobs painless.

LanguageTool should now run automatically in the background!

Grammar checking algorithms

LanguageTool uses hundreds of prescriptive rules for each language. For example, there are rules for ‘comma between independent clauses’ or ‘its vs. it’s’. These are based on a prescriptive syntax model, so they can be easily tricked, but they seem fairly good in practice for well-maintained languages like English and German.

There is also a more advanced algorithm, which uses n-gram frequencies to detect common word substitutions (e.g. ‘Don’t forget to put on the breaks’). The default offline checker omits the n-gram checker because it requires a large corpus of data. The n-gram data can also be installed locally, but at the cost of an 8 GB download and some significant processing for each sentence checked.
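To make the n-gram idea concrete, here is a toy illustration. It is not LanguageTool's code, and the counts are invented for the example; a real model derives them from a large corpus and the real checker is more involved. The hypothetical function suggest compares how often each of two confusable words follows a given word and keeps the more frequent one.

```shell
#!/bin/sh
# Toy illustration of n-gram based confusion detection.
# The counts below are made up; format is "previous-word word=count".
NGRAM_COUNTS='the brakes=9000
the breaks=1200
coffee breaks=800
coffee brakes=2'

# bigram_count PREV WORD -> prints the count for "PREV WORD" (0 if unseen)
bigram_count() {
  printf '%s\n' "$NGRAM_COUNTS" |
    awk -F '=' -v key="$1 $2" '$1 == key { n = $2 } END { print n + 0 }'
}

# suggest PREV WORD ALTERNATIVE -> prints whichever of WORD or ALTERNATIVE
# is more frequent after PREV; on a tie the word as written is kept.
suggest() {
  if [ "$(bigram_count "$1" "$3")" -gt "$(bigram_count "$1" "$2")" ]; then
    echo "$3"
  else
    echo "$2"
  fi
}
```

With these invented counts, suggest the breaks brakes prefers brakes, while suggest coffee breaks brakes leaves breaks alone. This also hints at why the real data set is so large: every plausible word sequence needs a count.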

LanguageTool does not use any deep learning algorithms. There are some mentions of neural networks on the developer channels, so it’s possible that it may adopt more recent machine learning advances in the future. The open-source nature of the software means it is always possible to build additional features on top of LanguageTool.

For me, even the rule-based version of LanguageTool felt like a big improvement over a basic spell checker. It’s especially useful to me when writing in German, where I’m more likely to make the simple kinds of grammatical mistakes that LanguageTool’s prescriptive models can detect. And I value the privacy that comes from running the tool locally rather than relying on a cloud service.
