• Breaking the Google Addiction one step at a time

    Google isn't your friend. Google isn't my friend. Google is, and always has been, a data-whore.

    But still, we use them and allow them to slurp up more and more data about us.

    They're a bit like Amazon in that respect - you know they're an increasingly terrible company, but they're just so convenient and you keep on using them whilst ignoring the power they're amassing over the market.

    But it is something that's been concerning me more and more over the years.

    We install adblockers, no-script and other extensions to add a fig-leaf to our privacy, or to try and avoid Google's user-hostile changes, yet we keep on using the same services. Even when they completely change the UI around on us, for no good reason, we still keep using their services.

    I decided, quite a while ago, it was time I made a change, but then did very little, at least until recently.

    As great as a "clean-break" might sound, going cold turkey off Google's services is never going to work - no model of user behaviour supports making massive jarring changes.

    So I decided to start with the most obvious interaction with Google - their search engine. I don't have Google Home or similar, so my most frequent interaction with Google is search.

  • The Pitfalls of Building an Elasticsearch backed Search Engine

    There are a ton of articles on the internet describing how to go about building a self-hosted full-text search engine using Elasticsearch.

    Most of the tutorials I read describe a fairly simple process: install some software, then write a little bit of code to insert and extract data.

    The underlying principle really is:

    1. Install and set up Elasticsearch
    2. Create a spider/crawler or otherwise insert your content into Elasticsearch
    3. Create a simple web interface to submit searches to Elasticsearch
    4. ???
    5. Profit
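
    As a rough illustration of how little code steps 2 and 3 tend to involve, here is a minimal sketch of the two requests such a tutorial usually ends up with. It is not the code from any particular tutorial: the `notes` index name, the `title`/`body`/`url` field names and the local `http://localhost:9200` address are all assumptions made for the example. It only relies on Elasticsearch's document (`_doc`) and `_search` REST endpoints.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

# Assumption: a single local Elasticsearch node on its default HTTP port.
ES_URL = "http://localhost:9200"
INDEX = "notes"  # hypothetical index name


def index_request(doc_id, title, body, url):
    """Step 2: build the request that stores one crawled page."""
    # The document ID is percent-encoded so IDs containing "/" stay one path segment.
    path = f"{ES_URL}/{INDEX}/_doc/{quote(doc_id, safe='')}"
    payload = {"title": title, "body": body, "url": url}  # hypothetical fields
    return "PUT", path, json.dumps(payload)


def search_request(terms, size=10):
    """Step 3: build the naive full-text query that most tutorials stop at."""
    path = f"{ES_URL}/{INDEX}/_search"
    payload = {
        "size": size,
        "query": {"multi_match": {"query": terms, "fields": ["title", "body"]}},
    }
    return "POST", path, json.dumps(payload)


def send(method, path, data):
    """Send a prepared request to Elasticsearch and decode the JSON reply."""
    req = Request(
        path,
        data=data.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )
    with urlopen(req) as resp:
        return json.load(resp)
```

    With a node running, `send(*search_request("backup script"))` would return whatever Elasticsearch's default scoring puts first. Nothing here weights titles over bodies or otherwise tunes relevance.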

    At the end of it you get a working search engine. The problem is, that search engine is crap.

    It's not that it can't be saved (it definitely can), so much as that most tutorials seem not to give any thought to improving the quality of search results - the engine returns some results and that's treated as good enough.

    Over the years, I've built up a lot of internal notes, JIRA tickets etc, so for years I ran a self-hosted internal search engine based upon Sphider. Its code quality is somewhat questionable, and it's not been updated in years, but it sat there and it worked.

    The time came to replace it, and experiments with off-the-shelf things like YaCy didn't go as well as hoped, so I hit the point where I considered self-implementing. Enter Elasticsearch, and enter the aforementioned Internet tutorials.

    The intention of this post isn't to detail the process I followed, but really to document some of the issues I hit that don't seem (to me) to be too well served by the main body of existing tutorials on the net.

    The title of each section is a clicky link back to itself.