Published: Wednesday, 05 February 2020 09:27
Written by Ben Tasker
There are a ton of articles on the internet describing how to go about building a self-hosted fulltext search engine using ElasticSearch.
Most of the tutorials I read describe a fairly simple process: install some software, then write a little bit of code to insert and extract data.
The underlying principle really is:
- Install and set up ElasticSearch
- Create a spider/crawler or otherwise insert your content into Elasticsearch
- Create a simple web interface to submit searches to Elasticsearch
- ???
- Profit
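As a concrete illustration of the "insert" and "search" steps, both boil down to JSON payloads sent to Elasticsearch over HTTP. A minimal sketch (the index name and field names here are illustrative, not taken from the post):

```python
import json

# A document as a crawler might index it
# (would be POSTed to /pages/_doc)
doc = {
    "title": "Example page",
    "url": "https://example.invalid/page",
    "body": "Full extracted text of the page...",
}

# A simple full-text query a search UI might submit
# (would be POSTed to /pages/_search)
query = {
    "query": {
        "multi_match": {
            "query": "search terms",
            "fields": ["title^2", "body"],  # weight title matches higher
        }
    }
}

print(json.dumps(query, indent=2))
```

Getting this far is easy; as the rest of the post argues, it's the relevance tuning on top that takes the work.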
At the end of it you get a working search engine. The problem is, that search engine is crap.
It's not that it can't be saved (it definitely can), so much as that most tutorials seem not to give any thought to improving the quality of search results - it returns some results, and that's considered good enough.
Over the years, I've built up a lot of internal notes, JIRA tickets etc, so for years I ran a self-hosted internal search engine based upon Sphider. Its code quality is somewhat questionable, and it hasn't been updated in years, but it sat there and it worked.
The time came to replace it, and experiments with off-the-shelf things like yaCy didn't go as well as hoped, so I hit the point where I considered self-implementing. Enter ElasticSearch, and enter the aforementioned Internet tutorials.
The intention of this post isn't to detail the process I followed, but to document some of the issues I hit that don't seem (to me) to be well served by the main body of existing tutorials on the net.
The title of each section is a clickable link back to itself.
Read more ...
Published: Thursday, 20 February 2020 15:39
Written by Ben Tasker
Recently I've been playing around with the generation of random numbers.
Although it's not quite ready yet, one of the things I've built is a source of (hopefully) random data. The writeup on that will come later.
But an interesting distraction (and in some ways the natural extension) is to then create a Pseudo-Random Number Generator (PRNG) seeded with data from that random source.
I wanted it to be (in principle) Cryptographically Secure (i.e. so we're creating a CSPRNG). In practice it isn't really (we'll explore why later in this post). I also wanted to implement what Bernstein calls "Fast Key Erasure" along with some techniques discussed by Amazon in relation to their S2N implementation.
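The idea behind fast key erasure is simple: generate a batch of output keyed by the current key, reserve the first block of that output as the replacement key, and discard the old key, so compromise of the current state can't reveal earlier output. A minimal sketch, assuming a SHA-256-based construction (a real implementation would use a stream cipher such as AES-CTR or ChaCha20; this is not the code from the post):

```python
import hashlib

BLOCK = 32  # SHA-256 digest size in bytes

def fke_generate(key, n_blocks):
    """Derive n_blocks of output plus a replacement key from `key`.

    The first derived block becomes the next key and is never
    exposed to the caller, so once the old key is discarded,
    earlier output can't be reconstructed from the state.
    """
    blocks = [
        hashlib.sha256(key + i.to_bytes(4, "big")).digest()
        for i in range(n_blocks + 1)
    ]
    new_key = blocks[0]            # kept internal: the next key
    output = b"".join(blocks[1:])  # handed to the caller
    return new_key, output

# In practice the initial key would be seeded from the random source.
# Note that Python can't truly erase the old key (bytes are
# immutable), which is one reason such an RNG isn't really a CSPRNG.
key = b"\x00" * BLOCK
key, out = fke_generate(key, 4)
```

The inability to reliably overwrite the old key in memory is exactly the sort of gap between "in principle" and "in practice" the post goes on to explore.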
In this post I'll be detailing how my RNG works, as well as looking at what each of those techniques does to the numbers being generated.
I'm not a cryptographer, so I'm going to try and keep this relatively light-touch, if only to avoid highlighting my own ignorance too much. Although this post (as a whole) has turned out to be quite long, hopefully the individual sections are relatively easy to follow.
Read more ...