Bleve - Alternative to Elastic?

By @stellarpower
    2019-08-04 04:23:29.678Z

    Hi,

    I was reading through some of the docs and was already aware of the runtime requirements of the JVM; I saw that Toshi was mentioned as an alternative to Elasticsearch, which uses much of the resources, but I see it's not regarded as production-ready yet.

    I did some googling and found something called Bleve - http://blevesearch.com/. The site seems a bit out-of-date, but there are comments and commits from the last few days, and as I know nothing about Elastic, I was wondering if it was worth considering as an alternative. From what I see here
    https://news.ycombinator.com/item?id=8279081
    It seems to be aimed at replacing Lucene at the low-level, but offers a sufficient high-level API that it could feasibly be used instead of Elastic.

    Think this is of any use?

    Cheers :)

    • 6 replies
    1. Hi @stellarpower, Bleve looks like an interesting alternative to ElasticSearch, yes, which (I suppose) would use only a fraction of the memory and CPU that ES requires. At the same time, it's a library that (it seems to me) one calls from one's Go code, rather than a standalone search engine server that one calls via HTTP. And what I want for Talkyard is a separate server (application process) that handles search, rather than a library. One reason is that otherwise it'd be a bit too simple to DoS attack Talkyard, by crafting "evil" search queries that use up all CPU and memory of the main Talkyard server itself (making Ty inaccessible to "everyone").

      Toshi, however, is similar to ElasticSearch in that it runs in its own separate process, which I can place in its own Docker image with memory and CPU restrictions. So if something goes wrong, it'll affect the search module only, but not the whole of Talkyard. (Also, it's written in Rust :- ) I like Rust)

      Another maybe good alternative could be PostgreSQL's built in search b.t.w. (which is getting better and better each year :- )). Not sure how "easy" it could be to DoS the PostgreSQL database, by typing "evil" queries. Hmm.
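
To illustrate the isolation argument above: a minimal Python sketch, not Talkyard's actual code, of running each search in its own process with a time budget, so that an expensive query is killed instead of stalling the caller. The `WORKER` script, the `search_isolated` function and the `simulate_work_secs` field are all hypothetical stand-ins; a real setup would put Toshi, ElasticSearch or a Bleve-based server behind an HTTP API and enforce the memory and CPU limits at the container level.

```python
import json
import subprocess
import sys

# Hypothetical stand-in for a real search backend: a tiny script that reads
# a JSON request on stdin and prints the matching documents as JSON.
WORKER = r"""
import json, sys, time
req = json.load(sys.stdin)
time.sleep(req.get("simulate_work_secs", 0))  # pretend the query is expensive
docs = [
    "bleve is a go library",
    "toshi is written in rust",
    "tantivy is a rust library",
]
hits = [d for d in docs if req["query"].lower() in d]
json.dump(hits, sys.stdout)
"""


def search_isolated(query, timeout_secs=5.0, simulate_work_secs=0.0):
    """Run one query in a separate process; return None if it exceeds the budget."""
    request = json.dumps(
        {"query": query, "simulate_work_secs": simulate_work_secs}
    )
    try:
        done = subprocess.run(
            [sys.executable, "-c", WORKER],
            input=request,
            capture_output=True,
            text=True,
            timeout=timeout_secs,
            check=True,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child on timeout; the caller stays responsive.
        return None
    return json.loads(done.stdout)
```

Calling `search_isolated("rust")` returns the two matching documents, while a query that takes longer than `timeout_secs` yields `None` and leaves the calling process unaffected. This only bounds wall-clock time; memory and CPU caps would still have to come from the container or the operating system.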

      1. @stellarpower
          2019-08-05 14:40:58.321Z

          Hey Kaj,

          Absolutely understand, that's a very sensible idea. I had a look at Toshi and it seemed good, I was just concerned about the open admission in the readme that it's not production-quality yet. I think the lower-level library is stable enough, but Toshi itself apparently isn't yet. If I could put Bleve into a container as a server and mould the API sufficiently so that it (let's just assume hypothetically for a second) could be dropped in without any modification, would this be of interest at all?

          I'm interested in Talkyard for my site, and just generally: it's a fantastic piece of software that does nearly everything I could want, and you've clearly designed and documented it exceptionally well for a FOSS product. But I'm on a budget and have been looking recently into serverless options where there is a generous free tier but resource usage is limited, so I'd be interested in whether the footprint can be reduced, and further, whether e.g. Postgres (and Redis?) could be separated and hosted by a cloud database server to take advantage of the free tier. I know you offer served options but was a little confused by the pricing (but I'll keep this thread on-topic and ask later).

          If there are any other things that I could help with or that could do with being completed, please let me know and I'll try :) I love how talkyard is just packaged as a docker image and I can just run with it. Makes life much easier. Unfortunately however, I think the downside to this is that if I can't provide a setup in Bionic with the requirements, it can't be run so easily. So I'd be interested, time permitting, in looking into whether it can be a bit more modular - your design already is, but whether one could allow components to be swapped out for alternatives where the default setup is impossible.

          Cheers

          Ben

          1. KajMagnus @KajMagnus
              2019-08-06 14:15:43.823Z / 2020-03-20 04:40:32.860Z

            Ok yes I too think Toshi seems like too high a risk, currently.

            If I could put Bleve into a container as a server and mould the API sufficiently so that it (let's just assume hypothetically for a second) could be dropped in without any modification, would this be of interest at all?

            Hmm. For the near future (like, the next 10 weeks) I'm afraid I'm too short of time to have a look at the result. After that, ... I'd probably want to wait until Toshi is more stable, and then compare Toshi with this Bleve + API server you might create, with built-in PostgreSQL search, and with ElasticSearch.

            Another thing you could maybe do, is to try to integrate Toshi with Talkyard, and join the Toshi project and make it production quality sooner? Maybe by contributing automatic tests, to the Toshi project? — b.t.w. what's your overall project time frame?

            (I'm getting the impression you're familiar with Golang? Not as much with Rust? What's your technical background, if I may ask :- ))

            Actually, looking at the activity in Toshi: https://github.com/toshi-search/Toshi/graphs/contributors
            And Tantivy (comparable to Bleve, written in Rust): https://github.com/tantivy-search/tantivy/graphs/contributors
            And Bleve: https://github.com/blevesearch/bleve/graphs/contributors

            ... then I'm actually thinking that Tantivy and Toshi are more "alive" than Bleve. The original Bleve author isn't contributing that much to Bleve any longer, whereas the creators of Tantivy and Toshi are both still active in their projects.

            have been looking recently into serverless options where there is a generous free tier but resource usage is limited, so I'd be interested in whether the footprint can be reduced, and further, whether e.g. Postgres (and Redis?) could be separated and hosted by a cloud database server to take advantage of the free tier.

            I'd love to hear about your use case? For what do you have in mind to use Talkyard?
            And what is affordable, from your perspective and for your use case?
            Is it ok if I ask in which country do you live?

            footprint can be reduced and further

            I want Talkyard to run on a Raspberry Pi with 600 MB RAM :- ) Not in the coming months, but ... eventually.

            I know you offer served options but was a little confused by the pricing (but I'll keep this thread on-topic and ask later)

            Sorry about that. Please ask and I'll try to clarify?
            Maybe I can change the pricing pages too so they become less confusing.
            (If you open-source self host, $10 / month for a 2 GB DigitalOcean VPS should work fine.)

            1. Paul Masurel @fulmicoton
                2020-03-19 13:05:28.628Z

                tantivy author here. It is called "tantivy", not "tantivity". :)

                I confirm development is active.

                1. Hi Paul, I've changed to Tantivy above now. (Thanks :- ))

                  B.t.w. I think Rust is nowadays my shared first-place favorite language. The more I learn about it, the more I feel "wow this is how I would have wanted things to work all the time, I just didn't know about it until I found out now because of Rust".

                  If one day Talkyard starts using Tantivy (and Toshi), then it seems to me that'd reduce the amount of memory Talkyard needs by roughly something like 30% or 50%.

                  I'm curious about how you found out about Talkyard? (I think currently not so many people know it exists)

                  1. Paul Masurel @fulmicoton
                      2020-03-20 15:00:49.461Z

                      Hello,

                      If one day Talkyard starts using Tantivy (and Toshi), then it seems to me that'd reduce the amount of memory Talkyard needs by roughly something like 30% or 50%.

                      I am uncomfortable commenting on that because I do not know enough about Elasticsearch. I am more familiar with Lucene.
                      Lucene itself is pretty memory efficient, contrary to popular belief.

                      I'm curious about how you found out about Talkyard? (I think currently not so many people know it exists)

                      I landed on this forum post via GitHub analytics this time... But I already knew about Talkyard and visited the website a long time ago.
                      I don't remember when or why.

            3. @KajMagnus closed this topic 2019-08-05 14:19:45.392Z.