Feature request: Search API

By Christian Scheuer @chrscheuer 2020-04-06 10:40:38.800Z

I don't remember whether we've discussed this elsewhere.
As far as I remember, we talked about it quite a bit, but that was a very long time ago.

We're doing various integrations now, where we try to make it easier for users to find the forum and use it more.
A great way to do that would be to integrate forum search into our app and website.

The API should ideally be accessible publicly without authentication, but we can also live with it if it has to go through our server first (at least for a start - even though it will make things a bit slower for the user).

We could live with the API just returning the top 5 or top 10 results as a starting point.
It would be ideal if it returns:

  • Title
  • Short summary (possibly with emphasized keywords)
  • Link


  1. KajMagnus @KajMagnus 2020-04-20 03:43:29.932Z 2020-04-20 03:55:10.522Z

    Good idea, what about this public API in the upcoming version:

    GET  http:// ty server /-/v0/search?q=UX+improvements
    

    and the response: (note: matching phrases are marked with the HTML <mark> tag, in htmlWithMarks: ... below)

    {
      "searchResults" : [ {
        "pageTitle" : "support-chat",
        "pageUrl" : "http://site-3.localhost/-31",
        "postHits" : [ {
          "isPageTitle" : false,
          "isPageBody" : false,
          "htmlWithMarks" : [ "Probably such an iframe could be a bit better looking and <mark>UX</mark> friendly (maybe clickable author names)" ]
        } ]
      }, {
        "pageTitle" : "Potential UX improvements",
        "pageUrl" : "http://site-3.localhost/-334",
        "postHits" : [ {
          "isPageTitle" : false,
          "isPageBody" : true,
          "htmlWithMarks" : [ "All of this is about trying to <mark>improve</mark> the forum so it doesn&#x27;t require so much interaction from us on", "Right now it isn&#x27;t an <mark>improvement</mark> with the draft UI interleaved.", "Generally, <mark>UX</mark> changes that are only half-done means we have to spend time with reporting feedback when", "helpful if there could be a more well-tested&#x2F;documented approach for TY to introduce changes to the <mark>UX</mark>" ]
        } ]
      ...
    

    The response, as TypeScript interfaces:

    interface SearchResultsApiResponse {
      searchResults: PageAndHits[];
    }
    
    interface PageAndHits {
      pageTitle: string;
      pageUrl: string;
      postHits: PostHit[];
    }
    
    interface PostHit {
      isPageTitle?: boolean;
      isPageBody?: boolean;
      htmlWithMarks: string[];
    }
    

    isPageTitle can be good to know, because maybe you don't want to both show the title and include a highlighted matching phrase from the title (because then the title text gets included twice).

    (What's a good thing to call the Original Post? Above, it's "Page body": isPageBody?: boolean. But maybe people confuse that with the <body> html tag? Maybe isOrigPost would be better? But what if it's not a forum post, but an article? What about isArticleText? But what if it's not an article, but a forum post? Hmm)

    If !isPageTitle && !isPageBody, then the post is a reply (to the orig post, or to someone else).
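
    A minimal sketch of how a client might use these two flags, assuming the PostHit shape above. The HitKind type and the classifyHit and hitsToShow helpers are hypothetical names for illustration, not part of Talkyard:

    ```typescript
    interface PostHit {
      isPageTitle?: boolean;
      isPageBody?: boolean;
      htmlWithMarks: string[];
    }

    type HitKind = 'title' | 'body' | 'reply';

    // A hit that is neither the page title nor the page body is a reply.
    function classifyHit(hit: PostHit): HitKind {
      if (hit.isPageTitle) return 'title';
      if (hit.isPageBody) return 'body';
      return 'reply';
    }

    // Drops title hits, so the title text isn't shown twice when the
    // caller already renders pageTitle on its own.
    function hitsToShow(hits: PostHit[]): PostHit[] {
      return hits.filter(hit => classifyHit(hit) !== 'title');
    }
    ```

    Whether to hide title hits is a display choice; a client that doesn't show pageTitle separately could skip hitsToShow.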

    Maybe some time later, there could be an isAcceptedSolution field too?

    1. Christian Scheuer @chrscheuer 2020-04-20 18:08:49.565Z

      Yay - looks great!

      I think we'd like to have the category path and the last modified date. By category path I mean for example Packages -> Soundminer (since we have subcategories). These paths should have some kind of ID with them as well.
      Would it make sense to have the username of the posts and/or pages that were hit? At least the author of the page I think would be good to have so we can show them with a little image.

      Wrt using GET and querystring, I'm thinking this would be the start of the API, but it would likely be something that we'd want to augment in the future.
      For example to add:

      • Search only in certain categories
      • Search only in certain tags
      • Potentially paging

      For these reasons, I feel like a POST with json could potentially be more flexible. I seriously hate URL serialization/deserialization haha, everybody always gets it wrong.

      We also need to think about if it returns only public material (I think it should by default)

      1. KajMagnus @KajMagnus 2020-04-25 06:45:34.105Z

        category path and the last modified date

        Yes (and that'd be nice to include on Talkyard's own search results page too).

        the username of the posts and/or pages that were hit? [...] author of the page

        Yes

        so we can show them with a little image

        The person's avatar image?

        POST with json could potentially be more flexible

        I think so too. Internally, Talkyard has both a GET API, so that queries can be linked via a URL, and a POST API, for the reasons you mentioned. I've now changed the public API to POST. A basic version (unfortunately without the things mentioned above) will be included in the upcoming version.

        The API wants JSON that looks like: { searchQuery: { queryText: "..... " }, pretty?: bool }. If the queryText is like: " ... text text categories:category-url-slug,another-cat-slug" then only those categories will get searched.

        We can add a separate categoryRefs: ... field next to queryText later, and then you can refer to the categories via ext-id instead, so the search functionality won't break if you change their URL slugs.
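
        A rough sketch of building that request body on the client side, assuming the shape described above at this point in the thread (the field is renamed to freetext further down). SearchRequestBody and buildSearchBody are hypothetical names, and the sketch assumes category slugs contain no spaces or commas:

        ```typescript
        interface SearchRequestBody {
          searchQuery: { queryText: string };
          pretty?: boolean;
        }

        // Appends a "categories:slug-a,slug-b" filter to the free text,
        // so that only those categories get searched.
        function buildSearchBody(text: string, categorySlugs: string[] = []): SearchRequestBody {
          const filter = categorySlugs.length > 0
            ? ' categories:' + categorySlugs.join(',')
            : '';
          return { searchQuery: { queryText: text.trim() + filter } };
        }
        ```

        For example, buildSearchBody('UX improvements', ['ideas', 'support']) would produce the query text "UX improvements categories:ideas,support". The slugs here are made up.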

        1. KajMagnus @KajMagnus 2020-04-26 22:14:16.189Z

          @chrscheuer — I'm adding author names and avatar url, + category name and URL (not the complete category path yet though).

          Someone mentioned an API endpoint for listing popular pages in a category (here),
          and I thought it'd be nice to implement both the search API and that other list-things API,
          and see how they can share some code and TypeScript interfaces, with author names etc. included.

          1. Christian Scheuer @chrscheuer 2020-04-28 14:07:58.195Z

            Super cool. Let me know when it's up on either server so I can make some tests :)

            With regards to the tagging system, also let me know when/if you'd like to discuss it further. I think we may start implementing our own tagging system for now so we can get something up and running very quickly and then we can switch to the forum's system once it's ready.

            1. KajMagnus @KajMagnus 2020-04-30 08:07:28.750Z

              tagging system, also let me know when/if you'd like to discuss it further. I think we may start implementing our own tagging system for now so we can get something up and running very quickly and then we can switch to the forum's system once it's ready

              I don't think I'll have time to look into the tagging system in the coming weeks. I should probably do OpenID Connect first. Also, maybe in a way it'd be good if you build your own tags? Then you can tell me how to implement tags in Talkyard in a way that works for you (and you seem to have a slightly more advanced need for tags than most organizations (?), so what works fine for you would work fine for almost everyone, I'd think).

              B.t.w. one thing: I think I'd like the unique identifier of a tag to be a numeric ID, not the tag label, so one can rename a tag without having to re-index all pages tagged with that tag. (In ElasticSearch, the page would be connected to that never-changing numeric tag ID, so there's no need to reindex the pages when a tag label is renamed, since the ID didn't change.)
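
              To illustrate the idea, here is a small in-memory sketch, not Talkyard's or ElasticSearch's actual data model. The maps, the sample tags, the page ID, and the function names are all made up for the example:

              ```typescript
              // Made-up sample data: tag labels keyed by numeric tag ID.
              const tagLabelsById = new Map<number, string>([
                [1, 'soundminer'],
                [2, 'pro-tools'],
              ]);

              // Pages (standing in for the search index) store tag IDs only, never labels.
              const tagIdsByPageId = new Map<number, number[]>([
                [334, [1, 2]],
              ]);

              // Renaming touches only the label table; no page entry changes.
              function renameTag(tagId: number, newLabel: string): void {
                if (!tagLabelsById.has(tagId)) throw new Error('No tag with ID ' + tagId);
                tagLabelsById.set(tagId, newLabel);
              }

              // Labels are resolved from IDs at display time.
              function labelsForPage(pageId: number): string[] {
                const tagIds = tagIdsByPageId.get(pageId) ?? [];
                return tagIds.map(id => tagLabelsById.get(id) ?? '(unknown tag)');
              }
              ```

              After renameTag(1, 'Soundminer'), the entry for page 334 is still [1, 2], and labelsForPage(334) picks up the new label without the page being touched.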

              1. Christian Scheuer @chrscheuer 2020-04-30 10:40:00.465Z

                Completely agree. That's also why I just thought we could start on our own - it will be easier to show you what we want by having something that already works :)

              2. In reply to chrscheuer:
                KajMagnus @KajMagnus 2020-04-30 07:09:59.122Z 2020-04-30 07:19:28.020Z

                I've upgraded this server, Ty.io. I'll upgrade your server, Ty.net, in 2 days I'd think (that is, Saturday).

                Meanwhile — here's the modified Search API:

                https://github.com/debiki/talkyard/blob/40ff70deb434d16f5d833ae8005158f873671637/tests/e2e/pub-api.ts#L292

                (The changes: the search query field is renamed from queryText to freetext. The search results are in a thingsFound array instead of searchResults, and postsHit is now postsFound. "Found" sounds nicer than "Hit" I think; some time later, when searching for people, it'd be ParticipantFound[] instead of ParticipantHit[].)

                (If you scroll up and look at type FindWhat = 'Pages' | 'Members' | ... and interface LookWhere { ..., then ignore the comment about the ReferencedThings object. I forgot to delete that comment.)

                B.t.w. the only thing I've actually implemented so far is:

                POST /-/v0/search  {
                  searchQuery: { freetext: "... search query ..." }
                }
                

                ( + a list query, for listing the most popular pages, in a specific category:

                /-/v0/list  {
                  listQuery: {
                    findWhat: 'Pages',
                    lookWhere: { inCategories: ['extid:the_categorys_ext_id'] },
                  }
                }
                

                )
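
                The two request bodies above could be typed roughly like this. It's only a sketch based on the snippets in this post: the SearchBody, ListBody and ApiRequest interfaces and the two builder functions are hypothetical names, and the authoritative types are the ones in the linked pub-api.ts:

                ```typescript
                interface SearchBody {
                  searchQuery: { freetext: string };
                }

                interface ListBody {
                  listQuery: {
                    findWhat: 'Pages';
                    lookWhere: { inCategories: string[] };
                  };
                }

                interface ApiRequest<Body> {
                  path: string;
                  body: Body;
                }

                // Body for the search endpoint.
                function searchRequest(freetext: string): ApiRequest<SearchBody> {
                  return { path: '/-/v0/search', body: { searchQuery: { freetext } } };
                }

                // Body for listing pages in one category, referenced by its ext id.
                function listPagesRequest(categoryExtId: string): ApiRequest<ListBody> {
                  return {
                    path: '/-/v0/list',
                    body: {
                      listQuery: {
                        findWhat: 'Pages',
                        lookWhere: { inCategories: ['extid:' + categoryExtId] },
                      },
                    },
                  };
                }
                ```

                The body would be sent JSON-serialized to the given path on the Talkyard server.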

                1. Christian Scheuer @chrscheuer 2020-04-30 10:38:42.543Z

                  This all looks brilliant - great with your ElasticSearch guides on compound queries as well!
                  Love the scrollCursor placeholder too.

                  1. Christian Scheuer @chrscheuer 2020-04-30 10:39:08.908Z

                    Does lookWhere.writtenBy accept ssoid user IDs?

                    1. KajMagnus @KajMagnus 2020-05-02 18:08:00.815Z 2020-05-02 18:15:34.228Z

                      accept ssoid user IDs?

                      Not yet, but yes, that's the idea: writtenBy: ['ssoid:...', 'username:...', 'username:could_be_a_group' ].

                      Sorry, it seems I won't upgrade the server until tomorrow.

      2. Progress with doing this idea
      3. @KajMagnus marked this topic as Started 2020-04-20 03:49:59.044Z.
      4. Christian Scheuer @chrscheuer 2020-05-31 16:30:32.017Z replies to KajMagnus:

        I'm getting a CORS error when trying to test this:

        Access to fetch at 'https://forum.soundflow.org/-/v0/search' from origin 'http://localhost:8080' has been blocked by CORS policy: Response to preflight request doesn't pass access control check: No 'Access-Control-Allow-Origin' header is present on the requested resource. If an opaque response serves your needs, set the request's mode to 'no-cors' to fetch the resource with CORS disabled.
        
        1. Christian Scheuer @chrscheuer 2020-05-31 16:30:56.508Z replies to chrscheuer:

          It's okay that I can't test from localhost, but would it be possible to white-list the soundflow.org domain for CORS requests?

          1. KajMagnus @KajMagnus 2020-05-31 17:32:46.328Z replies to chrscheuer:

            What if I add a CORS domain whitelist config field in the admin area? Then the site admins can decide (i.e. you'd type soundflow.org or maybe *.soundflow.org in that field)

            1. Christian Scheuer @chrscheuer 2020-05-31 17:35:37.902Z replies to KajMagnus:

              Yea that'd be great!

              We have a very tight beta deadline again this round by the way :) Releasing a large new version on June 15, which means this coming week is the cutoff for features so we have enough time to beta test.
              Do you think this and the other issue is possible to get looked at this week? If not that's just good to know, then we'll build in some workarounds for the features (search via our server etc.).

              1. KajMagnus @KajMagnus 2020-06-01 07:48:26.028Z replies to chrscheuer:

                Do you think this and the other issue is possible to get looked at this week?

                1) The other issue, yes. 2) This CORS issue: I think so, but I'm not totally certain. Adding per-site CORS to the web framework looks more complicated than I thought.

                I can post a status update tomorrow (that'd be fine? I mean, not too late)

                1. Christian Scheuer @chrscheuer 2020-06-01 08:09:54.093Z replies to KajMagnus:

                  That would be great - update tomorrow is fine. We can work around the CORS issue by sending through our own servers (even though it will make it slow for users) so would be great to get the markdown issue fixed and then see how far we can get with CORS.

                  1. KajMagnus @KajMagnus 2020-06-02 15:21:24.114Z replies to chrscheuer:

                    I got the markdown issue fixed (not code reviewed yet).

                    I can add CORS headers via Nginx, I'll give this a try later today or tomorrow. (That's a better approach than using the app server for that, anyway, long term, I think.)

                    1. Christian Scheuer @chrscheuer 2020-06-02 20:59:11.164Z replies to KajMagnus:

                      Cool - thanks for the update!

                      1. KajMagnus @KajMagnus 2020-06-03 17:25:17.577Z replies to chrscheuer:

                        Hi @chrscheuer — I could repro the "Response to preflight request doesn't pass access control check: No 'Access-Control-Allow-Origin' header ... " problem and make it work by adding CORS headers (and some other changes — lots of CORS related things to read about).

                        These CORS requests are for /-/v0/search, right? Would you want people to do these search requests as "strangers", meaning, not logged in?

                        Or, if a user is logged in, would you want his/her session cookie to be included in the CORS request, so the response can include topics not publicly visible? (but visible to him/her if s/he is logged in)

                        (Or, not sure if / how this could work, but maybe in the distant future, using a Bearer token or Basic Auth password somehow. Seems tricky to distribute such secret things on a per user basis though, hmm)

                        1. Christian Scheuer @chrscheuer 2020-06-03 19:27:19.208Z replies to KajMagnus:

                          Nice!
                          Yea these are for /-/v0/search. We log users in to their SF account when displaying the help panel which this is part of. But also roundtripping the forum SSO might be overkill for the search for now.
                          So basically yea it would be okay for the search to only display public results for now, and probably for any foreseeable future.

                          I don't think the setting allowing the soundflow.org domain to post requests has anything to do with this question though, right? The CORS header is about allowing Javascript hosted on the domain soundflow.org to send HTTP POST requests to the search endpoint - not about whether or not we should send session cookies - or am I misunderstanding something?

                          1. KajMagnus @KajMagnus 2020-06-04 14:33:22.309Z replies to chrscheuer:

                            The CORS header is about allowing Javascript hosted on the domain soundflow.org to send HTTP POST requests to the search endpoint

                            Yes (and also GET, PUT, DELETE).

                            not about whether or not we should send session cookies

                            Those POST requests can optionally include one's session cookie. Then, there'd be CORS POST requests with the Soundflow user's Talkyard session cookie — then, Talkyard would look at the cookie and know who the user over at soundflow.org is and could include in the response access restricted topics s/he is allowed to see.

                            But by default CORS requests don't send cookies. I think that's a good start (i.e. to skip cookies), to see how things work out, with a bit less complexity now in the beginning, fewer things that can go wrong.

                            (Here's a bit about CORS and cookies — they call it "Requests with credentials": https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS#Requests_with_credentials )
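
                            As a sketch of the difference on the client side: with fetch, the credentials option controls whether cookies go along with a cross-origin request. The SearchFetchRequest type and the buildSearchFetch helper below are hypothetical, and the JSON Content-Type header is an assumption about what the endpoint expects:

                            ```typescript
                            interface SearchFetchRequest {
                              url: string;
                              init: {
                                method: 'POST';
                                headers: { 'Content-Type': string };
                                body: string;
                                credentials: 'omit' | 'include';
                              };
                            }

                            // Builds the arguments for a cross-origin search request.
                            // withCookies = false: no session cookie is sent, so only public topics can match.
                            // withCookies = true: the browser attaches cookies, which only works if the
                            // server also allows credentialed CORS requests.
                            function buildSearchFetch(forumOrigin: string, freetext: string, withCookies = false): SearchFetchRequest {
                              return {
                                url: forumOrigin + '/-/v0/search',
                                init: {
                                  method: 'POST',
                                  headers: { 'Content-Type': 'application/json' },
                                  body: JSON.stringify({ searchQuery: { freetext } }),
                                  credentials: withCookies ? 'include' : 'omit',
                                },
                              };
                            }

                            // Usage in a browser (not executed here):
                            //   const req = buildSearchFetch('https://forum.soundflow.org', 'UX improvements');
                            //   const rsp = await fetch(req.url, req.init);
                            ```

                            For credentialed requests, the browser additionally requires the response to carry Access-Control-Allow-Credentials: true and a specific (non-wildcard) Access-Control-Allow-Origin, which is part of why skipping cookies is the simpler first step.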

                            it would be okay for the search to only display public results for now, and probably for any foreseeable future

                            Ok that sounds good (simpler & safer for now :- ))

                            1. Christian Scheuer @chrscheuer 2020-06-04 15:29:01.017Z replies to KajMagnus:

                              Great :) Yea makes perfect sense to take it in two separate steps. Both in terms of ease of implementation now, security aspects and the fact that we can get very far with just un-authenticated (public) search.
                              How do you think the timeline would look for the simple CORS case?

                              1. KajMagnus @KajMagnus 2020-06-05 05:52:22.629Z replies to chrscheuer:

                                think the timeline would look for the simple CORS case?

                                I've made that work (I think). I have in mind to do code review today, then I can deploy an early version only to this server, Ty.io, tomorrow, allowing CORS requests from soundflow.org, and then you can try it out?
                                Maybe you / we will find something we didn't think about.

                                1. Christian Scheuer @chrscheuer 2020-06-05 17:12:13.840Z replies to KajMagnus:

                                  Sounds great - that would be perfect.
                                  Our release is scheduled for June 15, which means we'll be doing videos and integration tests next week. So it's important we plan for whatever we make work now to not stop working before the release... Just so we get the admin option to add the domain added before rolling back the feature (if we get it working). Hope that makes sense.

                                  1. KajMagnus @KajMagnus 2020-06-06 18:51:29.126Z replies to chrscheuer:

                                    I'm building the new server now, will upgrade early tomorrow morning it seems.

                                    whatever we make work now to not stop working before the release

                                    I think it's very unlikely that there's anything in this CORS stuff that needs to be rolled back

                                    (Actually I don't completely understand this sentence: "Just so we get the admin option to add the domain added before rolling back the feature (if we get it working" maybe there's some cut and paste weirdness?)

                                    1. Christian Scheuer @chrscheuer 2020-06-06 18:54:44.946Z replies to KajMagnus:

                                      Awesome! Thank you :)

                                      1. KajMagnus @KajMagnus 2020-06-07 04:22:44.024Z 2020-06-07 04:43:38.422Z replies to chrscheuer:

                                        Now the new server is running here on Ty.io, and it allows CORS from: http://localhost:8080 and https://soundflow.org.

                                        You can do this:

                                        1. Copy this HTML page, which contains CORS test helper JavaScript and cURL examples, to your localhost:
                                          https://raw.githubusercontent.com/debiki/talkyard/master/tests/e2e/utils/ext-cors-site.html
                                          (it's this: https://github.com/debiki/talkyard/blob/master/tests/e2e/utils/ext-cors-site.html )

                                        2. Start a server at 8080: ./node_modules/.bin/http-server -p8080 dir/with/that/html/page/

                                        3. Go here: http://localhost:8080/ext-cors-site.html

                                        4. Open dev-tools and type:

                                        corsFetch({
                                            url: 'https://www.talkyard.io/-/v0/search',
                                            POST: { searchQuery: { freetext: 'pri' + 'sm' }},
                                            onDone: function(rsp) { logToPageAndConsole(rsp) }});
                                        

                                        You can also try the cURL examples, and change: -H "Origin: http://localhost:8080" to -H "Origin: http://the.wrong.origin" to see what'll happen

                                        1. Christian Scheuer @chrscheuer 2020-06-09 08:42:27.556Z replies to KajMagnus:

                                          It works!! Thank you so much for this quick fix. Let me know when this is ready on forum.soundflow.org :)
                                          If there's any chance of it working tomorrow morning (we're doing a live demo with a press reporter) that would be amazing.

                                          1. KajMagnus @KajMagnus 2020-06-09 09:22:59.563Z replies to chrscheuer:

                                            I just upgraded the server — you can go here: /-/admin/settings/features

                                            and check Enable Cross-Origin Resource Sharing (CORS)

                                            and then type, on 2 separate lines:

                                            http://localhost:8080
                                            https://soundflow.org
                                            

                                            in the text box that then appears (but don't end with a slash, don't: https://soundflow.org/ )
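
                                            A sketch of why the trailing slash matters, assuming the allowed origins are compared with the request's Origin header as exact strings (Talkyard's actual matching logic isn't shown in this thread). Browsers send the Origin header as scheme, host and optional port, with no trailing slash. Both helper names are hypothetical:

                                            ```typescript
                                            // Splits the text-box contents into one origin per line, ignoring blank lines.
                                            function parseAllowedOrigins(text: string): string[] {
                                              return text
                                                .split('\n')
                                                .map(line => line.trim())
                                                .filter(line => line.length > 0);
                                            }

                                            // Exact string comparison against the allow-list (an assumption).
                                            function isOriginAllowed(origin: string, allowed: string[]): boolean {
                                              return allowed.indexOf(origin) !== -1;
                                            }
                                            ```

                                            Under that assumption, an entry of https://soundflow.org/ would never equal the https://soundflow.org that the browser sends, so the request would be rejected.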

                                            1. Christian Scheuer @chrscheuer 2020-06-09 09:30:59.187Z replies to KajMagnus:

                                              AMAZING!! It works :)

                                              Here it is in action in our app:

                                              THANK YOU MAGNUS!!!

                                              1. KajMagnus @KajMagnus 2020-06-09 09:39:59.419Z replies to chrscheuer:

                                                Ok :- ) Looks nice in the app I think, seems user friendly

                                                1. Christian Scheuer @chrscheuer 2020-06-09 12:07:37.737Z replies to KajMagnus:

                                                  Haha yea... One of these days when you get a Mac we have to get you on board ;)