DataQueryBuilder query using prepareNextPage returns a record twice

Hello,

app id: 8149FCCE-07DA-F988-FFBB-6154872C9500
I have a generic utility query used in a service that started returning an odd result. The query is used dozens of times per day but only started the odd behavior this morning - if not, I don’t see how it could have ever worked. The consumer of this data blows up if its assumption of unique records is not correct. In this case, the call is returning one record twice.

The query returns all records for a specified table. I’ve added a few console.logs to debug, but nothing else has changed:

    async getAllRecords (table, cols, rels, depth, sort, where, groupby) {
        let lastSize = 0
        let coll = []
        let query = Backendless.DataQueryBuilder.create()

        query.setPageSize(100)

        if (depth) query.setRelationsDepth(depth)
        if (sort) query.setSortBy(sort)
        if (cols && cols.length) query.addProperties(cols.join(","))
        if (rels && rels.length) query.setRelated(rels)
        if (where) query.setWhereClause(where)
        if (groupby) query.setGroupBy(groupby)

        do {
            const items = await Backendless.Data.of(table).find(query)
            lastSize = items.length
            coll = coll.concat(items)

            if (lastSize == 100) {
                query.prepareNextPage()
            }
        } while (lastSize >= 100)
        return coll
    }

Here is what the query object looks like before the first find for a table called people:

13:34:03.691 DataQueryBuilder {
       offset: 0,
       pageSize: 100,
       sortBy: [ 'last asc' ],
       groupBy: [ 'objectId' ],
       properties: [
         'Count(compliance) as compcount,mc,name,status,city,county,availableDate,disabled, 
email,emgcyname,emgcyphon,first,formatted,last,location,media,notes,phone,qbid,review,state,street,type,unit,verified,zip,created,special'
       ],
       excludeProps: null,
       whereClause: 'status = 0',
       havingClause: null,
       relations: [ 'vehicles.type', 'company' ],
       relationsDepth: null,
       relationsPageSize: null,
       distinct: false
}

In the console, and in the results of this call, this returns 471 records. The problem is that the 100th record from the first iteration of the loop is repeated as the first record of the 2nd iteration. Here is the result of the query and some log output showing the state of the query and the objectIds of the result for the 5 passes required to return 471 records:

13:34:06.146 -----
13:34:06.146 query: offset: 0 size: 100
13:34:06.146 pass 1 count returned: 100
13:34:06.146 First rec: EA14F386-9421-4174-8A4F-F6A1943E40B8
13:34:06.147 Last rec (99): 35850602-16EF-4EBC-8165-B1496A6089CC
13:34:06.147 -----
13:34:07.080 -----
13:34:07.080 query: offset: 100 size: 100
13:34:07.080 pass 2 count returned: 100
13:34:07.080 First rec: 35850602-16EF-4EBC-8165-B1496A6089CC
13:34:07.080 Last rec (99): 6C20592A-4C1C-4897-90DD-265902AA5FE7
13:34:07.081 -----
13:34:08.042 -----
13:34:08.042 query: offset: 200 size: 100
13:34:08.042 pass 3 count returned: 100
13:34:08.042 First rec: 3E9D841A-10DE-4E62-A1FD-7E4E0FAF8CE5
13:34:08.042 Last rec (99): 310054D4-5DC3-4212-9524-92AB0AAE5A85
13:34:08.042 -----
13:34:08.979 -----
13:34:08.980 query: offset: 300 size: 100
13:34:08.980 pass 4 count returned: 100
13:34:08.980 First rec: 77651C2A-A128-4AAF-898B-90003CEB7654
13:34:08.980 Last rec (99): E21D6859-B9F2-4469-A615-0123DA41998C
13:34:08.980 -----
13:34:09.797 -----
13:34:09.797 query: offset: 400 size: 100
13:34:09.797 pass 5 count returned: 71
13:34:09.798 First rec: 59A21D34-B4FB-43E7-852F-363918711D12
13:34:09.798 Last rec (70): 45BDFD2D-4E94-4EA4-A567-E38B98A78F53
13:34:09.798 -----

Please note that the record with 35850602-16EF-4EBC-8165-B1496A6089CC is the last record of the first pass, and the first record of the 2nd pass. The other 4 loops don’t repeat. The last record of the last pass (45BDFD2D-4E94-4EA4-A567-E38B98A78F53) correctly corresponds to the last record expected given the current sort. Given that the counts returned are correct, it seems that there must be a missing record along the way since one is repeated.

Just for giggles, I tried the same query but set the offset to 1 before the first pass. No records repeated. Could there be something wrong with record 35850602-16EF-4EBC-8165-B1496A6089CC of my people table?

Thank you,

Kelly

Hi @Kelly_Oglesby ,

We have not made any releases in recent days. Last release was on 12th of October.
Also checked in the test app and was unable to reproduce your problem.

Most probably that right before the appearance of this error your logic adds a new record which according to the your query placed at the one of iterated pages. Could you please check if there is a chance for such situation with your code?

Regards, Andriy

Andriy,

This table hasn’t had a new record since Oct 10. I just ran the query again, and for me it’s 100% reproducible. The same record is returned in two queries using different offsets with no new records added. It seems that should be impossible.

I tried adding a bogus record that collates to the top of the sort. The query returns, correctly, 472 records, with no repeats. I filtered the returned set to make sure 35850602-16EF-4EBC-8165-B1496A6089CC only appears once. When I remove the bogus record, the problem comes back.


09:20:19.792 query: offset: 0 size: 100
09:20:19.792 pass 1 count returned: 100
09:20:19.792 rec (0): EA14F386-9421-4174-8A4F-F6A1943E40B8
09:20:19.792 rec (99): 35850602-16EF-4EBC-8165-B1496A6089CC <--------------- 100th record
09:20:19.793 -----
prepareNextPage called here
09:20:20.673 -----
09:20:20.673 query: offset: 100 size: 100
09:20:20.673 pass 2 count returned: 100
09:20:20.673 rec (0): 35850602-16EF-4EBC-8165-B1496A6089CC <------------- first record
09:20:20.674 rec (99): 6C20592A-4C1C-4897-90DD-265902AA5FE7
09:20:20.674 -----

One more thing, when I use the same sort, page size, and offsets in the REST console, it returns exactly what is expected. No repeats.

I think I have figured it out - it’s a collation “feature”. The query that is failing is sorting on a field called “last” which is a non-unique string value. I noticed that the “last” value of the 100th record and the 101st record are the same value, “Espinoza”. Apparently, when you sort on only one field, you don’t necessarily get the exact same collation each time. In other words, the two records are always collating together in the sort, but not necessarily in the same order. On the first pass, the duplicate record is collating “above” the second one. On the second pass, the duplicate is collating “below” the other record. So the duplicate becomes the last record of the first pass, and the first record of the second pass.

The reason it has worked for the previous several years is that we never had two records with the same “last” value that spanned the pageSize value of 100.

The fix was to change the sort to be more specific - [“last asc”,“first asc”]. Bada bing.

Thank you!