service/dynamodb scan() not providing consistent pagination limits #2957

Open
2 of 3 tasks
ren123t opened this issue Jan 15, 2025 · 3 comments
Assignees
Labels
bug This issue is a bug. p3 This is a minor priority issue response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days.

Comments


ren123t commented Jan 15, 2025

Acknowledgements

Describe the bug

I'm currently having an issue with the service/dynamodb package. When Scan is given a filter expression, it seems to apply the page limit first and filter the results afterwards. I'm not sure whether this is intended, but it gets in the way when defining pagination for requests that expect a consistent number of entries per requested page.

An example is a request like url.com/resource?page=3&limit=10&status=fail. If I provide a scan limit of 10 and a status of fail, the entries whose status is not fail still count toward the limit, so as I step through the pages I can get results such as page 1: size 3, page 2: size 7, page 3: size 10, page 4: size 1, etc.

Regression Issue

  • Select this option if this issue appears to be a regression.

Expected Behavior

When using ScanInput.Limit set to some number (e.g. 10) together with a filter condition (X = Y), dynamodb.Scan() would return pages of 10 filtered entries each, using LastEvaluatedKey as the start index for the next ScanOutput.

Current Behavior

dynamodb.Scan() currently reads the next 10 entries from DynamoDB and then filters them, returning a variable number of entries in ScanOutput.

Reproduction Steps

Scan wrapper function being used:

func (db *DynamoDB) Scan(filterVals []SearchStruct, paginationLimit int32, page int) ([]map[string]interface{}, error) {
	mapReturn := []map[string]interface{}{}
	filterCond := expression.ConditionBuilder{}
	for i, v := range filterVals {
		//the third argument is true only for the first filter value
		if err := buildFilterExpression(v, &filterCond, i == 0); err != nil {
			return nil, err
		}
	}

	run := true
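	//generate a unique sentinel key to avoid any chance of colliding with a real key attribute
	//assuming github.com/google/uuid: uuid.New() returns a uuid.UUID, so convert it to a string for use as a map key
	uniqueStartKey := uuid.New().String()
	lastEvaluatedKey := map[string]types.AttributeValue{uniqueStartKey: &types.AttributeValueMemberNULL{}}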
	currentPage := 1
	for run {
		//see how this performs without index
		scanInput := dynamodb.ScanInput{
			TableName: &db.TableName,
		}

		if paginationLimit != 0 {
			scanInput.Limit = &paginationLimit
		} else if db.DefaultPagination != 0 {
			scanInput.Limit = &db.DefaultPagination
		}

		//allow for full search if no filter values are passed
		if len(filterVals) != 0 {
			filterEx, err := expression.NewBuilder().WithFilter(filterCond).Build()
			if err != nil {
				return nil, err
			}
			scanInput.FilterExpression = filterEx.Filter()
			scanInput.ExpressionAttributeNames = filterEx.Names()
			scanInput.ExpressionAttributeValues = filterEx.Values()
		}
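		//the sentinel key is only present before the first request, so a nil
		//lookup means lastEvaluatedKey holds a real key from a previous page
		if lastEvaluatedKey[uniqueStartKey] == nil {
			scanInput.ExclusiveStartKey = lastEvaluatedKey
		}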

		ret, err := db.Client.Scan(context.Background(), &scanInput)
		if err != nil {
			return nil, err
		}

		//if page isnt defined, we assume they want everything by pagination
		if page == 0 || currentPage == page {
			for _, v := range ret.Items {
				holdMap := map[string]interface{}{}
				err = attributevalue.UnmarshalMap(v, &holdMap)
				if err != nil {
					return nil, err
				}
				mapReturn = append(mapReturn, holdMap)
			}
		}

		//stop looping once the requested page has been fetched (page != 0);
		//using >= avoids issuing one extra Scan request after the wanted page
		if page != 0 && currentPage >= page {
			run = false
			continue
		}

		if ret.LastEvaluatedKey != nil {
			lastEvaluatedKey = ret.LastEvaluatedKey
			currentPage++
		} else {
			run = false
		}
	}
	return mapReturn, nil
}

Possible Solution

It would help if there were a way to filter before the limit is applied, or if a middle function or secondary field could define pagination outside of DynamoDB's own, if DynamoDB allows it.

Additional Information/Context

No response

AWS Go SDK V2 Module Versions Used

github.com/aws/aws-sdk-go-v2 v1.32.8
github.com/aws/aws-sdk-go-v2/feature/dynamodb/attributevalue v1.15.26
github.com/aws/aws-sdk-go-v2/feature/dynamodb/expression v1.7.61
github.com/aws/aws-sdk-go-v2/service/dynamodb v1.39.3

Compiler and Version used

go version go1.23.4 windows/amd64

Operating System and version

windows 10

@ren123t ren123t added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Jan 15, 2025
@adev-code adev-code self-assigned this Jan 16, 2025
@adev-code adev-code added p3 This is a minor priority issue and removed needs-triage This issue or PR still needs to be triaged. labels Jan 16, 2025
@adev-code

Hi @ren123t, thanks for reaching out. I see that you are using custom code for pagination. Is there a reason why you are not using the SDK's built-in pagination (https://docs.aws.amazon.com/code-library/latest/ug/go_2_dynamodb_code_examples.html)?
Below is sample code that uses the built-in paginator, NewScanPaginator, which the linked documentation lists under Scan:

package main

import (
    "context"
    "fmt"
    "log"

    "github.com/aws/aws-sdk-go-v2/aws"
    "github.com/aws/aws-sdk-go-v2/config"
    "github.com/aws/aws-sdk-go-v2/feature/dynamodb/attributevalue"
    "github.com/aws/aws-sdk-go-v2/feature/dynamodb/expression"
    "github.com/aws/aws-sdk-go-v2/service/dynamodb"
)

func main() {
    cfg, err := config.LoadDefaultConfig(context.TODO())
    if err != nil {
        log.Printf("error: %v", err)
        return
    }

    client := dynamodb.NewFromConfig(cfg)

    filter := expression.Name("Status").Equal(expression.Value("Active"))
    expr, err := expression.NewBuilder().WithFilter(filter).Build()
    if err != nil {
        log.Fatalf("Failed to build filter expression: %v", err)
    }

    paginator := dynamodb.NewScanPaginator(client, &dynamodb.ScanInput{
        TableName:                 aws.String("scanTableRepro"),
        FilterExpression:          expr.Filter(),
        ExpressionAttributeNames:  expr.Names(),
        ExpressionAttributeValues: expr.Values(),
    })

    fmt.Println("Scanning the DynamoDB table...")
    for paginator.HasMorePages() {
        output, err := paginator.NextPage(context.TODO())
        if err != nil {
            log.Fatalf("Error scanning table: %v", err)
        }

        for _, item := range output.Items {
            var unmarshalled map[string]interface{}
            err = attributevalue.UnmarshalMap(item, &unmarshalled)
            if err != nil {
                log.Fatalf("Failed to unmarshal item: %v", err)
            }
            fmt.Printf("Item: %+v\n", unmarshalled)
        }
    }

    fmt.Println("Scan completed")
}

Please let me know if you have any questions. Thanks.


ren123t commented Jan 17, 2025

I originally wrote it when pagination wasn't needed, but more granular filtering requirements have since been added. I didn't realize there was already a built-in paginator in the SDK. I've done a quick update, and the same behavior still seems to be present.

scan function:

func (db *DynamoDB) Scan(filterVals []SearchStruct, paginationLimit int32, page int) ([]map[string]interface{}, error) {
	mapReturn := []map[string]interface{}{}
	filterCond := expression.ConditionBuilder{}
	for i, v := range filterVals {
		//the third argument is true only for the first filter value
		if err := buildFilterExpression(v, &filterCond, i == 0); err != nil {
			return nil, err
		}
	}

	scanInput := dynamodb.ScanInput{
		TableName: &db.TableName,
	}

	if paginationLimit != 0 {
		scanInput.Limit = &paginationLimit
	} else if db.DefaultPagination != 0 {
		scanInput.Limit = &db.DefaultPagination
	}

	//allow for full search if no filter values are passed
	if len(filterVals) != 0 {
		filterEx, err := expression.NewBuilder().WithFilter(filterCond).Build()
		if err != nil {
			return nil, err
		}
		scanInput.FilterExpression = filterEx.Filter()
		scanInput.ExpressionAttributeNames = filterEx.Names()
		scanInput.ExpressionAttributeValues = filterEx.Values()
	}
	scanPaginator := dynamodb.NewScanPaginator(db.Client, &scanInput)
	currentPage := 1
	for scanPaginator.HasMorePages() {
		response, err := scanPaginator.NextPage(context.Background())
		if err != nil {
			//return the error rather than partial results with a nil error
			return nil, err
		}
		//if page isnt defined, we assume they want everything by pagination
		if page == 0 || currentPage == page {

			var responseMaps []map[string]interface{}
			fmt.Println(response.Items)
			fmt.Println(len(response.Items))
			err = attributevalue.UnmarshalListOfMaps(response.Items, &responseMaps)
			if err != nil {
				log.Printf("Couldn't unmarshal scan response. Here's why: %v\n", err)
				return nil, err
			}
			mapReturn = append(mapReturn, responseMaps...)
		}
		//stop looping once the requested page has been fetched (page != 0);
		//using >= avoids requesting one extra page after the wanted one
		if page != 0 && currentPage >= page {
			break
		}
		currentPage++
	}

	return mapReturn, nil
}

function call:

_, err := dynamoDB.Scan([]SearchStruct{{FieldName: "status", Operator: EQUALS, FieldValue: "success"}}, 3, 0)

print output:
page 1:
[map[scrubed]]
1
page 2:
[map[scrubed] map[scrubed]]
2
page 3:
[]
0
page 4:
[map[scrubed] map[scrubed]]
2
page 5:
[]
0

This is with a test data set of about 13 items. Thanks!

@adev-code

Hi @ren123t, thanks for the response. I have replicated your code on my side, and yes, DynamoDB scans the items that do not match the filter and counts them toward a page's Limit, but does not return them. Because those items still use up the limit, a single page may not hold a full set of filtered results, and the remaining matches end up on later pages. This is the expected behavior of the DynamoDB service: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Scan.html#Scan.FilterExpression

- A filter expression is applied after a Scan finishes but before the results are returned. Therefore, a Scan consumes the same amount of read capacity, regardless of whether a filter expression is present.
- Now suppose that you add a filter expression to the Scan. In this case, DynamoDB applies the filter expression to the six items that were returned, discarding those that do not match. The final Scan result contains six items or fewer, depending on the number of items that were filtered.

Please let me know if you have any other questions. Thanks.

@adev-code adev-code added the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Jan 22, 2025