Cursors
Procs that can return large datasets may break the result down into smaller sets, or pages. Instead of returning the full dataset all at once, these procs return one page of values at a time, along with a cursor used to fetch the next page of values.
Such procs return a cursor whenever the size of the dataset exceeds the defined page size.
Client libraries handle cursors automatically by paging through the dataset as values are consumed.
Paging is controlled through the count and total arguments. The count argument sets the page size, defaulting to 250 values. The total argument sets the total number of values fetched across all pages; if total is not passed, the proc assumes that all available values are to be retrieved.
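The interaction between count and total can be sketched with a small in-memory model. This is not the proc client or its wire format: `fetch_page`, `each_value`, `DATASET`, and the integer cursor are hypothetical names made up for illustration, and only the default page size of 250 comes from the text above.

```ruby
# Hypothetical in-memory model of cursor paging; not the proc API.
DATASET = (1..1_000).to_a

# Returns one page of values plus a cursor for the next page.
# The cursor is nil once the dataset is exhausted.
def fetch_page(cursor: 0, count: 250)
  page = DATASET[cursor, count] || []
  next_cursor = cursor + page.size
  next_cursor = nil if next_cursor >= DATASET.size
  [page, next_cursor]
end

# Follows cursors page by page, stopping once `total` values have been
# produced, or when the dataset runs out if `total` is nil.
def each_value(count: 250, total: nil)
  Enumerator.new do |yielder|
    cursor = 0
    yielded = 0
    while cursor && (total.nil? || yielded < total)
      page, cursor = fetch_page(cursor: cursor, count: count)
      page = page.first(total - yielded) if total
      page.each { |value| yielder << value }
      yielded += page.size
    end
  end
end

all_values = each_value.to_a                        # no total: every value
limited = each_value(count: 100, total: 250).to_a   # three pages, trimmed to 250
```

In this model, omitting `total` walks every page, while passing it stops paging as soon as enough values have been collected.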
Enumerating Cursors
Cursors are fully supported by enum.* procs. Used together, they provide patterns for iterating and mutating large datasets within proc or externally from a client library. Let's explore this in the context of a common use case: fetching the underlying value for every key in a key-value bucket.
One approach is to use keyv.scan to fetch the keys, then retrieve each value using keyv.get.
Setup
require "proc"
client = Proc.connect("PROCAUTH")
const Proc = require("@proc.dev/client");
const client = Proc.connect("PROCAUTH");
authorization=PROCAUTH
client.keyv.scan.call(bucket: "v7efcf3c").each do |key|
  client.keyv.get.call(bucket: "v7efcf3c", key: key)
end
client.keyv.scan.call(undefined, {bucket: "v7efcf3c"}).each((key) => {
  client.keyv.get.call(undefined, {bucket: "v7efcf3c", key: key});
});
this example cannot be expressed in curl
This approach works for small datasets, but does not scale to larger ones. Because keyv.scan only returns the keys, each additional key requires another request to proc to get the underlying value. We can use an enumerator instead to fetch the keys and map them into values before returning to the client.
client.keyv.scan.call(bucket: "m6cb1cf4") do
  client.enum.map do
    client.keyv.get(bucket: "m6cb1cf4")
  end
end
client.keyv.scan.call(undefined, {bucket: "m6cb1cf4"}, () => {
  return client.enum.map(undefined, {}, () => {
    return client.keyv.get(undefined, {bucket: "m6cb1cf4"});
  });
});
curl "https://proc.run/keyv/scan?bucket=m6cb1cf4" --silent \
--header "authorization: bearer $authorization" \
--header "content-type: application/vnd.proc+json" \
--header "accept: text/plain" \
--data '[["$$", "proc", ["m6cb1cf4"]]]]]]]]]'
[...]
This approach constrains the number of requests between the client and proc to the number of pages returned by keyv.scan. Given 10,000 keys and the default page size of 250, keyv.scan returns 40 pages of results. The first approach makes a total of 40 + 10,000 requests while the second approach makes only 40 requests, which is much better!
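The request counts can be checked with a few lines of arithmetic. This sketch only restates the figures from the text (10,000 keys, 250 values per page); the variable names are illustrative and nothing here calls proc.

```ruby
# Request counts for the two approaches, using the figures from the text.
page_size = 250
key_count = 10_000

# One keyv.scan request per page, rounding up for a partial final page.
pages = (key_count + page_size - 1) / page_size

# First approach: every scan page, plus one keyv.get request per key.
requests_get_per_key = pages + key_count

# Second approach: values are mapped before each page is returned.
requests_mapped = pages

puts "pages: #{pages}"
puts "scan, then get per key: #{requests_get_per_key}"
puts "scan mapped into values: #{requests_mapped}"
```

Run as written, this reports 40 pages, 10040 requests for the first approach, and 40 requests for the second.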
Learn more about enumerators in the enumerator docs.
Consuming Cursors
Cursors, like other enumerables, are enumerated lazily, meaning iteration over every underlying value is not guaranteed unless the enumerator is fully consumed. Enumerators can be consumed in one of two ways:
- Iterating over the enumerator in the client.
- Resolving the enumerator within proc.
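Lazy enumeration can be observed with Ruby's own lazy enumerators. The sketch below is a stand-in, not the proc client: `cursor_values` is a hypothetical enumerator that simulates four pages of three values each and records every simulated page fetch in `fetched_pages`.

```ruby
# Hypothetical stand-in for a cursor-backed result; not the proc client.
# Each simulated page fetch is recorded so laziness can be observed.
fetched_pages = []

cursor_values = Enumerator.new do |yielder|
  (1..4).each do |page_number|
    fetched_pages << page_number            # simulates one request for a page
    first = (page_number - 1) * 3 + 1
    (first..first + 2).each { |value| yielder << value }
  end
end

lazy_values = cursor_values.lazy.map { |value| value * 10 }

# Building the pipeline does no work: no page has been fetched yet.
nothing_fetched = fetched_pages.dup

# Taking four values only reaches into the first two pages.
first_four = lazy_values.first(4)
after_partial = fetched_pages.dup

# Fully consuming the enumerator visits every page.
fetched_pages.clear
all_values = lazy_values.to_a
after_full = fetched_pages.dup
```

Here the mapping block runs for every value only when the enumerator is fully consumed, which mirrors the guarantee described above.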
The mapping example from the section above demonstrates client-based iteration. While the client-based approach makes sense when values are needed on the client, other operations, such as deleting every key in a bucket, are better carried out without any involvement from the client. Do this by piping the enumerable result of keyv.scan to enum.each.
delete = client.keyv.scan(bucket: "p6124289") >> client.enum.each {
  client.keyv.delete(bucket: "p6124289")
}
delete.call
let deleteAll = client.keyv.scan(
  undefined, {bucket: "p6124289"}
).compose(
  client.enum.each(undefined, {}, () => {
    return client.keyv.delete(
      undefined, {bucket: "p6124289"}
    );
  })
);
deleteAll.call();
curl "https://proc.run/core/exec" --silent \
--header "authorization: bearer $authorization" \
--header "content-type: application/vnd.proc+json" \
--header "accept: text/plain" \
--data '[["$$", "proc", ["p6124289"]]]]]]]]]'
[...]
See enum.resolve for a way to resolve enumerators without returning enumerable values to the client.