How to Improve REST API Performance and Deal With the Data Overload
Not so long ago I started exploring GraphQL: a fresh idea and a fresh approach to exchanging data between clients and servers. It's great to know that new technologies appear almost every day. I've completed several tutorials and noticed that their authors always mention one vivid example of how GraphQL takes advantage over a REST API: with the REST approach, the client gets loaded with excess data, which hurts the app's performance. I've been thinking a lot about how to solve this issue without switching technologies. After all, you can't always change the stack so rapidly without a strong reason to do it: there are old projects that would require colossal resources and expenses to move to GraphQL, and it's not always appropriate to do so.
I have been writing Express.js + MongoDB REST APIs for two years already, so I decided to tackle this problem and share my ideas in this article. All examples and solutions are based on the Express.js + MongoDB + JavaScript (for the client) stack. However, I think the basic idea applies to any technology.
Issue 1. We get more than we've requested
Assume that the system has a User entity. This user has an extensive profile with a number of fields (a detailed description of the user as a person): name, last name, address, phone number, and some other additional information. If the system integrates with 3rd-party services, you will also see additional fields such as services.
On the client, you need a simple form autocompletion: find a specific user by their name and pass their id to the system. To do this, we'll write a route such as api/v1/users?search=Richard.
This route is going to return all the users named Richard. But along with the username, it returns the entire User object:
{
  _id: 1,
  profile: {
    firstName,
    lastName,
    address,
    ...
    phone,
  },
  services: { ... }
}
The firstName and lastName fields are enough to display our autocompletion. The more data we transfer, the longer the loading takes. Besides, if you're dealing with mobile clients on a slow connection, this can become a real problem.
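For context, a naive implementation of such a search route might look like the sketch below (the regex-based matching and the User model wiring are my assumptions, not part of the original API); it returns whole documents, which is exactly the overload described above:

// A hypothetical naive route: returns complete User documents for every match.
api.get('/', async (req, res, next) => {
  const { search = '' } = req.query;
  // Assumption: match the first name case-insensitively.
  const users = await User.find({ 'profile.firstName': new RegExp(search, 'i') });
  res.status(200).send({ users }); // every field of every user goes over the wire
});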
Solution
The first thing that comes to mind is to write a new route like api/v1/users/for-autocomplete. But the more routes, the larger your API grows, so this turns out to be not such a good technique. In fact, the solution is trivial. Since we've already got the query param search, we can add fields params as well. They are responsible for selecting the fields you need, where 1 marks a field you need and -1 marks a field you don't. Now the API request will look as follows: api/v1/users?search=Richard&fields.profile.firstName=1&fields.profile.lastName=1
Such a request query looks a bit odd at first sight, and you might assume that writing it by hand would be painful. Besides, why do you need it at all? The truth is that this method is quite convenient and really not problematic: a single query (or withQuery) helper is enough to generate your custom parameters string.
const _ = require('lodash');

const toString = v => (_.isDate(v) && v.toISOString()) || v;

const isObject = (v) => {
  if (_.isDate(v)) {
    return false;
  }
  return _.isObject(v);
};

const query = (data, fields = {}) => {
  // Recursively turn a (possibly nested) object into ['a.b=1', 'c=2', ...]
  const toQueryParam = (obj, prevName = '') =>
    _.keys(obj).map(key =>
      (isObject(obj[key]) && toQueryParam(obj[key], `${prevName}${key}.`))
      || `${prevName}${key}=${toString(obj[key]) || ''}`
    );

  const objAsParams = _.flatten(toQueryParam(data)).join('&');
  const fieldsAsParams = _.flatten(toQueryParam(fields, 'fields.')).join('&');
  const separator = (!_.isEmpty(fields) && '&') || '';

  return `?${objAsParams}${separator}${fieldsAsParams}`;
};
We use a couple of extra type checks here because Date values need special handling: your date format may differ, but in general it's better to send dates over the web in ISO format.
Now, let’s test our magic method...:
console.log(query({ limit: 10, from: new Date('12/12/2012') }));
console.log(query({ limit: 10 }, { profile: { firstName: 1 } }));
console.log(query({ limit: 10 }, { 'profile.firstName': 1 }));
...And get the desirable result:
?limit=10&from=2012-12-11T22:00:00.000Z
?limit=10&fields.profile.firstName=1
?limit=10&fields.profile.firstName=1
Now, when requesting data, you can build requests that return exactly what you need in a convenient way.
get(`api/v1/users${query({ search: 'Richard'}, { profile: { firstName: 1, lastName: 1 } })}`)
or
get(`api/v1/users${query({ search: 'Richard'}, { 'profile.firstName': 1, 'profile.lastName': 1 })}`)
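The get call above is just a placeholder for whatever HTTP client you use; as an assumption, here is roughly how it could look with the browser's fetch:

// A hypothetical thin wrapper; `get` in the examples above is assumed to be something like this.
const get = url =>
  fetch(url, { method: 'GET' })
    .then(res => {
      if (!res.ok) {
        throw new Error(`Request failed with status ${res.status}`);
      }
      return res.json();
    });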
Now let's talk about our API. The main idea is to parse the fields object out of the query params and filter the returned fields exactly as the API user defined them. If no fields are defined, we return the entire object. You can find a small Express.js-based implementation of this idea below.
First of all, we need to extract the fields from the query:
api.get('/', (req, res, next) => {
  const { fields } = req.query; // <- magic here
});
In fact, there's no magic at all. We simply receive an object like this:
{
  'search': 'Richard',
  'fields.profile.firstName': '1',
  'fields.profile.lastName': '-1',
}
Consequently, we need another helper that extracts the fields object from the query and separates it from the other parameters. This method will come in handy in all other requests, which is why we recommend creating a middleware:
const _ = require('lodash');
const DataObjectParser = require('dataobject-parser'); // e.g. the dataobject-parser package

const queryToObject = query => {
  const obj = {};
  _.keys(query).forEach(key => _.set(obj, key, query[key]));
  return obj;
};

const withFields = (req, res, next) => {
  const query = queryToObject(req.query); // <- parse query to { search, fields: { profile: {...} } }
  const fields = DataObjectParser.untranspose(query.fields); // flatten fields back to { 'profile.firstName': '-1' }
  _.keys(fields || {}).forEach(key =>
    _.extend(fields, { [key]: parseInt(fields[key], 10) })); // parse values to Int
  _.extend(req, { fields });
  next();
};
We have to go through all of these steps because we're dealing with MongoDB here: its projection expects a flat object with dot-notation keys and numeric values.
We can now use withFields for each route and be sure that we've got access to req.fields. The next step is data selection and filtering; thanks to MongoDB's projections, which we're going to use here, we can mark this task as done too.
api.get('/', withFields, async (req, res, next) => {
  const users = await User.find({ }, req.fields);
  res.status(200).send({ users });
});

api.get('/:_id', withFields, ...);
Now that MongoDB receives fields as a projection, our route returns only the filtered fields. As a result, we've got a universal, practical way of choosing the fields we need right from the client. This technique suits all GET queries for all of your API entities.
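To tie it together, here is a rough sketch of what the client now sends and roughly what it gets back (the response values are made up for illustration):

// Client: ask only for what the autocomplete needs.
get(`api/v1/users${query({ search: 'Richard' }, { profile: { firstName: 1, lastName: 1 } })}`);

// Roughly the response we can expect now (values are illustrative):
// {
//   users: [
//     { _id: 1, profile: { firstName: 'Richard', lastName: 'Roe' } },
//     ...
//   ]
// }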
Issue 2. Why so many calls?
Let's assume that users belong to a certain organization in our system, and we need to display that organization for each user in the list. Here's the issue: our entities are separated. The user is stored in one collection and only keeps a reference (organizationId) to the organization, which lives in its own collection (how this looks depends on the DB you're using).
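For reference, here is a minimal sketch of how the two Mongoose schemas might look (the field names beyond organizationId are assumptions); the ref on organizationId is what will later let populate do its job:

const mongoose = require('mongoose');

// Hypothetical Organization schema.
const organizationSchema = new mongoose.Schema({
  title: String,
});

// Hypothetical User schema; organizationId references the Organization collection.
const userSchema = new mongoose.Schema({
  profile: {
    firstName: String,
    lastName: String,
    address: String,
    phone: String,
  },
  organizationId: { type: mongoose.Schema.Types.ObjectId, ref: 'Organization' },
});

const Organization = mongoose.model('Organization', organizationSchema);
const User = mongoose.model('User', userSchema);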
So what should we do in this case? The first thing that comes to mind is to fetch the list of users and then, with an organizationId in hand, fetch the organization:
GET -> api/v1/users
GET -> api/v1/organization/_id
The problem is that displaying a single user in the list now requires an additional request. If we have several relations of this kind, we end up with multiple requests, and every extra round trip to the server adds latency.
Solution
What if we return all the data in a single request? The main idea is to return every existing dependency embedded in the object, alongside its foreignKey. Since most databases allow us to do this, we can make our API do the same thing.
According to this method, your route users
should look like this:
api.get('/', withFields, async (req, res, next) => {
  const users = await User.find({ }, req.fields);
  for (let i = 0; i < users.length; i++) {
    const organization = await Organization.findOne({ _id: users[i].organizationId });
    _.extend(users[i], { organization });
  }
  res.status(200).send({ users });
});
However, writing this code snippet for each of your routes is not the best way out. Maybe MongoDB can help?
There's a great $lookup (aggregation) mechanism that allows you to make selections from other collections. Mongoose went even further and provides the populate mechanism. With its help, all this code is converted into one line (if you have described the schemas correctly, of course).
api.get('/', withFields, async (req, res, next) => {
  const users = await User.find({ }, req.fields)
    .populate({
      path: 'organizationId',
      select: _.keys(req.fields).join(' '),
    });
  res.status(200).send({ users });
});
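For comparison, the raw aggregation that populate hides from us might look roughly like the sketch below; it assumes the joined collection is named 'organizations':

// A sketch of the equivalent $lookup aggregation stage.
const users = await User.aggregate([
  {
    $lookup: {
      from: 'organizations',       // assumed collection name
      localField: 'organizationId',
      foreignField: '_id',
      as: 'organization',          // note: $lookup puts the joined documents into an array
    },
  },
]);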
You can also filter the organization's fields, following the same principle described above, if you don't need all of them.
Also, remember that returning the entire collection to the client is not the best solution. Don't forget to add restrictions with .skip() and .limit().
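For example, pagination could be wired to query params in roughly this way (the skip and limit parameter names and the default values are my assumptions):

api.get('/', withFields, async (req, res, next) => {
  // Hypothetical pagination params; the defaults and the cap are arbitrary.
  const skip = parseInt(req.query.skip, 10) || 0;
  const limit = Math.min(parseInt(req.query.limit, 10) || 20, 100);
  const users = await User.find({ }, req.fields)
    .skip(skip)
    .limit(limit);
  res.status(200).send({ users });
});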
One small tip is left. When Mongoose runs populate, it replaces your organizationId field with the full organization object. Such a small alteration can confuse the users of your API: they'll receive the following object:
{
  _id: 1,
  profile: {...},
  organizationId: { _id, title, ... }
}
Good news, everyone: we can redefine the toObject method, which is called every time the object is prepared to be sent to the client. So we can modify the object a bit with this small piece of code:
schema.set('toObject', {
  versionKey: false,
  transform: (doc, ret) => {
    if (_.get(ret, 'organizationId._id')) {
      ret.organization = ret.organizationId;
      ret.organizationId = ret.organization._id;
    }
  },
});
Thus, the client will receive a much clearer object:
{
  _id: 1,
  profile: {...},
  organizationId: 1,
  organization: { _id: 1, title, ... }
}
In addition, we can create a switch (similar to what we've done with the fields param) to return objects either with their dependencies or in their pure form; a rough sketch is below.
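A minimal sketch of what such a switch could look like, assuming a hypothetical populate query param (the parameter name and the middleware are my assumptions, not part of the original API):

// Hypothetical middleware: the client opts into embedded dependencies with ?populate=1.
const withPopulate = (req, res, next) => {
  req.withPopulate = req.query.populate === '1';
  next();
};

api.get('/', withFields, withPopulate, async (req, res, next) => {
  let query = User.find({ }, req.fields);
  if (req.withPopulate) {
    query = query.populate({ path: 'organizationId' }); // only join when asked to
  }
  const users = await query;
  res.status(200).send({ users });
});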
Conclusion
In this article, I've provided some examples of how to improve your API performance without spending much time on development. Implementing these ideas will not take you long, and using these approaches will reduce the amount of data transferred between the client and the server.