Converting XML to JSON in Node

2018-09-15 etl data json xml javascript node

There are many big data solutions for data transformation. A common business problem when migrating data from one cloud host to another is not only converting data from one format to another, but also discovering and altering the schema. JSON is overtaking XML as the most popular HTTP payload format for REST services. Big data ETL tools like AWS Glue can scale XML and JSON transformation to gigabytes of data. However, interchanging XML and JSON, even in complex ETL processes, may not require big data tools like MapReduce and Spark. If you only have a few hundred MB of data in your pipeline at a time, you can easily use AWS Lambda serverless functions to process data from blob storage like AWS S3.

There are plenty of poor ways to convert XML into JSON. For example, many NPM libraries load the XML into a DOM structure that can be queried like HTML. The problem with this approach is that, for large XML documents, the entire document is loaded into the heap. A better alternative is to use a SAX-like event stream to parse XML directly into JavaScript objects.
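To illustrate the SAX-like approach, here is a minimal, dependency-free sketch in Node. The class name `MiniSaxParser` and its `opentag`/`text`/`closetag` event names are hypothetical (loosely modeled on SAX conventions), and the tokenizer deliberately supports only elements, quoted attributes and text: no comments, CDATA, DOCTYPE, entity decoding, or `>` inside attribute values. What it does show is the key property: input can arrive in arbitrary chunks, and only the unconsumed tail of the input is held in memory, never a document tree.

```javascript
const { EventEmitter } = require('events');
const { StringDecoder } = require('string_decoder');

// Hypothetical minimal SAX-style tokenizer. Supports elements, quoted
// attributes and text only: no comments, CDATA, DOCTYPE or entity decoding,
// and no '>' inside attribute values.
class MiniSaxParser extends EventEmitter {
  constructor() {
    super();
    this.decoder = new StringDecoder('utf8'); // handles UTF-8 chars split across Buffer chunks
    this.buffer = ''; // only the unconsumed tail of the input is ever held
  }

  write(chunk) {
    this.buffer += typeof chunk === 'string' ? chunk : this.decoder.write(chunk);
    let pos = 0;
    while (true) {
      const lt = this.buffer.indexOf('<', pos);
      if (lt === -1) break;
      const gt = this.buffer.indexOf('>', lt);
      if (gt === -1) break; // tag split across chunks: wait for more input
      // Text is trimmed; whitespace-only text between tags is dropped.
      const text = this.buffer.slice(pos, lt).trim();
      if (text) this.emit('text', text);
      this.handleTag(this.buffer.slice(lt + 1, gt));
      pos = gt + 1;
    }
    this.buffer = this.buffer.slice(pos); // keep partial text or a partial tag
  }

  handleTag(inner) {
    if (inner.startsWith('?')) return; // skip the <?xml ...?> declaration
    if (inner.startsWith('/')) {
      this.emit('closetag', inner.slice(1).trim());
      return;
    }
    const selfClosing = inner.endsWith('/');
    const body = selfClosing ? inner.slice(0, -1) : inner;
    const name = body.trim().split(/\s+/)[0];
    const attributes = {};
    const attrRe = /([^\s=]+)\s*=\s*(?:"([^"]*)"|'([^']*)')/g;
    let m;
    while ((m = attrRe.exec(body)) !== null) attributes[m[1]] = m[2] ?? m[3];
    this.emit('opentag', { name, attributes });
    if (selfClosing) this.emit('closetag', name);
  }

  end() {
    this.buffer += this.decoder.end();
    if (this.buffer.trim()) this.emit('error', new Error('Unexpected end of XML input'));
    this.emit('end');
  }
}

// Usage: feed the document in awkward chunks, as a stream would deliver it.
const parser = new MiniSaxParser();
const events = [];
parser.on('opentag', (tag) => events.push(`open:${tag.name}`));
parser.on('text', (text) => events.push(`text:${text}`));
parser.on('closetag', (name) => events.push(`close:${name}`));
for (const chunk of ['<?xml version="1.0"?><orders><or', 'der id="7">wid', 'get</order><empty/></orders>']) {
  parser.write(chunk);
}
parser.end();
console.log(events.join(' '));
// → open:orders open:order text:widget close:order open:empty close:empty close:orders
```

Because a tag can be split across chunk boundaries, `write()` leaves any incomplete tag in its buffer until the next chunk arrives, so the chunks could come from `fs.createReadStream` or an S3 object body without changing the parser. A production parser such as the `sax` package on NPM covers the XML features this sketch omits.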

I’ve implemented a fast, memory-efficient, event-based XML stream parser that generates the JSON equivalent in pure Node. Take a look at the code here.
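One way an event-based parser can produce the JSON equivalent is a stack-based builder: each open tag pushes a new object, text is attached to the object on top, and each close tag pops it into its parent. The sketch below is a hypothetical illustration, not the implementation referenced above. The class name `JsonBuilder` and its `onOpenTag`/`onText`/`onCloseTag` methods are assumptions, and so is the mapping convention: attributes become `"@name"` keys, text becomes `"#text"`, text-only elements collapse to strings, and repeated sibling elements become arrays. Any SAX-style parser's open-tag, text and close-tag events could be forwarded to these methods.

```javascript
// Hypothetical builder that turns SAX-style callbacks into a plain object.
// Assumed mapping convention (one of several in use): attributes become
// "@name" keys, text becomes "#text", text-only elements collapse to strings,
// and repeated sibling elements become arrays.
class JsonBuilder {
  constructor() {
    this.root = {};
    // Only the chain of currently open elements is kept on the stack.
    this.stack = [{ name: null, node: this.root }];
  }

  onOpenTag(name, attributes = {}) {
    const node = {};
    for (const [key, value] of Object.entries(attributes)) node[`@${key}`] = value;
    this.stack.push({ name, node });
  }

  onText(text) {
    const { node } = this.stack[this.stack.length - 1];
    node['#text'] = (node['#text'] || '') + text;
  }

  onCloseTag(name) {
    if (this.stack.length === 1) throw new Error(`Unexpected </${name}>`);
    const { name: openName, node } = this.stack.pop();
    if (openName !== name) throw new Error(`Expected </${openName}>, got </${name}>`);
    const keys = Object.keys(node);
    let value = node;
    if (keys.length === 0) value = ''; // empty element
    else if (keys.length === 1 && keys[0] === '#text') value = node['#text']; // text only
    const { node: parent } = this.stack[this.stack.length - 1];
    if (!Object.prototype.hasOwnProperty.call(parent, name)) parent[name] = value;
    else if (Array.isArray(parent[name])) parent[name].push(value);
    else parent[name] = [parent[name], value]; // second sibling: promote to array
  }
}

// Usage: replay the events a parser would emit for two <order> elements.
const builder = new JsonBuilder();
builder.onOpenTag('orders');
builder.onOpenTag('order', { id: '7' });
builder.onText('widget');
builder.onCloseTag('order');
builder.onOpenTag('order', { id: '8' });
builder.onText('gadget');
builder.onCloseTag('order');
builder.onCloseTag('orders');
console.log(JSON.stringify(builder.root));
// → {"orders":{"order":[{"@id":"7","#text":"widget"},{"@id":"8","#text":"gadget"}]}}
```

The array rule is a trade-off: a single `<order>` yields an object while two yield an array, so consumers that always expect arrays would need a schema or a list of element names that should always be arrays.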