Loading large CSV file with Papaparse not working (only first chunk loaded) - reactjs

I would like to load a local file (client-side) with papaparse into my React application. Unfortunately, it only loads the first chunk but never the whole file. My file contains about 500 rows, yet no more than 300 rows are ever loaded. It seems like the complete function is already called after the first chunk.
This is a problem because I need to navigate to another page once the file has loaded completely, and the follow-up functions need the complete data.
The code I use at the moment:
async getData() {
  const self = this;
  let dataList = [];
  Papa.parse(await this.fetchCsv(), {
    delimiter: ',',
    header: true,
    chunk: function (result, parser) {
      parser.pause();
      dataList = dataList.concat(result.data);
      parser.resume();
    },
    complete: function () {
      self.updateData(dataList);
    }
  });
}
async fetchCsv() {
  const response = await fetch(this.props.location.state.filename);
  const reader = response.body.getReader();
  const result = await reader.read(); // resolves with only the FIRST chunk of the body
  const decoder = new TextDecoder('utf-8');
  return decoder.decode(result.value);
}
What I've also tried is using step instead of chunk but this did not change anything.
Can anyone tell me what I'm doing wrong here and why papaparse does not load the whole file?

The root problem is in fetchCsv: a single call to reader.read() returns only the first chunk of the response body, so Papa Parse never sees the rest of the file. You may be able to let papaparse do more instead. It can read a local File object itself, or stream data from a remote server.
If you only have about 500 records, you may not need the complexity associated with streaming at all. This is especially true if you're just accumulating the data (which it appears you are). Use streaming primarily to process the data 1 record at a time.
If you do want to stream, I'd recommend using the "step" callback instead of the "chunk" callback so you can process each row of data.
If you use the step or chunk callbacks, then you don't need the complete callback to collect results; it still fires, but it won't carry the data.
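A rough sketch of both approaches (untested; "file" stands for a File object, e.g. from an <input type="file">, "url" for a plain address, and self.updateData for the component callback from the question; none of these names come from my answer):
// 1) Local File object: Papa Parse reads it in chunks internally,
//    so complete() only fires once the whole file has been parsed.
Papa.parse(file, {
  delimiter: ',',
  header: true,
  complete: function (results) {
    // results.data holds every row, because no step/chunk callback is set
    self.updateData(results.data);
  }
});

// 2) Remote URL: the download flag tells Papa Parse to fetch and stream it.
let dataList = [];
Papa.parse(url, {
  download: true,
  header: true,
  step: function (row) {
    dataList.push(row.data); // one parsed record at a time
  },
  complete: function () {
    self.updateData(dataList); // complete carries no data when streaming
  }
});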

Related

JSON stored into an array not callable by specific index

I am trying to develop an app for my fantasy baseball league to use for our draft (we do some quirky stuff all the major sites don't account for) - I want to pull some player data to use for the app by using MLB's API. I have been able to get the response from MLB, but can't do anything with the data after I get it back. I am trying to store the JSON into an array, and if I console.log the array as a whole, it will give me the entire chunk of data, but if I try to call the specific index value of the 1st item, it comes back as undefined.
let lastName = 'judge';
let getData = new XMLHttpRequest();
let jsonData = [];

function getPlayer() {
  getData.open('GET', `http://lookup-service-prod.mlb.com/json/named.search_player_all.bam?sport_code='mlb'&active_sw='Y'&name_part='${lastName}%25'`, true);
  getData.onload = function() {
    if (this.status === 200) {
      jsonData.push(JSON.parse(this.responseText));
    }
  };
  getData.send();
  console.log(jsonData);
}
When I change the above console.log to console.log(jsonData[0]) it comes back as undefined. If I go to the console and copy the property path, it displays as ["0"] - Either there has to be a better way to use the JSON data or storing it into an array is doing something abnormal that I haven't encountered before.
Thanks!
The jsonData array will still be empty right after the getPlayer function is called because XHR loads data asynchronously.
You need to access the data in onload handler like this (also changed URL to HTTPS to avoid protocol mismatch errors in console):
let lastName = 'judge';
let getData = new XMLHttpRequest();
let jsonData = [];

function getPlayer() {
  getData.open('GET', `https://lookup-service-prod.mlb.com/json/named.search_player_all.bam?sport_code='mlb'&active_sw='Y'&name_part='${lastName}%25'`, true);
  getData.onload = function() {
    if (this.status === 200) {
      jsonData.push(JSON.parse(this.responseText));
      // Now that we have the data...
      console.log(jsonData[0]);
    }
  };
  getData.send();
}
From the first answer to the question How to force a program to wait until an HTTP request is finished in JavaScript?:
There is a 3rd parameter to XmlHttpRequest's open(), which aims to indicate that you want the request to be asynchronous (and so handle the response through an onreadystatechange handler).
So if you want it to be synchronous (i.e. wait for the answer), just specify false for this 3rd argument.
So, you would need to change the last parameter of the open function as below:
getData.open('GET', `http://lookup-service-prod.mlb.com/json/named.search_player_all.bam?sport_code='mlb'&active_sw='Y'&name_part='${lastName}%25'`, false)
That said, you should let this method act asynchronously and handle the response directly in the onload function, as in the first snippet above; synchronous XHR blocks the page and is deprecated.
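For what it's worth, the same asynchronous flow can also be written with fetch, which returns a promise (a sketch, assuming the endpoint serves JSON, as the JSON.parse call above implies):
let lastName = 'judge';
let jsonData = [];

fetch(`https://lookup-service-prod.mlb.com/json/named.search_player_all.bam?sport_code='mlb'&active_sw='Y'&name_part='${lastName}%25'`)
  .then(response => response.json())
  .then(data => {
    jsonData.push(data);
    console.log(jsonData[0]); // the data only exists inside these callbacks
  })
  .catch(err => console.error(err));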

Avoid duplicated publication in Meteor

I'm trying to export some data into a CSV file from a MySQL database using Meteor/Sequelize. What I've done so far is to create a Meteor method, called by the client, which then calls a server-side function that returns the data, which I then parse into a CSV string. My issue is returning the data client-side.
What I did
I have my CSV String server-side and I'm using FileSaver.js which can only be used client-side.
My "solution" was to create a client-side collection in which I published the String.
methods.js
run({exportParam}) {
  if (!this.isSimulation) {
    query.booksQuery(exportParam.sorted, exportParam.filtered, 0).then(
      result => {
        let CSVArr = [];
        result.rows.forEach((value) => {
          CSVArr.push(value.dataValues);
        });
        const CSVString = Baby.unparse(CSVArr, { delimiter: ";" }); // <- CSV string
        console.log("CSVString : ", CSVString);
        Meteor.publish("CSVString", function() { // <- publication
          this.added("CSVCollection", Random.id(), {CSVString: CSVString});
          this.ready();
        });
      });
  }
},
And on the client-side I subscribe to the publication this way:
ExportButton.jsx
const handle = Meteor.subscribe('CSVString', {}, function() {
  const exportString = myTempCollection.findOne().CSVString;
  const blob = new Blob([exportString], {type: "text/plain;charset=utf-8"});
  FileSaver.saveAs(blob, "test.csv");
});
My issue
It works great the first time I click my button and a CSV file is downloaded. The problem is that if I do it again I get the same file as the first one and I get this message on my console.
Ignoring duplicate publish named 'CSVString'
I'm pretty sure the problem comes from the fact that every time I click the button the same "CSVString" publication is created.
I'd like to know if there is a solution to this problem or if my approach is wrong.
Please let me know if you need anything else.
You are correct in assuming that you are trying to publish to the same collection every time. I think you should only do the publish once, and do that separately from inserting a record into the collection.
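A rough sketch of that idea (untested; latestCSV is a variable I'm introducing here, not something from your code). Because every Meteor.subscribe call re-runs the publish function, subscribing after the method has finished will pick up the fresh string:
// server-side: registered once at startup, not inside the method
let latestCSV = null;

Meteor.publish('CSVString', function() {
  if (latestCSV !== null) {
    this.added('CSVCollection', Random.id(), { CSVString: latestCSV });
  }
  this.ready();
});

// inside the method body, replace the Meteor.publish(...) block with:
latestCSV = CSVString;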

Is it possible to save a file directly from a web worker?

I have an entirely browser-based (i.e. no backend) application which analyzes XML data in files which average about 250MB each. The actual parsing and analysis happens in a web worker, which is fed data in 64KB chunks by a FileReader instance, and this is all quite performant.
I have a request from the client to expand this application so that it can generate a .zip file containing the original input file and the results of the analysis, and allow the user to save that file to her local machine. Generating a .zip file in memory with those contents isn't a problem. The problem lies in transferring that much data from the web worker which generates it back to the main browser thread, so that it can be saved; attempting to do this invariably provokes a crash or out-of-memory exception. (I've tried transferring strings all at once and a chunk at a time, and I've tried using an ArrayBuffer as a transferable object to avoid copying. All fail in the same fashion.)
Unfortunately, I don't know any way to invoke a file save operation directly from a worker thread. I know several methods of doing so from the main browser thread, but all of them require either the ability to create DOM nodes (which worker threads of course can't do), or the use of interfaces (i.e. msSaveBlob, saveAs) which no browser seems to expose to a worker thread. I've spent a while looking for possibilities on the web, but found nothing usable; FileWriterSync looked good, but only Chrome supports it, and I need to target IE and Firefox as well.
Is there a method I've overlooked for saving files directly from a web worker? If so, what is it? Or am I just out of luck here?
tl;dr demo
You don't need to copy the entire file to the client side at all. You don't even need to transfer it, in fact. First a recap.
This is how to create a Blob from a typed array:
// Some arbitrary binary data
const mydata = new Uint16Array([1, 2, 3, 4, 5]);
// mydata vs. mydata.buffer does not seem to make any difference
const blob = new Blob([mydata], {type: "application/octet-stream"});
You can create an object URL, which is a reference to the original Blob, managed by the browser and accessible as a URL. I have done this with huge files without seeing a performance impact:
const url = URL.createObjectURL(blob);
This is how I typically download URLs:
const link = document.createElement("a");
link.download = "data.bin";
link.href = url;
link.appendChild(new Text("Download data"));
link.addEventListener("click", function() {
  this.parentNode.removeChild(this);
  // remember to free the object URL, but wait until the download is handled
  setTimeout(() => { URL.revokeObjectURL(url); }, 500);
});
document.body.appendChild(link);
You can trigger the download automatically by invoking click event on that link. I prefer to let the user decide when to download.
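If you do want it automatic, a programmatic click on the element from the snippet above should suffice (this line is my addition, not part of the original demo):
link.click(); // fires the click handler and starts the download immediately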
So, all together:
worker.js
// Some arbitrary binary data
const mydata = new Uint16Array([1, 2, 3, 4, 5]);

self.onmessage = function(e) {
  console.log("Message: ", e.data);
  switch (e.data.name) {
    case "make-download": {
      const blob = new Blob([mydata.buffer], {type: "application/octet-stream"});
      const url = URL.createObjectURL(blob);
      self.postMessage({name: "download-link", link: url});
      break;
    }
    default:
      console.error("Unknown message:", e.data.name);
  }
}
main.js
var worker = new Worker("worker.js");

worker.addEventListener("message", function(e) {
  switch (e.data.name) {
    case "download-link": {
      if (e.data.error) {
        console.error("Download error: ", e.data.error);
      }
      else {
        const link = document.createElement("a");
        link.download = "data.bin";
        link.href = e.data.link;
        link.appendChild(new Text("Download data"));
        link.addEventListener("click", function() {
          this.parentNode.removeChild(this);
          // remember to free the object URL, but wait until the download is handled
          setTimeout(() => { URL.revokeObjectURL(e.data.link); }, 500);
        });
        document.body.appendChild(link);
      }
      break;
    }
    default:
      console.error("Unknown message:", e.data.name);
  }
});

function requestDownload() {
  worker.postMessage({name: "make-download"});
}
When I click Download in my demo and open the saved file in my HEX editor, the bytes match the original data. Looks just fine :)

How to Properly Call API and Cache the Data (Node/Angular)?

I'm currently working on a project that requires me to make an API call. It only allows 500 requests / 10 minutes, but the data returned (an object with ~800 properties) only changes every few months, so I'd rather just cache it somewhere.
I'm very new to this whole thing, and I'm wondering how I can make the call every few months and store the data somewhere so that I can retrieve it from the client whenever needed.
Thanks in advance!
Since you want to store your object for a longer period of time, I would suggest writing it to disk rather than caching it in memory (in case your Node app crashes).
You didn't mention it precisely, but I assume you're referring to a plain JavaScript object that you want to store? To write such an object to disk, you can do the following:
var fs = require("fs");

// with your object being stored in the variable "myObject", after your API call:
var myObject = ....

fs.writeFile("myFilename.json", JSON.stringify(myObject), "utf8", function(err) {
  if (err) {
    return console.log(err);
  }
  // do whatever you want to do after the file has been saved...
});
To read the object from disk, simply do:
myObject = require("./myFilename.json");
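One caveat: require caches the file after the first load, so if your app rewrites myFilename.json while it is running, require will keep returning the stale object. Reading the file explicitly avoids that:
var myObject = JSON.parse(fs.readFileSync("myFilename.json", "utf8"));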

Trying to get ng-csv to work with Firebase

I have data stored on Firebase, and a function that grabs the information from Firebase and returns it as an array. I want to use ng-csv to download that data as a .csv file; however, when I download it, the file is empty.
Is it possible to use ng-csv while grabbing data from Firebase, and if so, does anyone have any examples?
Update (from OP's duplicate question):
I am trying to use ng-csv to let a user download a .csv file by clicking a button. The information is stored in Firebase, and I have created a function that returns the needed information from Firebase as an array. However, I think the problem is that when the button is clicked, the file is downloaded before the information has been pulled from Firebase, so the .csv file is always empty. Is there a way around this? Here is my code in my main.js app:
this.export = function() {
  var results = fireFactory.getResults(); // code that returns the array of objects
  results.$loaded().then(function(array) {
    var test = [];
    test.push(array[0]);
    test.push(array[1]);
    return test;
  });
};
Here is my code in my HTML file:
<button class="btn" ng-csv="main.export()" filename="test.csv">Export</button>
Is there any way to delay the file download until the information has been loaded and returned from the main.export() function?
You are almost there. Frank van Puffelen was on the right path, but stopped short of providing the fix for your code.
return test;
The statement above, inside your callback, returns the result inside a promise. That result can only be consumed by promise-aware code. Fortunately, ng-csv accepts a promise, so if the promise is returned it should work:
this.export = function() {
  var results = fireFactory.getResults();
  // Here we return the promise for consumption by ng-csv
  return results.$loaded().then(function(array) {
    var test = [];
    test.push(array[0]);
    test.push(array[1]);
    // the array is returned from the callback, not from export
    return test;
  });
};
You're being tricked by the asynchronous nature in which Firebase loads data. You seem to be thinking that return test; in your original code returns a value from the export function. But if you look more carefully you'll notice that you're actually returning from the (anonymous) callback function.
It's a bit easier to see this, if we separate the callback function out and add some logging statements:
function onDataLoaded(array) {
  console.log('Got results from Firebase');
  var test = [];
  test.push(array[0]);
  test.push(array[1]);
  return test;
}

this.export = function() {
  console.log('Starting to get results from Firebase');
  var results = fireFactory.getResults(); // code that returns the array of objects
  console.log('Started to get results from Firebase');
  results.$loaded().then(onDataLoaded);
  console.log('Registered result handler');
};
When you call fireFactory.getResults() Firebase will start downloading the data from its servers. Since this may take some time, the downloading happens asynchronously and the browser continues executing your export function, which registers a callback that you want invoked when the data from Firebase is available.
So you'll see the following output in the JavaScript console:
Starting to get results from Firebase
Started to get results from Firebase
Registered result handler
Got results from Firebase
