Grunt: copying files very slow. How to improve performance? - angularjs

I've inherited application code that uses Grunt (1.0.0) to build its AngularJS front-end.
What surprised me is that the 'copy' build step (implemented with the grunt-contrib-copy plugin) takes a very long time: more than 1 minute, while I would expect it to take less than a second.
Here are the execution time statistics for grunt build, including the problematic copy tasks:
loading tasks 1.4s - 2%
uglify:build 14.4s ---------- 16%
copy:common 1m 6.4s ---------------------------------------- 76%
copy:partner_xxxxx 4.9s --- 6%
Total 1m 27.9s
The number of copied files seems reasonable:
Running "copy:common" (copy) task
Created 12 directories, copied 179 files
Copying this same destination folder in Windows Explorer takes less than 1 second (drive is a fast SSD).
Here's how the Grunt task is defined:
copy: {
    common: {
        cwd: '.',
        src: [
            '**/*.html',
            '**/*.json',
            '**/*.cur',
            '**/partials/**/*.js',
            '**/directives/**/*.js',
            '**/app-services/**/*.js',
            '**/main-scripts/**/*.js',
            '**/bundles/**',
            '**/images/**',
            '**/utils/**',
            '!**/tests/**',
            '!**/partner-info/**',
            '!**/bower_components/**',
            '!**/node_modules/**',
            '!bower.json',
            '!package.json'
        ],
        dest: publishDest + "//<%= grunt.option('partnerName') %>"
    },
    expand: true
}
My question is: is it normal for Grunt to be this slow? Are there any gotchas that may slow down this process? Do you see any ways to improve this time?

Related

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JS heap out of memory [duplicate]

Today I ran my filesystem indexing script to refresh the RAID file index, and after 4 hours it crashed with the following error:
[md5:] 241613/241627 97.5%
[md5:] 241614/241627 97.5%
[md5:] 241625/241627 98.1%
Creating missing list... (79570 files missing)
Creating new files list... (241627 new files)
<--- Last few GCs --->
11629672 ms: Mark-sweep 1174.6 (1426.5) -> 1172.4 (1418.3) MB, 659.9 / 0 ms [allocation failure] [GC in old space requested].
11630371 ms: Mark-sweep 1172.4 (1418.3) -> 1172.4 (1411.3) MB, 698.9 / 0 ms [allocation failure] [GC in old space requested].
11631105 ms: Mark-sweep 1172.4 (1411.3) -> 1172.4 (1389.3) MB, 733.5 / 0 ms [last resort gc].
11631778 ms: Mark-sweep 1172.4 (1389.3) -> 1172.4 (1368.3) MB, 673.6 / 0 ms [last resort gc].
<--- JS stacktrace --->
==== JS stack trace =========================================
Security context: 0x3d1d329c9e59 <JS Object>
1: SparseJoinWithSeparatorJS(aka SparseJoinWithSeparatorJS) [native array.js:~84] [pc=0x3629ef689ad0] (this=0x3d1d32904189 <undefined>,w=0x2b690ce91071 <JS Array[241627]>,L=241627,M=0x3d1d329b4a11 <JS Function ConvertToString (SharedFunctionInfo 0x3d1d3294ef79)>,N=0x7c953bf4d49 <String[4]\: ,\n >)
2: Join(aka Join) [native array.js:143] [pc=0x3629ef616696] (this=0x3d1d32904189 <undefin...
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
1: node::Abort() [/usr/bin/node]
2: 0xe2c5fc [/usr/bin/node]
3: v8::Utils::ReportApiFailure(char const*, char const*) [/usr/bin/node]
4: v8::internal::V8::FatalProcessOutOfMemory(char const*, bool) [/usr/bin/node]
5: v8::internal::Factory::NewRawTwoByteString(int, v8::internal::PretenureFlag) [/usr/bin/node]
6: v8::internal::Runtime_SparseJoinWithSeparator(int, v8::internal::Object**, v8::internal::Isolate*) [/usr/bin/node]
7: 0x3629ef50961b
The server is equipped with 16 GB of RAM and 24 GB of SSD swap. I highly doubt my script exceeded 40 GB of memory. At least it shouldn't.
The script creates an index of files, stored as an Array of Objects holding file metadata (modification dates, permissions, etc., no big data).
Here's the full script code:
http://pastebin.com/mjaD76c3
I've already experienced weird Node issues with this script in the past, which forced me, for example, to split the index into multiple files because Node was glitching when handling such big files as a String. Is there any way to improve Node.js memory management with huge datasets?
If I remember correctly, V8 has a strict default memory limit of around 1.7 GB unless you raise it manually.
In one of our products we followed this solution in our deploy script:
node --max-old-space-size=4096 yourFile.js
There is also a corresponding flag for the new space, but as I read in a-tour-of-v8-garbage-collection, the new space only holds newly created, short-lived data, while the old space contains all referenced data structures, so raising the old space limit should be the best option in your case.
If you want to increase the memory limit for Node globally, not just for a single script, you can export an environment variable like this:
export NODE_OPTIONS=--max_old_space_size=4096
Then you do not need to edit any files when running builds such as npm run build.
Just in case anyone runs into this in an environment where they cannot set node properties directly (in my case a build tool):
NODE_OPTIONS="--max-old-space-size=4096" node ...
You can set the node options using an environment variable if you cannot pass them on the command line.
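One way to confirm that NODE_OPTIONS actually reaches a Node process started by another tool is to spawn a child process and ask it for its V8 heap limit. This is only a minimal sketch: `heapLimitMb` is a hypothetical helper name, and the 256 and 512 values are arbitrary test sizes, not recommendations.

```javascript
const { execFileSync } = require('child_process');

// Hypothetical helper: start a child Node process with NODE_OPTIONS set and
// return the V8 heap limit (in MB) that the child itself reports.
function heapLimitMb(oldSpaceMb) {
  const env = { ...process.env, NODE_OPTIONS: `--max-old-space-size=${oldSpaceMb}` };
  const script = "console.log(require('v8').getHeapStatistics().heap_size_limit)";
  const out = execFileSync(process.execPath, ['-e', script], { env, encoding: 'utf8' });
  return Number(out.trim()) / (1024 * 1024);
}

// The reported limit should grow with the flag value (it also includes space
// for the young generation, so it will not match the flag exactly).
console.log('limit with 256 MB flag:', Math.round(heapLimitMb(256)), 'MB');
console.log('limit with 512 MB flag:', Math.round(heapLimitMb(512)), 'MB');
```

If both calls print the same number, the variable is probably not being passed through to the child process.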
Here are some flag values, as additional info on how to allow more memory when you start up your Node server.
1GB - 8GB
#increase to 1gb
node --max-old-space-size=1024 index.js
#increase to 2gb
node --max-old-space-size=2048 index.js
#increase to 3gb
node --max-old-space-size=3072 index.js
#increase to 4gb
node --max-old-space-size=4096 index.js
#increase to 5gb
node --max-old-space-size=5120 index.js
#increase to 6gb
node --max-old-space-size=6144 index.js
#increase to 7gb
node --max-old-space-size=7168 index.js
#increase to 8gb
node --max-old-space-size=8192 index.js
I just faced the same problem on my EC2 t2.micro instance, which has 1 GB of memory.
I resolved the problem by creating a swap file (following the linked guide) and setting the following environment variable.
export NODE_OPTIONS=--max_old_space_size=4096
After that, the problem went away.
I hope this will be helpful in the future.
I was struggling with this even after setting --max-old-space-size.
Then I realised that the --max-old-space-size option needs to be placed before the karma script.
It is also best to specify both syntaxes, --max-old-space-size and --max_old_space_size. My script for karma:
node --max-old-space-size=8192 --optimize-for-size --max-executable-size=8192 --max_old_space_size=8192 --optimize_for_size --max_executable_size=8192 node_modules/karma/bin/karma start --single-run --max_new_space_size=8192 --prod --aot
Reference: https://github.com/angular/angular-cli/issues/1652
I encountered this issue when trying to debug with VSCode, so just wanted to add this is how you can add the argument to your debug setup.
You can add it to the runtimeArgs property of your config in launch.json.
See example below.
{
    "version": "0.2.0",
    "configurations": [
        {
            "type": "node",
            "request": "launch",
            "name": "Launch Program",
            "program": "${workspaceRoot}\\server.js"
        },
        {
            "type": "node",
            "request": "launch",
            "name": "Launch Training Script",
            "program": "${workspaceRoot}\\training-script.js",
            "runtimeArgs": [
                "--max-old-space-size=4096"
            ]
        }
    ]
}
I had a similar issue while doing AOT angular build. Following commands helped me.
npm install -g increase-memory-limit
increase-memory-limit
Source: https://geeklearning.io/angular-aot-webpack-memory-trick/
I just want to add that on some systems, even increasing the Node memory limit with --max-old-space-size is not enough, and there is an OS error like this:
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Aborted (core dumped)
In this case, it is probably because you reached the maximum number of memory mappings (mmap) per process.
You can check the max_map_count by running
sysctl vm.max_map_count
and increase it by running
sysctl -w vm.max_map_count=655300
and make it persist across reboots by adding this line
vm.max_map_count=655300
in /etc/sysctl.conf file.
Check here for more info.
A good way to analyse the error is to run the process with strace:
strace node --max-old-space-size=128000 my_memory_consuming_process.js
I faced this same problem recently and came across this thread, but my problem was with a React app. The changes below to the node start command solved my issue.
Syntax
node --max-old-space-size=<size> path-to/fileName.js
Example
node --max-old-space-size=16000 scripts/build.js
Why is the size 16000 in max-old-space-size?
Basically, it depends on the memory allocated to that process and on your Node settings.
How to verify and choose the right size?
This is determined by the V8 engine. The code below helps you find out the heap size available to your local Node V8 engine.
const v8 = require('v8');
const totalHeapSize = v8.getHeapStatistics().total_available_size;
const totalHeapSizeGb = (totalHeapSize / 1024 / 1024 / 1024).toFixed(2);
console.log('totalHeapSizeGb: ', totalHeapSizeGb);
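As a complement to the snippet above, the following sketch reports the configured heap limit next to the heap currently in use, which makes it easier to see whether a --max-old-space-size value took effect. `heapReport` is a hypothetical helper name; the fields it reads (`heap_size_limit`, `used_heap_size`, `total_available_size`) come from Node's built-in `v8.getHeapStatistics()`.

```javascript
const v8 = require('v8');

// Hypothetical helper: summarize V8 heap statistics in megabytes.
function heapReport() {
  const stats = v8.getHeapStatistics();
  const toMb = (bytes) => Math.round(bytes / 1024 / 1024);
  return {
    limitMb: toMb(stats.heap_size_limit),          // upper bound V8 will grow the heap to
    usedMb: toMb(stats.used_heap_size),            // heap currently occupied by live data
    availableMb: toMb(stats.total_available_size), // room left before hitting the limit
  };
}

console.log(heapReport());
```

Running the script once without the flag and once with, for example, node --max-old-space-size=4096 should show a different limitMb value.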
Steps to fix this issue (In Windows) -
Open command prompt and type %appdata% press enter
Navigate to %appdata% > npm folder
Open or Edit ng.cmd in your favorite editor
Add --max_old_space_size=8192 to the IF and ELSE block
Your ng.cmd file looks like this after the change:
@IF EXIST "%~dp0\node.exe" (
  "%~dp0\node.exe" "--max_old_space_size=8192" "%~dp0\node_modules\@angular\cli\bin\ng" %*
) ELSE (
  @SETLOCAL
  @SET PATHEXT=%PATHEXT:;.JS;=;%
  node "--max_old_space_size=8192" "%~dp0\node_modules\@angular\cli\bin\ng" %*
)
I recently ran into the same problem in one of my projects. I tried a couple of things that anyone can try as debugging steps to identify the root cause:
As everyone suggested, increase the memory limit in Node by adding this command:
{
"scripts":{
"server":"node --max-old-space-size={size-value} server/index.js"
}
}
Here the size-value I defined for my application was 1536 (as my Kubernetes pod had a 2 GB memory limit and a 1.5 GB request).
So always define the size-value based on your frontend infrastructure/architecture limit (a little less than the limit).
One strict callout for the command above: put --max-old-space-size right after the node command, not after the filename server/index.js.
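The "a little less than the limit" rule can be written down as a small calculation. This is a sketch under an assumption: the 0.75 fraction is simply what reproduces the 2 GB limit and 1536 value mentioned here, not an official recommendation, and `oldSpaceFlagFor` is a hypothetical helper name.

```javascript
// Hypothetical helper: derive a --max-old-space-size flag from a container
// memory limit, leaving headroom for the rest of the process (stack, buffers,
// native allocations). The default fraction of 0.75 is an assumption.
function oldSpaceFlagFor(containerLimitMb, fraction = 0.75) {
  if (!(containerLimitMb > 0) || !(fraction > 0 && fraction < 1)) {
    throw new RangeError('expected a positive limit and a fraction between 0 and 1');
  }
  return `--max-old-space-size=${Math.floor(containerLimitMb * fraction)}`;
}

console.log(oldSpaceFlagFor(2048)); // prints --max-old-space-size=1536
```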
If you have an nginx config file, then check the following things:
worker_connections: 16384 (for heavy frontend applications)
[nginx default is 512 connections per worker, which is too low for modern applications]
use: epoll (efficient method) [nginx supports a variety of connection processing methods]
http: add the following directives so that your workers are not kept busy handling unwanted tasks (client_body_timeout, reset_timedout_connection, client_header_timeout, keepalive_timeout, send_timeout).
Remove or turn off all logging/tracking middlewares such as APM, Kafka, UTM tracking, Prerender (SEO), etc.
Now code-level debugging: in your main server file, remove unneeded console.log calls that just print a message.
Now check every server route, i.e. app.get(), app.post() ..., for the scenarios below:
data => if(data) res.send(data) // do you really need to check for data, or does the API return something in the response that you have to wait for? If not, then modify it like this:
data => res.send(data) // this will not block your thread; apply it everywhere it's needed
else part: if no error is coming, then simply return res.send({}), with NO console.log here.
error part: some people name it error and others err, which creates confusion and mistakes like these:
`error => { next(err) } // here err is undefined`
`err => {next(error) } // here error is undefined`
`app.get(API , (re,res) =>{
error => next(error) // here next is not defined
})`
Remove winston, elastic-apm-node, and other unused libraries, using the npx depcheck command to find them.
In the axios service file, check whether the methods and logging are written properly, for example:
if(successCB) console.log("success") successCB(response.data) // this statement is wrong: the if only guards the logging, and successCB is called outside the if block, so it also runs in the failure case.
Avoid using stringify, parse, etc. on excessively large datasets (which I can see in the logs you posted above, too).
Last but not least, every time your application crashes or your pods restart, check the logs. In the log, specifically look for this section: Security context
This will tell you why and where the crash happened and who the culprit is.
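The point about not stringifying or joining a very large dataset in one go can be illustrated with a short sketch. It writes records as newline-delimited JSON in small batches, so no single string ever has to hold the whole index. The names `writeJsonLinesSync` and `fakeIndex`, the batch size, and the output file name are all hypothetical and are not taken from the asker's script.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: write records as one JSON document per line, flushing
// every `batchSize` records. Returns the number of records written.
function writeJsonLinesSync(file, records, batchSize = 1000) {
  const fd = fs.openSync(file, 'w');
  let written = 0;
  try {
    let batch = [];
    for (const record of records) {
      batch.push(JSON.stringify(record));
      if (batch.length >= batchSize) {
        fs.writeSync(fd, batch.join('\n') + '\n');
        written += batch.length;
        batch = [];
      }
    }
    if (batch.length > 0) {
      // Flush the final, partially filled batch.
      fs.writeSync(fd, batch.join('\n') + '\n');
      written += batch.length;
    }
  } finally {
    fs.closeSync(fd);
  }
  return written;
}

// Made-up example data: a generator, so the records are never all in memory.
function* fakeIndex(count) {
  for (let i = 0; i < count; i++) {
    yield { path: `/raid/file-${i}`, mode: 0o644 };
  }
}

const target = path.join(os.tmpdir(), 'index-sketch.jsonl');
console.log(writeJsonLinesSync(target, fakeIndex(2500)), 'records written to', target);
```

Only one batch is joined into a string at a time, so peak string size is bounded by the batch size rather than by the size of the dataset.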
I will mention two solutions.
My solution: in my case I added this to my environment variables:
export NODE_OPTIONS=--max_old_space_size=20480
But even after restarting my computer it still did not work. My project folder was on the d:\ drive, so I moved my project to the c:\ drive and it worked.
My teammate's solution: this package.json configuration also worked.
"start": "rimraf ./build && react-scripts --expose-gc --max_old_space_size=4096 start",
For other beginners like me who didn't find any suitable solution for this error: check which Node build is installed (32-bit x86 or 64-bit x64). I have a 64-bit CPU and had installed the x86 Node version, which caused the CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory error.
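To check which build is actually running, Node exposes the architecture its binary was compiled for as `process.arch` (for example 'ia32' for a 32-bit x86 build and 'x64' for a 64-bit one). A minimal sketch follows; `describeNodeBuild` is a hypothetical helper name, and its list of 64-bit architectures is an assumption that may not be exhaustive.

```javascript
// Hypothetical helper: report the architecture of the running Node binary.
function describeNodeBuild() {
  // Assumed list of 64-bit values for process.arch; extend it if needed.
  const sixtyFourBit = new Set(['x64', 'arm64', 'ppc64', 's390x', 'riscv64', 'loong64']);
  return {
    version: process.version,
    arch: process.arch,
    is64Bit: sixtyFourBit.has(process.arch),
  };
}

console.log(describeNodeBuild());
```

If this reports a 32-bit architecture on a 64-bit machine, installing the 64-bit build is worth trying before tuning any memory flags.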
If you want to change the memory limit globally for Node on Windows, go to Advanced system settings -> Environment Variables -> New user variable:
variable name = NODE_OPTIONS
variable value = --max-old-space-size=4096
You can also set the environment variable from PowerShell (this applies to the current session only):
$env:NODE_OPTIONS="--max-old-space-size=8192"
Unix (Mac OS)
Open a terminal and open our .zshrc file using nano like so (this will create one, if one doesn't exist):
nano ~/.zshrc
Update our NODE_OPTIONS environment variable by adding the following line into our currently open .zshrc file:
export NODE_OPTIONS=--max-old-space-size=8192 # increase node memory limit
Please note that we can set the number of megabytes passed in to whatever we like, provided our system has enough memory (here we are passing in 8192 megabytes which is roughly 8 GB).
Save and exit nano by pressing: ctrl + x, then y to agree and finally enter to save the changes.
Close and reopen the terminal to make sure our changes have been recognised.
We can print out the contents of our .zshrc file to see if our changes were saved like so: cat ~/.zshrc.
Linux (Ubuntu)
Open a terminal and open the .bashrc file using nano like so:
nano ~/.bashrc
The remaining steps are similar to the Mac steps above, except that we would most likely be using ~/.bashrc by default (as opposed to ~/.zshrc), so substitute the file name accordingly.
Use the option --optimize-for-size. It's going to focus on using less ram.
I had this error on AWS Elastic Beanstalk, upgrading instance type from t3.micro (Free tier) to t3.small fixed the error
In my case, I upgraded node.js version to latest (version 12.8.0) and it worked like a charm.
Upgrade node to the latest version. I was on node 6.6 with this error and upgraded to 8.9.4 and the problem went away.
For Angular, this is how I fixed it.
In package.json, inside the scripts section, add this:
"scripts": {
    "build-prod": "node --max_old_space_size=5048 ./node_modules/@angular/cli/bin/ng build --prod",
},
Now in terminal/cmd instead of using ng build --prod just use
npm run build-prod
If you want to use this configuration for build only just remove --prod from all the 3 places
I experienced the same problem today. The problem for me was that I was trying to import a lot of data into the database in my NextJS project.
So I installed the win-node-env package like this:
yarn add win-node-env
This is because my development machine was Windows. I installed it locally rather than globally. You can also install it globally like this: yarn global add win-node-env
And then in the package.json file of my NextJS project, I added another startup script like this:
"dev_more_mem": "NODE_OPTIONS=\"--max_old_space_size=8192\" next dev"
Here I am passing the Node option, i.e. setting 8 GB as the limit.
So my package.json file somewhat looks like this:
{
"name": "my_project_name_here",
"version": "1.0.0",
"private": true,
"scripts": {
"dev": "next dev",
"dev_more_mem": "NODE_OPTIONS=\"--max_old_space_size=8192\" next dev",
"build": "next build",
"lint": "next lint"
},
......
}
And then I run it like this:
yarn dev_more_mem
For me, I was facing the issue only on my development machine (because I was doing the importing of large data). Hence this solution. Thought to share this as it might come in handy for others.
I had the same issue in a windows machine and I noticed that for some reason it didn't work in git bash, but it was working in power shell
Just in case it may help people having this issue while using nodejs apps that produce heavy logging, a colleague solved this issue by piping the standard output(s) to a file.
If you are trying to launch not node itself but some other tool, for example webpack, you can use the environment variable together with the cross-env package:
$ cross-env NODE_OPTIONS='--max-old-space-size=4096' \
webpack --progress --config build/webpack.config.dev.js
For Angular project bundling, I've added the line below to the scripts section of my package.json file.
"build-prod": "node --max_old_space_size=5120 ./node_modules/@angular/cli/bin/ng build --prod --base-href /"
Now, to bundle my code, I use npm run build-prod instead of ng build --requiredFlagsHere
hope this helps!
If none of the given answers work for you, check whether your installed Node build (i.e. 32-bit or 64-bit) matches your system. Usually this type of error occurs because of a mismatch between the Node build and the OS; the terminal/system will not tell you about that but will keep giving you the out of memory error.
None of these answers worked for me (I didn't try to update npm, though).
Here's what worked: my program was using two arrays. One was parsed from JSON, the other was generated from the data in the first one. Just before the second loop, I just had to set my first JSON-parsed array back to [].
That way a lot of memory is freed, allowing the program to continue execution without failing a memory allocation at some point.
Cheers !
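The idea of dropping the first array before the second phase can be sketched as follows. The data shape (objects with a `size` field) and the name `totalSize` are made up for illustration; the point is only that once nothing references the parsed array any more, the garbage collector is free to reclaim it before the later work runs.

```javascript
// Hypothetical two-phase job: parse a large JSON array, derive a smaller
// array from it, then release the parsed array before the second loop.
function totalSize(jsonText) {
  let parsed = JSON.parse(jsonText);

  // Phase 1: keep only what the second phase needs.
  const sizes = parsed.map((entry) => entry.size);

  // Drop the only reference to the large parsed array so it can be collected.
  parsed = null;

  // Phase 2: work on the derived data only.
  let total = 0;
  for (const size of sizes) {
    total += size;
  }
  return total;
}

console.log(totalSize('[{"size": 10}, {"size": 32}]')); // prints 42
```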
You can fix a "heap out of memory" error in Node.js with the approaches below.
Increase the amount of memory allocated to the Node.js process by using the --max-old-space-size flag when starting the application. For example, you can increase the limit to 4GB by running node --max-old-space-size=4096 index.js.
Use a memory leak detection tool, such as a Node.js heap dump module, to identify and fix memory leaks in your application. You can also start Node with the inspector and open chrome://inspect to check memory usage.
Optimize your code to reduce the amount of memory needed. This might involve reducing the size of data structures, reusing objects instead of creating new ones, or using more efficient algorithms.
Rely on garbage collection to manage memory automatically. Node.js uses the V8 engine's garbage collector, which cannot be swapped for another one, but its behavior can be tuned with V8 flags such as --max-old-space-size.
Use a containerization technology like Docker, which lets you limit the amount of memory available to the container.
Use a process manager like pm2, which can automatically restart the Node application if it runs out of memory.

How much does minifying reduce file size?

In a React app, is it possible to find out how much Webpack's minification reduces the project's size excluding all the dependencies and packages not written by the project's developer?
My build/static/ directory is currently bigger than my src directory and I believe it is because code from the dependencies is also minified with the files of interest. Where could I find something to approximately compare my src directory size to?
I built the project with npm run build to find out how large the output is with minimization enabled.
Then I edited node_modules/react-scripts/config/webpack.config.js and changed line 189 (the line with the minimize property) from:
...
optimization: {
minimize: isEnvProduction,
minimizer: [
...
to:
...
optimization: {
minimize: false,
minimizer: [
...
to disable minimization.
Then I built the project again to find out its size without minimization.
You can compare the built file sizes manually to find out their difference, but you will also get a nice output to the terminal when building the second time:
File sizes after gzip:
613.36 KB (+504.48 KB) build\static\js\2.0ddf8239.chunk.js
60.24 KB (+20.01 KB) build\static\js\main.8e9dd59c.chunk.js
4.73 KB (+457 B) build\static\css\main.aaaa4d7d.chunk.css
1.66 KB (+933 B) build\static\js\runtime~main.7f8cc4df.js
Notice the differences stated in parentheses.
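For a rough comparison against the src directory, a small script can total the file sizes under a directory and turn two sizes into a percentage. This is a sketch only: `dirSize` and `reductionPercent` are hypothetical helper names, and the worked figure assumes the first line of the output above means the unminified chunk is 613.36 KB after gzip and 504.48 KB larger than the minified build.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: total size in bytes of all regular files under `dir`.
function dirSize(dir) {
  let total = 0;
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      total += dirSize(full);
    } else if (entry.isFile()) {
      total += fs.statSync(full).size;
    }
  }
  return total;
}

// Hypothetical helper: how much smaller `after` is than `before`, in percent.
function reductionPercent(before, after) {
  return (1 - after / before) * 100;
}

// Worked example with the first chunk from the build output (gzip sizes in KB).
const unminifiedKb = 613.36;
const minifiedKb = 613.36 - 504.48;
console.log(reductionPercent(unminifiedKb, minifiedKb).toFixed(1) + '%');
```

Calling dirSize('src') and dirSize('build/static') from the project root gives the two raw totals. The build output reports gzip sizes, while dirSize reports bytes on disk, so the two are only approximately comparable.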

Why EXT4/JBD2 after mounted keeps calling ext4_journal_stop?

I was investigating the journaling layer used in EXT4 (JBD2), and I added some printk calls to see when the ext4_journal_start and ext4_journal_stop functions are called.
This is the procedure:
I first format a given partition using:
sudo mke2fs -t ext4 /dev/vdb
(I am using QEMU to run this experiment)
Then I mount it:
sudo mount /dev/vdb /mnt/mydisk
That is the normal procedure for mounting, but when I mount it, because of my printk calls in both ext4_journal_start/stop, dmesg shows a lot of calls to journal_stop without any journal_start.
P.S.: I would guess that it is some background behavior of EXT4, but I have no idea what it is.
Here is the dmesg output:
* Restoring resolver state... [ OK ]
* Stopping System V runlevel compatibility [ OK ]
[ 124.648904] JOURNAL STOP
[ 124.778691] JOURNAL STOP
...
[... ] # it is called maybe more than 40 times
...
[ 129.641895] jbd2_journal_commit_transaction
[ 129.769132] JOURNAL STOP
...
[... ] # it is called maybe more than 40 times
...
[ 134.766164] jbd2_journal_commit_transaction
After 134 seconds these messages stop. Then I try to write a file to that mount point, and it behaves as expected.
[ 624.995549] JOURNAL START
[ 624.996849] JOURNAL STOP
[ 625.000676] JOURNAL START
[ 625.001757] JOURNAL START
[ 625.002822] JOURNAL STOP
[ 625.003773] JOURNAL STOP
[ 631.004110] jbd2_journal_commit_transaction
So, it is strange that after mounting, even though I did absolutely nothing, journal_stop is called several times and, furthermore, that after two commits (calls to jbd2_journal_commit_transaction) dmesg becomes stable and then follows the expected behavior.
To make it clear, my question is: what causes these several calls to ext4_journal_stop without any apparent reason?
By debugging the ext4 source code, I discovered what causes the journal operations right after mounting the file system.
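ext4 is creating the inode table, which is why the journal is called several times right after mounting a partition. (This is consistent with ext4's lazy inode table initialization, which finishes setting up the inode tables in the background after a freshly formatted file system is first mounted.)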

Fail the Jenkins build if unit test execution time exceeds limit

I would like to fail my builds if the execution time of ANY individual unit test (not the total test run time) exceeds a certain reasonable limit, say two seconds. I am using MSTest.
Thanks!
Use the timeout block to create a timeout failure. Note that it limits the whole wrapped step rather than an individual test. Here is an example from the Jenkins CI Jenkinsfile:
// We're wrapping this in a timeout - if it takes more than 180 minutes, kill it.
timeout(time: 180, unit: 'MINUTES') {
    // See below for what this method does - we're passing an arbitrary environment
    // variable to it so that JAVA_OPTS and MAVEN_OPTS are set correctly.
    withMavenEnv(["JAVA_OPTS=-Xmx1536m -Xms512m -XX:MaxPermSize=1024m",
                  "MAVEN_OPTS=-Xmx1536m -Xms512m -XX:MaxPermSize=1024m"]) {
        // Actually run Maven!
        // The -Dmaven.repo.local=${pwd()}/.repository means that Maven will create a
        // .repository directory at the root of the build (which it gets from the
        // pwd() Workflow call) and use that for the local Maven repository.
        sh "mvn -Pdebug -U clean install ${runTests ? '-Dmaven.test.failure.ignore=true -Dconcurrency=1' : '-DskipTests'} -V -B -Dmaven.repo.local=${pwd()}/.repository"
    }
}

Google Compute Engine VM instance: VFS: Unable to mount root fs on unknown-block

My instance on Google Compute Engine is not booting up due to some boot order issues.
So, I have created another instance and re-configured my machine.
My questions:
How can I handle these issues when I host some websites?
How can I recover my data from the old disk?
Logs:
[ 0.348577] Key type trusted registered
[ 0.349232] Key type encrypted registered
[ 0.349769] AppArmor: AppArmor sha1 policy hashing enabled
[ 0.350351] ima: No TPM chip found, activating TPM-bypass!
[ 0.351070] evm: HMAC attrs: 0x1
[ 0.351549] Magic number: 11:333:138
[ 0.352077] block ram3: hash matches
[ 0.352550] rtc_cmos 00:00: setting system clock to 2015-12-19 17:06:53 UTC (1450544813)
[ 0.353492] BIOS EDD facility v0.16 2004-Jun-25, 0 devices found
[ 0.354108] EDD information not available.
[ 0.536267] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input2
[ 0.537862] md: Waiting for all devices to be available before autodetect
[ 0.538979] md: If you don't use raid, use raid=noautodetect
[ 0.539969] md: Autodetecting RAID arrays.
[ 0.540699] md: Scanned 0 and added 0 devices.
[ 0.541565] md: autorun ...
[ 0.542093] md: ... autorun DONE.
[ 0.542723] VFS: Cannot open root device "sda1" or unknown-block(0,0): error -6
[ 0.543731] Please append a correct "root=" boot option; here are the available partitions:
[ 0.545011] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
[ 0.546199] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.19.0-39-generic #44~14.04.1-Ubuntu
[ 0.547579] Hardware name: Google Google, BIOS Google 01/01/2011
[ 0.548728] ffffea00008ae140 ffff880024ee7db8 ffffffff817af92b 000000000000111e
[ 0.549004] ffffffff81a7c7c8 ffff880024ee7e38 ffffffff817a976b ffff880024ee7dd8
[ 0.549004] ffffffff00000010 ffff880024ee7e48 ffff880024ee7de8 ffff880024ee7e38
[ 0.549004] Call Trace:
[ 0.549004] [] dump_stack+0x45/0x57
[ 0.549004] [] panic+0xc1/0x1f5
[ 0.549004] [] mount_block_root+0x210/0x2a9
[ 0.549004] [] mount_root+0x54/0x58
[ 0.549004] [] prepare_namespace+0x16d/0x1a6
[ 0.549004] [] kernel_init_freeable+0x1f6/0x20b
[ 0.549004] [] ? initcall_blacklist+0xc0/0xc0
[ 0.549004] [] ? rest_init+0x80/0x80
[ 0.549004] [] kernel_init+0xe/0xf0
[ 0.549004] [] ret_from_fork+0x58/0x90
[ 0.549004] [] ? rest_init+0x80/0x80
[ 0.549004] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 0.549004] ---[ end Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
What Causes This?
That is the million dollar question. After inspecting my GCE VM, I found out there were 14 different kernels installed, taking up several hundred MB of space. Most of the kernels didn't have a corresponding initrd.img file and were therefore not bootable (including 3.19.0-39-generic).
I certainly never went around trying to install random kernels, and once removed, they no longer appear as available upgrades, so I'm not sure what happened. Seriously, what happened?
Edit: New response from Google Cloud Support.
I received another disconcerting response. This may explain the additional, errant kernels.
"On rare occasions, a VM needs to be migrated from one physical host to another. In such case, a kernel upgrade and security patches might be applied by Google."
1. "How can I handle these issues when I host some websites?"
My first instinct is to recommend using AWS instead of GCE. However, GCE is less expensive. Before doing any upgrades, make sure you take a snapshot, and try rebooting the server to see if the upgrades broke anything.
2. How can I recover my data from old disk?
Even Better - How to recover your instance...
After several back-and-forth emails, I finally received a response from support that allowed me to resolve the issue. Be mindful, you will have to change things to match your unique VM.
Take a snapshot of the disk first in case we need to roll back any of the changes below.
Edit the properties of the broken instance to disable this option: "Delete boot disk when instance is deleted"
Delete the broken instance.
IMPORTANT: make sure you do not select the option to delete the boot disk. Otherwise, the disk will be removed permanently!
Start up a new temporary instance.
Attach the broken disk (this will appear as /dev/sdb1) to the temporary instance.
When the temporary instance has booted up, do the following in the temporary instance:
# Run fsck to fix any disk corruption issues
$ sudo fsck.ext4 -a /dev/sdb1
# Mount the disk from the broken vm
$ sudo mkdir /mnt/sdb
$ sudo mount /dev/sdb1 /mnt/sdb/ -t ext4
# Find out the UUID of the broken disk. In this case, the uuid of sdb1 is d9cae47b-328f-482a-a202-d0ba41926661
$ ls -alt /dev/disk/by-uuid/
lrwxrwxrwx. 1 root root 10 Jan 6 07:43 d9cae47b-328f-482a-a202-d0ba41926661 -> ../../sdb1
lrwxrwxrwx. 1 root root 10 Jan 6 05:39 a8cf6ab7-92fb-42c6-b95f-d437f94aaf98 -> ../../sda1
# Update the UUID in grub.cfg (if necessary)
$ sudo vim /mnt/sdb/boot/grub/grub.cfg
Note: This ^^^ is where I deviated from the support instructions.
Instead of modifying all the boot entries to set root=UUID=[uuid character string], I looked for all the entries that set root=/dev/sda1 and deleted them. I also deleted every entry that didn't set an initrd.img file. The top boot entry with correct parameters in my case ended up being 3.19.0-31-generic. But yours may be different.
# Flush all changes to disk
$ sudo sync
# Shut down the temporary instance
$ sudo shutdown -h now
Finally, detach the disk from the temporary instance, and create a new instance based on the fixed disk. It will hopefully boot.
Assuming it does boot, you have a lot of work to do. If you have half as many unused kernels as me, then you might want to purge the unused ones (especially since some are likely missing a corresponding initrd.img file).
I used the second answer (the terminal-based one) in this askubuntu question to purge the other kernels.
Note: Make sure you don't purge the kernel you booted in with!
How to handle these issues when I host some websites?
I'm not sure how you got into this situation, but it would be nice to have additional information (see my comment above) to be able to understand what triggered this issue.
How to recover my data from old disk?
Attach and mount the disk
Assuming you did not delete the original disk when you deleted the instance, you can simply mount this disk from another VM to read the data from it. To do this:
attach the disk to another VM instance, e.g.,
gcloud compute instances attach-disk $INSTANCE --disk $DISK
mount the disk:
sudo mkdir -p /mnt/disks/[MNT_DIR]
sudo mount [OPTIONS] /dev/disk/by-id/google-[DISK_NAME] /mnt/disks/[MNT_DIR]
Note: you'll need to substitute appropriate values for:
MNT_DIR: directory
OPTIONS: options appropriate for your disk and filesystem
DISK_NAME: the id of the disk after you attach it to the VM
Unmounting and detaching the disk
When you are done using the disk, reverse the steps:
Note: Before you detach a non-root disk, unmount the disk first. Detaching a mounted disk might result in incomplete I/O operations and data corruption.
unmount the disk
sudo umount /dev/disk/by-id/google-[DISK_NAME]
detach the disk from the VM:
gcloud compute instances detach-disk $INSTANCE --device-name my-new-device
In my case, the first menuentry (3.19.0-51-generic) in grub's config (/boot/grub/grub.cfg) was missing an initrd entry and was unable to boot.
On further investigation, dpkg shows the packages for that specific kernel marked as failed and unconfigured:
dpkg -l | grep 3.19.0-51-generic
iF linux-image-3.19.0-51-generic 3.19.0-51.58~14.04.1
iU linux-image-extra-3.19.0-51-generic 3.19.0-51.58~14.04.1
This all stemmed from the Ubuntu image supplied by Google having unattended-upgrades enabled. For some reason the initrd build was killed while it was running, and something else came along and ran update-grub2.
unattended-upgrades-dpkg_2016-03-10_06:49:42.550403.log:update-initramfs: Generating /boot/initrd.img-3.19.0-51-generic
Killed
E: mkinitramfs failure cpio 141 xz -8 --check=crc32 137
unattended-upgrades-dpkg_2016-03-10_06:49:42.550403.log:update-initramfs: failed for /boot/initrd.img-3.19.0-51-generic with 1.
To work around the immediate problem, run:
dpkg --force-confold --configure -a
Although unattended-upgrades in theory is a great idea, having it enabled by default can have unattended consequences.
There are a few cases where the kernel fails to handle an initrd-less boot. Disable the GRUB_FORCE_PARTUUID option so that it boots with an initrd.

Resources