apache2 FastCGI comm with dynamic server aborted first read idle timeout - apache2

Summary: Unable to run any of the most simple “Hello World” FastCGI script, any request always terminating into a time out. Seems there is no communication at all between the server and the FastCGI scripts (using dynamic FastCGI scripts).
The environment
Ubuntu Precise (12.04)
Package apache2.2-bin
Package apache2-mpm-prefork
Package libapache2-mod-fastcgi
Package libfcgi-perl
Package python-flup
Multiple sites configured as virtual hosts on 127.0.0.1
There exists a /var/lib/apache2/fastcgi directory, owned by www-data, readable by all (owner, group and others)
There exists a /var/lib/apache2/fastcgi/dynamic directory, owned by www-data, which is restricted to the owner (readable, writable and accessible by www-data only)
There exists an inode/socket file in the /var/lib/apache2/fastcgi/ directory
The FastCGI relevant configurations:
The directory /etc/apache2/mods-enabled/ holds a reference to fastcgi.conf and fastcgi.load (mod_fastcgi is enabled).
The file fastcgi.conf contains the following (left untouched, I did not edit it):
<IfModule mod_fastcgi.c>
AddHandler fastcgi-script .fcgi
#FastCgiWrapper /usr/lib/apache2/suexec
FastCgiIpcDir /var/lib/apache2/fastcgi
</IfModule>
The relevant configuration file in /etc/apache2/sites-enabled/ contains the following (there is nothing more anywhere else about FastCGI specific configuration):
<DirectoryMatch /fcgi-bin>
Options +ExecCGI
<FilesMatch "^[^\.]+$">
SetHandler fastcgi-script
</FilesMatch>
</DirectoryMatch>
The test materials on the test virtual host:
There exist a fcgi-bin/test-perl.fcgi whose content is (the file is executable by all, and readable by owner and group):
#!/usr/bin/perl
use CGI::Fast qw(:standard);
$COUNTER = 0;
while (new CGI::Fast) {
print header;
print start_html("Fast CGI Rocks");
print
h1("Fast CGI Rocks"),
"Invocation number ",b($COUNTER++),
" PID ",b($$),".",
hr;
print end_html;
}
There exist a fcgi-bin/test-python.fcgi whose content is (the file is executable by all, and readable by owner and group):
#!/usr/bin/python
def myapp(environ, start_response):
start_response('200 OK', [('Content-Type', 'text/plain')])
return ['Hello World!\n']
try:
from flup.server.fcgi import WSGIServer
WSGIServer(myapp).run()
except:
import sys, traceback
traceback.print_exc(file=open("errlog.txt","a"))
The issue
Although both fcgi-bin/test-perl.fcgi and fcgi-bin/test-python.fcgi runs normally when executed from the command‑line, none seems to work when invoked, e.g. as http://test.loc/fcgi-bin/test-perl.fcgi or http://test.loc/fcgi-bin/test-python.fcgi.
Nothing at all happens, and after some delay, I get an Error 500, and Apache error logs contains multiple entries looking like:
[<date>] [error] [client <IP>] FastCGI: comm with (dynamic) server "/<…>/fcgi-bin/<script>.fcgi" aborted: (first read) idle timeout (30 sec), referer: <referrer>
[<date>] [error] [client <IP>] FastCGI: incomplete headers (0 bytes) received from server "<…>/fcgi-bin/<script>.fcgi", referer: <referrer>
I've spent hours and hours searching the web trying to understand why it does not work, and finally decided to give up and ask for some help here.
Any pointers and check list welcome. Feel free to ask for any missing details you may feel to be relevant or worth checking.
Enjoy a nice day.
-- edit --
Issue update
In my own reply to my own question, I mentioned a weird case where things were looking suddenly fine without reasons. I later discovered this was only partly fine.
In the same virtual host, so with the exact same server configuration, some scripts, which are exactly the same (and with exact same access rights), fails depending on their location.
As a remainder, here is what's in the site configuration:
<DirectoryMatch /fcgi-bin>
Options +ExecCGI
<FilesMatch "^[^\.]+$">
SetHandler fastcgi-script
</FilesMatch>
</DirectoryMatch>
With the above, only scripts in /fcgi-bin are handled as FastCGI script. But I also have some elsewhere (still for testing): one in /cgi-bin and one in / (i.e. in the public_html directory). For this purpose, .htaccess contains this entry:
Options +ExecCGI
AddHandler fastcgi-script .fcgi
So the two others FastCGI script should work the same as the one in /fcgi-bin, but they don't, and for the time, they invariably terminates with a connexion time‑out, just like the one /fcgi-bin first did.
This makes me feel something may be wrong with the mod_fastcgi module (known bug? else?). So far, this module seems to act rather randomly.
-- edit 2 --
The above in the first edit, was an error of mine: the group was wrong with the other scripts, it had to be www-data, but it was not. So is something is wrong, stick to the answer I gave, that is, try to look at the FastCgiConfig, and see if it solve anything or at least if it honours the time‑out options.

I will answer my own question, as it seems to be working now. However, the epilogue still looks weird.
Although the default configuration should be OK, I still wanted to review the “Module mod_fastcgi” document again. As I only wanted a dynamic FastCGI, I focused on the FastCgiConfig directive only, thus on purpose not going into FastCgiServer and FastCgiExternalServer directives.
As there was no FastCgiServer at all in the default fastcgi.conf file, I started to try to set‑up my own. For a first test, I wanted to use the -appConnTimeout option, at least to request the server to not wait so much long before it returns me an Error 500.
So I just added this in the site configuration (I did not touch fastcgi.cong), in the same file where virtual hosts are configured:
FastCgiConfig -appConnTimeout 2
This was to tell the server to wait no more than 2 seconds, instead of the 30 seconds it was waiting. I tried to invoked a FastCGI script to see if at least this configuration was working. I expected to get an error in a 2 seconds delay, but instead, the script ran without error.
What's weird, is that I then tried to remove this option, to check if it was just that addition which was just missing to make FastCGI scripts working. But after I commented‑out the option, it was still working, and the same after a full reboot.
Can't tell more, that looks weird, but this is the only thing I did, I did not edit anything else. I can just suggest people who may encounter a similar issue, to just try the above.
Sorry, if I can't explain what it did exactly. I really would like to know. It just working now, but I don't know why.

#############
fastcgi.conf
FastCgiWrapper Off

peng.rl 's answer solve my problem.
My ceph radosgw can't get apache's input at all. after set FastCgiWrapper Off, I can capture data in wireshark.

Related

DBus : Can't get match rules for my user's session bus

I'm trying to use dbus/tools/GetAllMatchRules.py to get diagnostic information. When I run it without parameters as my regular user I get "GetConnectionMatchRules failed: did you enable the Stats interface?"
I modified GetAllMatchRules to print the specific exception details. It now says
GetConnectionMatchRules failed: did you enable the Stats interface?: org.freedesktop.DBus.Error.AccessDenied: The caller does not have the necessary privileged to call this method
So then I'm wondering, does it work at all? So I sudo su and run it again and it gives me the kind of information I'd expect to see, just not for the right bus. Oddly, if I use the --system parameter, even root gets org.freedesktop.DBus.Error.AccessDenied .
The repository claims, in bus/example-session-disable-stats.conf.in , that
"If the Stats interface was enabled at compile-time, users can use it on
the session bus by default. Systems providing isolation of processes
with LSMs might want to restrict this. This can be achieved by copying
this file in #EXPANDED_SYSCONFDIR#/dbus-1/session.d/
"
But that's clearly not the case because my user can NOT access this information.
I even tried a brute force approach to disabling (commenting out) ALL deny statements at /usr/share/dbus-1/system.conf and reloading and it still doesn't work. I also tried a full system restart in case I wasn't reloading correctly. I also did a system-wide search for system.conf in case it's actually using some other conf file that I'm not seeing, which would mean I'm modifying the wrong thing. I got a big hint that that's not the case when I had a typo (-- instead of --> for commenting out) and it failed to reload, but did reload once I fixed the typo.
I'm ok with the possibility that I can only do this signed in as root, so I also tried modifying GetAllMatchRules to use dbus.bus.BusConnection(), and force-feeding it the session address (unix:path=/run/user/1000/bus) which results in
"org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken."
Incidentally, this is the same issue that happens if I leave the code alone but use sudo -E su instead of just sudo su (the -E option in this case means that the $DBUS_SESSION_BUS_ADDRESS variable is retained)
I'm not sure what to try next...
Turns out there isn't currently a solution, the privilege error is simply the code that was chosen to indicate that the method is an unimplemented stub method

what does gitolite setup fix?

gitolite info didn't work, adding keys turned them into a no access key and did NOT create a corresponding entry in auth-keys file.
To fix this run gitolite setup on gitolite server
Question: what could have landed me in that mess?
And what does gitolite setup do when invoked for the n-th time (it's no longer setting things up, according to the docs it fixes hooks, but I wonder what the use case would be and which was mine)?
More details on gitolite info
gitolite info command is invoked like so:
> ssh git-user#ser-git
PTY allocation request failed on channel 0
hello git-admin, this is ...#... running gitolite3 3.6.7-2 (Debian) on git 2.17.1
R W some-repository
R W gitolite-admin
R W testing
Connection to ser-git closed.
Bad output is: FATAL: unknown git/gitolite command: 'info'
More details: keys without access.
gitolite sshkeys-lint was showing keys with (no access), now those keys have access as I set them (now meaning after gitolite setup).
ssh-keygen -lf /home/repo/.ssh/authorized_keys | wc -l (or without piped part, regardless) number of keys and their names indicated I didn't have the newest one added.
Similar question that did not work for me: keydir entries not propagating to authorized_keys
Docs pretty much had the answer once I dug deeper, I guess. Which is fairly nice of #sitaramc.
Without options, 'gitolite setup' is a general "fix up everything" command
(for example, if you brought in repos from outside, or someone messed
around with the hooks, or you made an rc file change that affects access
rules, etc.)
Symptoms keys stopped propagating and error FATAL: unknown git/gitolite command: 'info' on ssh git-user#ser-git. Fix was to run gitolite setup. So onto first question, the title one:
what does gitolite setup fix?
gitolite setup is implemented here
my Perl is rather weak, but there's a setup function in line 56. It calls args (which parses options, so here it had nothing to parse), then unless h_only (hooks only arg for setup), which wasn't used, so we skip compile and POST_COMPILE trigger and go for the hooks.
sub setup {
my ( $admin, $pubkey, $h_only, $message ) = args();
unless ($h_only) {
setup_glrc();
setup_gladmin( $admin, $pubkey, $message );
_system("gitolite compile");
_system("gitolite trigger POST_COMPILE");
}
hook_repos(); # all of them, just to be sure
}
package Gitolite::conf::store has hook_repos(), line 228: we change the dir to repo base dir (as per config file), and for each phy_repo we do hook_1(phy_repo). What is a phy_repo? a physical one.
same package, different method and line: hook_1($repo) in line 354.
Method hook_1($repo)
It's quite literally about fixing all the hooks.
Recreates dirs for common and admin hooks.
Rewrites update_hook (common) and post_update_hook (admin).
Sets 755 permissions for both common and admin hooks.
Then using ln_sf it symlinks the folders for common/admin hooks.
ln_sf is in common module, in line 162

C program exits giving error ORA-12162: TNS:net service name is incorrectly specified

I am working on a remote red-hat server and there I'm developing a c application to insert data in to a remote oracle database. So first i installed the OCI instant client rpm on the server and tried to compile a sample program. after certain linkages I could compile it. But then when I am going to run it. It exits giving an error saying
ORA-12162: TNS:net service name is incorrectly specified
The sample code I used is from the blog (refer to this code in case you need to clarify the things.where I’m quoting only few pieces to this post) René Nyffenegger's collection of things on the web
René Nyffenegger on Oracle
(refer to this code in case you need to clarify the things.where I’m quoting only few pieces to this post)
In the code I added some prints to check for the error And it seems like It gets stuck in the OCIServerAttach() function r gives a printed walue of -1
r=OCIServerAttach(srv, err, dbname, strlen(dbname), (ub4) OCI_DEFAULT);
printf("r value %d",r);
if (r != OCI_SUCCESS) {
checkerr(err, r);
goto clean_up;
}
Another point is that in the compilation process it gives a warning saying that a certain libry is not include. but the exicutable file is created. Here is the massage I get in the compilation process.
[laksithe#loancust ~]$ gcc -L$ORACLE_HOME/lib/ -L$ORACLE_HOME/rdbms/lib/ -o oci_test oci_test.o -L/usr/lib/oracle/12.1/client64/lib -lclntsh `cat $ORACLE_HOME/lib/sysliblist`
cat: /lib/sysliblist: No such file or directory
Going through the web I found that by creating a tnsnames.ora file with the connection details I could solve the problem. But even It didn't work for me. Here is the link for that blog blog
It has been a week since this error and I cold'nt solve it. could someone please help me.
connection string format I used is abc.ghi.com:1521/JKLMN
My recommendation is to bypass tnsnames completely. Oracle has always allowed you to put in the direct connection details, but EZConnect makes that even easier.
When you format your connection string, instead of listing the TNS name, use the actual connection properties in the following format:
servername:port/service name
For Example
MyOracle.MyCompany.Com:1521/SalesReporting
Your connection string might also require direct=true, but I'm honestly not sure.
I like the idea of tnsnames, but it's a double edged sword. When it works, it's great. When it doesn't, you want to throw something. With EZConnect, it always works.
By the way, if you don't know the properties of the three items above, find a machine that connect via tnsnames and:
tnsping <your TNS-named database>

Undocumented Managed VM task queue RPCFailedError

I'm running into a very peculiar and undocumented issue with a GAE Managed VM and Task Queues. I understand that the Managed VM service is in beta, so this question may not be relevant forever, but it's definitely causing me lots of headache now.
The main symptom of the issue is that, in certain (not completely known to me) circumstances, I'm seeing the following error/traceback:
File "/home/vmagent/my_app/some_file.py", line 265, in some_ndb_tasklet
res = yield some_task.add_async('some-task-queue-name')
File "/home/vmagent/python_vm_runtime/google/appengine/ext/ndb/tasklets.py", line 472, in _on_rpc_completion
result = rpc.get_result()
File "/home/vmagent/python_vm_runtime/google/appengine/api/apiproxy_stub_map.py", line 613, in get_result
return self.__get_result_hook(self)
File "/home/vmagent/python_vm_runtime/google/appengine/api/taskqueue/taskqueue.py", line 1948, in ResultHook
rpc.check_success()
File "/home/vmagent/python_vm_runtime/google/appengine/api/apiproxy_stub_map.py", line 579, in check_success
self.__rpc.CheckSuccess()
File "/home/vmagent/python_vm_runtime/google/appengine/ext/vmruntime/vmstub.py", line 312, in _WaitImpl
raise self._ErrorException(*_DEFAULT_EXCEPTION)
RPCFailedError: The remote RPC to the application server failed for call taskqueue.BulkAdd().
I've gone through my local App Engine SDK to trace this through, and I can get up to the last line of the trace, but google/appengine/ext/vmruntime/ doesn't exist on my machine at all, so I have no idea what's happening in vmstub.py. From looking at the local code, some_task.add_async('the-queue') is spinning up an RPC and waiting for it to finish, but this error is not what the except apiproxy_errors.ApplicationError, e: at line 1949 of taskqueue.py is expecting...
The code that's generating the error looks something like this:
#ndb.tasklet
def kickoff_tasks(batch_of_payloads):
for task_payload in batch_of_payloads:
# task_payload is a dict
task = taskqueue.Task(
url='/the/handler/url',
params=payload)
res = yield task.add_async('some-valid-task-queue-name')
Other things worth noting:
this code itself is running in a task handler kicked off by another task.
I first saw this error before implementing this sort of batching, and assumed the issue was because I had added too many tasks from within a task handler.
In some cases, I can run this successfully with a batch size of 100, but in others, it fails consistently (depending on the data in the payloads) at 100, and sometimes succeeds at batch sizes of 50.
The task payloads themselves include batches of items, and are tuned to be just small enough to fit in a task. App Engine advertises a maximum task size of 100KB, so I'm keeping the payloads to under 90,000 bytes right now. Lowering the size even more doesn't seem to help any.
I've also tried implementing an exponential backoff to retry the kickoff_tasks method when this error appears, but it seems that once the error is raised, I can't add any other tasks at all from within the same handler (i.e. I can't kickoff a "continue from where you left off" task, I just have to let this one fail and restart itself)
So, my question is, what is actually causing this error? How can I avoid it, or fix this so that I'm handling it correctly?
This is a known issue that is being worked on. There are actually two issues - the RPC failure itself and the lack of handling of the RPCFailedError exception by the SDK.
There is some public discussion of the issue here.
If you're using App Engine Flexible and the python-compat-multicore image, a new bug popped up related to App Engine using a newer version of the requests library that broke the communication between App Engine Flexible and the datastore. You can fix this error by monkey patching the library in your appengine_config.py file.
Add the following code to appengine_config.py:
try:
import appengine.ext.vmruntime.vmstub as vmstub
except ImportError:
pass
else:
if isinstance(vmstub.DEFAULT_TIMEOUT, (int, long)):
# Newer requests libraries do not accept integers as header values.
# Be sure to convert the header value before sending.
# See Support Case ID 11235929.
vmstub.DEFAULT_TIMEOUT = bytes(vmstub.DEFAULT_TIMEOUT)
Note that if you do not have an appengine_config.py file, you can just create it in your base project directory (wherever you put your app.yaml file). This file gets run during App Engine startup..

Using libssh2 scp to retrieve a file fails to authenticate when scp from cmd works

I'm trying to retrieve a file from an instance using libssh2 scp.
Just to make sure that my username, password, and keys are correct, I did:
sudo scp -v -P #port -i /home/username/.ssh/id_rsa username#XX.XX.XX.XX:/home/username/file .
Which asked me for the password, and then retrieved the file successfully.
In trying to accomplish the same thing with libssh2, I followed the example here:
http://www.libssh2.org/examples/scp.html
With superficial changes to variable types that seem to have since changed
(Not that it should matter, as those variables come after authentication).
However, on
libssh2_userauth_publickey_fromfile(session, username,"/home/username/.ssh/id_rsa.pub","/home/username/.ssh/id_rsa",password)
The program always exits with a LIBSSH2_ERROR_PUBLICKEY_UNVERIFIED.
Checking using gdb, I'm certain that the username and passwords being applied are correct.
What reasons might there be that are causing this problem?
Edit:
Further delving with GDB reveals that somewhere in the depth of libssh2_userauth_publickey_fromfile(), in _libssh2_userauth_publickey(session, username, username_len, pubkeydata, pubkeydata_len, sign_callback, abstract), it receives a LIBSSH2_ERROR_SOCKET_RECV.
The code behind that, however, is much too enigmatic for my untrained eye to make sense of.
One obvious thing I've missed is the error message, which comes out to be "Waiting for USERAUTH response"
Potentially relevant:
https://github.com/nodegit/nodegit/issues/553
After following what little advice I could gather from above link and removing a few keys from authorized_keys, the error remains the same but the message changed to "Callback returned error". Not sure if improvement or worse.
Checking server-side logs, I find the following:
Oct 20 06:53:51 testbed1 sshd[25837]: error: Could not load host key: /etc/ssh/keyname
Oct 20 06:53:52 testbed1 sshd[25837]: Connection closed by XX.XX.XX.XX [preauth]
Oct 20 06:54:48 testbed1 sshd[25839]: error: Could not load host key: /etc/ssh/keyname
Oct 20 06:54:51 testbed1 sshd[25839]: Accepted publickey for username from...
The first two lines are on a failed attempted from libssh2.
The next two lines are on a successful attempt from scp on commandline.
I'm still not absolutely sure what is the cause.
I can only speculate that I had fallen into the bug described here:
cURL sftp public key authentication fails "Callback Error"
The code ran fine on a key without passphrase.
I still have a hard time saying this is the exact solution, as when I had just started using libssh2, it ran fine with keys with passphrase.
Still, it "works" now.

Resources