Preventing a Firebase out-of-memory exception - database

Aside from sharding your Realtime Database and setting android:largeHeap="true" in the manifest file, how else do you prevent an out-of-memory exception on the client side?
Assume you have 100k+ users in your customers node, and that to retrieve each user's data you have something like:
firebaseDatabase.getReference("customers")
.child(userId) //The culprit causing the error, since it has to read the entire node
.get()
How would you tackle this issue?
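One common mitigation, beyond sharding and largeHeap, is to avoid reading the customers node wholesale and instead page through it with keyed queries so only one page is in memory at a time. The snippet above looks like the Android client SDK; purely as an illustration, here is a minimal sketch of the same idea with the Firebase Admin SDK for Python, where the service-account path, database URL, and page size are placeholder assumptions:
# Sketch only: page through "customers" with keyed queries so no single read
# has to hold the entire node in memory. Paths and URLs are placeholders.
import firebase_admin
from firebase_admin import credentials, db

cred = firebase_admin.credentials.Certificate("service-account.json")  # placeholder path
firebase_admin.initialize_app(cred, {
    "databaseURL": "https://your-project.firebaseio.com"               # placeholder URL
})

def iter_customers(page_size=200):
    """Yield (key, value) pairs from /customers in key order, one page at a time."""
    ref = db.reference("customers")
    start_key = None
    while True:
        # Fetch one extra record; it becomes the anchor of the next page.
        query = ref.order_by_key().limit_to_first(page_size + 1)
        if start_key is not None:
            query = query.start_at(start_key)
        page = query.get() or {}
        keys = list(page.keys())
        for key in keys[:page_size]:
            yield key, page[key]
        if len(keys) <= page_size:
            break
        start_key = keys[page_size]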

Related

Azure Search SDK for Blob Storage - Deleting Files

I have created an application that lists all the documents in an Azure storage container, and lets the user mark specific files to delete.
This is an Azure Search application, so the process is to add a "deleted" metadata property to the selected files, run the indexer to remove that information from the index, and then physically delete the files.
Here's the code for that process:
serviceClient.Indexers.Run(documentIndexer);
var status = serviceClient.Indexers.GetStatus(documentIndexer).LastResult.Status;
// Loop until the indexer is done
while (status == IndexerExecutionStatus.InProgress)
{
    status = serviceClient.Indexers.GetStatus(documentIndexer).LastResult.Status;
}
// If successful, delete the flagged files
if (status == IndexerExecutionStatus.Success)
{
    DeleteFlagged();
}
Everything works fine, but only if I put a breakpoint on the DeleteFlagged() call, effectively forcing a delay between running the indexer and deleting the files.
Without the pause, the indexer comes back as successful, and I delete the files, but the file contents haven't been removed from the index - they still show up in search results (the files have been physically deleted).
Is there something else I need to check before deleting?
When you Run an indexer, it doesn't instantly transition into InProgress state - in fact, depending on how many indexers are running in your service, there may be a significant delay before the indexer is scheduled to run. So, when you call GetStatus before the loop, the indexer may not be InProgress yet, and you end up deleting blobs too early.
A more reliable approach would be to wait for the indexer to complete this particular run (e.g., by looking at the LastResult's StartTime/EndTime).
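Purely as an illustration of that approach, here is a rough sketch in Python with the azure-search-documents package (the question uses the older .NET SDK; the endpoint, key, and indexer name below are placeholder assumptions): record when you kicked off the run, then poll the indexer status until the last result describes a run that started after that moment and has an end time.
# Sketch only: wait for *this* run to finish before deleting blobs.
# Assumes the azure-search-documents package and roughly synchronized clocks.
import time
from datetime import datetime, timezone

from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexerClient

client = SearchIndexerClient("https://<service>.search.windows.net",   # placeholder
                             AzureKeyCredential("<admin-key>"))        # placeholder

def run_and_wait(indexer_name, poll_seconds=5, timeout_seconds=600):
    """Run the indexer and block until a run started after this call has completed."""
    kicked_off = datetime.now(timezone.utc)
    client.run_indexer(indexer_name)

    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        last = client.get_indexer_status(indexer_name).last_result
        # Only trust a result that belongs to the run we just started and has ended.
        if (last is not None
                and last.start_time is not None
                and last.start_time >= kicked_off
                and last.end_time is not None):
            return str(last.status).lower() == "success"
        time.sleep(poll_seconds)
    raise TimeoutError("Indexer run did not finish in time")

# Usage (hypothetical names):
# if run_and_wait("document-indexer"):
#     delete_flagged()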

How can I prevent accidentally overwriting an already existing database?

I'm adding BaseX to an existing web application and currently writing code to import data into it. The documentation is crystal-clear that
An existing database will be overwritten.
Finding this behavior mind-bogglingly dangerous, I tried it in the hope that the documentation was wrong, but unfortunately my test confirmed it. For instance, using basexclient I can do this:
> create db test
Database 'test' created in 12.03 ms.
> create db test
Database 'test' created in 32.43 ms.
>
I can also replicate this behavior with the Python client, which is what I'm actually using for my application. Reducing my code to the essentials:
session = BaseXClient.Session("127.0.0.1", 1984, "admin", "admin")
session.create("test", "")
It does not matter whether test already exists; if it does, the whole thing is overwritten.
How can I work around this dangerous default behavior? I'd like to prevent the possibility of missteps in production.
You can issue a list command before you create your database. For instance, with the command-line client, if the database does not exist:
> list foo
Database 'foo' was not found.
Whereas if the database exists:
> list test
Input Path Type Content-Type Size
------------------------------------
This database is empty, so it does not show any contents, but at least you do not get the error message. When you use a client, you have to check whether the command errors out or not. With the Python client you could do:
def exists(session, db):
    try:
        session.execute("list " + db)
    except IOError as ex:
        if ex.message == "Database '{0}' was not found.".format(db):
            return False
        raise
    return True
The client raises IOError if the server reports an error, which is a very generic way to signal a problem, so you have to test the error message to figure out what is going on. We re-raise if the error message is not the one that pertains to our test; this way we don't swallow exceptions caused by unrelated issues.
With that function you could do:
session = BaseXClient.Session("127.0.0.1", 1984, "admin", "admin")
if exists(session, "test"):
    raise SomeRelevantException("Oi! You are about to overwrite your database!")
session.create("test", "")

Why does WebApp2 auth.get_user_by_session() change the token?

I am using WebApp2 with auth for user sessions. My client will occasionally make nearly simultaneous requests to the server. The first one will make a request with session data that looks like this:
{
    'cache_ts': 1408106895,
    'token': u'GXpsaVQh5ZWtqxJMUBpGTr',
    'user_id': 5690665774088192L,
    'remember': 1,
    'token_ts': 1408034938
}
Then after a call to auth.get_user_by_session(), the session comes back like this:
{
    'cache_ts': 1408124980,
    'token': u'0IVduczdGR5PkrMqNhBvzW',
    'user_id': 5690665774088192L,
    'remember': 1,
    'token_ts': 1408124980
}
As you can see, the token has been changed and the timestamps updated.
Nearly simultaneously, another request is made that contains the same initial session data:
{
    'cache_ts': 1408106895,
    'token': u'GXpsaVQh5ZWtqxJMUBpGTr',
    'user_id': 5690665774088192L,
    'remember': 1,
    'token_ts': 1408034938
}
However, that token is now invalid, so the session data is set to None. This wipes the user's session and causes lots of problems. Is there some setting I should be using to extend the life of the UserToken? Is there a more appropriate method than get_user_by_session()? I would imagine that nearly simultaneous requests with the same session data shouldn't cause enormous issues. The ideal situation would be that if auth received invalid or expired tokens it would just ignore them and throw an error.
Update 1
I hoped it was something simple, like passing False to get_user_by_session(). That, of course, killed the session immediately.
Update 2
I've found that I only really need the user_id field, and that comes for free with the cookie data. Implementing that reduces the frequency of the issue. However, the problem isn't actually fixed, and I'd love some input from anyone familiar with this library.
This is due to the token_new_age parameter, which defaults to 1 day, so every 24 hours the token will change.
This is a security measure: if someone hijacks that session, it will only work for 24 hours.
The 'token_max_age' parameter will also delete the token outright once its time is used up.
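For reference, those knobs live in the webapp2_extras.auth section of the application config. A minimal sketch of stretching them, with illustrative values only:
# Sketch only: where token_new_age / token_max_age are configured. Values are illustrative.
import webapp2

config = {
    'webapp2_extras.auth': {
        # Age after which a still-valid token is silently replaced with a new one
        # (this replacement is what invalidates the second, near-simultaneous request).
        'token_new_age': 86400 * 7,      # default is 86400 (1 day)
        # Hard expiry: after this the token is deleted outright.
        'token_max_age': 86400 * 7 * 3,
    },
    'webapp2_extras.sessions': {
        'secret_key': 'change-me',       # placeholder
    },
}

app = webapp2.WSGIApplication(routes=[], config=config)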

LDAP query only returning 1000 users... yes, I am using paging

I have a simple GetStaff function that should retrieve all users from Active Directory. We have over 1,000 users, so the directory searcher is using paging because the default AD MaxPageSize is 1000.
Currently the search works 'sometimes' when I build and sends back all 1054 users, and other times it only sends back 1000. If it works once, it works all the time; if it fails once, it fails all the time. I have set everything in using statements to make sure the objects are destroyed, but it still doesn't always seem to respect the PageSize attribute. By default, if the PageSize attribute is set, the searcher should use a SizeLimit of 0. I have tried leaving the size limit out, setting it to 0, and setting it to 100000, and the unstable result is the same. I have also tried lowering the PageSize to 250 and get the same unstable results. Currently I am trying to change the LDAP policy on the server to have a MaxPageSize of 10000, and I am still receiving 1000 users with the search PageSize set to 10000 as well. Not sure what I am missing here, but any help or direction would be appreciated.
public IEnumerable<StaffInfo> GetStaff(string userId)
{
    try
    {
        var userList = new List<StaffInfo>();
        using (var directoryEntry = new DirectoryEntry("LDAP://" + _adPath + _adContainer, _quarcAdminUserName, _quarcAdminPassword))
        {
            using (var de = new DirectorySearcher(directoryEntry)
            {
                Filter = GetDirectorySearcherFilter(LdapFilterOptions.AllUsers),
                PageSize = 1000,
                SizeLimit = 0
            })
            {
                foreach (SearchResult sr in de.FindAll())
                {
                    try
                    {
                        var userObj = sr.GetDirectoryEntry();
                        var staffInfo = new StaffInfo(userObj);
                        userList.Add(staffInfo);
                    }
                    catch (Exception ex)
                    {
                        Log.Error("AD Search result loop Error", ex);
                    }
                }
            }
        }
        return userList;
    }
    catch (Exception ex)
    {
        Log.Error("AD get staff try Error", ex);
        return Enumerable.Empty<StaffInfo>();
    }
}
A friend got back to me with the below response that helped me out, so I thought I would share it and hope it helps anyone else with the same issue.
The first thing I think of is: are you using the domain name, e.g. foo.com, as the _adPath?
If so, then I have a pretty good idea. A DNS query for foo.com will return a random list of up to 25 DCs in the domain. If the first DC in that random list is not responsive or is firewalled off, and you get that DC from DNS, then you will experience the behavior you describe. Since the DNS result is cached on the local machine, you will see it happen consistently one day and then not the next. That's infuriating behavior. :/
You can verify this with a network trace to see if this is happening.
So how do you workaround it? A couple of options.
Query DNS -> create a list of the hosts returned -> try the first one. If it fails, try the next one. If you hit the bottom of the list, fail. If you do this, log each individual failure noisily so the admins don't blame you.
Even better would be to ask the AD administrators for a list of ldap servers and use that with the approach described above.
80% of administrators will tell you just to use the domain name. This is good because deploying a new domain will "just work" with no reconfiguration required.
15% of administrators will want to specify a couple of DCs that are network closest to the application. This is good for performance, but bad if they forget about this application when the time comes for them to upgrade their domain.
The other 5% doesn't really matter. :)
The next point I see is that you are using LDAP, not LDAPS. That is fine, but there is a risk that you will use "Basic" binds. With "Basic" binds, Joe Hacker can steal your account credentials with a network sniffer. There are a couple of possible workarounds.
1. There is another DirectoryEntry constructor that will let you specify "Secure" as the auth method.
2. Ask your admins if you can use LDAPS. (More portable, in case you need to talk to an LDAP server other than Active Directory.)
The last piece is regarding page size. 1,000 should be fine universally. Don't use any value > 5,000 or you can expect some fidgety behavior; that is higher than the default limit under Windows 2003, and in Windows 2008 the page size is hard-limited to 5,000 unless it has been overridden using a rather obscure bit in AD called dsHeuristics. http://support.microsoft.com/kb/2009267
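Purely as an illustration of the two suggestions above (an explicit list of directory servers tried in order, plus a paged search kept at a modest page size), here is a sketch in Python with the ldap3 package; the host names, base DN, account, and filter are placeholder assumptions:
# Sketch only: explicit server list tried in order, LDAPS, and a paged search.
# Hosts, base DN and credentials are placeholders.
from ldap3 import Server, ServerPool, Connection, SUBTREE, FIRST

# Either build this list from a DNS query for the domain, or ask the AD admins for it.
servers = ServerPool(
    [Server("dc1.example.com", use_ssl=True), Server("dc2.example.com", use_ssl=True)],
    FIRST,            # try servers in order, falling through to the next on failure
    active=True,
    exhaust=True,
)

conn = Connection(servers, user="svc-account@example.com", password="***", auto_bind=True)

entries = conn.extend.standard.paged_search(
    search_base="DC=example,DC=com",
    search_filter="(&(objectCategory=person)(objectClass=user))",
    search_scope=SUBTREE,
    attributes=["sAMAccountName", "displayName"],
    paged_size=1000,       # stays well under the 5,000 ceiling mentioned above
    generator=True,        # stream results page by page
)

for entry in entries:
    if entry.get("type") == "searchResEntry":
        print(entry["attributes"]["sAMAccountName"])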
LDAP is configured, by default, to return a maximum of only 1000 results. You can change this setting on the domain you're requesting from.

Manage Website Configuration via CMS

I've read questions on Stack Overflow very similar to this question, but not quite the same.
Let's say that I had the following config.inc.php file included on every page of my website:
<?php
$site_name = 'Acme Inc.';
$authenticate_with_ldap = true;
$ldap_host = 'ldap.example.com';
$ldap_port = 389;
$ldap_rdn = 'ldap-user';
$ldap_password = 'ldap-pass';
$ldap_dn = 'ou=example,dc=example,dc=com';
$smtp_username = 'smtp-user';
$smtp_password = 'smtp-pass';
$recaptcha_publickey = 'my-recaptcha-publickey';
$recaptcha_privatekey = 'my-recaptcha-privatekey';
?>
Note: I have chosen to keep the website configuration in a file instead of the database because the information is used all over the website and it would be a lot more code and, I'm guessing, a lot more overhead to have to query the database for the same information all the time.
Now let's say that the website administrator is the type of person who would prefer to edit the above information using a CMS as opposed to going in and editing the file manually. My fear is that when the website administrator clicks the "Update" button and the PHP script gets to the file_put_contents function that overwrites the config.inc.php file, something could go wrong and either corrupt the file or make it unusable due to a syntax error or something.
Is this a reasonable concern? Should I tell the website administrator that he should just tough it out and edit the file manually? Should I store the information in the database instead? Or should I store the information in both places so that if the file gets messed up, it can be regenerated using the information in the database?
If you store that info in the DB as a single row of data, wouldn't it be cached anyway?
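On the file-corruption worry from the question: the usual mitigation is to write the regenerated config to a temporary file in the same directory, sanity-check it, and then atomically rename it over the old one, so a failed save can never leave a half-written config.inc.php behind. The site itself is PHP, so the sketch below is only a language-agnostic illustration of that write-validate-rename pattern, written in Python with illustrative names:
# Sketch only: atomic replacement of a generated config file (illustrative names).
import os
import tempfile

def write_config_atomically(path, contents):
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".config-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(contents)
            tmp.flush()
            os.fsync(tmp.fileno())
        # Minimal sanity check before swapping the new file in.
        if "<?php" not in contents:
            raise ValueError("generated config does not look like PHP")
        os.replace(tmp_path, path)   # atomic rename over the old file
    except Exception:
        os.unlink(tmp_path)
        raise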
