I have an MLflow server running locally, exposed on port 80. I also have a model in the MLflow registry that I want to deploy with mlflow sagemaker run-local, because after testing this locally I am going to deploy everything to AWS and SageMaker. My problem is that when I run:
export MODEL_PATH=models:/churn-lgb-test/2
export LOCAL_PORT=8000
mlflow sagemaker run-local -m $MODEL_PATH -p $LOCAL_PORT -f python_function -i splicemachine/mlflow-pyfunc:1.6.0
it starts the container and I immediately get this error:
[2020-07-27 13:02:13 +0000] [827] [ERROR] Exception in worker process
Traceback (most recent call last):
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/arbiter.py", line 583, in spawn_worker
worker.init_process()
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/workers/ggevent.py", line 162, in init_process
super().init_process()
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/workers/base.py", line 119, in init_process
self.load_wsgi()
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/workers/base.py", line 144, in load_wsgi
self.wsgi = self.app.wsgi()
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/app/base.py", line 67, in wsgi
self.callable = self.load()
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py", line 49, in load
return self.load_wsgiapp()
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py", line 39, in load_wsgiapp
return util.import_app(self.app_uri)
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/util.py", line 358, in import_app
mod = importlib.import_module(module)
File "/miniconda/envs/custom_env/lib/python3.7/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
File "<frozen importlib._bootstrap>", line 983, in _find_and_load
File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 728, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/mlflow/models/container/scoring_server/wsgi.py", line 3, in <module>
app = scoring_server.init(pyfunc.load_model("/opt/ml/model/"))
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/mlflow/pyfunc/__init__.py", line 292, in load_model
return importlib.import_module(conf[MAIN])._load_pyfunc(data_path)
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/mlflow/sklearn.py", line 219, in _load_pyfunc
return _load_model_from_local_file(path)
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/mlflow/sklearn.py", line 206, in _load_model_from_local_file
with open(path, "rb") as f:
IsADirectoryError: [Errno 21] Is a directory: '/opt/ml/model'
[2020-07-27 13:02:13 +0000] [828] [ERROR] Exception in worker process
Traceback (most recent call last):
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/arbiter.py", line 583, in spawn_worker
worker.init_process()
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/workers/ggevent.py", line 162, in init_process
super().init_process()
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/workers/base.py", line 119, in init_process
self.load_wsgi()
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/workers/base.py", line 144, in load_wsgi
self.wsgi = self.app.wsgi()
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/app/base.py", line 67, in wsgi
self.callable = self.load()
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py", line 49, in load
return self.load_wsgiapp()
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py", line 39, in load_wsgiapp
return util.import_app(self.app_uri)
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/util.py", line 358, in import_app
mod = importlib.import_module(module)
File "/miniconda/envs/custom_env/lib/python3.7/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
File "<frozen importlib._bootstrap>", line 983, in _find_and_load
File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 728, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/mlflow/models/container/scoring_server/wsgi.py", line 3, in <module>
app = scoring_server.init(pyfunc.load_model("/opt/ml/model/"))
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/mlflow/pyfunc/__init__.py", line 292, in load_model
return importlib.import_module(conf[MAIN])._load_pyfunc(data_path)
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/mlflow/sklearn.py", line 219, in _load_pyfunc
return _load_model_from_local_file(path)
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/mlflow/sklearn.py", line 206, in _load_model_from_local_file
with open(path, "rb") as f:
IsADirectoryError: [Errno 21] Is a directory: '/opt/ml/model'
[2020-07-27 13:02:13 +0000] [828] [INFO] Worker exiting (pid: 828)
[2020-07-27 13:02:13 +0000] [827] [INFO] Worker exiting (pid: 827)
[2020-07-27 13:02:13 +0000] [829] [ERROR] Exception in worker process
Traceback (most recent call last):
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/arbiter.py", line 583, in spawn_worker
worker.init_process()
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/workers/ggevent.py", line 162, in init_process
super().init_process()
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/workers/base.py", line 119, in init_process
self.load_wsgi()
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/workers/base.py", line 144, in load_wsgi
self.wsgi = self.app.wsgi()
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/app/base.py", line 67, in wsgi
self.callable = self.load()
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py", line 49, in load
return self.load_wsgiapp()
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py", line 39, in load_wsgiapp
return util.import_app(self.app_uri)
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/util.py", line 358, in import_app
mod = importlib.import_module(module)
File "/miniconda/envs/custom_env/lib/python3.7/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
File "<frozen importlib._bootstrap>", line 983, in _find_and_load
File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 728, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/mlflow/models/container/scoring_server/wsgi.py", line 3, in <module>
app = scoring_server.init(pyfunc.load_model("/opt/ml/model/"))
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/mlflow/pyfunc/__init__.py", line 292, in load_model
return importlib.import_module(conf[MAIN])._load_pyfunc(data_path)
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/mlflow/sklearn.py", line 219, in _load_pyfunc
return _load_model_from_local_file(path)
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/mlflow/sklearn.py", line 206, in _load_model_from_local_file
with open(path, "rb") as f:
IsADirectoryError: [Errno 21] Is a directory: '/opt/ml/model'
[2020-07-27 13:02:13 +0000] [829] [INFO] Worker exiting (pid: 829)
[2020-07-27 13:02:13 +0000] [830] [ERROR] Exception in worker process
Traceback (most recent call last):
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/arbiter.py", line 583, in spawn_worker
worker.init_process()
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/workers/ggevent.py", line 162, in init_process
super().init_process()
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/workers/base.py", line 119, in init_process
self.load_wsgi()
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/workers/base.py", line 144, in load_wsgi
self.wsgi = self.app.wsgi()
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/app/base.py", line 67, in wsgi
self.callable = self.load()
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py", line 49, in load
return self.load_wsgiapp()
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py", line 39, in load_wsgiapp
return util.import_app(self.app_uri)
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/util.py", line 358, in import_app
mod = importlib.import_module(module)
File "/miniconda/envs/custom_env/lib/python3.7/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
File "<frozen importlib._bootstrap>", line 983, in _find_and_load
File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 728, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/mlflow/models/container/scoring_server/wsgi.py", line 3, in <module>
app = scoring_server.init(pyfunc.load_model("/opt/ml/model/"))
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/mlflow/pyfunc/__init__.py", line 292, in load_model
return importlib.import_module(conf[MAIN])._load_pyfunc(data_path)
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/mlflow/sklearn.py", line 219, in _load_pyfunc
return _load_model_from_local_file(path)
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/mlflow/sklearn.py", line 206, in _load_model_from_local_file
with open(path, "rb") as f:
IsADirectoryError: [Errno 21] Is a directory: '/opt/ml/model'
[2020-07-27 13:02:13 +0000] [830] [INFO] Worker exiting (pid: 830)
Traceback (most recent call last):
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/arbiter.py", line 209, in run
self.sleep()
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/arbiter.py", line 357, in sleep
ready = select.select([self.PIPE[0]], [], [], 1.0)
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/arbiter.py", line 242, in handle_chld
self.reap_workers()
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/arbiter.py", line 525, in reap_workers
raise HaltServer(reason, self.WORKER_BOOT_ERROR)
gunicorn.errors.HaltServer: <HaltServer 'Worker failed to boot.' 3>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/miniconda/envs/custom_env/bin/gunicorn", line 8, in <module>
sys.exit(run())
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py", line 58, in run
WSGIApplication("%(prog)s [OPTIONS] [APP_MODULE]").run()
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/app/base.py", line 228, in run
super().run()
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/app/base.py", line 72, in run
Arbiter(self).run()
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/arbiter.py", line 229, in run
self.halt(reason=inst.reason, exit_status=inst.exit_status)
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/arbiter.py", line 342, in halt
self.stop()
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/arbiter.py", line 393, in stop
time.sleep(0.1)
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/arbiter.py", line 242, in handle_chld
self.reap_workers()
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/arbiter.py", line 525, in reap_workers
raise HaltServer(reason, self.WORKER_BOOT_ERROR)
gunicorn.errors.HaltServer: <HaltServer 'Worker failed to boot.' 3>
creating and activating custom environment
Got sigterm signal, exiting.
[2020-07-27 13:02:13 +0000] [831] [ERROR] Exception in worker process
Traceback (most recent call last):
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/arbiter.py", line 583, in spawn_worker
worker.init_process()
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/workers/ggevent.py", line 162, in init_process
super().init_process()
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/workers/base.py", line 119, in init_process
self.load_wsgi()
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/workers/base.py", line 144, in load_wsgi
self.wsgi = self.app.wsgi()
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/app/base.py", line 67, in wsgi
self.callable = self.load()
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py", line 49, in load
return self.load_wsgiapp()
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py", line 39, in load_wsgiapp
return util.import_app(self.app_uri)
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/util.py", line 358, in import_app
mod = importlib.import_module(module)
File "/miniconda/envs/custom_env/lib/python3.7/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
File "<frozen importlib._bootstrap>", line 983, in _find_and_load
File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 728, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/mlflow/models/container/scoring_server/wsgi.py", line 3, in <module>
app = scoring_server.init(pyfunc.load_model("/opt/ml/model/"))
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/mlflow/pyfunc/__init__.py", line 292, in load_model
return importlib.import_module(conf[MAIN])._load_pyfunc(data_path)
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/mlflow/sklearn.py", line 219, in _load_pyfunc
return _load_model_from_local_file(path)
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/mlflow/sklearn.py", line 206, in _load_model_from_local_file
with open(path, "rb") as f:
IsADirectoryError: [Errno 21] Is a directory: '/opt/ml/model'
[2020-07-27 13:02:13 +0000] [831] [INFO] Worker exiting (pid: 831)
[2020-07-27 13:02:14 +0000] [833] [ERROR] Exception in worker process
Traceback (most recent call last):
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/arbiter.py", line 583, in spawn_worker
worker.init_process()
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/workers/ggevent.py", line 162, in init_process
super().init_process()
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/workers/base.py", line 119, in init_process
self.load_wsgi()
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/workers/base.py", line 144, in load_wsgi
self.wsgi = self.app.wsgi()
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/app/base.py", line 67, in wsgi
self.callable = self.load()
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py", line 49, in load
return self.load_wsgiapp()
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py", line 39, in load_wsgiapp
return util.import_app(self.app_uri)
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/util.py", line 358, in import_app
mod = importlib.import_module(module)
File "/miniconda/envs/custom_env/lib/python3.7/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
File "<frozen importlib._bootstrap>", line 983, in _find_and_load
File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 728, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/mlflow/models/container/scoring_server/wsgi.py", line 3, in <module>
app = scoring_server.init(pyfunc.load_model("/opt/ml/model/"))
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/mlflow/pyfunc/__init__.py", line 292, in load_model
return importlib.import_module(conf[MAIN])._load_pyfunc(data_path)
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/mlflow/sklearn.py", line 219, in _load_pyfunc
return _load_model_from_local_file(path)
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/mlflow/sklearn.py", line 206, in _load_model_from_local_file
with open(path, "rb") as f:
IsADirectoryError: [Errno 21] Is a directory: '/opt/ml/model'
[2020-07-27 13:02:14 +0000] [833] [INFO] Worker exiting (pid: 833)
[2020-07-27 13:02:14 +0000] [832] [ERROR] Exception in worker process
Traceback (most recent call last):
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/arbiter.py", line 583, in spawn_worker
worker.init_process()
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/workers/ggevent.py", line 162, in init_process
super().init_process()
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/workers/base.py", line 119, in init_process
self.load_wsgi()
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/workers/base.py", line 144, in load_wsgi
self.wsgi = self.app.wsgi()
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/app/base.py", line 67, in wsgi
self.callable = self.load()
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py", line 49, in load
return self.load_wsgiapp()
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py", line 39, in load_wsgiapp
return util.import_app(self.app_uri)
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/gunicorn/util.py", line 358, in import_app
mod = importlib.import_module(module)
File "/miniconda/envs/custom_env/lib/python3.7/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
File "<frozen importlib._bootstrap>", line 983, in _find_and_load
File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 728, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/mlflow/models/container/scoring_server/wsgi.py", line 3, in <module>
app = scoring_server.init(pyfunc.load_model("/opt/ml/model/"))
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/mlflow/pyfunc/__init__.py", line 292, in load_model
return importlib.import_module(conf[MAIN])._load_pyfunc(data_path)
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/mlflow/sklearn.py", line 219, in _load_pyfunc
return _load_model_from_local_file(path)
File "/miniconda/envs/custom_env/lib/python3.7/site-packages/mlflow/sklearn.py", line 206, in _load_model_from_local_file
with open(path, "rb") as f:
IsADirectoryError: [Errno 21] Is a directory: '/opt/ml/model'
[2020-07-27 13:02:14 +0000] [832] [INFO] Worker exiting (pid: 832)
It looks as though you might be hitting a bug in the MLflow scoring server. The code expects a model file at the path /opt/ml/model but finds a directory instead. I would submit this as an issue at https://github.com/mlflow/mlflow/issues and include a code snippet showing how you're saving your model.
Or... have you considered deploying your scikit-learn model to AWS Lambda instead of SageMaker? It might be a better fit, more cost-efficient, and easier than trying to fix this MLflow bug if your scikit-learn model is <50 MB. I'm the author of an open-source package that takes this approach, https://github.com/model-zoo/scikit-learn-lambda, and a related platform, https://modelzoo.dev. If you're interested in testing it, reach out at contact@modelzoo.dev.
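To illustrate the error class in the traceback above: on POSIX systems, calling open() on a directory raises IsADirectoryError, which is what the last frame (open(path, "rb") in mlflow/sklearn.py) reports for /opt/ml/model. A minimal sketch, using a temporary directory as a stand-in for the model path; the helper name open_error_name is hypothetical and this does not touch MLflow itself:

```python
import tempfile


def open_error_name(path):
    """Return the name of the exception raised by open(path, "rb"), or None on success."""
    try:
        with open(path, "rb"):
            return None
    except OSError as exc:
        return type(exc).__name__


# A temporary directory stands in for /opt/ml/model (assumption for illustration).
with tempfile.TemporaryDirectory() as model_dir:
    print(open_error_name(model_dir))
```

This only reproduces the symptom; it says nothing about why the loader in the container was handed a directory rather than a file.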
I am implementing a Pearson hash in order to create a lightweight dictionary structure for a C project which requires a table of file names paired with file data; I want the constant-time lookup that hash tables offer. I'm no math expert, so I looked up good text hashes and Pearson came up, claimed to be effective and to have a good distribution. I tested my implementation and found that no matter how I vary the table size or the maximum filename length, the hash is very inefficient, with, for example, 18/50 buckets being left empty. I trust Wikipedia not to be lying, and yes, I am aware I can just download a third-party hash table implementation, but I would dearly like to know why my version isn't working.
In the following code (a function to insert values into the table), "csString" is the filename, the string to be hashed, "cLen" is the length of the string, "pData" is a pointer to some data which is inserted into the table, and "pTable" is the table struct. The initial condition cHash = cLen - csString[0] is something I experimentally found to marginally improve uniformity. I should add that I am testing the table with entirely randomised strings (using rand() to generate ASCII values) with randomised lengths within a certain range; this is in order to easily generate and test large numbers of values.
typedef struct StaticStrTable {
unsigned int nRepeats;
unsigned char nBuckets;
unsigned char nMaxCollisions;
void** pBuckets;
} StaticStrTable;
static const char cPerm256[256] = {
227, 117, 238, 33, 25, 165, 107, 226, 132, 88, 84, 68, 217, 237, 228, 58, 52, 147, 46, 197, 191, 119, 211, 0, 218, 139, 196, 153, 170, 77, 175, 22, 193, 83, 66, 182, 151, 99, 11, 144, 104, 233, 166, 34, 177, 14, 194, 51, 30, 121, 102, 49,
222, 210, 199, 122, 235, 72, 13, 156, 38, 145, 137, 78, 65, 176, 94, 163, 95, 59, 92, 114, 243, 204, 224, 43, 185, 168, 244, 203, 28, 124, 248, 105, 10, 87, 115, 161, 138, 223, 108, 192, 6, 186, 101, 16, 39, 134, 123, 200, 190, 195, 178,
164, 9, 251, 245, 73, 162, 71, 7, 239, 62, 69, 209, 159, 3, 45, 247, 19, 174, 149, 61, 57, 146, 234, 189, 15, 202, 89, 111, 207, 31, 127, 215, 198, 231, 4, 181, 154, 64, 125, 24, 93, 152, 37, 116, 160, 113, 169, 255, 44, 36, 70, 225, 79,
250, 12, 229, 230, 76, 167, 118, 232, 142, 212, 98, 82, 252, 130, 23, 29, 236, 86, 240, 32, 90, 67, 126, 8, 133, 85, 20, 63, 47, 150, 135, 100, 103, 173, 184, 48, 143, 42, 54, 129, 242, 18, 187, 106, 254, 53, 120, 205, 155, 216, 219, 172,
21, 253, 5, 221, 40, 27, 2, 179, 74, 17, 55, 183, 56, 50, 110, 201, 109, 249, 128, 112, 75, 220, 214, 140, 246, 213, 136, 148, 97, 35, 241, 60, 188, 180, 206, 80, 91, 96, 157, 81, 171, 141, 131, 158, 1, 208, 26, 41
};
void InsertStaticStrTable(char* csString, unsigned char cLen, void* pData, StaticStrTable* pTable) {
unsigned char cHash = cLen - csString[0];
for (int i = 0; i < cLen; ++i) cHash ^= cPerm256[cHash ^ csString[i]];
unsigned short cTableIndex = cHash % pTable->nBuckets;
long long* pBucket = pTable->pBuckets[cTableIndex];
// Inserts data and records how many collisions there are - it may look weird as the way in which I decided to pack the data into the table buffer is very compact and arbitrary
// It won't affect the hash though, which is the key issue!
for (int i = 0; i < pTable->nMaxCollisions; ++i) {
if (i == 1) {
pTable->nRepeats++;
}
long long* pSlotID = pBucket + (i << 1);
if (pSlotID[0] == 0) {
pSlotID[0] = csString;
pSlotID[1] = pData;
break;
}
}
}
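For comparison, the textbook Pearson hash replaces the running hash with the table entry on every byte (h = T[h ^ c]), whereas the loop above XORs the entry into the running hash (cHash ^= ...). A minimal sketch of the textbook form in Python; the table here is a hypothetical one built by shuffling 0..255 with a fixed seed, not cPerm256 from the question, and whether this difference matters for the observed distribution is not established here:

```python
import random


def make_permutation(seed=0):
    """Build a hypothetical 256-entry permutation table (not the table from the question)."""
    table = list(range(256))
    random.Random(seed).shuffle(table)
    return table


def pearson_hash(data, table):
    """Textbook 8-bit Pearson hash: each lookup replaces the running hash."""
    h = 0
    for byte in data:
        h = table[h ^ byte]  # plain assignment, not ^=
    return h


table = make_permutation()
# Map a file name to one of 50 buckets, as in the question.
print(pearson_hash(b"objects.bin", table) % 50)
```

Note that reducing an 8-bit hash modulo 50 introduces a slight bias of its own, since 256 is not a multiple of 50.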
FYI (This is not an answer, I just need the formatting)
These are just single runs from a simulation, YMMV.
distributing 50 elements randomly over 50 bins:
kalender_size=50 nperson = 50
E/cell| Ncell | frac | Nelem | frac |h/cell| hops | Cumhops
----+---------+--------+----------+--------+------+--------+--------
0: 18 (0.360000) 0 (0.000000) 0 0 0
1: 18 (0.360000) 18 (0.360000) 1 18 18
2: 10 (0.200000) 20 (0.400000) 3 30 48
3: 4 (0.080000) 12 (0.240000) 6 24 72
----+---------+--------+----------+--------+------+--------+--------
4: 50 50 1.440000 72
Similarly: distribute 365 persons over a birthday-calendar (ignoring leap days ...):
kalender_size=356 nperson = 356
E/cell| Ncell | frac | Nelem | frac |h/cell| hops | Cumhops
----+---------+--------+----------+--------+------+--------+--------
0: 129 (0.362360) 0 (0.000000) 0 0 0
1: 132 (0.370787) 132 (0.370787) 1 132 132
2: 69 (0.193820) 138 (0.387640) 3 207 339
3: 19 (0.053371) 57 (0.160112) 6 114 453
4: 6 (0.016854) 24 (0.067416) 10 60 513
5: 1 (0.002809) 5 (0.014045) 15 15 528
----+---------+--------+----------+--------+------+--------+--------
6: 356 356 1.483146 528
For N items over N slots, the expected number of empty slots equals the expected number of slots holding a single item. The expected density approaches 1/e for both as N grows.
The final number (1.483146) is the number of ->next pointer traversals per found element (when using a chained hash table). Any optimal hash function will almost reach 1.5.
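The figures above can be checked with a short simulation. This is a sketch in Python, not the program that produced the tables; the function names are made up for illustration. It throws n items into n slots uniformly at random and reports the fraction of empty slots, the fraction of slots with exactly one item, and the average number of chain hops per found element (a chain of k items costs 1 + 2 + ... + k hops in total).

```python
import random
from collections import Counter


def expected_empty_fraction(n):
    """Chance that a given slot stays empty when n items land in n slots uniformly."""
    return (1 - 1 / n) ** n


def simulate(n, seed=0):
    """Return (empty fraction, single-item fraction, hops per found element)."""
    rng = random.Random(seed)
    load = Counter(rng.randrange(n) for _ in range(n))  # slot -> item count
    empty = n - len(load)
    single = sum(1 for k in load.values() if k == 1)
    # Finding every item in a chain of length k once costs k * (k + 1) / 2 hops.
    hops = sum(k * (k + 1) // 2 for k in load.values())
    return empty / n, single / n, hops / n


print(expected_empty_fraction(50) * 50)  # expected number of empty slots for 50/50
print(simulate(100_000))
```

For n = 50 the formula gives roughly 18 empty slots, which is in line with the 18/50 reported in the question.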
I have a constant that is too long, and MASM won't assemble my source code. How can I fix it?
C25 byte 51, 135, 173, 160, 231, 165, 173, 168, 165, 32, 162, 235, 224, 160, 166, 165, 173, 168, 239, 32, 115, 117, 109, 91, 49, 48, 48, 48, 44, 32, 49, 48, 48, 48, 93, 32, 109, 111, 100, 32, 49, 53, 48, 48, 32, 100, 105, 118, 32, 51, 58, 32
error image
You can put at most 48 elements per line. So split the line into two or more lines that each contain 48 elements or fewer, e.g.:
C25 byte 51, 135, 173, 160, 231, 165, 173, 168, 165, 32, 162, 235, 224, 160, 166, 165, 173, 168, 239, 32, 115, 117, 109, 91, 49
byte 48, 48, 48, 44, 32, 49, 48, 48, 48, 93, 32, 109, 111, 100, 32, 49, 53, 48, 48, 32, 100, 105, 118, 32, 51, 58, 32
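If the data is generated rather than typed by hand, the splitting can be scripted. A small sketch in Python, taking the 48-elements-per-line limit from the answer above as given; the function name format_byte_lines and the continuation-line indentation are illustrative choices:

```python
def format_byte_lines(label, values, per_line=48):
    """Emit MASM-style 'byte' directives with at most per_line values on each line."""
    if any(not 0 <= v <= 255 for v in values):
        raise ValueError("every value must fit in one byte (0..255)")
    lines = []
    for start in range(0, len(values), per_line):
        chunk = ", ".join(str(v) for v in values[start:start + per_line])
        # Only the first line carries the label; later lines are padded to align.
        prefix = label if start == 0 else " " * len(label)
        lines.append(f"{prefix} byte {chunk}")
    return "\n".join(lines)


# 52 placeholder values, the same count as in the question's C25 line.
print(format_byte_lines("C25", list(range(52))))
```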
The end result I'm looking for is to implement T-SQL CHECKSUM in BigQuery with a JavaScript UDF. I would settle for having the C/C++ source code to translate but if someone has already done this work then I'd love to use it.
Alternatively, if someone can think of a way to create an equivalent hash code between strings stored in Microsoft SQL Server compared to those in BigQuery then that would help me too.
UPDATE: Through HABO's link in the comments I've found some source code, written in T-SQL, that performs the same CHECKSUM, but I'm having difficulty converting it to JavaScript, which cannot natively handle 64-bit integers. I'm playing with some small examples and have found that the algorithm works on the low nibble of each byte only.
UPDATE 2: I got really curious about replicating this algorithm and I can see some definite patterns but my brain isn't up to the task of distilling that into a reverse engineered solution. I did find that BINARY_CHECKSUM() and CHECKSUM() return different things so the work done on the former didn't help me with the latter.
I spent the day reverse engineering this by first dumping all results for single ASCII characters as well as pairs. This showed that each character has its own distinct "XOR code" and letters have the same one regardless of case. The algorithm was remarkably simple to figure out after that: rotate 4 bits left and xor by the code stored in a lookup table.
var xorcodes = [
0, 1, 2, 3, 4, 5, 6, 7,
8, 9, 10, 11, 12, 13, 14, 15,
16, 17, 18, 19, 20, 21, 22, 23,
24, 25, 26, 27, 28, 29, 30, 31,
0, 33, 34, 35, 36, 37, 38, 39, // !"#$%&'
40, 41, 42, 43, 44, 45, 46, 47, // ()*+,-./
132, 133, 134, 135, 136, 137, 138, 139, // 01234567
140, 141, 48, 49, 50, 51, 52, 53, 54, // 89:;<=>?@
142, 143, 144, 145, 146, 147, 148, 149, // ABCDEFGH
150, 151, 152, 153, 154, 155, 156, 157, // IJKLMNOP
158, 159, 160, 161, 162, 163, 164, 165, // QRSTUVWX
166, 167, 55, 56, 57, 58, 59, 60, // YZ[\]^_`
142, 143, 144, 145, 146, 147, 148, 149, // abcdefgh
150, 151, 152, 153, 154, 155, 156, 157, // ijklmnop
158, 159, 160, 161, 162, 163, 164, 165, // qrstuvwx
166, 167, 61, 62, 63, 64, 65, 66, // yz{|}~
];
function rol(x, n) {
// simulate a 32-bit rotate left (>>> is the zero-fill right shift, so no sign bits are shifted in)
return (x<<n) | (x>>>(32-n));
}
function checksum(s) {
var checksum = 0;
for (var i = 0; i < s.length; i++) {
checksum = rol(checksum, 4);
var c = s.charCodeAt(i);
var xorcode = 0;
if (c < xorcodes.length) {
xorcode = xorcodes[c];
}
checksum ^= xorcode;
}
return checksum;
};
See https://github.com/neilodonuts/tsql-checksum-javascript for more info.
DISCLAIMER: I've only worked on compatibility with VARCHAR strings in SQL Server with collation set to SQL_Latin1_General_CP1_CI_AS. This won't work with multiple columns or integers but I'm sure the underlying algorithm uses the same codes so it wouldn't be hard to figure out. It also seems to differ from db<>fiddle possibly due to collation: https://github.com/neilodonuts/tsql-checksum-javascript/blob/master/data/dbfiddle-differences.png ... mileage may vary!
FYI, for those of you who are stuck in T-SQL legacy mode, here's a C# implementation that was tested and looks good for most of the strings/ints I've been working with:
public static int[] xorcodes = {
0, 1, 2, 3, 4, 5, 6, 7,
8, 9, 10, 11, 12, 13, 14, 15,
16, 17, 18, 19, 20, 21, 22, 23,
24, 25, 26, 27, 28, 29, 30, 31,
0, 33, 34, 35, 36, 37, 38, 39, // !"#$%&'
40, 41, 42, 43, 44, 45, 46, 47, // ()*+,-./
132, 133, 134, 135, 136, 137, 138, 139, // 01234567
140, 141, 48, 49, 50, 51, 52, 53, 54, // 89:;<=>?@
142, 143, 144, 145, 146, 147, 148, 149, // ABCDEFGH
150, 151, 152, 153, 154, 155, 156, 157, // IJKLMNOP
158, 159, 160, 161, 162, 163, 164, 165, // QRSTUVWX
166, 167, 55, 56, 57, 58, 59, 60, // YZ[\]^_`
142, 143, 144, 145, 146, 147, 148, 149, // abcdefgh
150, 151, 152, 153, 154, 155, 156, 157, // ijklmnop
158, 159, 160, 161, 162, 163, 164, 165, // qrstuvwx
166, 167, 61, 62, 63, 64, 65, 66, // yz{|}~
};
public static int rol(int x, int n) {
// simulate a 32-bit rotate left (the cast to uint makes the right shift zero-fill)
return ((int)x << n) | ((int)((uint)x >> (32 - n)));
}
public static int checksum(string s) {
int checksum = 0;
for (var i = 0; i < s.Length; i++) {
checksum = rol(checksum, 4);
var c = ((int)s[i]);
int xorcode = 0;
if (c < xorcodes.Length) {
xorcode = xorcodes[c];
}
checksum ^= xorcode;
}
return checksum;
}
I open a RandomAccessFile (raf), get its file channel, accumulate some data in buffers, then
channel.position(raf.length)
channel.write(buffers)
channel.close
raf.close
I expect the bytes to be written into raf at these offsets, which are raf.length plus the accumulated buffer sizes in the array
0,62,132,195,259,322,392,455,519,589,652,716,779,842,905,968,1031,1093,1155,1225,1287,1350,1414,1477,1541,1611,1674,1737,1801,1863,1927,1989,2059,2123,2193,2256,2319,2382,2452,2516,2586,2648,2711,2774,2837,2900,2962,3025,3089,3152,3216,3286,3348,3412,3482,3544,3614,3684,3754,3818,3882,3952,4022,4092
and ProcMon displays that this is what starts to happen at OS level
"CreateFile","objects.bin","SUCCESS","Desired Access: Read Attributes, Disposition: Open, Options: Open Reparse Point, Attributes: n/a, ShareMode: Read, Write, Delete, AllocationSize: n/a, OpenResult: Opened"
"QueryNetworkOpenInformationFile","objects.bin","SUCCESS","CreationTime: 6.12.2015 20:11:21, LastAccessTime: 8.12.2015 11:03:49, LastWriteTime: 8.12.2015 11:03:49, ChangeTime: 8.12.2015 11:03:49, AllocationSize: 1.01.1601 2:00:00, EndOfFile: 1.01.1601 2:00:00, FileAttributes: ANCI"
"CloseFile","objects.bin","SUCCESS",""
"CreateFile","objects.bin","SUCCESS","Desired Access: Read Attributes, Delete, Disposition: Open, Options: Non-Directory File, Open Reparse Point, Attributes: n/a, ShareMode: Read, Write, Delete, AllocationSize: n/a, OpenResult: Opened"
"QueryAttributeTagFile","objects.bin","SUCCESS","Attributes: ANCI, ReparseTag: 0x0"
"SetDispositionInformationFile","objects.bin","SUCCESS","Delete: True"
"CloseFile","objects.bin","SUCCESS",""
"CreateFile","objects.bin","SUCCESS","Desired Access: Generic Read/Write, Disposition: OpenIf, Options: Synchronous IO Non-Alert, Non-Directory File, Attributes: N, ShareMode: Read, Write, AllocationSize: 0, OpenResult: Created"
"QueryStandardInformationFile","objects.bin","SUCCESS","AllocationSize: 0, EndOfFile: 0, NumberOfLinks: 1, DeletePending: False, Directory: False"
"WriteFile","objects.bin","SUCCESS","Offset: 0, Length: 62, Priority: Normal"
"WriteFile","objects.bin","SUCCESS","Offset: 62, Length: 70, Priority: Normal"
"WriteFile","objects.bin","SUCCESS","Offset: 132, Length: 63, Priority: Normal"
"WriteFile","objects.bin","SUCCESS","Offset: 195, Length: 64, Priority: Normal"
"WriteFile","objects.bin","SUCCESS","Offset: 259, Length: 63, Priority: Normal"
"WriteFile","objects.bin","SUCCESS","Offset: 322, Length: 70, Priority: Normal"
"WriteFile","objects.bin","SUCCESS","Offset: 392, Length: 63, Priority: Normal"
"WriteFile","objects.bin","SUCCESS","Offset: 455, Length: 64, Priority: Normal"
"WriteFile","objects.bin","SUCCESS","Offset: 519, Length: 70, Priority: Normal"
"WriteFile","objects.bin","SUCCESS","Offset: 589, Length: 63, Priority: Normal"
"WriteFile","objects.bin","SUCCESS","Offset: 652, Length: 64, Priority: Normal"
"WriteFile","objects.bin","SUCCESS","Offset: 716, Length: 63, Priority: Normal"
"WriteFile","objects.bin","SUCCESS","Offset: 779, Length: 63"
"WriteFile","objects.bin","SUCCESS","Offset: 842, Length: 63"
"WriteFile","objects.bin","SUCCESS","Offset: 905, Length: 63"
"WriteFile","objects.bin","SUCCESS","Offset: 968, Length: 63"
"ReadFile","objects.bin","END OF FILE","Offset: 2 452, Length: 2"
"QueryStandardInformationFile","objects.bin","SUCCESS","AllocationSize: 4 096, EndOfFile: 1 031, NumberOfLinks: 1, DeletePending: False, Directory: False"
"WriteFile","objects.bin","SUCCESS","Offset: 1 031, Length: 64"
"WriteFile","objects.bin","SUCCESS","Offset: 1 095, Length: 70"
"WriteFile","objects.bin","SUCCESS","Offset: 1 165, Length: 70"
"WriteFile","objects.bin","SUCCESS","Offset: 1 235, Length: 64"
"WriteFile","objects.bin","SUCCESS","Offset: 1 299, Length: 63"
"WriteFile","objects.bin","SUCCESS","Offset: 1 362, Length: 64"
"WriteFile","objects.bin","SUCCESS","Offset: 1 426, Length: 63"
"WriteFile","objects.bin","SUCCESS","Offset: 1 489, Length: 63"
"WriteFile","objects.bin","SUCCESS","Offset: 1 552, Length: 63"
"WriteFile","objects.bin","SUCCESS","Offset: 1 615, Length: 64"
"WriteFile","objects.bin","SUCCESS","Offset: 1 679, Length: 70"
"WriteFile","objects.bin","SUCCESS","Offset: 1 749, Length: 64"
"WriteFile","objects.bin","SUCCESS","Offset: 1 813, Length: 63"
"WriteFile","objects.bin","SUCCESS","Offset: 1 876, Length: 63"
"WriteFile","objects.bin","SUCCESS","Offset: 1 939, Length: 63"
"WriteFile","objects.bin","SUCCESS","Offset: 2 002, Length: 63"
"CloseFile","objects.bin","SUCCESS",""
As you can see, it writes 16 buffers at offsets 0, 62, 132, ... up to 968 and closes the file! Next it opens the file again and proceeds at offsets 1031, 1095, 1165, which come from another session where I do the same thing: open raf and fc and write another series of buffers, which again breaks off after 16 buffers. The second series starts at 1031, where the file was closed and where the 17th buffer of the first write should have gone. I see nothing about a 16-buffer limit in the Javadocs.
Calling fileChannel.force does not cure this, and I do not need my data physically on disk anyway unless the system shuts down. I am happy with the data being flushed into the OS cache at fc.write time or when the JVM shuts down, so that it is available the next time I open the file on the same system.
Here is the program to reproduce the bug
object FileChannelFailureDemo extends App {
import java.nio._, java.io._, java.nio.channels._
val raf = new RandomAccessFile("""objects2""", "rw") ; val fc = raf.getChannel
val range = (1 to 20)
val ba = range.map (i => ByteBuffer.wrap(Array.ofDim[Byte](i)))
fc.write(ba.toArray) //> res0: Long = 136
val expectedLength = range.foldLeft(0){case (acc, i) => acc + i}
//> expectedLength : Int = 210
// 1+2+..+20 = 20*21/2 = 210 // expected len
// 1+2+..+16 = 16*17/2 = 136 // len of file if only 16 buffers written
// assertion here, file size 136 != 210
assert(raf.length == expectedLength , raf.length + " != " + expectedLength)
//> java.lang.AssertionError: assertion failed: 136 != 210
//| at scala.Predef$.assert(Predef.scala:165)
println("succeeded, file size is " + raf.length)
fc.close; raf.close
}
Curiously, it does not reproduce on the remote machine. I also do not see it when I run this short program from the console, but the assertion fails when the same code is executed from a Worksheet. It seems to depend on program state: it can fail or succeed with the same JRE.
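One thing worth checking: the Javadoc for the gathering write(ByteBuffer[]) says it returns the number of bytes written, and the channel contract allows a write to consume only some of the bytes. So a short write may be legal behaviour rather than lost data, and the usual remedy is to call write again until the buffers are drained. Below is a small Python simulation of that retry loop, not Java code. The 16-buffer cap is an assumption taken from the trace above, and capped_gather_write and write_all are hypothetical names.

```python
def capped_gather_write(sink, buffers, max_buffers=16):
    """Hypothetical stand-in for a gathering write that accepts at most
    max_buffers buffers per call and returns the number of bytes consumed."""
    written = 0
    for buf in buffers[:max_buffers]:
        sink.extend(buf)
        written += len(buf)
    return written


def write_all(sink, buffers):
    """Keep calling the capped write until every buffer has been drained."""
    pending = list(buffers)
    total = 0
    while pending:
        n = capped_gather_write(sink, pending)
        total += n
        # Drop the buffers that this call consumed completely.
        while pending and n >= len(pending[0]):
            n -= len(pending[0])
            pending.pop(0)
        if pending and n:
            pending[0] = pending[0][n:]  # keep the unwritten tail
    return total


buffers = [bytes(i) for i in range(1, 21)]  # sizes 1..20, as in the demo

single = bytearray()
first_call = capped_gather_write(single, buffers)  # one call only

looped = bytearray()
total = write_all(looped, buffers)                 # retry until drained
```

Here a single call accounts for 1+2+..+16 = 136 bytes, the figure the failing assertion reports, while the loop reaches the expected 210.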
Here is the output I am working with:
Pool Name: Pool 2
Pool ID: 1
LUNs: 1015, 1080, 1034, 1016, 500, 1002, 1062, 1041, 1046, 1028, 1009, 1054, 513, 1058, 1070, 515, 1049, 1083, 1020, 1076, 19, 509, 1057, 1021, 525, 1019, 518, 1075, 29, 23, 1068, 37, 1064, 506, 1024, 1026, 1008, 1087, 1012, 1006, 1018, 502, 1004, 1074, 1030, 1032, 39, 1014, 1005, 1056, 1044, 2, 1033, 1001, 16, 1061, 1040, 1045, 1027, 26, 1023, 1053, 1037, 1079, 512, 520, 1069, 1039, 514, 1048, 1082, 523, 508, 524, 517, 522, 1066, 1089, 1067, 529, 528, 1063, 505, 1081, 527, 1007, 1086, 1051, 1011, 1035, 1017, 501, 1003, 1042, 1073, 1085, 1029, 1010, 24, 1013, 1055, 1043, 1059, 52, 1071, 516, 1050, 1084, 1000, 1077, 1060, 1072, 510, 1022, 1052, 526, 1036, 1078, 511, 35, 519, 1038, 521, 1047, 507, 6, 1065, 1025, 1088, 503, 53, 1031, 504
Pool Name: Pool 1
Pool ID: 0
LUNs: 9, 3, 34, 10, 12, 8, 7, 0, 38, 27, 18, 4, 42, 21, 17, 28, 36, 22, 13, 5, 11, 25, 15, 32, 1
Pool Name: Pool 4
Pool ID: 2
LUNs: (this one is empty)
What I would like to do is store each "LUNs:" list in its own variable (an array?), then take my number and search for it in all of the arrays (three in this example). If my number matches, for example "34", the program should output: Your number is in Pool 1
I know how to pull the LUN lines I need with a regex, and I know how to compare the results with an if statement, but I get lost combining the two, and even more lost when thinking about outputting the correct "Pool Name".
EDIT
I should add that the total number of pools can change, as can the LUN number lists.
Convert the output into a single string, replace colons with equals signs and split the string at double line breaks, then convert the fragments into objects using ConvertFrom-StringData and New-Object and split the LUN string into an array:
$data = ... | Out-String
$pools = $data -replace ': +','=' -split "`r`n`r`n" |
% { New-Object -Type PSCustomObject -Property (ConvertFrom-StringData $_) } |
select -Property *,@{n='LUNs';e={$_.LUNs -split ', '}} -Exclude LUNs
With that you can get the pool name of a pool containing a given LUN like this:
$pools | ? { $_.LUNs -contains 34 } | select -Expand 'Pool Name'
I'm sure there's an easier way...
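For comparison, the same idea (group the lines by pool, split the LUN list, then look the number up) can be sketched outside PowerShell. This is a minimal Python version that assumes the "Key: value" layout shown in the question. The sample text is abbreviated, and parse_pools and find_pool are made-up names.

```python
SAMPLE = """\
Pool Name: Pool 2
Pool ID: 1
LUNs: 1015, 1080, 1034
Pool Name: Pool 1
Pool ID: 0
LUNs: 9, 3, 34, 10
Pool Name: Pool 4
Pool ID: 2
LUNs:
"""


def parse_pools(text):
    """Map each pool name to the set of LUN numbers listed under it."""
    pools = {}
    name = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line
        key, value = key.strip(), value.strip()
        if key == "Pool Name":
            name = value
            pools[name] = set()
        elif key == "LUNs" and name is not None:
            # An empty list yields an empty set; non-numeric tokens are skipped.
            pools[name] = {int(tok) for tok in value.split(",")
                           if tok.strip().isdigit()}
    return pools


def find_pool(pools, lun):
    """Return the name of the first pool containing lun, or None."""
    for name, luns in pools.items():
        if lun in luns:
            return name
    return None


pools = parse_pools(SAMPLE)
print("Your number is in", find_pool(pools, 34))
```

Because the pools are collected into a dictionary, the number of pools and the length of each LUN list can vary without any change to the lookup.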
Is that what you need?
$Number = 42
# Key each list by name so the match does not depend on array lengths
# (two lists of equal length would otherwise be indistinguishable).
$Luns = [ordered]@{
    Lun1 = 1015, 1080, 1034, 1016, 500, 1002, 1062, 1041, 1046, 1028, 1009, 1054, 513, 1058, 1070
    Lun2 = 9, 3, 34, 10, 12, 8, 7, 0, 38, 27, 18, 4, 42, 21, 17, 28, 36, 22, 13, 5, 11, 25, 15, 32
    Lun3 = @()
}
foreach ($Name in $Luns.Keys)
{
    if ($Luns[$Name] -contains $Number)
    {
        "$Number in $Name"
    }
}
42 in Lun2