How to install additional packages in a SageMaker pipeline - amazon-sagemaker

I want to add dependency packages to my SageMaker pipeline that will be used in the preprocessing step.
I tried adding them to required_packages in the setup.py file, but it's not working.
It seems the setup.py file is not used at all.
required_packages = ["sagemaker==2.93.0", "matplotlib"]
Preprocessing steps:
sklearn_processor = SKLearnProcessor(
    framework_version="0.23-1",
    instance_type=processing_instance_type,
    instance_count=processing_instance_count,
    base_job_name=f"{base_job_prefix}/job-name",
    sagemaker_session=pipeline_session,
    role=role,
)
step_args = sklearn_processor.run(
    outputs=[
        ProcessingOutput(output_name="train", source="/opt/ml/processing/train"),
        ProcessingOutput(output_name="validation", source="/opt/ml/processing/validation"),
        ProcessingOutput(output_name="test", source="/opt/ml/processing/test"),
    ],
    code=os.path.join(BASE_DIR, "preprocess.py"),
    arguments=["--input-data", input_data],
)
step_process = ProcessingStep(
    name="PreprocessSidData",
    step_args=step_args,
)
Pipeline definition:
pipeline = Pipeline(
    name=pipeline_name,
    parameters=[
        processing_instance_type,
        processing_instance_count,
        training_instance_type,
        model_approval_status,
        input_data,
    ],
    steps=[step_process],
    sagemaker_session=pipeline_session,
)

For each job within the pipeline you should have separate requirements (so you install only what you need in each step and keep full control over it).
To do this, you need to use the source_dir parameter:
source_dir (str or PipelineVariable) – Path (absolute, relative or an S3 URI) to a directory with any other training source code dependencies aside from the entry point file (default: None). If source_dir is an S3 URI, it must point to a tar.gz file. The structure within this directory is preserved when training on Amazon SageMaker.
In general, look at the Processing documentation (you have to use FrameworkProcessor).
Within the specified folder, there must be the script (in your case preprocess.py), any other files/modules that may be needed, and the requirements.txt file.
The structure of the folder then will be:
BASE_DIR/
|- requirements.txt
|- preprocess.py
This is a standard requirements file, nothing special. It is installed automatically when the instance starts, without any extra instruction needed.
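For example, based on the packages listed in the question, the requirements.txt for this step might simply contain:
sagemaker==2.93.0
matplotlib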
So, your code becomes:
from sagemaker.processing import FrameworkProcessor
from sagemaker.sklearn import SKLearn
from sagemaker.workflow.steps import ProcessingStep
from sagemaker.processing import ProcessingInput, ProcessingOutput
sklearn_processor = FrameworkProcessor(
    estimator_cls=SKLearn,
    framework_version='0.23-1',
    instance_type=processing_instance_type,
    instance_count=processing_instance_count,
    base_job_name=f"{base_job_prefix}/job-name",
    sagemaker_session=pipeline_session,
    role=role
)
step_args = sklearn_processor.run(
    outputs=[
        ProcessingOutput(output_name="train", source="/opt/ml/processing/train"),
        ProcessingOutput(output_name="validation", source="/opt/ml/processing/validation"),
        ProcessingOutput(output_name="test", source="/opt/ml/processing/test"),
    ],
    code="preprocess.py",
    source_dir=BASE_DIR,
    arguments=["--input-data", input_data],
)
step_process = ProcessingStep(
    name="PreprocessSidData",
    step_args=step_args
)
Note that I changed both the code and the source_dir parameters. It's good practice to keep the folders for the various steps separate, so that each has its own requirements.txt and there are no overlaps.
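For example, a per-step layout along these lines keeps things separate (the folder and file names here are purely illustrative):
pipelines/
|─ preprocess/
|  |─ preprocess.py
|  |─ requirements.txt
|─ train/
|  |─ train.py
|  |─ requirements.txt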

Related

How to add additional files in Sagemaker Pipeline Processing Step

I want to have additional files that can be imported in the preprocess.py file, but I am not able to import them directly.
My directory has a helper_functions folder, and I want to import from it into preprocess.py.
I tried adding this to the setup.py file, but it did not work:
package_data={"pipelines.ha_forecast.helper_functions": ["*.py"]},
One thing that kind of worked was adding this folder as an input, like this:
inputs = [
    ProcessingInput(source=f'{project_name}/{module_name}/helper_functions',
                    destination="/opt/ml/processing/input/code/helper_functions"),
]
But this put the required files in a different directory, from which I could no longer import them.
What is the standard way of doing this?
You have to specify the source_dir. Within your script then you can import the modules as you normally do.
source_dir (str or PipelineVariable) – Path (absolute, relative or an S3 URI) to a directory with any other training source code dependencies aside from the entry point file (default: None). If source_dir is an S3 URI, it must point to a tar.gz file. The structure within this directory is preserved when training on Amazon SageMaker.
In general, look at the Processing documentation (you have to use FrameworkProcessor and not the framework-specific processors like SKLearnProcessor).
P.S.: This answer is similar to the one for the question "How to install additional packages in sagemaker pipeline" above.
Within the specified folder there must be the script (in your case preprocess.py), any other files/modules that may be needed, and, if needed, a requirements.txt file.
The structure of the folder then will be:
BASE_DIR/
|─ helper_functions/
| |─ your_utils.py
|─ requirements.txt
|─ preprocess.py
Within preprocess.py, you can then import the modules in the usual way:
from helper_functions.your_utils import your_class, your_func
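For illustration, helper_functions/your_utils.py is just an ordinary Python module; a minimal sketch using the names from the import above (the bodies are placeholders):
# helper_functions/your_utils.py (placeholder contents)
def your_func():
    pass

class your_class:
    pass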
So, your code becomes:
from sagemaker.processing import FrameworkProcessor
from sagemaker.sklearn import SKLearn
from sagemaker.workflow.steps import ProcessingStep
from sagemaker.processing import ProcessingInput, ProcessingOutput
BASE_DIR = your_script_dir_path
sklearn_processor = FrameworkProcessor(
    estimator_cls=SKLearn,
    framework_version=framework_version,
    instance_type=processing_instance_type,
    instance_count=processing_instance_count,
    base_job_name=base_job_name,
    sagemaker_session=pipeline_session,
    role=role
)
step_args = sklearn_processor.run(
    inputs=[your_inputs],
    outputs=[your_outputs],
    code="preprocess.py",
    source_dir=BASE_DIR,
    arguments=[your_arguments],
)
step_process = ProcessingStep(
    name="ProcessingName",
    step_args=step_args
)
It's good practice to keep the folders for the various steps separate, so that each has its own files and requirements and there are no overlaps.

How do I exclude files from svelte-kit build?

If I run npm run build with SvelteKit it seems to include all files from the src folder. Is it possible to exclude a certain file type (e.g. *test.js)?
Example
Select demo app with npm init svelte#next my-app
Add the following code to src/routes/todos/foo.test.js
describe('foo', () => {
  it('temp', () => {
    expect(true).toBe(false)
  })
})
npm run build
npm run preview
Result: describe is not defined
Workaround
Move tests outside of src
UPDATE: SvelteKit 1.0.0-beta now requires pages/endpoints to follow a specific naming pattern, so explicit file exclusion should no longer be needed.
SvelteKit specially handles files in the routes/ directory with the following filenames (note the leading + in each filename):
+page.svelte
+page.js
+page.server.js
+error.js
+layout.svelte
+layout.js
+layout.server.js
+server.js
All other files are ignored and can be colocated in the routes/ directory.
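For example, with the test file from the question, a layout like the following works; only the + files are treated as route files, while foo.test.js is simply ignored by the router (the route files shown are illustrative):
src/routes/todos/
|─ +page.svelte
|─ +page.server.js
|─ foo.test.js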
If, for some reason, you need to have a file that has a special name shown above, it's currently not possible to exclude that file from special processing.
Original outdated answer:
SvelteKit 1.0.0-beta supports a routes configuration that enables file exclusion from the src/routes directory. The config value is a function that receives a file path as an argument, and returns true to use the file as a route.
For example, the following routes config excludes *.test.js files from routes:
// svelte.config.js
⋮
const config = {
  kit: {
    ⋮
    routes: filepath => {
      return ![
        // exclude *test.js files
        /\.test\.js$/,
        // original default config
        /(?:(?:^_|\/_)|(?:^\.|\/\.)(?!well-known))/,
      ].some(regex => regex.test(filepath))
    },
  },
}
demo

Codemirror - Module parse failed: Unexpected token

I am using react-codemirror2. I used npx create-react-app appname to create my app.
But when I try to run the development server it gives me the following error -
./node_modules/codemirror/mode/rpm/changes/index.html 1:0
Module parse failed: Unexpected token (1:0)
You may need an appropriate loader to handle this file type.
> <!doctype html>
|
| <title>CodeMirror: RPM changes mode</title>
One suggested solution was to change modulesDirectories. I tried doing so using npm run eject, but wasn't successful.
Please help me with this.
Skip to the solutions section if you are in a hurry.
The why
The reason for this error is the way webpack bundles dynamic imports. An import provides the starting part of a path, which you can refine with an ending part that pins down the exact file name and extension.
Ex:
import(`codemirror/mode/${snippetMode}/${snippetMode}`)
ref: https://webpack.js.org/api/module-methods/#dynamic-expressions-in-import
Such an import is processed by webpack through the following regex: ^\.\/.*$, which we can see in the error message:
| <title>CodeMirror: RPM changes mode</title>
# ./node_modules/codemirror/mode/ sync ^\.\/.*$ ./rpm/changes/index.html
That regex is basically used to determine the possible modules and bundle them as chunks for code splitting. Because the matching regex is so open, files of all types (extensions) are matched, and the loader for each extension is then used. If we had html-loader set up and configured through a rule, this wouldn't fail, since that loader would handle the module (webpack module [html]). Even though that can be an option for handling the problem, it wouldn't be the right thing to do: it would create an extra bundle for the module, plus entries for it in the main bundle.
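For reference, such a rule would look roughly like the sketch below (assuming html-loader is installed); as explained above, this is not the recommended fix:
// webpack.config.js (fragment; a sketch only, not the recommended fix)
module: {
  rules: [
    // hand the .html files matched by the dynamic import to html-loader
    { test: /\.html$/, use: ['html-loader'] },
  ],
},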
Dynamic import types and their resolution
All of the types below get code split (chunks are created for them), except for the last one.
Absolute full path (static) ==> imported dynamically and code split. Ex:
import('codemirror/mode/xml/xml')
The exact path and module are resolved and a chunk is created for it.
Dynamic path with a starting part. Ex:
import(`codemirror/mode/${snippetMode}/${snippetMode}`)
All paths matching the regex codemirror\/mode\/.*$ are included.
Dynamic path with a starting part and an ending part that includes the extension. Ex:
import(`./locale/${language}.json`)
All paths matching the regex ./locale/.*\.json are included, which limits it to .json modules only.
Full variable import. Ex:
import(someVar)
This doesn't create any code splitting, and webpack doesn't even try to match, because the variable could point to just about anything in the project. Even if the variable is set to a path (someVar = 'some/static/path'), webpack won't check and resolve what the variable refers to.
The solutions
These apply whether you are using Electron or not.
module.noParse
module: {
  noParse: /codemirror\/.*\.html$/,
  rules: [
    // ...
  ],
},
noParse tells webpack not to parse the matched files, so it both lets us exclude files and boosts performance. It works great and can be one of the best solutions here. You can use a regex or a function.
ref: https://webpack.js.org/configuration/module/#module-noparse
Don't use the import('some/path' + dynamicEl) format; use this instead:
const dynamicPath = 'some/path' + dynamicEl
import(dynamicPath) // That would work just ok
Use magic comments [webpackExclude or webpackInclude] with import (if you're on Electron and using require, convert the require to an import plus a magic comment). Magic comments don't work with require, only with import.
Ex:
import(
  /* webpackExclude: /\.html$/ */
  `codemirror/mode/${snippetMode}/${snippetMode}`
)
or even better
import(
  /* webpackInclude: /(\.js|\.jsx)$/ */
  `codemirror/mode/${snippetMode}/${snippetMode}`
)
Include lets us make sure we target exactly what we want. Excluding just HTML may still create chunks for other non-.js files. It all depends on what you want: if you only want a specific extension, an include is better; otherwise, if you don't know and some other extension might be required, an exclude is better.
ref: https://webpack.js.org/api/module-methods/#magic-comments
Magic comments examples and usage:
// Single target
import(
  /* webpackChunkName: "my-chunk-name" */
  /* webpackMode: "lazy" */
  /* webpackExports: ["default", "named"] */
  'module'
);

// Multiple possible targets
import(
  /* webpackInclude: /\.json$/ */
  /* webpackExclude: /\.noimport\.json$/ */
  /* webpackChunkName: "my-chunk-name" */
  /* webpackMode: "lazy" */
  /* webpackPrefetch: true */
  /* webpackPreload: true */
  `./locale/${language}`
);
And to fully ignore code splitting
import(/* webpackIgnore: true */ 'ignored-module.js');
If you're on Node.js or Electron and the import should be processed as a CommonJS require, use the externals config property like in the example below:
externals: {
  'codemirror': 'commonjs codemirror'
}
This works as follows: for any import matching codemirror (left part), webpack converts it to a CommonJS import without bundling it (right part).
import(`codemirror/mode/${snippetMode}/${snippetMode}`)
will be converted to:
require(`codemirror/mode/${snippetMode}/${snippetMode}`)
instead of being bundled.
externals doc ref: https://webpack.js.org/configuration/externals/#externals
If you're on Electron and the feature should not be bundled, use window.require for all Node.js/Electron API calls (otherwise set it in the entry file: window.require = require) and use that instead.
It's Electron/Node.js, so webpack should stay out of it entirely.
The doctype declaration is incorrect, it should have been:
<!DOCTYPE html>
Notice DOCTYPE should be in caps.

Any way to have a common netlify.toml file for a single repository and multiple sites?

I'm looking out a way to define two site builds on netlify, sourced from the same repo, using a single common netlify.toml. Is it possible to do so?
I have a GitHub repository named hugo-dream-plus for which I've configured two website builds on Netlify, namely dream-plus-posts and dream-plus-cards. Both of these builds share the same environment variables and almost all of the configuration, except for the build commands:
hugo --config cards.toml #For dream-plus-cards
hugo --config posts.toml #For dream-plus-posts
I was wondering if there was a way for me to create a common netlify.toml file, since the repo is the same for both builds, for both these sites.
I've already used the web UI to configure each build separately, but it's quite bothersome to modify each of them, which is why I'd prefer the scenario above.
What I plan to do is have all configuration shared between the two builds except for the build command, which would be defined separately as shown above.
As of the date of this answer, Netlify does not support changing values in netlify.toml at build time, because the file is read in prior to your build. The exceptions are headers and redirects, which you can change at build time.
Using Environment Variables directly as values ($VARIABLENAME) in your netlify.toml file is not supported.
However
You could run a script command and have it choose the build based on the domain or an environment variable. There are a few setups that would work.
Here is how I might accomplish what you want based on the domain name.
netlify.toml
[build]
command = "node ./scripts/custom.js"
publish = "public"
scripts/custom.js
const exec = require('child_process').exec;

const site = process.env.URL || "https://example.com";
const domain = site.split('/')[site.split('/').length - 1];

let buildCommand;
switch (domain) {
  case "dream-plus-posts.netlify.com":
    buildCommand = 'hugo --config posts.toml';
    break;
  case "dream-plus-cards.netlify.com":
    buildCommand = 'hugo --config cards.toml';
    break;
  default:
    throw `Domain ${domain} is invalid`;
}

async function execute(command) {
  return await exec(command, function(error, stdout, stderr) {
    if (error) {
      throw error;
    }
    console.log(`site: ${site}`);
    console.log(`domain: ${domain}`);
    console.log(stdout);
  });
}

execute(buildCommand);
Things to note:
I did not test how the stdout reaches the build log using this method with Hugo; the child process captures the output and returns it in stdout.
We don't swallow errors, because we want the build to fail on errors; throwing causes an exit code other than 0.
You can chain other commands onto this solution (i.e. "node ./scripts/custom.js && some other command before deploy").
You could also just check an environment variable you set rather than the domain name (see the sketch below).
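For instance, a sketch of the environment-variable variant might look like this (BUILD_TARGET is a hypothetical variable you would set per site in the Netlify UI):
// scripts/custom.js (sketch): choose the build command from an env var instead of the domain
const target = process.env.BUILD_TARGET; // e.g. "posts" or "cards"
let buildCommand;
if (target === "cards") {
  buildCommand = 'hugo --config cards.toml';
} else if (target === "posts") {
  buildCommand = 'hugo --config posts.toml';
} else {
  throw `BUILD_TARGET "${target}" is invalid`;
}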

Is it good to use package.json file in react component file?

I have to use the version mentioned in the package.json file in a front-end (React) file.
{
  "name": "asdfg",
  "version": "3.5.2", // want to use this
  "description": "description",
  "scripts": {}
  // etc etc etc
  ......
}
Send package.json [version] to Angularjs Front end for display purposes
I'd gone through the above post and found two ways to do this, but neither of them is what I was asked to implement:
#1. During the build process
#2. By creating an endpoint
So I want to know whether the approach below is valid/good or not.
react-front-end-file.js
import packageJson from '../package.json'; // imported
...
...
// Usage which gives me version - 3.5.2
<div className='app-version'>{packageJson.version}</div>
Let me know if this approach is fine.
The two approaches below each either bring a dependency or add extra implementation that might not be needed:
During the build process - (depends on a module bundler like webpack, etc.)
By creating an endpoint - (needs extra code on the server just to get the version)
Instead, since package.json is simply a JSON file, you can import it and use any of the keys mentioned in it (version in your case). The only constraint is that you must have access to the package.json file after the application is deployed, so don't forget to ship the file to the deployment environment.
So your approach seems to be fine.
I would do something like this: in your module bundler, require your package.json file, define a global variable, and use it wherever you want.
E.g. I do something like this in webpack:
const packageJson = require('./package.json')
const plugins = [
  new webpack.DefinePlugin({
    '__APPVERSION__': JSON.stringify(packageJson.version)
  })
]
React Component:
<div className='app-version'>{__APPVERSION__}</div>
