LLVM IR -- how to convert array store to memcpy? - c

LLVM IR includes arrays as a base type, so a "store" instruction in IR will take an array object and store it to a pointer to memory.
I'm compiling to a C environment, so I need to convert "store" instructions to calls to memcpy. I've tried to use IRBuilder to make the job easier, but I'm stuck on how to take the address of an object.
The function I've written is as follows:
bool convert_array_store_to_memcpy(llvm::StoreInst *instruction)
{
llvm::Type *value_type = instruction->getValueOperand()->getType();
if (!value_type->isArrayTy())
return false;
/* set up IRBuilder and get the pieces of the store */
llvm::IRBuilder<> Builder(llvm::getGlobalContext());
Builder.SetInsertPoint(instruction);
llvm::Value *destination = instruction->getPointerOperand();
llvm::Value *source = instruction->getValueOperand();
/* get the number of bytes by getting the size of the array (elements*element-size) */
llvm::ArrayType *array_type = cast<ArrayType>(value_type);
uint64_t element_count = array_type->getNumElements();
llvm::Type *element_type = array_type->getElementType();
DataLayout *targetData = new DataLayout(mod);
uint64_t element_size = targetData->getTypeAllocSize(element_type);
uint64_t size = element_count*element_size;
/* PROBLEM: I am trying to take the address of the start of the array */
llvm::Type *i32_type = llvm::IntegerType::getInt32Ty(llvm::getGlobalContext());
llvm::Constant *constant_int = llvm::ConstantInt::get(i32_type, 0, true);
Value *indexList[1] = {constant_int};
/* NEW PROBLEM:indexList seems to be the wrong type or contain the wrong type of thing */
llvm::Value *pointer_to_source = Builder.CreateGEP(source, ArrayRef<Value*>(indexList, 1));
unsigned alignment = instruction->getAlignment();
if (!array_type)
fprintf(stderr, "ERROR!\n");
/* insert the memcpy */
llvm::CallInst *memcpy_call = Builder.CreateMemCpy(destination,
pointer_to_source,
size,
alignment,
instruction->isVolatile());
/* erase the store */
instruction->eraseFromParent();
return true;
} /* convert_array_store_to_memcpy */
This compiles, but I get the following runtime error from the call to IRBuilder::CreateGEP:
.../llvm/install/include/llvm/IR/Instructions.h:782: llvm::Type
*llvm::checkGEPType(llvm::Type *): Assertion `Ty && "Invalid GetElementPtrInst indices for type!"' failed.
Note that I'm using LLVM 3.6 under Linux.
EDIT: Clearly, the call to CreateGEP is sending a null instead of the constant zero; the intent was to get the address of the zeroth element of the array. I've edited the above function with my latest effort, which is to send a length-1 array of indices into CreateGEP. This also fails inside getIndexedType, which returns a NULL pointer, which, again, I don't understand.
Note: I am using the example from a previous StackOverflow answer: Inserting GetElementpointer Instruction in LLVM IR
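For reference, the C-level effect that the lowered store should have can be sketched as below. This is only an illustration of the target semantics, not LLVM API code; copy_array and its parameter names are hypothetical. The byte count is the element count times the element size, mirroring the element_count*element_size computation in the function above. Note that memcpy needs an address for the source as well as the destination.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies element_count elements of element_size bytes
 * each from source to destination, which is what a lowered array store
 * is expected to do in the C environment. */
void *copy_array(void *destination, const void *source,
                 size_t element_count, size_t element_size)
{
    size_t size;

    assert(destination != NULL && source != NULL);
    /* Total size in bytes: number of elements times size of one element. */
    size = element_count * element_size;
    return memcpy(destination, source, size);
}
```

A call such as copy_array(dst, src, 4, sizeof src[0]) copies a four-element array in one step.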

Related

Extent of MISRA C 2012 Directive 4.1: Runtime checks before pointer dereferencing in a library

MISRA C 2012 Directive 4.1 says that Run-time failures should be minimized and further states for pointer dereferencing that a pointer should be checked for NULL before it's dereferenced,
unless it's already known to be not NULL.
When writing a library performing simple operations, checking the input pointers in each library function
for NULL creates larger and therefore less understandable code even for simple operations.
For example, the library below computes the squared Euclidean norm of a vector and is used in a controller:
/*** Option 1: Checking pointers ***/
Library:
/* file vector.c */
bool_t bVectorNormSq(float32_t pf32Vec[], uint8_t u8Len, float32_t *pf32NormSq)
{
uint8_t u8n;
bool bRet = false;
/* Check pointers */
if( (pf32Vec != NULL) && (pf32NormSq != NULL) )
{
*pf32NormSq = 0.0f;
for(u8n = 0U; u8n < u8Len; u8n++)
{
*pf32NormSq += (pf32Vec[u8n] * pf32Vec[u8n]);
}
bRet = true;
}
else
{
/* Do not alter pf32NormSq (unknown if valid pointer) */
bRet = false;
}
return bRet;
}
/* EOF */
Consumer of library:
/* file controller.c */
/* ... */
bool_t bControllerStep(void)
{
float32_t pf32MyVec[3] = { 0 };
float32_t f32MyNorm = 0.0f;
/* ... */
/* MISRA C 2012 Rule 17.7, Call will always be successful, thus return value not checked */
(void)bVectorNormSq(pf32MyVec, 3U, &f32MyNorm);
/* ... */
}
/* EOF */
/*** Option 2: Not checking pointer, responsibility to supply valid inputs placed on caller ***/
Library:
/* file vector.c */
/**
* @note This library assumes that valid pointers will be supplied,
* pointers are NOT checked before they are used.
*/
/* Assert macro expands to "(void)(CONDITION)" for NDEBUG defined */
#ifdef NDEBUG
#define VECTOR_ASSERT( CONDITION ) (void)(CONDITION)
#else
/* ... */
#endif /* NDEBUG */
float32_t f32VectorNormSq(float32_t pf32Vec[], uint8_t u8Len)
{
float32_t f32NormSq = 0.0f;
uint8_t u8n = 0U;
VECTOR_ASSERT(pf32Vec!=NULL);
for(u8n = 0U; u8n < u8Len; u8n++)
{
f32NormSq += (pf32Vec[u8n] * pf32Vec[u8n]);
}
return f32NormSq;
}
/* EOF */
Consumer of library:
/* file controller.c */
/* ... */
bool_t bControllerStep(void)
{
float32_t pf32MyVec[3] = { 0 };
float32_t f32MyNorm = 0.0f;
/* ... */
f32MyNorm = f32VectorNormSq(pf32MyVec, 3U);
/* ... */
}
/* EOF */
For option 1, the caller bControllerStep() can reason that the library function bVectorNormSq() will always execute successfully (return true),
because the array pf32MyVec is defined in the caller and thus can't be NULL. Therefore the caller chooses to ignore the return value.
The same reasoning is likely to apply to many uses of the library, resulting in callers
frequently ignoring return values for option 1. This could lead to a programmer becoming complacent about ignoring return values even when they shouldn't, e.g. because a
division by zero was detected by the library.
In contrast, option 2 assumes that the caller supplies valid pointers, documents this in the library (/* @note ... */) and only uses an assert for NULL pointers to assist during development,
which is later disabled for deployment. In this option, the presence of a Boolean return value highlights to the programmer that the library operation might fail for reasons
other than invalid pointers or other trivially incorrect inputs, e.g. due to division by zero, and that care should be taken before using the computed values,
avoiding the risk of complacency.
Therefore my question(s):
In situations like the example shown, is it compliant with MISRA C for a library to omit checks of pointers or other trivial inputs?
Are there any additional conditions that have to be fulfilled besides documenting the missing input checking in the source code, e.g. is a formal deviation needed?
Application background:
This is just an aerospace student with focus in control systems, trying to understand how things are done properly in the actual coding of control systems, by writing his own control algorithms as if they were running on the real thing i.e. by following standards like MISRA C.
Code written by me will NOT be going on a live system anytime soon, but for the purpose of your answers please consider this code as running on the actual system where a failure of the function is classified as catastrophic, i.e. everybody dies.
I'm aware of the whole software and hardware engineering (SAE ARP4754, SAE ARP4761, DO-178C, ...) around the actual implementation process. This question is really just about the instructions executing on the hardware, not about the need for redundant & dissimilar hardware or requirements, reviews, testing, ...
EDIT:
The library in question is low in the code stack thus sanitizing code for inputs from the outside (sensors,...) can be expected to be present. I'm trying to avoid falling prey to Cargo cult programming by blindly following the rule "Always check all inputs".
Regarding Option 1 - how would you write the documentation to that function? Does this make sense:
bVectorNormSq
Description of what this function does here.
pf32Vec A pointer to an allocated array of [u8Len] that will-...
u8Len The size of the array [pf32Vec] in bytes.
...
Returns: true if successful, false in case pf32Vec was a null pointer.
It doesn't make sense. You already documented that pf32Vec must be a pointer to an allocated array, so why would the user not read that part but read the part about the return status? How are we even supposed to use this function?
bool result = bVectorNormSq(some_fishy_pointer, ...); // checks for null internally
if(result == false)
{
/* error: you passed a null pointer */
}
Why can't you instead write that very same code like this?
if(some_fishy_pointer == NULL)
{
/* handle the error instead of calling the function in the first place */
}
else
{
bVectorNormSq(some_fishy_pointer, ...); // does not check for null internally
}
It's the same amount of error checking, same amount of branches. Literally the only difference is the slower execution speed of the first version.
Also, your extra error checks add extra complexity which could in turn lead to extra bugs.
Potential bugs:
if( (pf32Vec = NULL) ||(pf32NormSq = NULL) )
or
if( (pf32Vec =! NULL) && (pf32NormSq != NULL) )
or (not just a potential bug):
bool_t ... bool bRet = false; ... return bRet;
Extra error checking is a good thing, but place it where it matters - sanitize data at the point where it gets assigned. Example:
const char* ptr = strstr(x,y);
if(ptr == NULL) // Correct! Check close to the location where the pointer is set
{}
...
some_function(ptr);
...
void some_function (const char* str)
{
if(str == NULL) // Not correct. This function has nothing to do with the strstr call.
}
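Putting the two pieces together, a minimal sketch could look like this. The names vector_norm_sq and process_external_vector are hypothetical, and plain float/uint8_t are used instead of the project typedefs. The library function only documents and asserts its contract, while the run-time check sits at the boundary where a pointer of unknown validity first enters the code. This is an illustration of the idea, not a statement about MISRA compliance.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Contract: vec must point to at least len valid elements.
 * The pointer is only asserted, which is a development aid
 * (assert is compiled out when NDEBUG is defined). */
float vector_norm_sq(const float vec[], uint8_t len)
{
    float norm_sq = 0.0f;
    uint8_t n;

    assert(vec != NULL);
    for (n = 0U; n < len; n++) {
        norm_sq += vec[n] * vec[n];
    }
    return norm_sq;
}

/* Hypothetical boundary function: the pointers come from outside,
 * so this is the place where they are checked, exactly once. */
int process_external_vector(const float *from_outside, uint8_t len, float *out)
{
    if ((from_outside == NULL) || (out == NULL)) {
        return 0; /* handle the error instead of calling the library */
    }
    *out = vector_norm_sq(from_outside, len);
    return 1;
}
```

Code deeper in the stack, such as a controller step that owns its array, can then call vector_norm_sq() directly without a redundant check.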

Vulkan vkCreateInstance - Access violation writing location 0x0000000000000000

I am trying to write a basic program using Vulkan, but I keep getting a runtime error.
Exception thrown at 0x00007FFDC27A8DBE (vulkan-1.dll) in VulkanTest.exe: 0xC0000005: Access violation writing location 0x0000000000000000.
This seems to be a relatively common issue, resulting from a failure to initialize the arguments of the vkCreateInstance function. I have tried all of the solutions I found proposed to others, even initializing things I am fairly certain I don't need to, and I still haven't been able to solve the problem. The program is written in C using the MSVC compiler.
#include "stdio.h"
#include "SDL.h"
#include "vulkan\vulkan.h"
#include "System.h"
int main(int argc, char *argv[])
{
//Initialize SDL
if (SDL_Init(SDL_INIT_EVERYTHING) < 0)
{
printf("Error");
}
printf("Success");
//Initialize Vulkan
VkInstance VulkanInstance;
VkApplicationInfo VulkanApplicationInfo;
VulkanApplicationInfo.sType = VK_STRUCTURE_TYPE_APPLICATION_INFO;
VulkanApplicationInfo.pNext = 0;
VulkanApplicationInfo.pApplicationName = "VulkanTest";
VulkanApplicationInfo.applicationVersion = VK_MAKE_VERSION(1, 0, 0);
VulkanApplicationInfo.pEngineName = "VulkanTest";
VulkanApplicationInfo.engineVersion = VK_MAKE_VERSION(1, 0, 0);
VulkanApplicationInfo.apiVersion = VK_API_VERSION_1_0;
VkInstanceCreateInfo VulkanCreateInfo = {0,0,0,0,0,0,0,0};
VulkanCreateInfo.sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;
VulkanCreateInfo.pNext = 0;
VulkanCreateInfo.pApplicationInfo = &VulkanApplicationInfo;
VulkanCreateInfo.enabledLayerCount = 1;
VulkanCreateInfo.ppEnabledLayerNames = "VK_LAYER_KHRONOS_validation";
vkEnumerateInstanceExtensionProperties(0, VulkanCreateInfo.enabledExtensionCount,
VulkanCreateInfo.ppEnabledExtensionNames);
//Create Vulkan Instance
if(vkCreateInstance(&VulkanCreateInfo, 0, &VulkanInstance) != VK_SUCCESS)
{
printf("Vulkan instance was not created");
}
//Create SDL Window
SDL_Window* window;
window = SDL_CreateWindow("VulkanTest", SDL_WINDOWPOS_CENTERED, SDL_WINDOWPOS_CENTERED, 0, 0, SDL_WINDOW_VULKAN || SDL_WINDOW_FULLSCREEN_DESKTOP);
SDL_Delay(10000);
return 0;
}
Are you sure the call to vkCreateInstance() is what is crashing? I have not tried to debug the code you have shown (that is your job), but just looking at the calls that the code is making, it should be the call to vkEnumerateInstanceExtensionProperties() that is crashing (if it even compiles at all!).
The 2nd parameter of vkEnumerateInstanceExtensionProperties() expects a uint32_t* pointer, but you are passing in a uint32_t value (VulkanCreateInfo.enabledExtensionCount) that has been initialized to 0. So, that would make the pPropertyCount parameter be a NULL pointer (if it even compiles).
You are passing VulkanCreateInfo.ppEnabledExtensionNames in the 3rd parameter (if that even compiles), and ppEnabledExtensionNames has been initialized to NULL. Per the documentation for vkEnumerateInstanceExtensionProperties():
If pProperties is NULL, then the number of extensions properties available is returned in pPropertyCount. Otherwise, pPropertyCount must point to a variable set by the user to the number of elements in the pProperties array, and on return the variable is overwritten with the number of structures actually written to pProperties.
Since pPropertyCount is NULL, vkEnumerateInstanceExtensionProperties() has nowhere to write the property count to! That would certainly cause an Access Violation trying to write to address 0x0000000000000000.
The documentation clearly states:
pPropertyCount must be a valid pointer to a uint32_t value
On top of that, your call to vkEnumerateInstanceExtensionProperties() is just logically wrong anyway, because the 3rd parameter expects a pointer to an array of VkExtensionProperties structs, but VulkanCreateInfo.ppEnabledExtensionNames is a pointer to an array of const char* UTF-8 strings instead.
In other words, you should not be using vkEnumerateInstanceExtensionProperties() to initialize criteria for the call to vkCreateInstance(). You are completely misusing vkEnumerateInstanceExtensionProperties(). You probably meant to use SDL_Vulkan_GetInstanceExtensions() instead, eg:
uint32_t ExtensionCount = 0;
if (!SDL_Vulkan_GetInstanceExtensions(NULL, &ExtensionCount, NULL))
{
...
}
const char **ExtensionNames = (const char **) SDL_malloc(sizeof(const char *) * ExtensionCount);
if (!ExtensionNames)
{
...
}
if (!SDL_Vulkan_GetInstanceExtensions(NULL, &ExtensionCount, ExtensionNames))
{
SDL_free(ExtensionNames);
...
}
VulkanCreateInfo.enabledExtensionCount = ExtensionCount;
VulkanCreateInfo.ppEnabledExtensionNames = ExtensionNames;
if (vkCreateInstance(&VulkanCreateInfo, 0, &VulkanInstance) != VK_SUCCESS)
{
...
}
SDL_free(ExtensionNames);
...
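The count/array convention used by both functions can be sketched with a stand-in function. enumerate_names below is hypothetical; it is not a Vulkan or SDL call and only mimics the calling convention. The first call passes a NULL array to learn the count, the second call passes an array of that capacity, and in both cases the count argument must be a valid pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an enumeration call that follows the
 * "query count, then fill array" convention. Returns 1 on success. */
int enumerate_names(uint32_t *count, const char **names)
{
    static const char *const available[] = { "ext_a", "ext_b", "ext_c" };
    const uint32_t total = 3U;
    uint32_t i;

    if (count == NULL) {
        return 0; /* nowhere to write the count; a real API may simply crash */
    }
    if (names == NULL) {
        *count = total; /* first call: report how many entries exist */
        return 1;
    }
    /* second call: on entry *count is the capacity of names */
    if (*count > total) {
        *count = total;
    }
    for (i = 0U; i < *count; i++) {
        names[i] = available[i];
    }
    return 1;
}
```

Passing a plain integer where the pointer is expected, as in the question, corresponds to the count == NULL branch here, except that the real function has no obligation to check for it.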

Given a GIMPLE Call statement which has two arguments, I want to add a third one, how?

I have to do some GIMPLE_CALL statement manipulations. This GIMPLE_CALL will have two arguments, e.g. foo(a,b). My goal is to change this method to a different method having THREE arguments, e.g. zoo(a,b,c).
In my current approach, GCC crashes while compiling a sample source program.
My code works when all I do is replace the method name (i.e. not changing the argument numbers).
Also, I was not able to find any methods dedicated to adding/removing argument numbers for a GIMPLE_CALL. Which leads me to believe that it might not be the right approach.
Code:
//Get the current number of call arguments from the target GIMPLE
//statement
unsigned num_of_ops = gimple_call_num_args(stmt);
//Replace the method name with the new method
gimple_call_set_fndecl(stmt, new_method);
//We need to increment the total number of call arguments by 1
//The total number of operands is the number of call arguments + 3
//You can confirm this in the definitions of gimple_call_num_args() and
//gimple_call_set_arg()
gimple_set_num_ops(stmt,num_of_ops+3+1);
//Add the new argument
gimple_call_set_arg(stmt, num_of_ops, third_argument);
update_stmt (stmt);
It seems you can only use this approach to adjust num_ops to a smaller value.
gimple_set_num_ops is a simple setter; it does not allocate storage:
static inline void
gimple_set_num_ops (gimple *gs, unsigned num_ops)
{
gs->num_ops = num_ops;
}
You will have to create another GIMPLE statement.
I think this usage in the GCC codebase solves exactly the same problem that you have (from gcc/gimple.c):
/* Set the RHS of assignment statement pointed-to by GSI to CODE with
operands OP1, OP2 and OP3.
NOTE: The statement pointed-to by GSI may be reallocated if it
did not have enough operand slots. */
void
gimple_assign_set_rhs_with_ops (gimple_stmt_iterator *gsi, enum tree_code code,
tree op1, tree op2, tree op3)
{
unsigned new_rhs_ops = get_gimple_rhs_num_ops (code);
gimple *stmt = gsi_stmt (*gsi);
gimple *old_stmt = stmt;
/* If the new CODE needs more operands, allocate a new statement. */
if (gimple_num_ops (stmt) < new_rhs_ops + 1)
{
tree lhs = gimple_assign_lhs (old_stmt);
stmt = gimple_alloc (gimple_code (old_stmt), new_rhs_ops + 1);
memcpy (stmt, old_stmt, gimple_size (gimple_code (old_stmt)));
gimple_init_singleton (stmt);
/* The LHS needs to be reset as this also changes the SSA name
on the LHS. */
gimple_assign_set_lhs (stmt, lhs);
}
gimple_set_num_ops (stmt, new_rhs_ops + 1);
gimple_set_subcode (stmt, code);
gimple_assign_set_rhs1 (stmt, op1);
if (new_rhs_ops > 1)
gimple_assign_set_rhs2 (stmt, op2);
if (new_rhs_ops > 2)
gimple_assign_set_rhs3 (stmt, op3);
if (stmt != old_stmt)
gsi_replace (gsi, stmt, false);
}
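The underlying issue is not specific to GCC. As a plain C analogy (struct stmt and both functions below are hypothetical and have nothing to do with GCC's real data structures), an object whose operand slots are allocated together with it cannot grow by just bumping a counter. A larger object has to be allocated and the old contents copied over, which is what the function above does for assignments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical statement with trailing operand slots; not a GCC type. */
struct stmt {
    unsigned num_ops;
    int ops[]; /* storage for exactly num_ops operands */
};

/* Allocates a zeroed statement with room for num_ops operands. */
struct stmt *stmt_alloc(unsigned num_ops)
{
    struct stmt *s = calloc(1, sizeof *s + num_ops * sizeof s->ops[0]);

    if (s != NULL) {
        s->num_ops = num_ops;
    }
    return s;
}

/* Returns a new statement holding the old operands plus op and frees the
 * old one. On allocation failure returns NULL and leaves old untouched. */
struct stmt *stmt_append_op(struct stmt *old, int op)
{
    struct stmt *s;

    assert(old != NULL);
    s = stmt_alloc(old->num_ops + 1U);
    if (s == NULL) {
        return NULL;
    }
    memcpy(s->ops, old->ops, old->num_ops * sizeof old->ops[0]);
    s->ops[old->num_ops] = op;
    free(old);
    return s;
}
```

Simply writing old->num_ops++ and then storing to the new slot would write past the end of the allocation, which matches the crash described in the question.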

Can gcc/clang optimize initialization computing?

I recently wrote a parser generator tool that takes a BNF grammar (as a string) and a set of actions (as a function pointer array) and outputs a parser (= a state automaton, allocated on the heap). I then use another function to run that parser on my input data and generate an abstract syntax tree.
In the initial parser generation, there are quite a lot of steps, and I was wondering if gcc or clang are able to optimize this, given constant inputs to the parser generation function (and never using the pointer values, only dereferencing them)? Is it possible to run the function at compile time and embed the result (i.e. the allocated memory) in the executable?
(Obviously, that would require link-time optimization, since the compiler would need to be able to check that the whole function does indeed produce the same result for the same parameters.)
What you could do in this case is have code that generates code.
Have your initial parser generator as a separate piece of code that runs independently. The output of this code would be a header file containing a set of variable definitions initialized to the proper values. You then use this file in your main code.
As an example, suppose you have a program that needs to know the number of bits that are set in a given byte. You could do this manually whenever you need:
int count_bits(uint8_t b)
{
int count = 0;
while (b) {
count += b & 1;
b >>= 1;
}
return count;
}
Or you can generate the table in a separate program:
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
int main(void)
{
FILE *header = fopen("bitcount.h", "w");
if (!header) {
perror("fopen failed");
exit(1);
}
fprintf(header, "int bit_counts[256] = {\n");
unsigned v;
for (v = 0; v < 256; v++) {
/* reset the count for every byte value */
int count = 0;
uint8_t b = v;
while (b) {
count += b & 1;
b >>= 1;
}
fprintf(header, " %d,\n", count);
}
fprintf(header, "};\n");
fclose(header);
return 0;
}
This creates a file called bitcount.h that looks like this:
int bit_counts[256] = {
0,
1,
1,
2,
...
8,
};
You can then include that file in your "real" code.
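To check that a table like this agrees with the loop version, something along these lines could be used. In this sketch the table is filled in memory by the hypothetical fill_bit_counts() rather than coming from bitcount.h, so that the snippet stands on its own; count_bits_lookup is likewise a made-up name.

```c
#include <assert.h>
#include <stdint.h>

/* Reference implementation: count the set bits one at a time. */
int count_bits(uint8_t b)
{
    int count = 0;

    while (b) {
        count += b & 1;
        b >>= 1;
    }
    return count;
}

/* Stand-in for the generated table; with the approach above this
 * array would come from the generated header instead. */
int bit_counts[256];

/* Fills the table once, using the reference implementation. */
void fill_bit_counts(void)
{
    unsigned v;

    for (v = 0; v < 256; v++) {
        bit_counts[v] = count_bits((uint8_t)v);
    }
}

/* Table-based replacement for count_bits(): a single array read. */
int count_bits_lookup(uint8_t b)
{
    assert(b == 0 || bit_counts[b] > 0); /* table must be filled first */
    return bit_counts[b];
}
```

With the generated header the fill step disappears entirely, because the initialized array is already part of the executable.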

How to chain BCryptEncrypt and BCryptDecrypt calls using AES in GCM mode?

Using the Windows CNG API, I am able to encrypt and decrypt individual blocks of data with authentication, using AES in GCM mode. I now want to encrypt and decrypt multiple buffers in a row.
According to documentation for CNG, the following scenario is supported:
If the input to encryption or decryption is scattered across multiple
buffers, then you must chain calls to the BCryptEncrypt and
BCryptDecrypt functions. Chaining is indicated by setting the
BCRYPT_AUTH_MODE_IN_PROGRESS_FLAG flag in the dwFlags member.
If I understand it correctly, this means that I can invoke BCryptEncrypt sequentially on multiple buffers and obtain the authentication tag for the combined buffers at the end. Similarly, I can invoke BCryptDecrypt sequentially on multiple buffers while deferring the actual authentication check until the end. I cannot get that to work, though; it looks like the value for dwFlags is ignored. Whenever I use BCRYPT_AUTH_MODE_IN_PROGRESS_FLAG, I get a return value of 0xc000a002, which is equal to STATUS_AUTH_TAG_MISMATCH as defined in ntstatus.h.
Even though the parameter pbIV is marked as in/out, the elements pointed to by the parameter pbIV do not get modified by BCryptEncrypt(). Is that expected? I also looked at the field pbNonce in the BCRYPT_AUTHENTICATED_CIPHER_MODE_INFO structure, pointed to by the pPaddingInfo pointer, but that one does not get modified either. I also tried "manually" advancing the IV, modifying the contents myself according to the counter scheme, but that did not help either.
What is the right procedure to chain the BCryptEncrypt and/or BCryptDecrypt functions successfully?
I managed to get it to work. It seems that the problem is in the MSDN documentation: it should mention setting BCRYPT_AUTH_MODE_CHAIN_CALLS_FLAG instead of BCRYPT_AUTH_MODE_IN_PROGRESS_FLAG.
#include <windows.h>
#include <assert.h>
#include <vector>
#include <Bcrypt.h>
#pragma comment(lib, "bcrypt.lib")
std::vector<BYTE> MakePatternBytes(size_t a_Length)
{
std::vector<BYTE> result(a_Length);
for (size_t i = 0; i < result.size(); i++)
{
result[i] = (BYTE)i;
}
return result;
}
std::vector<BYTE> MakeRandomBytes(size_t a_Length)
{
std::vector<BYTE> result(a_Length);
for (size_t i = 0; i < result.size(); i++)
{
result[i] = (BYTE)rand();
}
return result;
}
int _tmain(int argc, _TCHAR* argv[])
{
NTSTATUS bcryptResult = 0;
DWORD bytesDone = 0;
BCRYPT_ALG_HANDLE algHandle = 0;
bcryptResult = BCryptOpenAlgorithmProvider(&algHandle, BCRYPT_AES_ALGORITHM, 0, 0);
assert(BCRYPT_SUCCESS(bcryptResult) || !"BCryptOpenAlgorithmProvider");
bcryptResult = BCryptSetProperty(algHandle, BCRYPT_CHAINING_MODE, (BYTE*)BCRYPT_CHAIN_MODE_GCM, sizeof(BCRYPT_CHAIN_MODE_GCM), 0);
assert(BCRYPT_SUCCESS(bcryptResult) || !"BCryptSetProperty(BCRYPT_CHAINING_MODE)");
BCRYPT_AUTH_TAG_LENGTHS_STRUCT authTagLengths;
bcryptResult = BCryptGetProperty(algHandle, BCRYPT_AUTH_TAG_LENGTH, (BYTE*)&authTagLengths, sizeof(authTagLengths), &bytesDone, 0);
assert(BCRYPT_SUCCESS(bcryptResult) || !"BCryptGetProperty(BCRYPT_AUTH_TAG_LENGTH)");
DWORD blockLength = 0;
bcryptResult = BCryptGetProperty(algHandle, BCRYPT_BLOCK_LENGTH, (BYTE*)&blockLength, sizeof(blockLength), &bytesDone, 0);
assert(BCRYPT_SUCCESS(bcryptResult) || !"BCryptGetProperty(BCRYPT_BLOCK_LENGTH)");
BCRYPT_KEY_HANDLE keyHandle = 0;
{
const std::vector<BYTE> key = MakeRandomBytes(blockLength);
bcryptResult = BCryptGenerateSymmetricKey(algHandle, &keyHandle, 0, 0, (PUCHAR)&key[0], key.size(), 0);
assert(BCRYPT_SUCCESS(bcryptResult) || !"BCryptGenerateSymmetricKey");
}
const size_t GCM_NONCE_SIZE = 12;
const std::vector<BYTE> origNonce = MakeRandomBytes(GCM_NONCE_SIZE);
const std::vector<BYTE> origData = MakePatternBytes(256);
// Encrypt data as a whole
std::vector<BYTE> encrypted = origData;
std::vector<BYTE> authTag(authTagLengths.dwMinLength);
{
BCRYPT_AUTHENTICATED_CIPHER_MODE_INFO authInfo;
BCRYPT_INIT_AUTH_MODE_INFO(authInfo);
authInfo.pbNonce = (PUCHAR)&origNonce[0];
authInfo.cbNonce = origNonce.size();
authInfo.pbTag = &authTag[0];
authInfo.cbTag = authTag.size();
bcryptResult = BCryptEncrypt
(
keyHandle,
&encrypted[0], encrypted.size(),
&authInfo,
0, 0,
&encrypted[0], encrypted.size(),
&bytesDone, 0
);
assert(BCRYPT_SUCCESS(bcryptResult) || !"BCryptEncrypt");
assert(bytesDone == encrypted.size());
}
// Decrypt data in two parts
std::vector<BYTE> decrypted = encrypted;
{
DWORD partSize = decrypted.size() / 2;
std::vector<BYTE> macContext(authTagLengths.dwMaxLength);
BCRYPT_AUTHENTICATED_CIPHER_MODE_INFO authInfo;
BCRYPT_INIT_AUTH_MODE_INFO(authInfo);
authInfo.pbNonce = (PUCHAR)&origNonce[0];
authInfo.cbNonce = origNonce.size();
authInfo.pbTag = &authTag[0];
authInfo.cbTag = authTag.size();
authInfo.pbMacContext = &macContext[0];
authInfo.cbMacContext = macContext.size();
// IV value is ignored on first call to BCryptDecrypt.
// This buffer will be used to keep internal IV used for chaining.
std::vector<BYTE> contextIV(blockLength);
// First part
authInfo.dwFlags = BCRYPT_AUTH_MODE_CHAIN_CALLS_FLAG;
bcryptResult = BCryptDecrypt
(
keyHandle,
&decrypted[0*partSize], partSize,
&authInfo,
&contextIV[0], contextIV.size(),
&decrypted[0*partSize], partSize,
&bytesDone, 0
);
assert(BCRYPT_SUCCESS(bcryptResult) || !"BCryptDecrypt");
assert(bytesDone == partSize);
// Second part
authInfo.dwFlags &= ~BCRYPT_AUTH_MODE_CHAIN_CALLS_FLAG;
bcryptResult = BCryptDecrypt
(
keyHandle,
&decrypted[1*partSize], partSize,
&authInfo,
&contextIV[0], contextIV.size(),
&decrypted[1*partSize], partSize,
&bytesDone, 0
);
assert(BCRYPT_SUCCESS(bcryptResult) || !"BCryptDecrypt");
assert(bytesDone == partSize);
}
// Check decryption
assert(decrypted == origData);
// Cleanup
BCryptDestroyKey(keyHandle);
BCryptCloseAlgorithmProvider(algHandle, 0);
return 0;
}
@Codeguard's answer got me through the project I was working on, which led me to find this question/answer in the first place; however, there were still a number of gotchas I struggled with. Below is the process I followed with the tricky parts called out. You can view the actual code at the link above:
1. Use BCryptOpenAlgorithmProvider to open the algorithm provider using BCRYPT_AES_ALGORITHM.
2. Use BCryptSetProperty to set the BCRYPT_CHAINING_MODE to BCRYPT_CHAIN_MODE_GCM.
3. Use BCryptGetProperty to get the BCRYPT_OBJECT_LENGTH to allocate for use by the BCrypt library for the encrypt/decrypt operation. Depending on your implementation, you may also want to:
   - Use BCryptGetProperty to determine BCRYPT_BLOCK_LENGTH and allocate scratch space for the IV. The Windows API updates the IV with each call, and the caller is responsible for providing the memory for that usage.
   - Use BCryptGetProperty to determine BCRYPT_AUTH_TAG_LENGTH and allocate scratch space for the largest possible tag. Like the IV, the caller is responsible for providing this space, which the API updates each time.
4. Initialize the BCRYPT_AUTHENTICATED_CIPHER_MODE_INFO struct:
   - Initialize the structure with BCRYPT_INIT_AUTH_MODE_INFO().
   - Initialize the pbNonce and cbNonce fields. Note that for the first call to BCryptEncrypt/BCryptDecrypt, the IV is ignored as an input and this field is used as the "IV". However, the IV parameter will be updated by that first call and used by subsequent calls, so space for it must still be provided. In addition, the pbNonce and cbNonce fields must remain set (even though they are unused after the first call) for all calls to BCryptEncrypt/BCryptDecrypt or those calls will complain.
   - Initialize pbAuthData and cbAuthData. In my project, I set these fields just before the first call to BCryptEncrypt/BCryptDecrypt and reset them to NULL/0 immediately afterward. You can pass NULL/0 as the input and output parameters during these calls.
   - Initialize pbTag and cbTag. pbTag can be NULL until the final call to BCryptEncrypt/BCryptDecrypt when the tag is retrieved or checked, but cbTag must be set or else BCryptEncrypt/BCryptDecrypt will complain.
   - Initialize pbMacContext and cbMacContext. These point to a scratch space for BCryptEncrypt/BCryptDecrypt to use to keep track of the current state of the tag/MAC.
   - Initialize cbAAD and cbData to 0. The APIs use these fields, so you can read them at any time, but you should not update them after initially setting them to 0.
   - Initialize dwFlags to BCRYPT_AUTH_MODE_CHAIN_CALLS_FLAG. After initialization, changes to this field should be made by using |= or &=. Windows also sets flags within this field that the caller needs to take care not to alter.
5. Use BCryptGenerateSymmetricKey to import the key to use for encryption/decryption. Note that you will need to supply the memory associated with BCRYPT_OBJECT_LENGTH to this call for use by BCryptEncrypt/BCryptDecrypt during operation.
6. Call BCryptEncrypt/BCryptDecrypt with your AAD, if any; neither input nor output space needs to be supplied for this call. (If the call succeeds, you can see the size of your AAD reflected in the cbAAD field of the BCRYPT_AUTHENTICATED_CIPHER_MODE_INFO structure.)
   - Set pbAuthData and cbAuthData to reflect the AAD.
   - Call BCryptEncrypt or BCryptDecrypt.
   - Set pbAuthData and cbAuthData back to NULL and 0.
7. Call BCryptEncrypt/BCryptDecrypt "N - 1" times:
   - The amount of data passed to each call must be a multiple of the algorithm's block size.
   - Do not set the dwFlags parameter of the call to anything other than 0.
   - The output space must be equal to or greater than the size of the input.
8. Call BCryptEncrypt/BCryptDecrypt one final time (with or without plain/cipher text input/output). The size of the input need not be a multiple of the algorithm's block size for this call. dwFlags is still set to 0.
   - Set the pbTag field of the BCRYPT_AUTHENTICATED_CIPHER_MODE_INFO structure either to the location at which to store the generated tag or to the location of the tag to verify against, depending on whether the operation is an encryption or decryption.
   - Remove the BCRYPT_AUTH_MODE_CHAIN_CALLS_FLAG from the dwFlags field of the BCRYPT_AUTHENTICATED_CIPHER_MODE_INFO structure using the &= syntax.
9. Call BCryptDestroyKey.
10. Call BCryptCloseAlgorithmProvider.
It would be wise, at this point, to wipe out the space associated with BCRYPT_OBJECT_LENGTH.
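The sizing rule for the chained calls (every call except the last must be given a multiple of the block size) is easy to get wrong when the data arrives in arbitrary pieces. A small helper along these lines could be used to decide how much to pass to each call. chain_chunk_size is a hypothetical name and not part of CNG; max_chunk is assumed to be whatever working buffer size the caller has chosen.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of CNG. Returns how many bytes to pass to
 * the next chained call. For every call except the last, the result is
 * max_chunk rounded down to a multiple of block. When the remaining data
 * fits into one call, all of it is returned and *is_final is set to 1,
 * since the final call may have any length. */
size_t chain_chunk_size(size_t remaining, size_t max_chunk, size_t block,
                        int *is_final)
{
    assert(block > 0U);
    assert(max_chunk >= block);
    assert(is_final != NULL);

    if (remaining <= max_chunk) {
        *is_final = 1;
        return remaining;
    }
    *is_final = 0;
    return max_chunk - (max_chunk % block);
}
```

When *is_final comes back as 1, that is the point at which the caller would set pbTag and clear BCRYPT_AUTH_MODE_CHAIN_CALLS_FLAG before making the last call, as described in the steps above.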

Resources