Why do three unexpected worker threads appear when a Win32 console application starts?

2021-12-18 00:00:00 multithreading winapi c++

Here is a screenshot of the situation!

I created a Visual C++ Win32 Console Application with VS2010. When I started the application, I found that it had four threads: one 'Main Thread' and three worker threads (I had not written any code).

I don't know where these three worker threads come from.
I would like to know what these three threads do.

Thanks in advance!

Recommended answer

Windows 10 implemented a new way of loading DLLs: several worker threads do it in parallel (LdrpWorkCallback). All Windows 10 processes now have several such threads.

Before Win10, the system (ntdll.dll) always loaded DLLs in a single thread, but starting with Win10 this behaviour changed. A "parallel loader" now exists in ntdll, and the loading task (NTSTATUS LdrpSnapModule(LDRP_LOAD_CONTEXT* LoadContext)) can be executed in worker threads. Almost every DLL has imports (dependent DLLs), so when a DLL is loaded, its dependent DLLs are loaded as well, and this process is recursive (dependent DLLs have their own dependencies).

The function void LdrpMapAndSnapDependency(LDRP_LOAD_CONTEXT* LoadContext) walks the import table of the DLL currently being loaded and loads its direct (1st level) dependent DLLs by calling LdrpLoadDependentModule() (which internally calls LdrpMapAndSnapDependency() for the newly loaded DLL, so the process is recursive). Finally, LdrpMapAndSnapDependency() needs to call NTSTATUS LdrpSnapModule(LDRP_LOAD_CONTEXT* LoadContext) to bind imports to the already loaded DLLs. LdrpSnapModule() is executed for many DLLs during a top-level DLL load, and the work is independent for every DLL, so this is a good place to parallelize. In most cases LdrpSnapModule() does not load new DLLs; it only binds imports to exports of already loaded ones. But if an import resolves to a forwarded export (which rarely happens), the DLL it forwards to is loaded.
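The recursion described above can be modelled without any loader internals. The following is a toy sketch, not ntdll code: `ImportTable`, `mapAndSnapDependency` and the idea of recording snap tasks in a vector are all assumptions made for illustration. It walks a hypothetical import graph, "maps" each module once, and records one snap task per module, which shows why the snap step is a natural unit to hand to worker threads.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical import graph: module name -> names of the DLLs it imports from.
using ImportTable = std::map<std::string, std::vector<std::string>>;

// Toy model of the recursive walk: "map" a module once, recurse into its
// direct imports, then record a snap task for it. In this model each snap
// task is an independent unit of work.
void mapAndSnapDependency(const ImportTable& imports,
                          const std::string& module,
                          std::set<std::string>& mapped,
                          std::vector<std::string>& snapTasks)
{
    if (!mapped.insert(module).second) return;   // already mapped, nothing to do

    auto it = imports.find(module);
    if (it != imports.end())
    {
        for (const std::string& dep : it->second)
            mapAndSnapDependency(imports, dep, mapped, snapTasks);
    }
    snapTasks.push_back(module);                 // bind imports after the deps are mapped
}
```

Calling it for a module such as "user32.dll" fills `snapTasks` with one entry per module reached, with shared dependencies visited only once.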

Some current implementation details:

  1. First of all, look at a new field in struct _RTL_USER_PROCESS_PARAMETERS: ULONG LoaderThreads. When nonzero, LoaderThreads controls the "parallel loader" in the new process: it enables or disables it and sets the number of loader threads. When we create a new process with ZwCreateUserProcess(), the 9th argument is PRTL_USER_PROCESS_PARAMETERS ProcessParameters. But if we use CreateProcess[Internal]W(), we cannot pass PRTL_USER_PROCESS_PARAMETERS directly, only STARTUPINFO. RTL_USER_PROCESS_PARAMETERS is partially initialized from STARTUPINFO, but we do not control ULONG LoaderThreads, and it will always be zero (unless we call ZwCreateUserProcess() ourselves or hook this routine).

In the initialization phase of the new process, LdrpInitializeExecutionOptions() is called (from LdrpInitializeProcess()). This routine checks HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\<app name> for several values (if the <app name> subkey exists, which it usually doesn't), including MaxLoaderThreads (REG_DWORD). If MaxLoaderThreads exists, its value overrides RTL_USER_PROCESS_PARAMETERS.LoaderThreads.
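The override rule can be written down in a few lines. This is only a sketch of the precedence described above: the function name and the use of `std::optional` to stand for "the registry value is absent" are assumptions, and no registry access is performed.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical helper: `ifeoMaxLoaderThreads` stands for the MaxLoaderThreads
// REG_DWORD read from the Image File Execution Options subkey, or
// std::nullopt when the subkey or the value does not exist.
std::uint32_t effectiveLoaderThreads(std::uint32_t processParamLoaderThreads,
                                     std::optional<std::uint32_t> ifeoMaxLoaderThreads)
{
    // A registry value that is present wins over
    // RTL_USER_PROCESS_PARAMETERS.LoaderThreads.
    return ifeoMaxLoaderThreads.value_or(processParamLoaderThreads);
}
```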

LdrpCreateLoaderEvents() is called. This routine creates 2 global events, HANDLE LdrpWorkCompleteEvent, LdrpLoadCompleteEvent;, which are used for synchronization.

NTSTATUS LdrpCreateLoaderEvents()
{
    NTSTATUS status = ZwCreateEvent(&LdrpWorkCompleteEvent, EVENT_ALL_ACCESS, 0, SynchronizationEvent, TRUE);

    if (0 <= status)
    {
        status = ZwCreateEvent(&LdrpLoadCompleteEvent, EVENT_ALL_ACCESS, 0, SynchronizationEvent, TRUE);
    }
    return status;
}

  • LdrpInitializeProcess() calls void LdrpDetectDetour(). The name speaks for itself. It does not return a value but initializes the global variable BOOLEAN LdrpDetourExist. The routine first checks whether some loader-critical routines are hooked; currently there are 5 of them:

    • NtOpenFile
    • NtCreateSection
    • NtQueryAttributesFile
    • NtOpenSection
    • NtMapViewOfSection

    If any of them is hooked, LdrpDetourExist = TRUE;

    If none is hooked, ThreadDynamicCodePolicyInfo is queried. The full code:

    void LdrpDetectDetour()
    {
        if (LdrpDetourExist) return ;
    
        static PVOID LdrpCriticalLoaderFunctions[] = {
            NtOpenFile,
            NtCreateSection,
            ZwQueryAttributesFile,
            ZwOpenSection,
            ZwMapViewOfSection,
        };
    
        static M128A LdrpThunkSignature[5] = {
            //***
        };
    
        ULONG n = RTL_NUMBER_OF(LdrpCriticalLoaderFunctions);
        M128A* ppv = (M128A*)LdrpCriticalLoaderFunctions;
        M128A* pps = LdrpThunkSignature; 
        do
        {
            if (ppv->Low != pps->Low || ppv->High != pps->High)
            {
                if (LdrpDebugFlags & 5)
                {
                    DbgPrint("!!! Detour detected, disable parallel loading\n");
                    LdrpDetourExist = TRUE;
                    return;
                }
            }
    
        } while (pps++, ppv++, --n);
    
        BOOL DynamicCodePolicy;
    
        if (0 <= ZwQueryInformationThread(NtCurrentThread(), ThreadDynamicCodePolicyInfo, &DynamicCodePolicy, sizeof(DynamicCodePolicy), 0))
        {
            if (LdrpDetourExist = (DynamicCodePolicy == 1))
            {
                if (LdrpMapAndSnapWork)
                {
                    WaitForThreadpoolWorkCallbacks(LdrpMapAndSnapWork, TRUE);//TpWaitForWork
                    TpReleaseWork(LdrpMapAndSnapWork);//CloseThreadpoolWork
                    LdrpMapAndSnapWork = 0;
                    TpReleasePool(LdrpThreadPool);//CloseThreadpool
                    LdrpThreadPool = 0;
                }
            }
        }
    }
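As the listing suggests, the hook check amounts to comparing 16 bytes (one M128A) per routine against a stored signature. The sketch below illustrates that kind of comparison on plain byte buffers. It does not read real ntdll stubs, and the names `Signature`, `looksHooked` and `anyHooked` are made up for this example.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// 16 expected bytes for one routine (the size of an M128A).
using Signature = std::array<std::uint8_t, 16>;

// Returns true when the first 16 bytes at `code` differ from the expected
// bytes, i.e. the routine looks patched.
bool looksHooked(const void* code, const Signature& expected)
{
    return std::memcmp(code, expected.data(), expected.size()) != 0;
}

// Returns true if any routine in the table fails the comparison.
bool anyHooked(const void* const* routines, const Signature* expected, std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i)
        if (looksHooked(routines[i], expected[i])) return true;
    return false;
}
```

A byte-for-byte comparison like this is cheap and catches the usual inline patch (a jump written over the first bytes), which is presumably why a check of this shape is enough to decide whether to stay single-threaded.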
    

  • LdrpInitializeProcess() calls NTSTATUS LdrpEnableParallelLoading (ULONG LoaderThreads) as LdrpEnableParallelLoading(ProcessParameters->LoaderThreads):

    NTSTATUS LdrpEnableParallelLoading (ULONG LoaderThreads)
    {
        LdrpDetectDetour();
    
        if (LoaderThreads)
        {
            LoaderThreads = min(LoaderThreads, 16);// not more than 16 threads allowed
            if (LoaderThreads <= 1) return STATUS_SUCCESS;
        }
        else
        {
            if (RtlGetSuiteMask() & 0x10000) return STATUS_SUCCESS; 
            LoaderThreads = 4;// default for 4 threads
        }
    
        if (LdrpDetourExist) return STATUS_SUCCESS;
    
        NTSTATUS status = TpAllocPool(&LdrpThreadPool, 1);//CreateThreadpool
    
        if (0 <= status)
        {
            TpSetPoolWorkerThreadIdleTimeout(LdrpThreadPool, -300000000);// 30 second idle timeout
            TpSetPoolMaxThreads(LdrpThreadPool, LoaderThreads - 1);//SetThreadpoolThreadMaximum 
            TP_CALLBACK_ENVIRON CallbackEnviron = { };
            CallbackEnviron.CallbackPriority = TP_CALLBACK_PRIORITY_NORMAL;
            CallbackEnviron.Size = sizeof(TP_CALLBACK_ENVIRON);
            CallbackEnviron.Pool = LdrpThreadPool;
            CallbackEnviron.Version = 3;
    
            status = TpAllocWork(&LdrpMapAndSnapWork, LdrpWorkCallback, 0, &CallbackEnviron);//CreateThreadpoolWork
        }
    
        return status;
    }
    

    A special loader thread pool, LdrpThreadPool, is created with at most LoaderThreads - 1 threads. The idle timeout is set to 30 seconds (after which an idle thread exits), and PTP_WORK LdrpMapAndSnapWork is allocated; it is later used in void LdrpQueueWork(LDRP_LOAD_CONTEXT* LoadContext).
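The thread-count arithmetic in LdrpEnableParallelLoading() can be isolated into a small pure function. This is a sketch that assumes the reconstructed listing above is accurate. The function and parameter names are hypothetical: `suiteDisables` stands for the RtlGetSuiteMask() & 0x10000 test and `detourExist` for LdrpDetourExist.

```cpp
#include <algorithm>
#include <cstdint>

// Maximum number of loader worker threads the pool may create, following
// the branches of the listing above. Returns 0 when no pool is created.
std::uint32_t maxLoaderWorkers(std::uint32_t loaderThreads,
                               bool suiteDisables,
                               bool detourExist)
{
    if (loaderThreads != 0)
    {
        loaderThreads = std::min<std::uint32_t>(loaderThreads, 16); // capped at 16
        if (loaderThreads <= 1) return 0;       // one thread means no pool
    }
    else
    {
        if (suiteDisables) return 0;
        loaderThreads = 4;                      // default
    }
    if (detourExist) return 0;                  // hooks found: stay single-threaded
    return loaderThreads - 1;                   // the load owner counts as one thread
}
```

With LoaderThreads left at zero and no hooks, this gives 4 - 1 = 3, which matches the three worker threads seen in the question.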

    Global variables used by the parallel loader:

    HANDLE LdrpWorkCompleteEvent, LdrpLoadCompleteEvent;
    CRITICAL_SECTION LdrpWorkQueueLock;
    LIST_ENTRY LdrpWorkQueue = { &LdrpWorkQueue, &LdrpWorkQueue };
    
    
    ULONG LdrpWorkInProgress;
    BOOLEAN LdrpDetourExist;
    PTP_POOL LdrpThreadPool;
    
    PTP_WORK LdrpMapAndSnapWork;
    
    enum DRAIN_TASK {
        WaitLoadComplete, WaitWorkComplete
    };
    
    struct LDRP_LOAD_CONTEXT
    {
        UNICODE_STRING BaseDllName;
        PVOID somestruct;
        ULONG Flags;//some unknown flags
        NTSTATUS* pstatus; //final status of load
        _LDR_DATA_TABLE_ENTRY* ParentEntry; // of 'parent' loading dll
        _LDR_DATA_TABLE_ENTRY* Entry; // this == Entry->LoadContext
        LIST_ENTRY WorkQueueListEntry;
        _LDR_DATA_TABLE_ENTRY* ReplacedEntry;
        _LDR_DATA_TABLE_ENTRY** pvImports;// in same order as in IMAGE_IMPORT_DESCRIPTOR piid
        ULONG ImportDllCount;// count of pvImports
        LONG TaskCount;
        PVOID pvIAT;
        ULONG SizeOfIAT;
        ULONG CurrentDll; // 0 <= CurrentDll < ImportDllCount
        PIMAGE_IMPORT_DESCRIPTOR piid;
        ULONG OriginalIATProtect;
        PVOID GuardCFCheckFunctionPointer;
        PVOID* pGuardCFCheckFunctionPointer;
    };
    

    Unfortunately, LDRP_LOAD_CONTEXT is not contained in the published .pdb files, so my definition includes only some of the field names.

    struct {
        ULONG MaxWorkInProgress;//4 - values from explorer.exe at some moment
        ULONG InLoaderWorker;//7a (this mean LdrpSnapModule called from worker thread)
        ULONG InLoadOwner;//87 (LdrpSnapModule called direct, in same thread as `LdrpMapAndSnapDependency`)
    } LdrpStatistics;
    
    // for statistics
    void LdrpUpdateStatistics()
    {
      LdrpStatistics.MaxWorkInProgress = max(LdrpStatistics.MaxWorkInProgress, LdrpWorkInProgress);
      NtCurrentTeb()->LoaderWorker ? LdrpStatistics.InLoaderWorker++ : LdrpStatistics.InLoadOwner++;
    }
    

    In TEB.SameTebFlags there are now 2 new flags:

    USHORT LoadOwner : 01; // 0x1000;
    USHORT LoaderWorker : 01; // 0x2000;
    

    The last 2 bits are spare (USHORT SpareSameTebBits : 02; // 0xc000)
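A minimal sketch of how these bits could be tested, assuming only the mask values given in the comments above. The constant and function names are hypothetical, and the sketch works on a raw 16-bit value rather than reading a real TEB.

```cpp
#include <cstdint>

// Bit masks taken from the comments above; the names are made up here.
constexpr std::uint16_t kLoadOwner    = 0x1000;
constexpr std::uint16_t kLoaderWorker = 0x2000;
constexpr std::uint16_t kSpareBits    = 0xC000;

enum class LoaderRole { None, LoadOwner, LoaderWorker };

// Classify a thread from a raw 16-bit flags value.
constexpr LoaderRole loaderRole(std::uint16_t flags)
{
    if (flags & kLoaderWorker) return LoaderRole::LoaderWorker;
    if (flags & kLoadOwner)    return LoaderRole::LoadOwner;
    return LoaderRole::None;
}

static_assert((kLoadOwner | kLoaderWorker | kSpareBits) == 0xF000,
              "the three masks together cover the top four bits");
```

Masks are used instead of a C++ bit-field because bit-field layout is implementation-defined, while the mask values are stated explicitly in the text.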

    LdrpMapAndSnapDependency(LDRP_LOAD_CONTEXT* LoadContext) contains the following code:

    LDR_DATA_TABLE_ENTRY* Entry = LoadContext->CurEntry;
    if (LoadContext->pvIAT)
    {
        Entry->DdagNode->State = LdrModulesSnapping;
        if (LoadContext->PrevEntry)// if recursive call
        {
            LdrpQueueWork(LoadContext); // !!!
        }
        else
        {
            status = LdrpSnapModule(LoadContext);
        }
    }
    else
    {
        Entry->DdagNode->State = LdrModulesSnapped;
    }
    

    So LdrpSnapModule(LoadContext) is called directly only when LoadContext->PrevEntry is 0; otherwise LdrpQueueWork(LoadContext) is called. Say we load user32.dll. In the first call to LdrpMapAndSnapDependency(), CurEntry points to user32.dll and LoadContext->PrevEntry is always 0. When LdrpMapAndSnapDependency() is called recursively for its dependency gdi32.dll, PrevEntry refers to user32.dll and CurEntry to gdi32.dll, so the snap is queued. (CurEntry and PrevEntry presumably correspond to the Entry and ParentEntry fields of the structure above.)

    LdrpQueueWork() is simple enough:

    void LdrpQueueWork(LDRP_LOAD_CONTEXT* LoadContext)
    {
        if (0 <= *LoadContext->pstatus)
        {
            EnterCriticalSection(&LdrpWorkQueueLock);
    
            InsertHeadList(&LdrpWorkQueue, &LoadContext->WorkQueueListEntry);
    
            LeaveCriticalSection(&LdrpWorkQueueLock);
    
            if (LdrpMapAndSnapWork && !RtlGetCurrentPeb()->Ldr->ShutdownInProgress)
            {
                SubmitThreadpoolWork(LdrpMapAndSnapWork);//TpPostWork
            }
        }
    }
    

    We insert LoadContext into LdrpWorkQueue, and if the "parallel loader" is started (LdrpMapAndSnapWork != 0) and ShutdownInProgress is not set, we submit work to the loader pool. Even if the pool is not initialized (say, because detours exist), there is no error: the task is processed in LdrpDrainWorkQueue().

    In the worker thread callback, this is executed:

    void LdrpWorkCallback()
    {
        if (LdrpDetourExist) return;
    
        EnterCriticalSection(&LdrpWorkQueueLock);
    
        PLIST_ENTRY Entry = RemoveHeadList(&LdrpWorkQueue);
    
        if (Entry != &LdrpWorkQueue)
        {
            ++LdrpWorkInProgress;
            LdrpUpdateStatistics();
        }
    
        LeaveCriticalSection(&LdrpWorkQueueLock);
    
        if (Entry != &LdrpWorkQueue)
        {
            LdrpProcessWork(CONTAINING_RECORD(Entry, LDRP_LOAD_CONTEXT, WorkQueueListEntry), FALSE);
        }
    }
    

    We simply pop an entry from LdrpWorkQueue, convert it to LDRP_LOAD_CONTEXT* (CONTAINING_RECORD(Entry, LDRP_LOAD_CONTEXT, WorkQueueListEntry)) and call void LdrpProcessWork(LDRP_LOAD_CONTEXT* LoadContext, BOOLEAN LoadOwner).

    void LdrpProcessWork(LDRP_LOAD_CONTEXT* LoadContext, BOOLEAN LoadOwner) in general calls LdrpSnapModule(LoadContext), and at the end the following code is executed:

    if (!LoadOwner)
    {
        EnterCriticalSection(&LdrpWorkQueueLock);
        BOOLEAN bSetEvent = --LdrpWorkInProgress == 1 && IsListEmpty(&LdrpWorkQueue);
        LeaveCriticalSection(&LdrpWorkQueueLock);
        if (bSetEvent) ZwSetEvent(LdrpWorkCompleteEvent, 0);
    }
    

    So, if we are not the LoadOwner (that is, we are in a worker thread), we decrement LdrpWorkInProgress, and if it has dropped to 1 and LdrpWorkQueue is empty, we signal LdrpWorkCompleteEvent (the LoadOwner can wait on it).

    Finally, LdrpDrainWorkQueue() is called from the LoadOwner (primary thread) to "drain" the work queue. It can pop and directly execute tasks that LdrpQueueWork() pushed to LdrpWorkQueue and that have not yet been popped by worker threads, either because the workers have not got to them or because the parallel loader is disabled (in that case LdrpQueueWork() still pushes the LDRP_LOAD_CONTEXT but does not actually post work to a worker thread). Finally, it waits (if needed) on the LdrpWorkCompleteEvent or LdrpLoadCompleteEvent event.

    enum DRAIN_TASK {
        WaitLoadComplete, WaitWorkComplete
    };
    
    void LdrpDrainWorkQueue(DRAIN_TASK task)
    {
        BOOLEAN LoadOwner = FALSE;
    
        HANDLE hEvent = task ? LdrpWorkCompleteEvent : LdrpLoadCompleteEvent;
    
        for(;;)
        {
            PLIST_ENTRY Entry;
    
            EnterCriticalSection(&LdrpWorkQueueLock);
    
            if (LdrpDetourExist && task == WaitLoadComplete)
            {
                if (!LdrpWorkInProgress)
                {
                    LdrpWorkInProgress = 1;
                    LoadOwner = TRUE;
                }
                Entry = &LdrpWorkQueue;
            }
            else
            {
                Entry = RemoveHeadList(&LdrpWorkQueue);
    
                if (Entry == &LdrpWorkQueue)
                {
                    if (!LdrpWorkInProgress)
                    {
                        LdrpWorkInProgress = 1;
                        LoadOwner = TRUE;
                    }
                }
                else
                {
                    if (!LdrpDetourExist)
                    {
                        ++LdrpWorkInProgress;
                    }
                    LdrpUpdateStatistics();
                }
            }
            LeaveCriticalSection(&LdrpWorkQueueLock);
    
            if (LoadOwner)
            {
                NtCurrentTeb()->LoadOwner = 1;
                return;
            }
    
            if (Entry != &LdrpWorkQueue)
            {
                LdrpProcessWork(CONTAINING_RECORD(Entry, LDRP_LOAD_CONTEXT, WorkQueueListEntry), FALSE);
            }
            else
            {
                ZwWaitForSingleObject(hEvent, 0, 0);
            }
        }
    }
    

    void LdrpDropLastInProgressCount()
    {
      NtCurrentTeb()->LoadOwner = 0;
      EnterCriticalSection(&LdrpWorkQueueLock);
      LdrpWorkInProgress = 0;
      LeaveCriticalSection(&LdrpWorkQueueLock);
      ZwSetEvent(LdrpLoadCompleteEvent, 0);
    }
    
