Saturday, October 3, 2009

TreeView Scraping

Last time we looked at pulling text from an object... and noted that some more complex objects didn't respond to WM_GETTEXT. Well, today we are going to poke around in a TreeView widget (SysTreeView32 is what it's reported as, specifically). Just so we are clear about what we are talking about:




When we use the code presented last time on the above application, we get a handle for the TreeView object, RealGetWindowClass() returns "SysTreeView32", and WM_GETTEXT returns nothing. Because the nodes are not exposed as child windows, we only get a single handle for the whole thing... and WM_GETTEXT is not meaningful in that context. So what we actually want to do is recognise that we have a TreeView object, and send TreeView-specific messages to it.
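As a rough sketch of that recognition step, the check below compares the reported class name against "SysTreeView32". The class and method names (WindowClassCheck, IsTreeViewClass, IsTreeView) are made up for illustration and are not part of the code from last time; the 256-character buffer is likewise an assumption.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

static class WindowClassCheck
{
    // Unicode charset so the runtime binds to RealGetWindowClassW.
    [DllImport("user32.dll", CharSet = CharSet.Unicode)]
    static extern uint RealGetWindowClass(IntPtr hwnd, StringBuilder pszType, uint cchType);

    // Pure string check, kept separate so it can be exercised without a window.
    public static bool IsTreeViewClass(string className)
    {
        return string.Equals(className, "SysTreeView32", StringComparison.OrdinalIgnoreCase);
    }

    // Asks Windows for the class name of hWnd and checks it.
    // A return value of 0 from RealGetWindowClass means the call failed.
    public static bool IsTreeView(IntPtr hWnd)
    {
        StringBuilder name = new StringBuilder(256); // assumed to be long enough
        uint length = RealGetWindowClass(hWnd, name, (uint)name.Capacity);
        return length != 0 && IsTreeViewClass(name.ToString());
    }
}
```

With something like this in place, the scraper can branch on IsTreeView(hWnd) and only fall back to WM_GETTEXT for everything else.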

Let's get the DllImports out of the way first. These are the Win32 API functions we will be using:

#region WIN32_DLLIMPORT
[DllImport("user32.dll", EntryPoint = "SendMessage")]
private static extern IntPtr SendMessage(IntPtr hWnd, int wMsg, int wParam, IntPtr lParam);
[DllImport("user32.dll")]
static extern IntPtr GetWindowThreadProcessId(IntPtr hWnd, out int lpdwProcessId);
[DllImport("kernel32.dll")]
static extern IntPtr OpenProcess(int dwDesiredAccess, bool bInheritHandle,int dwProcessId);
[DllImport("kernel32.dll")]
static extern bool CloseHandle(IntPtr hObject);
[DllImport("kernel32.dll")]
static extern IntPtr VirtualAllocEx(IntPtr hProcess, int lpAddress, int dwSize,int flAllocationType, int flProtect);
[DllImport("kernel32.dll")]
static extern bool VirtualFreeEx(IntPtr hProcess, IntPtr lpAddress, int dwSize, int dwFreeType);

[DllImport("kernel32.dll")]
static extern bool WriteProcessMemory(IntPtr hProcess, IntPtr lpBaseAddress, IntPtr lpBuffer, int nSize, IntPtr lpNumberOfBytesWritten);
[DllImport("kernel32.dll")]
static extern bool ReadProcessMemory(IntPtr hProcess, IntPtr lpBaseAddress, IntPtr lpBuffer, int nSize, IntPtr lpNumberOfBytesRead);


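// Despite the name, this is not the real PROCESS_ALL_ACCESS mask. It combines only
// PROCESS_VM_OPERATION (0x0008), PROCESS_VM_READ (0x0010) and PROCESS_VM_WRITE (0x0020),
// which is all we need to allocate, write and read memory in the target process.
const int PROCESS_ALL_ACCESS = 0x0008 | 0x0010 | 0x0020;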
const int MEM_COMMIT = 0x1000;
const int PAGE_READWRITE = 0x04;
const int LVIF_TEXT = 0x0001;
const int MEM_RELEASE = 0x8000;
#endregion


Assuming that we are scraping a separate process, this is not quite as easy as just sending a WM_GETTEXT. We need to send a structure that the other process fills in, which means the structure has to live in memory the other process can access, at an address that makes sense to it. Fortunately, we can allocate memory in another process with VirtualAllocEx()!

We also have some supporting constants, with values plundered from C header files.

Now, we also need to define some structures and constants that are specific to TreeView objects. It should be noted that the method to access ListView (and possibly other) objects is nearly identical.


public const int TV_FIRST = 0x1100;
public const int TVIF_TEXT = 0x0001;
public const int TVIF_PARAM = 0x4;

public enum TV_Messages
{
TVM_GETNEXTITEM = (TV_FIRST + 10),
TVM_GETITEM = (TV_FIRST + 62), // this is TVM_GETITEMW, the Unicode variant (the ANSI TVM_GETITEMA is TV_FIRST + 12)
TVM_GETCOUNT = (TV_FIRST + 5),
TVM_SELECTITEM = (TV_FIRST + 11),
TVM_DELETEITEM = (TV_FIRST + 1),
TVM_EXPAND = (TV_FIRST + 2),
TVM_GETITEMRECT = (TV_FIRST + 4),
TVM_GETINDENT = (TV_FIRST + 6),
TVM_SETINDENT = (TV_FIRST + 7),
TVM_GETIMAGELIST = (TV_FIRST + 8),
TVM_SETIMAGELIST = (TV_FIRST + 9),
TVM_GETISEARCHSTRING = (TV_FIRST + 64),
TVM_HITTEST = (TV_FIRST + 17),
}

public enum TVM_EXPAND
{
TVE_COLLAPSE = 0x1,
TVE_EXPAND = 0x2,
TVE_TOGGLE = 0x3,
TVE_EXPANDPARTIAL = 0x4000
}

public enum TVM_GETNEXTITEM
{
TVGN_ROOT = 0x0,
TVGN_NEXT = 0x1,
TVGN_PREVIOUS = 0x2,
TVGN_PARENT = 0x3,
TVGN_CHILD = 0x4,
TVGN_FIRSTVISIBLE = 0x5,
TVGN_NEXTVISIBLE = 0x6,
TVGN_PREVIOUSVISIBLE = 0x7,
TVGN_DROPHILITE = 0x8,
TVGN_CARET = 0x9,
TVGN_LASTVISIBLE = 0xA
}


And the actual structure, complete with marshalling hints:

[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Auto)]
public struct TVITEM
{
public int mask;
public IntPtr hItem;
public int state;
public int stateMask;
public IntPtr pszText;
public int cchTextMax;
public int iImage;
public int iSelectedImage;
public int cChildren;
public IntPtr lParam;
}
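Because the structure contains pointer-sized fields, its size depends on whether our process is 32-bit or 64-bit. The sketch below simply prints the marshalled size; the helper class TvItemLayout and the expected sizes of 40 and 56 bytes are my own working from the field list above, on the assumption of default sequential packing. The practical consequence is that the scraper and the target application need the same bitness for this layout to line up.

```csharp
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Auto)]
public struct TVITEM
{
    public int mask;
    public IntPtr hItem;
    public int state;
    public int stateMask;
    public IntPtr pszText;
    public int cchTextMax;
    public int iImage;
    public int iSelectedImage;
    public int cChildren;
    public IntPtr lParam;
}

static class TvItemLayout
{
    // Size as the marshaller will lay the structure out in unmanaged memory.
    public static int MarshalledSize()
    {
        return Marshal.SizeOf(typeof(TVITEM));
    }

    // Ten 4-byte fields in a 32-bit process; in a 64-bit process the three
    // IntPtr fields grow to 8 bytes and 4 bytes of padding follow mask.
    public static int ExpectedSize()
    {
        return IntPtr.Size == 8 ? 56 : 40;
    }

    public static void Print()
    {
        Console.WriteLine("TVITEM is {0} bytes in this process", MarshalledSize());
    }
}
```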


Okay, so we have everything set up, now let's run through the steps needed to actually pull this off! First, it's assumed that our application has found the handle for a TreeView object in another application. Then we want to use the various flavours of TVM_GETNEXTITEM to get a handle for each node in the tree. Lastly, we want to query the node with TVM_GETITEM to find out what the actual text is. So it's a fair bit more involved than previously. But, one step at a time!

Let's get the root node to start with:

retval = SendMessage(hWnd, (int)TV_Messages.TVM_GETNEXTITEM, (int)TVM_GETNEXTITEM.TVGN_ROOT, IntPtr.Zero);


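Now that we have a handle to the node, let's actually extract the information. I'll go through this step by step, as there is a lot in it:

First, we need to find the PID of the target application, and open it with the access rights needed to allocate, read and write its memory. Note that OpenProcess() can fail if we lack sufficient privileges, for example when the target runs elevated or under another user account and we are not running as administrator!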

threadId = GetWindowThreadProcessId(this.hWnd, out processID);
if ((threadId == IntPtr.Zero) || (processID == 0))
throw new ArgumentException("hWnd");

hProcess = OpenProcess(PROCESS_ALL_ACCESS, false, processID);
if (hProcess == IntPtr.Zero)
throw new ApplicationException("Failed to access process");


Now that we can access the other process, we need to allocate a blob of memory that can fit TVITEM and whatever data we are retrieving.

remoteBuffer = VirtualAllocEx(hProcess, 0, bufferSize, MEM_COMMIT, PAGE_READWRITE);
if (remoteBuffer == IntPtr.Zero)
throw new SystemException("Failed to allocate memory in remote process");


Before we call TVM_GETITEM we need to populate the TVITEM structure with some values to indicate what information we want and how much space is available to fill it. This is a two step operation. Firstly we want to allocate some memory locally (ie in our process), and fill it out with the values we want. We will then write this structure into our remotely allocated block of memory.

To complicate things, we need more space than just the structure, we also need a block of space for the string that will (hopefully!) get returned.


tvItem = new TVITEM();
int size = Marshal.SizeOf(tvItem);

tvItem.mask = TVIF_TEXT;
tvItem.hItem = hItem;
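// ToInt64() rather than ToInt32(), so the pointer arithmetic also works in a 64-bit process
tvItem.pszText = (IntPtr)(remoteBuffer.ToInt64() + size + 1);
// cchTextMax is counted in characters; we read the text back as Unicode (2 bytes per character)
tvItem.cchTextMax = (bufferSize - (size + 1)) / 2;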


For the sake of convenience, we are allocating a single block of memory: we write the structure at the start of it, and use the remainder of the buffer for pszText. But since this structure is going to be written to the remote process, we need to make sure that pszText points to the right place, hence above we set pszText to a position within remoteBuffer.

We now have a tvItem that we want to write into the remote process's memory... however, it's currently in managed memory, so we need to marshal it into unmanaged memory first:
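To make the single-block layout concrete, here is a small sketch of the arithmetic. It is a variant rather than the exact code above: instead of `size + 1` it rounds the text offset up to an even byte boundary (an assumption on my part that 2-byte alignment is tidier for a Unicode string), and it reports the capacity in characters, since cchTextMax is a character count. The class and method names are hypothetical.

```csharp
using System;

static class RemoteBufferLayout
{
    // Byte offset of the text area: the first even offset at or after the structure.
    public static int TextOffset(int structSize)
    {
        return (structSize + 1) & ~1;
    }

    // Number of 2-byte characters that fit after the structure.
    public static int TextCapacityInChars(int bufferSize, int structSize)
    {
        return (bufferSize - TextOffset(structSize)) / 2;
    }

    // The address to store in pszText, given the base of the remote block.
    public static IntPtr TextAddress(IntPtr remoteBuffer, int structSize)
    {
        return new IntPtr(remoteBuffer.ToInt64() + TextOffset(structSize));
    }
}
```

For a 1024-byte block and a 40-byte structure this gives a text offset of 40 and room for 492 characters, including the terminating null.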

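// Allocate the full bufferSize, not just the structure: we later read the whole remote block back into this buffer
IntPtr localBuffer = Marshal.AllocHGlobal(bufferSize);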
Marshal.StructureToPtr(tvItem, localBuffer, false);


Now to write that into the remote process:

bSuccess = WriteProcessMemory(hProcess, remoteBuffer, localBuffer, size, IntPtr.Zero);
if (!bSuccess)
throw new SystemException("Failed to write to process memory");


Phew. Still with me? We are now, finally, ready to call SendMessage()!

SendMessage(hWnd, (int)TV_Messages.TVM_GETITEM, 0, remoteBuffer);


Nothing we haven't seen before. But now we have another problem. We have (hopefully anyway) a structure in the remote process's memory that we want locally! So we need to reverse the above process:

bSuccess = ReadProcessMemory(hProcess, remoteBuffer, localBuffer, bufferSize, IntPtr.Zero);
if (!bSuccess)
throw new SystemException("Failed to read from process memory");

TVITEM retItem = (TVITEM)Marshal.PtrToStructure(localBuffer, typeof(TVITEM));


Cool, we have a TVITEM... but it's not quite right. Its pszText member still holds an address in the remote process, which means nothing to us, so let's marshal the text out of our local copy of the buffer instead:


String pszItemText = Marshal.PtrToStringUni((IntPtr)(localBuffer.ToInt64() + size + 1));


pszItemText is actually the only bit we are interested in... so we are done retrieving stuff! From here, using the above code and the right TVM_GETNEXTITEM calls, we can walk the whole tree view and pull out the text. Given the image at the start, this would end up with the word 'Root' in pszItemText.
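The walk itself could be sketched roughly as follows. To keep the example self-contained, the two Win32 round trips are passed in as delegates: getNext(flag, item) stands in for SendMessage(hWnd, TVM_GETNEXTITEM, flag, item), and getText(item) stands in for the whole TVM_GETITEM dance above. The TreeWalker class, both delegate names and the two-space indentation are all invented for illustration.

```csharp
using System;
using System.Collections.Generic;

static class TreeWalker
{
    // Same values as the TVM_GETNEXTITEM enum earlier in the post.
    const int TVGN_ROOT = 0x0;
    const int TVGN_NEXT = 0x1;
    const int TVGN_CHILD = 0x4;

    // Returns one line per node, indented two spaces per level of depth.
    public static List<string> Walk(Func<int, IntPtr, IntPtr> getNext,
                                    Func<IntPtr, string> getText)
    {
        List<string> lines = new List<string>();
        Visit(getNext(TVGN_ROOT, IntPtr.Zero), 0, getNext, getText, lines);
        return lines;
    }

    // Depth-first: record this node, descend into its first child,
    // then move on to the next sibling until there are none left.
    static void Visit(IntPtr item, int depth,
                      Func<int, IntPtr, IntPtr> getNext,
                      Func<IntPtr, string> getText,
                      List<string> lines)
    {
        while (item != IntPtr.Zero)
        {
            lines.Add(new string(' ', depth * 2) + getText(item));
            Visit(getNext(TVGN_CHILD, item), depth + 1, getNext, getText, lines);
            item = getNext(TVGN_NEXT, item);
        }
    }
}
```

Passing the calls in as delegates also means the traversal logic can be tried out against a fake tree, without a real TreeView or a second process.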


Before we do that, however, let's make sure we tidy up:

if (localBuffer != IntPtr.Zero)
Marshal.FreeHGlobal(localBuffer);
if (remoteBuffer != IntPtr.Zero)
VirtualFreeEx(hProcess, remoteBuffer, 0, MEM_RELEASE);
if (hProcess != IntPtr.Zero)
CloseHandle(hProcess);
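One thing worth considering is that the tidy-up only runs if nothing before it threw. A try/finally block would make it unconditional. The sketch below shows the shape using local unmanaged memory only, so it runs without a target process; CleanupSketch and RoundTrip are hypothetical names, and the real version would free remoteBuffer and close hProcess in the same finally block.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

static class CleanupSketch
{
    // Copies a string into an unmanaged buffer and reads it back,
    // freeing the buffer whether or not an exception is thrown.
    public static string RoundTrip(string text, int bufferSize)
    {
        IntPtr localBuffer = IntPtr.Zero;
        try
        {
            localBuffer = Marshal.AllocHGlobal(bufferSize);

            // Null-terminated UTF-16, the same encoding PtrToStringUni expects.
            byte[] bytes = Encoding.Unicode.GetBytes(text + "\0");
            if (bytes.Length > bufferSize)
                throw new ArgumentException("text does not fit in the buffer");

            Marshal.Copy(bytes, 0, localBuffer, bytes.Length);
            return Marshal.PtrToStringUni(localBuffer);
        }
        finally
        {
            // Runs on success and on failure alike.
            if (localBuffer != IntPtr.Zero)
                Marshal.FreeHGlobal(localBuffer);
        }
    }
}
```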