Wednesday, September 30, 2009

Screen Scraping with C#

Something I have been messing around with lately is reading other applications' controls, also known as screen scraping. In general, screen scraping is a bad solution; shared memory or some other form of IPC is much cleaner. However, there are times when screen scraping is the only practical option (generally when the target application is closed source and its developers don't want to export the data).

So, how would we go about reading some values from another application? Windows actually makes it fairly easy. As you may have seen in an earlier post, we have to use some Win32 API functions, so let's get the DllImport stuff out of the way first:


// Needs: using System; using System.Text; using System.Runtime.InteropServices;
[DllImport("user32.dll", CharSet = CharSet.Auto)]
private static extern int SendMessage(IntPtr hWnd, int wMsg, int wParam, StringBuilder lParam);
[DllImport("user32.dll", CharSet = CharSet.Auto)]
private static extern int GetWindowText(IntPtr hWnd, StringBuilder lpString, int length);
[DllImport("user32.dll", CharSet = CharSet.Auto)]
private static extern int GetWindowTextLength(IntPtr hWnd);
[DllImport("user32.dll")]
static extern bool EnumChildWindows(IntPtr hWndParent, WindowEnumDelegate lpEnumFunc, int lParam);
[DllImport("user32.dll", CharSet = CharSet.Auto)]
static extern IntPtr FindWindow(StringBuilder lpClassName, StringBuilder lpWindowName);
[DllImport("user32.dll", CharSet = CharSet.Auto)]
static extern uint RealGetWindowClass(IntPtr hwnd, StringBuilder pszType, uint cchType);

const int WM_GETTEXT = 13;
const int WM_GETTEXTLENGTH = 14;


Okay, so if we want to find a particular window, we can use FindWindow(), like this:

// The second argument is the window title; null for the class name matches any class.
StringBuilder name = new StringBuilder("Notepad");

hWnd = FindWindow(null, name);
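
A note on the lookup above: FindWindow compares its second argument against the whole window title, so "Notepad" only matches a window whose caption is exactly that. A freshly opened Notepad is usually captioned something like "Untitled - Notepad", so the call may well come back empty. If you know the process name rather than the exact title, one alternative is to go through System.Diagnostics.Process instead. A minimal sketch, assuming a hypothetical helper called FindMainWindow (not part of the original code):

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper class; the names here are illustrative only.
public static class WindowLookup
{
    // Returns the main window handle of the first running process with the
    // given name that has one, or IntPtr.Zero if there is no such process.
    public static IntPtr FindMainWindow(string processName)
    {
        foreach (Process p in Process.GetProcessesByName(processName))
        {
            using (p)
            {
                // MainWindowHandle is IntPtr.Zero when the process has no main window.
                if (p.MainWindowHandle != IntPtr.Zero)
                {
                    return p.MainWindowHandle;
                }
            }
        }
        return IntPtr.Zero;
    }
}
```

FindMainWindow("notepad") takes the process name without the .exe extension, and the handle it returns can be passed to EnumChildWindows in place of the one from FindWindow.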


Easy! We now have a handle for the window (or IntPtr.Zero if we couldn't find it). Now it's a simple matter of finding all the child handles and getting their window text!


public delegate bool WindowEnumDelegate(IntPtr hwnd, int lParam);

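private bool WindowEnumProc(IntPtr handle, int lParam)
{
    // Class name of the child control (e.g. "Edit" or "Button").
    StringBuilder className = new StringBuilder(1024);
    RealGetWindowClass(handle, className, (uint)className.Capacity);

    // Text length in characters, not counting the terminating null.
    int textLen = SendMessage(handle, WM_GETTEXTLENGTH, 0, null);
    StringBuilder text = new StringBuilder();
    if (textLen > 0)
    {
        // Leave room for the terminating null that WM_GETTEXT writes.
        text = new StringBuilder(textLen + 1);
        SendMessage(handle, WM_GETTEXT, text.Capacity, text);
    }

    ...

    return true;
}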

...

WindowEnumDelegate del = new WindowEnumDelegate(WindowEnumProc);
EnumChildWindows(hWnd, del, 0);


The above function will be called once for each child of the window we found earlier. It finds the type of object (className) and sends a WM_GETTEXT message, which attempts to retrieve the text from the object.
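
To see the pieces working together, here is one way they might be assembled into a single class. This is only a sketch under a few assumptions: the class name WindowScraper and the Scrape and FormatEntry helpers are made up for illustration, FindWindow is declared with string parameters rather than StringBuilder, and the lParam arguments use IntPtr so the declarations also line up in a 64-bit process.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;
using System.Text;

// Hypothetical wrapper class; the name and the Scrape/FormatEntry helpers
// are illustrative and do not come from the original post.
public static class WindowScraper
{
    private delegate bool WindowEnumDelegate(IntPtr hwnd, IntPtr lParam);

    [DllImport("user32.dll", CharSet = CharSet.Auto)]
    private static extern IntPtr SendMessage(IntPtr hWnd, int msg, IntPtr wParam, StringBuilder lParam);

    [DllImport("user32.dll")]
    private static extern bool EnumChildWindows(IntPtr hWndParent, WindowEnumDelegate lpEnumFunc, IntPtr lParam);

    [DllImport("user32.dll", CharSet = CharSet.Auto)]
    private static extern IntPtr FindWindow(string lpClassName, string lpWindowName);

    [DllImport("user32.dll", CharSet = CharSet.Auto)]
    private static extern uint RealGetWindowClass(IntPtr hwnd, StringBuilder pszType, uint cchType);

    private const int WM_GETTEXT = 13;
    private const int WM_GETTEXTLENGTH = 14;

    // Formats one result line from a class name and the control's text.
    public static string FormatEntry(string className, string text)
    {
        return "[" + className + "] " + text;
    }

    // Reads a control's text with WM_GETTEXTLENGTH followed by WM_GETTEXT.
    private static string GetControlText(IntPtr handle)
    {
        int length = (int)SendMessage(handle, WM_GETTEXTLENGTH, IntPtr.Zero, null);
        if (length <= 0)
        {
            return string.Empty;
        }
        // One extra character for the terminating null.
        StringBuilder buffer = new StringBuilder(length + 1);
        SendMessage(handle, WM_GETTEXT, new IntPtr(buffer.Capacity), buffer);
        return buffer.ToString();
    }

    // Returns one formatted entry per child window of the window with the
    // given title, or an empty list if no such window exists.
    public static List<string> Scrape(string windowTitle)
    {
        List<string> entries = new List<string>();
        IntPtr parent = FindWindow(null, windowTitle);
        if (parent == IntPtr.Zero)
        {
            return entries;
        }

        WindowEnumDelegate callback = delegate(IntPtr child, IntPtr lParam)
        {
            StringBuilder className = new StringBuilder(256);
            RealGetWindowClass(child, className, (uint)className.Capacity);
            entries.Add(FormatEntry(className.ToString(), GetControlText(child)));
            return true; // keep enumerating
        };
        EnumChildWindows(parent, callback, IntPtr.Zero);
        GC.KeepAlive(callback); // keep the delegate alive for the whole native call
        return entries;
    }
}
```

Calling WindowScraper.Scrape with the exact title of an open window from a console Main and printing each entry should list one line per child control. Collecting the results in a list, rather than doing the work inside the callback, keeps the callback short and leaves any further processing to ordinary C# code.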

And there we have it! Code for a very basic screen scraper. There are, however, some issues with it. Some objects do not respond in useful ways to WM_GETTEXT messages, and some keep a whole bunch of data under a single handle. For example, a TreeView object returns nothing when sent a WM_GETTEXT message. Special handling code is needed for this and several other objects, but I'll leave that for another day (I have TreeView handling code I will post about later).
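
Until then, one way to at least spot those controls is to look at the class name that RealGetWindowClass returns in the callback. The sketch below assumes a hypothetical helper, CanUseGetText, and a hand-picked list of standard Win32 class names; both the helper and the choice of classes are illustrative and far from complete:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; not part of the original scraper code.
public static class ControlKinds
{
    // Standard Win32 control classes whose contents live in items rather than
    // in the window text. The selection is an assumption made for this sketch.
    private static readonly HashSet<string> ItemBasedClasses =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "SysTreeView32", // TreeView
            "SysListView32", // ListView
            "ListBox"
        };

    // True when plain WM_GETTEXT is a reasonable first attempt for this class.
    public static bool CanUseGetText(string className)
    {
        return !ItemBasedClasses.Contains(className);
    }
}
```

Inside the enumeration callback, a check like ControlKinds.CanUseGetText(className.ToString()) could decide whether to send WM_GETTEXT or hand the control off to control-specific code.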

1 comment:

  1. Any chance you could compile this into a small demo with the source code? I am having a difficult time getting this to work in Visual Studio using C#.
