pr0g33k

 collapse all
  1. Using PATINDEX to parse HTML with a Regular Expression

    When I created the RSS feed for this blog, I didn't want to return each entire blog post but, rather, just the first paragraph. I tend to get a little wordy (it's a personal flaw; I'm working on it) and I wanted to display as many entries in the RSS as I could without it taking too long to download to the client. I first thought I would query the blog posts from the database and use a regular expression to extract the first paragraph since SQL doesn't have native support for regular expressions. (There's the SQL Server CLR route, sure, but since I'm using SQL Azure, I don't have that option.) Then I remembered PATINDEX. According to the online documentation, PATINDEX returns "the starting position of the first occurrence of a pattern in a specified expression, or zeros if the pattern is not found, on all valid text and character data types." Perfect! The pattern language is pretty limited but so is the HTML I'm wanting to match. Here's a little test script I threw together to test it out:

    Declare @Html table
    (
    	Markup varchar(1000)
    )
    
    Insert @Html (Markup) Select '<p>This is what I want.</p><p>I want to ignore this.</p>'
    Insert @Html (Markup) Select '<div>I do not want this.</div><pre>Some preformatted text I do not want.</pre><p class="my-css">This is what I want.</p>'
    Insert @Html (Markup) Select '<div>I do not want this.</div><span>I do not want this, either</span>'
    Insert @Html (Markup) Select '<param>A param I do not want.</param><progress>I do not want progress (well, not this progress)</progress><p>This is what I want!</p>'
    
    Declare @Begin varchar(50) = '%<p[^re,aram,rogress]%'
    Declare @End varchar(50) = '%</p>%'
    
    Select PATINDEX(@Begin, Markup) As FirstMarker
    	, Case When PATINDEX(@End, Markup) > 0 Then PATINDEX(@End, Markup) + 3 Else 0 End As SecondMarker
    	, SUBSTRING(Markup, PATINDEX(@Begin, Markup), Case When PATINDEX(@End, Markup) > 0 Then PATINDEX(@End, Markup) + 3 Else 0 End) As Body
    From @Html
        

    The @Begin variable holds a pattern that matches all HTML tags starting with "p" but ignores <pre>, <param>, and <progress> which just leaves <p>. I left the begin tag "open" (notice there's no closing ">") so that I could capture any attributes on the paragraph tag. I'm only interested in a part of the string that begins with "<p" and ends with "</p>". Using SUBSTRING, I can extract the first paragraph of the HTML. Now I just have to remember to put a summary in the first paragraph of each blog post...

  2. Building a Tag Cloud using MVC, SQL, HTML5, and jQuery

    One of the things I definitely wanted to do when I rebuilt this blog was implement a tag cloud. To make blogging easy, I just wanted to have a comma-separated list of tags that I attached to each blog post. To get the list of tags and their occurrences, I used the following SQL stored procedure:

    Create Procedure [dbo].[Blog_GetTagsForCloud]
    As
    Begin
    	Set NoCount On
    
    	Select LTRIM(RTRIM(T.Value)) As Tag
    		, COUNT(*) As TagCount
    	From dbo.Blog
    		Cross Apply dbo.fn_ParseDelimitedStrings(dbo.Blog.Tags, ',') As T
    	Where dbo.Blog.IsActive = 1
    	Group By LTRIM(RTRIM(T.Value))
     	Order By NEWID()
    End
        

    The fn_ParseDelimitedStrings table-valued function is used in a Cross Apply so that the Tags column can be evaluated for each record in the Blog table. Grouping on the function's Value column gives us a distinct list of tags and in conjunction with the COUNT(*) function, we get a total count of each tag. The "Order By NEWID()" is there to randomize the order of the tags (as random as it can be, anyway). Here's the fn_ParseDelimitedStrings function:

    Create Function [dbo].[fn_ParseDelimitedStrings](@String nvarchar(MAX), @Delimiter char(1))
    Returns @Values Table 
    (
    	Id int Not Null Identity(1,1) Primary Key
    	, Value nvarchar(MAX) Not Null
    )
    As
    Begin
    	If (Right(@String, 1) != @Delimiter)
    		Set @String = @String + @Delimiter
    
    	Declare @StartPosition smallint = 1
    	Declare @EndPosition smallint = CharIndex(@Delimiter, @String)
    
    	While @EndPosition > 0
    	Begin
    		Insert @Values(Value)
    			Select LTrim(RTrim(SubString(@String, @StartPosition, @EndPosition - @StartPosition)))
    
    		Set @String = Stuff(@String, @EndPosition, 1, '')
    		Set @StartPosition = @EndPosition
    		Set @EndPosition = CharIndex(@Delimiter, @String)
    	End
    
    	Return
    End
        

    Once I have the distinct list of tags and their count, I use a nifty little formula to figure out their proportions and ratios compared to the proportions and ratios of the font sizes I want to use. The formula looks like this:

    Font Size = ((([the count of the tag being computed] - [the lowest occurrence]) * ([the largest font size] - [the smallest font size])) / ([the highest occurrence] - [the lowest occurrence])) + [the smallest font size]

    Simple, right? ;^) Translated to C#, it looks like this:

    public Dictionary<String, Int32> GetTagsForCloud()
    {
        Dictionary<String, Int32> items = new Dictionary<String, Int32>();
    
        using (SqlDataReader reader = GetDataReader("dbo.Blog_GetTagsForCloud"))
        {
            while (reader.Read())
                items.Add(Convert.ToString(reader["Tag"]), Convert.ToInt32(reader["TagCount"]));
    
            reader.Close();
        }
    
        Dictionary<String, Int32> tagCloud = new Dictionary<String, Int32>();
    
        if (items.Count > 0)
        {
            Int32 minValue = items.Min(kvp => kvp.Value);
            Int32 maxValue = items.Max(kvp => kvp.Value);
            Int32 divisor = maxValue - minValue;
            Int32 minFont = 12;
            Int32 maxFont = 36;
    
            if (divisor == 0)
                divisor = 1;
    
            foreach (var item in items)
                tagCloud.Add(item.Key, (((item.Value - minValue) * (maxFont - minFont)) / divisor) + minFont);
        }
    
        return tagCloud;
    }
        

    I have a data abstraction layer and that's where the GetTagsForCloud() function resides. The ORM I use maps column names (or aliases) to objects but since this is a little outside my object structure, I just grab a SqlDataReader and iterate it.

    Note that I had to use the "divisor" variable to counter the possibility of minValue and maxValue being the same. That would result in a divide-by-zero error.

    Next I created a partial view (_TagCloud.cshtml) and put it in the Views/Shared folder since I use it in my _Layout.cshtml:

    @inherits RobertGaut.Pr0g33k.Web.Views.Shared.TagCloudView
    <div id="tag-cloud">
        @foreach (var tag in TagCloud)
        {
            @Html.ActionLink(tag.Key, "Index", new { tag = tag.Key }, new { data_font_size = String.Format("{0}", tag.Value), title = String.Format("View all posts tagged with '{0}'", tag.Key) })
        }
    </div>
        

    And, since I just feel funny about putting C# in my Views, I had the partial inherit from this class:

    public abstract class TagCloudView : WebViewPage
    {
        public Dictionary<String, Int32> TagCloud { get; set; }
    
        protected override void InitializePage()
        {
            base.InitializePage();
            TagCloud = BlogManager.Instance.GetTagsForCloud();
        }
    }
        

    I marked the class "abstract" so that I didn't have to implement the WebPageView's Execute() method. I initially tried to put the call to GetTagsForCloud() in the Execute() method but the timing wasn't right so I moved it to an earlier call, InitializePage(), and the data came through to my partial view just fine.

    Now I had the problem of figuring out how to apply the font size to the hyperlink. I try to avoid using inline styles with HTML5. People keep telling me that's soooo XHTML Transitional 1.0 so I just avoid it altogether. But I definitely didn't want to have to do it with a class attribute and define 24 class selectors! Let me explain. If you look back up at the GetTagsForCloud() method, you'll notice that I define 2 variables:

    Int32 minFont = 12;
    Int32 maxFont = 36;
        

    I'm using them in the formula to make a ratio comparison to the current occurrence and the minimum occurrence. Basically, I'm setting a range from 12 to 36 which I'll translate to "12px" to "36px." If I used a style selector for each size, they'd look something like this:

    .font12 {
        font-size: 12px;
    }
    
    /* the in-between definitions would go here if I wasn't too lazy to type them */
    
    .font36 {
        font-size: 36px;
    }
        

    That's just plain nasty. So instead, I added the value to an HTML5 "data-" attribute. Check out the _TagCloud.cshtml partial view and you'll notice the overload for htmlAttributes has the following:

    data_font_size = String.Format("{0}", tag.Value)
        

    If you tried to type it as "data-font-size" the way it is done in the actual HTML element, you'd get red squigglies telling you that that's no bueno – you can't use hyphens there because it needs to translate the word to a variable name when Razor parses/compiles the statement. By using underscores, though, the Razor view engine outputs the attribute with hyphens. Which, I thought, is pretty darn sweet. Now that I have anchor elements output to the page I can use the following jQuery script to set the font size using the data-font-size attribute.

    <script>
        $(document).ready(function () {
            $('#tag-cloud > a').each(function () {
                $(this).css({ 'font-size': $(this).attr('data-font-size') + 'px' });
            });
        });
    </script>
        

    UPDATE

    Alternatively, you can output the tag cloud using @Html.RenderAction(). I don't know why I didn't do this in the first place. Anyway, here's what the controller looks like:

    public class SharedController : Controller
    {
        public ActionResult TagCloud()
        {
            return PartialView(BlogManager.Instance.GetTagsForCloud());
        }
    }
        

    I put it in a "shared" controller because, well, _Layout.cshtml doesn't really have a controller associated with it. The partial view looks like this:

    @model System.Collections.Generic.Dictionary<String, Int32>
    <div id="tag-cloud">
        @foreach (var tag in Model)
        {
            @Html.ActionLink(tag.Key, "Index", new { tag = tag.Key }, new { data_font_size = String.Format("{0}", tag.Value), title = String.Format("View all posts tagged with '{0}'", tag.Key) })
        }
    </div>
        

    In _Layout.cshtml, I replaced the call to @{ Html.Partial("_TagCloud"); } with @{ Html.RenderAction("TagCloud", "Shared"); }

    This is a much better solution, IMHO.