http://pythonprogramming.net/urllib-tutorial-python-3/

 

Python urllib tutorial for Accessing the Internet


 

The urllib module in Python 3 allows you to access websites from your program. This opens up as many doors for your programs as the internet opens up for you. urllib in Python 3 is slightly different from urllib2 in Python 2, but they are mostly the same. Through urllib, you can access websites, download data, parse data, modify your headers, and make any GET and POST requests you might need.

Some websites do not appreciate programs accessing their data and placing load on their servers. When they find out that a program is visiting them, they may choose to block you, or serve you different data than a regular user would see. This can be annoying at first, but can be overcome with some simple code. To do this, you just need to modify the user-agent, which is a value within the headers that you send with each request. Headers are bits of data that you share with servers to let them know a bit about you. By default, this is where Python tells the website that you are visiting with Python's urllib, along with your Python version. We can, however, modify this, and act as if we are a lowly Internet Explorer user, a Chrome user, or anything else really!

I would not recommend blindly doing this, however, if a website is blocking you. Websites employ other tactics as well, but usually they block programs because they also offer an API that is made specifically for programs to access. Programs are usually just interested in the data, and do not need to be served fancy HTML or CSS, nor data for advertisements, etc.


Here is the first and easiest example of using urllib. We just need to import urllib.request. From there, we assign the opened URL to a variable, and then use its .read() method to read the data. The result is a massive mess, but we did indeed read the source code.

#Used to make requests
import urllib.request

x = urllib.request.urlopen('https://www.google.com/')
print(x.read())

Soon, we'll be using regular expressions to clean up the result. The problem is that web pages use all sorts of HTML, CSS and JavaScript to make themselves appealing to the eye. Our programs don't care what the website looks like. We usually just want the text, so we need to get rid of all of the fluff. Regular expressions are useful for that, so we'll come back to this after covering regex.

Next, sometimes we want to pass values to a URL. There are two methods of data transfer with URLs: GET and POST. The natural method is a GET request, which means you make a request and you get data. The other is POST, where you send data to the server, and you get a response based on what you posted.

An example:

http://pythonprogramming.net/?s=basics&submit=Search

You see there are 2 variables here. You can see them because of the equals sign. The first variable is denoted with a question mark, and all of the subsequent ones are denoted with the & sign.
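To see how those variables are laid out, the standard library can take the example URL apart. This is only a small sketch using urllib.parse.urlsplit and urllib.parse.parse_qs; the helper name query_variables is made up for illustration and is not part of the original tutorial.

```python
from urllib.parse import urlsplit, parse_qs


def query_variables(url):
    """Return the query-string variables of a URL as a dict of lists.

    query_variables is a hypothetical helper name used for this sketch.
    """
    # urlsplit separates the URL into scheme, host, path, query and fragment
    query = urlsplit(url).query
    # parse_qs splits the query on '&' and '=' and decodes percent-escapes
    return parse_qs(query)


variables = query_variables('http://pythonprogramming.net/?s=basics&submit=Search')
print(variables)  # {'s': ['basics'], 'submit': ['Search']}
```

Each value comes back as a list because a query string is allowed to repeat the same variable name more than once.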

There are multiple ways to pass values through like this: you can hard-code them into the URL, or you can use urllib to build them. Let's show an example of making such a request with urllib:

# used to parse values into the url
import urllib.parse


url = 'https://www.google.com/search'
values = {'q' : 'python programming tutorials'}

Above, we're defining the variables that we plan to POST to the url we specify.

From there, below, we first need to URL-encode all of the values. This means converting characters that are not safe in a URL into an escaped form. For example, urlencode turns each space into a + sign.

Then we encode to utf-8 bytes. We make our request, adding in one more value, data, which is the encoded dictionary of keys and values, or better understood in this scenario as variables and their values. Then we open the url with the request that we've built, which we call a response, since that's what we get with it. Finally, we read that response with a .read().

data = urllib.parse.urlencode(values)
data = data.encode('utf-8') # data should be bytes
req = urllib.request.Request(url, data)
resp = urllib.request.urlopen(req)
respData = resp.read()

print(respData)
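It can help to check what is about to be sent before involving a server at all. The sketch below builds the same kind of request without opening it, so nothing goes over the network. The names encoded, get_req and post_req are illustrative only. It shows the encoded form of the dictionary, and that supplying data is what turns the request into a POST.

```python
import urllib.parse
import urllib.request

url = 'https://www.google.com/search'
values = {'q': 'python programming tutorials'}

# urlencode joins key=value pairs with '&' and escapes unsafe characters
encoded = urllib.parse.urlencode(values)
print(encoded)  # q=python+programming+tutorials

# Without data, the values ride along in the URL and the request is a GET
get_req = urllib.request.Request(url + '?' + encoded)
print(get_req.get_method())  # GET

# With data (which must be bytes), urllib sends the request as a POST
post_req = urllib.request.Request(url, encoded.encode('utf-8'))
print(post_req.get_method())  # POST
```

Neither request is sent until it is passed to urllib.request.urlopen.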

Turns out, Google will return a 405, method not allowed. Google is not happy with our request! Try the above on another website, modifying the variables. Find a website with a search bar, and see if you can make use of it via Python.

Finally, header modification. Sometimes, websites do not appreciate being visited by robots, or they might treat them differently. In the past, most websites, if they had a stance at all, would just block programs. Now, the prevailing method seems to be to serve different data to programs, so they don't realize as easily what has happened, or maybe to share information with the developers. Sometimes, they also simply serve the program with limited data, to keep the load on their servers low. Wikipedia used to outright block programs, but now they serve a page, same with Google. This is usually a page that is not what you actually want, so you will need to work around it.

Whenever you visit a link, you send in a header, which is just some basic information about you. This is how Google Analytics knows what browser you are using, for example.

Within the header, there is a value called user-agent, which defines the browser that is accessing the website's server.

If you are using the default Python user-agent with urllib, then you are announcing yourself as Python-urllib/3.4, if your Python version is 3.4. Either this is foreign to the website, or the website will just block it entirely. A workaround is to identify yourself as something else.
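You can check what urllib announces on your own machine without visiting any website. This is a small sketch that reads the default headers from an opener created with urllib.request.build_opener; the variable names are illustrative.

```python
import urllib.request

# A fresh opener carries the headers urllib adds to every request by default
opener = urllib.request.build_opener()
default_headers = dict(opener.addheaders)

# Prints something like Python-urllib/3.4, depending on your Python version
print(default_headers['User-agent'])
```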

try:
    x = urllib.request.urlopen('https://www.google.com/search?q=test')
    #print(x.read())

    # save the raw response so it can be inspected later;
    # the with block closes the file even if the write fails
    with open('noheaders.txt', 'w') as saveFile:
        saveFile.write(str(x.read()))
except Exception as e:
    print(str(e))

The above output is from Google, who knows you are Python. Over the years, how Google and other websites have handled programs has changed, so this might change as well in time. The current response they are giving is just a default search page, once you parse through all the mess of code that is returned.

Google is doing this because we're telling Google who we are, a urllib Python bot! Let's change that by modifying our user-agent in the header.

try:
    url = 'https://www.google.com/search?q=python'

    # now, with the below headers, we present ourselves as an ordinary
    # browser user (this user-agent string is Chrome on Linux).
    headers = {}
    headers['User-Agent'] = "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.27 Safari/537.17"
    req = urllib.request.Request(url, headers = headers)
    resp = urllib.request.urlopen(req)
    respData = resp.read()

    # the with block closes the file even if the write fails
    with open('withHeaders.txt', 'w') as saveFile:
        saveFile.write(str(respData))
except Exception as e:
    print(str(e))

Above, we do basically the same thing, only this time, we build our request first, passing through the URL and the new modified headers. Then, we make the request and our response is indeed different. We actually get the data we were interested in back!
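If you want to confirm that the modified header is really attached before sending anything, the request object can be inspected on its own. This sketch only builds the request and never opens it. Note that urllib stores header names with only the first letter capitalized, so the lookup key is 'User-agent'.

```python
import urllib.request

url = 'https://www.google.com/search?q=python'
headers = {
    'User-Agent': "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.17 "
                  "(KHTML, like Gecko) Chrome/24.0.1312.27 Safari/537.17"
}

# Build the request only; nothing is sent until urlopen(req) is called
req = urllib.request.Request(url, headers=headers)

# Header names are normalized, so 'User-Agent' is stored as 'User-agent'
print(req.has_header('User-agent'))
print(req.get_header('User-agent'))
```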


 

 

Posted by steloflute

http://www.brpreiss.com/books/opus5/html/page421.html

 

Reference Counting Garbage Collection

The difficulty in garbage collection is not the actual process of collecting the garbage--it is the problem of finding the garbage in the first place. An object is considered to be garbage when no references to that object exist. But how can we tell when no references to an object exist?

A simple expedient is to keep track in each object of the total number of references to that object. That is, we add a special field to each object called a reference count. The idea is that the reference count field is not accessible to the Java program. Instead, the reference count field is updated by the Java virtual machine itself.

Consider the statement

Object p = new Integer (57);

which creates a new instance of the Integer class. Only a single variable, p, refers to the object. Thus, its reference count should be one.

Figure: Objects with reference counters.

Now consider the following sequence of statements:

Object p = new Integer (57);
Object q = p;

This sequence creates a single Integer instance. Both p and q refer to the same object. Therefore, its reference count should be two.

In general, every time one reference variable is assigned to another, it may be necessary to update several reference counts. Suppose p and q are both reference variables. The assignment

p = q;

would be implemented by the Java virtual machine as follows:

if (p != q)
{
    if (p != null)
        --p.refCount;
    p = q;
    if (p != null)
        ++p.refCount;
}

For example suppose p and q are initialized as follows:

Object p = new Integer (57);
Object q = new Integer (99);

As shown in part (a) of the figure below, two Integer objects are created, each with a reference count of one. Now, suppose we assign q to p using the code sequence given above. Part (b) of the figure shows that after the assignment, both p and q refer to the same object--its reference count is two. And the reference count on Integer(57) has gone to zero, which indicates that it is garbage.

Figure: Reference counts before and after the assignment p = q.
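The bookkeeping in that example can be traced with a small simulation. The following is only a sketch in Python, not how a Java virtual machine is actually written: the Obj and Frame classes and the ref_count field are hypothetical stand-ins, and the variables p and q are modeled as entries in a dictionary so that every assignment goes through the counting logic.

```python
class Obj:
    """A heap object with a hidden reference count (hypothetical stand-in)."""

    def __init__(self, value):
        self.value = value
        self.ref_count = 0


class Frame:
    """Holds named reference variables, like p and q in the text."""

    def __init__(self):
        self.vars = {}

    def assign(self, name, target):
        """Simulate `name = target`, adjusting counts as in the text."""
        current = self.vars.get(name)
        if current is not target:
            if current is not None:
                current.ref_count -= 1   # old referent loses a reference
            self.vars[name] = target
            if target is not None:
                target.ref_count += 1    # new referent gains a reference


frame = Frame()
int57 = Obj(57)
int99 = Obj(99)
frame.assign('p', int57)              # Object p = new Integer (57);
frame.assign('q', int99)              # Object q = new Integer (99);
print(int57.ref_count, int99.ref_count)   # 1 1

frame.assign('p', frame.vars['q'])    # p = q;
print(int57.ref_count, int99.ref_count)   # 0 2
```

The second line of output matches part (b) of the figure: the object holding 57 has a count of zero and is therefore garbage.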

The costs of using reference counts are twofold: First, every object requires the special reference count field. Typically, this means an extra word of storage must be allocated in each object. Second, every time one reference is assigned to another, the reference counts must be adjusted as above. This increases significantly the time taken by assignment statements.

The advantage of using reference counts is that garbage is easily identified. When it becomes necessary to reclaim the storage from unused objects, the garbage collector needs only to examine the reference count fields of all the objects that have been created by the program. If the reference count is zero, the object is garbage.

It is not necessary to wait until there is insufficient memory before initiating the garbage collection process. We can reclaim the memory used by an object immediately when its reference count goes to zero. Consider what happens if we implement the Java assignment p = q in the Java virtual machine as follows:

if (p != q)
{
    if (p != null)
        if (--p.refCount == 0)
            heap.release (p);
    p = q;
    if (p != null)
        ++p.refCount;
}

Notice that the release method is invoked immediately when the reference count of an object goes to zero, i.e., when it becomes garbage. In this way, garbage may be collected incrementally as it is created.
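The incremental variant can be simulated the same way. This is a sketch under assumptions, written in Python rather than inside a real virtual machine: the Heap class, its allocate and release methods, and the assign function are all hypothetical names, and release simply records which object was reclaimed instead of freeing real storage.

```python
class Obj:
    """A heap object with a hidden reference count (hypothetical stand-in)."""

    def __init__(self, value):
        self.value = value
        self.ref_count = 0


class Heap:
    """Tracks allocated objects; release() stands in for reclaiming storage."""

    def __init__(self):
        self.live = []
        self.released = []

    def allocate(self, value):
        obj = Obj(value)
        self.live.append(obj)
        return obj

    def release(self, obj):
        self.live.remove(obj)
        self.released.append(obj.value)


def assign(heap, variables, name, target):
    """Simulate `name = target`, releasing the old object if its count hits zero."""
    current = variables.get(name)
    if current is not target:
        if current is not None:
            current.ref_count -= 1
            if current.ref_count == 0:
                heap.release(current)    # reclaimed the moment it becomes garbage
        variables[name] = target
        if target is not None:
            target.ref_count += 1


heap = Heap()
variables = {}
assign(heap, variables, 'p', heap.allocate(57))   # Object p = new Integer (57);
assign(heap, variables, 'q', heap.allocate(99))   # Object q = new Integer (99);
assign(heap, variables, 'p', variables['q'])      # p = q;

print(heap.released)                    # [57]
print([o.value for o in heap.live])     # [99]
```

No separate collection pass is needed here: the object holding 57 is released as a side effect of the assignment that removed its last reference.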

 


http://apdubey.blogspot.kr/2009/04/microsoft-visual-studio-2005-express.html

 

Microsoft Visual Web Developer 2005 Express Edition offline download

Microsoft Visual C++ 2005 Express Edition offline download

Microsoft Visual Basic 2005 Express Edition offline download

Microsoft Visual C# 2005 Express Edition offline download

Microsoft Visual J# 2005 Express Edition offline download

 

Yesterday I needed Microsoft Visual C++ 2005, but when I searched for it on the Microsoft website I was taken to the page for the 2008 Express Edition development tools, which I certainly did not want. Worse, Microsoft had not posted any archive link for its recently retired products. I had no option other than googling, and as in most cases that did not help directly, but after some smarter searching I managed to find all the offline files on Microsoft's own servers with the help of “Great Google”.

Enough self-congratulation. Here are the offline download links for every Visual Studio 2005 Express Edition tool; the list below is complete.

You can download the complete offline installer of each Microsoft Visual Studio 2005 Express Edition tool from the locations below, in either .img or .iso format.

All of the files below are larger than 400 MB.

Visual Web Developer 2005 Express Edition (449,848 KB): .IMG File | .ISO File

Visual Basic 2005 Express Edition (445,282 KB): .IMG File | .ISO File

Visual C# 2005 Express Edition (445,282 KB): .IMG File | .ISO File

Visual C++ 2005 Express Edition (474,686 KB): .IMG File | .ISO File

Visual J# 2005 Express Edition (448,702 KB): .IMG File | .ISO File

You can download them for as long as they remain available from Microsoft. If you find that these tools are no longer available there, drop a comment or send me an email and I will upload them to MediaFire or RapidShare.

I hope it was useful for you and it will help you to get your work done on time.

Thanks for being here

 

 

 


